Split a file into smaller chunks
Published 23 July 2020 13:08 (2-minute read)
To limit the size of the files sent between our servers, we split them into chunks before sending them over the internet. The servers make more requests, but it's easier to retry a small chunk than a full 50GB file.
We use the "split" and "cat" commands for this: "split" creates the chunks and "cat" joins them again after the transfer.
1. split the file into smaller chunks
With the following command we split the file into chunks of 100 MiB (the "m" suffix multiplies the number by 1024 x 1024 bytes).
split -b 100m "my_file.zip" "my_file_chunks.zip."
For the manual of the split command you can visit: man7.org/linux/man-pages/man1/split.1.html
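To check that a split produced the number of chunks you expect, a ceiling division of the file size by the chunk size is enough. The sketch below is only an illustration: the helper name `expected_chunks`, the temporary directory, and the small 100 KiB chunk size are assumptions chosen to keep the demo quick, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper: number of chunks "split -b" produces for a file of
# $1 bytes when each chunk holds at most $2 bytes (ceiling division).
expected_chunks() {
  local file_size=$1 chunk_size=$2
  echo $(( (file_size + chunk_size - 1) / chunk_size ))
}

# Demo on a small throwaway file instead of a real 50GB archive.
workdir=$(mktemp -d)
head -c 250000 /dev/zero > "$workdir/my_file.zip"

# "100k" means 100 * 1024 bytes, just as "100m" means 100 * 1024 * 1024 bytes.
split -b 100k "$workdir/my_file.zip" "$workdir/my_file_chunks.zip."

# Count the generated chunks (tr strips the padding some wc versions add).
actual=$(ls "$workdir"/my_file_chunks.zip.* | wc -l | tr -d ' ')
expected=$(expected_chunks 250000 $((100 * 1024)))
echo "expected=$expected actual=$actual"

rm -rf "$workdir"
```

If the two numbers differ, the split was probably interrupted or the disk ran full.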
2. process the small files (for example, upload them with an HTTP request)
We upload the chunks to a remote server. A request for the first chunk may look like this:
curl -F 'file=@/path/my_file_chunks.zip.aa' https://localhost/upload
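Uploading every chunk by hand gets tedious, so a loop helps. The following is a minimal sketch, not the exact script we run: the function name `upload_chunks` and the `CURL_BIN` override are hypothetical, and the endpoint in the example call is simply the one from the command above. It relies on curl's `--fail` and `--retry` options so that a failed chunk is retried and, if it keeps failing, stops the loop.

```shell
#!/bin/bash
# Hypothetical helper: upload every chunk matching a prefix, one request each.
# CURL_BIN can be overridden (for a dry run, for instance); it defaults to curl.
upload_chunks() {
  local prefix=$1 url=$2 chunk
  for chunk in "$prefix"*; do
    # If the glob matched nothing, the pattern itself is returned unchanged.
    if [ ! -e "$chunk" ]; then
      echo "no chunks found for prefix: $prefix" >&2
      return 1
    fi
    # --fail makes curl exit non-zero on HTTP error responses;
    # --retry 3 retries transient failures up to three times.
    if ! "${CURL_BIN:-curl}" --fail --retry 3 -F "file=@$chunk" "$url"; then
      echo "upload failed: $chunk" >&2
      return 1
    fi
  done
}

# Example call (assumes the upload endpoint shown above):
# upload_chunks "/path/my_file_chunks.zip." "https://localhost/upload"
```

Because the function stops at the first chunk that cannot be uploaded, you can fix the problem and rerun it for the remaining chunks only.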
3. combine the files
After you have processed all the chunks, you need to combine them into one file. This can be done with the "cat" command.
cat my_file_chunks.zip.* > my_combined_file.zip
"my_combined_file.zip" will contain all data from the "my_file_chunks.zip.*" files after it's done processing.
It's possible to do this for all files at once, or to put it in a script that handles each file individually.
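The order in which the chunks are joined matters. The shell expands the wildcard in sorted order, which lines up with the default suffixes split generates (aa, ab, ac, ...). The sketch below runs a full round trip on a small random file and compares the result byte for byte with `cmp`. The helper name `combine_chunks` and the temporary directory are assumptions for the demo.

```shell
#!/bin/bash
# Hypothetical helper: concatenate all chunks for a prefix into one file.
# The output name must not match the chunk prefix, or cat would read its
# own output.
combine_chunks() {
  local prefix=$1 output=$2
  cat "$prefix"* > "$output"
}

# Round trip on a small throwaway file.
workdir=$(mktemp -d)
head -c 300000 /dev/urandom > "$workdir/my_file.zip"
split -b 100k "$workdir/my_file.zip" "$workdir/my_file_chunks.zip."
combine_chunks "$workdir/my_file_chunks.zip." "$workdir/my_combined_file.zip"

# cmp -s prints nothing and exits 0 only when both files are identical.
if cmp -s "$workdir/my_file.zip" "$workdir/my_combined_file.zip"; then
  round_trip="identical"
else
  round_trip="different"
fi
echo "$round_trip"

rm -rf "$workdir"
```

`cmp` only works when both files are on the same machine. After a transfer between servers, a checksum is the practical alternative.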
4. validate the outcome of the process
Want to be sure the file is complete? Then compare a checksum of the original file with a checksum of the combined file. Create the first one before step 1 and the second one after step 3.
Create a checksum of the original file:
cat my_file.zip | md5
Create a checksum of the combined file:
cat my_combined_file.zip | md5
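The `md5` command ships with macOS and the BSDs, while most Linux systems provide `md5sum` instead, and the two format their output differently. If the sending and receiving servers run different systems, a small wrapper can hide the difference. This is a sketch under that assumption; the function name `file_md5` is made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: print only the MD5 hex digest of a file, using
# whichever tool is installed.
file_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    # md5sum prints "<digest>  -" when reading stdin; keep the first field.
    md5sum < "$1" | cut -d ' ' -f 1
  elif command -v md5 >/dev/null 2>&1; then
    # BSD/macOS md5: -q prints only the digest.
    md5 -q "$1"
  else
    echo "no md5 tool found" >&2
    return 1
  fi
}

# Example call:
# file_md5 my_file.zip
```

Both branches print the bare digest, so the values can be compared as plain strings regardless of which system produced them.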
Compare the checksum of the original file with the checksum of the combined file:
#!/bin/bash
# 0. make a checksum of the original file
checksumOriginalFile=$(cat my_file.zip | md5)
# 1.1 split the files
split -b 100m "my_file.zip" "my_file_chunks.zip."
# 1.2 check all files that are generated
ls my_file_chunks.zip.*
# 2. upload/download your files from the remote server/client
#todo
# 3. combine the chunks into one file
cat my_file_chunks.zip.* > my_combined_file.zip
# 4. validate that the files are equal by comparing their checksums
checksumCombinedFile=$(cat my_combined_file.zip | md5)
# quote both variables so an empty or multi-word value cannot break the test
if [ "$checksumOriginalFile" = "$checksumCombinedFile" ]
then
  echo "File checksums match."
else
  echo "File checksums do NOT match!"
fi