Split a file into smaller chunks

Published 23 July 2020 13:08 (2-minute read)

To limit the size of the files being sent between our servers, we chunk them before sending them over the internet. The servers make more requests, but it's easier to retry a small chunk than a full 50GB file.

We use the "split" command to create the chunks and the "cat" command to put them back together.

1. split the file into smaller chunks

With the following command we split the file into chunks of 100 MB each. The last chunk holds whatever is left over, so it may be smaller.

split -b 100m "my_file.zip" "my_file_chunks.zip."

For the manual of the split command you can visit: man7.org/linux/man-pages/man1/split.1.html
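
To see what split produces without touching a large file, here is a small sketch that runs the same command on a throwaway test file. The names "demo.bin" and "demo_chunks.bin." are placeholders made up for this example, and the chunk size is lowered to 100k so it runs instantly.

```shell
#!/bin/bash
# Work inside a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)

# Create a 256000-byte test file (hypothetical name) filled with random bytes.
head -c 256000 /dev/urandom > "$workdir/demo.bin"

# Split it into chunks of at most 100k (102400 bytes) each.
# split appends the suffixes aa, ab, ac, ... to the given prefix.
split -b 100k "$workdir/demo.bin" "$workdir/demo_chunks.bin."

# Show the size in bytes of every chunk that was generated.
wc -c "$workdir"/demo_chunks.bin.*
```

This should list three chunks ending in ".aa", ".ab" and ".ac": two full chunks of 102400 bytes and a final chunk with the remaining 51200 bytes.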

2. process the small files (for example, upload them with an HTTP request)

We upload the files to a remote server, which may look like this:

curl -F 'file=@/path/my_file_chunks.zip.aa' https://localhost/upload
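
The command above sends a single chunk. A possible way to send all of them is a small loop around the same curl call. This is only a sketch: the function name "upload_chunks" is made up for this example, and the URL and the form field name "file" are taken from the command above and depend on what your server expects.

```shell
#!/bin/bash
# upload_chunks URL PREFIX
# Uploads every file whose name starts with PREFIX to URL, one request per chunk.
# (Hypothetical helper; adjust the form field name to match your server.)
upload_chunks() {
    local url=$1 prefix=$2 chunk
    for chunk in "$prefix"*; do
        # Skip the unexpanded pattern when no chunk matches.
        [ -e "$chunk" ] || continue
        # --fail turns HTTP error responses into a non-zero exit code,
        # --retry 3 lets curl retry transient failures up to three times.
        if ! curl --fail --silent --show-error --retry 3 \
                -F "file=@${chunk}" "$url"; then
            echo "Upload failed for ${chunk}" >&2
            return 1
        fi
    done
}

# Example call (uncomment to run it against your own server):
# upload_chunks "https://localhost/upload" "/path/my_file_chunks.zip."
```

Because the function stops at the first chunk that fails, you only have to retry from that chunk onwards instead of sending the whole file again.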

3. combine the files

After you have processed all the chunks, you need to combine them into one file. This can be done with the "cat" command.

cat my_file_chunks.zip.* > my_combined_file.zip

Once the command has finished, "my_combined_file.zip" contains all the data from the "my_file_chunks.zip.*" files, in their original order.

It's possible to do this for all files at once, or to wrap it in a script that processes each file separately.
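
A script that handles each file separately could look roughly like the sketch below. The test files "first.bin" and "second.bin" and the ".chunk." and ".combined" suffixes are invented for this example. Since everything is on one machine here, "cmp" is used to check the result byte for byte.

```shell
#!/bin/bash
# Throwaway directory with two hypothetical test files.
workdir=$(mktemp -d)
head -c 250000 /dev/urandom > "$workdir/first.bin"
head -c 120000 /dev/urandom > "$workdir/second.bin"

for file in "$workdir"/*.bin; do
    # Every file gets its own chunk prefix, so chunks of different files never mix.
    split -b 100k "$file" "$file.chunk."

    # The shell expands the pattern in sorted order (aa, ab, ac, ...),
    # which is the same order split wrote the chunks in.
    cat "$file.chunk."* > "$file.combined"

    # cmp -s exits with 0 only when both files are byte-for-byte identical.
    if cmp -s "$file" "$file.combined"; then
        echo "$(basename "$file"): OK"
    else
        echo "$(basename "$file"): MISMATCH"
    fi
done
```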

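4. validate the outcome of the process

Want to be sure the file is complete? Then you can compare a checksum of the original file with a checksum of the combined file. Run the following command on the original file before step 1 and on the combined file after step 3. The "md5" command ships with macOS and the BSDs; on most Linux systems the equivalent is "md5sum".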

Create a checksum of the original file:

cat my_file.zip | md5

Create a checksum of the combined file:

cat my_combined_file.zip | md5

Compare the checksum of the original file with the checksum of the combined file:

#!/bin/bash

# 0. make a checksum of the original file
checksumOriginalFile=$(cat my_file.zip | md5)

# 1.1 split the files
split -b 100m "my_file.zip" "my_file_chunks.zip."

# 1.2 check all files that are generated
ls my_file_chunks.zip.*

# 2. upload/download your files from the remote server/client
# TODO: add your own upload/download commands here

# 3. combine the chunks into a single file
cat my_file_chunks.zip.* > my_combined_file.zip

# 4. validate the files are equal by generating a checksum
checksumCombinedFile=$(cat my_combined_file.zip | md5)

# quote both variables so an empty checksum cannot break the test
if [ "$checksumOriginalFile" = "$checksumCombinedFile" ]
then
    echo "File checksum match."
else
    echo "File checksum does NOT match!!!"
fi

Robin Dirksen
