Step 8 — Archiving & Compression (Ubuntu 24)
Type along exactly as shown. Nothing here alters system config. Optional installs use apt.
Estimated time: ~20–25 minutes
What you’ll learn
- The difference between archiving (tar) and compressing (gzip/xz/zstd/zip/7z)
- Create, list, verify, extract tar archives with and without compression
- Choose the right compressor for speed vs size (gzip, xz, zstd)
- Include/exclude paths, preserve ownership/permissions, and handle symlinks and sparse files
- Stream archives via pipes and SSH; split and rejoin large archives
- Verify integrity with tar -tvf, tar -W, and sha256sum
Setup: Work in a safe playground:
mkdir -p ~/playground/arch && cd ~/playground/arch
0) Prepare sample data
Create a small tree with different file types:
mkdir -p src dirA/dirB
printf 'hello\n' > src/hello.txt
head -c 1M </dev/urandom > src/random.bin
ln -s src hello_link # symlink to src/ (the target is resolved relative to the link's own directory)
# create a sparse file (looks big, uses little disk)
truncate -s 500M src/sparse.img
# hidden file, and a file to exclude later
printf 'secret\n' > .env
printf 'ignore me\n' > src/tmp.log
Confirm structure:
find . -maxdepth 3 -printf '%M %u:%g %8s %p\n' | sed -n '1,40p'
1) Archiving 101 — tar without compression
Create an archive of src/ and dirA/:
# c=create, v=verbose, f=filename
tar -cvf lab.tar src dirA
ls -lh lab.tar
List contents without extracting:
tar -tvf lab.tar | head -20
Extract into a new folder and compare:
mkdir extract && tar -xvf lab.tar -C extract
diff -qr src extract/src && echo 'src matches extract/src'
Note: tar stores paths relative to where you run it; use -C to control base paths.
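The effect of -C on stored paths is easiest to see side by side. The following is a minimal sketch that works in a throwaway directory from mktemp; the names project, full.tar, and rel.tar are made up for illustration.

```shell
# Scratch directory (hypothetical names); nothing outside it is touched.
work=$(mktemp -d)
mkdir -p "$work/project/src"
printf 'hello\n' > "$work/project/src/hello.txt"

# Without -C: the stored path mirrors the argument as typed
# (GNU tar strips the leading / and warns about it on stderr).
tar -cf "$work/full.tar" "$work/project/src" 2>/dev/null

# With -C: tar changes into project/ first, so only src/... is stored.
tar -cf "$work/rel.tar" -C "$work/project" src

full_listing=$(tar -tf "$work/full.tar")
rel_listing=$(tar -tf "$work/rel.tar")
printf 'without -C:\n%s\n\nwith -C:\n%s\n' "$full_listing" "$rel_listing"
```

The second listing starts at src/, which is usually what you want when the archive will be unpacked somewhere else.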
2) Compressors quick tour
Install common tools:
sudo apt update && sudo apt install -y zstd xz-utils gzip zip unzip p7zip-full
Single‑file compression (no tar):
# gzip (fast, widespread)
gzip -k src/hello.txt # keeps original with -k
ls -lh src/hello.txt*
# xz (smallest, slower)
xz -k src/random.bin
# zstd (balanced, very fast; levels 1–19; -T0 = all cores)
zstd -k -T0 src/random.bin
Inspect compression ratios:
gzip -l src/hello.txt.gz
xz -l src/random.bin.xz
zstd -lv src/random.bin.zst
Rule of thumb: zstd is a good default when speed matters; xz gives the smallest archives when you can afford the time; gzip is for legacy compatibility.
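To check the rule of thumb on your own data, a rough sketch like the one below compresses one file with each compressor that happens to be installed and prints the resulting sizes. The sample file is generated text, so the numbers are only indicative; it measures size, not speed (prefix a command with time to compare speed).

```shell
work=$(mktemp -d)
# Compressible sample data (hypothetical): the numbers 1..20000, one per line.
seq 1 20000 > "$work/sample.txt"

orig_size=$(wc -c < "$work/sample.txt")
echo "original: $orig_size bytes"

# -c writes the compressed stream to stdout and leaves the input file alone.
for tool in gzip xz zstd; do
  if command -v "$tool" >/dev/null 2>&1; then
    size=$("$tool" -c "$work/sample.txt" | wc -c)
    echo "$tool: $size bytes"
  else
    echo "$tool: not installed, skipped"
  fi
done
```

Swap sample.txt for a file that resembles what you actually archive; already-compressed data such as random.bin will barely shrink with any of them.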
3) Tar + compression combos
Easiest: let tar drive the compressor
# gzip
tar -czvf lab.tar.gz src dirA
# xz
tar -cJvf lab.tar.xz src dirA
# zstd (modern)
tar --zstd -cvf lab.tar.zst src dirA
List without extracting:
tar -tvf lab.tar.gz | head -10
tar -tvf lab.tar.xz | head -10
tar -tvf lab.tar.zst | head -10
Custom compressor flags with -I
# zstd level 19, all cores
tar -I 'zstd -T0 -19' -cvf lab-max.tar.zst src dirA
# parallel gzip if `pigz` is installed
# sudo apt install -y pigz
tar -I pigz -cvf lab.tar.gz src dirA
4) Extracting safely
Basic extraction into a target directory:
mkdir -p /tmp/lab_extract
sudo tar -xvpf lab.tar.zst --zstd -C /tmp/lab_extract
Flags used:
- x extract, v verbose, p preserve perms, f filename
- --same-owner (root only) to preserve ownership exactly
Strip leading path components (handy when archive contains a top‑level folder):
mkdir clean && tar -xvf lab.tar.gz --strip-components=1 -C clean
Extract one item:
# the target directory must exist, and -C must come before the member name
mkdir -p extract_one && tar -xvf lab.tar.zst --zstd -C extract_one src/hello.txt
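Before extracting an archive you did not create, it can help to scan the listing for member names that are absolute or that climb out of the target with "..". GNU tar already refuses such paths by default, so treat the following as an extra check rather than a required step. It is a sketch that builds its own small archive in a scratch directory; the file names are made up.

```shell
work=$(mktemp -d)
mkdir -p "$work/src"
printf 'hello\n' > "$work/src/hello.txt"
tar -czf "$work/lab.tar.gz" -C "$work" src

# Flag member names that start with / or contain a ".." path component.
if tar -tzf "$work/lab.tar.gz" | grep -Eq '^/|(^|/)\.\.(/|$)'; then
  verdict='suspicious paths found; not extracting'
else
  mkdir -p "$work/out"
  tar -xzf "$work/lab.tar.gz" -C "$work/out"
  verdict='paths look safe; extracted'
fi
echo "$verdict"
```

Listing first also shows whether the archive has a single top-level folder, which tells you if --strip-components is needed.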
5) Excludes, includes, and quoting
Create an archive but exclude logs and hidden files:
tar --exclude='*.log' --exclude='.*' -cvf lab_nohidden.tar src dirA
tar -tvf lab_nohidden.tar | grep -E '\.log|/\.' || echo 'No logs/hidden files included'
Use an exclude‑from file (one pattern per line):
printf '*.log\n.env\n*.tmp\n' > exclude.txt
tar --exclude-from=exclude.txt -cvf lab_filtered.tar src dirA
Quoting tip: Quote globs ('*.log') so your shell doesn’t expand them before tar sees them.
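To see what the shell does to an unquoted pattern, the sketch below uses echo to print the arguments tar would receive. It assumes two .log files (a.log and b.log, made-up names) sit in the current directory, which is the situation where the unquoted form goes wrong.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir -p src
printf 'keep\n' > src/keep.txt
printf 'noise\n' > src/app.log
printf 'noise\n' > a.log   # two matches in the current directory,
printf 'noise\n' > b.log   # so the shell has something to expand

# echo shows the arguments tar would receive in each case.
unquoted=$(echo tar --exclude *.log -cf out.tar src)
quoted=$(echo tar --exclude '*.log' -cf out.tar src)
echo "unquoted: $unquoted"
echo "quoted:   $quoted"

# With the pattern quoted, tar applies it at every directory level.
tar --exclude '*.log' -cf out.tar src
listing=$(tar -tf out.tar)
echo "$listing"
```

In the unquoted case tar is told to exclude only a.log and to archive b.log as an extra member, while src/app.log slips in untouched. The quoted run leaves just src/ and src/keep.txt in the archive.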
6) Sparse files & symlinks
The sample src/sparse.img is sparse. Use --sparse to store holes efficiently:
tar --sparse -cvf sparse.tar src/sparse.img
ls -lh sparse.tar
Control how symlinks are handled:
# default: store symlink as link (recommended)
tar -cvf links.tar hello_link
# follow symlinks (stores target content)
tar -h -cvf links_follow.tar hello_link
tar -tvf links_follow.tar | head -3
7) Streaming: pipes & SSH
Create and compress on the fly, no temp file:
tar -c src | zstd -T0 -19 -o src.tar.zst
Send to a remote host over SSH (requires SSH access):
# on local machine
tar -c src | ssh user@remote 'tar -x -C /tmp'
# or with compression on the wire
tar -c src | zstd -T0 | ssh user@remote 'zstd -d | tar -x -C /tmp'
8) Integrity: list, verify, checksum
List is your first sanity check:
tar -tvf lab.tar.zst --zstd | head -5
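Ask tar to verify after writing (-W). GNU tar cannot verify compressed archives, so write a plain .tar first, then compress it:
tar -cvWf lab_verify.tar src dirA
zstd -T0 --rm lab_verify.tar # produces lab_verify.tar.zst and removes the .tar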
Create a strong checksum alongside the archive:
sha256sum lab_verify.tar.zst > lab_verify.tar.zst.sha256
sha256sum -c lab_verify.tar.zst.sha256 # verify later
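A checksum proves the archive file is intact; it does not tell you whether the files on disk still match what was archived. GNU tar's --compare mode (-d) covers that case. The following is a small sketch in a scratch directory, using gzip so it runs without extra packages; the file names are made up.

```shell
work=$(mktemp -d)
mkdir -p "$work/src"
printf 'hello\n' > "$work/src/hello.txt"
tar -czf "$work/lab.tar.gz" -C "$work" src

# Unchanged tree: --compare prints nothing and exits 0.
if tar -dzf "$work/lab.tar.gz" -C "$work"; then before='match'; else before='differs'; fi

# Change a file, then compare again: tar reports the difference and exits non-zero.
printf 'changed contents\n' > "$work/src/hello.txt"
if tar -dzf "$work/lab.tar.gz" -C "$work" > "$work/diff.log" 2>&1; then
  after='match'
else
  after='differs'
fi

echo "before edit: $before, after edit: $after"
cat "$work/diff.log"
```

The -C directory here plays the same role as on extraction: it is the base against which member paths are compared.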
9) Split & rejoin large archives
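Split into parts of at most 200 MiB (the sample archive is only a few MiB, so expect a single part; try a smaller size such as -b 1M to see several):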
tar --zstd -cvf big.tar.zst src dirA
split -b 200M big.tar.zst big.tar.zst.part-
ls -lh big.tar.zst.part-*
Rejoin and verify:
cat big.tar.zst.part-* > big.rejoined.tar.zst
cmp big.tar.zst big.rejoined.tar.zst && echo 'Parts rejoined OK'
10) zip and 7z (cross‑platform)
zip
# archive src and dirA only; adding . would also pull in every archive created above
zip -r lab.zip src dirA -x '*.log'
unzip -l lab.zip | head -10
unzip lab.zip -d unzip_out
7‑Zip / 7z (high ratio, solid archives)
7z a -t7z -m0=lzma2 -mx=9 lab.7z src dirA
7z l lab.7z | head -15
7z x lab.7z -o7z_out -y
11) Ownership, perms, and umask
- tar records modes, owners, groups, times. Extraction as non‑root maps owners to your user unless you use sudo + --same-owner.
- Use -p to preserve permissions even if your umask would change them.
- For shared/team archives, consider setting a consistent umask before creating, e.g., umask 022.
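The interaction between umask and -p can be checked directly. This sketch archives a file with mode 666, then extracts it twice under a strict umask of 077. It passes --no-same-permissions explicitly for the first extraction so the result is the same whether or not you are root (root gets -p behaviour by default). stat -c is the GNU form, as found on Ubuntu; file and directory names are made up.

```shell
work=$(mktemp -d)
mkdir -p "$work/src" "$work/plain" "$work/preserved"
printf 'shared\n' > "$work/src/shared.txt"
chmod 666 "$work/src/shared.txt"
tar -cf "$work/perm.tar" -C "$work/src" shared.txt

old_umask=$(umask)
umask 077
# umask applied: group and other bits are masked off on extraction.
tar -xf "$work/perm.tar" --no-same-permissions -C "$work/plain"
# -p: the mode recorded in the archive is restored as-is.
tar -xpf "$work/perm.tar" -C "$work/preserved"
umask "$old_umask"

plain_mode=$(stat -c '%a' "$work/plain/shared.txt")
preserved_mode=$(stat -c '%a' "$work/preserved/shared.txt")
echo "umask applied: $plain_mode, with -p: $preserved_mode"
```

On a typical system this prints 600 for the first extraction and 666 for the second.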
12) Clean up (optional)
rm -rf extract extract_one clean unzip_out 7z_out *.tar* *.gz *.xz *.zst *.zip *.7z *.sha256 big.rejoined.tar.zst src dirA hello_link exclude.txt .env
13) Practice tasks (do these now)
- Create proj.tar.zst from src/ excluding *.log and hidden files. List its contents.
- Extract only src/hello.txt into ~/playground/arch/single/ and verify checksum with sha256sum.
- Make a sparse 1 GiB file and build a space‑efficient archive of it; compare .tar size with and without --sparse.
- Stream‑extract src/ to /tmp/arch_stream/ without creating a local archive file.
- Split proj.tar.zst into 100 MiB parts, rejoin, and verify with cmp.
- (Optional) Create lab.zip and lab.7z, list both, extract to separate folders, and compare with the original tree using diff -qr.
14) Troubleshooting
- “file changed as we read it” during tar: the file mutated mid‑archive; rerun when idle, or snapshot first.
- Permissions wrong after extract: add -p and (if root) --same-owner; check umask.
- Symlinks unexpectedly dereferenced: you probably used -h; remove it to store links as links.
- Archive too slow: use zstd -T0 (multi‑threaded) or pigz for gzip; avoid xz for huge trees if you care about speed.
- Disk full: prefer streaming (tar | zstd -o /mnt/big/…) or -C to write on a larger filesystem; check free space with df -h.
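For the disk-full case, a rough pre-flight check is to add up the sizes in the archive listing and compare the total with the free space at the destination. The sketch below assumes the usual GNU tar -tv layout, where the size is the third field (owner or group names containing spaces would break that), and it overestimates for sparse files. The paths are made up and live in a scratch directory.

```shell
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dest"
head -c 4096 /dev/zero > "$work/src/data.bin"
tar -czf "$work/lab.tar.gz" -C "$work" src

# Sum the size column of the verbose listing: bytes needed once unpacked.
needed_bytes=$(tar -tvzf "$work/lab.tar.gz" | awk '{sum += $3} END {print sum + 0}')
# Free space on the destination filesystem in KiB (POSIX df format, fourth column).
avail_kb=$(df -Pk "$work/dest" | awk 'NR == 2 {print $4}')

needed_kb=$(( (needed_bytes + 1023) / 1024 ))
if [ "$needed_kb" -lt "$avail_kb" ]; then
  echo "ok: need ${needed_kb} KiB, ${avail_kb} KiB free"
else
  echo "not enough room: need ${needed_kb} KiB, ${avail_kb} KiB free"
fi
```

Filesystem overhead is not counted, so leave some margin rather than treating the comparison as exact.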
15) Quick quiz (1 minute)
- Does tar compress by default?
- Which compressor is fastest on multi‑core systems at decent ratios?
- Which tar option keeps original file modes on extract?
- How do you exclude all hidden files?
- What’s the safest way to copy to a remote host without writing a local archive file?
Answers: No, tar only archives unless you add a compressor; zstd -T0; -p (and --same-owner when root); --exclude='.*'; tar -c dir | ssh host 'tar -x -C /dest' (optionally compress in the middle).
Next Step
Proceed to Step 9 — Users & Authentication (local users, groups, passwords, SSH basics). If your curriculum order differs, update the previous step’s “Next Step” pointer to this page.