Step 6 — System Info & Monitoring (Ubuntu 24)

Type along exactly as shown. Safe to run on your machine; nothing here makes persistent config changes (a few optional installs use apt).
Estimated time: ~15–20 minutes

What you’ll learn

Identify OS, kernel, uptime, and hardware at a glance
Inspect CPU, memory, disks/partitions, and network
Watch processes and system load with top/htop, ps, and vmstat
Check running services with systemd (systemctl) and view logs (journalctl)
Measure disk/CPU/IO hot spots with iostat, pidstat, iotop
Run a quick “health sweep” to triage performance issues

Setup: Use your lab folder:
mkdir -p ~/playground && cd ~/playground

0) Snapshot: who, what, where

Start with the fastest high‑level checks.

whoami
hostnamectl          # hostname, OS, kernel
uname -rsm           # kernel: release + machine arch
cat /etc/os-release  # distro details
uptime -p            # human-friendly uptime
uptime               # load averages (1/5/15 min)
date                 # current time (TZ matters!)

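Load average rule of thumb: compare the load to the logical CPU count (nproc). On a 4-core machine, a 1-minute load of about 4 means the CPUs are fully busy; much higher means work is queuing. On Linux the figure also counts tasks blocked in uninterruptible I/O, so a high load does not always mean busy CPUs.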
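
To make the rule of thumb concrete, here is a minimal sketch that divides a load value by a core count. The helper name load_per_core is hypothetical (not a standard tool); it only does the arithmetic.

```shell
# load_per_core LOAD CORES -> prints LOAD / CORES with two decimals.
# Hypothetical helper for illustration; not part of any package.
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN {
    if (cores <= 0) { print "n/a"; exit 1 }
    printf "%.2f\n", load / cores
  }'
}
```

Try it with live values: load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)". A result near 1.00 suggests the machine is roughly saturated; well above 1.00 suggests queued work.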

1) CPU & memory

CPU facts

lscpu                 # sockets, cores, threads, flags
nproc                 # logical CPU count
grep -m1 'model name' /proc/cpuinfo

Memory / swap

free -h               # Mem/Swap usage (human units)
vmstat -s | head -20  # memory counters summary
swapon --show         # active swap areas
head -n 10 /proc/meminfo

Tip: High si/so (swap‑in/out) in vmstat 1 output usually correlates with memory pressure.
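
A rough way to check for swap traffic without reading the columns by eye: the sketch below counts vmstat samples where si or so is non-zero. The function name swap_active_samples is made up for this example, and it assumes the default vmstat layout (two header lines, si and so in columns 7 and 8).

```shell
# Count vmstat samples whose si or so column is non-zero.
# Reads vmstat output on stdin; assumes two header lines and
# si/so in columns 7 and 8 (the default layout).
swap_active_samples() {
  awk 'NR > 2 && ($7 + 0 > 0 || $8 + 0 > 0) { n++ } END { print n + 0 }'
}
```

Usage: vmstat 1 5 | swap_active_samples. The first sample reports averages since boot, so a count of 1 may just reflect old activity; several non-zero samples in a row are the more interesting signal.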

2) Processes, load, and scheduling

`top` basics (built‑in)

top

Useful keys inside top:

1: show per‑CPU cores
M: sort by memory; P: sort by CPU; T: by time
k: kill a process (enter PID, then signal like 15)
E (Shift+E): cycle memory units in the summary area (KiB/MiB/GiB); e: same for the per-task columns
q: quit

`htop` (nicer UI)

sudo apt update && sudo apt install -y htop
htop

Point-in-time listings with `ps`

ps aux --sort=-%cpu | head -15
ps aux --sort=-%mem | head -15
command -v pstree >/dev/null && pstree -a | head -40 || echo 'Install: sudo apt install -y psmisc'

Niceness & signals (be careful)

nice -n 10 sleep 60 &                   # start lower priority
pid=$!
renice -n 15 -p "$pid"                 # raise niceness to 15 (absolute value; without sudo you can only go up)
kill -15 "$pid"                        # graceful
# kill -9 "$pid"  # last resort (commented)

3) Disks, partitions, and filesystem usage

Layout & mounts

lsblk -f               # devices, filesystems, labels
mount | column -t | head -20
findmnt -t ext4,xfs    # mounted FS by type

Space usage

df -hT                 # size, used, type per mount
# top 20 heavy dirs under /
sudo du -xh / | sort -h | tail -20
# top 10 heavy dirs under /var (faster)
sudo du -h -d1 /var | sort -h | tail -10
# inode pressure
df -i

When a disk looks full but df -h shows space free: check inodes with df -i. Lots of tiny files can exhaust inodes.
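
As a sketch of that inode check, the function below filters df output for mounts at or above a usage threshold. The name inode_hogs and the default of 90 percent are assumptions for illustration.

```shell
# Print mount points whose inode usage is at or above a threshold.
# Reads `df -iP` output on stdin; threshold in percent, default 90.
inode_hogs() {
  awk -v limit="${1:-90}" 'NR > 1 {
    pct = $5; sub(/%$/, "", pct)            # strip the trailing % sign
    # Skip filesystems that report "-" instead of a percentage.
    if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6, $5
  }'
}
```

Usage: df -iP | inode_hogs 80. No output means no mount is at or above the threshold.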

Disk performance (optional)

sudo apt install -y sysstat iotop
# per-disk stats, queue, utilization
iostat -xz 1 3
# per-process IO: -o only active, -P processes (not threads), -a accumulated totals
sudo iotop -oPa

4) Network quick checks

ip -br a              # brief addresses
ip r                  # routing table
hostname -I           # IPs only
resolvectl status | sed -n '1,40p'   # DNS view

Listening ports and sockets:

sudo ss -tulpn | head -30  # TCP/UDP listeners; sudo reveals PIDs owned by other users
ss -s                 # socket summary

Connectivity basics:

ping -c 3 8.8.8.8
ping -c 3 google.com
sudo apt install -y mtr-tiny dnsutils
mtr -rwbzc 50 google.com   # quick route quality report
dig +short google.com
curl -I https://example.com

If DNS is flaky: try dig @1.1.1.1 example.com to bypass local resolvers.

5) Services with systemd

List running services and check status:

systemctl list-units --type=service --state=running | head -30
systemctl status ssh
systemctl is-enabled ssh

Restart a service (requires sudo):

sudo systemctl restart ssh

See boot performance & failures:

systemd-analyze time
systemd-analyze blame | head -20
systemctl --failed

6) Logs: journald and friends

Live/system logs:

journalctl -n 200 --no-pager             # last 200 lines
journalctl -f                            # follow
journalctl -p warning --since '1 hour ago'

Service-specific logs:

journalctl -u ssh --since 'today' --no-pager

Kernel messages and last boot:

journalctl -k --since '1 hour ago'
journalctl -b -1 --no-pager              # previous boot

Legacy file logs (some distros still write to these):

sudo tail -n 200 /var/log/syslog
sudo grep -i 'oom' /var/log/kern.log || true

Tip: Use -g PATTERN to grep inside journalctl, e.g., journalctl -g "Out of memory" -k.

7) Resource pressure signals (advanced but handy)

Linux exposes PSI (Pressure Stall Information):

cat /proc/pressure/{cpu,io,memory}

If you see sustained some/full memory or IO pressure, correlate with iostat, vmstat, and logs (possible OOMs or slow disks).
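
To pull a single number out of a PSI file, something like the following could work. Each file has a "some" line (and usually a "full" line) with avg10, avg60, and avg300 fields; avg10 is the percentage of the last 10 seconds that tasks spent stalled. The helper name psi_avg10 is hypothetical.

```shell
# psi_avg10 KIND -> prints avg10 from the "some" or "full" line.
# Reads one /proc/pressure/* file on stdin; KIND defaults to "some".
psi_avg10() {
  awk -v kind="${1:-some}" '$1 == kind {
    split($2, kv, "=")      # $2 looks like avg10=1.23
    print kv[2]
  }'
}
```

Usage: psi_avg10 some < /proc/pressure/memory. Run it a few times; a value that stays well above zero is the "sustained" pressure worth correlating with the other tools.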

OOM killer evidence:

journalctl -k -g 'Out of memory' --since '1 day ago'
sudo dmesg -T | grep -i 'killed process' | tail -n 10   # Ubuntu restricts dmesg to root

8) Quick health sweep (copy/paste)

Run this as a one‑shot collection for triage (prints to screen):

{ echo '=== SNAPSHOT ===';
  date; hostnamectl | sed -n '1,8p'; uname -rsm; uptime; echo;
  echo '=== CPU/MEM ==='; lscpu | sed -n '1,8p'; free -h; vmstat 1 3; echo;
  echo '=== DISK ==='; df -hT; iostat -xz 1 2 2>/dev/null || true; echo;
  echo '=== TOP PROCS ==='; ps aux --sort=-%cpu | head -10; ps aux --sort=-%mem | head -10; echo;
  echo '=== NET ==='; ip -br a; ss -tulpn | head -20; echo;
  echo '=== SERVICES ==='; systemctl --failed || true; systemd-analyze time || true; echo;
  echo '=== LOGS (last 50) ==='; journalctl -n 50 --no-pager;
} | sed 's/\x1b\[[0-9;]*m//g'

Note: iostat needs the sysstat package and is skipped quietly if it is missing. Without sudo, ss omits PIDs for other users' processes and journalctl may show only the logs your account can read.
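
If you want every optional tool in a sweep to be skipped the same way, one possible approach is a small wrapper. The function name run_if_present and its message text are assumptions for this sketch, not something the sweep above already uses.

```shell
# run_if_present CMD [ARGS...]: run CMD when it exists on PATH,
# otherwise print a one-line note and carry on.
run_if_present() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "skipped: $1 not installed"
  fi
}
```

Usage: run_if_present iostat -xz 1 2. Unlike redirecting errors to /dev/null, this leaves a visible note in the report about what was skipped.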

9) Practice tasks (do these now)

Find your CPU core/thread count and current load.
Hint: lscpu, uptime, top (1 key).
Identify the top 5 memory‑hungry processes and the top 5 CPU‑hungry processes.
Hint: ps aux --sort=-%mem / --sort=-%cpu.
Determine which directory under /var consumes the most space.
Hint: sudo du -h -d1 /var | sort -h | tail -5.
List all listening TCP sockets with owning PIDs.
Hint: sudo ss -tlpn (drop the u from -tulpn to leave out UDP).
Check logs for ssh in the last hour and restart the service.
Hint: journalctl -u ssh --since '1 hour ago', then sudo systemctl restart ssh.
(Optional) Install sysstat and run iostat -xz 1 3; identify any device >90% util.
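
For the optional iostat task, a filter along these lines can pick out busy devices. It assumes %util is the last column of each device row, as in current sysstat releases; the name busy_devices and the default threshold of 90 are illustrative.

```shell
# Print devices whose %util (last column) exceeds a threshold.
# Reads `iostat -xz` output on stdin; threshold defaults to 90.
busy_devices() {
  awk -v limit="${1:-90}" '
    NF == 0        { in_dev = 0; next }   # blank line ends a device table
    $1 ~ /^Device/ { in_dev = 1; next }   # header row starts one
    in_dev && $NF + 0 > limit + 0 { print $1, $NF }
  '
}
```

Usage: iostat -xz 1 3 | busy_devices 90. The first report covers the time since boot, so a device that shows up only in later reports is the one that is busy right now.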

10) Troubleshooting quick guide

High load avg with low CPU → usually IO wait or many blocked procs. Check iostat -xz, ps with STAT column (D = uninterruptible IO sleep).
Memory spikes / OOM → check journalctl -k -g 'Out of memory', watch free -h, consider which process grew via ps --sort=-rss.
Disk full → use df -hT; then du to find culprits; also check inodes (df -i).
Service down → systemctl status <svc>, then journalctl -u <svc> to see why; look for ExecStart errors or ports already in use.
No DNS resolution → resolvectl status, try dig @1.1.1.1 example.com; if that works, local resolver is suspect.
Network seems fine but web fails → check outbound firewall/proxy, test raw IP curl -I http://1.1.1.1.
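
For the first item in the list (high load with low CPU), a quick count of processes in D state can confirm the suspicion. This is a minimal sketch; the function name count_blocked is hypothetical.

```shell
# Count processes in uninterruptible sleep (STAT starting with D).
# Reads `ps -eo stat=` output on stdin, one state string per line.
count_blocked() {
  awk '$1 ~ /^D/ { n++ } END { print n + 0 }'
}
```

Usage: ps -eo stat= | count_blocked. A count that stays above zero across several runs points toward IO wait rather than CPU work.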

11) Quick quiz (1 minute)

What do the three numbers in uptime represent?
Which tool shows per‑disk queue and utilization quickly?
How do you list services that failed on boot?
One command to show listening sockets with PIDs?
Where do you look for OOM killer events?

Answers: 1/5/15‑min load avgs; iostat -xz; systemctl --failed; ss -tulpn; journalctl -k -g 'Out of memory' (or dmesg).

Next Step

Proceed to Step 7 — Users & Authentication to manage local users, groups, passwords, and SSH hardening.
