Step 6 — System Info & Monitoring (Ubuntu 24)

Type along exactly as shown. Safe to run on your machine; nothing here makes persistent config changes (a few optional installs use apt).
Estimated time: ~15-20 minutes


What you'll learn

  • Identify OS, kernel, uptime, and hardware at a glance
  • Inspect CPU, memory, disks/partitions, and network
  • Watch processes and system load with top/htop, ps, and vmstat
  • Check running services with systemd (systemctl) and view logs (journalctl)
  • Measure disk/CPU/IO hot spots with iostat, pidstat, iotop
  • Run a quick “health sweep” to triage performance issues

Setup: Use your lab folder:

mkdir -p ~/playground && cd ~/playground

0) Snapshot: who, what, where

Start with the fastest high-level checks.

whoami
hostnamectl          # hostname, OS, kernel
uname -rsm           # kernel: release + machine arch
cat /etc/os-release  # distro details
uptime -p            # human-friendly uptime
uptime               # load averages (1/5/15 min)
date                 # current time (TZ matters!)

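Load average rule of thumb: on a 4-core machine, a 1-minute load of ~4 means the CPUs are fully busy; much higher means work is queuing. On Linux the figure also counts tasks in uninterruptible (D-state) sleep, so heavy IO can raise the load while the CPUs sit idle.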
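
To make the rule of thumb concrete, the sketch below divides the 1-minute load by the logical CPU count. The helper name load_per_core is made up for this example; a result near 1.00 corresponds to "fully busy" in the sense used above.

```shell
# load_per_core LOAD CORES: print LOAD divided by CORES with two decimals.
# (Hypothetical helper name, not part of any standard tool.)
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Live check: the first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
  read -r load1 rest < /proc/loadavg
  echo "1-min load per core: $(load_per_core "$load1" "$(nproc)")"
fi
```

For example, load_per_core 6 4 prints 1.50, meaning roughly half a task is waiting per core on average.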


1) CPU & memory

CPU facts

lscpu                 # sockets, cores, threads, flags
nproc                 # logical CPU count
grep -m1 'model name' /proc/cpuinfo   # CPU model string

Memory / swap

free -h               # Mem/Swap usage (human units)
vmstat -s | head -20  # memory counters summary
swapon --show         # active swap areas
head -n 10 /proc/meminfo   # first 10 raw memory counters

Tip: High si/so (swap-in/swap-out) values in vmstat 1 output usually correlate with memory pressure.
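
A minimal sketch of that check: it averages the si and so columns over a few vmstat samples. It assumes the usual procps column layout (si is column 7, so is column 8) and skips the first data row, which reports averages since boot. The function name swap_activity is invented for this example.

```shell
# swap_activity: read vmstat output on stdin and print average si/so.
# Assumes two header lines, then data rows with si in $7 and so in $8.
# NR > 3 skips the headers and the first (since-boot) sample.
swap_activity() {
  awk 'NR > 3 { si += $7; so += $8; n++ }
       END { if (n) printf "avg si=%.1f so=%.1f\n", si / n, so / n }'
}

# Live check: five one-second samples (the first one is ignored).
if command -v vmstat >/dev/null 2>&1; then
  vmstat 1 5 | swap_activity
fi
```

Values that stay at 0.0 suggest the system is not swapping; sustained non-zero values are worth correlating with free -h.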


2) Processes, load, and scheduling

top basics (built-in)

top

Useful keys inside top:

  • 1: toggle the per-CPU core view
  • M: sort by memory; P: sort by CPU; T: by time
  • k: kill a process (enter PID, then signal like 15)
  • Shift+E: change units (KiB/MiB)
  • q: quit

htop (nicer UI)

sudo apt update && sudo apt install -y htop
htop

Point-in-time listings with ps

ps aux --sort=-%cpu | head -15
ps aux --sort=-%mem | head -15
command -v pstree >/dev/null && pstree -a | head -40 || echo 'Install: sudo apt install -y psmisc'

Niceness & signals (be careful)

nice -n 10 sleep 60 &                   # start lower priority
pid=$!
renice -n 15 -p "$pid"                 # set niceness to 15 (lower priority still; lowering it needs sudo)
kill -15 "$pid"                        # graceful
# kill -9 "$pid"  # last resort (commented)

3) Disks, partitions, and filesystem usage

Layout & mounts

lsblk -f               # devices, filesystems, labels
mount | column -t | head -20
findmnt -t ext4,xfs    # mounted FS by type

Space usage

df -hT                 # size, used, type per mount
# top 20 heavy dirs under /
sudo du -xh / | sort -h | tail -20
# top 10 heavy dirs under /var (faster)
sudo du -h -d1 /var | sort -h | tail -10
# inode pressure
df -i

When a disk looks full but df -h shows space free: check inodes with df -i. Lots of tiny files can exhaust inodes.
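
As a rough illustration of the inode check, the sketch below filters df -iP output for filesystems at or above a given IUse% threshold. The function name inode_hogs and the threshold of 80 are arbitrary choices for this example, and mount points containing spaces are not handled.

```shell
# inode_hogs LIMIT: read `df -iP` output on stdin and print
# "mountpoint IUse%" for every filesystem at or above LIMIT percent.
# Rows showing "-" (filesystems without inode accounting) are skipped.
inode_hogs() {
  awk -v limit="$1" '
    NR > 1 && $5 != "-" {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Live check against the current mounts.
if command -v df >/dev/null 2>&1; then
  df -iP | inode_hogs 80
fi
```

No output means no mounted filesystem is above the threshold.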

Disk performance (optional)

sudo apt install -y sysstat iotop
# per-disk stats, queue, utilization
iostat -xz 1 3
# per-process IO: -o only active, -P processes (not threads), -a accumulated totals; press 'o' to toggle the filter
sudo iotop -oPa

4) Network quick checks

ip -br a              # brief addresses
ip r                  # routing table
hostname -I           # IPs only
resolvectl status | sed -n '1,40p'   # DNS view

Listening ports and sockets:

ss -tulpn | head -30  # TCP/UDP listeners with PIDs
ss -s                 # socket summary
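
If you only want a compact list of protocols and ports, the ss output can be reduced as sketched below. This assumes the column layout of ss -Htuln (Netid first, local address in column 5); the helper name ports_from_ss is invented for this example.

```shell
# ports_from_ss: read `ss -Htuln` output on stdin and print unique
# "proto port" pairs, sorted by protocol and then numerically by port.
# The port is whatever follows the last ':' in the local address.
ports_from_ss() {
  awk '{ port = $5; sub(/.*:/, "", port); print $1, port }' |
    sort -k1,1 -k2,2n -u
}

# Live check (-H drops the header line).
if command -v ss >/dev/null 2>&1; then
  ss -Htuln | ports_from_ss
fi
```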

Connectivity basics:

ping -c 3 8.8.8.8
ping -c 3 google.com
sudo apt install -y mtr-tiny dnsutils
mtr -rwbzc 50 google.com   # quick route quality report
dig +short google.com
curl -I https://example.com

If DNS is flaky: try dig @1.1.1.1 example.com to bypass local resolvers.
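
The comparison described in the tip can be scripted roughly as follows. The function names classify_dns and check_dns are hypothetical, and the wording of the verdicts is only a suggestion; dig comes from the dnsutils package installed above.

```shell
# classify_dns LOCAL_ANSWER PUBLIC_ANSWER: print a verdict based on
# which resolver returned a non-empty answer.
classify_dns() {
  if [ -n "$1" ]; then
    echo "local resolver answered"
  elif [ -n "$2" ]; then
    echo "local resolver is suspect"
  else
    echo "no answer from either resolver"
  fi
}

# check_dns NAME: query the default resolver and 1.1.1.1, then classify.
# Lines starting with ';' are dig diagnostics (timeouts etc.), not answers.
check_dns() {
  local_ans=$(dig +short +time=2 +tries=1 "$1" 2>/dev/null | grep -v '^;' || true)
  public_ans=$(dig +short +time=2 +tries=1 @1.1.1.1 "$1" 2>/dev/null | grep -v '^;' || true)
  classify_dns "$local_ans" "$public_ans"
}

# Usage (needs network access):
#   check_dns example.com
```

If neither resolver answers, the problem is more likely general connectivity than DNS configuration.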


5) Services with systemd

List running services and check status:

systemctl list-units --type=service --state=running | head -30
systemctl status ssh
systemctl is-enabled ssh

Restart a service (requires sudo):

sudo systemctl restart ssh

See boot performance & failures:

systemd-analyze time
systemd-analyze blame | head -20
systemctl --failed

6) Logs: journald and friends

Live/system logs:

journalctl -n 200 --no-pager             # last 200 lines
journalctl -f                            # follow
journalctl -p warning --since '1 hour ago'

Service-specific logs:

journalctl -u ssh --since 'today' --no-pager

Kernel messages and last boot:

journalctl -k --since '1 hour ago'
journalctl -b -1 --no-pager              # previous boot

Legacy file logs (some distros still write to these):

sudo tail -n 200 /var/log/syslog
sudo grep -i 'oom' /var/log/kern.log || true

Tip: Use -g PATTERN to grep inside journalctl, e.g., journalctl -g "Out of memory" -k.


7) Resource pressure signals (advanced but handy)

Linux exposes PSI (Pressure Stall Information):

cat /proc/pressure/{cpu,io,memory}

If you see sustained some/full memory or IO pressure, correlate with iostat, vmstat, and logs (possible OOMs or slow disks).
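
Each PSI file contains lines such as "some avg10=0.00 avg60=0.00 avg300=0.00 total=0". A small sketch for pulling out the 10-second average follows; psi_avg10 is a made-up helper name, and which number counts as "sustained" is left to your judgment.

```shell
# psi_avg10 KIND: read a /proc/pressure file on stdin and print the
# avg10 value (a percentage) for the line starting with KIND
# ("some" or "full").
psi_avg10() {
  awk -v kind="$1" '$1 == kind { sub(/^avg10=/, "", $2); print $2 }'
}

# Live check: PSI files exist only on kernels built with PSI support.
for res in cpu io memory; do
  f=/proc/pressure/$res
  if [ -r "$f" ]; then
    echo "$res some avg10: $(psi_avg10 some < "$f")%"
  fi
done
```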

OOM killer evidence:

journalctl -k -g 'Out of memory' --since '1 day ago'
dmesg -T | grep -i 'killed process' | tail -n 10
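
To see which programs the OOM killer hit most often, the kernel lines can be tallied as sketched below. This assumes messages of the form "... Killed process PID (name) ...", which is what the grep above matches; the exact wording can differ between kernel versions. The function name oom_victims is invented for this example.

```shell
# oom_victims: read kernel log lines on stdin and print "count name"
# for each process name found after "Killed process PID (", most
# frequent first.
oom_victims() {
  sed -n 's/.*[Kk]illed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
    awk '{ n[$0]++ } END { for (k in n) print n[k], k }' |
    sort -rn
}

# Live check over the last day of kernel messages.
if command -v journalctl >/dev/null 2>&1; then
  journalctl -k --no-pager --since '1 day ago' 2>/dev/null | oom_victims
fi
```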

8) Quick health sweep (copy/paste)

Run this as a one-shot collection for triage (prints to screen):

{ echo '=== SNAPSHOT ===';
  date; hostnamectl | sed -n '1,8p'; uname -rsm; uptime; echo;
  echo '=== CPU/MEM ==='; lscpu | sed -n '1,8p'; free -h; vmstat 1 3; echo;
  echo '=== DISK ==='; df -hT; iostat -xz 1 2 2>/dev/null || true; echo;
  echo '=== TOP PROCS ==='; ps aux --sort=-%cpu | head -10; ps aux --sort=-%mem | head -10; echo;
  echo '=== NET ==='; ip -br a; ss -tulpn | head -20; echo;
  echo '=== SERVICES ==='; systemctl --failed || true; systemd-analyze time || true; echo;
  echo '=== LOGS (last 50) ==='; journalctl -n 50 --no-pager;
} | sed 's/\x1b\[[0-9;]*m//g'

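Note: iostat needs the sysstat package and is skipped silently when it is missing; the other commands should already exist on a stock Ubuntu install. Without sudo, ss -p shows PIDs only for your own sockets and journalctl may show only your own log entries.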
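
If you extend the sweep with other optional tools, a small wrapper can make the skipping explicit. This is a sketch; run_if_present is a hypothetical helper name, not an existing command.

```shell
# run_if_present CMD [ARGS...]: run CMD if it is on PATH, otherwise
# print a one-line notice instead of a "command not found" error.
run_if_present() {
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "[skipped: $1 not installed]"
  fi
}

# Example: optional tools from this step.
run_if_present iostat -xz 1 2 || true
run_if_present mtr --version || true
```

The notice goes to standard output so it stays in order with the rest of the collected report.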


9) Practice tasks (do these now)

  1. Find your CPU core/thread count and current load.
    Hint: lscpu, uptime, top (1 key).
  2. Identify the top 5 memory-hungry processes and the top 5 CPU-hungry processes.
    Hint: ps aux --sort=-%mem / --sort=-%cpu.
  3. Determine which directory under /var consumes the most space.
    Hint: sudo du -h -d1 /var | sort -h | tail -5.
  4. List all listening TCP sockets with owning PIDs.
    Hint: ss -tlpn (ss -tulpn also includes UDP); add sudo to see PIDs for sockets you do not own.
  5. Check logs for ssh in the last hour and restart the service.
    Hint: journalctl -u ssh --since '1 hour ago', then sudo systemctl restart ssh.
  6. (Optional) Install sysstat and run iostat -xz 1 3; identify any device >90% util.

10) Troubleshooting quick guide

  • High load avg with low CPU → usually IO wait or many blocked procs. Check iostat -xz, ps with STAT column (D = uninterruptible IO sleep).
  • Memory spikes / OOM → check journalctl -k -g 'Out of memory', watch free -h, and see which process grew via ps aux --sort=-rss | head.
  • Disk full → use df -hT; then du to find culprits; also check inodes (df -i).
  • Service down → systemctl status <svc>, then journalctl -u <svc> to see why; look for ExecStart errors or ports that failed to bind.
  • No DNS resolution → resolvectl status, then try dig @1.1.1.1 example.com; if that works, the local resolver is suspect.
  • Network seems fine but web fails → check outbound firewall/proxy, test raw IP curl -I http://1.1.1.1.
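
For the first item in the list above, a quick way to spot blocked processes is to filter ps output on the STAT column. The sketch below keeps the header plus any row whose state starts with D; blocked_procs is a made-up name for this example.

```shell
# blocked_procs: read `ps -eo pid,stat,...` output on stdin and keep the
# header plus processes in uninterruptible sleep (STAT starting with D).
blocked_procs() {
  awk 'NR == 1 || $2 ~ /^D/'
}

# Live check: wchan hints at the kernel function the task is waiting in.
if command -v ps >/dev/null 2>&1; then
  ps -eo pid,stat,wchan:20,comm | blocked_procs
fi
```

Seeing only the header line means nothing was blocked at that instant; D states are often brief, so repeat the command a few times while the load is high.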

11) Quick quiz (1 minute)

  • What do the three numbers in uptime represent?
  • Which tool shows per-disk queue and utilization quickly?
  • How do you list services that failed on boot?
  • One command to show listening sockets with PIDs?
  • Where do you look for OOM killer events?

Answers: 1/5/15-minute load averages; iostat -xz; systemctl --failed; ss -tulpn; journalctl -k -g 'Out of memory' (or dmesg).


Next Step

Proceed to Step 7 — Users & Authentication to manage local users, groups, passwords, and SSH hardening.