# Step 6: System Info & Monitoring (Ubuntu 24)

> **Type along** exactly as shown. Safe to run on your machine; nothing here makes persistent config changes (a few optional installs use `apt`, and one example restarts a service).
>
> **Estimated time:** ~15–20 minutes

---

## What you'll learn

- Identify OS, kernel, uptime, and hardware at a glance
- Inspect **CPU**, **memory**, **disks/partitions**, and **network**
- Watch processes and system load with `top`/`htop`, `ps`, and `vmstat`
- Check running **services** with `systemd` (`systemctl`) and view **logs** (`journalctl`)
- Measure disk/CPU/IO hot spots with `iostat`, `pidstat`, `iotop`
- Run a quick "health sweep" to triage performance issues

> **Setup:** Use your lab folder:
> ```bash
> mkdir -p ~/playground && cd ~/playground
> ```

---

## 0) Snapshot: who, what, where

Start with the fastest high-level checks.

```bash
whoami
hostnamectl            # hostname, OS, kernel
uname -rsm             # kernel: release + machine arch
cat /etc/os-release    # distro details
uptime -p              # human-friendly uptime
uptime                 # load averages (1/5/15 min)
date                   # current time (TZ matters!)
```

> **Load average rule of thumb:** Compare the load to your logical CPU count (`nproc`). On a 4-core machine, a 1-minute load of ~4 means "fully busy"; much higher means queued work. On Linux the load also counts tasks blocked in uninterruptible IO, so a high load does not always mean busy CPUs.

---

## 1) CPU & memory

### CPU facts

```bash
lscpu                                  # sockets, cores, threads, flags
nproc                                  # logical CPU count
grep -m1 'model name' /proc/cpuinfo    # CPU model string
```

### Memory / swap

```bash
free -h                        # Mem/Swap usage (human units)
vmstat -s | head -20           # memory counters summary
swapon --show                  # active swap areas
sed -n '1,10p' /proc/meminfo   # first 10 kernel memory counters
```

> **Tip:** Sustained non-zero `si`/`so` (swap-in/out) values in `vmstat 1` output usually correlate with memory pressure.
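
To make the swap tip above concrete, here is a minimal sketch that averages the `si` and `so` columns of `vmstat` output. The function name `swap_activity` is hypothetical (not part of any package), and the sketch assumes the usual `vmstat` layout in which the column header row starts with `r`.

```bash
# Hypothetical helper: average the si/so columns of `vmstat` output read from stdin.
swap_activity() {
  awk '
    # The column header row starts with "r"; find si and so by name.
    $1 == "r" {
      for (i = 1; i <= NF; i++) {
        if ($i == "si") si = i
        if ($i == "so") so = i
      }
      next
    }
    # Data rows start with a number; ignore everything before the header.
    si && $1 ~ /^[0-9]+$/ { in_sum += $si; out_sum += $so; n++ }
    END {
      if (n == 0) { print "no samples"; exit 1 }
      printf "avg si=%.1f so=%.1f over %d samples\n", in_sum / n, out_sum / n, n
    }'
}
```

Typical use is `vmstat 1 5 | swap_activity`. Keep in mind that the first `vmstat` sample reports averages since boot, so a longer run gives a more representative figure.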

---

## 2) Processes, load, and scheduling

### `top` basics (built-in)

```bash
top
```

Useful keys inside **top**:

- `1`: show per-CPU cores
- `M`: sort by memory; `P`: sort by CPU; `T`: by time
- `k`: kill a process (enter PID, then signal like `15`)
- `Shift+E`: cycle memory units in the summary area (KiB/MiB/...); lowercase `e` does the same for the process list
- `q`: quit

### `htop` (nicer UI)

```bash
sudo apt update && sudo apt install -y htop
htop
```

### Point-in-time listings with `ps`

```bash
ps aux --sort=-%cpu | head -15    # top CPU consumers
ps aux --sort=-%mem | head -15    # top memory consumers
# Test for pstree first: `pstree | head || echo ...` would never reach the echo,
# because the pipeline's exit status comes from head.
if command -v pstree >/dev/null; then
  pstree -a | head -40
else
  echo 'Install: sudo apt install -y psmisc'
fi
```

### Niceness & signals (be careful)

```bash
nice -n 10 sleep 60 &     # start at niceness 10 (lower priority)
pid=$!
renice -n 15 -p "$pid"    # raise niceness to 15; the value is absolute, and
                          # unprivileged users can only raise it, never lower it
kill -15 "$pid"           # graceful (SIGTERM)
# kill -9 "$pid"          # last resort (commented)
```

---

## 3) Disks, partitions, and filesystem usage

### Layout & mounts

```bash
lsblk -f                        # devices, filesystems, labels
mount | column -t | head -20
findmnt -t ext4,xfs             # mounted FS by type
```

### Space usage

```bash
df -hT                          # size, used, type per mount
# top 20 heavy dirs under /
sudo du -xh / | sort -h | tail -20
# top 10 heavy dirs under /var (faster)
sudo du -h -d1 /var | sort -h | tail -10
# inode pressure
df -i
```

> **When writes fail with "No space left on device" but `df -h` shows space free:** check **inodes** with `df -i`. Lots of tiny files can exhaust inodes.
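
The inode check above can be turned into a quick filter. This is a sketch only: `inode_hotspots` is a hypothetical helper name, and it assumes the six-column layout printed by `df -iP` (POSIX format, one line per filesystem) and mount points without spaces.

```bash
# Hypothetical helper: print mounts whose IUse% is at or above a limit (default 90).
# Expects `df -iP` output on stdin.
inode_hotspots() {
  local limit="${1:-90}"
  awk -v limit="$limit" '
    NR == 1 { next }                  # skip the header row
    {
      pct = $5
      sub(/%/, "", pct)               # "97%" -> "97"; a "-" stays non-numeric
      if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print $6, $5
    }'
}
```

For example, `df -iP | inode_hotspots 80` lists every mount at 80% inode usage or more. Pseudo-filesystems that report `-` are skipped.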

### Disk performance (optional)

```bash
sudo apt install -y sysstat iotop
# per-disk stats, queue, utilization
iostat -xz 1 3
# per-process IO; -o already limits the view to active processes ('o' toggles it)
sudo iotop -oPa
```

---

## 4) Network quick checks

```bash
ip -br a                               # brief addresses
ip r                                   # routing table
hostname -I                            # IPs only
resolvectl status | sed -n '1,40p'     # DNS view
```

Listening ports and sockets:

```bash
ss -tulpn | head -30    # TCP/UDP listeners with PIDs (use sudo to see other users' processes)
ss -s                   # socket summary
```

Connectivity basics:

```bash
ping -c 3 8.8.8.8
ping -c 3 google.com
sudo apt install -y mtr-tiny dnsutils
mtr -rwbzc 50 google.com    # quick route quality report
dig +short google.com
curl -I https://example.com
```

> **If DNS is flaky:** try `dig @1.1.1.1 example.com` to bypass local resolvers.

---

## 5) Services with systemd

List running services and check status:

```bash
systemctl list-units --type=service --state=running | head -30
systemctl status ssh
systemctl is-enabled ssh
```

Restart a service (requires sudo):

```bash
sudo systemctl restart ssh
```

See boot performance & failures:

```bash
systemd-analyze time
systemd-analyze blame | head -20
systemctl --failed
```

---

## 6) Logs: journald and friends

Live/system logs:

```bash
journalctl -n 200 --no-pager               # last 200 lines
journalctl -f                              # follow
journalctl -p warning --since '1 hour ago' # warnings and worse, last hour
```

Service-specific logs:

```bash
journalctl -u ssh --since 'today' --no-pager
```

Kernel messages and last boot:

```bash
journalctl -k --since '1 hour ago'
journalctl -b -1 --no-pager    # previous boot
```

Legacy file logs (some distros still write to these):

```bash
sudo tail -n 200 /var/log/syslog
sudo grep -i 'oom' /var/log/kern.log || true
```

> **Tip:** Use `-g PATTERN` to grep inside `journalctl`, e.g., `journalctl -g "Out of memory" -k`.
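
When a log query returns many lines, it can help to see which programs produced them. The sketch below counts entries per source; `top_log_sources` is a hypothetical helper, and it assumes the default `short` output format of `journalctl` (month, day, time, host, then `name[pid]:`). Multi-line messages may be miscounted.

```bash
# Hypothetical helper: count journal lines per source (5th field), busiest first.
# Expects default-format `journalctl --no-pager` output on stdin.
top_log_sources() {
  awk '
    /^-- / { next }                    # skip "-- Boot ... --" style marker lines
    NF >= 5 {
      src = $5
      sub(/\[[0-9]+\]:$/, "", src)     # strip a trailing "[pid]:"
      sub(/:$/, "", src)               # or a bare trailing colon
      count[src]++
    }
    END { for (s in count) print count[s], s }' | sort -rn | head -n "${1:-5}"
}
```

For example, `journalctl -p warning --since '1 hour ago' --no-pager | top_log_sources 10` shows the ten noisiest sources of warnings in the last hour.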

---

## 7) Resource pressure signals (advanced but handy)

Linux exposes PSI (Pressure Stall Information):

```bash
cat /proc/pressure/{cpu,io,memory}
```

Each file reports the share of time tasks were stalled on that resource: **some** means at least one task was waiting, **full** means all non-idle tasks were waiting at once. If you see sustained **some**/**full** memory or IO pressure, correlate with `iostat`, `vmstat`, and logs (possible OOMs or slow disks).

OOM killer evidence:

```bash
journalctl -k -g 'Out of memory' --since '1 day ago'
dmesg -T | grep -i 'killed process' | tail -n 10
```

---

## 8) Quick health sweep (copy/paste)

Run this as a one-shot collection for triage (prints to screen):

```bash
{
  echo '=== SNAPSHOT ==='; date; hostnamectl | sed -n '1,8p'; uname -rsm; uptime; echo
  echo '=== CPU/MEM ==='; lscpu | sed -n '1,8p'; free -h; vmstat 1 3; echo
  echo '=== DISK ==='; df -hT; iostat -xz 1 2 2>/dev/null || true; echo
  echo '=== TOP PROCS ==='; ps aux --sort=-%cpu | head -10; ps aux --sort=-%mem | head -10; echo
  echo '=== NET ==='; ip -br a; ss -tulpn | head -20; echo
  echo '=== SERVICES ==='; systemctl --failed || true; systemd-analyze time || true; echo
  echo '=== LOGS (last 50) ==='; journalctl -n 50 --no-pager
} | sed 's/\x1b\[[0-9;]*m//g'    # strip ANSI color codes from the combined output
```

> **Note:** Some parts need packages (`sysstat`) or privileges. A missing `iostat` is skipped silently; any other missing tool prints an error and the sweep continues.

---

## 9) Practice tasks (do these now)

1) Find your CPU core/thread count and current load.
   *Hint:* `lscpu`, `uptime`, `top` (`1` key).
2) Identify the **top 5** memory-hungry processes and the **top 5** CPU-hungry processes.
   *Hint:* `ps aux --sort=-%mem` / `--sort=-%cpu`.
3) Determine which directory under `/var` consumes the most space.
   *Hint:* `sudo du -h -d1 /var | sort -h | tail -5`.
4) List all listening TCP sockets with owning PIDs.
   *Hint:* `sudo ss -tlpn` (add `u` to include UDP).
5) Check logs for **ssh** in the last hour and restart the service.
   *Hint:* `journalctl -u ssh --since '1 hour ago'`, then `sudo systemctl restart ssh`.
6) (Optional) Install `sysstat` and run `iostat -xz 1 3`; identify any device >90% util.
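
For task 6, the utilization check can be scripted instead of read by eye. This is a rough sketch: `busy_devices` is a hypothetical helper, and it assumes `iostat -x` prints a header row beginning with `Device` that includes a `%util` column, with numbers using a decimal point (hence `LC_ALL=C` in the usage below).

```bash
# Hypothetical helper: print devices whose %util exceeds a limit (default 90).
# Expects `iostat -xz` output on stdin.
busy_devices() {
  local limit="${1:-90}"
  awk -v limit="$limit" '
    # Locate the %util column from the device header row.
    $1 ~ /^Device/ {
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    # Device rows have at least that many fields and a numeric %util value.
    col && NF >= col && $col ~ /^[0-9.]+$/ && $col + 0 > limit + 0 { print $1, $col }'
}
```

Run it as `LC_ALL=C iostat -xz 1 3 | busy_devices 90`. The first report from `iostat` covers the time since boot, so a device that appears only in later reports is busy right now rather than historically.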

---

## 10) Troubleshooting quick guide

- **High load avg with low CPU** → usually **IO wait** or many blocked procs. Check `iostat -xz`, `ps` with `STAT` column (`D` = uninterruptible IO sleep).
- **Memory spikes / OOM** → check `journalctl -k -g 'Out of memory'`, watch `free -h`, and find which process grew via `ps aux --sort=-rss | head`.
- **Disk full** → use `df -hT`; then `du` to find culprits; also check **inodes** (`df -i`).
- **Service down** → `systemctl status <service>`, then `journalctl -u <service>` to see why; look for `ExecStart` errors, missing ports.
- **No DNS resolution** → `resolvectl status`, try `dig @1.1.1.1 example.com`; if that works, the local resolver is suspect.
- **Network seems fine but web fails** → check outbound firewall/proxy, test a raw IP with `curl -I http://1.1.1.1`.

---

## 11) Quick quiz (1 minute)

- What do the **three** numbers in `uptime` represent?
- Which tool shows **per-disk** queue and utilization quickly?
- How do you list services that **failed** on boot?
- One command to show **listening** sockets with PIDs?
- Where do you look for **OOM killer** events?

**Answers:** 1/5/15-min load avgs; `iostat -xz`; `systemctl --failed`; `ss -tulpn`; `journalctl -k -g 'Out of memory'` (or `dmesg`).

---

## Next Step

Proceed to **Step 7: Users & Authentication** to manage local users, groups, passwords, and SSH hardening.