Process Management

A running program is a process: an address space, open file descriptors, environment variables, and a place in the scheduler's queue. Shells, containers, and systemd all orchestrate the same primitives: fork, exec, wait, and signals.


What is a process?

The kernel assigns each process a unique PID, tracks parent/child relationships (PPID), and switches the CPU between processes in time slices.

Concept                  Meaning
PID                      Process ID: integer handle for signals, kill, /proc/<pid>
PPID                     Parent PID; shell is parent of commands it launches
UID / GID                Effective user/group for permission checks
Thread                   Lightweight unit within a process; shares address space
Session / process group  Job control grouping; terminal sends signals to the foreground group
bash
# Current shell's PID; parent of commands you run
echo $$

# Process tree — who spawned whom
ps -ef --forest | head -20
pstree -p $$
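
The parent/child link can be checked directly. The following is a minimal sketch rather than a canonical recipe; the scratch file and the variable names are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Sketch: confirm that a command launched by this shell reports the shell as its parent.
tmp=$(mktemp)                      # illustrative scratch file

# The child sh expands its own $PPID; single quotes stop the outer shell from expanding it.
sh -c 'echo "$PPID"' > "$tmp"
child_ppid=$(cat "$tmp")
rm -f "$tmp"

echo "shell PID: $$, PPID reported by child: $child_ppid"
```

When run as a script, the two numbers should match, because the shell forks and execs the child itself.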

Lifecycle: fork, exec, wait, exit

Unix does not spawn a fresh process from scratch for each command. It clones the parent, then optionally replaces the child's memory with a new program.

  1. fork — Kernel duplicates the calling process. Child gets a new PID; memory is copy-on-write.
  2. exec — Child (or same process) loads a new executable; PID unchanged, image replaced.
  3. run — Scheduler runs the process in user mode until I/O, timer, or signal.
  4. exit — Process calls exit(n) or is killed; kernel frees most resources.
  5. wait — Parent collects exit status; kernel removes zombie entry from process table.
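
The steps above can be observed from the shell itself. This is a small sketch, assuming bash; the exit code 7 and the variable names are arbitrary choices for illustration.

```bash
#!/usr/bin/env bash
# fork: '&' makes the shell fork a child that runs the subshell.
# exec: inside the child, 'exec' replaces the shell image with sh; the PID stays the same.
( exec sh -c 'exit 7' ) &
child_pid=$!            # PID of the forked child

# wait: the parent collects the child's exit status, which also reaps it.
status=0
wait "$child_pid" || status=$?

echo "child $child_pid exited with status $status"
```

The `status=0; wait ... || status=$?` form keeps the sketch usable under `set -e`, where a bare failing `wait` would abort the script.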

Exit codes matter in scripts and CI: 0 = success, non-zero = failure. Shell variable $? holds the last foreground command's exit code.

bash
grep -q "error" /var/log/app.log
echo $?                    # 0 if found, 1 if not, 2 on error (e.g. file missing)

false; echo $?             # 1
true; echo $?              # 0

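# Make a pipeline fail if any stage fails, not just the last one (bash)
set -o pipefail

# curl -f exits non-zero on HTTP errors (4xx/5xx), so the script can abort
curl -f https://api.example.com/health || exit 1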

See Architecture → fork/exec examples for C-level syscalls and strace.
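
The effect of pipefail is easiest to see side by side. A minimal sketch, assuming bash; `false | cat` stands in for any pipeline whose first stage fails.

```bash
#!/usr/bin/env bash
# Without pipefail the pipeline's status is that of the last command (cat),
# so the failure of 'false' is hidden.
set +o pipefail
without=0
false | cat || without=$?

# With pipefail the pipeline reports the rightmost non-zero status.
set -o pipefail
with=0
false | cat || with=$?

echo "without pipefail: $without, with pipefail: $with"
```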

Process states

In ps output, the STAT column abbreviates state. Knowing them explains "why is my process stuck?" and "why are there defunct entries?"

R · Running

Running / runnable

On CPU or ready to run. High CPU here means hot loops or heavy compute—not always "healthy."

S · Sleeping

Interruptible sleep

Waiting for an event (disk, network, lock). Normal for idle servers and blocked I/O. D is uninterruptible sleep (often I/O)—harder to kill.

T · Stopped

Stopped (job control)

Paused by SIGSTOP, SIGTSTP (Ctrl+Z), or debugger. Resume with fg / kill -CONT.

Z · Zombie

Zombie (defunct)

Exited but parent has not waited. Consumes a PID slot, not memory. Fix the parent or restart the service.

bash
# STAT column: state + optional flags (+ foreground group, < high priority, etc.)
ps aux | awk '$8 ~ /Z/ {print}'    # zombies only

# Full state for one PID
grep -E '^(Name|State|PPid):' /proc/1234/status

Pro Tip: A process in D state (uninterruptible sleep) often means stuck NFS or failing disk. Reboot may be the only fix—killing won't work until the kernel completes the I/O.
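
The T state is easy to reproduce safely. The sketch below stops a throwaway `sleep` and reads its STAT value; it assumes a procps-style `ps`, and the polling loop is only a precaution against signal-delivery timing.

```bash
#!/usr/bin/env bash
# Sketch: stop a throwaway process and watch its STAT change to T.
sleep 30 >/dev/null 2>&1 &
pid=$!

kill -STOP "$pid"

# Signal delivery is asynchronous, so poll briefly until ps reports a stopped state.
state=""
for _ in 1 2 3 4 5; do
  state=$(ps -o stat= -p "$pid" | tr -d ' ')
  case "$state" in T*) break ;; esac
  sleep 1
done
echo "state after SIGSTOP: $state"

# Clean up: a stopped process acts on SIGTERM only once it is continued.
kill -TERM "$pid"
kill -CONT "$pid"
wait "$pid" 2>/dev/null || true
```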

Foreground vs background

A terminal acts as the controlling terminal for a session. Within that session, one process group runs in the foreground and receives keyboard input and terminal signals such as Ctrl+C; all other process groups run in the background.

Mechanism    Behavior
command &    Start in background; shell does not wait; job number printed
Ctrl+Z       Suspend foreground job (SIGTSTP)
fg / bg      Resume job in foreground or background
jobs         List shell's active/stopped jobs
nohup cmd &  Ignore hangup; keep running after SSH disconnect (stdout/stderr to nohup.out)
disown       Remove job from shell's table so hangup does not kill it
bash
# Long build in background while you keep using the shell
npm run build > build.log 2>&1 &
jobs -l
fg %1                      # bring job 1 to foreground

# Accidentally started in foreground — suspend and background
# Ctrl+Z
bg
disown -h %1               # survive terminal close

# Better than nohup for servers: systemd, tmux, or screen
tmux new -s deploy
# ... run deploy; detach with Ctrl+b d

Pro Tip: For remote work, prefer tmux or screen over raw nohup: you get reattachable sessions and scrollback.
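
In scripts, background jobs are usually paired with `wait` so that failures are not lost. A minimal sketch, assuming bash; the two subshells are placeholders for real tasks and their exit codes are arbitrary.

```bash
#!/usr/bin/env bash
# Sketch: run two illustrative tasks in the background, then collect each exit status.
( sleep 1; exit 0 ) &
pid_ok=$!
( sleep 1; exit 3 ) &
pid_fail=$!

# '&' returned immediately, so both tasks run concurrently while the script continues.
status_ok=0
wait "$pid_ok" || status_ok=$?
status_fail=0
wait "$pid_fail" || status_fail=$?

echo "task 1 exited $status_ok, task 2 exited $status_fail"
```

Waiting on each PID separately preserves the individual exit codes, whereas a bare `wait` with no arguments returns 0 regardless of how the jobs ended.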

Signals

Signals are asynchronous notifications to a process. Default actions: terminate, stop, continue, or ignore. Well-behaved daemons trap SIGTERM for graceful shutdown; SIGKILL cannot be caught.
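
A graceful SIGTERM handler might look like the following sketch. The `worker` function, the log file, and the message text are hypothetical; a real daemon would flush state or close connections in the handler.

```bash
#!/usr/bin/env bash
log=$(mktemp)   # illustrative file used to record that cleanup ran

# Hypothetical worker: loops until SIGTERM arrives, then cleans up and exits 0.
worker() {
  trap 'echo "graceful shutdown" >> "$log"; exit 0' TERM
  while :; do
    sleep 1 &
    wait $!      # 'wait' is interrupted by signals, so the trap runs promptly
  done
}

worker &
worker_pid=$!
sleep 1                      # give the worker time to install its trap
kill -TERM "$worker_pid"

status=0
wait "$worker_pid" || status=$?
echo "worker exited with $status: $(cat "$log")"
```

Running `sleep` in the background and blocking in `wait` is a deliberate choice: bash defers a trap until the current foreground command finishes, so a plain `sleep 60` would delay shutdown by up to a minute.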

Signal (number on Linux x86/ARM)  Default                Typical use
SIGHUP (1)                        Terminate              Terminal disconnected; daemons reload config on HUP (nginx -s reload)
SIGINT (2)                        Terminate              Ctrl+C in terminal
SIGQUIT (3)                       Terminate + core dump  Ctrl+\ to debug crashes
SIGKILL (9)                       Terminate (forced)     Last resort; process cannot catch, block, or ignore it
SIGTERM (15)                      Terminate              Polite shutdown; default for systemctl stop and docker stop
SIGTSTP (20)                      Stop                   Ctrl+Z job control
SIGSTOP (19)                      Stop                   Cannot be caught or ignored; debugger pause
SIGCONT (18)                      Continue               Resume after stop
SIGCHLD (17)                      Ignore                 Parent notified when a child exits or stops; shells use it to reap jobs
bash
# Graceful then force (same as many orchestrators)
kill -TERM 1234
sleep 5
kill -0 1234 2>/dev/null && kill -KILL 1234

# By name — sends SIGTERM by default
pkill -f "gunicorn master"
killall nginx

# List signal names
kill -l

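Warning: kill -9 / SIGKILL skips cleanup: corrupt DB pages, half-written files, orphaned locks. Use SIGTERM first and wait. Leave PID 1 alone: on Linux the kernel discards signals that init has not installed a handler for, so kill -9 1 does nothing inside its own PID namespace, but killing a container's PID 1 from the host takes every process in that container down with it. Do not experiment on production.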
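
The TERM-then-KILL pattern can be wrapped in a helper. `stop_gracefully` is a hypothetical name, and the one-second polling interval and default timeout are assumptions; treat this as a sketch rather than a drop-in tool.

```bash
#!/usr/bin/env bash
# Sketch: send SIGTERM, poll for up to $2 seconds (default 5), then fall back to SIGKILL.
# Returns 0 if the process went away on its own, 1 if SIGKILL was needed.
stop_gracefully() {
  local pid=$1 timeout=${2:-5} waited=0
  kill -TERM "$pid" 2>/dev/null || return 0    # already gone, or not ours to signal
  while [ "$waited" -lt "$timeout" ]; do
    kill -0 "$pid" 2>/dev/null || return 0     # exited after SIGTERM
    sleep 1
    waited=$((waited + 1))
  done
  kill -KILL "$pid" 2>/dev/null
  return 1
}
```

A call such as `stop_gracefully 1234 10` would give PID 1234 ten seconds before forcing it. Note that `kill -0` also succeeds for an exited process that its parent has not yet reaped, so the helper can overestimate how long a zombie "lives".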

systemd and signal mapping

By default, systemctl stop <unit> sends SIGTERM (the unit's KillSignal), waits up to TimeoutStopSec, then sends SIGKILL. Reload runs the unit's ExecReload command, which often sends SIGHUP (systemctl reload nginx).

bash
systemctl show nginx -p KillSignal -p RestartKillSignal -p TimeoutStopUSec
journalctl -u nginx -b --no-pager | tail -20

Inspecting and controlling processes

Day-to-day tools map directly to kernel process structures and cgroup limits.

ps

Snapshot of processes. aux for BSD style, -ef for POSIX. Filter with grep or -C comm.

  • ps aux --sort=-%mem | head
  • ps -p 1234 -o pid,ppid,cmd,%cpu,%mem

top / htop

Live view sorted by CPU or memory. htop adds tree view and mouse—better for interactive triage.

  • top -o %CPU
  • htop -u deploy

/proc

Per-PID virtual files: cmdline, environ, fd, limits, cgroup.

  • cat /proc/1234/cmdline | tr '\0' ' '
  • ls -l /proc/1234/fd
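
Several /proc fields can be combined into a one-line summary. `proc_summary` is a hypothetical helper and the output format is an arbitrary choice; the sketch assumes Linux with /proc mounted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize one process from /proc (Linux only).
proc_summary() {
  local pid=$1 dir="/proc/$1"
  [ -d "$dir" ] || { echo "no such process: $pid" >&2; return 1; }

  local name state ppid fds
  name=$(awk '/^Name:/ {print $2}' "$dir/status")
  state=$(awk '/^State:/ {print $2}' "$dir/status")
  ppid=$(awk '/^PPid:/ {print $2}' "$dir/status")
  # Each entry in fd/ is one open descriptor; unreadable for other users' processes.
  fds=$(ls "$dir/fd" 2>/dev/null | wc -l | tr -d ' ')

  echo "pid=$pid name=$name state=$state ppid=$ppid open_fds=$fds"
}

proc_summary $$   # summarize the current shell
```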

nice / renice

Lower priority (higher nice value) so batch jobs do not starve interactive work.

  • nice -n 10 tar czf backup.tgz /data
  • renice +5 -p 1234
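
The effect of `nice -n` can be checked without starting a real batch job. This sketch assumes the GNU coreutils behavior where `nice` with no command prints the niceness it is running at.

```bash
#!/usr/bin/env bash
# Sketch: with no command, nice prints its own niceness (GNU coreutils behavior, assumed here).
base=$(nice)                 # niceness of the current shell, usually 0
lowered=$(nice -n 10 nice)   # the inner nice runs with the adjustment applied

echo "current niceness: $base, after nice -n 10: $lowered"
```

The adjustment is relative to the caller's niceness and is capped at 19, the lowest priority.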

ulimit

Per-shell resource limits (open files, core size). systemd sets limits per unit too.

  • ulimit -n
  • ulimit -a
bash
# What's listening on port 8080?
ss -tlnp | grep 8080
# or
lsof -i :8080

# Thread count for a Java service
ps -o nlwp= -p "$(pgrep -d, -f 'java.*myapp')"   # -d, joins multiple PIDs into one list

# OOM score: higher means killed sooner when RAM is gone
cat /proc/1234/oom_score
cat /proc/1234/oom_score_adj   # tunable bias, -1000 (never kill) to 1000

Zombies, orphans, and pitfalls

Misunderstanding parent/child lifecycle causes mystery PIDs and runaway process tables.

Zombie processes

Child exited; parent never called wait. Zombie row stays until parent exits or reaps children. Fix: restart the parent service, or patch the buggy daemon to handle SIGCHLD.
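
A zombie can be produced on purpose to see what the process table shows. In this sketch the parent replaces itself with `sleep`, a program that never waits for children; the timings are arbitrary and a procps-style `ps` is assumed.

```bash
#!/usr/bin/env bash
# Sketch: the parent execs 'sleep', which never calls wait(),
# so its short-lived child stays a zombie until the parent goes away.
sh -c 'sleep 1 & exec sleep 30' >/dev/null 2>&1 &
parent=$!

sleep 2   # by now the child has exited but has not been reaped

# Count children of $parent whose STAT starts with Z.
zombies=$(ps -e -o stat= -o ppid= |
  awk -v p="$parent" '$2 == p && $1 ~ /^Z/ {n++} END {print n + 0}')
echo "zombie children of PID $parent: $zombies"

# Removing the parent lets init (or a subreaper) adopt and reap the zombie.
kill "$parent"
wait "$parent" 2>/dev/null || true
```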

Orphan processes

Parent died first; child adopted by PID 1 (systemd/init), which normally reaps them. Long-running orphans under PID 1 are fine; zombies under a broken parent are not.
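
Reparenting can be observed with a subshell that exits before its child. A minimal sketch; the scratch file is illustrative, and the new parent may be PID 1 or a subreaper depending on the environment.

```bash
#!/usr/bin/env bash
# Sketch: a subshell starts a child and exits at once, leaving the child orphaned.
tmp=$(mktemp)                                   # illustrative scratch file for the child's PID
( sleep 30 >/dev/null 2>&1 & echo "$!" > "$tmp" )
orphan=$(cat "$tmp")
rm -f "$tmp"

# The original parent (the subshell) is gone, so the kernel has reparented the child.
new_ppid=$(ps -o ppid= -p "$orphan" | tr -d ' ')
echo "orphan $orphan now has parent $new_ppid (PID 1 or a subreaper)"

kill "$orphan"   # clean up the demo process
```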

Fork bombs and PID exhaustion

Warning: A script that forks in a loop without exiting can exhaust PIDs and freeze the host. cgroup pids.max and ulimit -u limit damage—configure them on shared CI runners.

bash
# Find zombies and their parents
ps -eo stat,pid,ppid,cmd | awk '$1 ~ /Z/ {print}'

# Parent of zombie PID 9999
ps -o ppid= -p 9999 | xargs ps -p

# PID 1 is special: check what it is (systemd, init, or a container entrypoint) and never target it with kill -9
ps -p 1 -o comm=

Pro Tip: In Kubernetes, pods that hang in Terminating often have a container that ignores SIGTERM. The kubelet waits terminationGracePeriodSeconds and then sends SIGKILL. Same semantics as bare metal, different tooling.