Process Management

What is a process?

A process is a running instance of a program: its code, memory, open files, and execution state. The kernel assigns each process a unique PID, tracks parent/child relationships (PPID), and switches the CPU between processes in time slices.

Concept	Meaning
PID	Process ID—integer handle for signals, `kill`, `/proc/<pid>`
PPID	Parent PID; shell is parent of commands it launches
UID / GID	Effective user/group for permission checks
Thread	Lightweight unit within a process; shares address space
Session / process group	Job control grouping; terminal sends signals to the foreground group

# Current shell's PID; parent of commands you run
echo $$

# Process tree — who spawned whom
ps -ef --forest | head -20
pstree -p $$
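
The parent/child link can also be checked without ps. A minimal sketch, assuming Linux with /proc mounted; the variable names are illustrative and not part of any tool:

```shell
# Run a throwaway shell, start a child from it, and compare the child's
# PPid (read from /proc) with that shell's own PID ($$).
report=$(sh -c '
    sleep 5 &
    child=$!
    echo "shell_pid=$$"
    echo "child_ppid=$(awk "/^PPid:/ {print \$2}" "/proc/$child/status")"
    kill "$child"
')
echo "$report"
```

Both lines should carry the same number, because the shell that launched sleep is its parent.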

Lifecycle: fork, exec, wait, exit

Unix does not spawn a fresh process from scratch for each command. It clones the parent, then optionally replaces the child's memory with a new program.

fork — Kernel duplicates the calling process. Child gets a new PID; memory is copy-on-write.
exec — Child (or same process) loads a new executable; PID unchanged, image replaced.
run — Scheduler runs the process in user mode until I/O, timer, or signal.
exit — Process calls exit(n) or is killed; kernel frees most resources.
wait — Parent collects exit status; kernel removes zombie entry from process table.

  PARENT (bash)                         CHILD
       │                                    │
       │ fork()                             │
       ├───────────────────────────────────►│ copy of bash, new PID
       │                                    │
       │ (parent continues)                 │ execve("/bin/ls", ...)
       │                                    ├────► now running ls
       │ wait() blocks                      │ runs, writes stdout
       │                                    │ exit(0)
       │◄───────────────────────────────────┤ status available
       │ wait returns, shell prompt         │ (zombie until wait)
       ▼                                    ▼
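
The same lifecycle can be observed from the shell itself. This is a rough sketch rather than a replacement for the C-level view; the variable names are made up for illustration:

```shell
# exec: the program image is replaced but the PID is kept,
# so both echo commands print the same number.
exec_pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
echo "$exec_pids"

# fork + exit + wait: the subshell is a forked child that exits with status 3.
( exit 3 ) &
child_pid=$!
child_status=0
wait "$child_pid" || child_status=$?    # wait reaps the child and returns its status
echo "child $child_pid exited with status $child_status"
```

Until wait runs, the exited child is the zombie shown at the bottom of the diagram.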

Exit codes matter in scripts and CI: 0 = success, non-zero = failure. Shell variable $? holds the last foreground command's exit code.

grep -q "error" /var/log/app.log
echo $?                    # 0 if found, 1 if not

false; echo $?             # 1
true; echo $?              # 0

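# Make a pipeline fail if any stage fails, not just the last one (bash)
set -o pipefail

# Abort the script when the health check fails (-f turns HTTP errors into a non-zero exit)
curl -f https://api.example.com/health || exit 1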
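
To see what pipefail changes, compare the same pipeline with and without it. A small sketch, assuming bash is on PATH; it runs in child shells so the option does not leak into the current one:

```shell
# Without pipefail, the pipeline's status is that of the last stage (true).
without=$(bash -c 'false | true; echo $?')

# With pipefail, a failing earlier stage (false) makes the pipeline fail.
with=$(bash -c 'set -o pipefail; false | true; echo $?')

echo "without pipefail: $without, with pipefail: $with"
```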

See Architecture → fork/exec examples for C-level syscalls and strace.

Process states

In ps output, the STAT column abbreviates state. Knowing them explains "why is my process stuck?" and "why are there defunct entries?"

R · Running

Running / runnable

On CPU or ready to run. High CPU here means hot loops or heavy compute—not always "healthy."

dev infra

S · Sleeping

Interruptible sleep

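Waiting for an event (network data, timer, lock, terminal input). Normal for idle servers and blocked I/O. D is uninterruptible sleep (usually disk or NFS I/O); a process in D does not respond to signals, SIGKILL included, until the I/O completes.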

dev

T · Stopped

Stopped (job control)

Paused by SIGSTOP, SIGTSTP (Ctrl+Z), or debugger. Resume with fg / kill -CONT.

dev

Z · Zombie

Zombie (defunct)

Exited but parent has not waited. Holds a PID and a process-table entry but almost no memory. Fix the parent or restart the service.

sysadmin

# STAT column: state + optional flags (+ foreground group, < high priority, etc.)
ps aux | awk '$8 ~ /Z/ {print}'    # zombies only

# Full state for one PID
grep -E '^(Name|State|PPid):' /proc/1234/status
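
State changes are easy to watch on a throwaway process. A sketch assuming Linux with /proc mounted; proc_state is a hypothetical helper, not a standard command:

```shell
# Print the one-letter state code (R, S, D, T, Z, ...) for the given PID.
proc_state() {
    awk '/^State:/ {print $2}' "/proc/$1/status"
}

sleep 30 &
demo_pid=$!

kill -STOP "$demo_pid"           # pause it, as Ctrl+Z would for a foreground job
sleep 1
stopped_state=$(proc_state "$demo_pid")

kill -CONT "$demo_pid"           # resume it
sleep 1
resumed_state=$(proc_state "$demo_pid")

echo "after SIGSTOP: $stopped_state, after SIGCONT: $resumed_state"

kill "$demo_pid"                 # clean up the demo process
wait "$demo_pid" 2>/dev/null || true
```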

Pro Tip: A process in D state (uninterruptible sleep) often means stuck NFS or failing disk. Reboot may be the only fix—killing won't work until the kernel completes the I/O.

Foreground vs background

Each terminal is the controlling terminal for one session. Within that session, one process group runs in the foreground (it receives keyboard input and signals like Ctrl+C); the others run in the background.

Mechanism	Behavior
`command &`	Start in background; shell does not wait; job number printed
Ctrl+Z	Suspend foreground job (`SIGTSTP`)
`fg` / `bg`	Resume job in foreground or background
`jobs`	List shell's active/stopped jobs
`nohup cmd &`	Ignore hangup; keep running after SSH disconnect (output goes to `nohup.out` when stdout is a terminal)
`disown`	Remove job from shell's table so hangup does not kill it

# Long build in background while you keep using the shell
npm run build > build.log 2>&1 &
jobs -l
fg %1                      # bring job 1 to foreground

# Accidentally started in foreground — suspend and background
# Ctrl+Z
bg
disown -h %1               # survive terminal close

# Better than nohup for servers: systemd, tmux, or screen
tmux new -s deploy
# ... run deploy; detach with Ctrl+b d

Pro Tip: For remote work, prefer tmux or screen over raw nohup—you get reattachable sessions and scrollback. dev infra

Signals

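Signals are asynchronous notifications to a process. Default actions: terminate (optionally with a core dump), stop, continue, or ignore. Well-behaved daemons trap SIGTERM for graceful shutdown; SIGKILL and SIGSTOP cannot be caught, blocked, or ignored. The numbers below are the Linux values on x86 and ARM; prefer signal names in scripts.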

Signal	Default	Typical use
`SIGHUP` (1)	Terminate	Terminal disconnected; daemons reload config on HUP (`nginx -s reload`)
`SIGINT` (2)	Terminate	Ctrl+C in terminal
`SIGQUIT` (3)	Terminate + core dump	Ctrl+\ — debug crashes
`SIGKILL` (9)	Terminate (forced)	Last resort; process cannot handle or block it
`SIGTERM` (15)	Terminate	Polite shutdown—`systemctl stop`, `docker stop` default
`SIGTSTP` (20)	Stop	Ctrl+Z job control
`SIGSTOP` (19)	Stop	Cannot be ignored; debugger pause
`SIGCONT` (18)	Continue	Resume after stop
`SIGCHLD` (17)	Ignore	Parent notified when a child exits, stops, or continues; shells and daemons handle it to reap children

# Graceful then force (same as many orchestrators)
kill -TERM 1234
sleep 5
kill -0 1234 2>/dev/null && kill -KILL 1234

# By name — sends SIGTERM by default
pkill -f "gunicorn master"
killall nginx

# List signal names
kill -l
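
On the receiving side, a graceful shutdown is a trap on SIGTERM. A minimal sketch; the worker function and its message are hypothetical stand-ins for a real daemon loop:

```shell
# Hypothetical worker: loops until SIGTERM arrives, then cleans up and exits 0.
worker() {
    trap 'echo "worker: SIGTERM received, cleaning up"; exit 0' TERM
    while :; do
        sleep 1     # the trap runs once the current sleep returns
    done
}

worker &
worker_pid=$!
sleep 1                          # let the worker install its trap
kill -TERM "$worker_pid"
worker_status=0
wait "$worker_pid" || worker_status=$?
echo "worker exit status: $worker_status"
```

Without the trap, the worker would die from the signal and wait would report 143 (128 + 15) instead of the clean exit code.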

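Warning: kill -9 / SIGKILL skips cleanup: corrupt DB pages, half-written files, orphaned locks. Use SIGTERM first and wait. Leave PID 1 (init) alone: the kernel ignores SIGKILL sent to it from inside its own PID namespace, but killing a container's init from the host takes the whole container down. Do not experiment on production.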

systemd and signal mapping

By default, systemctl stop unit sends SIGTERM (configurable via KillSignal), waits up to TimeoutStopSec, then sends SIGKILL. Reload often maps to SIGHUP through the unit's ExecReload (systemctl reload nginx).

systemctl show nginx -p KillSignal -p RestartKillSignal
journalctl -u nginx -b --no-pager | tail -20
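
The stop sequence (SIGTERM, grace period, SIGKILL) can be approximated in plain shell for processes that are not managed by systemd. This is a sketch under assumptions: stop_gracefully is a hypothetical helper, and the grace period is counted in whole seconds.

```shell
# Hypothetical helper: SIGTERM first, SIGKILL after a grace period (seconds).
stop_gracefully() {
    sg_pid=$1
    sg_grace=${2:-5}
    kill -TERM "$sg_pid" 2>/dev/null || return 0     # nothing to stop
    sg_waited=0
    while [ "$sg_waited" -lt "$sg_grace" ]; do
        kill -0 "$sg_pid" 2>/dev/null || return 0    # gone: graceful exit
        sleep 1
        sg_waited=$((sg_waited + 1))
    done
    echo "PID $sg_pid ignored SIGTERM for ${sg_grace}s, sending SIGKILL"
    kill -KILL "$sg_pid" 2>/dev/null || true
}

# Demo target that ignores SIGTERM (ignored signals stay ignored across exec).
sh -c 'trap "" TERM; exec sleep 30' &
stubborn_pid=$!
sleep 1                                  # give it time to install the trap
stop_gracefully "$stubborn_pid" 2
stubborn_status=0
wait "$stubborn_pid" || stubborn_status=$?
echo "exit status: $stubborn_status"     # 128 + 9 when killed by SIGKILL
```

One caveat: kill -0 still succeeds for an exited child that has not been reaped yet, so the helper can wait out the full grace period on such a process.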

Inspecting and controlling processes

Day-to-day tools map directly to kernel process structures and cgroup limits.

ps

Snapshot of processes. aux for BSD style, -ef for System V/POSIX style. Filter with grep or select by command name with -C name.

ps aux --sort=-%mem | head
ps -p 1234 -o pid,ppid,cmd,%cpu,%mem

sysadmin

top / htop

Live view sorted by CPU or memory. htop adds tree view and mouse—better for interactive triage.

top -o %CPU
htop -u deploy

infra

/proc

Per-PID virtual files: cmdline, environ, fd, limits, cgroup.

tr '\0' ' ' < /proc/1234/cmdline
ls -l /proc/1234/fd

dev
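
Several of these /proc files can be combined into a quick per-process summary. A sketch assuming Linux; proc_summary is a hypothetical helper name, and reading another user's process may need root:

```shell
# Print command line, open file descriptor count, and parent PID for a PID.
proc_summary() {
    ps_pid=$1
    # cmdline is NUL-separated; turn the separators into spaces.
    printf 'cmdline: %s\n' "$(tr '\0' ' ' < "/proc/$ps_pid/cmdline")"
    # Each entry under fd/ is one open file descriptor.
    printf 'open fds: %s\n' "$(ls "/proc/$ps_pid/fd" | wc -l)"
    printf 'ppid: %s\n' "$(awk '/^PPid:/ {print $2}' "/proc/$ps_pid/status")"
}

proc_summary $$     # summarize the current shell
```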

nice / renice

Lower priority (higher nice value) so batch jobs do not starve interactive work.

nice -n 10 tar czf backup.tgz /data
renice -n 5 -p 1234        # set niceness to 5 (util-linux treats the value as absolute)

infra
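
The effect of nice can be read back by running nice with no command, which prints the current niceness (this works with GNU coreutils; other implementations are assumed to behave the same way):

```shell
base=$(nice)                  # niceness of the current shell, usually 0
lowered=$(nice -n 5 nice)     # niceness seen inside a command started with nice -n 5
echo "base niceness: $base, inside nice -n 5: $lowered"
```

The increment is relative to the caller, and the result is capped at 19.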

ulimit

Per-shell resource limits (open files, core size). systemd sets limits per unit too.

ulimit -n
ulimit -a

sysadmin
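
Limits are inherited by children and can be lowered for one command without touching the parent shell. A small sketch; the value 64 is arbitrary:

```shell
outer=$(ulimit -n)                         # soft limit on open files in this shell
inner=$( ulimit -S -n 64; ulimit -n )      # lowered inside the subshell only
echo "this shell: $outer, subshell: $inner"
```

The command substitution runs in a subshell, so the lowered soft limit disappears when it exits.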

# What's listening on port 8080?
ss -tlnp | grep 8080
# or
lsof -i :8080

# Thread count for a Java service
ps -o nlwp= -p $(pgrep -f 'java.*myapp')

# OOM score: who gets killed first when RAM is gone
cat /proc/1234/oom_score       # current badness; higher is killed first
cat /proc/1234/oom_score_adj   # admin adjustment, -1000 to 1000

Zombies, orphans, and pitfalls

Misunderstanding parent/child lifecycle causes mystery PIDs and runaway process tables.

Zombie processes

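Child exited; parent never called wait. The zombie entry stays until the parent reaps it or exits (PID 1 then adopts and reaps it). Signalling the zombie itself does nothing because it is already dead. Fix: restart the parent service, or patch the buggy daemon to call wait(), typically from a SIGCHLD handler.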

Orphan processes

Parent died first; child adopted by PID 1 (systemd/init), which normally reaps them. Long-running orphans under PID 1 are fine; zombies under a broken parent are not.

Fork bombs and PID exhaustion

Warning: A script that forks in a loop without exiting can exhaust PIDs and freeze the host. cgroup pids.max and ulimit -u limit damage—configure them on shared CI runners.

# Find zombies and their parents
ps -eo stat,pid,ppid,cmd | awk '$1 ~ /Z/ {print}'

# Parent of zombie PID 9999
ps -o ppid= -p 9999 | xargs ps -p

# Which init is PID 1? Never send it signals casually
ps -p 1 -o comm=
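
A zombie can be produced on purpose to try these commands safely. A sketch assuming Linux with /proc mounted; list_zombies is a hypothetical helper that scans /proc instead of calling ps:

```shell
# Print "PID PPID" for every zombie by scanning /proc/<pid>/status.
list_zombies() {
    for status_file in /proc/[0-9]*/status; do
        # A process may vanish mid-scan; ignore read errors.
        awk '/^State:/ {state = $2}
             /^Pid:/   {pid = $2}
             /^PPid:/  {ppid = $2}
             END { if (state == "Z") print pid, ppid }' "$status_file" 2>/dev/null || true
    done
}

# The subshell starts a short-lived child, then replaces itself with sleep,
# which never calls wait(), so the exited child lingers as a zombie.
( sleep 0 & exec sleep 5 ) &
holder_pid=$!
sleep 1
zombies=$(list_zombies)
echo "$zombies"                  # includes a line ending in the holder's PID

kill "$holder_pid"               # parent gone: the zombie is adopted and reaped
wait "$holder_pid" 2>/dev/null || true
```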

Pro Tip: In Kubernetes, pods stuck in Terminating often have a main process that ignores SIGTERM. The kubelet waits terminationGracePeriodSeconds (30 seconds by default) and then sends SIGKILL. Same semantics as bare metal, different tooling.