User-space programs never touch hardware directly. They ask the kernel through a narrow, stable interface—system calls. Understanding the layers and subsystems explains why strace shows what your app actually does, why containers share a kernel, and why a misconfigured ulimit can crash production.
Every Unix-like system stacks the same four layers. A command you type is parsed by the shell, runs as an application, crosses into the kernel through system calls, and only then reaches devices.
Layer 4, Applications: nginx, PostgreSQL, your Python API, git. They use libc and syscalls; they cannot access raw hardware.
Layer 3, Shell: bash, zsh, sh. Parses commands, expands globs, wires pipes, launches programs via fork + exec.
Layer 2, Kernel: scheduler, memory, VFS, drivers, network stack. Runs in privileged mode; enforces permissions and resource limits.
Layer 1, Hardware: CPU, RAM, SSD/NVMe, NIC, GPU, USB. The kernel's job is to multiplex these safely among processes.
Pro Tip: Docker containers swap Layer 4 apps and often Layer 3 tools—but they share the host kernel (Layer 2). That is why a container running Linux binaries cannot run natively on a Windows kernel without a VM or WSL.
When you run cat /etc/hosts, the shell forks a child, the child execs /bin/cat, and cat issues open(), read(), and write() system calls. The kernel's virtual file system (VFS) resolves the path /etc/hosts to an inode on disk; a path under /proc would resolve to a kernel-generated virtual file instead.
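The path-to-inode step can be observed from user space with stat(). The following is a minimal sketch, assuming a Linux or other POSIX system; the helper name inode_of is hypothetical and only illustrates the lookup, not how cat itself is written.

```c
#include <sys/stat.h>

/* Ask the kernel which inode `path` resolves to.
 * Returns the inode number, or -1 if the lookup fails
 * (for example because the path does not exist).
 * inode_of is a hypothetical helper name used for illustration. */
long long inode_of(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_ino;
}
```

Calling inode_of("/etc/hosts") should report the same number that ls -i /etc/hosts prints. Two paths that name the same file, such as "/" and "/.", resolve to the same inode.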
Inside the kernel, subsystems cooperate through internal APIs. User programs see only the syscall boundary.
      USER SPACE                                KERNEL SPACE
┌─────────────────────┐         ┌───────────────────────────────────────────┐
│ apps: nginx, ssh,   │         │             SYSTEM CALL TABLE             │
│ python, docker cli  │         │ open read write fork exec wait exit …     │
└──────────┬──────────┘         └──────────────────────┬────────────────────┘
           │                                           │
┌──────────▼──────────┐         ┌──────────────────────▼────────────────────┐
│ libc / runtime      │  trap   │  ┌────────────┐  ┌─────────────────────┐  │
│ (glibc, musl)       │ ──────► │  │ Scheduler  │  │ Memory management   │  │
└─────────────────────┘         │  │ (processes)│  │ (pages, mmap, swap) │  │
                                │  └─────┬──────┘  └──────────┬──────────┘  │
                                │        │                    │             │
                                │  ┌─────▼──────┐  ┌──────────▼──────────┐  │
                                │  │ VFS layer  │  │ Networking stack    │  │
                                │  │ ext4, proc │  │ sockets, TCP/IP     │  │
                                │  └─────┬──────┘  └──────────┬──────────┘  │
                                │        │                    │             │
                                │  ┌─────▼────────────────────▼──────────┐  │
                                │  │ Block / char device drivers         │  │
                                │  │ NVMe · NIC · GPU · tty · /dev/*     │  │
                                │  └───────────────────┬─────────────────┘  │
                                └──────────────────────┼────────────────────┘
                                                       │
                                ┌──────────────────────▼────────────────────┐
                                │ HARDWARE: CPU · RAM · disk · network      │
                                └───────────────────────────────────────────┘
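The system call table is the contract between user space and the kernel. Linux exposes hundreds of syscalls; POSIX defines a portable subset. glibc wraps raw syscalls in friendlier functions: you call fopen(), and glibc calls open(), which current glibc on Linux issues as the openat syscall (this is why strace output shows openat rather than open).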
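To make the wrapper relationship concrete, the sketch below bypasses the named wrappers and goes through the generic syscall() entry point that glibc and musl provide on Linux. It is an illustration only: the helper names raw_getpid and raw_write are hypothetical, and the code assumes a Linux system where SYS_getpid and SYS_write are defined.

```c
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel for our PID without the getpid() wrapper.
 * SYS_getpid is this architecture's index into the syscall table. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}

/* Write a string to a file descriptor with the raw write syscall.
 * Returns the number of bytes written, or -1 with errno set. */
long raw_write(int fd, const char *msg) {
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

raw_getpid() returns the same value as getpid(); the wrapper adds a portable name and errno handling, not different behavior. Application code would normally stick to the wrappers.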
Each subsystem owns a slice of kernel responsibility. Failures in one often surface elsewhere—a memory leak becomes swap thrashing; a full disk blocks writes to logs.
Process management
Creates processes (fork), loads programs (execve), schedules CPU time, tracks PIDs and exit codes, delivers signals.
ps aux · kill -TERM 1234 · /proc/<pid>/status
Memory management
Virtual address spaces per process, page tables, demand paging, mmap, swap, OOM killer when RAM is exhausted.
free -h · vmstat 1 · /proc/meminfo
File system (VFS)
Unified interface over ext4, xfs, tmpfs, procfs, sysfs. Paths, inodes, permissions, directory operations: all via the same syscalls.
ls -li /var/log · mount | column -t
Device drivers
Kernel modules that speak to hardware. Block devices (disks) and char devices (terminals, /dev/null) appear as files.
lsblk · lspci · dmesg | tail
Networking
Sockets, TCP/UDP, routing, iptables/nftables, DNS resolution from the app's perspective via libc. Packets enter/leave through NIC drivers.
ss -tlnp · ip addr · curl -v https://example.com
Pro Tip: /proc and /sys are not "real" disk directories; they are kernel-generated views. Reading /proc/cpuinfo asks the kernel to format live CPU data, not open a static file.
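As a small illustration of /proc being a live view, the sketch below reads the calling process's own entry, /proc/self/status, and extracts a numeric field. It assumes a Linux system with /proc mounted; the helper name proc_status_field is hypothetical and it only handles numeric fields such as Pid and PPid.

```c
#include <stdio.h>
#include <string.h>

/* Read one "Key:<tab>value" line from /proc/self/status and return the
 * value as a long, or -1 if the file or the key is missing.
 * proc_status_field is a hypothetical helper; numeric fields only. */
long proc_status_field(const char *key) {
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;

    char line[256];
    long value = -1;
    size_t klen = strlen(key);
    while (fgets(line, sizeof line, f)) {
        /* Match "key:" at the start of the line. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            if (sscanf(line + klen + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```

proc_status_field("Pid") should match getpid(), because the kernel generates the text at the moment the file is read rather than storing it on disk.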
The CPU runs code at different privilege levels (rings on x86). User programs cannot read arbitrary memory, poke device registers, or disable interrupts, because any of those would compromise every process on the machine.
| Aspect | User space | Kernel space |
|---|---|---|
| Privilege | Restricted (ring 3 on x86) | Full (ring 0) |
| Memory | Own virtual address space; segfault on illegal access | All physical RAM; maps into user space on demand |
| Crash impact | Single process dies | Kernel panic → whole system down |
| Examples | bash, node, postgres | Scheduler, ext4 driver, TCP stack |
| Entry mechanism | — | Syscall, interrupt, exception |
Each crossing between user mode and kernel mode (a mode switch, cheaper than a full context switch between processes but not free) has a cost. That is one reason high-throughput servers batch I/O and use epoll instead of one thread per connection. Infra engineers tune this boundary; developers feel it when profiling shows time spent in syscalls.
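The effect of batching can be shown by counting how many write() calls it takes to move the same bytes. The sketch below is a simplified illustration, not a benchmark: the helper names write_in_chunks and count_writes are hypothetical, it writes to /dev/null so nothing is stored, and it does not retry on EINTR.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write `len` bytes to `fd` in pieces of at most `chunk` bytes and
 * return how many write() calls it took, or -1 on error.
 * Each call is one round trip across the user/kernel boundary. */
long write_in_chunks(int fd, const char *data, size_t len, size_t chunk) {
    long calls = 0;
    size_t done = 0;
    if (chunk == 0)
        return -1;
    while (done < len) {
        size_t want = len - done < chunk ? len - done : chunk;
        ssize_t n = write(fd, data + done, want);
        if (n <= 0)
            return -1;
        done += (size_t)n;  /* a short write just means another call */
        calls++;
    }
    return calls;
}

/* Convenience wrapper: send `len` zero bytes to /dev/null and
 * report how many write() calls were needed. */
long count_writes(size_t len, size_t chunk) {
    static char zeros[65536];
    if (len > sizeof zeros)
        return -1;
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return -1;
    long calls = write_in_chunks(fd, zeros, len, chunk);
    close(fd);
    return calls;
}
```

With these helpers, count_writes(4096, 1) takes 4096 kernel crossings while count_writes(4096, 4096) takes one. That is the same trade-off stdio buffering makes on your behalf.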
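A system call is a controlled gateway. The process loads the syscall number and its arguments into registers (larger arguments are passed as pointers to buffers or structs in its own memory), executes a special instruction (syscall on x86-64), and the kernel validates the arguments and permissions before doing the work.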
| Syscall | Purpose | Typical use |
|---|---|---|
| open | Open or create a file, return a file descriptor | Opening config files, logs, device nodes under /dev |
| read | Read bytes from a file descriptor into a buffer | Reading stdin, config, network sockets |
| write | Write bytes from a buffer to a file descriptor | stdout, log files, pipe to another process |
| fork | Clone the current process; child gets copy-on-write memory | Shell launching every external command |
| execve | Replace process image with a new program | Running /bin/ls after fork |
| wait | Block until a child exits; collect exit status | Shell waiting for pipeline to finish |
| exit | Terminate the process with an exit code | Clean shutdown; code 0 = success, non-zero = error |
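The kernel-side validation shows up in user space as a failed call plus an errno value. The sketch below provokes two common rejections; the helper names are hypothetical, and the errno values in the comments are the ones POSIX specifies for these cases.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Return the errno the kernel reports for read() on a descriptor that
 * is not open, or 0 if the call unexpectedly succeeds. */
int errno_for_bad_fd(void) {
    char byte;
    errno = 0;
    if (read(-1, &byte, 1) == -1)
        return errno;       /* EBADF: not a valid open descriptor */
    return 0;
}

/* Return the errno for opening `path` read-only, or 0 on success. */
int errno_for_open(const char *path) {
    errno = 0;
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno;           /* ENOENT when the path does not exist */
}
```

Passing the result to strerror() gives the same text that perror() prints, which is also what strace shows next to a failed call.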
These C snippets strip away libc's buffered I/O and call the thin syscall wrappers directly. In production you would more likely use stdio (fopen, fread) with fuller error handling, but interviews and debugging often start here.
Every cat file.txt reduces to opening a path, reading chunks, writing to fd 1 (stdout).
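#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUF_SIZE 4096

int main(void) {
    /* open() returns a small integer file descriptor, or -1 on error. */
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[BUF_SIZE];
    ssize_t n;
    /* read() returns 0 at end of file and -1 on error. */
    while ((n = read(fd, buf, BUF_SIZE)) > 0) {
        /* A short write is treated as an error to keep the example small. */
        if (write(STDOUT_FILENO, buf, n) != n) {
            perror("write");
            close(fd);
            return 1;
        }
    }
    if (n < 0) perror("read");

    close(fd);
    return n < 0 ? 1 : 0;
}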
# Same operation from the shell — watch the syscalls
strace -e trace=open,openat,read,write cat /etc/hostname 2>&1 | head -20
When you type ls, bash forks itself, then the child execs /bin/ls. The parent calls wait unless the command is backgrounded.
bash (PID 1000)                      kernel
     │                                  │
     │ fork()                           │
     ├─────────────────────────────────►│ creates child PID 1001 (copy of bash)
     │                                  │
     │ execve("/bin/ls", ...)           │ (in child 1001)
     │                                  ├──────► loads /bin/ls, replaces memory
     │ wait(&status)                    │
     │                                  │ ls: open(".") read write×N
     │◄─────────────────────────────────┤ child exits code 0
     │ prompt returns                   │
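#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the parent's environment, passed on to ls */

int main(void) {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {
        /* Child: replace image with ls */
        char *argv[] = { "/bin/ls", "-la", NULL };
        execve("/bin/ls", argv, environ);   /* a NULL envp is non-portable */
        perror("execve");   /* only reached if exec fails */
        _exit(127);         /* leave without running the parent's cleanup */
    }

    /* Parent: wait for child */
    int status;
    if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return 1; }
    if (WIFEXITED(status))
        printf("ls exited with code %d\n", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("ls was killed by signal %d\n", WTERMSIG(status));
    return 0;
}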
# Trace fork, exec, and wait. Tracing ls alone shows only its execve, so trace a shell that launches it
strace -f -e trace=process sh -c 'ls /tmp; true' 2>&1 | head -30
Warning: Forking in loops without wait creates zombie processes (exited children still in the process table). Production daemons and init systems reap zombies; runaway fork bombs can exhaust PIDs and take down a host.
You do not need to write C to see architecture in action. These tools bridge theory and daily work.
# Why is nginx slow to start? Count syscalls while it loads and tests its config
strace -c nginx -t 2>&1
# What files does a Python import touch?
strace -e trace=openat,stat python3 -c "import django" 2>&1 | tail -20
# Live view of a process's open files
ls -l /proc/$(pgrep -n nginx)/fd/
Pro Tip: strace -c prints a per-syscall summary table (time, call count, errors), which is perfect for interviews when asked "how would you debug a slow startup?" Mention syscalls, then narrow with -e trace=openat.