Unix Architecture

User-space programs never touch hardware directly. They ask the kernel through a narrow, stable interface—system calls. Understanding the layers and subsystems explains why strace shows what your app actually does, why containers share a kernel, and why a misconfigured ulimit can crash production.


The four layers

Every Unix-like system stacks the same abstractions. Your shell command travels down through applications and the shell, crosses into the kernel, and only then reaches devices.

  Layer 4: Applications. nginx, PostgreSQL, your Python API, git. They use libc and syscalls; they cannot access raw hardware.

  Layer 3: Shell. bash, zsh, sh. Parses commands, expands globs, wires pipes, launches programs via fork + exec.

  Layer 2: Kernel. Scheduler, memory, VFS, drivers, network stack. Runs in privileged mode; enforces permissions and resource limits.

  Layer 1: Hardware. CPU, RAM, SSD/NVMe, NIC, GPU, USB. The kernel's job is to multiplex these safely among processes.

Pro Tip: Docker containers swap Layer 4 apps and often Layer 3 tools—but they share the host kernel (Layer 2). That is why a container running Linux binaries cannot run natively on a Windows kernel without a VM or WSL.

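When you run cat /etc/hosts, the shell forks a child, the child execs cat (typically /bin/cat or /usr/bin/cat), and cat calls the open(), read(), and write() syscalls (on current Linux, strace shows the open as openat). The kernel's virtual file system (VFS) resolves /etc/hosts to an inode on disk; a path under /proc would resolve to a kernel-generated virtual file instead.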

Kernel architecture diagram

Inside the kernel, subsystems cooperate through internal APIs. User programs see only the syscall boundary.

The system call table is the contract between user space and the kernel. Linux exposes hundreds of syscalls; POSIX defines a portable subset. glibc wraps raw syscalls with friendlier functions: you call fopen(), and glibc issues the open syscall for you (openat on current Linux).

Kernel subsystems

Each subsystem owns a slice of kernel responsibility. Failures in one often surface elsewhere—a memory leak becomes swap thrashing; a full disk blocks writes to logs.

Process management

Creates processes (fork), loads programs (execve), schedules CPU time, tracks PIDs and exit codes, delivers signals.

  • ps aux · kill -TERM 1234
  • /proc/<pid>/status

Memory management

Virtual address spaces per process, page tables, demand paging, mmap, swap, OOM killer when RAM is exhausted.

  • free -h · vmstat 1
  • /proc/meminfo

File system (VFS)

Unified interface over ext4, xfs, tmpfs, procfs, sysfs. Paths, inodes, permissions, directory operations—all via the same syscalls.

  • ls -li /var/log
  • mount | column -t

Device drivers

Kernel code, either built in or loaded as modules, that speaks to hardware. Block devices (disks) and char devices (terminals, /dev/null) appear as files.

  • lsblk · lspci
  • dmesg | tail

Networking stack

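Sockets, TCP/UDP, routing, and packet filtering (netfilter, configured with iptables/nftables). DNS resolution is not a kernel job: libc does it in user space (for example via getaddrinfo) over ordinary sockets. Packets enter and leave through NIC drivers.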

  • ss -tlnp · ip addr
  • curl -v https://example.com

Pro Tip: /proc and /sys are not "real" disk directories—they are kernel-generated views. Reading /proc/cpuinfo asks the kernel to format live CPU data, not open a static file.

User space vs kernel space

The CPU runs in different privilege rings. User programs cannot read arbitrary memory, poke device registers, or disable interrupts—that would compromise every process on the machine.

Aspect | User space | Kernel space
Privilege | Restricted (ring 3 on x86) | Full (ring 0)
Memory | Own virtual address space; segfault on illegal access | All physical RAM; maps into user space on demand
Crash impact | Single process dies | Kernel panic → whole system down
Examples | bash, node, postgres | Scheduler, ext4 driver, TCP stack
Entry mechanism | n/a | Syscall, interrupt, exception

Each transition between user and kernel mode (a mode switch, cheaper than a full context switch between processes but not free) has a cost. That is one reason high-throughput servers batch I/O and multiplex many connections with epoll instead of dedicating one thread to each. Infra engineers tune this boundary; developers feel it when profiling shows time spent in syscalls.

System call interface

A system call is a controlled gateway. The process loads arguments into registers (or a struct), executes a special instruction (syscall on x86-64), and the kernel validates permissions before doing the work.

How a syscall works (simplified)

  1. Application calls a libc wrapper (e.g. write(fd, buf, n)).
  2. libc places the syscall number and arguments in CPU registers.
  3. CPU traps into kernel mode; kernel dispatches via the syscall table.
  4. Kernel performs the operation and returns a result, or a negative error code on failure.
  5. Return value is copied back to user space; on failure libc stores the error code in errno and returns -1 to your code.

Syscall | Purpose | Typical use
open | Open or create a file, return a file descriptor | Opening config files, logs, device nodes under /dev
read | Read bytes from a file descriptor into a buffer | Reading stdin, config, network sockets
write | Write bytes from a buffer to a file descriptor | stdout, log files, pipe to another process
fork | Clone the current process; child gets copy-on-write memory | Shell launching every external command
execve | Replace process image with a new program | Running /bin/ls after fork
wait | Block until a child exits; collect exit status | Shell waiting for pipeline to finish
exit | Terminate the process with an exit code | Clean shutdown; codes 0 = success, non-zero = error

Examples: open, read, write, fork, exec

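These C snippets call the thin libc wrappers (open, read, write, fork, execve) that map almost one-to-one onto syscalls, skipping higher-level conveniences such as buffered stdio. In production you would usually reach for fopen and more thorough error handling, but interviews and debugging often start here.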

open, read, write — copy a file

Every cat file.txt reduces to opening a path, reading chunks, writing to fd 1 (stdout).

c
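#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUF_SIZE 4096

int main(void) {
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[BUF_SIZE];
    ssize_t n;
    /* read() returns 0 at end of file and -1 on error */
    while ((n = read(fd, buf, BUF_SIZE)) > 0) {
        /* write() may accept fewer bytes than requested, so loop until done */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(STDOUT_FILENO, buf + off, (size_t)(n - off));
            if (w < 0) { perror("write"); close(fd); return 1; }
            off += w;
        }
    }
    if (n < 0) { perror("read"); close(fd); return 1; }

    close(fd);
    return 0;
}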
bash
# Same operation from the shell — watch the syscalls
strace -e trace=open,openat,read,write cat /etc/hostname 2>&1 | head -20

fork + execve — how the shell runs a command

When you type ls, bash forks itself, then the child execs /bin/ls. The parent calls wait unless the command is backgrounded.

c
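#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;  /* current environment, passed through to the child */

int main(void) {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {
        /* Child: replace image with ls */
        char *argv[] = { "/bin/ls", "-la", NULL };
        execve("/bin/ls", argv, environ);
        perror("execve");  /* only reached if exec fails */
        _exit(127);        /* leave without flushing stdio buffers copied from the parent */
    }

    /* Parent: wait for child */
    int status;
    if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return 1; }
    if (WIFEXITED(status))
        printf("ls exited with code %d\n", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("ls was killed by signal %d\n", WTERMSIG(status));
    return 0;
}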
bash
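# Trace fork, exec, and wait as a shell runs a simple command.
# Tracing ls by itself shows only its execve; the fork and wait happen in the shell.
# The trailing "true" stops shells that would otherwise exec the last command directly.
strace -f -e trace=fork,execve,wait4,clone sh -c 'ls /tmp; true' 2>&1 | head -30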

Warning: Forking in loops without wait creates zombie processes (exited children still in the process table). Production daemons and init systems reap zombies; runaway fork bombs can exhaust PIDs and take down a host.

Observing syscalls in practice

You do not need to write C to see architecture in action. These tools bridge theory and daily work.

bash
# Why is nginx slow to start? Summarize syscalls made during a config test (nginx -t)
strace -c nginx -t 2>&1

# What files does a Python import touch?
strace -e trace=openat,stat python3 -c "import django" 2>&1 | tail -20

# Live view of a process's open files
ls -l /proc/$(pgrep -n nginx)/fd/

Pro Tip: strace -c prints a per-syscall summary table (call counts, errors, and time), which makes it a good first answer when asked "how would you debug a slow startup?" Mention syscalls, then narrow with -e trace=openat.