Unix Architecture

The four layers

Every Unix-like system stacks the same abstractions. A command you type is parsed by the shell, which launches an application; the application crosses into the kernel through system calls, and only the kernel touches devices.

Layer 4
Applications — nginx, PostgreSQL, your Python API, git. They use libc and syscalls; they cannot access raw hardware.
Layer 3
Shell — bash, zsh, sh. Parses commands, expands globs, wires pipes, launches programs via fork + exec.
Layer 2
Kernel — scheduler, memory, VFS, drivers, network stack. Runs in privileged mode; enforces permissions and resource limits.
Layer 1
Hardware — CPU, RAM, SSD/NVMe, NIC, GPU, USB. The kernel's job is to multiplex these safely among processes.

Pro Tip: Docker containers swap Layer 4 apps and often Layer 3 tools, but they share the host kernel (Layer 2). That is why Linux containers cannot run directly on a Windows kernel: they need a Linux kernel underneath, usually supplied by a VM (WSL 2 is itself a lightweight VM).

When you run cat /etc/hosts, the shell forks a child, the child execs /bin/cat, and cat issues open (openat on modern Linux), read, and write syscalls. The kernel's virtual file system (VFS) resolves /etc/hosts to an inode on disk; a path under /proc would resolve to a kernel-generated virtual file instead.

Kernel architecture diagram

Inside the kernel, subsystems cooperate through internal APIs. User programs see only the syscall boundary.

  USER SPACE                          KERNEL SPACE
 ┌─────────────────────┐         ┌───────────────────────────────────────────┐
 │  apps: nginx, ssh,  │         │             SYSTEM CALL TABLE             │
 │  python, docker cli │         │   open read write fork exec wait exit …   │
 └──────────┬──────────┘         └───────────────────┬───────────────────────┘
            │                                        │
 ┌──────────▼──────────┐         ┌───────────────────▼───────────────────────┐
 │  libc / runtime     │  trap   │  ┌────────────┐  ┌─────────────────────┐  │
 │  (glibc, musl)      │ ──────► │  │ Scheduler  │  │ Memory management   │  │
 └─────────────────────┘         │  │ (processes)│  │ (pages, mmap, swap) │  │
                                 │  └─────┬──────┘  └──────────┬──────────┘  │
                                 │        │                    │             │
                                 │  ┌─────▼──────┐  ┌──────────▼──────────┐  │
                                 │  │ VFS layer  │  │ Networking stack    │  │
                                 │  │ ext4, proc │  │ sockets, TCP/IP     │  │
                                 │  └─────┬──────┘  └──────────┬──────────┘  │
                                 │        │                    │             │
                                 │  ┌─────▼────────────────────▼──────────┐  │
                                 │  │     Block / char device drivers     │  │
                                 │  │  NVMe · NIC · GPU · tty · /dev/*    │  │
                                 │  └───────────────────┬─────────────────┘  │
                                 └──────────────────────┼────────────────────┘
                                                        │
                                 ┌──────────────────────▼────────────────────┐
                                 │  HARDWARE: CPU · RAM · disk · network     │
                                 └───────────────────────────────────────────┘

The system call table is the contract between user space and the kernel. Linux exposes hundreds of syscalls; POSIX standardizes a portable set of C-level interfaces, most of which map onto them. glibc wraps raw syscalls with friendlier functions: you call fopen(), and glibc ultimately issues open (openat on modern Linux).

Kernel subsystems

Each subsystem owns a slice of kernel responsibility. Failures in one often surface elsewhere—a memory leak becomes swap thrashing; a full disk blocks writes to logs.

Process management

Creates processes (fork), loads programs (execve), schedules CPU time, tracks PIDs and exit codes, delivers signals.

ps aux · kill -TERM 1234
/proc/<pid>/status

Audience: dev, sysadmin

Memory management

Virtual address spaces per process, page tables, demand paging, mmap, swap, OOM killer when RAM is exhausted.

free -h · vmstat 1
/proc/meminfo

Audience: infra

File system (VFS)

Unified interface over ext4, xfs, tmpfs, procfs, sysfs. Paths, inodes, permissions, directory operations—all via the same syscalls.

ls -li /var/log
mount | column -t

Audience: sysadmin

Device drivers

Kernel modules that speak to hardware. Block devices (disks) and char devices (terminals, /dev/null) appear as files.

lsblk · lspci
dmesg | tail

Audience: infra

Networking stack

Sockets, TCP/UDP, routing, and packet filtering (netfilter, configured with iptables/nftables). DNS resolution is not a kernel job: libc's resolver performs it in user space over ordinary sockets. Packets enter/leave through NIC drivers.

ss -tlnp · ip addr
curl -v https://example.com

Audience: infra, dev

Pro Tip: /proc and /sys are not "real" disk directories; they are kernel-generated views. Reading /proc/cpuinfo makes the kernel format live CPU data on the spot rather than return the contents of a static file.

User space vs kernel space

The CPU runs code at different privilege levels (rings on x86). User programs cannot read arbitrary memory, poke device registers, or disable interrupts, because any of those would compromise every process on the machine.

Aspect	User space	Kernel space
Privilege	Restricted (ring 3 on x86)	Full (ring 0)
Memory	Own virtual address space; segfault on illegal access	All physical RAM; maps into user space on demand
Crash impact	Single process dies	Kernel panic → whole system down
Examples	`bash`, `node`, `postgres`	Scheduler, ext4 driver, TCP stack
Entry mechanism	—	Syscall, interrupt, exception

Crossing from user mode into kernel mode and back has a cost, and switching between threads costs more. That is one reason high-throughput servers batch I/O and use epoll to watch many connections from one thread instead of dedicating a thread to each. Infra engineers tune around this boundary; developers feel it when profiling shows time spent in syscalls.

System call interface

A system call is a controlled gateway. The process loads arguments into registers (or a struct), executes a special instruction (syscall on x86-64), and the kernel validates permissions before doing the work.

How a syscall works (simplified)

1. Application calls a libc wrapper (e.g. write(fd, buf, n)).
2. libc places the syscall number and arguments in CPU registers.
3. CPU traps into kernel mode; kernel dispatches via the syscall table.
4. Kernel performs the operation and produces a result, or a negative error code on failure.
5. The result comes back in a register; on failure libc stores the error code in errno and returns -1 to your code.

Syscall	Purpose	Typical use
`open`	Open or create a file, return a file descriptor	Opening config files, logs, device nodes under /dev
`read`	Read bytes from a file descriptor into a buffer	Reading stdin, config, network sockets
`write`	Write bytes from a buffer to a file descriptor	stdout, log files, pipe to another process
`fork`	Clone the current process; child gets copy-on-write memory	Shell launching every external command
`execve`	Replace process image with a new program	Running `/bin/ls` after fork
`wait`	Block until a child exits; collect exit status	Shell waiting for pipeline to finish
`exit`	Terminate the process with an exit code	Clean shutdown; codes 0 = success, non-zero = error

Examples: open, read, write, fork, exec

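These C snippets call libc's thin syscall wrappers (open, read, write, fork, execve) directly instead of buffered stdio, so each call maps closely to one syscall. In production you would usually reach for fopen and fuller error handling, but interviews and debugging often start here.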

open, read, write — copy a file

Every cat file.txt reduces to opening a path, reading chunks, writing to fd 1 (stdout).

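#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUF_SIZE 4096

int main(void) {
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[BUF_SIZE];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* write() may accept fewer bytes than requested; loop until all are out */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(STDOUT_FILENO, buf + off, (size_t)(n - off));
            if (w < 0) { perror("write"); close(fd); return 1; }
            off += w;
        }
    }
    if (n < 0) { perror("read"); close(fd); return 1; }  /* read() itself failed */
    close(fd);
    return 0;
}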

# Same operation from the shell — watch the syscalls
strace -e trace=open,openat,read,write cat /etc/hostname 2>&1 | head -20

fork + execve — how the shell runs a command

When you type ls, bash forks itself, then the child execs /bin/ls. The parent calls wait unless the command is backgrounded.

  bash (PID 1000)                    kernel
       │                                  │
       │  fork()                          │
       ├─────────────────────────────────►│  creates child PID 1001 (copy of bash)
       │                                  │
       │  execve("/bin/ls", ...)          │  (in child 1001)
       │                                  ├──────► loads /bin/ls, replaces memory
       │  wait(&status)                   │
       │                                  │  ls: open(".") read write×N
       │◄─────────────────────────────────┤  child exits code 0
       │  prompt returns                  │

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;  /* the parent's environment, handed on to the child */

int main(void) {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {
        /* Child: replace image with ls */
        char *argv[] = { "/bin/ls", "-la", NULL };
        execve("/bin/ls", argv, environ);  /* a NULL envp is not portable */
        perror("execve");  /* only reached if exec fails */
        _exit(127);        /* leave without flushing stdio buffers copied from the parent */
    }

    /* Parent: wait for child */
    int status;
    if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return 1; }
    if (WIFEXITED(status))
        printf("ls exited with code %d\n", WEXITSTATUS(status));
    return 0;
}

# Trace fork, exec, and wait as a shell runs a simple command
# (tracing ls alone shows only its execve; the fork and wait happen in the shell)
strace -f -e trace=process sh -c 'ls /tmp; exit 0' 2>&1 | head -30

Warning: Forking in loops without wait creates zombie processes (exited children still in the process table). Production daemons and init systems reap zombies; runaway fork bombs can exhaust PIDs and take down a host.

Observing syscalls in practice

You do not need to write C to see architecture in action. These tools bridge theory and daily work.

strace: trace the syscalls of a command, or attach to a running process with -p <pid>. Use when a program hangs on I/O or fails with "Permission denied". (dev, infra)
ltrace: trace library calls (libc) instead of raw syscalls. (dev)
/proc/<pid>/maps: memory layout of a process. (infra)
perf trace: strace-like syscall tracing, typically with lower overhead; perf trace -s adds a per-syscall summary for performance tuning.

# Why is nginx slow to start? Count syscalls during a config test (-t parses the config without serving)
strace -c nginx -t 2>&1

# What files does a Python import touch?
strace -e trace=file python3 -c "import django" 2>&1 | tail -20

# Live view of a process's open files
ls -l /proc/$(pgrep -n nginx)/fd/

Pro Tip: strace -c prints a summary table (calls, errors, and time per syscall), which is perfect for interviews when asked "how would you debug a slow startup?" Start with the summary, then narrow with -e trace=openat.