Container Networking
Every container gets its own network namespace—a private view of interfaces, routes, and sockets on a shared kernel. Docker's network drivers decide how that namespace connects to the host and to other containers: virtual Ethernet pairs, Linux bridges, iptables NAT, embedded DNS, and overlay tunnels for Swarm. Master these primitives and service discovery, port conflicts, and "can't reach the database" tickets become predictable.
Docker network drivers
A network driver is Docker's plugin that wires a container's NET namespace to the outside world. The default on a single host is bridge—containers on a virtual LAN behind NAT. Multi-host Swarm uses overlay. Legacy apps needing a real MAC on the LAN use macvlan. Pick the wrong driver and you get broken DNS, double NAT, or containers that can't talk to each other.
Driver overview
| Driver | Scope | Container gets | When to use |
|---|---|---|---|
| bridge | Single host | Private IP on virtual bridge; NAT to outside | Default for standalone containers and Compose stacks on one machine |
| host | Single host | No separate network stack—uses host interfaces directly | Max throughput, low latency; monitoring agents, CNI-like patterns |
| none | Single host | Only loopback (lo) | Batch jobs with no network; security-sensitive isolation; custom networking later |
| overlay | Multi-host (Swarm) | IP on VXLAN tunnel spanning cluster nodes | Docker Swarm service discovery across nodes; legacy before Kubernetes CNI |
| macvlan | Single host | Real MAC address on physical VLAN; appears as LAN host | Legacy apps needing L2 adjacency; DHCP from upstream router; SIP/VoIP |
| ipvlan | Single host | Shares parent NIC MAC; distinct IP per container | MAC address exhaustion on switch; many containers on same subnet |
bridge — the default workhorse
When you docker run nginx without --network, Docker attaches the container to the default bridge (docker0). The better choice is a user-defined bridge created with docker network create and selected with --network. Either way, containers get private IPs (172.17.0.0/16 by default on docker0, or the subnet you configured), reach each other by IP, and egress to the internet through MASQUERADE rules on the host.
User-defined bridges add automatic DNS between containers on the same network—a critical reason to avoid the legacy default bridge for multi-container apps.
host — zero isolation, maximum performance
--network host removes NET namespace isolation. The container process binds directly to host ports and sees host interfaces. Port mapping (-p) is ignored—there is nothing to NAT. Useful when every microsecond of network overhead matters, but two containers cannot both bind port 8080.
none — network-off switch
--network none leaves only the loopback interface (127.0.0.1), with no route to other containers or the internet. It is a different tool from --network container:<name>, which makes a sidecar share another container's network stack instead of getting one of its own.
overlay — multi-host virtual LAN
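Swarm mode creates overlay networks backed by VXLAN. Each node runs an overlay bridge, and the control plane (encrypted by default) distributes endpoint records; data-plane traffic is encrypted only if the network is created with --opt encrypted. Services resolve each other by name via Swarm's embedded DNS. Nodes must be able to reach each other on UDP 4789 (VXLAN data), TCP/UDP 7946 (node gossip), and TCP 2377 (cluster management); set --advertise-addr when a node has more than one candidate IP.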
macvlan & ipvlan — containers as LAN citizens
macvlan creates a sub-interface on the parent NIC and assigns each container its own MAC—upstream switches see them as separate physical hosts. Modes: bridge (most common), vepa, private, passthru.
ipvlan shares the parent's MAC but gives each container a unique IP, which saves MAC table entries on crowded VLANs. Modes: l2 (same subnet and broadcast domain as the parent), l3 (routed, no broadcast), and l3s (l3 with netfilter connection tracking).
| Aspect | macvlan | ipvlan |
|---|---|---|
| MAC per container | Yes—unique MAC | No—shares parent MAC |
| Switch MAC table | One entry per container | One entry for many containers |
| Host ↔ container traffic | Requires macvlan-shim or hairpin (often blocked) | L3 mode routes cleanly |
| Typical use | Legacy L2 apps, DHCP clients | High-density IP on flat subnet |
# List drivers and networks
docker network ls
docker info --format '{{json .Plugins.Network}}' | jq
# User-defined bridge (recommended)
docker network create --driver bridge app-net
# Macvlan on eth0
docker network create -d macvlan \
--subnet=192.168.1.0/24 --gateway=192.168.1.1 \
-o parent=eth0 macvlan-net
# Ipvlan L2
docker network create -d ipvlan \
--subnet=10.0.0.0/24 -o parent=eth0.10 \
-o ipvlan_mode=l2 ipvlan-net
# Host and none
docker run -d --network host --name metrics-agent prom/node-exporter
docker run --rm --network none alpine ip addr
flowchart TD
START[Need container networking?]
MULTI{Multi-host\nSwarm?}
L2{Need real MAC\non LAN?}
PERF{Max performance\nno isolation OK?}
BRIDGE[bridge driver\nuser-defined network]
OVERLAY[overlay driver\nVXLAN cluster]
MACV[macvlan / ipvlan]
HOST[host driver]
NONE[none driver]
START --> MULTI
MULTI -->|yes| OVERLAY
MULTI -->|no| L2
L2 -->|yes| MACV
L2 -->|no| PERF
PERF -->|yes| HOST
PERF -->|no| BRIDGE
START -->|no network| NONE
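The decision flow above can be written down as a small function. This is a minimal Python sketch; the function name and its boolean parameters are hypothetical, chosen only to mirror the questions in the flowchart, and real driver selection usually weighs more factors than these four.

```python
def choose_driver(multi_host_swarm: bool = False,
                  needs_lan_identity: bool = False,
                  max_performance: bool = False,
                  no_network: bool = False) -> str:
    """Return a Docker network driver name following the flowchart's questions."""
    if no_network:
        return "none"
    if multi_host_swarm:
        return "overlay"
    if needs_lan_identity:
        # macvlan gives each container its own MAC; ipvlan shares the parent's MAC.
        return "macvlan/ipvlan"
    if max_performance:
        # Only acceptable when losing network namespace isolation is fine.
        return "host"
    # General-purpose default: a user-defined bridge network.
    return "bridge"


if __name__ == "__main__":
    print(choose_driver())                        # single host, nothing special
    print(choose_driver(multi_host_swarm=True))   # Swarm services across nodes
```

Encoding the order of the checks makes the priority explicit: the multi-host question is asked before the LAN and performance questions, exactly as in the diagram.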
bridge vs host: bridge gives isolation, DNS, and predictable port mapping at the cost of NAT and veth overhead (usually negligible). host eliminates NAT and iptables hairpins but removes network namespace boundaries—treat it like running the process directly on the host.
"What Docker network drivers do you know?" — Hit bridge (default, user-defined DNS), host (no isolation), none, overlay (Swarm VXLAN), macvlan/ipvlan (LAN-visible IPs). Bonus: mention Kubernetes uses CNI plugins (Calico, Cilium) instead of Docker drivers on nodes.
Bridge network internals
A bridge network is not magic—it is a Linux bridge (software switch), a bundle of veth pairs (virtual patch cables), and iptables rules that NAT egress and publish ingress ports. Peel back these layers and "container can't ping gateway" debugging takes minutes, not hours.
The veth pair: container's virtual NIC
When Docker attaches a container to a bridge network, it creates a veth pair—two virtual interfaces linked like a pipe. One end (vethXXXX) sits in the host network namespace and is enslaved to the bridge; the other end (eth0) appears inside the container namespace with an IP from the network's subnet.
   HOST namespace                        CONTAINER namespace
┌─────────────────────┐                ┌─────────────────────┐
│  docker0 / br-abc   │                │  eth0 172.18.0.3    │
│  (Linux bridge)     │                │  default gw .1      │
│         │           │                │                     │
│  vethA ━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━┿━━ vethB (peer)      │
│  172.18.0.1         │   veth pair    │                     │
└─────────┬───────────┘                └─────────────────────┘
          │ NAT (iptables MASQUERADE)
          ▼
  physical NIC (eth0) → internet
docker0 — the legacy default bridge
On daemon start, Docker creates docker0—a bridge usually at 172.17.0.1/16. Containers started without --network land here. Problems with the default bridge:
- No automatic DNS — containers must use IPs or --link (deprecated)
- All containers share one flat network — no logical segmentation
- ICC on by default: inter-container communication can only be turned off daemon-wide ("icc": false), not for a subset of containers
- Inconsistent with Compose — Compose creates user-defined networks by default
User-defined bridge vs default bridge
| Feature | default bridge (docker0) | user-defined bridge |
|---|---|---|
| Creation | Automatic at daemon start | docker network create mynet |
| Container DNS | No—use IPs or --link | Yes—resolve by name on same network |
| Network isolation | Single shared LAN | Separate bridges per network |
| Configurable subnet | Fixed unless daemon.json | --subnet, --gateway, --ip-range |
| ICC control | Global daemon setting ("icc": false) | Per network via -o com.docker.network.bridge.enable_icc=false, or firewall policies |
| Compose default | No | Yes—project-scoped network |
iptables: the DOCKER chain
Docker programs iptables (or nftables via iptables-nft) in the nat and filter tables. Key chains:
- DOCKER — port publish DNAT rules (host port → container IP:port)
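- DOCKER-ISOLATION-STAGE-1 / DOCKER-ISOLATION-STAGE-2: block traffic between different bridge networks (a single DOCKER-ISOLATION chain in older releases; exact chain names vary by Docker version)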
- MASQUERADE in POSTROUTING — SNAT for container egress via host IP
When you pass -p 8080:80, Docker inserts a DNAT rule: traffic arriving on host port 8080 is rewritten to the container's address, for example 172.18.0.3:80. Setting "iptables": false in daemon.json stops Docker from managing these rules, which breaks port publishing and outbound NAT unless an external firewall supplies equivalent rules.
$ docker network create --subnet 172.30.0.0/24 app-net
a1b2c3d4e5f6...
$ docker run -d --name web --network app-net -p 8080:80 nginx:alpine
f7e8d9c0b1a2...
$ brctl show br-a1b2c3d4e5f6
bridge name       bridge id           STP enabled   interfaces
br-a1b2c3d4e5f6   8000.0242ac110002   no            veth1a2b3c4d
$ docker exec web ip route
default via 172.30.0.1 dev eth0
172.30.0.0/24 dev eth0 scope link src 172.30.0.2
$ sudo iptables -t nat -L DOCKER -n -v | head
Chain DOCKER (2 references)
 pkts bytes target  prot opt in        out  source      destination
    0     0 DNAT    tcp  --  !br-a1b2  *    0.0.0.0/0   0.0.0.0/0    tcp dpt:8080 to:172.30.0.2:80
Network lifecycle: create, connect, disconnect
Networks persist independently of containers. A container can join multiple networks—each attachment adds another veth pair and eth1, eth2, etc.
# Create with options
docker network create \
--driver bridge \
--subnet 10.10.0.0/24 \
--gateway 10.10.0.1 \
--ip-range 10.10.0.128/25 \
--label env=staging \
backend
# Connect running container (adds second interface)
docker network connect frontend myapp
# Disconnect (removes interface from that network)
docker network disconnect frontend myapp
# Inspect endpoints, IPs, MACs
docker network inspect backend --format '{{json .Containers}}' | jq
# Prune unused networks
docker network prune
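Before running a docker network create like the one above, it can help to sanity-check the address plan. The following Python sketch uses only the standard ipaddress module; the function name check_network_plan and the keys of the returned dictionary are made up for illustration, and the checks reflect general subnet arithmetic rather than Docker's exact allocator behavior.

```python
import ipaddress


def check_network_plan(subnet: str, gateway: str, ip_range: str) -> dict:
    """Validate a --subnet / --gateway / --ip-range combination and summarize it."""
    net = ipaddress.ip_network(subnet)
    gw = ipaddress.ip_address(gateway)
    pool = ipaddress.ip_network(ip_range)

    if gw not in net:
        raise ValueError(f"gateway {gw} is outside subnet {net}")
    if not pool.subnet_of(net):
        raise ValueError(f"ip-range {pool} is not contained in subnet {net}")

    return {
        # Host addresses in the subnet, excluding network and broadcast addresses.
        "usable_hosts": net.num_addresses - 2,
        # Size and bounds of the sub-range used for dynamic allocation.
        "pool_size": pool.num_addresses,
        "pool_first": str(pool[0]),
        "pool_last": str(pool[-1]),
        # True would mean the gateway address sits inside the dynamic pool.
        "gateway_in_pool": gw in pool,
    }


if __name__ == "__main__":
    print(check_network_plan("10.10.0.0/24", "10.10.0.1", "10.10.0.128/25"))
```

With the values from the command above, the dynamic pool is the upper half of the /24, which leaves the lower half free for addresses you assign by hand with --ip.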
sequenceDiagram
participant C as Container eth0
participant V as veth pair
participant B as Linux bridge
participant I as iptables NAT
participant H as Host NIC
participant EXT as External host
C->>V: egress to 8.8.8.8
V->>B: frame to gateway .1
B->>I: routed to host stack
I->>I: MASQUERADE SNAT
I->>H: src = host IP
H->>EXT: internet
EXT->>H: new connection to host:8080
H->>I: DNAT DOCKER chain
I->>B: forward to 172.30.0.2:80
B->>V: deliver to container
User-defined bridges are named br-<network-id-prefix> on the host. The bridge itself holds the gateway IP (e.g. 172.30.0.1). Container eth0 default route points there—it's the bridge interface acting as L3 gateway via the host kernel's IP forwarding stack.
Still using the default bridge? Two containers on docker0 cannot resolve each other by name—only by IP. Compose projects silently create user-defined networks; mixing docker run on default bridge with Compose services causes "connection refused" mysteries. Always pass --network <name> or use Compose.
Debug connectivity with docker run --rm -it --network container:<target> nicolaka/netshoot—you inherit the target's network stack and get tcpdump, dig, and curl without installing tools in production images.
Container DNS
On user-defined networks, curl http://api:3000 just works—Docker runs an embedded DNS server at 127.0.0.11 inside every container. It resolves container names, service aliases, and network-scoped names before forwarding external queries upstream. Misunderstand this resolver and you'll chase ghosts in /etc/resolv.conf.
The embedded DNS resolver (127.0.0.11)
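Docker provides a lightweight DNS proxy to each container on a user-defined network, reachable inside the container's network namespace at 127.0.0.11:53. (The daemon's listener actually sits on a high port; NAT rules inside the namespace redirect port 53 to it.) /etc/resolv.conf points only at this address unless overridden. The proxy: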
- Answers queries for names on Docker networks the container belongs to
- Forwards everything else to upstream resolvers (host's resolv.conf at daemon start, or custom)
- Supports A/AAAA records for container names, service names, and network aliases
# Inside a container on a user-defined network
cat /etc/resolv.conf
# nameserver 127.0.0.11
# options ndots:0
# Resolve another container by name
docker exec app getent hosts db
docker exec app dig +short api @127.0.0.11
# Daemon-wide upstream DNS (daemon.json)
# { "dns": ["8.8.8.8", "1.1.1.1"] }
# Per-container override
docker run --dns 10.0.0.2 --dns-search corp.internal myapp
Name resolution rules
| Name queried | Resolved when | Returns |
|---|---|---|
| db | Container named db on same network | Container IP on that network |
| api | Compose service name or container name | Container IP (Compose); service virtual IP (Swarm) |
| db.backend | Network-scoped name (network backend) | IP on backend network |
| google.com | Always (if upstream reachable) | Forwarded to upstream DNS |
| tasks.<service> | Swarm mode only | All task IPs (DNS round-robin) |
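The lookup order in the table can be illustrated with a toy resolver. This is a Python sketch under stated assumptions: the Records structure, the resolve function, and the sample IPs are all hypothetical, and Docker's real resolver also handles Swarm VIPs, tasks.<service>, and multiple records per name.

```python
from typing import Dict, List, Optional

# Hypothetical view of what the embedded resolver knows:
# network name -> {container name or alias -> IP on that network}
Records = Dict[str, Dict[str, str]]


def resolve(name: str, member_of: List[str], records: Records) -> Optional[str]:
    """Return an IP for a Docker-scoped name, or None to mean 'forward upstream'."""
    # 1. Bare name or alias on any network the querying container is attached to.
    for network in member_of:
        ip = records.get(network, {}).get(name)
        if ip is not None:
            return ip
    # 2. Network-qualified form: <name>.<network>
    if "." in name:
        host, _, network = name.rpartition(".")
        if network in member_of:
            return records.get(network, {}).get(host)
    # 3. Not a Docker name: the real resolver would forward to upstream DNS.
    return None


if __name__ == "__main__":
    records = {"backend": {"db": "10.10.0.3"}, "frontend": {"web": "10.20.0.2"}}
    print(resolve("db", ["backend"], records))
    print(resolve("db", ["frontend"], records))
```

The key property the sketch captures is scoping: a name only resolves for containers that share a network with its owner, which is why a container on frontend alone gets no answer for db.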
Network aliases
A container can answer to multiple names on a network via aliases—useful when several logical services share one container, or for blue/green cutover without renaming containers.
# Alias at run time
docker run -d --name postgres --network app-net \
--network-alias db --network-alias database postgres:16
# Alias when connecting to second network
docker network connect --alias cache backend redis
# Compose equivalent
# services:
#   redis:
#     networks:
#       backend:
#         aliases: [cache, session-store]
resolv.conf behavior
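Docker manages /etc/resolv.conf unless you bind-mount your own (not recommended, since that bypasses the embedded DNS). On user-defined networks, --dns changes only the upstream servers the embedded resolver forwards to. Key entries:
- options ndots:0 (added by Docker): bare names like api resolve immediately without search-domain expansion
- search / domain: taken from the host's resolv.conf or from --dns-search
- other options such as rotate: not added by Docker itself; they appear only when inherited from the host's resolv.conf or set with --dns-opt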
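To check what a container actually ended up with, you can parse the file rather than eyeball it. A minimal Python sketch follows; parse_resolv_conf and the uses_embedded_dns key are illustrative names, and the parser handles only the nameserver, search, and options keywords.

```python
def parse_resolv_conf(text: str) -> dict:
    """Extract nameservers, search domains, and options from resolv.conf text."""
    nameservers, search, options = [], [], {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
        elif keyword == "search":
            search = values
        elif keyword == "options":
            for opt in values:
                key, _, val = opt.partition(":")
                options[key] = val or True  # flags like 'rotate' carry no value
    return {
        "nameservers": nameservers,
        "search": search,
        "options": options,
        # Heuristic: the embedded resolver is the only configured nameserver.
        "uses_embedded_dns": nameservers == ["127.0.0.11"],
    }


if __name__ == "__main__":
    print(parse_resolv_conf("nameserver 127.0.0.11\noptions ndots:0\n"))
```

Feeding it the output of docker exec <container> cat /etc/resolv.conf gives a quick yes/no on whether name-based discovery can work in that container.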
host.docker.internal and --add-host
Containers often need to reach services on the host machine (local dev database, IDE debugger). host.docker.internal is a magic hostname:
- Docker Desktop (Mac/Windows) — injected automatically; resolves to host gateway IP
- Linux — not automatic; add --add-host=host.docker.internal:host-gateway (Docker 20.10+)
# Linux: reach host from container
docker run --add-host=host.docker.internal:host-gateway myapp
# curl http://host.docker.internal:5432
# Custom static host entries
docker run --add-host=legacy.corp:10.50.0.99 myapp
# Compose
# extra_hosts:
#   - "host.docker.internal:host-gateway"
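Each --add-host entry ends up as a line in the container's /etc/hosts. The Python sketch below shows that mapping, with the special value host-gateway replaced by a gateway address; the function name hosts_entries and the gateway IP passed in are assumptions for illustration, since the real address depends on the daemon's configuration.

```python
from typing import List


def hosts_entries(add_hosts: List[str], gateway_ip: str) -> List[str]:
    """Turn 'name:target' specs into /etc/hosts style lines."""
    lines = []
    for spec in add_hosts:
        # Split on the first colon only, so IPv6 targets keep their colons.
        name, sep, target = spec.partition(":")
        if not sep or not name or not target:
            raise ValueError(f"expected name:ip, got {spec!r}")
        ip = gateway_ip if target == "host-gateway" else target
        lines.append(f"{ip}\t{name}")
    return lines


if __name__ == "__main__":
    specs = ["host.docker.internal:host-gateway", "legacy.corp:10.50.0.99"]
    for line in hosts_entries(specs, gateway_ip="172.17.0.1"):
        print(line)
```

Because these are static /etc/hosts entries, they are resolved before any DNS query is made and do not change if the target moves.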
Spring Boot devs hitting localhost:5432 from inside a container are connecting to the container's own loopback, not the host. Fix: point the app at host.docker.internal and make sure the host's Postgres listens on an address the container can reach (not only 127.0.0.1), or run Postgres in Compose on the same user-defined network and connect to the service name db.
Mounting a custom resolv.conf disables Docker's embedded DNS—you lose name resolution for other containers. If you must override upstream servers, use --dns flags instead of bind-mounting the file.
"How do containers find each other?" — On user-defined networks, embedded DNS at 127.0.0.11 resolves container/service names to IPs. Not available on default bridge. Kubernetes equivalent: CoreDNS via cluster DNS Service (10.96.0.10).
Port publishing
Containers on bridge networks live on private IPs unreachable from outside the host. Port publishing (-p) punches holes via iptables DNAT—mapping a host port to a container port. Bind carelessly to 0.0.0.0 and you expose services to the entire network; bind to 127.0.0.1 and only local processes can reach them.
Publish syntax
| Flag | Meaning | Example or typical use |
|---|---|---|
| -p 8080:80 | Host port 8080 → container TCP 80 (all interfaces) | docker run -p 8080:80 nginx |
| -p 127.0.0.1:8080:80 | Loopback only—not reachable from other machines | Dev databases, admin UIs |
| -p 8080:80/udp | UDP mapping (DNS, QUIC, gaming) | -p 53:53/udp |
| -p 8080:80/tcp -p 8080:80/udp | Same port, both protocols | HTTP/3 gateways |
| -p 80 (short) | Random host port → container 80 | CI ephemeral services |
| -P | Publish all EXPOSEd ports to random host ports | Quick local testing only |
| --publish-all | Alias for -P | Same as -P |
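The common forms in the table can be decoded mechanically. Here is a minimal Python sketch; the Publish dataclass and parse_publish function are hypothetical helpers, and the parser deliberately ignores port ranges and bracketed IPv6 addresses, which the real CLI also accepts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Publish:
    host_ip: str              # "0.0.0.0" stands for "all interfaces" in this sketch
    host_port: Optional[int]  # None means Docker picks a random host port
    container_port: int
    protocol: str


def parse_publish(spec: str) -> Publish:
    """Parse [host_ip:][host_port:]container_port[/protocol]."""
    ports, _, proto = spec.partition("/")
    parts = ports.split(":")
    if len(parts) == 1:
        host_ip, host_port, container_port = "0.0.0.0", None, parts[0]
    elif len(parts) == 2:
        host_ip, host_port, container_port = "0.0.0.0", parts[0], parts[1]
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return Publish(
        host_ip=host_ip,
        host_port=int(host_port) if host_port else None,
        container_port=int(container_port),
        protocol=proto or "tcp",
    )


if __name__ == "__main__":
    for spec in ["8080:80", "127.0.0.1:8080:80", "8080:80/udp", "80"]:
        print(spec, "->", parse_publish(spec))
```

Seeing the defaults spelled out makes the risk visible: any spec without an explicit host IP falls back to the wildcard address.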
EXPOSE vs -p
EXPOSE 8080 in a Dockerfile is documentation—it does not publish the port. It informs -P which ports to map and helps humans reading the image. Actual exposure requires -p at run time or ports: in Compose.
# Explicit publish
docker run -d -p 127.0.0.1:3000:3000 --name api myapi:v1
# Random host port for container 5432
docker run -d -p 5432 postgres:16
docker port <container> # shows 5432/tcp -> 0.0.0.0:49153
# Publish all EXPOSEd ports
docker run -d -P myimage:with-expose
# Compose
# ports:
#   - "127.0.0.1:8080:80"
#   - "8443:443"
iptables NAT path
Published ports traverse the nat table for address rewriting and the filter table for forwarding:
- PREROUTING / DOCKER chain — incoming packet to host:8080 gets DNAT to 172.18.0.2:80
- FORWARD chain — packet routed from host bridge to container veth (requires ip_forward=1)
- POSTROUTING MASQUERADE — return traffic SNAT'd if needed for hairpin NAT
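The address rewriting in these steps can be modeled in a few lines. This Python sketch is a simplification, not how netfilter is implemented: the Packet type, the HOST_IP and PUBLISHED values, and both function names are invented for illustration, and real MASQUERADE may also rewrite the source port and relies on connection tracking for replies.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Illustrative values only; nothing here is read from a real host.
HOST_IP = "192.0.2.10"
BRIDGE_SUBNET = ipaddress.ip_network("172.18.0.0/16")
PUBLISHED = {8080: ("172.18.0.2", 80)}  # host port -> (container IP, container port)


def dnat(pkt: Packet) -> Packet:
    """PREROUTING/DOCKER step: rewrite the destination of traffic to a published port."""
    target = PUBLISHED.get(pkt.dst_port)
    if pkt.dst_ip == HOST_IP and target is not None:
        return replace(pkt, dst_ip=target[0], dst_port=target[1])
    return pkt


def masquerade(pkt: Packet) -> Packet:
    """POSTROUTING step: container egress leaves with the host's source address."""
    src_in_bridge = ipaddress.ip_address(pkt.src_ip) in BRIDGE_SUBNET
    dst_in_bridge = ipaddress.ip_address(pkt.dst_ip) in BRIDGE_SUBNET
    if src_in_bridge and not dst_in_bridge:
        return replace(pkt, src_ip=HOST_IP)
    return pkt


if __name__ == "__main__":
    print(dnat(Packet("198.51.100.7", 51000, HOST_IP, 8080)))
    print(masquerade(Packet("172.18.0.2", 40000, "8.8.8.8", 53)))
```

Inbound traffic has its destination changed and outbound traffic has its source changed; keeping those two directions separate is most of what debugging a publish problem requires.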
$ docker run -d -p 127.0.0.1:9090:9090 --name prom prom/prometheus
abc123...
$ docker port prom
9090/tcp -> 127.0.0.1:9090
$ ss -tlnp | grep 9090
LISTEN 0 4096 127.0.0.1:9090 0.0.0.0:* users:(("docker-proxy",pid=1234))
$ curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:9090/-/healthy
200
$ # From another machine: connection refused (loopback bind)
docker-proxy and userland binding
By default Docker starts one docker-proxy process per published port; it listens on the host socket and forwards to the container. With "userland-proxy": false in daemon.json, Docker relies on iptables alone, including hairpin NAT for the case where a container calls a sibling's host-published port. That means fewer processes, but ports published on loopback may need sysctl net.ipv4.conf.all.route_localnet=1 to stay reachable.
host.docker.internal: Desktop vs Linux
| Aspect | Docker Desktop (Mac/Win) | Docker Engine (Linux) |
|---|---|---|
| host.docker.internal | Auto-injected in all containers | Requires --add-host=host.docker.internal:host-gateway |
| Resolves to | VM gateway IP (Mac) or host (Win) | Host bridge gateway IP (172.17.0.1 or routing table) |
| Reach host service on localhost | Works if service binds 0.0.0.0 or gateway IP | Host must listen on bridge IP or 0.0.0.0, not only 127.0.0.1 |
| Published port from container | Use host.docker.internal:<port> | Same, or the bridge gateway IP with the published port |
flowchart LR
EXT[External client\n203.0.113.5]
HOST[Host eth0\n:8080]
PROXY[docker-proxy\nor iptables DNAT]
BR[bridge br-abc]
CNT[Container\n172.18.0.2:80]
EXT -->|TCP :8080| HOST
HOST --> PROXY
PROXY -->|DNAT| BR
BR --> CNT
Never publish a datastore such as Redis (-p 6379:6379) or Postgres (-p 5432:5432) in production without authentication and firewall rules. The default bind address 0.0.0.0 exposes the service on every interface, including public cloud NICs. Prefer internal user-defined networks; expose only through a reverse proxy or API gateway with TLS.
For local dev, bind sensitive services to loopback: 127.0.0.1:5432:5432. Other containers on the same Docker network reach Postgres via db:5432 on the internal IP—no host publish needed for inter-container traffic.
Published ports vs reverse proxy: direct -p is simple but one-port-per-service, no TLS termination, no path routing. Traefik/Caddy/nginx in front of unpublished containers on an internal network scales better and keeps attack surface smaller.
Network security
Docker's default bridge is a flat LAN—any container can ping any other on the same network. Production requires segmentation: frontends talk to APIs, APIs talk to databases, databases talk to nothing. Combine network tiers, ICC controls, tight port binding, and host firewalls to shrink lateral movement after a compromise.
ICC — inter-container communication
ICC controls whether containers on the same bridge can reach each other directly. On the legacy default bridge, ICC is a global daemon setting ("icc": false in daemon.json). On user-defined bridges it can be turned off per network with the bridge driver option com.docker.network.bridge.enable_icc=false; modern best practice is separate networks instead of one flat LAN with ICC disabled.
| Control | Mechanism | Effect |
|---|---|---|
| Separate networks | frontend + backend networks; API joins both | DB isolated from frontend—only API can reach it |
| ICC off (legacy) | docker network create -o com.docker.network.bridge.enable_icc=false <name> | Containers on same bridge cannot talk directly |
| Internal network | docker network create --internal backend | No external route—containers cannot reach internet |
| Host firewall | iptables/nftables, ufw, firewalld | Restrict published ports by source IP/CIDR |
| Compose networks | Network definitions per tier in docker-compose.yml | Declarative segmentation in version control |
# docker-compose.yml: three-tier segmentation
services:
  web:
    image: nginx:alpine
    networks: [frontend]
    ports:
      - "127.0.0.1:8080:80"
  api:
    image: myapi:latest
    networks: [frontend, backend]
  db:
    image: postgres:16
    networks: [backend]  # not on frontend, so web cannot reach db directly
networks:
  frontend:
  backend:
    internal: false  # set true if db must not reach internet
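The isolation this Compose file gives you follows from one rule: two services can talk directly only if they share a network. A small Python sketch of that rule is below; the SERVICES mapping restates the file above, while can_reach and reachable_pairs are hypothetical helper names, and the model ignores published ports and host firewall rules.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Service -> networks it is attached to, as declared in the Compose file.
SERVICES: Dict[str, Set[str]] = {
    "web": {"frontend"},
    "api": {"frontend", "backend"},
    "db": {"backend"},
}


def can_reach(a: str, b: str, services: Dict[str, Set[str]] = SERVICES) -> bool:
    """Two services can talk directly only if they share at least one network."""
    return bool(services[a] & services[b])


def reachable_pairs(services: Dict[str, Set[str]] = SERVICES) -> List[Tuple[str, str]]:
    """List every pair of services with a shared network, in sorted order."""
    return [(a, b) for a, b in combinations(sorted(services), 2)
            if can_reach(a, b, services)]


if __name__ == "__main__":
    print(reachable_pairs())
    print("web -> db allowed?", can_reach("web", "db"))
```

Running a check like this against a larger Compose file is a cheap way to confirm that no frontend-facing service has slipped onto the data network.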
Network segmentation tiers
Architects commonly model Docker/Compose networks as security zones:
| Tier | Contains | Ingress | Egress |
|---|---|---|---|
| DMZ / edge | Reverse proxy, WAF, TLS terminator | Public internet (443) | To app tier only |
| Application | API servers, workers, BFF | From edge tier | To data tier + external APIs |
| Data | Postgres, Redis, message brokers | From app tier only | None or restricted (--internal) |
| Management | Admin UIs, debug exporters | VPN / bastion / loopback | Read-only metrics paths |
flowchart TB
subgraph DMZ[DMZ network]
RP[Reverse proxy :443]
end
subgraph APP[Application network]
API[API service]
WORK[Worker]
end
subgraph DATA[Data network — internal]
PG[(PostgreSQL)]
RD[(Redis)]
end
INTERNET((Internet)) --> RP
RP --> API
API --> PG
API --> RD
WORK --> RD
WORK --> PG
Published port binding security
Risk matrix for common publish patterns:
| Binding | Risk level | Appropriate for |
|---|---|---|
| 0.0.0.0:80:80 | High—world reachable | Public web behind WAF; cloud SG must restrict |
| 127.0.0.1:8080:80 | Low—local only | Dev laptops, sidecar-to-sidecar via host loopback |
| 10.0.1.5:3306:3306 | Medium—VPC/internal NIC only | Private subnet admin access |
| No publish (internal network) | Lowest | DB, cache, message queues—container DNS only |
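The matrix above can be approximated by classifying the bind address. This Python sketch uses the standard ipaddress module; the function name publish_risk and its labels simply echo the table, and the result is a rough heuristic that says nothing about firewalls or security groups in front of the host.

```python
import ipaddress
from typing import Optional


def publish_risk(host_ip: Optional[str]) -> str:
    """Rough risk label for a published port, based only on its bind address."""
    if host_ip is None:
        return "lowest"          # nothing published; reachable via container DNS only
    addr = ipaddress.ip_address(host_ip)
    if addr.is_unspecified:      # 0.0.0.0 or :: binds every interface
        return "high"
    if addr.is_loopback:         # 127.0.0.0/8 or ::1
        return "low"
    if addr.is_private:          # RFC 1918 and similar internal ranges
        return "medium"
    return "high"                # a specific public address


if __name__ == "__main__":
    for ip in ["0.0.0.0", "127.0.0.1", "10.0.1.5", None]:
        print(ip, "->", publish_risk(ip))
```

The order of the checks matters because Python's is_private also returns True for loopback and unspecified addresses, so those cases are tested first.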
Defense-in-depth checklist
- Run databases and caches on internal networks with no ports: publish
- Bind admin and metrics ports to loopback or management VLAN IPs
- Layer host firewall rules under Docker's iptables (beware rule ordering)
- Disable icc on default bridge or stop using default bridge entirely
- Never expose the Docker socket (/var/run/docker.sock) into containers—equivalent to root
- In Kubernetes, graduate to NetworkPolicies (Calico/Cilium) for L3/L4 policy
$ docker exec web ping -c1 db
ping: bad address 'db'    # web not on backend network: expected
$ docker exec api ping -c1 db
PING db (10.10.0.3): 56 data bytes
64 bytes from 10.10.0.3: seq=0 ttl=64 time=0.08 ms
$ nmap -p 5432 <cloud-public-ip>
5432/tcp closed           # no publish rule: expected
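Docker inserts its iptables rules ahead of those managed by many host firewall tools, and traffic to published ports is DNAT'd and then forwarded rather than delivered to the host's INPUT chain. As a result, ufw on Ubuntu often fails to block a published port even when a deny rule exists. Use ufw-docker or put your rules in the DOCKER-USER chain, which Docker documents as the supported hook for custom filter policy.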
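Shodan regularly indexes exposed dockerd on port 2375 and Redis on 6379 from cloud VMs. Automated scanners hit within minutes of a wide -p bind. The CIS Docker Benchmark covers these controls in its daemon-configuration and container-runtime sections (restricting traffic between containers on the default bridge, binding published ports to a specific host interface); section numbers vary by benchmark version. Audit with docker network inspect and cloud security groups together.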
"How do you isolate a database container?" — Put it on a backend-only user-defined network, no published ports, app container dual-homed to frontend+backend. Mention --internal for no-internet data tier. K8s follow-up: NetworkPolicy deny-all + allow-from-app-namespace.