Infrastructure & IaC explained

Applications need more than containers—they need networks, DNS, databases, IAM roles, and clusters. Infrastructure as Code (IaC) describes that footprint in version-controlled files so every environment is reproducible, reviewable, and aligned with how you already ship apps via DevSec Core and Kubernetes.

Helpful background: basic cloud vocabulary (VPC, subnet, security group) and comfort with git PR reviews.

After reading, you should be able to:

Contrast click-ops with declarative IaC.
Explain plan, apply, and why state exists.
Name core building blocks (provider, resource, variable, output, module).
See how IaC provisions the platform your deploy pipelines target.
Avoid classic mistakes (local-only state, secrets in git, unmanaged drift).

Step 1 — What “infrastructure” means on a cloud team

Layer	Examples	Often owned by
Foundation	VPC, subnets, routing, NAT, VPN	Platform / infra engineers
Compute	EKS/GKE/AKS cluster, node groups, autoscaling	Platform + SRE
Data	RDS, S3, Redis, message queues	Infra + app teams
Identity	IAM roles, OIDC for CI, service accounts	Security + platform
App runtime	Kubernetes Deployments, Helm releases	App teams (GitOps / CI)

IaC usually owns the first four layers (Foundation through Identity); Kubernetes manifests or Helm charts often live in adjacent repos but follow the same review-and-apply discipline.

Step 2 — Manual changes vs declarative IaC

Click-ops in a cloud console is fast for experiments and dangerous for production: no diff review, no audit trail tied to git, and “what is actually running?” becomes a mystery after staff turnover.

Declarative IaC says: “Here is the desired end state.” The tool figures out create/update/delete steps.

Figure: IaC flow from HCL desired state through plan and apply to cloud resources and the state file. Terraform-style workflow: plan shows the delta, apply reconciles, and state remembers cloud IDs for the next run.

For beginners: Think of IaC like a recipe checked into git—anyone can reproduce the same cake (VPC + cluster) in staging and prod with different ingredient sizes (variables).

For experienced readers: Declarative reconciliation differs from imperative scripts (AWS CLI loops); tools compute dependency graphs and parallelize safe creates.
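
To make the dependency-graph point concrete, here is a minimal sketch in Terraform syntax. The resource names (`main`, `app`, `artifacts`), the bucket name, and the CIDR ranges are illustrative assumptions. Because the subnet refers to the VPC's `id`, the tool can infer the ordering on its own; nothing in the file says "create the VPC first".

```hcl
# Illustrative only: names and CIDR ranges are assumptions.
resource "aws_vpc" "main" {
  cidr_block = "10.1.0.0/16"
}

resource "aws_subnet" "app" {
  # Referencing aws_vpc.main.id creates an implicit dependency edge:
  # the VPC is created before the subnet and destroyed after it.
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.1.1.0/24"
}

# This bucket references neither the VPC nor the subnet,
# so it can be created in parallel with them.
resource "aws_s3_bucket" "artifacts" {
  bucket = "myapp-example-artifacts" # hypothetical name
}
```

An imperative script would have to encode that ordering by hand and repeat the logic for updates and deletes.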

Step 3 — Terraform vocabulary (used throughout this track)

We use Terraform syntax in examples; OpenTofu is an open-source fork with compatible workflows. Pulumi and CDK use real programming languages but share the same ideas (desired state, diff, deploy).

Concept	Role
Provider	Plugin that talks to AWS, Azure, GCP, GitHub, etc.
Resource	One infrastructure object (`aws_vpc`, `aws_eks_cluster`)
Data source	Read-only lookup (existing VPC ID, AMI ID)
Variable	Input per environment (CIDR, instance size)
Output	Values for other stacks or humans (cluster endpoint)
Module	Reusable package of resources (vpc module, eks module)
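
Two of the concepts in the table, data sources and modules, do not appear in the minimal configuration in Step 4. The sketch below shows roughly what they look like. The tag value, the module path, and the module's input names (`vpc_id`, `cluster_name`) are assumptions about a setup you would define yourself.

```hcl
# Data source: read-only lookup of a VPC that already exists.
# The tag value "shared-network" is a hypothetical example.
data "aws_vpc" "shared" {
  tags = {
    Name = "shared-network"
  }
}

# Module: a reusable package of resources, called with inputs.
# The local path and the input names are assumptions about a
# module you would write yourself.
module "eks" {
  source = "./modules/eks"

  vpc_id       = data.aws_vpc.shared.id
  cluster_name = "myapp-staging"
}
```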

Step 4 — Minimal configuration example

# versions.tf
terraform {
  required_version = ">= 1.6.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# variables.tf
variable "environment" {
  type        = string
  description = "staging or production"
}

# main.tf
resource "aws_s3_bucket" "logs" {
  bucket = "myapp-${var.environment}-logs"
  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

# outputs.tf
output "logs_bucket_name" {
  value = aws_s3_bucket.logs.id
}
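
The description of `environment` says "staging or production", but nothing in the example enforces it. One possible way to tighten this, shown as a sketch that would replace the declaration in variables.tf, is a `validation` block:

```hcl
# variables.tf (variant with validation; replaces the declaration above)
variable "environment" {
  type        = string
  description = "staging or production"

  validation {
    # contains() returns true only if the value is in the allowed list.
    condition     = contains(["staging", "production"], var.environment)
    error_message = "The environment must be \"staging\" or \"production\"."
  }
}
```

With this in place, a typo such as `-var="environment=stagin"` fails at plan time instead of creating a stray bucket.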

Step 5 — plan and apply (the daily loop)

terraform init                 # download providers, configure backend
terraform fmt -check           # fail CI if files are not canonically formatted
terraform validate             # static checks (syntax, types, references)
terraform plan -var="environment=staging" -out=tfplan
# review: + create, ~ update, - destroy
terraform apply tfplan         # apply exactly the plan that was reviewed

The plan output is your PR review artifact: paste it into the ticket or post it as a CI comment. Never apply unreviewed changes to production.

+   create something new
~   update in place
-   destroy (watch for data loss)
-/+ destroy then create, because a changed argument forces replacement (new physical ID)

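# PR job - plan only, no apply
# The backend must be initialized so plan can compare against remote state.
- run: terraform init -input=false
- run: terraform plan -input=false -var="environment=staging" -out=tfplan
# Plan files can contain sensitive values; restrict who can download the artifact.
- uses: actions/upload-artifact@v4
  with:
    name: tfplan
    path: tfplan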

Apply runs on merge to main with remote state and OIDC credentials.
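
The OIDC credentials mentioned above are themselves infrastructure. The sketch below shows one way the IAM side might look on AWS. It assumes the GitHub OIDC provider is already registered in the account, and the repository name `example-org/infra-live` and role name `terraform-apply` are hypothetical. Permission policies for the role are omitted.

```hcl
# Assumes the GitHub OIDC provider already exists in this AWS account.
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

data "aws_iam_policy_document" "ci_assume" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only workflows running on main of this (hypothetical) repository
    # may assume the role.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/infra-live:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "terraform_apply" {
  name               = "terraform-apply" # hypothetical role name
  assume_role_policy = data.aws_iam_policy_document.ci_assume.json
}
```

If the apply job runs inside a GitHub Environment, the `sub` claim takes the form `repo:ORG/REPO:environment:NAME`, so the second condition would need to match that instead.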

Step 6 — State: the tool’s memory

Terraform stores a mapping from resource address in code to real cloud ID (e.g. aws_s3_bucket.logs → myapp-staging-logs). That JSON is the state file.

Local state (terraform.tfstate): OK for solo learning; never for teams.
Remote state: S3 with a DynamoDB lock table, HCP Terraform (formerly Terraform Cloud), or a GCS bucket; required for collaboration.
Locking: prevents two concurrent applies from corrupting state.

State can contain sensitive values—treat remote buckets as confidential, enable encryption and least-privilege IAM.
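
As a rough sketch of the remote-state option, an S3 backend with locking and encryption might be configured like this. The bucket name, region, and lock table name are assumptions. The DynamoDB table needs a string partition key named `LockID`.

```hcl
# backend.tf (sketch; Terraform merges this with the terraform block in versions.tf)
terraform {
  backend "s3" {
    bucket         = "example-org-terraform-state" # hypothetical bucket
    key            = "staging/terraform.tfstate"
    region         = "eu-west-1"                   # assumption
    dynamodb_table = "terraform-locks"             # hypothetical lock table
    encrypt        = true                          # server-side encryption of the state object
  }
}
```

Backend blocks cannot use variables, which is why the values are literals here. Recent Terraform releases also support lock files stored in S3 itself (`use_lockfile`); check the backend documentation for the version you run.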

Step 7 — Drift: when reality diverges

Drift happens when real resources change outside Terraform, for example when someone edits them in the console or the provider replaces failed hardware. The next plan then shows diffs that no commit explains.

terraform plan   # shows drift vs code
# fix options:
# 1) Update code to match intentional console change
# 2) apply to revert cloud to code
# 3) terraform import for resources adopted into management
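
For option 3, Terraform 1.5 and later also accept a declarative `import` block, which lets the adoption go through the same plan-and-review loop as any other change. A minimal sketch, with a hypothetical resource name and bucket name:

```hcl
# Adopt a bucket that was created by hand in the console.
# Both the resource name and the bucket name are hypothetical.
import {
  to = aws_s3_bucket.legacy_uploads
  id = "myapp-legacy-uploads"
}

# The resource block describes the desired state for the adopted bucket.
resource "aws_s3_bucket" "legacy_uploads" {
  bucket = "myapp-legacy-uploads"
}
```

The next plan should report the bucket as "will be imported" rather than "will be created"; once applied, the `import` block can be removed.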

Step 8 — How IaC connects to the rest of DevOps

IaC creates VPC + EKS + RDS + IAM for GitHub OIDC.
CI/CD builds images and deploys to that cluster (CI/CD track).
Kubernetes runs workloads; the platform team may install ingress and metrics-server via Helm or an add-ons module.
Observability (next track) scrapes metrics and ships logs from those resources.

Splitting repos is common: infra-live for Terraform, app-api for service code, app-deploy for manifests—linked by outputs (cluster name, subnet IDs).
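
One common way to link stacks by outputs is the `terraform_remote_state` data source. The sketch below assumes a separate network stack that publishes an output named `private_subnet_ids`; the bucket, key, region, and resource names are hypothetical.

```hcl
# Consumer stack: reads outputs published by a separate network stack.
# Bucket, key, region, and the output name are hypothetical.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-org-terraform-state"
    key    = "staging/network/terraform.tfstate"
    region = "eu-west-1"
  }
}

resource "aws_db_subnet_group" "app" {
  name       = "myapp-staging"
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}
```

Reading remote state requires read access to the whole state object, so some teams prefer to publish only the needed values elsewhere (for example, in a parameter store) and look them up with an ordinary data source.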

Step 9 — Environments without copy-paste

Same modules, different variables:

infra/
  modules/
    vpc/
    eks/
  environments/
    staging/
      main.tf      # module "vpc" { cidr = "10.1.0.0/16" ... }
    production/
      main.tf      # module "vpc" { cidr = "10.2.0.0/16" ... }

Each environment has its own remote state key (staging/terraform.tfstate) so blast radius stays isolated.
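
A sketch of what environments/staging/main.tf might contain follows. The module input names (`name`, `cidr`, `cluster_name`, `vpc_id`, `subnet_ids`, `node_count`) and the outputs `vpc_id` and `private_subnet_ids` are assumptions about modules you would write yourself.

```hcl
# environments/staging/main.tf (sketch)
module "vpc" {
  source = "../../modules/vpc"

  name = "myapp-staging"
  cidr = "10.1.0.0/16"
}

module "eks" {
  source = "../../modules/eks"

  cluster_name = "myapp-staging"
  # Outputs of one module feed inputs of the next.
  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnet_ids
  node_count = 2
}
```

The production file would call the same modules and differ only in values, such as `cidr = "10.2.0.0/16"` and a larger node count.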

Step 10 — Anti-patterns

Anti-pattern	Why it hurts
State file in git	Merge conflicts, leaked secrets, no locking
`terraform.tfvars` with secrets committed	Credentials in history—use vault or CI secrets
One giant root module	Slow plans, scary blast radius; split into smaller stacks with separate state
Apply from laptops to prod	No audit trail; apply only from CI with approval gates
Ignoring `-` destroys in plan	Accidental data loss on databases and buckets
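
Two of these anti-patterns have simple guardrails in the configuration language itself. The sketch below is illustrative; the variable name and bucket name are hypothetical.

```hcl
# Secrets: declare the input, mark it sensitive, and supply the value from
# CI secrets or a secrets manager (for example via the TF_VAR_db_password
# environment variable), never from a committed tfvars file.
variable "db_password" {
  type      = string
  sensitive = true
}

# Accidental destroys: prevent_destroy makes any plan that would delete
# this resource fail with an error.
resource "aws_s3_bucket" "audit_logs" {
  bucket = "myapp-production-audit-logs" # hypothetical name

  lifecycle {
    prevent_destroy = true
  }
}
```

Note that `sensitive = true` only hides the value in CLI output; it is still stored in state, which is one more reason to protect the state bucket.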

Step 11 — What to learn next on this track

Terraform: first project — init, VPC, S3 bucket, first apply and destroy in a sandbox account.
State, modules & remote backends — S3 state, DynamoDB locks, VPC modules, stack split by blast radius.
Environments, variables & secrets — tfvars per env, CI plan on PR, GitHub Environment gates, Secrets Manager.
Networking, IAM & drift — EKS subnets, security groups, IRSA, import, refresh, and drift detection.

Interview phrase: “We treat infrastructure as declarative code in git; terraform plan is the PR review surface; remote state with locking is the source of truth for IDs; CI applies to staging automatically and production after approval—drift is detected on every plan.”

The one line to remember

IaC is version-controlled desired state for your cloud—plan before you touch prod, store state where the team can collaborate, and let CI apply what reviewers approved.