State, modules & remote backends

A solo terraform.tfstate on your laptop does not scale. Teams store state in S3 with DynamoDB locking, reuse infrastructure via modules, and split work into separate stacks so one bad apply cannot take down the whole platform.

Prerequisites: the Terraform first-project guide completed (or an equivalent VPC + S3 lab).

After reading, you should be able to:

[Diagram: root stack calls VPC module; state in S3 per stack; DynamoDB provides locks.]
Remote state is the team’s source of truth for resource IDs; modules encode reusable patterns; multiple state files limit blast radius.

Why remote state matters

| Local state problem | Remote state fix |
| --- | --- |
| Lost laptop = lost IDs | State object in S3, versioned |
| Two engineers apply at once | DynamoDB lock serializes writes |
| CI cannot apply without your disk | Pipeline uses same backend config |
| No audit trail | S3 versioning keeps prior state; IAM controls who can read/write state |

Bootstrap chicken-and-egg: The bucket that holds state cannot be created by the same stack that depends on that bucket. Use a tiny bootstrap stack (or one-time console setup) for the state bucket and lock table, then point all other stacks at it.

Step 1 — Bootstrap backend resources (one-time)

Create ~/tf-bootstrap/main.tf. This stack keeps its own state local (or in a separate “meta” bucket), because it creates the backend that every other stack uses:

terraform {
  required_version = ">= 1.6.0"
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

provider "aws" {
  region = "us-east-1"
}

data "aws_caller_identity" "current" {}

resource "aws_s3_bucket" "tfstate" {
  bucket = "tfstate-${data.aws_caller_identity.current.account_id}-sharpbyte"
}

resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  versioning_configuration { status = "Enabled" }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_dynamodb_table" "lock" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

output "state_bucket" {
  value = aws_s3_bucket.tfstate.id
}

output "lock_table" {
  value = aws_dynamodb_table.lock.name
}

cd ~/tf-bootstrap
terraform init && terraform apply
# note outputs: state_bucket, lock_table

Step 2 — Point your lab stack at S3

In ~/tf-sandbox (from the first-project guide), add backend.tf:

terraform {
  backend "s3" {
    bucket         = "tfstate-ACCOUNT_ID-sharpbyte"   # from bootstrap output
    key            = "sandbox/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

The key path is how you separate stacks in one bucket—use distinct keys per stack and environment.
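To illustrate distinct keys, a different stack (not tf-sandbox) sharing the same bucket and lock table might use a backend block like the sketch below. The ENV/STACK/terraform.tfstate naming is an assumed convention, not a Terraform requirement, and ACCOUNT_ID remains a placeholder.

```hcl
# backend.tf in a separate stack (illustrative; not part of tf-sandbox)
terraform {
  backend "s3" {
    bucket         = "tfstate-ACCOUNT_ID-sharpbyte"   # same bucket as the sandbox stack
    key            = "prod/network/terraform.tfstate" # different key = different state file
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"                # lock table can be shared across stacks
    encrypt        = true
  }
}
```

Because the lock is taken per state file, two stacks with different keys can be applied at the same time without blocking each other.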

cd ~/tf-sandbox
terraform init -migrate-state
# answer yes to copy local state to S3

Verify: aws s3 ls s3://tfstate-.../sandbox/network/ should show terraform.tfstate. Local terraform.tfstate may remain as backup until you delete it after confirming remote state works.

Step 3 — Locking in practice

While one terminal runs terraform apply, a second terraform plan against the same state should fail with:

Error acquiring the state lock

Never force-unlock unless you are sure no apply is running. If a crashed laptop leaves a stale lock behind, release it using the lock ID printed in the error message:

terraform force-unlock LOCK_ID   # only after verifying no active apply

Step 4 — Extract a VPC module

Restructure tf-sandbox:

tf-sandbox/
  backend.tf
  versions.tf
  variables.tf
  main.tf              # calls module
  outputs.tf
  modules/
    vpc/
      main.tf
      variables.tf
      outputs.tf

modules/vpc/variables.tf

variable "name_prefix" {
  type = string
}

variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

variable "public_subnet_cidr" {
  type    = string
  default = "10.0.1.0/24"
}

modules/vpc/main.tf — move VPC, IGW, subnet, and route resources here (same resources as the first-project guide, parameterized with var.name_prefix in tags).
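The first-project guide is not reproduced here, so the following is only a sketch of what modules/vpc/main.tf might contain. The names aws_vpc.main and aws_subnet.public match the module outputs in this guide; the other resource names, the tag values, and the DNS and public-IP settings are assumptions and should be replaced with whatever your existing resources use.

```hcl
# modules/vpc/main.tf (sketch; keep the resource names from your own first project)
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true # assumption: not confirmed by the first-project guide

  tags = { Name = "${var.name_prefix}-vpc" }
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = { Name = "${var.name_prefix}-igw" }
}

resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidr
  map_public_ip_on_launch = true # assumption: public subnet assigns public IPs

  tags = { Name = "${var.name_prefix}-public" }
}

# Route all outbound traffic from the public subnet through the internet gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  tags = { Name = "${var.name_prefix}-public-rt" }
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}
```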

modules/vpc/outputs.tf

output "vpc_id" {
  value = aws_vpc.main.id
}

output "public_subnet_id" {
  value = aws_subnet.public.id
}

Root main.tf

module "vpc" {
  source = "./modules/vpc"

  name_prefix        = "${var.project}-${var.environment}"
  vpc_cidr           = "10.0.0.0/16"
  public_subnet_cidr = "10.0.1.0/24"
}

# S3 logs bucket can stay at root or move to modules/s3-logs later
output "vpc_id" {
  value = module.vpc.vpc_id
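}

terraform init    # modules are local paths; no registry download
terraform plan    # shows no changes only if state addresses were migrated (moved blocks or state mv); otherwise expect destroy/create
terraform apply   # run only once the plan shows no unintended destroys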
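Moving a resource into a module changes its address (for example, aws_vpc.main becomes module.vpc.aws_vpc.main), so Terraform plans a destroy and a create unless it is told the resource moved. One way to record that is with moved blocks in the root module, sketched below for the two resources named in this guide's outputs. Any other resources you moved need their own block, and those addresses depend on your first-project code.

```hcl
# Root module, e.g. in main.tf (moved blocks require Terraform 1.1+)
moved {
  from = aws_vpc.main
  to   = module.vpc.aws_vpc.main
}

moved {
  from = aws_subnet.public
  to   = module.vpc.aws_subnet.public
}

# Add one moved block per remaining resource (IGW, route table, and so on),
# using the addresses from your own configuration.
```

With the blocks in place, terraform plan should report the resources as moved rather than replaced. The blocks can stay in the code so that teammates and CI apply the same migration.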

Step 5 — Provider versioning and lock file

Pin providers in versions.tf with ~> 5.0, which allows any 5.x release but not 6.0. After terraform init, commit .terraform.lock.hcl so CI and teammates install the same provider versions and verify the same checksums.

terraform providers
terraform init -upgrade   # deliberate bump only

Step 6 — Split stacks by blast radius

One monolithic state file for VPC, RDS, EKS, and IAM means a typo in a security group rule can block a database migration apply. Common split:

| Stack | State key example | What changes, and how often |
| --- | --- | --- |
| network | prod/network/terraform.tfstate | Rarely: VPC, subnets, NAT |
| platform | prod/eks/terraform.tfstate | Cluster upgrades |
| app | prod/app/terraform.tfstate | Deployments, S3, SQS |

Downstream stacks read upstream outputs via the terraform_remote_state data source, which exposes only the upstream stack's root-module outputs (or you can pass IDs through CI/CD variables):

# In app stack — reads network stack outputs
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "tfstate-ACCOUNT_ID-sharpbyte"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "bastion" {
  subnet_id = data.terraform_remote_state.network.outputs.public_subnet_id
  # ...
}

Apply order: network → platform → app. Destroy order is reversed.
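For the lookup above to resolve, the network stack has to publish public_subnet_id as a root-level output; an output defined only inside modules/vpc is not visible to other stacks. A minimal sketch, assuming the network stack calls the VPC module as module.vpc (the description text is illustrative):

```hcl
# outputs.tf in the network stack's root module
output "public_subnet_id" {
  description = "Public subnet ID consumed by downstream stacks"
  value       = module.vpc.public_subnet_id
}
```

After adding the output, apply the network stack once so the new value is written to its state before the app stack plans.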

Step 7 — Module sources (beyond local paths)

module "vpc" {
  source = "./modules/vpc"
}
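Beyond local paths, the source argument also accepts registry and Git addresses. The sketch below shows both forms for comparison. terraform-aws-modules/vpc/aws is a public Terraform Registry module whose inputs include name and cidr; the Git repository URL, the module names vpc_registry and vpc_git, and the v1.2.0 tag are hypothetical and only illustrate the syntax.

```hcl
# Terraform Registry source: pin releases with the version argument.
module "vpc_registry" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "sandbox"
  cidr = "10.0.0.0/16"
}

# Git source: pin with ?ref= (tag, branch, or commit). The version argument
# is not supported here. The repository URL below is hypothetical.
module "vpc_git" {
  source = "git::https://github.com/example-org/terraform-modules.git//modules/vpc?ref=v1.2.0"

  name_prefix = "sandbox-dev"
}
```

Pinning a version or ref keeps a module change from reaching every stack at once; each stack upgrades when its own pin is bumped.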

Step 8 — State operations you will use

terraform state list
terraform state show module.vpc.aws_vpc.main
terraform state mv aws_vpc.main module.vpc.aws_vpc.main   # after refactor
terraform state rm aws_instance.old                       # removed from code, orphan in state

state mv and state rm are surgical: they edit state directly, without a plan to review. When possible, prefer fixing the code so that plan shows the intended diff.
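As a declarative counterpart to state rm, newer Terraform releases (1.7 and later, which is above the 1.6.0 minimum used in this guide) accept a removed block. The sketch below assumes the aws_instance.old resource block has already been deleted from the configuration and that you want Terraform to forget the instance without destroying it.

```hcl
# Root module: stop tracking the instance but leave it running in AWS.
removed {
  from = aws_instance.old

  lifecycle {
    destroy = false # forget from state only; do not delete the real resource
  }
}
```

The change then appears in terraform plan and goes through the same review as any other diff.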

Step 9 — Troubleshooting

| Symptom | Action |
| --- | --- |
| Backend configuration changed | terraform init -reconfigure or -migrate-state with care |
| AccessDenied on S3 state | IAM policy needs s3:ListBucket on the state bucket and s3:GetObject, s3:PutObject on the state object |
| Lock never releases | Confirm no running apply; check DynamoDB item; then force-unlock |
| Module produces unexpected destroy | Renamed resource address: use moved blocks (TF 1.1+) or state mv |
| Remote state output missing | Upstream stack not applied; wrong key in data source |
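For the AccessDenied row, a least-privilege policy for one stack might look like the sketch below. It is written as an aws_iam_policy_document data source; the name tfstate_access and the statement IDs are hypothetical, ACCOUNT_ID is a placeholder, and the DynamoDB actions are the ones needed when the backend uses dynamodb_table for locking.

```hcl
# Illustrative policy for the sandbox/network stack only.
data "aws_iam_policy_document" "tfstate_access" {
  statement {
    sid       = "ListStateBucket"
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::tfstate-ACCOUNT_ID-sharpbyte"]
  }

  statement {
    sid       = "ReadWriteStateObject"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["arn:aws:s3:::tfstate-ACCOUNT_ID-sharpbyte/sandbox/network/terraform.tfstate"]
  }

  statement {
    sid = "StateLocking"
    actions = [
      "dynamodb:DescribeTable",
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:DeleteItem",
    ]
    resources = ["arn:aws:dynamodb:us-east-1:ACCOUNT_ID:table/terraform-locks"]
  }
}
```

Scoping the object statement to a single key means a CI role for one stack cannot read or overwrite another stack's state.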

Step 10 — Anti-patterns

Interview phrase: “We bootstrap S3 and DynamoDB once, every stack has its own state key, modules encapsulate VPC patterns with versioned sources, and CI runs plan on every PR—apply to prod is gated and uses the same remote backend the team shares.”

The one line to remember

Remote state + locks let the team collaborate safely; modules stop copy-paste; separate state files keep one mistake from freezing the entire cloud.