Terraform style guide for infrastructure without surprises

Background

This guide establishes standards for Terraform infrastructure within the AdaptiveGears ecosystem. It prioritizes Day 2 operations (maintenance, debugging, modification) over Day 1 ease of authoring. The guidelines assume a team size of 3+ engineers and aim to scale infrastructure without necessitating a dedicated Platform Engineering team for module maintenance.

Anti-DRY Principle

Do not apply DRY (Don’t Repeat Yourself) principles to Terraform code.

Repetition is preferred over shared dependencies. The AWS provider itself serves as the primary abstraction layer.

Rationale: The costs of abstraction in Terraform—coupled failures, increased blast radius, and maintenance overhead—outweigh the benefits of code reuse.

Modules require infrastructure: versioning, private registry, release cycle, migration path. Each version bump needs testing across consumers. Provider updates break compatibility.

Generalization is expensive. Different volume sizes between staging and production require a variable. Different instance types require another variable. Different backup retention requires another variable. Every difference demands parameterization. The “reusable” module becomes a configuration maze.

The AWS provider is the abstraction. Resources have sensible defaults and documentation. Modules add a second interface to learn, additional abstraction layers to debug, and limitations when the module doesn’t expose required parameters.

Modules SHOULD be used for:

  • Multiplying identical resources (e.g., creating N identical storage buckets from a list).
  • Tightly coupled resources that cannot function independently (e.g., network + subnets + routing).

Modules MAY be used for:

  • Standardizing high-variability patterns (e.g., instance initialization templates).
  • Encapsulating non-obvious composition patterns not clear from provider documentation.

Modules SHOULD NOT be used for:

  • Trivial wrappers around single resources.
  • Forced bundling of unrelated resources.
  • Re-implementing provider APIs.

| Good Module | Bad Module |
| --- | --- |
| Establishes standard for high-variability pattern | Wraps single resource with few options |
| Creates N identical things from a list | Bundles unrelated resources “for convenience” |
| One focused concern | Forced coupling of unrelated concerns |
| Adds value beyond provider | Re-implements provider API |

Modules don’t guarantee compliance. “Standardized modules ensure compliance” is a fallacy. Auditors verify the deployed state (e.g., “Is encryption enabled?”), not the code that produced it. Compliance is enforced by policy-as-code (AWS Config, OPA) scanning actual resources. Wrapping resources in “compliant modules” adds no value if policy checks the result anyway.

Wrapper modules break DRY. Wrapper modules repeat the provider’s API with a different interface. You end up maintaining two interfaces for the same thing—learning the module API instead of learning the provider API.
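
To make the two-interface cost concrete, here is a rough sketch contrasting a wrapper call with the provider resources it hides. The module path ../modules/s3-bucket and its inputs (name, versioning_enabled) are hypothetical, and the bucket name is a placeholder. The two halves are alternatives, not meant to be applied together.

```hcl
# Avoid: a hypothetical wrapper whose inputs merely rename provider arguments.
module "logs_bucket" {
  source = "../modules/s3-bucket" # hypothetical path

  name               = "example-logs" # placeholder bucket name
  versioning_enabled = true
}

# Prefer: the provider resources themselves, which are documented upstream.
resource "aws_s3_bucket" "this" {
  bucket = "example-logs" # placeholder bucket name
}

resource "aws_s3_bucket_versioning" "this" {
  bucket = aws_s3_bucket.this.id

  versioning_configuration {
    status = "Enabled"
  }
}
```

In the second form, every argument can be looked up in the AWS provider documentation, with no intermediate interface to learn.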

AI assistants mitigate repetition costs. AI coding assistants like Claude Code drastically reduce the cost of avoiding DRY. Finding all occurrences, making consistent edits across files, and handling repetitive updates are tasks these tools do well, so much of the cognitive overhead that justified DRY no longer applies.

File Organization

Organize files by workload and component.

A flat directory of .tf files is forbidden. Use a hierarchical structure based on logical workloads and physical components.

Logical organization (from AWS Well-Architected):

  • Workload: A set of components that together deliver business value. Domain boundary.
  • Component: Resources you provision and manage as a unit.

Physical deployment:

  • Stack: A component with its own state file. The actual terraform apply target.
  • Module: Reusable template for deploying multiple instances of the same resource.

| Logical | Physical |
| --- | --- |
| Workload | |
| Component | Stack |
| | Module |

terraform/
├── app/                       # Workload
│   ├── .sops.yaml             # Encryption config
│   ├── cluster/               # Component
│   ├── database/              # Component
│   ├── cache/                 # Module
│   ├── cache-1/               # Component → Module
│   └── cache-2/               # Component → Module
└── monitoring/                # Workload
    ├── .sops.yaml
    └── alerts/                # Component
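
As a sketch of the "Component → Module" entries in the tree above, a component such as cache-1 might contain little more than a module call. The input names (name, node_type) are assumptions for illustration; the real interface depends on what the cache module declares.

```hcl
# app/cache-1/main.tf (illustrative)

variable "name" {
  type = string
}

variable "node_type" {
  type = string
}

# The component owns the state file; the module only supplies the template.
module "cache" {
  source = "../cache"

  name      = var.name      # hypothetical module input
  node_type = var.node_type # hypothetical module input
}
```

cache-2 would repeat the same call with its own terraform.tfvars.yaml and its own state file.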

One state file per component. Each component must be independently deployable and independently rollback-able.

Do not use Terragrunt include blocks.

Terragrunt’s include blocks are a DRY feature that violates our constraint of using Terragrunt only for YAML loading. A common anti-pattern involves empty root terragrunt.hcl files with child configs using include "root":

workload/
├── terragrunt.hcl              # Empty or near-empty "root" file (WRONG)
└── stacks/
    └── component/
        └── terragrunt.hcl      # Contains include block (WRONG)

Symptom:

WARN  Using `terragrunt.hcl` as the root of Terragrunt configurations is an
anti-pattern, and no longer recommended. In a future version of Terragrunt,
this will result in an error.

Bad:

include "root" {
  path = find_in_parent_folders()
}

inputs = merge(
  yamldecode(file(find_in_parent_folders("terraform.tfvars.yaml"))),
  yamldecode(file("terraform.tfvars.yaml"))
)

Good:

inputs = merge(
  yamldecode(file(find_in_parent_folders("terraform.tfvars.yaml"))),
  yamldecode(file("terraform.tfvars.yaml"))
)

The fix: remove the include "root" block and delete the empty root terragrunt.hcl. Using find_in_parent_folders() to locate YAML files is fine—it finds the YAML file directly without needing a root terragrunt.hcl as a marker.

Component Dependencies

Declare components fully via explicit inputs.

Components SHOULD be fully declared by inputs, following the Kubernetes pattern.

This refers to deployment-time resolution. In Kubernetes manifests, a Pod explicitly references a ConfigMap by name; it does not query the cluster during deployment to discover it. Similarly, Terraform components should have their dependencies (such as Subnet IDs) passed explicitly as variables.

Components SHOULD NOT resolve dependencies at runtime: all values should be known at plan time. For values that change frequently with deploys, use terraform_remote_state instead (see Cross-Component Dependencies).

Rationale: Terraform builds dependency trees. When components share state or use data sources, changes cascade—resources recreate, timing issues emerge, “known after apply” forces terraform apply -target to provision resources before planning the rest. Explicit inputs break the chain.

Inputs over data sources. Explicit over implicit.

Bad: Implicit dependency via data source.

data "aws_subnet" "this" {
  id = "subnet-abc123" # Hidden dependency
}

Good: Explicit dependency via input variable.

variable "subnet_id" {
  type = string
}

resource "aws_instance" "this" {
  subnet_id = var.subnet_id
}

Deployment Ordering

Use numeric prefixes for sequential components.

When a workload requires ordered deployment, use alphanumerically sortable numeric prefixes (e.g., 10-, 20-).

workload/
├── 10-foundation/
├── 11-permissions/
├── 20-networking/
├── 30-storage/
└── 40-compute/

The ordering (10, 20, 30) is for humans to understand the intended deployment sequence. The safety net is the isolation principle: deploying out of order either fails cleanly (missing input) or succeeds (dependency exists). Numbering documents intent, not programmatic execution order.
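
The "fails cleanly" behavior can be sketched with a required input that has no default. The file location, the regex, and the error wording are illustrative assumptions, not part of this guide's rules.

```hcl
# workload/40-compute/variables.tf (illustrative)

variable "subnet_id" {
  type        = string
  description = "Subnet created by 20-networking and copied into terraform.tfvars.yaml."

  # No default: if the value is missing, Terraform stops with a missing-variable
  # error (or prompts when run interactively) instead of creating anything.

  validation {
    condition     = can(regex("^subnet-[0-9a-f]+$", var.subnet_id))
    error_message = "The subnet_id value must look like subnet-0123abcd. Deploy 20-networking first and copy its output."
  }
}
```

The validation block is optional; it only turns a malformed or placeholder value into an early, readable error.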

Prefixes MUST be alphanumerically sortable. Use ranges (10, 20, 30) instead of sequential (1, 2, 3). Sequential numbering hides logical grouping and requires renumbering when inserting components.

When a component contains multiple independent resources, they MAY be organized as subfolders:

workload/
└── 20-networking/
    ├── public/
    ├── private/
    └── internal/

These sub-components MUST remain independent of each other.

File Structure

Maintain a consistent file set for every component and module.

The following files MUST exist in every component directory, even if empty:

component/
├── main.tf                     # Resources
├── variables.tf                # Variable declarations
├── outputs.tf                  # Output declarations
├── terraform.tf                # Backend/Provider config
├── terragrunt.hcl              # YAML variable loading
├── terraform.tfvars.yaml       # Variables (YAML)
└── terraform.tfvars.sops.yaml  # OPTIONAL - only if component has secrets

The following files MUST exist in every module directory:

module/
├── main.tf           # Resources and entrypoint
├── variables.tf      # Variable declarations
├── outputs.tf        # Output declarations
└── README.md         # Documentation

Modules have no terraform.tf, YAML, or tfvars files—they inherit providers from the calling component and receive inputs as variables.

Do not split resources into multiple files unnecessarily. Group resources within main.tf using comment dividers. Split into thematic files (e.g., iam.tf) only when main.tf exceeds ~300 lines. Never use per-resource files (e.g., s3_bucket.tf).

Section comments SHOULD include reasoning or architectural context. They MUST NOT explain the code itself—code should be self-explanatory. When adding context, wrap it with a closing divider line.

Example: Section Divider

# ------------------------------------------------------------------------------
# Networking
# ------------------------------------------------------------------------------

resource "aws_network_interface" "this" {
  # ...
}

Example: Section Divider with Context

# ------------------------------------------------------------------------------
# Networking
# ------------------------------------------------------------------------------
#
# Network interfaces decoupled from compute, enabling instance replacement
# without IP changes.
#
# Architecture:
#   Server
#   ├── Interface (persistent) → Stable IP across replacements
#   └── Instance (ephemeral)   → Replaceable compute
#
# ------------------------------------------------------------------------------

resource "aws_network_interface" "this" {
  # ...
}

Naming Conventions

Name singleton resources this. When a resource is the primary or only one of its kind in a module/component, give it the resource name this (as in aws_security_group.this).

Good:

resource "aws_security_group" "this" { ... }

Bad:

resource "aws_security_group" "main_security_group" { ... }

Use descriptive names for multiple resources. If multiple resources of the same type exist, use functional names.

resource "aws_security_group" "ingress" {
  # ...
}

resource "aws_security_group" "egress" {
  # ...
}

Variables, locals, and outputs: Use snake_case. Match provider attribute names where possible (e.g., instance_type). Prefix booleans with enable_ or use_. Only add outputs when the use case is clear—no speculative outputs.

Boolean Variables:

variable "enable_monitoring" {
  type = bool
}

variable "use_private_subnet" {
  type = bool
}

Output Naming:

output "instance_id" {
  value = aws_instance.this.id
}

output "instance_arn" {
  value = aws_instance.this.arn
}

Formatting

Code must pass terraform fmt. The formatter handles indentation (2 spaces), = alignment within blocks, and whitespace normalization.

The formatter does NOT enforce the following rules:

  • Line Length: Soft limit of 100 characters.
  • Blank Lines: One blank line between top-level blocks. No double blank lines. One blank line before nested blocks within a resource.
  • Trailing Commas: Not required in maps/objects (newline is separator). Required in lists.
  • Heredoc: Use <<-EOF to strip indentation.
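  • Multi-line Ternary: Operators at start of line. Wrap the whole expression in parentheses; otherwise HCL ends the attribute expression at the first newline.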
  • For Expressions: Opening bracket on same line, closing aligned with block start.

Tags:

# Multi-key
tags = {
  Name        = var.name
  Environment = var.environment
}

# Single-key: inline preferred
tags = { Name = var.name }

Heredoc:

user_data = <<-EOF
  #!/bin/bash
  echo "hello"
EOF

Multi-line Ternary:

# Parentheses are required for an expression to span multiple lines
subnet_id = (
  var.use_private
  ? var.private_subnet_id
  : var.public_subnet_id
)

For Expression:

locals {
  instance_ids = [
    for instance in aws_instance.this :
    instance.id
  ]
}

Good:

resource "aws_instance" "this" {
  ami = var.ami_id

  root_block_device {
    volume_size = 20
  }

  tags = { Name = var.name }
}

Ordering

Order blocks consistently in files.

  • terraform.tf: terraform {} block first, then provider {} blocks (alphabetical).
  • variables.tf, outputs.tf: Group logically, then alphabetically within groups.
  • main.tf: Order sections by dependency. Within sections: Locals, then Resources (logical then alphabetical).

terraform.tf:

terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    # ...
  }
}

provider "aws" {
  # ...
}

provider "aws" {
  alias = "us_west"
  # ...
}

variables.tf:

# Networking
variable "subnet_id" {
  type = string
}

variable "vpc_id" {
  type = string
}

# Compute
variable "instance_type" {
  type = string
}

main.tf:

# ------------------------------------------------------------------------------
# IAM
# ------------------------------------------------------------------------------

locals {
  role_name = "${var.name}-role"
}

resource "aws_iam_role" "this" {
  name = local.role_name
  # ...
}

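resource "aws_iam_instance_profile" "this" {
  name = local.role_name
  role = aws_iam_role.this.name
}

resource "aws_iam_role_policy_attachment" "this" {
  role = aws_iam_role.this.name
  # ...
}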

# ------------------------------------------------------------------------------
# Compute
# ------------------------------------------------------------------------------

locals {
  instance_name = "${var.name}-instance"
}

resource "aws_instance" "this" {
  iam_instance_profile = aws_iam_instance_profile.this.name
  # ...
}

Order arguments within resource blocks as follows:

  1. Meta-arguments: count, for_each, provider.
  2. Identifiers: name, id, description.
  3. Configuration: Logically grouped, then alphabetical.
  4. Nested blocks: Logically grouped, then alphabetical.
  5. Tags.
  6. lifecycle.
  7. depends_on.

resource "aws_instance" "this" {
  count = var.enabled ? 1 : 0

  ami           = var.ami_id
  instance_type = var.instance_type

  iam_instance_profile   = aws_iam_instance_profile.this.name
  subnet_id              = var.subnet_id
  vpc_security_group_ids = var.security_group_ids

  root_block_device {
    volume_size = 20
    volume_type = "gp3"
  }

  tags = { Name = var.name }

  lifecycle {
    create_before_destroy = true
  }

  depends_on = [aws_iam_role.this]
}

Variables and Locals

Define variables with explicit types. Arguments within a variable block must be ordered: type, description (avoid if name is self-documenting), default, sensitive, validation.

variable "environment" {
  type    = string
  default = "dev"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Invalid environment."
  }
}
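
A fuller sketch of the argument order, including a description that adds context the name alone does not carry. The variable name, default, and bounds are assumptions chosen for illustration.

```hcl
variable "backup_retention_days" {
  type        = number
  description = "Days to keep automated backups; production usually needs more than staging."
  default     = 7
  # sensitive = true would go here for secret values

  validation {
    condition     = var.backup_retention_days >= 1 && var.backup_retention_days <= 35
    error_message = "The backup_retention_days value must be between 1 and 35."
  }
}
```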

Use locals for complex transformations only. Do not use locals for simple pass-throughs or aliases. Use them for chained transformations, filtering, or complex calculations.

locals {
  # Normalize input
  raw_instances = [
    for k, v in var.instances : merge(v, { name = k })
  ]

  # Filter
  enabled_instances = [
    for inst in local.raw_instances : inst
    if inst.enabled
  ]

  # Transform to map for for_each
  instances_map = {
    for inst in local.enabled_instances : inst.name => inst
  }
}

resource "aws_instance" "this" {
  for_each = local.instances_map

  ami           = each.value.ami
  instance_type = each.value.instance_type

  tags = { Name = each.key }
}

Coding Patterns

  • Conditionals: Use count for 0-or-1 booleans, for_each for collections. Convert lists with toset(var.list).
  • Nulls: Use coalesce() to handle nulls.
  • Interpolation: Use bare references (e.g., var.id) instead of string interpolation ("${var.id}") where possible.
  • References: Use one() for 0-or-1 resources. Avoid try() for references as it masks errors.
  • Dynamic blocks: Avoid. Use only when the number of nested blocks is determined by input variables. If the number is static, write blocks explicitly.
  • Comments: Avoid inline comments. Code should be self-explanatory. Use comments only for unavoidable complexity.

Conditional Resources:

# 0-or-1: use count
resource "aws_instance" "this" {
  count = var.enabled ? 1 : 0
  # ...
}

# Collections: use for_each
resource "aws_instance" "this" {
  for_each = var.instances
  # ...
}
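
List Conversion:

for_each accepts maps and sets of strings, not lists, which is why the bullet above mentions toset(). A minimal sketch, assuming a hypothetical bucket_names input:

```hcl
variable "bucket_names" {
  type = list(string)
}

resource "aws_s3_bucket" "this" {
  # toset() deduplicates the list; each bucket is then addressed by name
  # (aws_s3_bucket.this["logs"]) rather than by list position.
  for_each = toset(var.bucket_names)

  bucket = each.value
}
```

Because instances are keyed by value, removing one name from the list affects only that bucket, whereas count-based indexing would shift every later element.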

Null Handling:

name = coalesce(var.custom_name, "${var.prefix}-default")

String Interpolation:

# Good
subnet_id = var.subnet_id

# Avoid
subnet_id = "${var.subnet_id}"

Referencing Conditional Resources:

# Good - explicit about 0-or-1
output "instance_id" {
  value = one(aws_instance.this[*].id)
}

# Avoid - swallows typos and wrong attributes
output "instance_id" {
  value = try(aws_instance.this[0].id, null)
}

# Only when resource definitely exists
instance_id = aws_instance.this[0].id

Inline Comments:

# Avoid
subnet_id = var.subnet_id  # the subnet to deploy into

# Only when necessary
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)  # /24 blocks from a /16 VPC, starting at x.x.10.0/24
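
Dynamic Blocks:

A sketch of the one case where a dynamic block is justified: the number of nested blocks comes from an input. The variable names (ingress_ports, allowed_cidrs) are assumptions for illustration.

```hcl
variable "name" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "ingress_ports" {
  type = set(number)
}

variable "allowed_cidrs" {
  type = list(string)
}

resource "aws_security_group" "this" {
  name   = var.name
  vpc_id = var.vpc_id

  # One ingress block per port supplied by the caller.
  dynamic "ingress" {
    for_each = var.ingress_ports

    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = var.allowed_cidrs
    }
  }
}
```

If the ports were always 80 and 443, two explicit ingress blocks would be the preferred form.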

IAM Policies

Policy documents SHOULD use aws_iam_policy_document data sources.

The data source provides validation at plan time, catches syntax errors before apply, and enables dynamic resource references. jsonencode MAY be used for trivial policies where the overhead isn’t justified.

data "aws_iam_policy_document" "assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    effect  = "Allow"

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "this" {
  name               = var.name
  assume_role_policy = data.aws_iam_policy_document.assume_role.json
}
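
For comparison, a sketch of the jsonencode alternative for the same trust policy, shown here only as an example of what "trivial" might mean. It replaces the data source and role above rather than sitting alongside them.

```hcl
variable "name" {
  type = string
}

resource "aws_iam_role" "this" {
  name = var.name

  # Plain JSON structure: shorter, but key names and casing are not
  # checked by Terraform the way data source arguments are.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}
```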

Security Groups

Use inline ingress and egress blocks.

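Prefer defining rules inline within the aws_security_group resource rather than using separate aws_security_group_rule resources. This prevents race conditions and keeps logic contained. Do not mix the two styles on the same security group: the provider documentation warns that inline rules and standalone rule resources conflict and overwrite each other.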

resource "aws_security_group" "this" {
  name   = var.name
  vpc_id = var.vpc_id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = { Name = var.name }
}

Data Sources

Limit data sources to stable, non-workload resources.

Safe uses include aws_caller_identity, aws_region, or aws_ami (for base images). Do not use data sources to look up workload-specific resources, as this creates hidden dependencies.

data "aws_caller_identity" "this" {}
data "aws_region" "this" {}

data "aws_ami" "this" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}
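
A sketch of a typical safe use: combining account and region lookups to build an ARN without hardcoding either. The log group path /app/* is a placeholder, and the name attribute assumes the 5.x provider pinned in this guide.

```hcl
data "aws_caller_identity" "this" {}
data "aws_region" "this" {}

locals {
  # Account ID and region come from the credentials in use,
  # not from another component's resources.
  log_group_arn = format(
    "arn:aws:logs:%s:%s:log-group:/app/*",
    data.aws_region.this.name,
    data.aws_caller_identity.this.account_id
  )
}
```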

Cross-Component Dependencies

Use terraform_remote_state for values that change with deploys.

Explicit YAML inputs work well for stable values. But when Component B depends on Component A’s outputs that change frequently, manually gathering and updating YAML after each deploy adds overhead. terraform_remote_state reads outputs directly.

data "terraform_remote_state" "foundation" {
  backend = "s3"
  config = {
    bucket = "company-terraform-state"
    key    = "workload/10-foundation.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "this" {
  subnet_id = data.terraform_remote_state.foundation.outputs.subnet_id
}
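
terraform_remote_state exposes only the root-level outputs of the other component, so the producer has to declare each value it shares. A sketch of the producing side, with illustrative variable names:

```hcl
# workload/10-foundation/main.tf and outputs.tf (illustrative)

variable "vpc_id" {
  type = string
}

variable "cidr_block" {
  type = string
}

resource "aws_subnet" "this" {
  vpc_id     = var.vpc_id
  cidr_block = var.cidr_block
}

# Read by consumers as data.terraform_remote_state.foundation.outputs.subnet_id
output "subnet_id" {
  value = aws_subnet.this.id
}
```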

Why terraform_remote_state over Terragrunt dependencies? Terragrunt dependencies try to refresh or deploy the dependency. terraform_remote_state just reads—no side effects.

Why terraform_remote_state over SSM? SSM duplicates data (output stored in state AND in SSM). Use SSM only when crossing tool boundaries.

| Method | Use When |
| --- | --- |
| Explicit YAML | Friction acceptable, values change rarely |
| terraform_remote_state | Terraform-to-Terraform, values change with deploys |
| SSM Parameter Store | Terraform-to-Ansible/bash, crossing tool boundaries |
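
For the SSM row, a sketch of publishing a value for a non-Terraform consumer. The parameter path and the database_endpoint variable are hypothetical.

```hcl
variable "database_endpoint" {
  type = string
}

resource "aws_ssm_parameter" "this" {
  name  = "/app/database/endpoint" # hypothetical parameter path
  type  = "String"
  value = var.database_endpoint
}
```

An Ansible playbook or shell script can then read the parameter by path without access to the Terraform state.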

YAML Configuration

Use YAML for all variable inputs. Terragrunt is required to load variables from YAML files.

Terraform has no native way to pass YAML as variable input (-var-file only accepts .tfvars or .tfvars.json). Starting with native Terraform means starting with .tfvars. Later, when SOPS encryption or complex nested structures are needed, everything must be converted. Mandating Terragrunt from day one ensures YAML usage everywhere and SOPS readiness without future migration.

YAML also aligns with Kubernetes and Ansible—one format across the infrastructure stack.

terraform.tfvars.yaml:

servers:
  server-1:
    subnet_id: subnet-abc123
    instance_type: t3.medium
  server-2:
    subnet_id: subnet-def456
    instance_type: t3.large

user_data: |
  #!/bin/bash
  echo "hello"

terragrunt.hcl (simple):

inputs = yamldecode(file("terraform.tfvars.yaml"))

terragrunt.hcl (with secrets):

inputs = merge(
  yamldecode(file("terraform.tfvars.yaml")),
  yamldecode(sops_decrypt_file("terraform.tfvars.sops.yaml"))
)

Constraint: Use Terragrunt only for YAML loading. Do not use its DRY features (includes, dependency blocks).

Secrets with SOPS

Encrypt secrets in git using SOPS.

SOPS encrypts YAML values while keeping keys readable. Secrets live alongside code—no external secret store, no runtime dependency, full git history.

.sops.yaml at workload root defines encryption rules:

creation_rules:
  - path_regex: \.sops\.yaml$
    kms: arn:aws:kms:us-east-1:123456789:key/abc-123

terraform.tfvars.sops.yaml:

database_password: ENC[AES256_GCM,data:...,type:str]
api_key: ENC[AES256_GCM,data:...,type:str]

Terragrunt’s sops_decrypt_file() decrypts at plan/apply time. Values pass through as variables.
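
On the Terraform side, the decrypted values arrive like any other input. A sketch of matching declarations for the two keys shown above; marking them sensitive is a suggestion here, not a rule stated elsewhere in this guide.

```hcl
variable "database_password" {
  type      = string
  sensitive = true
}

variable "api_key" {
  type      = string
  sensitive = true
}
```

sensitive = true redacts the values in plan and apply output. It does not encrypt them in the state file, so the state bucket still needs encryption and restricted access.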

Workflow:

sops terraform.tfvars.sops.yaml  # Edit secrets (decrypts in $EDITOR)
terragrunt plan                   # Decrypts automatically

Native Terraform cannot decrypt SOPS files. Terragrunt is required.

Workload-Level Variables

Shared variables at the workload root are OPTIONAL.

Common variables (Region, Account ID) MAY be defined in a root terraform.tfvars.yaml and merged into child components via find_in_parent_folders(). This pattern is useful when multiple components share the same values, but adds complexity. Start without shared files; add them when duplication becomes painful.

workload/
├── .sops.yaml                    # Encryption config (components inherit, MAY override)
├── terraform.tfvars.yaml         # OPTIONAL - shared variables
├── terraform.tfvars.sops.yaml    # OPTIONAL - shared secrets
├── 10-foundation/
│   ├── terragrunt.hcl
│   ├── terraform.tfvars.yaml     # REQUIRED - component variables
│   └── terraform.tfvars.sops.yaml  # OPTIONAL - only if component has secrets
└── 20-networking/
    └── ...

component/terragrunt.hcl (without secrets):

inputs = yamldecode(file("terraform.tfvars.yaml"))

component/terragrunt.hcl (with secrets):

inputs = merge(
  yamldecode(file("terraform.tfvars.yaml")),
  yamldecode(sops_decrypt_file("terraform.tfvars.sops.yaml"))
)

component/terragrunt.hcl (with shared files):

inputs = merge(
  yamldecode(file(find_in_parent_folders("terraform.tfvars.yaml"))),
  yamldecode(sops_decrypt_file(find_in_parent_folders("terraform.tfvars.sops.yaml"))),
  yamldecode(file("terraform.tfvars.yaml")),
  yamldecode(sops_decrypt_file("terraform.tfvars.sops.yaml"))
)

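When using shared files, workload-level YAML can serve as “global state” containing variables for all components. Terragrunt passes inputs to Terraform as TF_VAR_ environment variables, and Terraform ignores those that do not match a declared variable, so each component picks up only what it declares in variables.tf. Because merge() gives precedence to later arguments, component-level values override shared ones in the ordering shown above.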

State Management

Store state in S3 with DynamoDB locking.

Backend configuration must be explicit in terraform.tf. Paths MUST be hardcoded, not interpolated from variables. Dynamic paths risk accidental state migration when variables change.

State locking via DynamoDB prevents concurrent modifications. Locking MUST use a DynamoDB table, and that table SHOULD be shared across all state files: one table handles locks for the entire organization.

terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "workload/component.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
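
The shared lock table itself can be sketched as follows. The S3 backend expects a string partition key named LockID; the billing mode is an assumption, and the table name matches the backend example above.

```hcl
resource "aws_dynamodb_table" "this" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST" # assumption: on-demand capacity
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```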

Tagging

Apply tags via the default_tags provider block.

All resources must carry the following tags:

  • Name: Stack name, typically matches the component name. Individual resources override this with their own Name tag (e.g., web-server-1) when needed.
  • Workload: Identifies what workload the component belongs to.
  • Component: Identifies the component within the workload.
  • UUID: Guardrail that prevents collisions when selecting resources by tags. Without UUID, generic tags like Workload=cache would match all cache resources across different deployments. UUID SHOULD be applied at workload level, MAY be applied at component level.

Use Cases:

  • Cost tracking by workload or specific deployment.
  • SSM associations targeting instances by UUID.
  • AWS Resource Groups filtered by tags.

provider "aws" {
  default_tags {
    tags = {
      Name      = var.name
      Workload  = "app"
      Component = "database"
      UUID      = var.uuid
    }
  }
}
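
A sketch of the Name override mentioned above. Tags set on a resource take precedence over default_tags for the same key, while the remaining default tags still apply. The variable names are illustrative.

```hcl
variable "ami_id" {
  type = string
}

variable "instance_type" {
  type = string
}

resource "aws_instance" "this" {
  ami           = var.ami_id
  instance_type = var.instance_type

  # Overrides only Name; Workload, Component, and UUID come from default_tags.
  tags = { Name = "web-server-1" }
}
```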

Version Control

Pin major versions using ~> X.0. The ~> operator allows the rightmost component to increment. ~> 5.0 permits any 5.x release but blocks 6.0. This protects against breaking changes while receiving bug fixes and new features.

Pin tool versions using version files. Place version files at the workload root to enable automatic switching via tfenv and tgenv.

  • .terraform-version
  • .terragrunt-version

workload/
├── .terraform-version     # e.g., 1.5.7
├── .terragrunt-version    # e.g., 0.66.5
└── ...

Also declare requirements in terraform.tf:

terraform {
  required_version = ">= 1.5.0"
}
