Terraform
Terraform style guide for infrastructure without surprises
Background
This guide establishes standards for Terraform infrastructure within the AdaptiveGears ecosystem. It prioritizes Day 2 operations (maintenance, debugging, modification) over Day 1 ease of authoring. The guidelines assume a team size of 3+ engineers and aim to scale infrastructure without necessitating a dedicated Platform Engineering team for module maintenance.
Anti-DRY Principle
Do not apply DRY (Don’t Repeat Yourself) principles to Terraform code.
Repetition is preferred over shared dependencies. The AWS provider itself serves as the primary abstraction layer.
Rationale: The costs of abstraction in Terraform—coupled failures, increased blast radius, and maintenance overhead—outweigh the benefits of code reuse.
Modules require infrastructure: versioning, private registry, release cycle, migration path. Each version bump needs testing across consumers. Provider upgrades can break compatibility.
Generalization is expensive. Different volume sizes between staging and production require a variable. Different instance types require another variable. Different backup retention requires another variable. Every difference demands parameterization. The “reusable” module becomes a configuration maze.
The AWS provider is the abstraction. Resources have sensible defaults and documentation. Modules add a second interface to learn, additional abstraction layers to debug, and limitations when the module doesn’t expose required parameters.
Modules SHOULD be used for:
- Multiplying identical resources (e.g., creating N identical storage buckets from a list).
- Tightly coupled resources that cannot function independently (e.g., network + subnets + routing).
Modules MAY be used for:
- Standardizing high-variability patterns (e.g., instance initialization templates).
- Encapsulating non-obvious composition patterns not clear from provider documentation.
Modules SHOULD NOT be used for:
- Trivial wrappers around single resources.
- Forced bundling of unrelated resources.
- Re-implementing provider APIs.
| Good Module | Bad Module |
|---|---|
| Establishes standard for high-variability pattern | Wraps single resource with few options |
| Creates N identical things from a list | Bundles unrelated resources “for convenience” |
| One focused concern | Forced coupling of unrelated concerns |
| Adds value beyond provider | Re-implements provider API |
Modules don’t guarantee compliance. “Standardized modules ensure compliance” is a fallacy. Auditors verify the deployed state (e.g., “Is encryption enabled?”), not the code that produced it. Compliance is enforced by policy-as-code (AWS Config, OPA) scanning actual resources. Wrapping resources in “compliant modules” adds no value if policy checks the result anyway.
Wrapper modules break DRY. Wrapper modules repeat the provider’s API with a different interface. You end up maintaining two interfaces for the same thing—learning the module API instead of learning the provider API.
AI assistants mitigate repetition costs. AI coding assistants like Claude Code drastically reduce the cost of repetition. Finding all occurrences, making consistent edits across files, handling repetitive updates: much of the cognitive overhead that once justified DRY is absorbed by AI tooling.
File Organization
Organize files by workload and component.
A flat directory of .tf files is forbidden. Use a hierarchical structure based on logical workloads and physical components.
Logical organization (from AWS Well-Architected):
- Workload: A set of components that together deliver business value. Domain boundary.
- Component: Resources you provision and manage as a unit.
Physical deployment:
- Stack: A component with its own state file. The actual terraform apply target.
- Module: Reusable template for deploying multiple instances of the same resource.
| Logical | Physical |
|---|---|
| Workload | — |
| Component | Stack |
| — | Module |
terraform/
├── app/ # Workload
│ ├── .sops.yaml # Encryption config
│ ├── cluster/ # Component
│ ├── database/ # Component
│ ├── cache/ # Module
│ ├── cache-1/ # Component → Module
│ └── cache-2/ # Component → Module
└── monitoring/ # Workload
├── .sops.yaml
└── alerts/ # Component
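In this layout, cache-1/ and cache-2/ are stacks: each has its own state file and YAML inputs and simply instantiates the shared cache/ module. A minimal sketch of what cache-1/main.tf might contain; the inputs name and node_type are hypothetical and stand in for whatever cache/variables.tf actually declares:

```hcl
# terraform/app/cache-1/main.tf (hypothetical)
# The component owns state and providers; the module only supplies the template.
module "cache" {
  source = "../cache"

  # Hypothetical inputs, declared in this component's variables.tf and loaded from YAML
  name      = var.name
  node_type = var.node_type
}
```

cache-2/ would hold the same call with its own values, so either copy can be applied or rolled back without touching the other.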
One state file per component. Each component must support independent apply and independent rollback.
Do not use Terragrunt include blocks.
Terragrunt’s include blocks are a DRY feature that violates our constraint of using Terragrunt only for YAML loading. A common anti-pattern involves empty root terragrunt.hcl files with child configs using include "root":
workload/
├── terragrunt.hcl # Empty or near-empty "root" file (WRONG)
└── stacks/
└── component/
└── terragrunt.hcl # Contains include block (WRONG)
Symptom:
WARN Using `terragrunt.hcl` as the root of Terragrunt configurations is an
anti-pattern, and no longer recommended. In a future version of Terragrunt,
this will result in an error.
Bad:
include "root" {
path = find_in_parent_folders()
}
inputs = merge(
yamldecode(file(find_in_parent_folders("terraform.tfvars.yaml"))),
yamldecode(file("terraform.tfvars.yaml"))
)
Good:
inputs = merge(
yamldecode(file(find_in_parent_folders("terraform.tfvars.yaml"))),
yamldecode(file("terraform.tfvars.yaml"))
)
The fix: remove the include "root" block and delete the empty root terragrunt.hcl. Using find_in_parent_folders() to locate YAML files is fine—it finds the YAML file directly without needing a root terragrunt.hcl as a marker.
Component Dependencies
Declare components fully via explicit inputs.
Components SHOULD be fully declared by inputs, following the Kubernetes pattern.
This refers to deployment-time resolution. In Kubernetes manifests, a Pod explicitly references a ConfigMap by name; it does not query the cluster during deployment to discover it. Similarly, Terraform components should have their dependencies (such as Subnet IDs) passed explicitly as variables.
Components SHOULD NOT resolve dependencies at runtime: all values should be known at plan time. For values that change frequently with deploys, use terraform_remote_state instead (see Cross-Component Dependencies).
Rationale:
Terraform builds a dependency graph. When components share state or use data sources, changes cascade: resources are recreated, timing issues emerge, and “known after apply” values force terraform apply -target runs to provision some resources before the rest can be planned. Explicit inputs break the chain.
Inputs over data sources. Explicit over implicit.
Bad: Implicit dependency via data source.
data "aws_subnet" "this" {
id = "subnet-abc123" # Hidden dependency
}
Good: Explicit dependency via input variable.
variable "subnet_id" {
type = string
}
resource "aws_instance" "this" {
subnet_id = var.subnet_id
}
Deployment Ordering
Use numeric prefixes for sequential components.
When a workload requires ordered deployment, use alphanumerically sortable numeric prefixes (e.g., 10-, 20-).
workload/
├── 10-foundation/
├── 11-permissions/
├── 20-networking/
├── 30-storage/
└── 40-compute/
The ordering (10, 20, 30) is for humans to understand the intended deployment sequence. The safety net is the isolation principle: deploying out of order either fails cleanly (missing input) or succeeds (dependency exists). Numbering documents intent, not programmatic execution order.
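Prefixes MUST be alphanumerically sortable. Use two-digit ranges (10, 20, 30) instead of consecutive numbers (1, 2, 3). Each decade groups related components (e.g., 10-foundation and 11-permissions) and leaves room to insert new components without renumbering; consecutive single digits hide that grouping and stop sorting correctly once 10- appears (it lists before 2-).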
When a component contains multiple independent resources, they MAY be organized as subfolders:
workload/
└── 20-networking/
├── public/
├── private/
└── internal/
These sub-components MUST remain independent of each other.
File Structure
Maintain a consistent file set for every component and module.
The following files MUST exist in every component directory, even if empty:
component/
├── main.tf # Resources
├── variables.tf # Variable declarations
├── outputs.tf # Output declarations
├── terraform.tf # Backend/Provider config
├── terragrunt.hcl # YAML variable loading
├── terraform.tfvars.yaml # Variables (YAML)
└── terraform.tfvars.sops.yaml # OPTIONAL - only if component has secrets
The following files MUST exist in every module directory:
module/
├── main.tf # Resources and entrypoint
├── variables.tf # Variable declarations
├── outputs.tf # Output declarations
└── README.md # Documentation
Modules have no terraform.tf, YAML, or tfvars files—they inherit providers from the calling component and receive inputs as variables.
Do not split resources into multiple files unnecessarily. Group resources within main.tf using comment dividers. Split into thematic files (e.g., iam.tf) only when main.tf exceeds ~300 lines. Never use per-resource files (e.g., s3_bucket.tf).
Section comments SHOULD include reasoning or architectural context. They MUST NOT explain the code itself—code should be self-explanatory. When adding context, wrap it with a closing divider line.
Example: Section Divider
# ------------------------------------------------------------------------------
# Networking
# ------------------------------------------------------------------------------
resource "aws_network_interface" "this" {
# ...
}
Example: Section Divider with Context
# ------------------------------------------------------------------------------
# Networking
# ------------------------------------------------------------------------------
#
# Network interfaces decoupled from compute, enabling instance replacement
# without IP changes.
#
# Architecture:
# Server
# ├── Interface (persistent) → Stable IP across replacements
# └── Instance (ephemeral) → Replaceable compute
#
# ------------------------------------------------------------------------------
resource "aws_network_interface" "this" {
# ...
}
Naming Conventions
Use "this" for singleton resources. When a resource is the primary or only one of its kind in a module/component, name it "this".
Good:
resource "aws_security_group" "this" { ... }
Bad:
resource "aws_security_group" "main_security_group" { ... }
Use descriptive names for multiple resources. If multiple resources of the same type exist, use functional names.
resource "aws_security_group" "ingress" {
# ...
}
resource "aws_security_group" "egress" {
# ...
}
Variables, locals, and outputs: Use snake_case. Match provider attribute names where possible (e.g., instance_type). Prefix booleans with enable_ or use_. Only add outputs when the use case is clear—no speculative outputs.
Boolean Variables:
variable "enable_monitoring" {
type = bool
}
variable "use_private_subnet" {
type = bool
}
Output Naming:
output "instance_id" {
value = aws_instance.this.id
}
output "instance_arn" {
value = aws_instance.this.arn
}
Formatting
Code must pass terraform fmt. The formatter handles indentation (2 spaces), = alignment within blocks, and whitespace normalization.
The formatter does NOT enforce the following rules:
- Line Length: Soft limit of 100 characters.
- Blank Lines: One blank line between top-level blocks. No double blank lines. One blank line before nested blocks within a resource.
- Trailing Commas: Not required in maps/objects (newline is separator). Required in lists.
- Heredoc: Use <<-EOF to strip indentation.
- Multi-line Ternary: Wrap the expression in parentheses, with operators at the start of each line.
- For Expressions: Opening bracket on same line, closing aligned with block start.
Tags:
# Multi-key
tags = {
Name = var.name
Environment = var.environment
}
# Single-key: inline preferred
tags = { Name = var.name }
Heredoc:
user_data = <<-EOF
#!/bin/bash
echo "hello"
EOF
Multi-line Ternary:
# Parentheses are required: outside brackets, HCL ends an argument at the newline
subnet_id = (
  var.use_private
  ? var.private_subnet_id
  : var.public_subnet_id
)
For Expression:
locals {
instance_ids = [
for instance in aws_instance.this :
instance.id
]
}
Good:
resource "aws_instance" "this" {
  ami = var.ami_id

  root_block_device {
    volume_size = 20
  }

  tags = { Name = var.name }
}
Ordering
Order blocks consistently in files.
- terraform.tf: terraform {} block first, then provider {} blocks (alphabetical).
- variables.tf, outputs.tf: Group logically, then alphabetically within groups.
- main.tf: Order sections by dependency. Within sections: Locals, then Resources (logical, then alphabetical).
terraform.tf:
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
# ...
}
}
provider "aws" {
# ...
}
provider "aws" {
alias = "us_west"
# ...
}
variables.tf:
# Networking
variable "subnet_id" {
type = string
}
variable "vpc_id" {
type = string
}
# Compute
variable "instance_type" {
type = string
}
main.tf:
# ------------------------------------------------------------------------------
# IAM
# ------------------------------------------------------------------------------

locals {
  role_name = "${var.name}-role"
}

resource "aws_iam_role" "this" {
  name = local.role_name
  # ...
}

resource "aws_iam_role_policy_attachment" "this" {
  role = aws_iam_role.this.name
  # ...
}

resource "aws_iam_instance_profile" "this" {
  name = local.role_name
  role = aws_iam_role.this.name
}

# ------------------------------------------------------------------------------
# Compute
# ------------------------------------------------------------------------------

locals {
  instance_name = "${var.name}-instance"
}

resource "aws_instance" "this" {
  iam_instance_profile = aws_iam_instance_profile.this.name
  # ...

  tags = { Name = local.instance_name }
}
Order arguments within resource blocks as follows:
- Meta-arguments: count, for_each, provider.
- Identifiers: name, id, description.
- Configuration: Logically grouped, then alphabetical.
- Nested blocks: Logically grouped, then alphabetical.
- Tags.
- lifecycle.
- depends_on.
resource "aws_instance" "this" {
count = var.enabled ? 1 : 0
ami = var.ami_id
instance_type = var.instance_type
iam_instance_profile = aws_iam_instance_profile.this.name
subnet_id = var.subnet_id
vpc_security_group_ids = var.security_group_ids
root_block_device {
volume_size = 20
volume_type = "gp3"
}
tags = { Name = var.name }
lifecycle {
create_before_destroy = true
}
depends_on = [aws_iam_role.this]
}
Variables and Locals
Define variables with explicit types. Arguments within a variable block must be ordered: type, description (avoid if name is self-documenting), default, sensitive, validation.
variable "environment" {
type = string
default = "dev"
validation {
condition = contains(["dev", "staging", "prod"], var.environment)
error_message = "Invalid environment."
}
}
Use locals for complex transformations only. Do not use locals for simple pass-throughs or aliases. Use them for chained transformations, filtering, or complex calculations.
locals {
# Normalize input
raw_instances = [
for k, v in var.instances : merge(v, { name = k })
]
# Filter
enabled_instances = [
for inst in local.raw_instances : inst
if inst.enabled
]
# Transform to map for for_each
instances_map = {
for inst in local.enabled_instances : inst.name => inst
}
}
resource "aws_instance" "this" {
for_each = local.instances_map
ami = each.value.ami
instance_type = each.value.instance_type
tags = { Name = each.key }
}
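The pipeline above assumes var.instances is a map keyed by instance name. A sketch of a matching declaration; the attribute set is illustrative, and optional() with a default requires Terraform 1.3 or later, which the 1.5.0 minimum in terraform.tf already covers:

```hcl
variable "instances" {
  # enabled falls back to true when an entry omits it
  type = map(object({
    ami           = string
    enabled       = optional(bool, true)
    instance_type = string
  }))
}
```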
Coding Patterns
- Conditionals: Use count for 0-or-1 booleans, for_each for collections. Convert lists with toset(var.list).
- Nulls: Use coalesce() to handle nulls.
- Interpolation: Use bare references (e.g., var.id) instead of string interpolation ("${var.id}") where possible.
- References: Use one() for 0-or-1 resources. Avoid try() for references, as it masks errors.
- Dynamic blocks: Avoid. Use only when the number of nested blocks is determined by input variables. If the number is static, write blocks explicitly.
- Comments: Avoid inline comments. Code should be self-explanatory. Use comments only for unavoidable complexity.
Conditional Resources:
# 0-or-1: use count
resource "aws_instance" "this" {
count = var.enabled ? 1 : 0
# ...
}
# Collections: use for_each
resource "aws_instance" "this" {
for_each = var.instances
# ...
}
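for_each accepts only a map or a set of strings, which is why lists go through toset(). Keying instances by value rather than position also means removing an item from the middle of the list destroys only that resource, where count would shift every later index. A sketch with a hypothetical bucket_names variable:

```hcl
variable "bucket_names" {
  type = list(string)
}

# Instances are addressed by name, e.g. aws_s3_bucket.this["logs"],
# instead of by list position as with count.
resource "aws_s3_bucket" "this" {
  for_each = toset(var.bucket_names)

  bucket = each.key
}
```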
Null Handling:
name = coalesce(var.custom_name, "${var.prefix}-default")
String Interpolation:
# Good
subnet_id = var.subnet_id
# Avoid
subnet_id = "${var.subnet_id}"
Referencing Conditional Resources:
# Good - explicit about 0-or-1
output "instance_id" {
value = one(aws_instance.this[*].id)
}
# Avoid - swallows typos and wrong attributes
output "instance_id" {
value = try(aws_instance.this[0].id, null)
}
# Only when resource definitely exists
instance_id = aws_instance.this[0].id
Inline Comments:
# Avoid
subnet_id = var.subnet_id # the subnet to deploy into
# Only when necessary
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10) # /24 blocks starting at .10
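The dynamic-block exception applies when an input decides how many nested blocks exist. A sketch, assuming a hypothetical ingress_ports list supplied from YAML and the name and vpc_id variables used elsewhere in this guide; the egress rule is static, so it stays an explicit block:

```hcl
variable "ingress_ports" {
  type = list(number)
}

resource "aws_security_group" "this" {
  name   = var.name
  vpc_id = var.vpc_id

  # One ingress block per port; the iterator is named after the block ("ingress")
  dynamic "ingress" {
    for_each = var.ingress_ports

    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = { Name = var.name }
}
```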
IAM Policies
Policy documents SHOULD use aws_iam_policy_document data sources.
The data source provides validation at plan time, catches syntax errors before apply, and enables dynamic resource references. jsonencode MAY be used for trivial policies where the overhead isn’t justified.
data "aws_iam_policy_document" "assume_role" {
statement {
actions = ["sts:AssumeRole"]
effect = "Allow"
principals {
type = "Service"
identifiers = ["ec2.amazonaws.com"]
}
}
}
resource "aws_iam_role" "this" {
name = var.name
assume_role_policy = data.aws_iam_policy_document.assume_role.json
}
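For a policy this small, the jsonencode form is the permitted alternative. A sketch of the same EC2 trust policy written inline; unlike the data source, jsonencode knows nothing about IAM, so a misspelled key is only rejected by AWS at apply time rather than during plan:

```hcl
resource "aws_iam_role" "this" {
  name = var.name

  # Equivalent to data.aws_iam_policy_document.assume_role, built as a plain object
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action    = "sts:AssumeRole"
        Effect    = "Allow"
        Principal = { Service = "ec2.amazonaws.com" }
      },
    ]
  })
}
```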
Security Groups
Use inline ingress and egress blocks.
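Prefer defining rules inline within the aws_security_group resource rather than using separate aws_security_group_rule resources. This avoids ordering issues between separately applied rule resources and keeps logic contained. Never mix the two styles on the same group: the AWS provider documentation warns that inline rules and separate rule resources conflict and overwrite each other.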
resource "aws_security_group" "this" {
name = var.name
vpc_id = var.vpc_id
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = { Name = var.name }
}
Data Sources
Limit data sources to stable, non-workload resources.
Safe uses include aws_caller_identity, aws_region, or aws_ami (for base images). Do not use data sources to look up workload-specific resources, as this creates hidden dependencies.
data "aws_caller_identity" "this" {}
data "aws_region" "this" {}
data "aws_ami" "this" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*-x86_64-gp2"]
}
}
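These lookups describe the account, the region, and a public base image, never workload resources, so referencing them directly adds no hidden coupling. A sketch of typical consumers; the SSM parameter path /app/* is hypothetical, and data.aws_region.this.name assumes the AWS provider 5.x pinned in terraform.tf:

```hcl
resource "aws_instance" "this" {
  ami           = data.aws_ami.this.id
  instance_type = var.instance_type
  subnet_id     = var.subnet_id
}

data "aws_iam_policy_document" "read_parameters" {
  statement {
    actions = ["ssm:GetParameter"]
    effect  = "Allow"
    # Account ID and region come from the data sources instead of being hardcoded
    resources = [
      "arn:aws:ssm:${data.aws_region.this.name}:${data.aws_caller_identity.this.account_id}:parameter/app/*",
    ]
  }
}
```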
Cross-Component Dependencies
Use terraform_remote_state for values that change with deploys.
Explicit YAML inputs work well for stable values. But when Component B depends on Component A’s outputs that change frequently, manually gathering and updating YAML after each deploy adds overhead. terraform_remote_state reads outputs directly.
data "terraform_remote_state" "foundation" {
backend = "s3"
config = {
bucket = "company-terraform-state"
key = "workload/10-foundation.tfstate"
region = "us-east-1"
}
}
resource "aws_instance" "this" {
subnet_id = data.terraform_remote_state.foundation.outputs.subnet_id
}
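terraform_remote_state can read only the root-level outputs of the producing component, so 10-foundation must export the value explicitly. A sketch of the producer side, assuming the subnet is managed there as a hypothetical aws_subnet.this:

```hcl
# 10-foundation/outputs.tf
# Only root outputs are visible to terraform_remote_state; values from
# nested modules must be re-exported here.
output "subnet_id" {
  value = aws_subnet.this.id
}
```

Until 10-foundation has been applied, reading this output fails, which is the clean failure described under Deployment Ordering.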
Why terraform_remote_state over Terragrunt dependencies? Terragrunt dependencies try to refresh or deploy the dependency. terraform_remote_state just reads—no side effects.
Why terraform_remote_state over SSM? SSM duplicates data (output stored in state AND in SSM). Use SSM only when crossing tool boundaries.
| Method | Use When |
|---|---|
| Explicit YAML | Friction acceptable, values change rarely |
| terraform_remote_state | Terraform-to-Terraform, values change with deploys |
| SSM Parameter Store | Terraform-to-Ansible/bash, crossing tool boundaries |
YAML Configuration
Use YAML for all variable inputs. Terragrunt is required to load variables from YAML files.
Terraform has no native way to pass YAML as variable input (-var-file only accepts .tfvars or .tfvars.json). Starting with native Terraform means starting with .tfvars. Later, when SOPS encryption or complex nested structures are needed, everything must be converted. Mandating Terragrunt from day one ensures YAML usage everywhere and SOPS readiness without future migration.
YAML also aligns with Kubernetes and Ansible—one format across the infrastructure stack.
terraform.tfvars.yaml:
servers:
  server-1:
    subnet_id: subnet-abc123
    instance_type: t3.medium
  server-2:
    subnet_id: subnet-def456
    instance_type: t3.large
    user_data: |
      #!/bin/bash
      echo "hello"
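On the Terraform side, the servers map needs a matching declaration. A sketch of variables.tf plus a consuming resource; ami_id is an assumed extra input not shown in the YAML, and user_data is optional because only server-2 sets it:

```hcl
variable "ami_id" {
  type = string
}

variable "servers" {
  # user_data defaults to null for entries that omit it (server-1)
  type = map(object({
    instance_type = string
    subnet_id     = string
    user_data     = optional(string)
  }))
}

resource "aws_instance" "this" {
  for_each = var.servers

  ami           = var.ami_id
  instance_type = each.value.instance_type
  subnet_id     = each.value.subnet_id
  user_data     = each.value.user_data

  tags = { Name = each.key }
}
```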
terragrunt.hcl (simple):
inputs = yamldecode(file("terraform.tfvars.yaml"))
terragrunt.hcl (with secrets):
inputs = merge(
yamldecode(file("terraform.tfvars.yaml")),
yamldecode(sops_decrypt_file("terraform.tfvars.sops.yaml"))
)
Constraint: Use Terragrunt only for YAML loading. Do not use its DRY features (includes, dependency blocks).
Secrets with SOPS
Encrypt secrets in git using SOPS.
SOPS encrypts YAML values while keeping keys readable. Secrets live alongside code—no external secret store, no runtime dependency, full git history.
.sops.yaml at workload root defines encryption rules:
creation_rules:
  - path_regex: \.sops\.yaml$
    kms: arn:aws:kms:us-east-1:123456789:key/abc-123
terraform.tfvars.sops.yaml:
database_password: ENC[AES256_GCM,data:...,type:str]
api_key: ENC[AES256_GCM,data:...,type:str]
Terragrunt’s sops_decrypt_file() decrypts at plan/apply time. Values pass through as variables.
Workflow:
sops terraform.tfvars.sops.yaml # Edit secrets (decrypts in $EDITOR)
terragrunt plan # Decrypts automatically
Native Terraform cannot decrypt SOPS files. Terragrunt is required.
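Decrypted values arrive as ordinary variables, so the declarations that receive them should be marked sensitive to keep them out of plan and apply output. A sketch for the two keys in the example file; sensitive only redacts CLI output, and the values are still stored in plaintext in state, so encryption of the state bucket (encrypt = true in the backend block) still matters:

```hcl
variable "api_key" {
  type      = string
  sensitive = true
}

variable "database_password" {
  type      = string
  sensitive = true
}
```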
Workload-Level Variables
Shared variables at the workload root are OPTIONAL.
Common variables (Region, Account ID) MAY be defined in a root terraform.tfvars.yaml and merged into child components via find_in_parent_folders(). This pattern is useful when multiple components share the same values, but adds complexity. Start without shared files; add them when duplication becomes painful.
workload/
├── .sops.yaml # Encryption config (components inherit, MAY override)
├── terraform.tfvars.yaml # OPTIONAL - shared variables
├── terraform.tfvars.sops.yaml # OPTIONAL - shared secrets
├── 10-foundation/
│ ├── terragrunt.hcl
│ ├── terraform.tfvars.yaml # REQUIRED - component variables
│ └── terraform.tfvars.sops.yaml # OPTIONAL - only if component has secrets
└── 20-networking/
└── ...
component/terragrunt.hcl (without secrets):
inputs = yamldecode(file("terraform.tfvars.yaml"))
component/terragrunt.hcl (with secrets):
inputs = merge(
yamldecode(file("terraform.tfvars.yaml")),
yamldecode(sops_decrypt_file("terraform.tfvars.sops.yaml"))
)
component/terragrunt.hcl (with shared files):
inputs = merge(
yamldecode(file(find_in_parent_folders("terraform.tfvars.yaml"))),
yamldecode(sops_decrypt_file(find_in_parent_folders("terraform.tfvars.sops.yaml"))),
yamldecode(file("terraform.tfvars.yaml")),
yamldecode(sops_decrypt_file("terraform.tfvars.sops.yaml"))
)
When using shared files, workload-level YAML can serve as “global state”: a single file containing variables for all components. Terragrunt passes inputs to Terraform as TF_VAR_* environment variables, and Terraform ignores any that have no matching declaration in variables.tf, so each component extracts only what it needs.
State Management
Store state in S3 with DynamoDB locking.
Backend configuration must be explicit in terraform.tf. Paths MUST be hardcoded, not interpolated from variables. Dynamic paths risk accidental state migration when variables change. Locking MUST use a DynamoDB table shared across the organization.
State locking via DynamoDB prevents concurrent modifications. The lock table SHOULD be shared across all state files—one table handles locks for the entire organization.
terraform {
backend "s3" {
bucket = "company-terraform-state"
key = "workload/component.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-locks"
}
}
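The shared lock table must exist before any backend references it. A sketch of its definition, assuming it lives in a separate, hypothetical bootstrap component with its own state; the S3 backend requires the partition key to be a string attribute named LockID:

```hcl
# Singleton lock table shared by every state file in the organization
resource "aws_dynamodb_table" "this" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```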
Tagging
Apply tags via the default_tags provider block.
All resources must carry the following tags:
- Name: Stack name, typically matches the component name. Individual resources override this with their own Name tag (e.g., web-server-1) when needed.
- Workload: Identifies what workload the component belongs to.
- Component: Identifies the component within the workload.
- UUID: Guardrail that prevents collisions when selecting resources by tags. Without UUID, generic tags like Workload=cache would match all cache resources across different deployments. UUID SHOULD be applied at workload level, MAY be applied at component level.
Use Cases:
- Cost tracking by workload or specific deployment.
- SSM associations targeting instances by UUID.
- AWS Resource Groups filtered by tags.
provider "aws" {
default_tags {
tags = {
Name = var.name
Workload = "app"
Component = "database"
UUID = var.uuid
}
}
}
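Provider default_tags merge with resource-level tags, and the resource's own value wins when a key appears in both. A sketch of a per-resource Name override plus a hypothetical format check on var.uuid; ami_id and instance_type are assumed to be declared as in the earlier examples:

```hcl
variable "uuid" {
  type = string

  validation {
    condition     = can(regex("^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", var.uuid))
    error_message = "The uuid must be a lowercase, hyphenated UUID."
  }
}

resource "aws_instance" "this" {
  ami           = var.ami_id
  instance_type = var.instance_type

  # Replaces only the default Name; Workload, Component, and UUID still apply
  tags = { Name = "web-server-1" }
}
```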
Version Control
Pin major versions using ~> X.0. The ~> operator allows the rightmost component to increment. ~> 5.0 permits any 5.x release but blocks 6.0. This protects against breaking changes while receiving bug fixes and new features.
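Because ~> lets only the rightmost written component increase, the number of components in the constraint matters. A sketch contrasting the recommended form with a stricter one:

```hcl
terraform {
  required_providers {
    aws = {
      # "~> 5.0"   allows >= 5.0.0, < 6.0.0 (minor and patch releases)
      # "~> 5.0.0" allows >= 5.0.0, < 5.1.0 (patch releases only)
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```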
Pin tool versions using version files. Place version files at the workload root to enable automatic switching via tfenv and tgenv.
Version files: .terraform-version, .terragrunt-version
workload/
├── .terraform-version # e.g., 1.5.7
├── .terragrunt-version # e.g., 0.66.5
└── ...
Also declare requirements in terraform.tf:
terraform {
required_version = ">= 1.5.0"
}