AWS Cost Management | AdaptiveGears

AWS Cost Management Architecture

Use both AWS Budgets and Cost Anomaly Detection for cost visibility.

┌─────────────────┐     ┌─────────────────────┐
│  AWS Budgets    │     │  Cost Anomaly       │
│  (Thresholds)   │     │  (ML Detection)     │
└────────┬────────┘     └──────────┬──────────┘
         │                         │
         └──────────┬──────────────┘
                    ▼
              ┌───────────┐
              │ SNS Topic │ → Slack/Email
              └───────────┘

Each layer catches what the other misses:

Scenario	Budgets	Anomaly
Gradual cost creep toward limit	✓ Alerts at 100%	✗ Becomes “normal”
Sudden spike, still under budget	✗ No threshold crossed	✓ Detects deviation
Predictable seasonal increase	✗ May trigger false alarm	✓ ML learns the pattern

Budgets enforce hard limits. Anomaly detection catches the unexpected.

AWS Budget

Set static monthly limits with percentage-based alert thresholds.

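Predictable and explicit. Requires an existing SNS topic (created separately) whose access policy allows `budgets.amazonaws.com` (and, for the anomaly subscription below, `costalerts.amazonaws.com`) to publish.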

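variable "name" {
  type = string
}

variable "uuid" {
  type        = string
  description = "Value of the UUID tag on the resources this budget tracks"
}

locals {
  budget = 1000 # monthly limit in USD
}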

data "aws_sns_topic" "this" {
  name = "notifications"
}

resource "aws_budgets_budget" "this" {
  name         = var.name
  budget_type  = "COST"
  limit_amount = local.budget
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  cost_filter {
    name   = "TagKeyValue"
    values = [format("UUID$%s", var.uuid)]
  }

  # 100% Forecasted
  notification {
    comparison_operator       = "GREATER_THAN"
    threshold                 = 100
    threshold_type            = "PERCENTAGE"
    notification_type         = "FORECASTED"
    subscriber_sns_topic_arns = [data.aws_sns_topic.this.arn]
  }

  # 100% Actual
  notification {
    comparison_operator       = "GREATER_THAN"
    threshold                 = 100
    threshold_type            = "PERCENTAGE"
    notification_type         = "ACTUAL"
    subscriber_sns_topic_arns = [data.aws_sns_topic.this.arn]
  }

  tags = {
    Name = var.name
  }
}

Budget limit: Expected baseline plus anticipated usage, with ~20% buffer. The buffer accounts for estimation uncertainty and variable month lengths (28-31 days).

Thresholds:

100% forecasted → projected to exceed budget
100% actual → budget exceeded
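To see why the forecasted notification fires before the actual one, here is a sketch with a stand-in forecast. AWS Budgets uses its own forecasting model; the naive linear projection below (month-to-date spend ÷ days elapsed × days in month) is an assumption for illustration, and both function names are hypothetical.

```shell
# Hypothetical helper: naive linear month-end projection.
# This is NOT the forecasting model AWS Budgets uses.
# Usage: linear_forecast MTD_SPEND DAYS_ELAPSED DAYS_IN_MONTH
linear_forecast() {
  awk -v spend="$1" -v elapsed="$2" -v days="$3" \
    'BEGIN { printf "%.2f\n", spend / elapsed * days }'
}

# Hypothetical helper: print which of the two 100% notifications would fire.
# Usage: budget_alerts MTD_SPEND DAYS_ELAPSED DAYS_IN_MONTH LIMIT
budget_alerts() {
  forecast=$(linear_forecast "$1" "$2" "$3")
  awk -v actual="$1" -v forecast="$forecast" -v limit="$4" 'BEGIN {
    if (forecast + 0 > limit + 0) print "FORECASTED"  # GREATER_THAN, forecasted
    if (actual + 0 > limit + 0)   print "ACTUAL"      # GREATER_THAN, actual
  }'
}
```

With $400 spent by day 10 of a 31-day month, the projection is $1,240, so the forecasted notification fires against the $1,000 limit while the actual one stays quiet until spend itself passes $1,000.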

Utilization and Coverage Budgets

Monitor commitment usage to avoid paying for unused RIs or Savings Plans.

Budget Type	Question it answers	Alert when
`RI_UTILIZATION`	Are we using purchased RIs?	Below target (e.g., <80%)
`RI_COVERAGE`	What % of usage is covered by RIs?	Below target
`SAVINGS_PLANS_UTILIZATION`	Are we using purchased SPs?	Below target (e.g., <80%)
`SAVINGS_PLANS_COVERAGE`	What % of usage is covered by SPs?	Below target

resource "aws_budgets_budget" "sp_utilization" {
  name         = "savings-plans-utilization"
  budget_type  = "SAVINGS_PLANS_UTILIZATION"
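  # Utilization and coverage budgets only accept a limit of 100;
  # the 80% target is enforced by the LESS_THAN notification below.
  limit_amount = "100.0"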
  limit_unit   = "PERCENTAGE"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator       = "LESS_THAN"
    threshold                 = 80
    threshold_type            = "PERCENTAGE"
    notification_type         = "ACTUAL"
    subscriber_sns_topic_arns = [data.aws_sns_topic.this.arn]
  }
}

Low utilization means you’re paying for commitments you’re not using. Investigate:

Over-purchased capacity
Workload changes since purchase
Resources moved to different instance families (for EC2 RIs)
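Utilization is the share of the purchased commitment actually consumed in the period. A minimal sketch of the comparison the budget above performs, assuming you already have used and total commitment in dollars for the month; the helper names and the $730 figure ($1/hour over a 730-hour month) are illustrative.

```shell
# Hypothetical helper: commitment utilization as a percentage.
# Usage: utilization USED_COMMITMENT TOTAL_COMMITMENT
utilization() {
  awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", used / total * 100 }'
}

# Mirrors the LESS_THAN 80 notification: ALERT when utilization is below target.
# Usage: utilization_alert USED_COMMITMENT TOTAL_COMMITMENT [TARGET_PCT]
utilization_alert() {
  pct=$(utilization "$1" "$2")
  awk -v pct="$pct" -v target="${3:-80}" \
    'BEGIN { if (pct + 0 < target + 0) print "ALERT"; else print "OK" }'
}
```

Using $584 of a $730 commitment is exactly 80.0%, which does not trip a LESS_THAN 80 notification; $500 (68.5%) does.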

See: Creating a Budget

Budget Actions

Automate responses when budget thresholds are crossed.

Budget Actions execute automatically or queue for approval when thresholds trigger.

Action Type	What it does
`APPLY_IAM_POLICY`	Attach deny policy to users/roles/groups
`APPLY_SCP_POLICY`	Apply SCP at OU level (management account only)
`RUN_SSM_DOCUMENTS`	Stop EC2 or RDS instances via Systems Manager

Approval Model	Behavior
`AUTOMATIC`	Executes immediately when threshold crossed
`MANUAL`	Queues action, notifies via SNS, requires approval

Manual approval flow:

Budget exceeded → Action queued → SNS notification →
Human reviews → Approves via console/CLI → Action executes

Use case: Budget hits 100% → automatically apply deny policy → blocks new EC2 launches → prevents runaway spend.

Constraints:

Requires IAM role granting Budgets permission to execute actions
SCPs apply at OU level only, not individual accounts
Actions can auto-reverse when budget returns to range

See: Configuring Budget Actions

Budget Estimation

Use Cost Explorer CLI to analyze historical costs before setting budget limits.

Step 1: Get monthly costs for the last 3 months to establish baseline:

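# macOS (BSD): set the day to 1 first, then step back 3 months
START=$(date -v1d -v-3m +%Y-%m-01)

# Linux (GNU): anchor on the 1st so "-3 months" can't overflow on the 29th-31st
START=$(date -d "$(date +%Y-%m-01) -3 months" +%Y-%m-01)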

aws ce get-cost-and-usage \
  --time-period Start=$START,End=$(date +%Y-%m-01) \
  --granularity MONTHLY \
  --metrics UnblendedCost \
  --filter '{"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Simple Storage Service"]}}'
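If you'd rather not branch on BSD vs GNU `date`, the start month can be computed with plain shell arithmetic. A minimal POSIX sh sketch; `months_ago_start` is a hypothetical helper, not part of the AWS CLI.

```shell
# Hypothetical helper: first day of the month N months before YEAR-MONTH.
# Usage: months_ago_start YEAR MONTH N   (MONTH is 1-12, leading zero allowed)
months_ago_start() {
  # Strip one leading zero so "08"/"09" from `date +%m` aren't read as octal.
  m=${2#0}
  # Count months since year 0 so crossing a year boundary is plain subtraction.
  total=$(( $1 * 12 + m - 1 - $3 ))
  printf '%04d-%02d-01\n' $(( total / 12 )) $(( total % 12 + 1 ))
}

# Example: start of the window covering the last three full months.
START=$(months_ago_start "$(date +%Y)" "$(date +%m)" 3)
```

Pass the result as `Start=$START` in the Step 1 command; the End date stays `$(date +%Y-%m-01)`, which Cost Explorer treats as exclusive.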

Step 2: If a month looks anomalous, drill down to daily costs:

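# Daily costs for the current month to date. End is exclusive, so this fails
# on the 1st (Start equals End); for a past month use Start=YYYY-MM-01 and
# End=<first day of the following month>.
aws ce get-cost-and-usage \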
  --time-period Start=$(date +%Y-%m-01),End=$(date +%Y-%m-%d) \
  --granularity DAILY \
  --metrics UnblendedCost \
  --filter '{"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Simple Storage Service"]}}'

Step 3: Project monthly cost with buffer:

Daily average × 31 days = Projected monthly
Projected monthly × 1.2 = Budget limit (with 20% buffer)

Projecting with 31 days already covers the longest month, so the 20% buffer is there to absorb estimation uncertainty.
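Step 3 as a small script, assuming you have collected the daily `UnblendedCost` amounts (for example from the Step 2 output) one per line. `budget_limit` is a hypothetical helper; the 31-day projection and 20% default buffer follow the formula above.

```shell
# Hypothetical helper: project a monthly budget limit from daily costs.
# Reads one daily UnblendedCost amount per line on stdin (blank lines skipped).
# Usage: ... | budget_limit [BUFFER]   (BUFFER defaults to 0.20)
budget_limit() {
  awk -v buffer="${1:-0.20}" '
    NF { sum += $1; n++ }
    END {
      if (n == 0) exit 1                 # no input, nothing to project
      avg = sum / n                      # daily average
      projected = avg * 31               # longest month
      printf "daily_avg=%.2f projected=%.2f limit=%.2f\n", avg, projected, projected * (1 + buffer)
    }'
}
```

For daily costs of $30, $32 and $28, the average is $30.00, the 31-day projection $930.00, and the limit $1,116.00.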

AWS Cost Anomaly Detection

Use ML-based monitors to detect deviations from learned spending patterns.

Alerts fire regardless of budget limits, catching spikes that stay under any budget threshold.

resource "aws_ce_anomaly_monitor" "this" {
  name         = var.name
  monitor_type = "CUSTOM"

  monitor_specification = jsonencode({
    And            = null
    CostCategories = null
    Dimensions     = null
    Not            = null
    Or             = null
    Tags = {
      Key          = "user:UUID"
      Values       = [var.uuid]
      MatchOptions = ["EQUALS"]
    }
  })

  tags = {
    Name = var.name
  }
}

resource "aws_ce_anomaly_subscription" "this" {
  name             = var.name
  frequency        = "IMMEDIATE"
  monitor_arn_list = [aws_ce_anomaly_monitor.this.arn]

  subscriber {
    type    = "SNS"
    address = data.aws_sns_topic.this.arn
  }

  threshold_expression {
    and {
      dimension {
        key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
        match_options = ["GREATER_THAN_OR_EQUAL"]
        values        = [local.budget * 0.05]
      }
    }
    and {
      dimension {
        key           = "ANOMALY_TOTAL_IMPACT_PERCENTAGE"
        match_options = ["GREATER_THAN_OR_EQUAL"]
        values        = ["20"]
      }
    }
  }

  tags = {
    Name = var.name
  }
}

Thresholds (AND logic):

Absolute: ≥$50 impact (~5% of budget limit)
Percentage: ≥20% above ML-predicted cost

Both conditions MUST be true, filtering noise while catching real spikes.
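The AND gate for a single anomaly, as a sketch. It assumes total impact is actual minus expected spend and impact percentage is that difference relative to expected spend, which is how the two `ANOMALY_TOTAL_IMPACT_*` dimensions are described; `anomaly_alert` is hypothetical and defaults to the $50 / 20% thresholds above.

```shell
# Hypothetical helper mirroring the threshold_expression above (AND logic).
# Assumes impact = actual - expected and impact % = impact / expected * 100.
# Usage: anomaly_alert ACTUAL EXPECTED [MIN_ABS_USD] [MIN_PCT]   (EXPECTED > 0)
anomaly_alert() {
  awk -v actual="$1" -v expected="$2" -v min_abs="${3:-50}" -v min_pct="${4:-20}" 'BEGIN {
    impact = actual - expected
    pct = impact / expected * 100
    if (impact >= min_abs + 0 && pct >= min_pct + 0) print "ALERT"; else print "SUPPRESSED"
  }'
}
```

The two suppressed cases in the checks show what each threshold filters out: a $60 rise on a $1,000 baseline is only 6%, and a jump from $10 to $30 is 200% but only $20.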

Absolute threshold calibration: set it to ~5% of the budget limit. This follows error-budget thinking: if your 20% buffer is your “cost error budget,” alert when a single anomaly threatens to consume ~25% of that buffer (0.20 × 0.25 = 0.05).
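The calibration reduces to one multiplication. With the $1,000 limit from `local.budget`, it reproduces the $50 threshold used in the subscription; `absolute_threshold` is a hypothetical helper.

```shell
# Hypothetical helper: absolute anomaly threshold = limit × buffer × share of buffer.
# Usage: absolute_threshold LIMIT [BUFFER] [SHARE]   (defaults: 0.20 and 0.25)
absolute_threshold() {
  awk -v limit="$1" -v buffer="${2:-0.20}" -v share="${3:-0.25}" \
    'BEGIN { printf "%.2f\n", limit * buffer * share }'
}
```

A $5,000 limit would scale the same rule to a $250 threshold.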

AWS Cost Categories

Use Cost Categories to group costs by rules, not by tagging resources.

Cost Categories apply to cost line items, not resources. Define rules based on dimensions (account, service, tag, region), and AWS automatically categorizes all matching costs.

┌─────────────────────────────────────────────────────────────┐
│                      Cost Categories                        │
├─────────────────────────────────────────────────────────────┤
│  Tags                          Cost Categories              │
│  ────                          ───────────────              │
│  Applied to: Resources         Applied to: Cost line items  │
│  Requires: Tagging each        Requires: Rules              │
│  Retroactive: Backfill only    Retroactive: Yes (in month)  │
└─────────────────────────────────────────────────────────────┘

Rule Dimensions

Cost Categories can group by:

Dimension	Example
Account	Account ID or name
Service	`AmazonS3`, `AmazonEC2`, `AWSLambda`
Region	`us-east-1`, `eu-west-1`
Tag	Any activated cost allocation tag
Charge Type	Usage, Tax, Fee, Refund
Cost Category	Another cost category (hierarchical)

Rule Types

Regular rules - static mapping:

rule {
  value = "Platform"
  rule {
    dimension {
      key           = "LINKED_ACCOUNT"
      values        = ["111111111111", "222222222222"]
      match_options = ["EQUALS"]
    }
  }
}
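A rough model of how regular rules map line items to values, assuming rules are evaluated in order, the first match wins, and unmatched costs fall through to the category's default value (the `default_value` argument of `aws_ce_cost_category`). Shell `case` is also first-match, so it makes a compact model; the `Data` rule, account `333333333333`, and the `Other` default are hypothetical.

```shell
# Hypothetical sketch of first-match rule evaluation for one cost line item.
# Usage: categorize ACCOUNT_ID
categorize() {
  case "$1" in
    111111111111|222222222222) echo "Platform" ;;  # the regular rule above
    333333333333)              echo "Data" ;;      # hypothetical second rule
    *)                         echo "Other" ;;     # assumed default value
  esac
}
```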

Inherited value rules - dynamic from tag values:

rule {
  type = "INHERITED_VALUE"
  inherited_value {
    dimension_name = "TAG"
    dimension_key  = "Team"
  }
}

Inherited rules automatically create category values from tag values. If resources have Team=alpha, Team=beta, the cost category gets values alpha, beta without manual rule updates.
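The mechanics of inherited values in miniature: the category values are the distinct tag values present in the cost data, so a new `Team=gamma` shows up without a rule change. The sketch assumes untagged line items fall back to a default value; `inherited_values` and `untagged` are hypothetical names.

```shell
# Hypothetical sketch: derive inherited category values from Team tag values.
# Input: one Team tag value per cost line item on stdin (empty line = untagged).
# Output: the distinct category values, sorted bytewise.
inherited_values() {
  default_value=${1:-untagged}
  # Map untagged (empty) lines to the default, then de-duplicate.
  sed "s/^\$/$default_value/" | LC_ALL=C sort -u
}
```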

Service-Level Anomaly Detection

Use Cost Categories to scope anomaly monitors to a specific AWS service.

Anomaly monitors cannot filter by service directly. Create a Cost Category first, then reference it in the monitor specification.

resource "aws_ce_cost_category" "this" {
  name         = var.name
  rule_version = "CostCategoryExpression.v1"

  rule {
    value = "S3"

    rule {
      dimension {
        key           = "SERVICE_CODE"
        values        = ["AmazonS3"]
        match_options = ["EQUALS"]
      }
    }
  }

  default_value = "Other"
}

resource "aws_ce_anomaly_monitor" "this" {
  name         = var.name
  monitor_type = "CUSTOM"

  monitor_specification = jsonencode({
    And        = null
    Dimensions = null
    Not        = null
    Or         = null
    Tags       = null
    CostCategories = {
      Key          = aws_ce_cost_category.this.name
      Values       = ["S3"]
      MatchOptions = ["EQUALS"]
    }
  })

  tags = { Name = var.name }
}

Common service codes: AmazonS3, AmazonEC2, AmazonRDS, AWSLambda.

Constraints:

Cost Categories take up to 24 hours to populate after creation
Only management account can create/manage
Retroactive within current month only

See: Organizing Costs Using Cost Categories

AWS Data Exports

Use Data Exports (CUR 2.0) for granular cost and usage data in S3.

CUR 2.0 delivers detailed billing data to S3 for analysis with Athena, QuickSight, or custom pipelines.

Features:

Fixed schema with nested key-value pairs for tags, cost categories, product attributes
SQL-based column selection and row filtering
Split cost allocation data for ECS/EKS container costs

Setup

Billing Console → Data Exports → Create export →
Select table (CUR 2.0) → Choose S3 bucket → Configure columns

The CLI and API also accept an SQL statement (a supported subset, not arbitrary SQL) for column selection, row filters, and column renaming.

Constraints:

Output as Parquet (columnar, optimized for Athena queries) or gzip-compressed CSV
No backfill—data starts from export creation date
Delivered to S3 (standard storage costs apply)

See: What is AWS Data Exports?

AWS Cost Allocation Tags

Tags on resources are NOT the same as tags in billing. Activation is required.

Resource tags and cost allocation tags are separate concepts. A tag applied to an EC2 instance does nothing for cost tracking until you explicitly activate it in the Billing console.

Tag Types

AWS provides two tag types, activated separately:

Type	Prefix	Source	Scope
AWS-generated	`aws:`	Created by AWS automatically	Limited services (no Lambda, RDS, SNS)
User-defined	`user:`	Created by you	All taggable resources

AWS-generated tags (e.g., aws:createdBy) auto-enable for all member accounts once activated. User-defined tags require manual application but offer full control over your business taxonomy.

Use both. AWS-generated catches what you forgot to tag. User-defined expresses your cost structure.

See: AWS-Generated vs User-Defined Cost Allocation Tags

Activation

Activate tags in the Billing console or via CLI before they appear in Cost Explorer or Budgets.

Console:

Billing Console → Cost Allocation Tags → Select tags → Activate

CLI:

# List inactive tags
aws ce list-cost-allocation-tags --status Inactive

# Activate tags (max 20 per request)
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status \
    TagKey=Environment,Status=Active \
    TagKey=Project,Status=Active \
    TagKey=UUID,Status=Active
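Since each request accepts at most 20 tags, a longer list has to be split into batches. A sketch that prints one argument list per request; `activation_batches` is hypothetical, and the commented pipeline assumes the `list-cost-allocation-tags` output exposes keys at `CostAllocationTags[].TagKey`.

```shell
# Hypothetical helper: read tag keys (one per line) on stdin and print one line
# per update-cost-allocation-tags-status request, at most 20 tags per line.
# xargs splits on whitespace and treats quotes specially, so this assumes
# simple tag keys without spaces or quotes.
activation_batches() {
  sed 's/.*/TagKey=&,Status=Active/' | xargs -n 20 echo
}

# Possible use (not executed here):
# aws ce list-cost-allocation-tags --status Inactive \
#     --query 'CostAllocationTags[].TagKey' --output text | tr '\t' '\n' |
#   activation_batches | while read -r batch; do
#     # $batch is intentionally unquoted so each TagKey=... is its own argument
#     aws ce update-cost-allocation-tags-status --cost-allocation-tags-status $batch
#   done
```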

Constraints:

Only management account can activate tags
Takes up to 24 hours to appear after activation
Maximum 500 active cost allocation tags
When moving accounts between organizations, tags lose “active” status—reactivate in new org

See: Activating User-Defined Cost Allocation Tags, update-cost-allocation-tags-status CLI

Retroactivity

Cost allocation tags are prospective by default.

Activating a tag today shows costs from today forward. Historical costs remain untagged.

Backfill (since March 2024) allows retroactive application up to 12 months:

Billing Console → Cost Allocation Tags → Backfill tags → Select month

Constraints:

Resource MUST have had the tag at that time - can’t invent history
Backfill date must be 1st of month (billing period start)
One backfill request per 24 hours
Updates Cost Explorer, Data Exports, CUR within 24 hours

Timeline example:

June 2024     - Tag "Project=X" applied to resource
November 2024 - Tag activated for cost allocation
December 2024 - Backfill requested from January 2024

Result:
- Jan-May 2024: No tag values (tag wasn't on resource)
- Jun-Dec 2024: Tag values visible in cost data
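The result follows one rule: a month can show the tag only if it is on or after the backfill start and the resource already carried the tag. A sketch that reproduces the timeline at month granularity (a simplification: usage earlier in the month the tag was applied would still be untagged). `backfill_visibility` is hypothetical; months are `YYYY-MM`.

```shell
# Hypothetical helper: for each month from FROM (backfill start) to TO,
# inclusive, print whether cost data can show the tag.
# Usage: backfill_visibility TAG_APPLIED FROM TO   (all YYYY-MM)
backfill_visibility() {
  awk -v applied="$1" -v from="$2" -v to="$3" 'BEGIN {
    split(from, f, "-"); split(to, t, "-")
    for (m = f[1] * 12 + f[2] - 1; m <= t[1] * 12 + t[2] - 1; m++) {
      ym = sprintf("%04d-%02d", int(m / 12), m % 12 + 1)
      # YYYY-MM strings sort chronologically, so string comparison suffices.
      print ym, (ym >= applied ? "tagged" : "untagged")
    }
  }'
}
```

For the example above, `backfill_visibility 2024-06 2024-01 2024-12` reports January through May as untagged and June through December as tagged.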

See: Backfill Cost Allocation Tags

Tagging Strategy

Use hierarchical tags for aggregation and unique tags for isolation.

Cost allocation serves two purposes: aggregate costs for reporting (showback) and isolate costs for alerting (budgets, anomaly detection). Different tag types serve each purpose.

┌─────────────────────────────────────────────────────────────┐
│                         Workload: app                       │
│                         UUID: a1b2c3d4                      │
├─────────────────────────────┬───────────────────────────────┤
│   Component: database       │   Component: cache            │
├─────────────────────────────┼───────────────────────────────┤
│   Name: primary-db          │   Name: redis-1               │
│   Name: replica-db          │                               │
└─────────────────────────────┴───────────────────────────────┘

Tag	Scope	Purpose
`Name`	Per resource	Human-readable identifier in console
`Workload`	Shared across deployment	Group of resources delivering business value
`Component`	Shared within component	Logical unit within workload
`UUID`	Workload or component level	Collision guardrail for precise filtering

Aggregation (shared tags):

Workload=app → total cost of the app across all deployments
Component=database → cost of database components

Isolation (unique tags):

UUID=a1b2c3d4 → cost of this specific deployment

Without UUID, generic tags match unrelated resources:

# Bad - matches all production resources
cost_filter {
  name   = "TagKeyValue"
  values = ["Environment$production"]
}

# Good - scoped to exact deployment
cost_filter {
  name   = "TagKeyValue"
  values = [format("UUID$%s", var.uuid)]
}

UUID enables:

Budget alerts scoped to specific infrastructure
Anomaly detection without cross-deployment noise
SSM associations targeting instances by deployment
AWS Resource Groups filtered to exact resources

See: Terraform Tagging for implementation with default_tags

Tag Key Format by Service

Tag key syntax differs across AWS cost services.

Service	Format	Example
Cost Explorer	`tag:KeyName`	`tag:Environment`
Budgets cost_filter	`TagKeyValue` with `Key$Value`	`Environment$production`
Anomaly Detection	`user:` prefix	`user:Environment`

This inconsistency causes silent failures. A filter that works in Cost Explorer won’t work in Budgets without reformatting.
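A small helper that renders one tag in each of the three formats from the table, so the conversion lives in one place. `tag_filter` is hypothetical and encodes only the conventions listed above.

```shell
# Hypothetical helper: format a tag for the given cost service, per the table above.
# Usage: tag_filter SERVICE KEY VALUE   (SERVICE: explorer | budgets | anomaly)
tag_filter() {
  case "$1" in
    explorer) printf 'tag:%s\n' "$2" ;;       # Cost Explorer key (value filtered separately)
    budgets)  printf '%s$%s\n' "$2" "$3" ;;   # value for a TagKeyValue cost_filter
    anomaly)  printf 'user:%s\n' "$2" ;;      # Tags.Key in monitor_specification
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}
```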

See: Using Cost Allocation Tags

Split Cost Allocation

Use split cost allocation to attribute shared EC2 costs to individual containers.

EC2-backed ECS tasks and EKS pods share instance costs. Standard billing shows EC2 line items, not container-level breakdown. Split cost allocation calculates each container’s share based on CPU and memory consumption.

┌─────────────────────────────────────┐
│         EC2 Instance ($100)         │
├─────────────┬─────────────┬─────────┤
│  Pod A      │  Pod B      │  Pod C  │
│  40% CPU    │  35% CPU    │  25%    │
│  $40        │  $35        │  $25    │
└─────────────┴─────────────┴─────────┘
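The diagram's arithmetic as a sketch: each pod receives the instance cost in proportion to its consumption. This is a simplification with a single usage figure per pod (CPU, as in the diagram); the actual split cost allocation calculation also weighs memory and accounts for unused instance capacity, neither of which is modeled here. `split_cost` is hypothetical.

```shell
# Hypothetical helper: split INSTANCE_COST across pods in proportion to usage.
# Input lines on stdin: "<pod> <usage>" (any consistent unit, e.g. vCPU-hours).
# Usage: split_cost INSTANCE_COST
split_cost() {
  awk -v cost="$1" '
    { pod[NR] = $1; usage[NR] = $2; total += $2 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s %.2f\n", pod[i], cost * usage[i] / total
    }'
}
```

Feeding it `podA 40`, `podB 35`, `podC 25` with a $100 instance reproduces the $40 / $35 / $25 split shown above.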

Opt-in required (two steps):

1. Cost Management Preferences → Split cost allocation data → Enable
2. CUR report → Edit → Report content → Split cost allocation data ✓

For EKS, AWS auto-generates cost allocation tags:

Tag	Description
`aws:eks:cluster-name`	Cluster name
`aws:eks:namespace`	Kubernetes namespace
`aws:eks:node`	Node name
`aws:eks:workload-type`	ReplicaSet, StatefulSet, Job, DaemonSet
`aws:eks:workload-name`	Workload name
`aws:eks:deployment`	Parent deployment (ReplicaSets only)

EKS also supports importing Kubernetes labels as cost allocation tags (up to 50 per pod).

Constraints:

Data appears in CUR only, not Cost Explorer
Significant CUR volume increase (2-3 new line items per container per hour)
EKS accelerator support (GPU, Trainium, Inferentia) adds third line item
Fargate tasks already have discrete costs—split allocation not needed

See: Understanding Split Cost Allocation Data, Enabling Split Cost Allocation