
Building a CI Policy Gate for AWS Without Third-Party Tools

Daniel Ferreira May 6, 2025

The goal: block deployments that would violate your AWS SCPs before the terraform apply runs. Do it using only AWS APIs and Python. No OPA server, no Sentinel, no third-party policy engine. Here's a working approach and an honest assessment of where it breaks down when you try to scale it.

This is a post for teams that want to understand the underlying mechanics before deciding whether to build or buy. We'll get to the limitations — they're real and important — but the foundation is useful regardless of what you decide to run in production.

What the IAM Policy Simulator Actually Does

AWS provides a policy simulation API: iam:SimulatePrincipalPolicy. It takes a principal ARN, a list of action strings, and optional resource ARNs and context keys. It doesn't make the calls. Instead, it returns whether each action would be allowed or denied under the policies that currently apply to that principal, including the SCPs that apply to its account when the account belongs to an organization.

import boto3

def simulate_policy(principal_arn, actions, resource_arns=None, context_entries=None):
    iam = boto3.client('iam')

    if resource_arns is None:
        resource_arns = ['*']
    if context_entries is None:
        context_entries = []

    # SimulatePrincipalPolicy can truncate its results (IsTruncated/Marker),
    # so iterate over every page rather than reading a single response.
    paginator = iam.get_paginator('simulate_principal_policy')
    pages = paginator.paginate(
        PolicySourceArn=principal_arn,
        ActionNames=actions,
        ResourceArns=resource_arns,
        ContextEntries=context_entries
    )

    results = {}
    for page in pages:
        for result in page['EvaluationResults']:
            action = result['EvalActionName']
            decision = result['EvalDecision']  # 'allowed', 'explicitDeny' or 'implicitDeny'
            results[action] = {
                'decision': decision,
                'allowed': decision == 'allowed',
                'matched_statements': result.get('MatchedStatements', [])
            }

    return results

For a Terraform deployment role, you'd call this with the role's ARN and the set of actions your Terraform configuration will call. If any action comes back as implicitDeny or explicitDeny, the gate fails and the pipeline stops.
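When a check fails, it helps to know whether an SCP or the role's own policies caused the denial. Each EvaluationResult from SimulatePrincipalPolicy can carry an OrganizationsDecisionDetail with an AllowedByOrganizations flag (present when the account is in an organization) and a PermissionsBoundaryDecisionDetail with an AllowedByPermissionsBoundary flag. The sketch below is a hypothetical helper, explain_denial, that works on the raw result dictionaries. It assumes no resource policy was passed to the simulation, and the labels it returns are our own wording, not AWS terminology.

```python
def explain_denial(result):
    """Return a short reason for one entry of EvaluationResults.

    Assumes no ResourcePolicy was passed to the simulation, so any remaining
    explicit deny is attributed to the principal's identity policies.
    """
    if result.get('EvalDecision') == 'allowed':
        return 'allowed'

    # Present only when the principal's account belongs to an organization.
    org = result.get('OrganizationsDecisionDetail')
    if org is not None and not org.get('AllowedByOrganizations', True):
        return 'denied by an SCP'

    # Present only when the principal has a permissions boundary attached.
    boundary = result.get('PermissionsBoundaryDecisionDetail')
    if boundary is not None and not boundary.get('AllowedByPermissionsBoundary', True):
        return 'denied by the permissions boundary'

    if result.get('EvalDecision') == 'explicitDeny':
        return 'explicit deny in an identity policy'
    return 'no identity policy allows it (implicit deny)'
```

Printing this next to each failure makes it obvious whether to talk to the team that owns the SCPs or to fix the deployment role itself.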

Wiring It Into GitHub Actions

A minimal GitHub Actions workflow that runs the policy gate before a Terraform plan:

name: Policy Gate + Plan

on:
  pull_request:
    paths:
      - 'infra/**'

permissions:
  id-token: write
  contents: read

jobs:
  policy-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_POLICY_CHECK_ROLE_ARN }}
          aws-region: us-east-1

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: pip install boto3

      - name: Run policy gate
        env:
          DEPLOY_ROLE_ARN: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}
          REQUIRED_ACTIONS_FILE: infra/required-actions.json
        run: python scripts/policy_gate.py

  terraform-plan:
    needs: policy-gate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_DEPLOY_ROLE_ARN }}
          aws-region: us-east-1
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: infra
      - run: terraform plan
        working-directory: infra

The policy_gate.py script reads a required-actions.json file that lists the AWS API actions the Terraform configuration is expected to call, then runs the simulation:

import boto3
import json
import os
import sys

def main():
    deploy_role_arn = os.environ['DEPLOY_ROLE_ARN']
    actions_file = os.environ.get('REQUIRED_ACTIONS_FILE', 'required-actions.json')

    with open(actions_file) as f:
        config = json.load(f)

    iam = boto3.client('iam')
    failures = []

    for check in config['checks']:
        actions = check['actions']
        resource_arns = check.get('resources', ['*'])
        context = check.get('context', [])

        paginator = iam.get_paginator('simulate_principal_policy')
        pages = paginator.paginate(
            PolicySourceArn=deploy_role_arn,
            ActionNames=actions,
            ResourceArns=resource_arns,
            ContextEntries=context
        )

        for page in pages:
            for result in page['EvaluationResults']:
                if result['EvalDecision'] != 'allowed':
                    failures.append({
                        'action': result['EvalActionName'],
                        'decision': result['EvalDecision'],
                        'resource': result.get('EvalResourceName', '*')
                    })

    if failures:
        print("Policy gate FAILED. The following actions are denied:")
        for failure in failures:
            print(f"  {failure['action']} on {failure['resource']}: {failure['decision']}")
        sys.exit(1)
    else:
        print("Policy gate PASSED. All required actions are allowed.")

if __name__ == '__main__':
    main()

The required-actions.json looks like this:

{
  "checks": [
    {
      "actions": ["s3:CreateBucket", "s3:PutEncryptionConfiguration", "s3:PutBucketPublicAccessBlock"],
      "resources": ["arn:aws:s3:::*"],
      "context": [
        {
          "ContextKeyName": "aws:RequestedRegion",
          "ContextKeyValues": ["us-east-1"],
          "ContextKeyType": "string"
        }
      ]
    },
    {
      "actions": ["ec2:RunInstances", "ec2:DescribeInstances"],
      "resources": ["*"]
    }
  ]
}
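Because the gate trusts this file completely, a typo can quietly weaken it. In policy_gate.py, a misspelled "resources" key falls back to ['*'] and a misspelled "context" key falls back to no context, and neither raises an error. A small validation pass before calling AWS closes that gap. The sketch below, validate_config, is a hypothetical helper. The ContextKeyType set matches the values the IAM API accepts, but the action pattern only checks the "service:Action" shape. It can't tell you whether an action name is real: S3's PutBucketEncryption API, for example, is authorized by the s3:PutEncryptionConfiguration action, not by an action with the API's name.

```python
import re

# Keys that policy_gate.py reads from each check.
ALLOWED_CHECK_KEYS = {'actions', 'resources', 'context'}
CONTEXT_KEY_TYPES = {
    'string', 'stringList', 'numeric', 'numericList', 'boolean', 'booleanList',
    'ip', 'ipList', 'binary', 'binaryList', 'date', 'dateList',
}
# "service:Action", e.g. "s3:CreateBucket". Checks shape only, not existence.
ACTION_RE = re.compile(r'^[a-z0-9-]+:[A-Za-z0-9]+$')


def validate_config(config):
    """Return a list of problems in a required-actions config (empty if none)."""
    checks = config.get('checks')
    if not isinstance(checks, list) or not checks:
        return ["'checks' must be a non-empty list"]

    problems = []
    for i, check in enumerate(checks):
        unknown = set(check) - ALLOWED_CHECK_KEYS
        if unknown:
            problems.append(f"check {i}: unknown keys {sorted(unknown)}")

        actions = check.get('actions')
        if not isinstance(actions, list) or not actions:
            problems.append(f"check {i}: 'actions' must be a non-empty list")
        else:
            for action in actions:
                if not isinstance(action, str) or not ACTION_RE.match(action):
                    problems.append(f"check {i}: malformed action {action!r}")

        for entry in check.get('context', []):
            missing = {'ContextKeyName', 'ContextKeyValues', 'ContextKeyType'} - set(entry)
            if missing:
                problems.append(f"check {i}: context entry missing {sorted(missing)}")
            elif entry['ContextKeyType'] not in CONTEXT_KEY_TYPES:
                problems.append(f"check {i}: bad ContextKeyType {entry['ContextKeyType']!r}")
    return problems
```

Calling it right after json.load in main() and exiting non-zero when it returns anything turns a silent weakening of the gate into a visible failure.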

The Role Separation: Policy Check Role vs. Deploy Role

Notice the workflow uses two different roles: AWS_POLICY_CHECK_ROLE_ARN for the gate step and AWS_DEPLOY_ROLE_ARN for the Terraform plan step. This separation is important.

The policy check role needs only iam:SimulatePrincipalPolicy on the deployment role. It doesn't need the deployment role's broad permissions, and the gate step never holds deployment credentials. That matters if you restrict who can assume the deployment role, for example so that it can be assumed from deployment runs but not from pull requests. Note that the example workflow above still assumes the deployment role in the terraform-plan job; under that kind of restriction, you'd point the plan job at a separate read-only role instead.

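The check role's policy is minimal. The script above calls only iam:SimulatePrincipalPolicy; the three read-only actions let you inspect the deployment role's inline policies when a check fails: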

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:SimulatePrincipalPolicy",
        "iam:GetRole",
        "iam:ListRolePolicies",
        "iam:GetRolePolicy"
      ],
      "Resource": [
        "arn:aws:iam::*:role/terraform-deploy-*"
      ]
    }
  ]
}

Where This Breaks at Scale

The approach above works well for small to medium Terraform configurations with stable action sets. Here's where it starts to fail, honestly described:

The required-actions.json maintenance problem

The required-actions.json file has to be manually maintained. As your Terraform configuration evolves, someone has to update the list of required actions to match. If the list is stale — missing a new action that a new resource type requires — the gate passes and the action isn't checked. This is a false-negative scenario: the gate passes, Terraform runs, and fails at apply time with an SCP denial for an action you didn't think to include.

Automating the required-actions list is possible but non-trivial. Tools like iamlive can capture the API calls Terraform actually makes, but only for the operations you run through them. A terraform plan only issues read calls (Describe*, Get*, List*), so to capture the Create*, Put* and Delete* actions you have to run a real apply (and ideally a destroy) in a sandbox account with equivalent permissions. That sandbox is infrastructure overhead, and iamlive is itself a third-party tool, so this route partially defeats the "no third-party tools" premise.
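A cheaper, partial alternative that stays inside the no-third-party constraint is a coverage check: compare the resource types the Terraform plan will create against the resource types your required-actions.json is known to cover. Running terraform show -json on a saved plan emits a resource_changes list in which each entry has a mode, a type and a change.actions list. The sketch below assumes you maintain a hypothetical mapping, ACTIONS_BY_TYPE, from Terraform resource type to the create actions you've verified for it. It doesn't derive IAM actions automatically. It flags resource types nobody has mapped yet, plus mapped actions the config is missing.

```python
# Hypothetical, hand-maintained mapping from Terraform resource type to the
# AWS actions you've confirmed it needs on create. Illustrative, not exhaustive.
ACTIONS_BY_TYPE = {
    'aws_s3_bucket': ['s3:CreateBucket'],
    'aws_s3_bucket_server_side_encryption_configuration': ['s3:PutEncryptionConfiguration'],
    'aws_s3_bucket_public_access_block': ['s3:PutBucketPublicAccessBlock'],
    'aws_instance': ['ec2:RunInstances', 'ec2:DescribeInstances'],
}


def planned_types(plan):
    """Managed resource types the plan will create (from `terraform show -json`)."""
    types = set()
    for rc in plan.get('resource_changes', []):
        if rc.get('mode') != 'managed':
            continue  # skip data sources
        if 'create' in rc['change']['actions']:
            types.add(rc['type'])
    return types


def coverage_gaps(plan, config):
    """Return (unmapped resource types, mapped actions missing from the config)."""
    configured = {a for check in config['checks'] for a in check['actions']}
    unmapped, missing = set(), set()
    for rtype in planned_types(plan):
        if rtype not in ACTIONS_BY_TYPE:
            unmapped.add(rtype)
        else:
            missing.update(set(ACTIONS_BY_TYPE[rtype]) - configured)
    return sorted(unmapped), sorted(missing)
```

In CI you'd run terraform plan -out=tfplan and terraform show -json tfplan > plan.json, load both files with json.load, and fail the job if either list is non-empty. This doesn't remove the maintenance burden; it moves it into the mapping. The difference is that a new resource type now fails the gate loudly instead of passing silently.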

Context key accuracy

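Many SCP conditions use context keys: aws:RequestedRegion, aws:PrincipalTag/*, aws:RequestTag/*. The simulation evaluates those conditions against the values you supply in the ContextEntries field, not against what Terraform will actually send, and a mismatch can be wrong in either direction. Suppose an SCP denies s3:CreateBucket when aws:RequestTag/CostCenter is absent (a Null condition). If Terraform sends the tag but you leave it out of the simulation, the gate reports a denial that won't happen. If instead the SCP denies requests whose aws:RequestTag/CostCenter has a particular value and you omit the key, the simulation returns allowed while the real, tagged call is denied.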
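Two things help here. First, build ContextEntries from the values Terraform will actually send instead of typing them by hand. Second, read MissingContextValues: the simulator lists context keys that the evaluated policies reference but the request didn't supply. It reports them at the top level of each result when the resource is '*', and under ResourceSpecificResults when you pass specific ARNs. Whether keys referenced only by SCPs show up there is worth confirming in your own organization before you rely on it. The helpers below are a sketch; build_context_entries and missing_context_keys are hypothetical names, and the type handling only covers strings and string lists.

```python
def build_context_entries(values):
    """Turn {'aws:RequestTag/CostCenter': '1234', ...} into ContextEntries.

    Scalars become 'string' entries and lists become 'stringList' entries;
    other ContextKeyType values (numeric, boolean, ip, date, ...) are not handled.
    """
    entries = []
    for name, value in values.items():
        if isinstance(value, list):
            entries.append({'ContextKeyName': name,
                            'ContextKeyValues': [str(v) for v in value],
                            'ContextKeyType': 'stringList'})
        else:
            entries.append({'ContextKeyName': name,
                            'ContextKeyValues': [str(value)],
                            'ContextKeyType': 'string'})
    return entries


def missing_context_keys(evaluation_results):
    """Collect context keys the simulator reported as referenced but not supplied."""
    missing = set()
    for result in evaluation_results:
        # Reported at the top level when the simulated resource is '*' ...
        missing.update(result.get('MissingContextValues', []))
        # ... and per resource when specific ResourceArns were passed.
        for rsr in result.get('ResourceSpecificResults', []):
            missing.update(rsr.get('MissingContextValues', []))
    return sorted(missing)
```

In policy_gate.py you could fail the check, or at least print a warning, whenever missing_context_keys returns anything, since a decision computed without those keys may not match the real request.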

Context keys that are only populated at runtime — like aws:PrincipalTag/* that come from the role's tags — need to be specified explicitly in the simulation. You need to know what tags your deployment role has and include them as context. This is additional metadata to maintain.
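If you'd rather not maintain the deployment role's tags by hand, you can read them from IAM at gate time with ListRoleTags and turn each tag into an aws:PrincipalTag/<key> context entry. This sketch assumes the check role is also granted iam:ListRoleTags on the deployment role, which the policy shown earlier doesn't include. The function names are hypothetical, and the caller passes in the IAM client (for example boto3.client('iam')).

```python
def role_name_from_arn(role_arn):
    """arn:aws:iam::123456789012:role/some/path/name -> 'name' (RoleName excludes the path)."""
    return role_arn.split(':', 5)[5].split('/')[-1]


def principal_tag_context(tags):
    """Convert IAM tags [{'Key': k, 'Value': v}, ...] into aws:PrincipalTag context entries."""
    return [
        {
            'ContextKeyName': f"aws:PrincipalTag/{tag['Key']}",
            'ContextKeyValues': [tag['Value']],
            'ContextKeyType': 'string',
        }
        for tag in tags
    ]


def fetch_role_tags(iam, role_arn):
    """Read all tags on the role with ListRoleTags, following Marker pagination."""
    kwargs = {'RoleName': role_name_from_arn(role_arn)}
    tags = []
    while True:
        page = iam.list_role_tags(**kwargs)
        tags.extend(page['Tags'])
        if not page.get('IsTruncated'):
            return tags
        kwargs['Marker'] = page['Marker']
```

The resulting entries can be appended to each check's context before simulation, so tag changes on the role flow into the gate without anyone editing required-actions.json.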

Resource-level conditions

Some SCP statements apply only to specific resource ARNs, for example a deny scoped to arn:aws:s3:::prod-* through the statement's Resource element. To simulate these correctly, you need to provide the actual resource ARNs. If your Terraform creates resources with names generated at apply time (using random_id or similar), you can't construct accurate resource ARNs at plan time.
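You can at least tell which ARNs are constructible at plan time. In terraform show -json output, each resource_changes entry carries change.after (the planned values) and change.after_unknown (a parallel structure marking values that won't be known until apply). The sketch below handles only aws_s3_bucket, whose ARN is arn:aws:s3:::<bucket> in the standard aws partition. bucket_arns_from_plan is a hypothetical helper; other resource types would each need their own ARN format.

```python
def bucket_arns_from_plan(plan):
    """Split planned S3 buckets into simulatable ARNs and addresses with unknown names.

    Assumes the 'aws' partition and the aws_s3_bucket 'bucket' attribute.
    """
    known, unknown = [], []
    for rc in plan.get('resource_changes', []):
        if rc.get('type') != 'aws_s3_bucket' or 'create' not in rc['change']['actions']:
            continue
        after = rc['change'].get('after') or {}
        after_unknown = rc['change'].get('after_unknown') or {}
        name = after.get('bucket')
        if name and not after_unknown.get('bucket'):
            known.append(f"arn:aws:s3:::{name}")
        else:
            # e.g. a name built from random_id: only known after apply
            unknown.append(rc['address'])
    return known, unknown
```

The known ARNs can go into a check's resources list. Anything in the unknown list can only be simulated against '*', which may not reflect an SCP scoped to specific ARNs, so it's worth reporting rather than silently passing.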

The simulation covers only the deployment role

The simulation tells you whether your Terraform deployment role can make its calls. It says nothing about the calls the resulting resources make later under their own roles, such as an ECS task role or a Lambda execution role. Those are ordinary IAM roles, SCPs do apply to them, and you'd have to simulate them separately. Service-linked roles are a different case: SCPs don't apply to them at all, so an SCP gate has nothing to check there.
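For roles that already exist, such as a Lambda execution role or ECS task role managed by an earlier stack, you can run the same simulation against them with the actions their workloads call. Roles that this Terraform change is about to create can't be checked this way, because SimulatePrincipalPolicy needs an existing principal. The sketch assumes a hypothetical mapping of role ARNs to actions and an IAM client passed in by the caller. The check role would also need iam:SimulatePrincipalPolicy on those roles, which the policy above grants only for terraform-deploy-* roles.

```python
def denied_actions(iam, principal_actions, context_entries=None):
    """Simulate each principal's actions; return [(principal, action, decision)] for denials.

    principal_actions: {role_arn: [action, ...]}, e.g. a Lambda execution role
    mapped to the actions its function calls at runtime (hypothetical input).
    iam: a boto3 IAM client, e.g. boto3.client('iam').
    """
    denials = []
    paginator = iam.get_paginator('simulate_principal_policy')
    for principal, actions in principal_actions.items():
        pages = paginator.paginate(
            PolicySourceArn=principal,
            ActionNames=actions,
            ContextEntries=context_entries or [],
        )
        for page in pages:
            for result in page['EvaluationResults']:
                if result['EvalDecision'] != 'allowed':
                    denials.append((principal, result['EvalActionName'], result['EvalDecision']))
    return denials
```

Keeping the runtime roles in the same report as the deployment role means one failed job tells you both "Terraform can't create this" and "the thing Terraform created won't be able to run".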

What This Approach Is Actually Good For

Despite the limitations, this approach is genuinely useful in a specific scenario: you have a well-defined set of AWS actions your Terraform calls, a stable set of SCPs, and you want to catch accidental SCP misconfigurations, like an SCP change that unintentionally breaks a deployment role's ability to create certain resource types, before they surface as failed applies.

It's a regression detection mechanism more than a comprehensive policy gate. If your security team updates the SCPs and introduces an unintentional deny for a deployment role's actions, this gate catches it on the next PR that touches infrastructure. That's a real operational value, even with the limitations.

For a more comprehensive policy gate, one that catches all API calls including generated ones, accurately evaluates complex conditions, and doesn't require a manually maintained action list, you need either native tooling built into your IaC workflow (CloudFormation Hooks, Sentinel policies in HCP Terraform or Terraform Enterprise, or AWS Config conformance packs, with the caveat that Config evaluates resources after they're deployed rather than blocking the deployment) or something purpose-built to evaluate policies at deployment time rather than simulating them against a static list.

Building this without third-party tools is a useful exercise because it exposes exactly what the native AWS policy evaluation APIs can and can't tell you. That understanding is worth having whether you decide to build or buy the production solution.