
Building a CI/CD Tool Exposed My Deployment Blind Spots


I started DeployLens because I was frustrated. Not in a visionary way. I was working on AegisMesh, another of my IAM projects, pushing changes constantly, and couldn't tell what version was running where. GitHub said the workflow passed. AWS said something was running. Same thing? I had no clean answer.

So I built a tool to connect those dots. Somewhere in the middle of building it, I realized I also had no idea how insecure my pipelines were.


Your pipeline has permissions. Real ones.

CI/CD tutorials frame it as a speed thing. Push code, tests run, it deploys. What they skip: your pipeline can push container images, update infrastructure, write to S3, call AWS APIs. In most early setups, mine included, those permissions are wide open.
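One concrete way to shrink that blast radius in GitHub Actions is the workflow-level `permissions` key, which scopes the `GITHUB_TOKEN` the job receives. A minimal sketch (the exact scopes your jobs need will vary):

```yaml
# Top of a workflow file: start from nothing, grant only what this job needs.
permissions:
  contents: read      # enough to check out the repo
  # everything else (issues, packages, deployments, ...) stays denied
```

By default, older repos hand the token broad read/write access; declaring permissions explicitly flips the job to least privilege.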

That's what started bothering me while building DeployLens. I kept noticing how much access my GitHub Actions workflows quietly assumed. Nobody hacked me, nothing broke, but I kept asking: if someone pushed a malicious commit right now, what could the pipeline actually do with its current permissions?

I didn't love where that question went.


The secrets problem

Early on I had AWS credentials in GitHub Actions secrets. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, injected as env vars. Standard setup, everyone does it.

Then I started reading about what goes wrong. Workflows that run on PRs from forks. Compromised dependencies that exfiltrate env vars. A workflow that accidentally logs env to stdout and now your keys are in a public build log. These aren't hypothetical.

I switched to OIDC. GitHub Actions can assume an IAM role directly, no stored credentials, short-lived token that expires after the job. Maybe 30 minutes of setup total. I wouldn't have touched it if DeployLens hadn't forced me to actually look at how my pipeline was talking to AWS.
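The post doesn't show the workflow, but the OIDC setup roughly looks like this, using the official `aws-actions/configure-aws-credentials` action (the role ARN and region here are placeholders; the IAM role's trust policy must separately allow GitHub's OIDC provider):

```yaml
permissions:
  id-token: write   # lets the job request an OIDC token from GitHub
  contents: read

steps:
  - uses: actions/checkout@v4
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::123456789012:role/deploy-role  # hypothetical role
      aws-region: us-east-1
```

No long-lived keys in secrets; the job gets temporary credentials that die with it.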


What CodeQL found

I added CodeQL to AegisMesh mostly because it looked good. Security scanning, SAST, sure.

Then it flagged something. User input passing through without proper sanitization. Not catastrophic, but the kind of pattern that becomes catastrophic when the code around it changes. Fixed it in 10 minutes.

The part I keep thinking about: I wrote that code. I reviewed it. I didn't see it.

Static analysis catches a specific class of problems that humans miss not because we're careless but because we read for logic. We're checking if it does what we want, not whether it could be exploited. CodeQL doesn't read for intent. It just looks for patterns. That's exactly why it caught something I didn't.
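The post doesn't show the flagged code, but the class of pattern is familiar. A toy illustration of the kind of taint flow a SAST tool reads for, using SQL as the sink (this is illustrative, not the AegisMesh finding):

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Reads fine as logic ("look up this user"), but the input is
    # interpolated straight into SQL: "x' OR '1'='1" matches every row.
    query = f"SELECT id FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver treats the input as data, not SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (username,)).fetchall()
```

A human reviewer checks that the query finds the user; the analyzer checks whether untrusted input can reach the `execute` call unescaped. Different questions, different blind spots.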


The actual problem is visibility

Security issues in CI/CD are usually not dramatic exploits. The common failure is that nobody knows the real state of what's running. The SHA that passed CI and the container image actually serving traffic — same thing? Did the last deploy finish? Did it roll back without telling anyone?

If you can't answer those questions, you can't answer whether a vulnerable version is deployed right now.

DeployLens matched commit SHAs from GitHub against ECS task definitions on the AWS side. Simple idea, genuinely annoying to implement because AWS doesn't expose that data in any obvious way. But building it changed how I think about pipelines. Less "automation tube" and more "system with its own state, access, and history that nobody's watching."
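The post doesn't include DeployLens's matching logic, but the core comparison can be sketched with boto3: walk from the ECS service to its active task definition to the container images, then compare tags against the commit SHA GitHub reports. This assumes images are tagged with the commit SHA at build time, which is what makes the match possible at all; cluster and service names are placeholders.

```python
def running_image_tags(cluster, service, region="us-east-1"):
    """Return the image tags of the task definition the service is running."""
    import boto3  # imported here so the pure helper below works without AWS deps

    ecs = boto3.client("ecs", region_name=region)
    svc = ecs.describe_services(cluster=cluster, services=[service])["services"][0]
    # The first deployment entry is the one currently rolling out / serving.
    td_arn = svc["deployments"][0]["taskDefinition"]
    td = ecs.describe_task_definition(taskDefinition=td_arn)["taskDefinition"]
    # e.g. "1234.dkr.ecr.us-east-1.amazonaws.com/app:abc1234" -> "abc1234"
    return [c["image"].rsplit(":", 1)[-1] for c in td["containerDefinitions"]]

def matches_commit(tags, github_sha):
    # Accept either a full SHA or the short 7-char prefix as the image tag.
    return any(github_sha.startswith(t) or t.startswith(github_sha[:7]) for t in tags)
```

The annoying part the post mentions is real: nothing in the ECS API says "this is commit X"; the link only exists if your build pipeline put the SHA into the image tag in the first place.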


What I'd tell myself at the start

Don't save security for a cleanup pass. OIDC takes less time to set up than rotating leaked credentials later. Branch protection is five minutes. CodeQL is a checkbox in GitHub's UI.

The hard part was never the implementation. It was noticing the gap existed. And I only noticed because I was building something that forced me to look.
