Self-Hosted GitHub Runners on Spot Instances
GitHub-hosted runners cost $0.008/minute ($0.48/hour) for a 2 vCPU machine. An EC2 Spot c5.2xlarge (8 vCPU) costs a fraction of that per hour. Here's the architecture behind jit-runners: on-demand self-hosted runners via AWS Lambda and EC2 Spot.
GitHub-hosted runners cost $0.008 per minute (Linux, 2 vCPU / 7 GB). If your team runs 5,000 minutes of CI per day, that’s $1,200/month. For a startup with a moderately active engineering team, CI can be one of the largest single line items on the cloud bill.
The standard alternative — persistent self-hosted runners — solves the cost problem but introduces uptime management: servers that are always on, patching to handle, capacity to predict. On bursty workloads (PR-heavy development, release days), they’re either under-provisioned (queued builds) or over-provisioned (idle machines you’re paying for).
jit-runners is our answer: GitHub Actions runners that provision on demand when a workflow starts, use EC2 Spot for 60–80% cost reduction versus on-demand, and terminate when the job finishes.
The architecture
GitHub Actions workflow queued
│
▼
GitHub webhook (workflow_job: queued)
│
▼
API Gateway → Lambda (Go)
│
├── EC2 RunInstances (Spot, user data: GitHub runner install + register)
│
▼
EC2 instance comes online
│
├── Registers with GitHub as ephemeral runner
│
▼
GitHub assigns job to runner
│
├── Job executes
│
▼
workflow_job: completed webhook
│
├── Lambda terminates instance (or instance self-terminates on idle)
Three components:
- Lambda function (Go) — receives webhook events, provisions runners, handles termination
- EC2 Spot instances — ephemeral runners with JIT registration tokens
- GitHub App — handles webhook delivery and runner registration tokens
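Because the Lambda sits behind a public API Gateway endpoint, it is worth checking that each delivery really comes from GitHub. The source does not say how jit-runners does this, so the following is only a minimal sketch. It assumes a webhook secret is configured on the GitHub App and that the raw request body and the X-Hub-Signature-256 header are available to the handler. The names validSignature and sign, and the secret value, are hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// validSignature reports whether header (the X-Hub-Signature-256 value,
// formatted "sha256=<hex digest>") matches the HMAC-SHA256 of body under secret.
func validSignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	want, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time to avoid leaking digest bytes.
	return hmac.Equal(mac.Sum(nil), want)
}

// sign produces a header value the way GitHub would; used here for the demo only.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret") // hypothetical webhook secret
	body := []byte(`{"action":"queued"}`)
	fmt.Println(validSignature(secret, body, sign(secret, body))) // true
	fmt.Println(validSignature(secret, body, "sha256=deadbeef"))  // false
}
```

The check has to run against the exact bytes GitHub sent, so it belongs before any JSON decoding or re-serialization of the payload.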
Why Lambda + Go
Lambda is the right compute for this use case:
- Event-driven — webhook fires, Lambda runs, done. No polling, no scheduler.
- Cost — Lambda invocations for webhook handling are essentially free (well within the free tier)
- Scale — 1,000 concurrent PRs? Lambda scales horizontally without configuration
Go is the right language for this Lambda:
- Cold start — compiled Go binaries have 10–50ms Lambda cold starts. Python or Node.js are fine too, but Go is genuinely fast.
- Single binary — no runtime dependencies, trivial deployment (GOARCH=amd64 GOOS=linux go build)
- AWS SDK v2 — the official Go SDK is well-maintained and performant
The Lambda function is small: parse webhook event, call EC2 RunInstances with a user data script, handle errors. The whole thing fits comfortably in a single file.
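The "parse webhook event" step might look roughly like the sketch below. It is not the jit-runners source. It decodes only the fields of the workflow_job payload that the provisioning decision needs, and it assumes that jobs meant for these runners carry a spot label, as in the user data script later in this post. The function name shouldProvision is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// workflowJobEvent mirrors only the workflow_job payload fields used here.
type workflowJobEvent struct {
	Action      string `json:"action"`
	WorkflowJob struct {
		ID     int64    `json:"id"`
		Labels []string `json:"labels"`
	} `json:"workflow_job"`
}

// shouldProvision decodes payload and reports whether a runner should be
// launched: the action must be "queued" and the job must request requiredLabel.
// It returns the job ID so callers can log or tag the instance with it.
func shouldProvision(payload []byte, requiredLabel string) (int64, bool, error) {
	var ev workflowJobEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return 0, false, fmt.Errorf("decode workflow_job event: %w", err)
	}
	if ev.Action != "queued" {
		return ev.WorkflowJob.ID, false, nil // in_progress/completed: nothing to launch
	}
	for _, label := range ev.WorkflowJob.Labels {
		if label == requiredLabel {
			return ev.WorkflowJob.ID, true, nil
		}
	}
	return ev.WorkflowJob.ID, false, nil // job targets some other runner pool
}

func main() {
	payload := []byte(`{"action":"queued","workflow_job":{"id":42,"labels":["self-hosted","linux","spot"]}}`)
	id, ok, err := shouldProvision(payload, "spot")
	fmt.Println(id, ok, err) // 42 true <nil>
}
```

Filtering on labels matters because the webhook fires for every job in the organization, including ones bound for GitHub-hosted runners.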
EC2 Spot for runners
Spot instances offer unused EC2 capacity at a 60–80% discount versus on-demand. The risk: AWS can reclaim the capacity with a two-minute warning.
For CI runners, this risk is manageable:
- Spot interruption = job failure = job re-run — GitHub Actions does not retry an interrupted job on its own, so the job has to be re-run manually or through the GitHub API (see "Handling Spot interruptions gracefully")
- Short job duration — most CI jobs finish in 5–15 minutes. Spot interruption probability over that window is low.
- Diversified instance types — using a Spot Fleet or instance type diversification reduces interruption frequency
Our default configuration: c5.2xlarge (8 vCPU / 16 GB) as primary, c5a.2xlarge as fallback. Spot prices vary by region and over time; at a ~70% discount, that is roughly $0.10/hour versus ~$0.34/hour on-demand.
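The primary/fallback idea can be sketched as a small loop over instance types. This is an illustration only, not the project's code. The launchFunc type stands in for whatever wraps EC2 RunInstances, and its signature, the error value, and the instance ID below are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// launchFunc stands in for a wrapper around EC2 RunInstances (hypothetical signature).
type launchFunc func(instanceType string) (instanceID string, err error)

// errNoCapacity simulates a Spot capacity failure in the demo below.
var errNoCapacity = errors.New("InsufficientInstanceCapacity")

// launchWithFallback tries each instance type in order and returns the ID and
// type of the first successful launch, or an error describing every failure.
func launchWithFallback(types []string, launch launchFunc) (string, string, error) {
	if len(types) == 0 {
		return "", "", errors.New("no instance types configured")
	}
	var errs []error
	for _, t := range types {
		id, err := launch(t)
		if err == nil {
			return id, t, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", t, err))
	}
	return "", "", fmt.Errorf("all instance types failed: %w", errors.Join(errs...))
}

func main() {
	// Fake launcher: the primary type has no Spot capacity, the fallback does.
	fake := func(t string) (string, error) {
		if t == "c5.2xlarge" {
			return "", errNoCapacity
		}
		return "i-0123456789abcdef0", nil
	}
	id, used, err := launchWithFallback([]string{"c5.2xlarge", "c5a.2xlarge"}, fake)
	fmt.Println(id, used, err) // i-0123456789abcdef0 c5a.2xlarge <nil>
}
```

Passing the launcher in as a function keeps the ordering logic testable without touching AWS.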
The user data script
When the EC2 instance starts, user data handles runner setup:
#!/bin/bash
set -euo pipefail
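# Install runner (release assets are versioned, so resolve the latest tag first)
mkdir -p /actions-runner && cd /actions-runner
RUNNER_VERSION=$(curl -fsSL https://api.github.com/repos/actions/runner/releases/latest | sed -n 's/.*"tag_name": *"v\([^"]*\)".*/\1/p')
curl -fsSL "https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-x64-${RUNNER_VERSION}.tar.gz" | tar -xz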
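# Register as ephemeral runner (user data runs as root, which config.sh rejects unless this is set)
export RUNNER_ALLOW_RUNASROOT=1
./config.sh \
--url "https://github.com/ORG_NAME" \
--token "REGISTRATION_TOKEN" \
--name "spot-$(ec2-metadata --instance-id | cut -d' ' -f2)" \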
--labels "self-hosted,linux,spot" \
--ephemeral \
--unattended
# Run (exits after one job when --ephemeral)
./run.sh
The --ephemeral flag is key: the runner deregisters automatically after completing one job. No cleanup needed, no state to manage between runs.
The registration token (REGISTRATION_TOKEN) is fetched by the Lambda function via the GitHub API just before calling RunInstances, then injected into user data. Tokens expire after one hour, which leaves ample time for a newly provisioned instance to boot and register.
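Injecting the token into user data could be done with a template, roughly as sketched here. This is an assumption about the mechanism, not the project's actual code. The template is a shortened stand-in for the script above, the names runnerParams, renderScript, and encodeUserData are hypothetical, and the base64 step reflects the EC2 API expecting base64-encoded user data.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"text/template"
)

// userDataTmpl is a shortened stand-in for the full user data script.
var userDataTmpl = template.Must(template.New("userdata").Parse(`#!/bin/bash
set -euo pipefail
cd /actions-runner
./config.sh --url "https://github.com/{{.Org}}" --token "{{.Token}}" --labels "{{.Labels}}" --ephemeral --unattended
./run.sh
`))

// runnerParams holds the values substituted into the script (hypothetical type).
type runnerParams struct {
	Org    string // GitHub organization name
	Token  string // short-lived runner registration token
	Labels string // comma-separated runner labels
}

// renderScript fills the template and returns the plain-text script.
func renderScript(p runnerParams) (string, error) {
	var buf bytes.Buffer
	if err := userDataTmpl.Execute(&buf, p); err != nil {
		return "", fmt.Errorf("render user data: %w", err)
	}
	return buf.String(), nil
}

// encodeUserData base64-encodes the script for the RunInstances UserData field.
func encodeUserData(script string) string {
	return base64.StdEncoding.EncodeToString([]byte(script))
}

func main() {
	script, err := renderScript(runnerParams{Org: "ORG_NAME", Token: "REGISTRATION_TOKEN", Labels: "self-hosted,linux,spot"})
	if err != nil {
		panic(err)
	}
	fmt.Print(script)
	fmt.Println(encodeUserData(script))
}
```

text/template is used rather than html/template because the output is a shell script and must not be HTML-escaped.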
Handling Spot interruptions gracefully
GitHub Actions doesn’t automatically retry jobs on runner failure. To handle Spot interruptions:
- Termination notice poller — user data starts a background process that polls the EC2 metadata endpoint for termination notices. When a notice arrives, it sends SIGTERM to the runner process.
- Runner cancels job — the runner marks the job as canceled on SIGTERM, which is retriable.
- Lambda retriggers — once the canceled job is re-run, a new workflow_job: queued event fires and the Lambda provisions a fresh instance.
This adds ~2 minutes of latency on interruption (termination notice → cancel → new instance provision → runner online). For most CI workloads this is acceptable; for time-critical jobs, use on-demand instances instead of Spot.
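The termination notice poller can be sketched as follows. This is a hedged illustration rather than the project's sidecar. It assumes IMDSv2 and the spot/instance-action metadata path, which returns 404 until an interruption is scheduled. The function names are hypothetical, and signalling the runner process with SIGTERM is left to the caller.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// imdsBase is the link-local instance metadata address on EC2.
const imdsBase = "http://169.254.169.254"

// spotNotice mirrors the JSON served at /latest/meta-data/spot/instance-action.
type spotNotice struct {
	Action string    `json:"action"` // "terminate", "stop", or "hibernate"
	Time   time.Time `json:"time"`   // when the action will be taken
}

// parseInstanceAction interprets one response: 404 means no notice yet,
// 200 carries the notice, anything else is treated as an error.
func parseInstanceAction(status int, body []byte) (*spotNotice, error) {
	switch status {
	case http.StatusNotFound:
		return nil, nil
	case http.StatusOK:
		var n spotNotice
		if err := json.Unmarshal(body, &n); err != nil {
			return nil, fmt.Errorf("decode instance-action: %w", err)
		}
		return &n, nil
	default:
		return nil, fmt.Errorf("unexpected IMDS status %d", status)
	}
}

// imdsRequest performs one metadata request and returns its status and body.
func imdsRequest(ctx context.Context, c *http.Client, method, url, header, value string) (int, []byte, error) {
	req, err := http.NewRequestWithContext(ctx, method, url, nil)
	if err != nil {
		return 0, nil, err
	}
	req.Header.Set(header, value)
	resp, err := c.Do(req)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, body, err
}

// checkOnce fetches a short-lived IMDSv2 token, then asks for a pending notice.
func checkOnce(ctx context.Context, c *http.Client, base string) (*spotNotice, error) {
	status, token, err := imdsRequest(ctx, c, http.MethodPut, base+"/latest/api/token",
		"X-aws-ec2-metadata-token-ttl-seconds", "60")
	if err != nil {
		return nil, err
	}
	if status != http.StatusOK {
		return nil, fmt.Errorf("IMDS token request returned status %d", status)
	}
	status, body, err := imdsRequest(ctx, c, http.MethodGet, base+"/latest/meta-data/spot/instance-action",
		"X-aws-ec2-metadata-token", string(token))
	if err != nil {
		return nil, err
	}
	return parseInstanceAction(status, body)
}

// waitForNotice polls every interval until a notice appears or ctx is done.
// Errors from individual checks are treated as transient and retried.
func waitForNotice(ctx context.Context, c *http.Client, base string, interval time.Duration) (*spotNotice, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if n, err := checkOnce(ctx, c, base); err == nil && n != nil {
			return n, nil
		}
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-ticker.C:
		}
	}
}

func main() {
	// Off-instance demo: parse a sample notice instead of calling IMDS.
	// On an instance: waitForNotice(ctx, http.DefaultClient, imdsBase, 5*time.Second)
	n, err := parseInstanceAction(http.StatusOK, []byte(`{"action":"terminate","time":"2017-09-18T08:22:00Z"}`))
	fmt.Println(n.Action, n.Time.UTC().Format(time.RFC3339), err) // terminate 2017-09-18T08:22:00Z <nil>
}
```

A polling interval of a few seconds is a reasonable starting point given the two-minute warning. The shorter the interval, the more of that window is left for the runner to cancel cleanly.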
Cost comparison
For a team running 1,000 CI minutes/day:
| Approach | Monthly cost |
|---|---|
| GitHub-hosted runners | $240 |
| EC2 on-demand (c5.2xlarge) | ~$180 |
| EC2 Spot (c5.2xlarge, ~70% discount) | ~$55 |
The savings scale linearly with usage. 5,000 minutes/day: GitHub → $1,200/month, Spot → ~$275/month.
The break-even on engineering time to set up jit-runners is typically under a week of CI spend.
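The arithmetic behind the comparison can be reproduced with a few lines. This is a simplified model, assuming a 30-day month, instance time equal to CI minutes (no boot or idle overhead), and the $0.34/hour on-demand and ~70% discount figures quoted in this post. The EC2 numbers it prints land slightly under the table's rounded estimates, presumably because overhead is not modeled here.

```go
package main

import "fmt"

// monthlyCost returns the cost in dollars of minutesPerDay of billed runner
// time over the given number of days at hourlyRate dollars per hour.
func monthlyCost(minutesPerDay, hourlyRate float64, days int) float64 {
	return minutesPerDay / 60 * hourlyRate * float64(days)
}

func main() {
	const (
		githubHourly   = 0.008 * 60 // $0.008/minute for GitHub-hosted Linux runners
		onDemandHourly = 0.34       // c5.2xlarge on-demand rate quoted in this post
		spotDiscount   = 0.70       // assumed average Spot discount
		days           = 30
	)
	spotHourly := onDemandHourly * (1 - spotDiscount)
	for _, minutes := range []float64{1000, 5000} {
		fmt.Printf("%.0f min/day: GitHub $%.0f, on-demand $%.0f, Spot $%.0f\n",
			minutes,
			monthlyCost(minutes, githubHourly, days),
			monthlyCost(minutes, onDemandHourly, days),
			monthlyCost(minutes, spotHourly, days))
	}
}
```

Note that the per-hour comparison understates the difference in capacity: the c5.2xlarge has four times the vCPUs of the standard GitHub-hosted runner.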
What jit-runners handles
The open source jit-runners project provides:
- Lambda function (Go) for webhook handling and instance provisioning
- Terraform module for Lambda, API Gateway, IAM roles, and security groups
- GitHub App configuration guide
- User data templates for Ubuntu and Amazon Linux 2
- Spot interruption handler sidecar
The deployment guide is in the repository README. Setup takes about 30 minutes if you have AWS credentials and GitHub App access.
jit-runners is open source under MIT. If you hit a Spot interruption rate that’s causing real problems, open an issue — there are several approaches to mitigation we haven’t implemented yet.