GCP compute skill. Use when: (1) deploying serverless containers on Cloud Run — scale-to-zero, pay per request,
GCP offers five compute tiers. Pick the leftmost that fits — each step right costs more operational weight:
Cloud Run → Cloud Functions → GKE Autopilot → GKE Standard → Compute Engine
The 2nth.ai default is Cloud Run. It gives you containers with scale-to-zero, pay-per-request, global HTTPS endpoints, and integrates cleanly with Pub/Sub, Cloud SQL, Vertex AI, and Workspace DWD. Lambda-equivalent workloads that need >60 min runtime, persistent state, or GPU specialisation move up to GKE or GCE.
| Service | Best for | Cold start | Runtime | Scale-to-zero | Price model |
|---|---|---|---|---|---|
| Cloud Run | HTTP/gRPC containers, event triggers | 100–500ms | Up to 60 min/req | ✓ | vCPU-sec + GB-sec + requests |
| Cloud Functions v2 | Single-function glue, event consumers | 200–800ms | Up to 60 min | ✓ | Same as Cloud Run (runs on it) |
| GKE Autopilot | Multi-service K8s apps, complex routing | Node-level (minutes) | Unlimited | ✗ (pod-level HPA) | vCPU + memory per pod-hour |
| GKE Standard | Full K8s control, DaemonSets, custom nodes | Node-level | Unlimited | ✗ | Node VM + control plane fee |
| Compute Engine | Stateful VMs, GPUs, legacy workloads | Instance boot (seconds) | Unlimited | ✗ | Per-second VM billing |
Cloud Run runs any containerised HTTP server on Google's managed platform. You push an image, Google runs it — auto-scaling from 0 to thousands of instances, auto-TLS, auto-HTTPS URL. The sweet spot for ~90% of backend services.
# Cloud Run builds a container via Buildpacks and deploys in one command
gcloud run deploy my-service \
--source . \
--region africa-south1 \
--allow-unauthenticated
# Your app just needs to listen on the $PORT env var (default 8080)
gcloud run deploy my-service \
--image europe-west2-docker.pkg.dev/my-app-prod/my-repo/my-service:v1 \
--region africa-south1 \
--platform managed \
--allow-unauthenticated \
--memory 512Mi --cpu 1 \
--concurrency 80 \
--min-instances 0 --max-instances 100 \
--timeout 300 \
--set-env-vars NODE_ENV=production \
--set-secrets DB_URL=db-url:latest,API_KEY=stripe-key:latest \
--service-account [email protected]
Cloud Run bills per vCPU-second. A Node.js service doing mostly I/O can handle 80–250 concurrent requests per instance. Raising concurrency reduces the instance count you need.
| Workload | Concurrency |
|---|---|
| CPU-bound (image processing, PDF) | 1–5 |
| I/O-bound Node/Python (typical API) | 80 (default) |
| Async I/O heavy (proxies, streaming) | 250–1000 |
# Lower for CPU-bound
gcloud run services update my-service --concurrency 4 --region africa-south1
# Higher for I/O-bound
gcloud run services update my-service --concurrency 250 --region africa-south1
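The concurrency setting drives instance count: at a steady request rate, in-flight requests ≈ rate × latency (Little's law), and Cloud Run needs roughly that many divided by per-instance concurrency. A minimal sketch of the estimate — the function and the numbers are illustrative, not from any gcloud output:

```typescript
// Rough instance-count estimate for a Cloud Run service (Little's law).
// Illustrative only: real autoscaling also reacts to CPU and startup latency.
function estimateInstances(
  requestsPerSecond: number,
  avgLatencySeconds: number,
  concurrency: number, // the --concurrency setting
): number {
  const inFlight = requestsPerSecond * avgLatencySeconds; // avg concurrent requests
  return Math.max(1, Math.ceil(inFlight / concurrency));
}

// 200 req/s at 250ms: 50 in-flight requests.
// At the default concurrency of 80 that fits in 1 instance;
// the same load CPU-bound at concurrency 4 needs 13.
```

This is why raising concurrency for I/O-bound services cuts cost so sharply: the same traffic packs into far fewer billed instances.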
| Mode | When CPU is billed | When to use |
|---|---|---|
| CPU only during request handling (default) | Only while the instance has an active request | Web APIs, sync handlers |
| CPU always allocated (--no-cpu-throttling) | Continuously while the instance is running | Background tasks, queue consumers, min-instances > 0 |
# Background worker with min-instances — pay for always-on CPU
gcloud run services update my-worker \
--no-cpu-throttling \
--min-instances 1 \
--region africa-south1
// src/index.ts — Fastify on Cloud Run
import Fastify from 'fastify';
const app = Fastify({ logger: true });
app.get('/health', async () => ({ ok: true }));
app.post('/api/process', async (req) => {
const body = req.body as { input: string };
// ...business logic
return { processed: body.input };
});
const port = Number(process.env.PORT ?? 8080);
app.listen({ port, host: '0.0.0.0' }).catch((err) => {
app.log.error(err);
process.exit(1);
});
# Dockerfile — minimal, fast cold start
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY dist ./dist
ENV NODE_ENV=production
CMD ["node", "dist/index.js"]
# Grant Pub/Sub permission to invoke your Cloud Run service
gcloud run services add-iam-policy-binding my-consumer \
--region africa-south1 \
--member "serviceAccount:[email protected]" \
--role "roles/run.invoker"
# Push subscription posts JSON to your HTTPS endpoint
gcloud pubsub subscriptions create my-sub \
--topic my-topic \
--push-endpoint https://my-consumer-xyz-ew.a.run.app/events \
--push-auth-service-account "[email protected]"
// Cloud Run receives Pub/Sub push
app.post('/events', async (req) => {
const { message } = req.body as { message: { data: string; attributes: Record<string, string> } };
const payload = JSON.parse(Buffer.from(message.data, 'base64').toString());
await handleEvent(payload);
return { ack: true }; // return 200 to ack
});
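The envelope the handler unpacks can be exercised without Pub/Sub by encoding a payload the same way the push subscription does (base64 JSON in `message.data`). A sketch, with hypothetical helper names:

```typescript
// Round-trip a Pub/Sub-style push envelope: encode as the publisher side would,
// decode as the /events handler above does.
interface PushEnvelope {
  message: { data: string; attributes?: Record<string, string> };
}

function encodePush(payload: unknown): PushEnvelope {
  return {
    message: { data: Buffer.from(JSON.stringify(payload)).toString('base64') },
  };
}

function decodePush(body: PushEnvelope): unknown {
  return JSON.parse(Buffer.from(body.message.data, 'base64').toString('utf8'));
}

const envelope = encodePush({ orderId: 42 });
const decoded = decodePush(envelope) as { orderId: number };
```

Any 2xx response acks the message; a non-2xx response or a timeout makes Pub/Sub redeliver, so keep the handler idempotent.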
For cron-style or CLI-triggered batch work that doesn't need HTTP serving:
gcloud run jobs create my-nightly-etl \
--image europe-west2-docker.pkg.dev/my-app-prod/my-repo/etl:v1 \
--region africa-south1 \
--tasks 10 \
--task-timeout 1h \
--parallelism 5 \
--cpu 2 --memory 4Gi \
--set-env-vars INPUT_BUCKET=raw,OUTPUT_BUCKET=processed
# Run manually
gcloud run jobs execute my-nightly-etl --region africa-south1
# Schedule via Cloud Scheduler
gcloud scheduler jobs create http nightly-etl-trigger \
--schedule "0 2 * * *" --time-zone "Africa/Johannesburg" \
--uri "https://REGION-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/PROJECT_ID/jobs/my-nightly-etl:run" \
--http-method POST \
--oauth-service-account-email [email protected]
A private Cloud Run service accepts Google OIDC ID tokens. From a Cloudflare Worker, mint an ID token using a service-account key (stored in Workers secrets):
// Sign a Google-compatible JWT → exchange for ID token → call private Cloud Run
export async function getIdentityToken(env: Env, audience: string): Promise<string> {
const now = Math.floor(Date.now() / 1000);
  // JWT segments must be base64url; btoa alone produces plain base64
  const b64url = (s: string) => btoa(s).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
  const jwtHeader = b64url(JSON.stringify({ alg: 'RS256', typ: 'JWT' }));
  const jwtPayload = b64url(JSON.stringify({
    iss: env.GCP_SA_EMAIL,
    aud: 'https://oauth2.googleapis.com/token',
    target_audience: audience, // the Cloud Run URL; cannot be combined with `scope`
    iat: now,
    exp: now + 3600,
  }));
// Sign with the SA private key (imported via SubtleCrypto)
const keyData = pemToArrayBuffer(env.GCP_SA_PRIVATE_KEY);
const cryptoKey = await crypto.subtle.importKey('pkcs8', keyData,
{ name: 'RSASSA-PKCS1-v1_5', hash: 'SHA-256' }, false, ['sign']);
const sig = await crypto.subtle.sign('RSASSA-PKCS1-v1_5', cryptoKey,
new TextEncoder().encode(`${jwtHeader}.${jwtPayload}`));
const signedJwt = `${jwtHeader}.${jwtPayload}.${arrayBufferToBase64Url(sig)}`;
// Exchange assertion for an ID token
const res = await fetch('https://oauth2.googleapis.com/token', {
method: 'POST',
headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
body: new URLSearchParams({
grant_type: 'urn:ietf:params:oauth:grant-type:jwt-bearer',
assertion: signedJwt,
}),
});
const { id_token } = await res.json() as { id_token: string };
return id_token;
}
// Call private Cloud Run
const idToken = await getIdentityToken(env, 'https://my-service-xyz-ew.a.run.app');
const result = await fetch('https://my-service-xyz-ew.a.run.app/api/process', {
method: 'POST',
headers: { Authorization: `Bearer ${idToken}`, 'Content-Type': 'application/json' },
body: JSON.stringify({ input: 'hello' }),
});
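The pemToArrayBuffer and arrayBufferToBase64Url helpers referenced above aren't shown; a minimal sketch, assuming a standard PKCS#8 PEM (BEGIN/END PRIVATE KEY armor) and the atob/btoa globals available in Workers and Node 16+:

```typescript
// Strip PEM armor and base64-decode to the DER bytes SubtleCrypto expects.
function pemToArrayBuffer(pem: string): ArrayBuffer {
  const b64 = pem
    .replace(/-----BEGIN [A-Z ]+-----/, '')
    .replace(/-----END [A-Z ]+-----/, '')
    .replace(/\s+/g, '');
  const raw = atob(b64);
  const bytes = new Uint8Array(raw.length);
  for (let i = 0; i < raw.length; i++) bytes[i] = raw.charCodeAt(i);
  return bytes.buffer;
}

// Base64url without padding, as JWT signatures require.
function arrayBufferToBase64Url(buf: ArrayBuffer): string {
  const bytes = new Uint8Array(buf);
  let bin = '';
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}
```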
# Map your domain (requires DNS verification)
gcloud run domain-mappings create \
--service my-service \
--domain api.example.com \
--region africa-south1
# Returns DNS records (A/AAAA or CNAME) to add at your registrar or Cloudflare
For Cloudflare-fronted custom domains, set Cloudflare DNS to "DNS only" (grey cloud) during verification, then re-enable proxy.
Cloud Functions v2 is a thin wrapper over Cloud Run — same scaling, same container, but the platform builds the container for you from a source function. Use it when you want zero Dockerfile boilerplate for a single event handler.
# HTTP function
gcloud functions deploy my-webhook \
--gen2 --runtime nodejs20 \
--region africa-south1 \
--trigger-http --allow-unauthenticated \
--entry-point handler \
--source .
# Pub/Sub trigger
gcloud functions deploy on-event \
--gen2 --runtime nodejs20 \
--region africa-south1 \
--trigger-topic my-topic \
--entry-point onEvent \
--source .
# GCS object-created trigger
gcloud functions deploy on-upload \
--gen2 --runtime nodejs20 \
--region africa-south1 \
--trigger-bucket my-bucket \
--entry-point onUpload \
--source .
// index.ts
import { http, cloudEvent } from '@google-cloud/functions-framework';
http('handler', (req, res) => {
res.json({ ok: true, body: req.body });
});
cloudEvent<{ message: { data: string } }>('onEvent', (event) => {
  const data = Buffer.from(event.data!.message.data, 'base64').toString();
  console.log('Pub/Sub:', JSON.parse(data));
});
When to use Functions over Cloud Run: a one-file glue handler with no build pipeline. When to skip Functions: anything multi-route, with heavy dependencies, or that benefits from a Dockerfile — deploying straight to Cloud Run is cleaner and runs on the same platform.
GKE runs Kubernetes. Pick the mode first:
| Mode | Control over nodes | Billing | Best for |
|---|---|---|---|
| Autopilot | None — Google manages | Per pod vCPU/mem + cluster fee | Default — ops-light K8s |
| Standard | Full — you pick VMs | Node VMs + cluster fee | DaemonSets, GPU pools, custom kernels |
gcloud container clusters create-auto my-cluster \
--region africa-south1 \
--release-channel regular
# Get kubeconfig
gcloud container clusters get-credentials my-cluster --region africa-south1
# Deploy
kubectl apply -f deployment.yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-service
spec:
replicas: 3
selector: { matchLabels: { app: my-service } }
template:
metadata: { labels: { app: my-service } }
spec:
serviceAccountName: my-service-ksa # bound to GCP SA via Workload Identity
containers:
- name: app
image: europe-west2-docker.pkg.dev/my-app-prod/my-repo/my-service:v1
ports: [{ containerPort: 3000 }]
resources:
requests: { cpu: 250m, memory: 512Mi }
limits: { cpu: 500m, memory: 1Gi }
readinessProbe:
httpGet: { path: /health, port: 3000 }
---
apiVersion: v1
kind: Service
metadata: { name: my-service }
spec:
type: LoadBalancer
selector: { app: my-service }
ports: [{ port: 80, targetPort: 3000 }]
# Bind Kubernetes SA (KSA) to GCP SA (GSA) — no JSON keys in the pod
gcloud iam service-accounts add-iam-policy-binding \
[email protected] \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:my-app-prod.svc.id.goog[default/my-service-ksa]"
kubectl annotate serviceaccount my-service-ksa \
iam.gke.io/gcp-service-account=my-service-gsa@my-app-prod.iam.gserviceaccount.com
Now pods running as my-service-ksa get GCP-scoped tokens automatically via the metadata server. No key files in the container.
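What the pod-side token fetch looks like — a sketch against the standard metadata endpoint. The injectable fetchImpl parameter is an assumption added here so the function can be exercised outside a cluster:

```typescript
// Inside a Workload Identity pod, access tokens come from the metadata server;
// no JSON key file is ever mounted into the container.
const TOKEN_URL =
  'http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token';

async function getAccessToken(fetchImpl: typeof fetch = fetch): Promise<string> {
  const res = await fetchImpl(TOKEN_URL, {
    headers: { 'Metadata-Flavor': 'Google' }, // required; requests without it are rejected
  });
  if (!res.ok) throw new Error(`metadata server returned ${res.status}`);
  const { access_token } = (await res.json()) as { access_token: string };
  return access_token;
}
```

Client libraries (google-auth-library, etc.) do this automatically via Application Default Credentials; the sketch only shows what happens underneath.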
Use GCE when you need persistent VMs: legacy workloads that don't containerise, GPU/TPU specialisation, or very high-utilisation steady-state where committed-use discounts beat Cloud Run.
# Create a small always-on VM
gcloud compute instances create my-vm \
--machine-type e2-small \
--zone africa-south1-a \
--image-family debian-12 --image-project debian-cloud \
--boot-disk-size 20GB --boot-disk-type pd-balanced \
--service-account [email protected] \
--scopes cloud-platform \
--tags http-server \
--metadata-from-file startup-script=startup.sh
# SSH via IAP (no external IP needed, no firewall rule for port 22)
gcloud compute ssh my-vm --zone africa-south1-a --tunnel-through-iap
| Family | Example | Best for |
|---|---|---|
| E2 | e2-small, e2-medium | Cheap general purpose, dev/test |
| N2 | n2-standard-4 | Balanced production workloads |
| T2D | t2d-standard-4 | AMD EPYC, best price/perf general |
| C3 | c3-highcpu-8 | Compute-bound, Intel Sapphire Rapids |
| C2D | c2d-standard-8 | Compute-bound, AMD |
| M2/M3 | m3-ultramem | In-memory databases, SAP HANA |
| A3/G2 | a3-highgpu-8g | GPU (H100/L4) — ML training, video |
gcloud compute instances create my-batch-vm \
--machine-type n2-standard-4 \
--zone africa-south1-a \
--provisioning-model SPOT \
--instance-termination-action STOP
Up to 60–91% off on-demand. Google can preempt with 30s notice.
# Instance template
gcloud compute instance-templates create my-template \
--machine-type n2-standard-2 \
--image-family debian-12 --image-project debian-cloud \
--metadata-from-file startup-script=startup.sh \
--tags http-server
# Regional MIG with auto-scaling
gcloud compute instance-groups managed create my-mig \
--base-instance-name my-app --region africa-south1 \
--template my-template --size 2
gcloud compute instance-groups managed set-autoscaling my-mig \
--region africa-south1 \
--max-num-replicas 10 --min-num-replicas 2 \
--target-cpu-utilization 0.6 --cool-down-period 90
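The target-utilization autoscaler behaves roughly like the standard proportional scaling rule. A sketch under that assumption — the real autoscaler also applies the cool-down period and stabilization windows:

```typescript
// Proportional autoscaling: pick a replica count that brings average CPU
// back to the target, clamped to the configured min/max.
function desiredReplicas(
  current: number,
  observedUtilization: number, // e.g. 0.9 = 90% average CPU
  targetUtilization: number,   // the --target-cpu-utilization setting
  min: number,
  max: number,
): number {
  const raw = Math.ceil(current * (observedUtilization / targetUtilization));
  return Math.min(max, Math.max(min, raw));
}

// 2 replicas at 90% CPU with a 0.6 target → ceil(2 * 1.5) = 3 replicas.
```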
| Component | Price |
|---|---|
| vCPU | ~$0.000024 / vCPU-second (tier 1) |
| Memory | ~$0.0000025 / GB-second |
| Requests | $0.40 / million |
| Free tier | 2M requests + 180k vCPU-s + 360k GB-s / month |
Example: 10M req/month, 512MB, avg 200ms, concurrency 80 → ~$25–40/month.
| Component | Price |
|---|---|
| Control plane | $0.10/hr (regional) per cluster ≈ $73/mo |
| Autopilot vCPU | ~$0.0445 / vCPU-hour |
| Autopilot memory | ~$0.0049 / GB-hour |
| Standard (node VMs) | Per GCE instance pricing |
One Autopilot pod at 250m CPU + 512Mi ≈ $10/month plus the $73 control-plane fee. GKE only makes sense from ~5+ services or when you need K8s specifically.
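The per-pod figure follows directly from the table's unit prices. A sketch of the arithmetic, assuming 730 hours per month:

```typescript
// Autopilot pod cost from requested resources, using the rates in the table above.
const VCPU_PER_HOUR = 0.0445;
const GB_PER_HOUR = 0.0049;
const HOURS_PER_MONTH = 730;

function autopilotPodMonthly(vcpu: number, memoryGb: number): number {
  return (vcpu * VCPU_PER_HOUR + memoryGb * GB_PER_HOUR) * HOURS_PER_MONTH;
}

// 250m CPU + 512Mi → autopilotPodMonthly(0.25, 0.5) ≈ $9.91/month,
// before the flat control-plane fee.
```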
| Type | vCPU | RAM | $/hr | $/mo |
|---|---|---|---|---|
| e2-small | 2 (shared) | 2GB | ~$0.017 | ~$12 |
| e2-medium | 2 (shared) | 4GB | ~$0.034 | ~$25 |
| n2-standard-2 | 2 | 8GB | ~$0.098 | ~$72 |
| n2-standard-4 | 4 | 16GB | ~$0.195 | ~$142 |
| c3-highcpu-8 | 8 | 16GB | ~$0.320 | ~$234 |
Committed-use discounts: 1-year = ~25% off, 3-year = ~52% off. Spot: up to ~91% off.
Gotchas:
- Default CPU throttling: setTimeout/background workers stall between requests. Set --no-cpu-throttling if your app does post-response work.
- Use --min-instances 1 for latency-critical paths (billed 24/7 at idle CPU).
- Listen on $PORT, not 3000: Cloud Run injects PORT=8080 but expects your app to read the env var. Hardcoding any other port = endless cold-start failures.
- Cloud Functions: use --gen2 for new work.
- Autopilot: pods needing privileged: true / hostNetwork: true won't schedule. Use Standard for those.
- Workload Identity ordering: gcloud iam ... add-iam-policy-binding must happen BEFORE kubectl annotate or tokens won't be issued. Breakage either way is silent — pods get no credentials, calls fail with ambiguous 403.
- gcloud run deploy --source .: invisibly enables Cloud Build + Artifact Registry + grants Cloud Build's SA the roles it needs. First-run permission failures are cryptic — check cloudbuild.googleapis.com is enabled and the Cloud Build SA's IAM.
- Disks: pd-extreme and local SSD availability varies by zone. Prefer pd-balanced for general VMs, pd-ssd for latency-sensitive databases.
- With europe-west2 as a fallback region, design for bursty sync-back to africa-south1 via Pub/Sub or scheduled Dataflow, not per-request.