tech/google/cloud/compute

COMPUTE

GCP compute skill. Use when: (1) deploying serverless containers on Cloud Run — scale-to-zero, pay per request; (2) running single-function event handlers on Cloud Functions v2; (3) orchestrating Kubernetes workloads on GKE; (4) provisioning VMs on Compute Engine.

production gcloud CLI v500+, Cloud Run gen2, Cloud Functions v2, GKE 1.30+
requires: tech/google/cloud

Google Cloud Compute

GCP offers four compute tiers. Pick the leftmost that fits — each step right costs more operational weight:

Cloud Run → Cloud Functions → GKE Autopilot → GKE Standard → Compute Engine

The 2nth.ai default is Cloud Run. It gives you containers with scale-to-zero, pay-per-request, global HTTPS endpoints, and integrates cleanly with Pub/Sub, Cloud SQL, Vertex AI, and Workspace DWD. Lambda-equivalent workloads that need >60 min runtime, persistent state, or GPU specialisation move up to GKE or GCE.

Service comparison

| Service | Best for | Cold start | Runtime | Scale-to-zero | Price model |
|---|---|---|---|---|---|
| Cloud Run | HTTP/gRPC containers, event triggers | 100–500ms | Up to 60 min/req | ✓ | vCPU-sec + GB-sec + requests |
| Cloud Functions v2 | Single-function glue, event consumers | 200–800ms | Up to 60 min | ✓ | Same as Cloud Run (runs on it) |
| GKE Autopilot | Multi-service K8s apps, complex routing | Node-level (minutes) | Unlimited | ✗ (pod-level HPA) | vCPU + memory per pod-hour |
| GKE Standard | Full K8s control, DaemonSets, custom nodes | Node-level | Unlimited | ✗ | Node VM + control plane fee |
| Compute Engine | Stateful VMs, GPUs, legacy workloads | Instance boot (seconds) | Unlimited | ✗ | Per-second VM billing |

Cloud Run

Cloud Run runs any containerised HTTP server on Google's managed platform. You push an image, Google runs it — auto-scaling from 0 to thousands of instances, auto-TLS, auto-HTTPS URL. The sweet spot for ~90% of backend services.

Deploy from source (no Dockerfile needed)

# Cloud Run builds a container via Buildpacks and deploys in one command
gcloud run deploy my-service \
  --source . \
  --region africa-south1 \
  --allow-unauthenticated

# Your app just needs to listen on the $PORT env var (default 8080)

Deploy a pre-built container

gcloud run deploy my-service \
  --image europe-west2-docker.pkg.dev/my-app-prod/my-repo/my-service:v1 \
  --region africa-south1 \
  --platform managed \
  --allow-unauthenticated \
  --memory 512Mi --cpu 1 \
  --concurrency 80 \
  --min-instances 0 --max-instances 100 \
  --timeout 300 \
  --set-env-vars NODE_ENV=production \
  --set-secrets DB_URL=db-url:latest,API_KEY=stripe-key:latest \
  --service-account [email protected]

Concurrency — the single biggest cost lever

Cloud Run bills per vCPU-second. A Node.js service doing mostly I/O can handle 80–250 concurrent requests per instance. Raising concurrency reduces the instance count you need.

| Workload | Concurrency |
|---|---|
| CPU-bound (image processing, PDF) | 1–5 |
| I/O-bound Node/Python (typical API) | 80 (default) |
| Async I/O heavy (proxies, streaming) | 250–1000 |

# Lower for CPU-bound
gcloud run services update my-service --concurrency 4 --region africa-south1

# Higher for I/O-bound
gcloud run services update my-service --concurrency 250 --region africa-south1
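Why concurrency dominates cost: billed instance count follows Little's law. A sketch with illustrative numbers (not taken from any gcloud output):

```typescript
// Instances needed ≈ (requests/sec × seconds/request) / concurrency per instance.
// Ceil because instances are whole units; billed vCPU scales with this count.
function instancesNeeded(rps: number, avgLatencySec: number, concurrency: number): number {
  return Math.ceil((rps * avgLatencySec) / concurrency);
}

// 500 req/s at 200ms average latency → 100 requests in flight:
instancesNeeded(500, 0.2, 80); // concurrency 80 → 2 instances
instancesNeeded(500, 0.2, 4);  // concurrency 4  → 25 instances
```

Same traffic, ~12× the instance-seconds when concurrency drops from 80 to 4 — which is why CPU-bound services cost more per request.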

CPU allocation: "request" vs "always-on"

| Mode | When CPU is billed | When to use |
|---|---|---|
| CPU only during request handling (default) | Only while the instance has an active request | Web APIs, sync handlers |
| CPU always allocated (--no-cpu-throttling) | Continuously while the instance is running | Background tasks, queue consumers, min-instances > 0 |

# Background worker with min-instances — pay for always-on CPU
gcloud run services update my-worker \
  --no-cpu-throttling \
  --min-instances 1 \
  --region africa-south1

TypeScript service (Node 20)

// src/index.ts — Fastify on Cloud Run
import Fastify from 'fastify';

const app = Fastify({ logger: true });

app.get('/health', async () => ({ ok: true }));

app.post('/api/process', async (req) => {
  const body = req.body as { input: string };
  // ...business logic
  return { processed: body.input };
});

const port = Number(process.env.PORT ?? 8080);
app.listen({ port, host: '0.0.0.0' }).catch((err) => {
  app.log.error(err);
  process.exit(1);
});

# Dockerfile — minimal, fast cold start
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY dist ./dist
ENV NODE_ENV=production
CMD ["node", "dist/index.js"]
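Cloud Run sends SIGTERM and allows a short grace period (10 seconds by default) before killing an instance. Closing the server on SIGTERM lets in-flight requests drain; a minimal sketch that would plug into the Fastify service above:

```typescript
// On SIGTERM, close the server so in-flight requests finish, then exit.
// In the service above, `close` would be () => app.close().
function registerShutdown(
  close: () => Promise<void>,
  exit: (code: number) => void = (c) => process.exit(c),
): void {
  process.once('SIGTERM', () => {
    close().then(() => exit(0)).catch(() => exit(1));
  });
}
```

Usage: `registerShutdown(() => app.close());` after `app.listen(...)`.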

Pub/Sub push trigger

# Grant Pub/Sub permission to invoke your Cloud Run service
gcloud run services add-iam-policy-binding my-consumer \
  --region africa-south1 \
  --member "serviceAccount:[email protected]" \
  --role "roles/run.invoker"

# Push subscription posts JSON to your HTTPS endpoint
gcloud pubsub subscriptions create my-sub \
  --topic my-topic \
  --push-endpoint https://my-consumer-xyz-ew.a.run.app/events \
  --push-auth-service-account "[email protected]"

// Cloud Run receives Pub/Sub push
app.post('/events', async (req) => {
  const { message } = req.body as { message: { data: string; attributes: Record<string, string> } };
  const payload = JSON.parse(Buffer.from(message.data, 'base64').toString());
  await handleEvent(payload);
  return { ack: true };  // return 200 to ack
});
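The envelope decoding is worth factoring into a unit-testable helper. A sketch (field names follow the standard Pub/Sub push format); note that any non-success status makes Pub/Sub redeliver, subject to the subscription's retry policy:

```typescript
interface PushEnvelope {
  message: { data: string; attributes?: Record<string, string>; messageId?: string };
  subscription?: string;
}

// Decode the base64 payload of a Pub/Sub push request body into a JSON object.
// Throws on a malformed envelope so the route can return 4xx instead of retry-looping.
function decodePush<T>(body: unknown): T {
  const env = body as PushEnvelope;
  if (!env?.message?.data) throw new Error('not a Pub/Sub push envelope');
  return JSON.parse(Buffer.from(env.message.data, 'base64').toString('utf8')) as T;
}
```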

Cloud Run jobs (batch, not HTTP)

For cron-style or CLI-triggered batch work that doesn't need HTTP serving:

gcloud run jobs create my-nightly-etl \
  --image europe-west2-docker.pkg.dev/my-app-prod/my-repo/etl:v1 \
  --region africa-south1 \
  --tasks 10 \
  --task-timeout 1h \
  --parallelism 5 \
  --cpu 2 --memory 4Gi \
  --set-env-vars INPUT_BUCKET=raw,OUTPUT_BUCKET=processed

# Run manually
gcloud run jobs execute my-nightly-etl --region africa-south1

# Schedule via Cloud Scheduler
gcloud scheduler jobs create http nightly-etl-trigger \
  --schedule "0 2 * * *" --time-zone "Africa/Johannesburg" \
  --uri "https://REGION-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/PROJECT_ID/jobs/my-nightly-etl:run" \
  --http-method POST \
  --oauth-service-account-email [email protected]

Calling Cloud Run from Cloudflare Workers (IAM-authenticated)

A private Cloud Run service accepts Google-signed OIDC ID tokens. From a Cloudflare Worker, mint one using a service-account key (stored in Workers secrets):

// Sign a Google-compatible JWT → exchange for ID token → call private Cloud Run
export async function getIdentityToken(env: Env, audience: string): Promise<string> {
  const now = Math.floor(Date.now() / 1000);
  // JWT segments must be base64url — plain btoa emits +, / and = padding, which breaks the token
  const b64url = (s: string) => btoa(s).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
  const jwtHeader = b64url(JSON.stringify({ alg: 'RS256', typ: 'JWT' }));
  const jwtPayload = b64url(JSON.stringify({
    iss: env.GCP_SA_EMAIL,
    aud: 'https://oauth2.googleapis.com/token',
    target_audience: audience,             // the Cloud Run URL; requests an ID token, not an access token (no scope claim)
    iat: now,
    exp: now + 3600,
  }));

  // Sign with the SA private key (imported via SubtleCrypto)
  const keyData = pemToArrayBuffer(env.GCP_SA_PRIVATE_KEY);
  const cryptoKey = await crypto.subtle.importKey('pkcs8', keyData,
    { name: 'RSASSA-PKCS1-v1_5', hash: 'SHA-256' }, false, ['sign']);
  const sig = await crypto.subtle.sign('RSASSA-PKCS1-v1_5', cryptoKey,
    new TextEncoder().encode(`${jwtHeader}.${jwtPayload}`));
  const signedJwt = `${jwtHeader}.${jwtPayload}.${arrayBufferToBase64Url(sig)}`;

  // Exchange assertion for an ID token
  const res = await fetch('https://oauth2.googleapis.com/token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      grant_type: 'urn:ietf:params:oauth:grant-type:jwt-bearer',
      assertion: signedJwt,
    }),
  });
  const { id_token } = await res.json() as { id_token: string };
  return id_token;
}

// Call private Cloud Run
const idToken = await getIdentityToken(env, 'https://my-service-xyz-ew.a.run.app');
const result = await fetch('https://my-service-xyz-ew.a.run.app/api/process', {
  method: 'POST',
  headers: { Authorization: `Bearer ${idToken}`, 'Content-Type': 'application/json' },
  body: JSON.stringify({ input: 'hello' }),
});
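The snippet above references two helpers it never defines. Minimal sketches, assuming an unencrypted PKCS#8 PEM key as found in a standard service-account JSON:

```typescript
// Strip the PEM armour and decode the base64 body into the ArrayBuffer
// that crypto.subtle.importKey('pkcs8', ...) expects.
function pemToArrayBuffer(pem: string): ArrayBuffer {
  const b64 = pem
    .replace(/-----BEGIN [A-Z ]+-----/, '')
    .replace(/-----END [A-Z ]+-----/, '')
    .replace(/\s+/g, '');
  const raw = atob(b64);
  const bytes = new Uint8Array(raw.length);
  for (let i = 0; i < raw.length; i++) bytes[i] = raw.charCodeAt(i);
  return bytes.buffer;
}

// Base64url-encode a signature: standard base64 with URL-safe characters, no padding.
function arrayBufferToBase64Url(buf: ArrayBuffer): string {
  const bytes = new Uint8Array(buf);
  let bin = '';
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}
```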

Custom domain

# Map your domain (requires DNS verification)
gcloud run domain-mappings create \
  --service my-service \
  --domain api.example.com \
  --region africa-south1

# Returns DNS records (A/AAAA or CNAME) to add at your registrar or Cloudflare

For Cloudflare-fronted custom domains, set Cloudflare DNS to "DNS only" (grey cloud) during verification, then re-enable proxy.


Cloud Functions v2

Cloud Functions v2 is a thin wrapper over Cloud Run — same scaling, same container, but the platform builds the container for you from a source function. Use it when you want zero Dockerfile boilerplate for a single event handler.

# HTTP function
gcloud functions deploy my-webhook \
  --gen2 --runtime nodejs20 \
  --region africa-south1 \
  --trigger-http --allow-unauthenticated \
  --entry-point handler \
  --source .

# Pub/Sub trigger
gcloud functions deploy on-event \
  --gen2 --runtime nodejs20 \
  --region africa-south1 \
  --trigger-topic my-topic \
  --entry-point onEvent \
  --source .

# GCS object-created trigger
gcloud functions deploy on-upload \
  --gen2 --runtime nodejs20 \
  --region africa-south1 \
  --trigger-bucket my-bucket \
  --entry-point onUpload \
  --source .

// index.ts
import { http, cloudEvent } from '@google-cloud/functions-framework';

http('handler', (req, res) => {
  res.json({ ok: true, body: req.body });
});

cloudEvent('onEvent', (event) => {
  const data = Buffer.from(event.data.message.data, 'base64').toString();
  console.log('Pub/Sub:', JSON.parse(data));
});

When to use Functions over Cloud Run: a one-file glue handler with no build pipeline. When to skip Functions: anything with dependencies, multi-route, or that benefits from a Dockerfile — Cloud Run direct is cleaner and the same runtime.


GKE

GKE runs Kubernetes. Pick the mode first:

| Mode | Control over nodes | Billing | Best for |
|---|---|---|---|
| Autopilot | None — Google manages | Per pod vCPU/mem + cluster fee | Default — ops-light K8s |
| Standard | Full — you pick VMs | Node VMs + cluster fee | DaemonSets, GPU pools, custom kernels |

Autopilot cluster

gcloud container clusters create-auto my-cluster \
  --region africa-south1 \
  --release-channel regular

# Get kubeconfig
gcloud container clusters get-credentials my-cluster --region africa-south1

# Deploy
kubectl apply -f deployment.yaml

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 3
  selector: { matchLabels: { app: my-service } }
  template:
    metadata: { labels: { app: my-service } }
    spec:
      serviceAccountName: my-service-ksa   # bound to GCP SA via Workload Identity
      containers:
        - name: app
          image: europe-west2-docker.pkg.dev/my-app-prod/my-repo/my-service:v1
          ports: [{ containerPort: 3000 }]
          resources:
            requests: { cpu: 250m, memory: 512Mi }
            limits:   { cpu: 500m, memory: 1Gi }
          readinessProbe:
            httpGet: { path: /health, port: 3000 }
---
apiVersion: v1
kind: Service
metadata: { name: my-service }
spec:
  type: LoadBalancer
  selector: { app: my-service }
  ports: [{ port: 80, targetPort: 3000 }]

Workload Identity (GKE pod → GCP service account)

# Bind Kubernetes SA (KSA) to GCP SA (GSA) — no JSON keys in the pod
gcloud iam service-accounts add-iam-policy-binding \
  [email protected] \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:my-app-prod.svc.id.goog[default/my-service-ksa]"

kubectl annotate serviceaccount my-service-ksa \
  iam.gke.io/gcp-service-account=my-service-gsa@my-app-prod.iam.gserviceaccount.com

Now pods running as my-service-ksa get GCP-scoped tokens automatically via the metadata server. No key files in the container.
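The deployment above references my-service-ksa but the KSA manifest isn't shown. It is one resource, with the same annotation the annotate command sets:

```yaml
# KSA bound to the GSA — the annotation is what Workload Identity reads
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-service-ksa
  namespace: default
  annotations:
    iam.gke.io/gcp-service-account: [email protected]
```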


Compute Engine (VMs)

Use GCE when you need persistent VMs: legacy workloads that don't containerise, GPU/TPU specialisation, or very high-utilisation steady-state where committed-use discounts beat Cloud Run.

# Create a small always-on VM
gcloud compute instances create my-vm \
  --machine-type e2-small \
  --zone africa-south1-a \
  --image-family debian-12 --image-project debian-cloud \
  --boot-disk-size 20GB --boot-disk-type pd-balanced \
  --service-account [email protected] \
  --scopes cloud-platform \
  --tags http-server \
  --metadata-from-file startup-script=startup.sh

# SSH via IAP (no external IP needed, no firewall rule for port 22)
gcloud compute ssh my-vm --zone africa-south1-a --tunnel-through-iap

Machine families (when to pick what)

| Family | Example | Best for |
|---|---|---|
| E2 | e2-small, e2-medium | Cheap general purpose, dev/test |
| N2 | n2-standard-4 | Balanced production workloads |
| T2D | t2d-standard-4 | AMD EPYC, best price/perf general |
| C3 | c3-highcpu-8 | Compute-bound, Intel Sapphire Rapids |
| C2D | c2d-standard-8 | Compute-bound, AMD |
| M2/M3 | m3-ultramem | In-memory databases, SAP HANA |
| A3/G2 | a3-highgpu-8g | GPU (H100/L4) — ML training, video |

Preemptible / Spot VMs (cheap, interruptible)

gcloud compute instances create my-batch-vm \
  --machine-type n2-standard-4 \
  --zone africa-south1-a \
  --provisioning-model SPOT \
  --instance-termination-action STOP

Up to 60–91% off on-demand. Google can preempt with 30s notice.

Managed Instance Groups (auto-scaling fleet)

# Instance template
gcloud compute instance-templates create my-template \
  --machine-type n2-standard-2 \
  --image-family debian-12 --image-project debian-cloud \
  --metadata-from-file startup-script=startup.sh \
  --tags http-server

# Regional MIG with auto-scaling
gcloud compute instance-groups managed create my-mig \
  --base-instance-name my-app --region africa-south1 \
  --template my-template --size 2

gcloud compute instance-groups managed set-autoscaling my-mig \
  --region africa-south1 \
  --max-num-replicas 10 --min-num-replicas 2 \
  --target-cpu-utilization 0.6 --cool-down-period 90

Cost model (approximate, africa-south1)

Cloud Run

| Component | Price |
|---|---|
| vCPU | ~$0.000024 / vCPU-second (tier 1) |
| Memory | ~$0.0000025 / GB-second |
| Requests | $0.40 / million |
| Free tier | 2M requests + 180k vCPU-s + 360k GB-s / month |

Example: 10M req/month, 512MB, avg 200ms, concurrency 80 → ~$25–40/month.
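A sanity check of that estimate. The arithmetic below is an upper bound that assumes no requests overlap; in practice overlap at concurrency 80 plus the free tier pulls the bill down into the quoted range:

```typescript
// 10M requests/month, 512MB (0.5GB), 1 vCPU, avg 200ms per request
const requests = 10e6;
const avgSec = 0.2;
const gb = 0.5;

const requestSeconds = requests * avgSec;    // 2,000,000 request-seconds
const cpu = requestSeconds * 0.000024;       // ≈ $48.00 if no requests ever overlap
const mem = requestSeconds * gb * 0.0000025; // ≈ $2.50
const req = (requests / 1e6) * 0.40;         // $4.00
const upperBound = cpu + mem + req;          // ≈ $54.50 before free tier and overlap
```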

GKE

| Component | Price |
|---|---|
| Control plane | $0.10/hr (regional) per cluster ≈ $73/mo |
| Autopilot vCPU | ~$0.0445 / vCPU-hour |
| Autopilot memory | ~$0.0049 / GB-hour |
| Standard (node VMs) | Per GCE instance pricing |

One Autopilot pod at 250m CPU + 512Mi ≈ $10/month plus the $73 control-plane fee. GKE only makes sense from ~5+ services or when you need K8s specifically.
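The per-pod figure works out at roughly 730 hours in a month, using the Autopilot rates from the table above:

```typescript
const hours = 730;                     // ≈ hours in a month
const podCpu = 0.25 * 0.0445 * hours;  // 250m vCPU ≈ $8.12
const podMem = 0.5 * 0.0049 * hours;   // 512Mi ≈ $1.79
const podTotal = podCpu + podMem;      // ≈ $9.91 — the ~$10/month, before the $73 cluster fee
```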

Compute Engine (approximate, on-demand)

| Type | vCPU | RAM | $/hr | $/mo |
|---|---|---|---|---|
| e2-small | 2 (shared) | 2GB | ~$0.017 | ~$12 |
| e2-medium | 2 (shared) | 4GB | ~$0.034 | ~$25 |
| n2-standard-2 | 2 | 8GB | ~$0.098 | ~$72 |
| n2-standard-4 | 4 | 16GB | ~$0.195 | ~$142 |
| c3-highcpu-8 | 8 | 16GB | ~$0.320 | ~$234 |

Committed-use discounts: 1-year = ~25% off, 3-year = ~52% off. Spot: up to ~91% off.


Gotchas

See Also