Kubernetes backend
Run fakecloud's container-backed services (Lambda, ECS, RDS, ElastiCache, EC2) as native Kubernetes Pods instead of Docker containers. Avoid docker-in-docker, get real resource limits, debug with kubectl.
By default, fakecloud runs its container-backed services (Lambda, ECS, RDS, ElastiCache, EC2) in Docker containers via `docker run`. When fakecloud itself runs inside Kubernetes (CI pipelines, multi-tenant test clusters), that means docker-in-docker: privileged Pods, opaque resource accounting, and harder debugging.
The Kubernetes backend (issue #1234) replaces the Docker path with native Pods. Flip the whole stack with `FAKECLOUD_CONTAINER_BACKEND=k8s`, or opt a single service in or out with its `FAKECLOUD_<SERVICE>_BACKEND` variable. Each Lambda function gets a Pod sized from the function's `MemorySize`. That Pod runs the AWS Runtime Interface Emulator (RIE) image and is reused across invocations exactly like a warm Docker container. ECS tasks, RDS instances, ElastiCache nodes, and EC2 instances get Pods too (see the per-service sections below).
When to use it
- fakecloud runs as a Pod in your CI / dev cluster
- You'd rather not grant fakecloud's Pod the `privileged` security context that docker-in-docker requires
- You want real Kubernetes `requests`/`limits` on Lambda Pods so the scheduler can pack them
- You want `kubectl logs` / `kubectl describe pod` on misbehaving functions
If fakecloud runs on your laptop with Docker Desktop, stick with the default Docker backend — it's faster (no init-container HTTP fetch) and needs no cluster.
Enabling it
Set on the fakecloud Pod:

```shell
FAKECLOUD_LAMBDA_BACKEND=k8s
FAKECLOUD_K8S_SELF_URL=http://fakecloud.fakecloud.svc.cluster.local:4566
```

`FAKECLOUD_K8S_SELF_URL` must resolve from inside the Lambda Pods. Their init containers use it to pull function code and layers from the fakecloud process. Use the in-cluster Service DNS name, never `localhost` or `127.0.0.1`.
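A loopback `FAKECLOUD_K8S_SELF_URL` breaks every cold start, so it can be worth validating the value before launching anything. Below is a minimal sketch in Rust (fakecloud's implementation language) built around a hypothetical `check_self_url` helper. It is not fakecloud's actual startup code, and it parses only the scheme, host, and port using the standard library:

```rust
/// Hypothetical startup check (not fakecloud's code): reject a
/// FAKECLOUD_K8S_SELF_URL that in-cluster Pods could not reach and return
/// its host and port. Std-only, minimal parsing; IPv4 and `localhost` only.
fn check_self_url(url: &str) -> Result<(String, u16), String> {
    // Split off the scheme; only plain http/https are accepted here.
    let (scheme, rest) = url
        .split_once("://")
        .ok_or_else(|| format!("missing scheme in {url:?}"))?;
    let default_port: u16 = match scheme {
        "http" => 80,
        "https" => 443,
        other => return Err(format!("unsupported scheme {other:?}")),
    };
    // The authority is everything before the first '/'.
    let authority = rest.split('/').next().unwrap_or("");
    let (host, port) = match authority.rsplit_once(':') {
        Some((h, p)) => (h, p.parse::<u16>().map_err(|_| format!("bad port {p:?}"))?),
        None => (authority, default_port),
    };
    if host.is_empty() {
        return Err(format!("no host in {url:?}"));
    }
    // Inside a Lambda Pod, a loopback address points at the Pod itself,
    // not at the fakecloud process, so init containers could never fetch code.
    if host == "localhost" || host.starts_with("127.") {
        return Err(format!("{host} is loopback; use the in-cluster Service DNS name"));
    }
    Ok((host.to_string(), port))
}
```

With the Service URL used on this page, the check returns `("fakecloud.fakecloud.svc.cluster.local", 4566)`. Both `http://localhost:4566` and `http://127.0.0.1:4566` are rejected.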
Selecting the backend
Backend selection is per-service with a global fallback:
- `FAKECLOUD_<SERVICE>_BACKEND=k8s|docker`: an explicit per-service override (`FAKECLOUD_LAMBDA_BACKEND`, and the same pattern for the other container-backed services). An explicit value always wins.
- `FAKECLOUD_CONTAINER_BACKEND=k8s|docker`: a global default applied to any service whose own variable is unset. Set this once to flip the whole stack to Kubernetes.
- Unset everywhere: Docker (the default).
So `FAKECLOUD_CONTAINER_BACKEND=k8s` with `FAKECLOUD_LAMBDA_BACKEND=docker` runs everything on Kubernetes except Lambda, which stays on Docker.
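The precedence above amounts to a three-step lookup. The sketch below assumes a hypothetical `resolve_backend` that takes a snapshot of the environment as a map, so it can be exercised without touching the process environment. This page doesn't say how fakecloud reacts to a value other than `k8s`/`docker`, so returning an error here is an assumption:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Docker,
    K8s,
}

/// Parse one backend value; rejecting anything else is an assumption of this sketch.
fn parse_backend(value: &str) -> Result<Backend, String> {
    match value {
        "docker" => Ok(Backend::Docker),
        "k8s" => Ok(Backend::K8s),
        other => Err(format!("invalid backend {other:?}; expected k8s or docker")),
    }
}

/// Hypothetical resolver for one service, e.g. "LAMBDA" or "RDS".
fn resolve_backend(env: &HashMap<String, String>, service: &str) -> Result<Backend, String> {
    // 1. An explicit per-service variable always wins.
    if let Some(v) = env.get(&format!("FAKECLOUD_{service}_BACKEND")) {
        return parse_backend(v);
    }
    // 2. Otherwise fall back to the global default.
    if let Some(v) = env.get("FAKECLOUD_CONTAINER_BACKEND") {
        return parse_backend(v);
    }
    // 3. Unset everywhere: Docker.
    Ok(Backend::Docker)
}
```

Take the combination from the example above: `FAKECLOUD_CONTAINER_BACKEND=k8s` plus `FAKECLOUD_LAMBDA_BACKEND=docker`. Here `resolve_backend(&env, "LAMBDA")` yields `Docker` and `resolve_backend(&env, "RDS")` yields `K8s`.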
Optional env vars:
| Variable | Default | Purpose |
|---|---|---|
| `FAKECLOUD_K8S_NAMESPACE` | `default` | Namespace fakecloud's Pods are created in. |
| `FAKECLOUD_K8S_ECR_URL` | host of `FAKECLOUD_K8S_SELF_URL` | Override the host:port the backend rewrites AWS private-ECR URIs to. |
| `FAKECLOUD_K8S_PULL_SECRET` | unset | Name of a `kubernetes.io/dockerconfigjson` Secret attached as `imagePullSecrets`. Only needed when an image registry requires auth, e.g. for `PackageType=Image` Lambda functions or a private registry for the RDS images. |
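Here is the `FAKECLOUD_K8S_ECR_URL` default made concrete: unless overridden, the target is the `host:port` of `FAKECLOUD_K8S_SELF_URL`. The sketch also rewrites a private-ECR image URI of the form `<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>`. It assumes that only the registry host is swapped and that the repository path and tag are kept. That mapping, and the helper names `ecr_target` and `rewrite_ecr_uri`, are illustrative rather than documented behavior:

```rust
/// Hypothetical: pick the registry host:port private-ECR URIs are rewritten to.
fn ecr_target(self_url: &str, override_url: Option<&str>) -> String {
    if let Some(o) = override_url {
        return o.to_string();
    }
    // Default: the authority (host[:port]) of the self URL.
    let rest = self_url.split_once("://").map(|(_, r)| r).unwrap_or(self_url);
    rest.split('/').next().unwrap_or(rest).to_string()
}

/// Hypothetical: rewrite `<acct>.dkr.ecr.<region>.amazonaws.com/<path>` to
/// `<target>/<path>` (commercial-partition hosts only). Non-ECR images
/// such as `public.ecr.aws/...` or Docker Hub names are returned unchanged.
fn rewrite_ecr_uri(image: &str, target: &str) -> String {
    if let Some((registry, path)) = image.split_once('/') {
        let labels: Vec<&str> = registry.split('.').collect();
        let is_private_ecr = labels.len() == 6
            && labels[0].chars().all(|c| c.is_ascii_digit())
            && labels[1] == "dkr"
            && labels[2] == "ecr"
            && labels[4] == "amazonaws"
            && labels[5] == "com";
        if is_private_ecr {
            return format!("{target}/{path}");
        }
    }
    image.to_string()
}
```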
RBAC
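The fakecloud Pod's ServiceAccount needs permission to create / get / list / watch / delete Pods in the configured namespace. The ElastiCache and RDS backends also exec into their Pods, so they need `pods/exec` too. ElastiCache uses `redis-cli` for `CONFIG`/`ACL` and snapshot `SAVE`; RDS uses `pg_dump`, `mysqldump`, `psql`, and `cat` for snapshots and log files. The ECS backend (per-container log capture) and the RDS heavy engines (Pod-log readiness marker) read the Pod log API, which Kubernetes RBAC authorizes as `get` on `pods/log`.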
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fakecloud
  namespace: fakecloud
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: fakecloud-lambda-pods
  namespace: fakecloud
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["create", "get", "list", "watch", "delete"]
  # Required only for the ElastiCache and RDS backends (exec into their Pods).
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
  # Required for ECS container-log capture and RDS heavy-engine readiness (Pod log API).
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: fakecloud-lambda-pods
  namespace: fakecloud
subjects:
  - kind: ServiceAccount
    name: fakecloud
    namespace: fakecloud
roleRef:
  kind: Role
  name: fakecloud-lambda-pods
  apiGroup: rbac.authorization.k8s.io
```

If you set `FAKECLOUD_K8S_NAMESPACE` to a namespace other than the one fakecloud itself runs in, create the Role and RoleBinding in that target namespace.
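The rules you actually need depend on which services run on Kubernetes. The sketch below encodes what this page states:

- every k8s-backed service manages its own Pods;
- ElastiCache and RDS exec into theirs;
- reading Pod logs (ECS log capture, the RDS heavy engines' readiness marker) is authorized in Kubernetes RBAC as `get` on `pods/log`.

`Service` and `required_rules` are hypothetical names used for illustration:

```rust
use std::collections::BTreeSet;

/// Container-backed services that can run on the k8s backend.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Service {
    Lambda,
    Ecs,
    Rds,
    ElastiCache,
    Ec2,
}

/// Hypothetical helper: (resource, verb) pairs the fakecloud ServiceAccount
/// needs in the target namespace for the given k8s-backed services.
fn required_rules(k8s_services: &[Service]) -> BTreeSet<(&'static str, &'static str)> {
    let mut rules = BTreeSet::new();
    if k8s_services.is_empty() {
        // Everything on Docker: no Pod permissions needed at all.
        return rules;
    }
    // Every k8s-backed service creates, watches, and deletes its own Pods.
    for verb in ["create", "get", "list", "watch", "delete"] {
        rules.insert(("pods", verb));
    }
    for s in k8s_services {
        // ElastiCache (redis-cli) and RDS (pg_dump, mysqldump, psql, cat) exec into Pods.
        if matches!(s, Service::ElastiCache | Service::Rds) {
            rules.insert(("pods/exec", "create"));
        }
        // ECS log capture and RDS heavy-engine readiness read the Pod log API.
        if matches!(s, Service::Ecs | Service::Rds) {
            rules.insert(("pods/log", "get"));
        }
    }
    rules
}
```

For example, an EC2-only setup needs just the five `pods` verbs, which matches the EC2 section's note that it never execs into its Pods.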
Deployment + Service
A minimal in-cluster install:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fakecloud
  namespace: fakecloud
spec:
  replicas: 1
  selector:
    matchLabels: { app: fakecloud }
  template:
    metadata:
      labels: { app: fakecloud }
    spec:
      serviceAccountName: fakecloud
      containers:
        - name: fakecloud
          image: ghcr.io/faiscadev/fakecloud:latest
          ports:
            - containerPort: 4566
          env:
            - name: FAKECLOUD_LAMBDA_BACKEND
              value: "k8s"
            - name: FAKECLOUD_K8S_SELF_URL
              value: "http://fakecloud.fakecloud.svc.cluster.local:4566"
            - name: FAKECLOUD_K8S_NAMESPACE
              value: "fakecloud"
---
apiVersion: v1
kind: Service
metadata:
  name: fakecloud
  namespace: fakecloud
spec:
  selector: { app: fakecloud }
  ports:
    - name: http
      port: 4566
      targetPort: 4566
```

How it works
- Your test client calls `lambda:Invoke` against the fakecloud Service.
- The Lambda service in fakecloud computes a deploy fingerprint (function code SHA + attached layer hashes) and looks for a matching warm Pod. If one exists, it POSTs the invocation payload to port `8080` on that Pod.
- On a cache miss, fakecloud creates a new Pod via the Kubernetes API. The Pod has:
  - A `busybox` init container. It downloads the function's code zip and a tar of its layers from fakecloud's internal `/_fakecloud/lambda/_internal/*` endpoints, which are protected by a per-process bearer token. It unpacks them into shared `emptyDir` volumes mounted at `/var/task` and `/opt`.
  - A main container running the AWS RIE image for the function's runtime (`public.ecr.aws/lambda/python:3.12`, `public.ecr.aws/lambda/nodejs:20`, etc.). For `PackageType=Image` functions, the user's image is used directly; AWS private-ECR URIs are rewritten to the in-cluster fakecloud OCI registry.
  - Resource requests / limits sized from `MemorySize`, an ephemeral `/tmp` as `emptyDir { medium: Memory, sizeLimit: EphemeralStorage.Size }`, and `restartPolicy: Never`.
- fakecloud watches the Pod until `status.podIP` is populated. It then waits for a TCP handshake on the RIE port and forwards the invocation.
- Idle Pods are torn down by the same TTL loop the Docker backend uses.
- On fakecloud startup, any Pod labeled `fakecloud-managed-by=fakecloud` whose `fakecloud-instance` label doesn't match the current process is deleted. This covers crashes that left orphans behind.
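The warm-Pod path above is essentially a map from deploy fingerprint to a live Pod, plus an idle TTL. Below is a std-only sketch with the hypothetical names `WarmPool` and `deploy_fingerprint`. It hashes the code SHA and layer hashes with `DefaultHasher`, which is not cryptographic. That hash only stands in for however fakecloud actually derives `fakecloud-deploy-id`:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

/// Illustrative fingerprint: code SHA plus attached layer hashes, in order.
fn deploy_fingerprint(code_sha256: &str, layer_hashes: &[&str]) -> u64 {
    let mut h = DefaultHasher::new();
    code_sha256.hash(&mut h);
    layer_hashes.hash(&mut h); // slice hashing includes the length, so boundaries matter
    h.finish()
}

struct WarmPod {
    pod_ip: String,
    last_used: Instant,
}

/// Hypothetical warm-Pod cache keyed by deploy fingerprint.
struct WarmPool {
    ttl: Duration,
    pods: HashMap<u64, WarmPod>,
}

impl WarmPool {
    fn new(ttl: Duration) -> Self {
        Self { ttl, pods: HashMap::new() }
    }

    /// Cache hit: return the Pod IP to POST the payload to and refresh its idle timer.
    fn lookup(&mut self, fp: u64, now: Instant) -> Option<String> {
        let pod = self.pods.get_mut(&fp)?;
        pod.last_used = now;
        Some(pod.pod_ip.clone())
    }

    /// After a cache miss: record the new Pod once `status.podIP` is known.
    fn insert(&mut self, fp: u64, pod_ip: String, now: Instant) {
        self.pods.insert(fp, WarmPod { pod_ip, last_used: now });
    }

    /// TTL sweep: drop idle entries and return their fingerprints so the caller can delete the Pods.
    fn evict_idle(&mut self, now: Instant) -> Vec<u64> {
        let ttl = self.ttl;
        let expired: Vec<u64> = self
            .pods
            .iter()
            .filter(|(_, p)| now.duration_since(p.last_used) >= ttl)
            .map(|(fp, _)| *fp)
            .collect();
        for fp in &expired {
            self.pods.remove(fp);
        }
        expired
    }
}
```

Because the fingerprint covers the layer hashes, a change to a function's code or layers produces a different key and therefore a fresh Pod. The old Pod simply ages out through the TTL sweep.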
Security model
- Init containers pull code over the cluster network via a process-local bearer token. The token is generated at fakecloud startup, embedded into Pod specs at launch time, and never persisted or logged. Each fakecloud restart mints a fresh token, invalidating any in-flight artifact downloads.
- Pods carry the labels `fakecloud-managed-by=fakecloud`, `fakecloud-instance=<pid>`, `fakecloud-lambda=<function>`, and `fakecloud-deploy-id=<hash>`, so you can `kubectl get pods -l fakecloud-managed-by=fakecloud`.
- Lambda Pods run with whatever default security context your cluster's PodSecurityPolicy / PodSecurityAdmission enforces: no `privileged`, no special capabilities. The fakecloud Pod itself doesn't need `privileged` either. Elsewhere in the stack there are exceptions: Db2 RDS instances, and ECS containers whose task definition sets `privileged`.
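Below is a sketch of the bearer-token check on the internal artifact endpoints, using a hypothetical `ArtifactAuth` type. Token generation is left to the caller: the standard library has no CSPRNG, so real code would use one from a crate. The comparison folds over every byte so that its timing doesn't reveal how many leading bytes matched:

```rust
/// Hypothetical guard for the internal artifact endpoints.
struct ArtifactAuth {
    token: String,
}

impl ArtifactAuth {
    /// `token` is minted once at startup and kept only in memory.
    fn new(token: String) -> Self {
        Self { token }
    }

    /// Header value embedded into the init container's request (never logged).
    fn header_value(&self) -> String {
        format!("Bearer {}", self.token)
    }

    /// Accept only `Authorization: Bearer <token>` carrying this process's token.
    fn authorize(&self, authorization: Option<&str>) -> bool {
        let presented = match authorization.and_then(|h| h.strip_prefix("Bearer ")) {
            Some(t) => t.as_bytes(),
            None => return false,
        };
        let expected = self.token.as_bytes();
        if presented.len() != expected.len() {
            return false;
        }
        // Fold over every byte instead of returning at the first mismatch.
        presented.iter().zip(expected).fold(0u8, |acc, (a, b)| acc | (a ^ b)) == 0
    }
}
```

A token minted by a previous fakecloud process fails this check. That is the `401` case described under Troubleshooting.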
Pod scheduling and metadata
Real clusters often need fakecloud's Pods to carry a nodeSelector (pin workloads to a node pool), taint tolerations (run on tainted/spot nodes), or annotations (cost allocation, mesh sidecar injection, scrape config). You can attach all three at three levels, lowest to highest precedence:
- Global: `FAKECLOUD_K8S_NODE_SELECTOR` / `FAKECLOUD_K8S_TOLERATIONS` / `FAKECLOUD_K8S_ANNOTATIONS`, applied to every fakecloud Pod across all k8s-backed services.
- Per-service: `FAKECLOUD_<SERVICE>_K8S_<KEY>` (e.g. `FAKECLOUD_LAMBDA_K8S_NODE_SELECTOR`, `FAKECLOUD_RDS_K8S_TOLERATIONS`), applied to that service's Pods.
- Per-instance: a reserved tag on the individual resource (Lambda function, DB instance, cache cluster, ECS task, EC2 instance): `fakecloud-k8s/node-selector`, `fakecloud-k8s/tolerations`, `fakecloud-k8s/annotations`.
Value formats are the same at every level:
- `NODE_SELECTOR` / `ANNOTATIONS`: a flat `key=value,key=value` map.
- `TOLERATIONS`: a JSON array of Kubernetes `Toleration` objects.
Merge across the levels: the maps (node selector, annotations) union per key, and a higher level overrides only the keys it sets — lower-level keys survive. Tolerations combine additively (a Pod tolerates the union of all configured taints), dropping exact duplicates.
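The merge rule can be sketched in a few lines. `merge_maps`, `merge_tolerations`, and the trimmed-down `Toleration` struct are hypothetical (real Kubernetes tolerations also carry `tolerationSeconds`). Levels are passed lowest precedence first:

```rust
use std::collections::BTreeMap;

/// Subset of a Kubernetes Toleration, enough to show the merge rule.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Toleration {
    key: Option<String>,
    operator: Option<String>,
    value: Option<String>,
    effect: Option<String>,
}

/// Node selectors / annotations: union per key; a higher level overrides
/// only the keys it sets, so lower-level keys survive.
fn merge_maps(levels: &[BTreeMap<String, String>]) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    for level in levels {
        for (k, v) in level {
            out.insert(k.clone(), v.clone());
        }
    }
    out
}

/// Tolerations: additive across levels, dropping exact duplicates.
fn merge_tolerations(levels: &[Vec<Toleration>]) -> Vec<Toleration> {
    let mut out: Vec<Toleration> = Vec::new();
    for t in levels.iter().flatten() {
        if !out.contains(t) {
            out.push(t.clone());
        }
    }
    out
}
```

Suppose the global level sets `fakecloud=true`, the service level `pool=burst`, and a function tag `accelerator=nvidia,pool=gpu`. The function's Pod then ends up with `fakecloud=true`, `pool=gpu`, and `accelerator=nvidia`.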
```shell
# Global: keep every fakecloud Pod off the default node pool.
FAKECLOUD_K8S_NODE_SELECTOR=fakecloud=true
FAKECLOUD_K8S_TOLERATIONS='[{"key":"fakecloud","operator":"Exists","effect":"NoSchedule"}]'
# Service: send all Lambda Pods to the burst pool, with a cost-center annotation.
FAKECLOUD_LAMBDA_K8S_NODE_SELECTOR=pool=burst
FAKECLOUD_LAMBDA_K8S_ANNOTATIONS=cost-center=lambda
```

```shell
# Per-instance: this one function pins to GPU nodes (overrides the service/global
# node selector for the `accelerator` key) via a tag at create time.
aws lambda create-function \
  --tags 'fakecloud-k8s/node-selector=accelerator=nvidia,fakecloud-k8s/annotations=team=ml' \
  ... # other create-function args
```

A malformed env value (e.g. invalid tolerations JSON) fails fast at fakecloud startup, so a typo is loud. A malformed per-instance tag is logged and ignored for that field, so a bad tag never makes a single resource un-runnable. Per-instance tags only take effect when present at the resource's creation time. Tags added later (`TagResource` / `add-tags-to-resource`) don't retroactively reschedule a running Pod; recreate the resource to pick them up.
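Below is a sketch of the two error policies for the flat `key=value` format: env values propagate the parse error so startup aborts, while tag values are logged and dropped. `parse_kv_map`, `from_env_value`, and `from_tag_value` are hypothetical names. Tolerations, which are JSON, are out of scope for this std-only sketch:

```rust
use std::collections::BTreeMap;

/// Parse a flat `key=value,key=value` map (the NODE_SELECTOR / ANNOTATIONS format).
fn parse_kv_map(raw: &str) -> Result<BTreeMap<String, String>, String> {
    let mut map = BTreeMap::new();
    for pair in raw.split(',').map(str::trim).filter(|p| !p.is_empty()) {
        let (k, v) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got {pair:?}"))?;
        if k.trim().is_empty() {
            return Err(format!("empty key in {pair:?}"));
        }
        map.insert(k.trim().to_string(), v.trim().to_string());
    }
    Ok(map)
}

/// Env var: a malformed value is an error the caller propagates, aborting startup.
fn from_env_value(name: &str, raw: &str) -> Result<BTreeMap<String, String>, String> {
    parse_kv_map(raw).map_err(|e| format!("{name}: {e}"))
}

/// Per-instance tag: a malformed value is logged and that field is ignored.
fn from_tag_value(tag: &str, raw: &str) -> BTreeMap<String, String> {
    parse_kv_map(raw).unwrap_or_else(|e| {
        eprintln!("ignoring malformed tag {tag}: {e}");
        BTreeMap::new()
    })
}
```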
The `fakecloud-k8s/*` tags are ordinary resource tags. fakecloud reads them but does not strip them, so they remain visible in `ListTags` / `list-tags-for-resource` output and count toward the resource's AWS tag limit (e.g. 50 for Lambda) like any other tag.
ElastiCache backend
Set `FAKECLOUD_ELASTICACHE_BACKEND=k8s` (or the global `FAKECLOUD_CONTAINER_BACKEND=k8s`) to run cache clusters, replication groups, and serverless caches as native Pods instead of Docker containers.
- Each cache resource becomes one Pod with a single `redis:7-alpine` or `memcached:1.6-alpine` container. The resource's endpoint is the Pod IP on the standard engine port (6379 / 11211).
- `CONFIG` / `ACL` changes and snapshot `SAVE` are applied by `kubectl exec`-style calls through the API server (hence the `pods/exec` RBAC rule), with no dependency on Pod-IP routability from fakecloud.
- Snapshot restore: a cache created from a snapshot gets a Pod whose container runs `wget` before launching `redis-server`. It fetches the snapshot RDB from a per-process, bearer-token-guarded `/_fakecloud/elasticache/_internal/rdb/<pod>` endpoint into `/data/dump.rdb`, so the engine loads it at startup. This is the k8s analogue of the Docker backend's `docker cp`.
- Reboot (`RebootCacheCluster`) recreates the Pod. For Redis, the live dataset is snapshotted and reloaded across the recreate so data survives, matching the Docker backend's in-place restart. Memcached reboots flush (no persistence), as on AWS.
- Cache Pods carry `fakecloud-service=elasticache`; the startup reaper sweeps only its own service's orphans.
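Below is a sketch of a per-service orphan check. It assumes each service's Pods carry a `fakecloud-service=<name>` label the way cache Pods carry `fakecloud-service=elasticache`; the page states this only for ElastiCache. `is_orphan` and `reaper_selector` are hypothetical. The selector uses standard equality-based label-selector syntax, where Kubernetes ANDs the comma-separated terms:

```rust
use std::collections::BTreeMap;

type Labels = BTreeMap<String, String>;

/// Hypothetical: should the startup reaper for `service` delete this Pod?
fn is_orphan(labels: &Labels, current_instance: &str, service: &str) -> bool {
    let get = |k: &str| labels.get(k).map(String::as_str);
    get("fakecloud-managed-by") == Some("fakecloud")
        && get("fakecloud-service") == Some(service)
        && get("fakecloud-instance") != Some(current_instance)
}

/// Hypothetical label selector a reaper could use when listing candidate Pods.
fn reaper_selector(service: &str) -> String {
    format!("fakecloud-managed-by=fakecloud,fakecloud-service={service}")
}
```

Scoping by service means that when several services' reapers run at startup, each one touches only its own Pods.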
RDS
Set `FAKECLOUD_RDS_BACKEND=k8s` (or the global `FAKECLOUD_CONTAINER_BACKEND=k8s`) to run DB instances as native Pods.
- Each DB instance becomes one Pod.
  - The bridge engines (`postgres` / `mysql` / `mariadb`) use the prebuilt `ghcr.io/faiscadev/fakecloud-*` images, which the cluster pulls. Unlike the Docker backend, there's no in-cluster image build. Override the registry with `FAKECLOUD_POSTGRES_REGISTRY`, or supply `FAKECLOUD_K8S_PULL_SECRET` for a private one.
  - The heavy engines (Oracle / SQL Server / Db2) use their upstream images; Db2 runs with a privileged security context.
- Bridge engines receive `FAKECLOUD_ENDPOINT` pointing at the in-cluster `FAKECLOUD_K8S_SELF_URL`, so their `aws_lambda` / `aws_s3` / UDF callbacks reach fakecloud.
- Readiness is a real connection for Postgres / MySQL / MariaDB, and a Pod-log marker (then a TCP probe) for the heavy engines. Snapshot dump / restore and log-file reads run through `kubectl exec` (`pg_dump` / `mysqldump` / `psql` / `cat`), another reason for the `pods/exec` RBAC rule.
- Reboot recreates the Pod; for the dumpable engines the dataset is snapshotted and reloaded across the recreate.
- Because fakecloud connects to the DB over the Pod IP, the RDS k8s backend requires fakecloud to run in-cluster (the standard `FAKECLOUD_K8S_SELF_URL` deployment).
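The two readiness strategies can be sketched as follows. `readiness_for` and `wait_for_tcp` are hypothetical names, and the engine strings are assumed to be RDS engine identifiers. Only the TCP half of the heavy-engine path is shown; the Pod-log marker check would come first. The same connect-and-retry loop also illustrates the TCP handshake against a Lambda Pod's RIE port:

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical encoding of the readiness rules above.
#[derive(Debug, PartialEq, Eq)]
enum Readiness {
    /// Bridge engines: open a real client connection.
    ClientConnection,
    /// Heavy engines (Oracle / SQL Server / Db2): Pod-log marker, then a TCP probe.
    LogMarkerThenTcp,
}

/// `engine` is assumed to be an RDS engine identifier such as "postgres" or "db2-se".
fn readiness_for(engine: &str) -> Readiness {
    match engine {
        "postgres" | "mysql" | "mariadb" => Readiness::ClientConnection,
        // Everything else is treated as a heavy engine in this sketch.
        _ => Readiness::LogMarkerThenTcp,
    }
}

/// TCP probe: retry connecting to podIP:port until a handshake succeeds or `deadline` elapses.
fn wait_for_tcp(addr: SocketAddr, deadline: Duration) -> bool {
    let start = Instant::now();
    while start.elapsed() < deadline {
        // connect_timeout either completes the TCP handshake or gives up after 500 ms.
        if TcpStream::connect_timeout(&addr, Duration::from_millis(500)).is_ok() {
            return true;
        }
        sleep(Duration::from_millis(200));
    }
    false
}
```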
ECS
Set `FAKECLOUD_ECS_BACKEND=k8s` (or the global `FAKECLOUD_CONTAINER_BACKEND=k8s`) to run ECS tasks as native Pods.
- Each task becomes one Pod, with one Pod container per `containerDefinitions` entry. All containers share the Pod's network namespace (`localhost`), which is exactly the `awsvpc` model.
- A container that is the target of a `dependsOn` `COMPLETE`/`SUCCESS` condition becomes an initContainer. Kubernetes runs initContainers to completion, in order, before the app containers, which makes them a natural fit for run-once migration/bootstrap containers. `START`/`HEALTHY` ordering among the long-running app containers isn't strictly enforceable inside one Pod, so it's best-effort. The `healthCheck` still becomes a container `readinessProbe`.
- `secrets[]` resolve from SecretsManager / SSM exactly as on the Docker backend and are injected as env. The task-role and metadata endpoints (`AWS_CONTAINER_CREDENTIALS_FULL_URI`, `ECS_CONTAINER_METADATA_URI[_V4]`) point at the in-cluster `FAKECLOUD_K8S_SELF_URL`.
- The task lifecycle (`PENDING` → `RUNNING` → `STOPPED`, per-container exit codes, captured logs) is driven off the Pod's container statuses; logs are captured per container via the Pod log API. Task lifetime follows ECS semantics: the first essential container's exit stops the task.
- Container images (including AWS ECR URIs, rewritten to the in-cluster registry) must be pullable by the cluster.
- Low-level Docker-runtime knobs (`ulimits`, `devices`, `sysctls`, `tmpfs`, Linux capabilities) aren't translated to the Pod; `privileged`, `readonlyRootFilesystem`, and a numeric `user` are. Task volumes become Pod-local `emptyDir` scratch shared by name within the task.
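The `dependsOn` split can be sketched as a single pass over the container definitions. `ContainerDef`, `split_pod_containers`, and `task_should_stop` are hypothetical. This sketch keeps definition order for initContainers rather than topologically sorting chains of `COMPLETE`/`SUCCESS` dependencies, which a real implementation may need:

```rust
/// The dependsOn conditions of an ECS container definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Condition {
    Start,
    Complete,
    Success,
    Healthy,
}

/// Hypothetical, trimmed-down container definition.
struct ContainerDef {
    name: String,
    essential: bool,
    depends_on: Vec<(String, Condition)>, // (containerName, condition)
}

/// Split a task's containers into (initContainers, app containers), keeping definition order.
/// A container becomes an initContainer if any container waits on it with COMPLETE or SUCCESS.
fn split_pod_containers(defs: &[ContainerDef]) -> (Vec<&str>, Vec<&str>) {
    let is_run_once = |name: &str| {
        defs.iter()
            .flat_map(|d| d.depends_on.iter())
            .any(|(target, cond)| target == name && matches!(cond, Condition::Complete | Condition::Success))
    };
    let mut init = Vec::new();
    let mut app = Vec::new();
    for d in defs {
        if is_run_once(d.name.as_str()) {
            init.push(d.name.as_str());
        } else {
            app.push(d.name.as_str());
        }
    }
    (init, app)
}

/// ECS semantics: the task stops once any essential container has exited.
fn task_should_stop(defs: &[ContainerDef], exited: &[&str]) -> bool {
    defs.iter().any(|d| d.essential && exited.contains(&d.name.as_str()))
}
```

A `START` dependency (like `app` on `sidecar` in a typical task) leaves both containers as app containers. This is where the best-effort ordering mentioned above applies.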
EC2
Set `FAKECLOUD_EC2_BACKEND=k8s` (or the global `FAKECLOUD_CONTAINER_BACKEND=k8s`) to run EC2 instances as native Pods instead of Docker containers.
- Each `RunInstances` instance becomes one Pod running the instance's base image (Amazon Linux by default, overridable via `FAKECLOUD_EC2_DEFAULT_IMAGE`), kept alive with `tail -f /dev/null`. The instance's private IP in `DescribeInstances` is the Pod IP.
- User-data runs at boot exactly as on the Docker backend. It's decoded and executed as a root shell script in the container, backgrounded so a slow script never blocks readiness.
- A Pod can't be stopped and restarted in place, so `StopInstances` deletes the Pod and `StartInstances` / `RebootInstances` recreate it under the same deterministic name (re-running user-data). Instances have no persistent disk, so this matches the container model rather than EBS-backed stop/start semantics. `TerminateInstances` deletes the Pod for good.
- The EC2 backend only creates and deletes Pods and never execs into them, so it needs no `pods/exec` permission (unlike ElastiCache and RDS).
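The lifecycle mapping reduces to a small table of Pod operations. Below is a sketch with hypothetical names. The `fakecloud-ec2-<instance-id>` naming scheme is an assumption standing in for whatever deterministic name fakecloud uses. Two further behaviors are also assumed: `StartInstances` on a running instance is a no-op, and so is `RebootInstances` on a stopped one:

```rust
/// A Pod operation the backend would issue against the Kubernetes API.
#[derive(Debug, PartialEq, Eq)]
enum PodOp {
    Create(String),
    Delete(String),
}

#[derive(Debug, Clone, Copy)]
enum Ec2Action {
    Run,
    Stop,
    Start,
    Reboot,
    Terminate,
}

/// Hypothetical deterministic Pod name; fakecloud's real naming scheme may differ.
fn pod_name(instance_id: &str) -> String {
    format!("fakecloud-ec2-{instance_id}")
}

/// Map an EC2 lifecycle call onto Pod operations. Pods can't pause, so "stop"
/// is a delete and "start"/"reboot" recreate the Pod under the same name
/// (user-data runs again on every Create).
fn pod_ops(action: Ec2Action, instance_id: &str, pod_exists: bool) -> Vec<PodOp> {
    let name = pod_name(instance_id);
    match action {
        Ec2Action::Run => vec![PodOp::Create(name)],
        // Assumed no-op when the instance is already running.
        Ec2Action::Start if pod_exists => vec![],
        Ec2Action::Start => vec![PodOp::Create(name)],
        Ec2Action::Reboot if pod_exists => vec![PodOp::Delete(name.clone()), PodOp::Create(name)],
        // Assumed no-op when there is no running Pod to reboot.
        Ec2Action::Reboot => vec![],
        // Stop and Terminate both remove the Pod; only the recorded instance state differs.
        Ec2Action::Stop | Ec2Action::Terminate if pod_exists => vec![PodOp::Delete(name)],
        Ec2Action::Stop | Ec2Action::Terminate => vec![],
    }
}
```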
Limitations
- Scope: the Kubernetes backend covers Lambda, ElastiCache, RDS, ECS, and EC2 execution, so the whole container-backed stack can run natively on Kubernetes.
- Container-image Lambda functions whose image registry requires auth need a manually created `kubernetes.io/dockerconfigjson` Secret referenced via `FAKECLOUD_K8S_PULL_SECRET`. Auto-creating that Secret would require `secrets` permissions that not every cluster admin wants to grant fakecloud.
- Cold-start latency adds the init container's HTTP round-trip to download code + layers (typically <500ms intra-cluster), on top of image pull + RIE start. Warm-Pod reuse keeps subsequent invocations as fast as on the Docker backend.
- The K8s backend requires fakecloud's process to remain reachable at `FAKECLOUD_K8S_SELF_URL` for the lifetime of each Pod's init container. If fakecloud restarts mid-init, that Pod's bootstrap fails and the facade spawns a fresh one on the next invocation.
Troubleshooting
- `FAKECLOUD_LAMBDA_BACKEND=k8s but Kubernetes backend failed to initialize`: kube client construction failed. fakecloud uses `kube::Client::try_default()`: in-cluster service account first, then `KUBECONFIG`. Make sure the Pod has a ServiceAccount token mounted at `/var/run/secrets/kubernetes.io/serviceaccount`, or set `KUBECONFIG` for out-of-cluster testing.
- Lambda Pods stuck in `Init:0/1` with `ImagePullBackOff` on `busybox`: the cluster's default image registry can't reach `docker.io`. Mirror `busybox:1.36` to your internal registry, or configure a default pull secret that has Docker Hub credentials.
- Init container exits non-zero with `wget: server returned error: HTTP/1.1 401`: fakecloud restarted between Pod creation and init-container start, invalidating the bearer token. The facade will retry on the next invocation.
- `fakecloud-lambda` Pods are leaking after fakecloud crashes: confirm fakecloud has `delete` permission on `pods` in the configured namespace. The startup reaper deletes any Pod labeled `fakecloud-managed-by=fakecloud` whose `fakecloud-instance` differs from the current process.
- `kubectl get pods -l fakecloud-managed-by=fakecloud`: a quick health check showing every Lambda Pod fakecloud has spawned.
Running the test suite
The K8s backend has unit tests (Pod-spec generation, helpers) that run on every workspace `cargo test`. Real-cluster integration tests are opt-in and gated behind the `k8s-integration` feature, so a casual `cargo test` doesn't try to talk to a cluster that isn't there.
To run them:
```shell
kind create cluster --name fakecloud-k8s-test
FAKECLOUD_K8S_TEST=1 cargo test -p fakecloud-lambda \
  --features k8s-integration --test k8s_integration -- --test-threads=1
```

The first test hard-fails (rather than skipping) when `FAKECLOUD_K8S_TEST` is unset, so you can't silently miss a regression. CI runs the same suite against a kind cluster on every push that touches `crates/fakecloud-lambda/**`, via `.github/workflows/lambda-k8s.yml`.
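The hard-fail gate is a small check at the top of the suite. Below is a sketch with a hypothetical `k8s_test_gate`. Accepting only the value `1` is an assumption based on the command above:

```rust
/// Hypothetical gate for the integration suite: fail loudly instead of
/// skipping unless FAKECLOUD_K8S_TEST is "1" (the accepted value is an assumption).
fn k8s_test_gate(value: Option<&str>) -> Result<(), String> {
    match value {
        Some("1") => Ok(()),
        Some(other) => Err(format!("FAKECLOUD_K8S_TEST={other:?}; set it to 1 to run against a cluster")),
        None => Err("FAKECLOUD_K8S_TEST is unset; refusing to skip the k8s integration suite".to_string()),
    }
}

/// Read the real environment; the first integration test would call
/// `gate_from_env().unwrap()` so a missing variable panics instead of passing.
fn gate_from_env() -> Result<(), String> {
    k8s_test_gate(std::env::var("FAKECLOUD_K8S_TEST").ok().as_deref())
}
```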
Status
Shipped in fakecloud 0.14.x. Beta — please open an issue or comment on #1234 if you hit edge cases.