Test Bedrock locally

Test code that calls AWS Bedrock, locally and deterministically. Configurable responses per prompt, fault injection for retry logic, call-history assertions. Free, no account, no GPU.

Want to test code that calls Bedrock without spending real tokens or hitting a real LLM? Use fakecloud. A real Bedrock-wire-protocol server. Deterministic responses. Millisecond latency. Free.

curl -fsSL https://raw.githubusercontent.com/faiscadev/fakecloud/main/install.sh | bash
fakecloud

Point any AWS SDK at http://localhost:4566.
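One low-touch way to do that, assuming your SDK is recent enough to honor the `AWS_ENDPOINT_URL` environment variable (AWS CLI v2 and current SDK releases do; older versions need the endpoint passed in the client constructor instead):

```shell
# Route every AWS SDK client through fakecloud without touching code.
export AWS_ENDPOINT_URL=http://localhost:4566
export AWS_ACCESS_KEY_ID=test        # any non-empty value; fakecloud doesn't validate credentials
export AWS_SECRET_ACCESS_KEY=test
export AWS_DEFAULT_REGION=us-east-1
```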

Why you don't want a real LLM in tests

Tests exist to prove your code is correct. A real LLM — including Ollama-backed local inference — makes that job harder, not easier:

  1. Non-deterministic. Same prompt, different output every run. Even at temperature=0, attention-kernel non-determinism and tokenizer drift break snapshot tests. You end up asserting output.length > 0, which passes even if the model returned "cabbage."
  2. Slow. Ollama on CPU = seconds per call. 100 tests × 5s = 8 minutes. Your TDD loop dies. CI times out.
  3. Resource-heavy. GB-scale model weights. CI runners balloon, docker pull drags, laptops OOM.
  4. Tests the wrong thing. A real LLM tests whether the model understood your prompt. Your tests should cover whether your code handles Bedrock's response shape, retries correctly on throttling, builds the request body right. Those are different concerns.
  5. Flaky. Rate limits if you hit real AWS. OOM kills on GPU-less runners if you run locally. Cold-start delays everywhere.

What your tests actually need:

  - Deterministic responses you configure per prompt
  - Fault injection to exercise retry and error paths
  - A record of every call your code made
  - Millisecond latency at zero cost

fakecloud gives exactly this.

Configurable responses

import { FakeCloud } from "fakecloud";
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
} from "@aws-sdk/client-bedrock-runtime";

const fc = new FakeCloud();
const rt = new BedrockRuntimeClient({ endpoint: "http://localhost:4566" });

beforeEach(() => fc.reset());

test("summarizer returns 3-bullet format", async () => {
  await fc.bedrock.setResponseRule({
    whenPromptContains: "summarize",
    respond: {
      completion: "- bullet one\n- bullet two\n- bullet three",
    },
  });

  const out = await summarize(rt, "long text here");

  expect(out.bullets).toHaveLength(3);
});

test("classifier returns spam vs ham", async () => {
  await fc.bedrock.setResponseRule({
    whenPromptContains: "classify",
    respond: { completion: JSON.stringify({ label: "spam" }) },
  });

  expect(await classify(rt, "buy now!!")).toBe("spam");
});

Same test file, different fixtures per branch of your code.

Fault injection for retry logic

import boto3
from fakecloud import FakeCloud

fc = FakeCloud()
rt = boto3.client(
    'bedrock-runtime',
    endpoint_url='http://localhost:4566',
    aws_access_key_id='test',
    aws_secret_access_key='test',
    region_name='us-east-1',
)

fc.bedrock.inject_fault(
    operation='InvokeModel',
    error='ThrottlingException',
    count=2,
)

# your code retries with exponential backoff; third call succeeds
result = resilient_classify(rt, "some text")

history = fc.bedrock.get_call_history()
assert len(history) == 3  # two throttles, one success
assert history[0].error == 'ThrottlingException'
assert history[2].error is None

Your retry path — the thing that breaks in production if you didn't write it right — now gets exercised in every test run.
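The retry wrapper itself is your code, not fakecloud's. For reference, here is a minimal sketch of the kind of exponential-backoff loop the fault-injection test above exercises (`withBackoff` and its delay parameters are illustrative, not part of any API in this document):

```typescript
// Illustrative exponential-backoff sketch (not fakecloud API).
// Retries only on ThrottlingException, doubling the delay each attempt.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const throttled = (err as Error).name === "ThrottlingException";
      if (!throttled || attempt + 1 >= maxAttempts) throw err;
      // 100ms, 200ms, 400ms, ... before the next attempt
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

With `count=2` injected above, the first two calls throw `ThrottlingException`, the backoff loop absorbs them, and the third call returns normally: exactly the three-entry history the assertions check.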

Call-history assertions

await myAgent.run({ task: "research fakecloud" });

const calls = await fc.bedrock.getCallHistory();
expect(calls).toHaveLength(3);                    // agent made exactly 3 model calls
expect(calls[0].modelId).toBe("anthropic.claude-3-haiku-20240307-v1:0");
expect(calls[0].messages[0].content).toContain("research");
expect(calls[2].messages).toHaveLength(5);        // 3-turn conversation

Assert on what your code sent, not just what it received.

Streaming

Bedrock returns binary EventStream frames for streaming endpoints. fakecloud encodes them correctly — your streaming consumer sees real chunks:

await fc.bedrock.setResponseRule({
  operation: "InvokeModelWithResponseStream",
  chunks: ["Once ", "upon ", "a ", "time"],
});

const stream = await rt.send(new InvokeModelWithResponseStreamCommand({
  modelId: "anthropic.claude-3-haiku-20240307-v1:0",
  body: JSON.stringify({ messages: [{ role: "user", content: "story" }] }),
}));

const chunks = [];
for await (const evt of stream.body!) {
  chunks.push(decodeChunk(evt));
}
expect(chunks).toEqual(["Once ", "upon ", "a ", "time"]);
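`decodeChunk` above is not part of fakecloud; it stands for whatever your consumer uses to unwrap the SDK's event objects. A minimal sketch, assuming each event carries `chunk.bytes` as a UTF-8 JSON payload with the text under a `completion` field (the actual field name depends on the model family you're emulating):

```typescript
// Illustrative decodeChunk: unwrap one SDK EventStream event.
// Assumes UTF-8 JSON with the text under `completion`; adjust the field
// name to match the model family your code targets.
function decodeChunk(evt: { chunk?: { bytes?: Uint8Array } }): string {
  const bytes = evt.chunk?.bytes;
  if (!bytes) return ""; // non-chunk events (e.g. metadata) carry no text
  const payload = JSON.parse(new TextDecoder().decode(bytes));
  return typeof payload.completion === "string" ? payload.completion : "";
}
```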

Comparison

|                     | Deterministic      | Fault injection         | Call history | Speed      | Cost               | AWS bill                         |
|---------------------|--------------------|-------------------------|--------------|------------|--------------------|----------------------------------|
| fakecloud           | Yes                | Yes                     | Yes          | ms         | Free               | None                             |
| Real Bedrock        | No                 | Hard (rate-limit abuse) | No           | 100-2000ms | $$$                | Real                             |
| Ollama / local LLM  | No                 | No                      | No           | 1-30s      | CPU + RAM          | None                             |
| LocalStack Ultimate | No (Ollama-backed) | Limited                 | No           | 1-30s      | Paid Ultimate tier | None                             |
| Mock library        | Yes                | Yes                     | Yes          | ms         | Free               | None, but doesn't test HTTP path |