Local Generative AI: Implementing Bedrock with fakecloud

May 17, 2026 · Lucas Vieira

Generative AI development on AWS Bedrock is governed by a single, inescapable metric: the cost of the feedback loop. As of May 2026, frontier models like Claude 4.7 Opus command $15 per million input tokens and $75 per million output tokens. Even the workhorse models, such as Claude 4.6 Sonnet, maintain a price point of $3 per million input tokens. When your integration tests run on every commit, and your local development environment calls live APIs for every prompt tweak, these costs scale linearly with your engineering velocity.

Testing generative AI on live infrastructure is inefficient. It introduces network latency, consumes production quotas, and generates line items on your AWS bill for code that hasn't even reached a staging environment. fakecloud eliminates this friction by providing a high-fidelity, zero-friction local AWS environment. It allows you to implement and test 214 Bedrock operations across 4 APIs locally with a ~19MB binary that starts in ~500ms.

The Cost of Iteration: Why Live Infrastructure Fails Developers

In a modern CI/CD pipeline, a single feature branch might undergo fifty test runs before merging. If those tests involve Bedrock Agents, Knowledge Bases, or complex Guardrail configurations, the overhead is not just financial.

  1. Latency: A round-trip to us-east-1 for a model invocation adds hundreds of milliseconds of overhead. In a suite of 200 integration tests, this turns a sub-minute test run into a coffee break.
  2. State Management: Cleaning up Bedrock Agents or Knowledge Bases in a live account is slow and prone to leaving orphaned resources that continue to accrue costs (like the $345/month minimum for OpenSearch Serverless backends).
  3. Auth Friction: As of March 2026, the primary incumbent in the local AWS emulation space consolidated its offerings into a proprietary image requiring an account and an auth token. This forces developers to manage credentials even for local-only work.

fakecloud operates on a different philosophy. It requires no account, no auth token, and no internet connection. You run the binary, point your SDK at the local endpoint, and execute your logic against a 100% API-conformant environment.

Bedrock Locally: Support for 214 Operations Across 4 APIs

fakecloud provides comprehensive coverage for the Bedrock API surface, spanning both the control plane (management) and the data plane (runtime). While other emulators might only support basic model invocation via a proxy to Ollama, fakecloud implements the full Smithy-modeled API shape. This ensures that your code, which uses the official AWS SDK, behaves exactly as it would in production.

Data Plane: Runtime Operations

The runtime operations are where your application spends most of its time. fakecloud supports the full suite of invocation methods:

Because fakecloud is built for testing, it doesn't just proxy these calls. It allows you to configure deterministic responses. You can simulate model hallucinations, specific JSON structures, or Guardrail violations to ensure your application's error-handling logic is robust.

Control Plane: Infrastructure Management

Building AI applications often involves more than just calling an LLM. You need to manage the surrounding infrastructure. fakecloud supports 214 operations across 4 APIs, including:

CategoryOperations SupportedKey Features
AgentsCreateAgent, InvokeAgent, CreateAgentAliasFull agentic loop emulation with action group triggers.
Knowledge BasesCreateKnowledgeBase, IngestContent, RetrieveLocal RAG workflows using integrated vector storage.
GuardrailsCreateGuardrail, UpdateGuardrail, ListGuardrailsPolicy-based content filtering and sensitive information masking.
Provisioned ThroughputCreateProvisionedModelThroughputTesting code that relies on dedicated capacity identifiers.
Custom ModelsCreateModelImportJob, ListCustomModelsSimulating the lifecycle of fine-tuned or imported weights.

Coding the Mock: Python Implementation

To use fakecloud, you do not need to change your application logic. You only change the configuration of your AWS client. In Python, using boto3, you direct the endpoint_url to your local fakecloud instance (defaulting to port 4566).

Step 1: Start fakecloud

Run the standalone binary. It occupies ~19MB of disk space and starts in roughly 500ms.

# Download and run
curl -fsSL https://fakecloud.dev/install.sh | bash
fakecloud

Step 2: Configure the Client

Your application code remains clean. You inject the endpoint URL via environment variables or direct client configuration.

import boto3
import os

# Use dummy credentials - fakecloud doesn't validate them
os.environ["AWS_ACCESS_KEY_ID"] = "testing"
os.environ["AWS_SECRET_ACCESS_KEY"] = "testing"
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"

# Point the client at fakecloud
bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    endpoint_url="http://localhost:4566"
)

def generate_response(prompt):
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=f'{{"prompt": "{prompt}", "max_tokens": 100}}'
    )
    return response['body'].read()

This code is identical to what you would run in production, minus the endpoint_url. By keeping the SDK usage standard, you ensure that your integration tests are actually testing the code you ship, not a simplified mock object.

Verifying Outputs: Using the fakecloud SDK

Testing generative AI is notoriously difficult because of the non-deterministic nature of LLMs. fakecloud solves this by separating the Application Client from the Test Client.

While your application uses the standard boto3 client to call InvokeModel, your test suite uses the fakecloud-sdk to assert what happened inside the emulator. This allows you to verify that the correct prompt was sent, the correct parameters (like temperature or top_p) were used, and that the model was called the expected number of times.

from fakecloud_sdk import FakeCloudClient

# The test client connects to fakecloud's internal management API
fc = FakeCloudClient("http://localhost:4566")

def test_ai_logic():
    # 1. Run the application logic
    generate_response("Explain quantum physics to a cat.")

    # 2. Assert on the invocation using the fakecloud SDK
    invocations = fc.bedrock.list_invocations()
    
    assert len(invocations) == 1
    assert "quantum physics" in invocations[0].request_body['prompt']
    assert invocations[0].model_id == "anthropic.claude-3-sonnet-20240229-v1:0"

This approach eliminates the need for complex monkey-patching or manual mocking of the boto3 client. You are asserting against the actual state of the local infrastructure.

Reliability Through Conformance

One of the primary risks of using a local emulator is "drift"—the possibility that the emulator behaves differently than the real AWS API. fakecloud mitigates this through a rigorous conformance pipeline.

Every implemented service is validated against AWS's own Smithy models. On every commit, the fakecloud engine runs through 86,327 generated test variants. These variants cover:

For Bedrock specifically, this means that when you call ConverseStream, the chunks returned by fakecloud follow the exact binary framing and JSON structure required by the AWS SDK's event stream parser. If your code works against fakecloud, it will work against the real Bedrock endpoint.

Advanced Scenarios: Guardrails and Agents

As of 2026, AI applications are moving beyond simple chat interfaces into complex agentic workflows. fakecloud supports these advanced scenarios out of the box.

Local Guardrail Testing

You can create a Guardrail locally to test how your application handles blocked content. fakecloud's implementation of ApplyGuardrail actually evaluates the input against the configured sensitive information filters and denied topics.

# Create a guardrail via CLI
aws --endpoint-url http://localhost:4566 bedrock create-guardrail \
    --name "PII-Filter" \
    --sensitive-information-policy-config '{"piiEntitiesConfig": [{"type": "EMAIL", "action": "BLOCK"}]}'

When your application calls InvokeModel with a prompt containing an email address, fakecloud will return a GuardrailAction: BLOCK response, allowing you to test your UI's "Content Blocked" state without ever touching a live model.

Agentic Loops

fakecloud's support for InvokeAgent includes the ability to trigger local Lambda functions as action groups. This allows for full end-to-end testing of the agentic loop:

  1. App calls InvokeAgent on fakecloud.
  2. fakecloud parses the request and identifies the required Action Group.
  3. fakecloud triggers a local Lambda (running in a local Docker container).
  4. Lambda returns data to fakecloud.
  5. fakecloud formats the final response for the App.

This entire flow happens locally, with zero latency from the public internet and zero cost from AWS.

Comparison: fakecloud vs. Alternatives

FeaturefakecloudLocalStack (2026)Live AWS
Binary Size~19MB~1.2GB (Docker Image)N/A
Startup Time~500ms~15-30sN/A
Account RequiredNoYes (as of March 2026)Yes
Auth TokenNoYesYes
Bedrock Ops214 (4 APIs)4 (Ultimate Tier)All
CostFree (AGPL-3.0)Paid SubscriptionPay-per-token
Offline ModeYesLimitedNo

Implementation Strategy for Teams

To maximize the utility of fakecloud in your organization, follow these three steps:

  1. Standardize the Endpoint: Use an environment variable like AWS_ENDPOINT_URL in your application's configuration layer. Default it to None for production and http://localhost:4566 for local development.
  2. CI Integration: Add fakecloud to your GitHub Actions or GitLab CI pipeline as a service container. Because it is a single binary, it starts instantly, adding negligible time to your build.
  3. Deterministic Seeding: Use the fakecloud SDK to seed your local environment with the necessary Bedrock Agents and Knowledge Bases before your tests run. This ensures every developer starts with a clean, known state.

Next Steps

fakecloud provides the most comprehensive local emulation of AWS Bedrock available today. By moving your generative AI testing to a local environment, you reduce costs, eliminate network dependencies, and tighten your development loop.

To begin implementing local Bedrock workflows, view the supported model list and detailed operation mapping in our documentation at fakecloud.dev/docs/services/bedrock.