Glue
AWS Glue Data Catalog (databases, tables, partitions) + Jobs/JobRuns control plane. JSON 1.1 protocol.
fakecloud implements AWS Glue's JSON 1.1 control plane covering the Data Catalog (databases, tables, partitions) and Jobs (job CRUD + JobRun lifecycle), 26 operations total. The Data Catalog is the same store Athena reads through: tables created here surface immediately under AwsDataCatalog for Athena's ListDatabases / GetTableMetadata paths.
Status: control-plane parity. The Glue ETL runtime does not execute Spark / Python Shell scripts — JobRuns transition through STARTING -> RUNNING -> SUCCEEDED for control-plane testing only.
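The simulated lifecycle boils down to a two-step state machine. A minimal sketch of that behavior — the `FakeJobRun` class and its transition table are illustrative, not fakecloud's actual internals, though the field names mirror the Glue API:

```python
import time

# Simple forward-only progression used for control-plane testing.
TRANSITIONS = {"STARTING": "RUNNING", "RUNNING": "SUCCEEDED"}

class FakeJobRun:
    def __init__(self, run_id):
        self.job_run_id = run_id
        self.state = "STARTING"
        self.started_on = time.time()   # StartedOn
        self.completed_on = None        # CompletedOn, set on terminal state

    def advance(self):
        """Step STARTING -> RUNNING -> SUCCEEDED; terminal states are sticky."""
        if self.state in TRANSITIONS:
            self.state = TRANSITIONS[self.state]
            if self.state == "SUCCEEDED":
                self.completed_on = time.time()
        return self.state

    @property
    def execution_time(self):
        # ExecutionTime is reported in whole seconds, as in the real API.
        end = self.completed_on or time.time()
        return int(end - self.started_on)

run = FakeJobRun("jr_0123")
run.advance()  # RUNNING
run.advance()  # SUCCEEDED
```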
Supported today
- Databases — `CreateDatabase` / `GetDatabase` / `UpdateDatabase` / `DeleteDatabase` / `GetDatabases`. Per-account namespace, name-uniqueness enforcement, parameter passthrough.
- Tables — `CreateTable` / `GetTable` / `UpdateTable` / `DeleteTable` / `GetTables`. Full `StorageDescriptor` round-trip: columns + types, location, input/output format, SerDe info, partition keys, table parameters, view text, table type. Tables are looked up by `(catalogId, databaseName, name)`.
- Partitions — `CreatePartition` / `BatchCreatePartition` / `BatchGetPartition` / `GetPartition` / `UpdatePartition` / `DeletePartition` / `GetPartitions`. `GetPartitions` expression pruning — the `Expression` filter is parsed and evaluated against partition values server-side. Supports `=` / `!=` / `<>` / `<` / `<=` / `>` / `>=` / `IN` / `BETWEEN` / `LIKE` / `IS NULL` / `IS NOT NULL`, plus `AND` / `OR` / `NOT` combinators and parentheses. Type-aware comparison for `string`, `int`, `bigint`, `date`, `timestamp`. Unparseable expressions return `InvalidInputException`, matching real Glue.
- Jobs — `CreateJob` / `GetJob` / `GetJobs` / `ListJobs` / `UpdateJob` / `DeleteJob`. Full round-trip on `Command` (name + script location + python version + runtime), `DefaultArguments`, `Connections`, `MaxRetries`, `Timeout`, `MaxCapacity`, `WorkerType`, `NumberOfWorkers`, `GlueVersion`, `ExecutionProperty`, `NotificationProperty`.
- JobRuns — `StartJobRun` / `GetJobRun` / `GetJobRuns`. Runs are assigned a `JobRunId`, accept `Arguments` overrides, capture `Timeout` / `MaxCapacity` / `WorkerType` / `NumberOfWorkers`, and step through `STARTING -> RUNNING -> SUCCEEDED` with `StartedOn` / `CompletedOn` / `ExecutionTime` populated.
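To make the partition-pruning semantics concrete, here is a self-contained sketch of type-aware evaluation for a single quoted-literal comparison. Everything in it (`prune`, `_coerce`, the regex grammar) is a hypothetical illustration, not fakecloud's evaluator, which additionally handles `IN` / `BETWEEN` / `LIKE` / NULL tests and the `AND` / `OR` / `NOT` combinators:

```python
import re
from datetime import date

# Grammar subset: <column> <op> '<literal>'
_COMPARISON = re.compile(r"^\s*(\w+)\s*(<=|>=|<>|!=|=|<|>)\s*'([^']*)'\s*$")

def _coerce(value, col_type):
    """Type-aware coercion so '9' < '10' compares numerically for int columns."""
    if col_type in ("int", "bigint"):
        return int(value)
    if col_type == "date":
        return date.fromisoformat(value)
    return value  # string, timestamp-as-string, etc.

def prune(expression, partition_keys, partitions):
    """Return the partitions whose values satisfy the expression.

    partition_keys: [{"Name": ..., "Type": ...}], as registered on the table.
    partitions: list of value-lists, one value per partition key.
    """
    m = _COMPARISON.match(expression)
    if not m:
        # Mirrors Glue's behavior of rejecting unparseable expressions.
        raise ValueError("InvalidInputException: unsupported expression")
    name, op, literal = m.groups()
    idx = next(i for i, k in enumerate(partition_keys) if k["Name"] == name)
    col_type = partition_keys[idx]["Type"]
    rhs = _coerce(literal, col_type)
    ops = {
        "=": lambda a, b: a == b, "!=": lambda a, b: a != b,
        "<>": lambda a, b: a != b, "<": lambda a, b: a < b,
        "<=": lambda a, b: a <= b, ">": lambda a, b: a > b,
        ">=": lambda a, b: a >= b,
    }
    return [p for p in partitions if ops[op](_coerce(p[idx], col_type), rhs)]

keys = [{"Name": "dt", "Type": "string"}]
parts = [["2026-05-09"], ["2026-05-10"], ["2026-05-11"]]
print(prune("dt >= '2026-05-10'", keys, parts))  # [['2026-05-10'], ['2026-05-11']]
```

The `int` path is why type awareness matters: lexicographically `'9' > '10'`, but `prune("h < '10'", ...)` over an `int` column compares 9 against 10 numerically.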
Athena integration
The same Data Catalog state powers Athena's catalog reads:
- `glue:CreateDatabase` -> visible in `athena:ListDatabases` under `AwsDataCatalog`.
- `glue:CreateTable` -> visible in `athena:GetTableMetadata` / `athena:ListTableMetadata`.
- Column types and partition keys round-trip end-to-end, so Athena's `DESCRIBE` / `SHOW TABLES` return the schema you registered through Glue.
See the Athena docs for the minimal SQL evaluator that runs over this catalog.
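The shared-store relationship can be pictured as a projection from a Glue `Table` record onto Athena's `TableMetadata` shape. A sketch, assuming nothing beyond the public API field names — the function itself is illustrative, not fakecloud's code:

```python
def to_athena_table_metadata(glue_table):
    """Project a Glue Table record onto Athena's TableMetadata shape."""
    sd = glue_table.get("StorageDescriptor", {})
    return {
        "Name": glue_table["Name"],
        "TableType": glue_table.get("TableType", "EXTERNAL_TABLE"),
        # Athena's Columns come from the Glue StorageDescriptor...
        "Columns": [
            {"Name": c["Name"], "Type": c["Type"]} for c in sd.get("Columns", [])
        ],
        # ...while PartitionKeys live at the top level of the table record.
        "PartitionKeys": [
            {"Name": k["Name"], "Type": k["Type"]}
            for k in glue_table.get("PartitionKeys", [])
        ],
        "Parameters": glue_table.get("Parameters", {}),
    }

events = {
    "Name": "events",
    "PartitionKeys": [{"Name": "dt", "Type": "string"}],
    "StorageDescriptor": {"Columns": [{"Name": "id", "Type": "string"}]},
}
md = to_athena_table_metadata(events)
```

Because both services read the same record, there is no copy step to get wrong: a schema change via `UpdateTable` is immediately what Athena reports.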
Smoke test
```sh
fakecloud &

aws --endpoint-url http://localhost:4566 glue create-database \
  --database-input Name=analytics

aws --endpoint-url http://localhost:4566 glue create-table \
  --database-name analytics \
  --table-input 'Name=events,PartitionKeys=[{Name=dt,Type=string}],StorageDescriptor={Columns=[{Name=id,Type=string}],Location=s3://my-bucket/events/}'

# Register a few partitions, then prune them server-side.
for d in 2026-05-09 2026-05-10 2026-05-11; do
  aws --endpoint-url http://localhost:4566 glue create-partition \
    --database-name analytics --table-name events \
    --partition-input "Values=$d,StorageDescriptor={Location=s3://my-bucket/events/dt=$d/}"
done

aws --endpoint-url http://localhost:4566 glue get-partitions \
  --database-name analytics --table-name events \
  --expression "dt >= '2026-05-10'"

# Job + JobRun control plane.
aws --endpoint-url http://localhost:4566 glue create-job \
  --name daily-rollup \
  --role arn:aws:iam::000000000000:role/glue \
  --command Name=glueetl,ScriptLocation=s3://my-bucket/scripts/rollup.py,PythonVersion=3 \
  --glue-version 4.0

RUN_ID=$(aws --endpoint-url http://localhost:4566 glue start-job-run \
  --job-name daily-rollup --query 'JobRunId' --output text)

aws --endpoint-url http://localhost:4566 glue get-job-run \
  --job-name daily-rollup --run-id "$RUN_ID"
```

Caveats
The ETL runtime is not implemented. StartJobRun does not fetch your script from S3, does not spin up a Spark / Python Shell worker, and does not produce any output; runs reach SUCCEEDED without performing any work. Use real Glue for actual ETL execution; use fakecloud for testing the job-orchestration and catalog-management code paths around it.
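Orchestration code written against fakecloud typically polls `GetJobRun` until a terminal state. A minimal helper, with the client call injected so it can be shown against a stub — `wait_for_job_run` and the stub are illustrative, not part of fakecloud; against a live endpoint you would pass `boto3.client("glue").get_job_run` instead:

```python
import time

def wait_for_job_run(get_job_run, job_name, run_id, poll=1.0, timeout=60.0):
    """Poll until the JobRun reaches a terminal state.

    get_job_run is any callable taking boto3's glue.get_job_run keyword
    arguments (JobName=..., RunId=...); injecting it keeps the helper
    testable without a live endpoint.
    """
    terminal = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}
    deadline = time.monotonic() + timeout
    while True:
        state = get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
        if state in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"JobRun {run_id} still {state} after {timeout}s")
        time.sleep(poll)

# Stub standing in for the real client method; fakecloud would return
# STARTING, then RUNNING, then SUCCEEDED across successive calls.
_states = iter(["STARTING", "RUNNING", "SUCCEEDED"])
def fake_get_job_run(JobName, RunId):
    return {"JobRun": {"JobRunState": next(_states)}}

print(wait_for_job_run(fake_get_job_run, "daily-rollup", "jr_0123", poll=0.0))
# SUCCEEDED
```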
Crawlers, connections, triggers, workflows, dev endpoints, ML transforms, blueprints, schema registry, and data quality APIs are not implemented. The Data Catalog remains the same store Athena reads through, so registered tables come back exactly as you wrote them with CreateTable / CreatePartition.