S3 backups from scratch
Depths v0.1.1 can seal each UTC day of data and ship it to S3 or an S3-compatible endpoint. In this guide you will configure S3 via environment variables, ingest logs into a past day for determinism, run a synchronous ship, and then read the data back from S3. Note that this process normally happens automatically behind the scenes; we are driving it manually here for demonstration purposes.
What you will build
- A local instance with a small dataset written into “yesterday”
- A one-shot ship that seals, uploads, verifies row counts, and cleans local copies
- A verification query that reads sealed data from S3
Prerequisites — S3 environment variables
Set the environment before starting Python. Required:
- S3_BUCKET
- AWS_ACCESS_KEY_ID or S3_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY, S3_SECRET_KEY, or S3_SECRET_ACCESS_KEY
- AWS_REGION or S3_REGION
- AWS_ENDPOINT_URL or S3_URL
- S3_PREFIX
- AWS_SESSION_TOKEN
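As a concrete illustration, a .env file might look like the sketch below. Every value is a placeholder, and which of the alternative names you set (and whether you need the endpoint, prefix, or session token at all) depends on your environment:

```env
S3_BUCKET=my-depths-backups
AWS_ACCESS_KEY_ID=AKIAEXAMPLEKEY
AWS_SECRET_ACCESS_KEY=examplesecret
AWS_REGION=us-east-1
# Only needed for S3-compatible endpoints such as MinIO:
AWS_ENDPOINT_URL=http://localhost:9000
# Optional key prefix inside the bucket:
S3_PREFIX=depths-demo
# AWS_SESSION_TOKEN=... (only for temporary credentials)
```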
Imports and setup
We keep the instance explicit and generate a small dataset with stable timestamps in the target day. S3Config.from_env() reads the loaded environment, validates the configuration, and provides reader and upload options internally. Note that from_env() also picks up environment variables set in the terminal; load_dotenv simply adds the convenience of loading them from a .env file.
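A minimal setup sketch; the import path for S3Config and DepthsLogger is an assumption here (check the Depths package docs for the exact module):

```python
from datetime import datetime, timedelta, timezone

from dotenv import load_dotenv   # pip install python-dotenv

# Assumed import path; adjust to wherever your Depths install exposes these names.
from depths import DepthsLogger, S3Config

load_dotenv()                    # load S3_* / AWS_* variables from the local .env file
s3 = S3Config.from_env()         # validate the environment and build reader/upload options
```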
Load S3 config and initialize the logger
We ensure S3 is present, then start a logger. The logger prepares directories, schemas, and background services. Since we are controlling the S3 backup manually here, we set shipping_enabled to false so that we, rather than the background shipper, decide when shipping happens. You can print(s3) to check that the config loaded correctly. Reads can still use local storage.
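A sketch of that step. Apart from shipping_enabled, which the guide names, the constructor arguments and the failure mode of from_env() are assumptions:

```python
# Fail fast if the S3 configuration was not picked up
# (whether from_env() returns None or raises on a missing bucket is version-dependent).
if s3 is None:
    raise RuntimeError("S3 is not configured; check the environment variables above")
print(s3)  # sanity check: bucket, region, endpoint, prefix

# Keep the background shipper idle so this guide controls shipping explicitly.
logger = DepthsLogger(shipping_enabled=False)
```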
Target a past day for deterministic shipping
We ingest into “yesterday” in two steps: compute a base timestamp at UTC midnight yesterday, and retarget the logs table to yesterday’s local path so all batches land under the same day. Retargeting the aggregator ensures files are created under the desired day directory even if the process runs today. We are doing this manually purely for demonstration purposes; in direct usage, the DepthsLogger performs such rollovers automatically.
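The timestamp arithmetic below is plain Python; the retarget call is left as a commented placeholder because the real method on the logs table/aggregator is not something this guide spells out:

```python
# Base timestamp: UTC midnight at the start of "yesterday".
today_utc = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
yesterday = today_utc - timedelta(days=1)

# Hypothetical retarget: point the logs table at yesterday's day directory so that
# every batch written below lands under the same (soon-to-be-sealed) day.
# logger.logs_table.retarget(day=yesterday.date())
```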
Ingest a batch into yesterday
We vary severity and body so the S3 read is easy to spot. Timestamps advance by 1 ms to stay in order.
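A rough sketch of the ingest loop; the logger method and field names here are assumptions rather than the verified Depths API:

```python
severities = ["INFO", "WARN", "ERROR"]
for i in range(300):
    ts = yesterday + timedelta(milliseconds=i)   # advance 1 ms per record to stay ordered
    # Hypothetical ingest call; substitute the real DepthsLogger logging method.
    logger.log(
        severity=severities[i % len(severities)],
        body=f"s3 backup demo record {i}",
        timestamp=ts,
    )
```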
Seal and ship now
We call a synchronous ship. It seals yesterday, uploads to S3, verifies remote row counts, and cleans local copies if verification passes. The return value is a compact status summary of the process. Once again, this shipping normally happens automatically behind the scenes in DepthsLogger and is generally not controlled manually. Past days are allowed. The summary includes per-table counts and an overall status.
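A sketch of the one-shot ship; the method name and signature are assumptions, while the behavior (seal, upload, verify row counts, clean local copies) is what the guide describes:

```python
# Hypothetical synchronous ship of yesterday's data; the real method name may differ.
summary = logger.ship_day(day=yesterday.date())
print(summary)  # compact status: per-table row counts plus an overall status
```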
Read back from S3
We run two reads: a small dict sample and a LazyFrame for a quick severity rollup. The storage="s3" selector forces object storage.
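A sketch of the two reads. The read method names and the severity column are assumptions; the storage="s3" selector is the part the guide specifies:

```python
import polars as pl

# 1) Small dict sample pulled from the sealed day in S3 (method name assumed).
rows = logger.read_dicts(day=yesterday.date(), storage="s3", limit=5)
print(rows)

# 2) LazyFrame rollup: record counts per severity (method and column names assumed).
lf = logger.read_lazy(day=yesterday.date(), storage="s3")
print(lf.group_by("severity").agg(pl.len()).collect())
```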
What you learned
- How to configure S3 for Depths using environment variables
- How to write into a past day by retargeting the logs table and using UTC timestamps
- How to run a one-shot ship and inspect its summary
- How to read sealed data from S3 as dicts or as a LazyFrame for quick summaries