Configure a depths.core.config.LogProducerConfig and a depths.core.config.LogAggregatorConfig, attach them to depths.core.config.DepthsLoggerOptions, run a short ingest, verify that the options persist to disk, and recreate the logger without passing options to confirm the reload.
What you will build
- A customized depths.core.logger.DepthsLogger with a tuned producer and aggregator
- A small dataset ingested locally
- A check that options.json is written to the instance and reloaded on restart
Prerequisites
- Python 3.12+
- pip install depths
Imports and versions
The instance layout uses INSTANCE_DIR/INSTANCE_ID as the instance root. Options are stored under <instance_root>/configs/options.json.
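A minimal sketch of the imports and instance constants used throughout this guide. The import paths follow the class references above; the INSTANCE_DIR and INSTANCE_ID values are placeholders to adapt to your setup.

```python
import depths
from depths.core.config import (
    LogProducerConfig,
    LogAggregatorConfig,
    DepthsLoggerOptions,
)
from depths.core.logger import DepthsLogger

# Placeholder instance layout: INSTANCE_DIR/INSTANCE_ID is the instance root,
# and options are persisted under <instance_root>/configs/options.json.
INSTANCE_DIR = "./depths-data"
INSTANCE_ID = "tutorial-01"

# __version__ may or may not be exposed; fall back gracefully.
print("depths version:", getattr(depths, "__version__", "unknown"))
```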
Producer knobs — depths.core.config.LogProducerConfig
The producer handles validation, normalization, and queueing before rows hit the aggregator.
The key fields are queue size and drop policy. Validation is enabled by default.
The key decision for the producer is how much memory to allocate to the in-memory queue. It is also a tradeoff between persistence and throughput: a larger queue gives higher throughput but increases the odds of data loss if the server crashes, while a queue that is too small applies backpressure frequently, causing incoming signals to be dropped before they ever reach the aggregator. Think of the producer as the conduit between the tap (incoming signals) and the swimming pool (the aggregator).
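As a sketch, a tuned producer could look like the following. The field names queue_size, drop_policy, and validate are assumptions made for illustration; check the actual LogProducerConfig signature in your installed version.

```python
# Hypothetical field names -- verify against LogProducerConfig in your version.
producer_cfg = LogProducerConfig(
    queue_size=10_000,       # larger queue: more throughput, more at-risk data on a crash
    drop_policy="drop_new",  # behavior when the queue is full (backpressure)
    validate=True,           # validation is enabled by default
)
```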
Aggregator knobs — depths.core.config.LogAggregatorConfig
The aggregator batches rows into typed Polars frames and appends to a Delta table.
The core controls are batch age (how long a batch of data stays in memory) and row thresholds. Strict frames keep types stable.
Tuning the aggregator is essentially tuning a tradeoff between throughput and persistence. Increasing the batch age means fewer disk writes and potentially higher throughput, but more data sits in memory and can be lost if the program terminates. Keeping it too low blows up the file count on disk and slows the amortized write path, giving lower throughput but a stronger persistence guarantee.
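A sketch of an aggregator tuned for small, frequent flushes. The max_batch_rows and max_age_s names come from the wrap-up section below; strict_frames is an assumed name for the strict-frame toggle.

```python
# max_batch_rows / max_age_s are referenced later in this guide;
# strict_frames is an assumed name for the strict typed-frame toggle.
aggregator_cfg = LogAggregatorConfig(
    max_batch_rows=5_000,  # flush once this many rows accumulate
    max_age_s=5,           # ... or once a batch has been in memory this long
    strict_frames=True,    # keep Polars dtypes stable across batches
)
```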
Assemble depths.core.config.DepthsLoggerOptions
Options orchestrate startup, signal hooks, and shipper toggles. Signal handlers tell the DepthsLogger how to handle program termination as gracefully as possible (flushing to disk wherever possible). The auto_start toggle tells DepthsLogger not to require an explicit DepthsLogger.start() before it begins ingesting telemetry.
Behind the scenes, DepthsLogger.start() gears up the producer to start handling incoming signals, and auto_start hides that step for you. There are cases where you want exact control over when logging starts in the code; for those scenarios, opt for manual control (auto_start=False and a manual DepthsLogger.start() when a start is desired).
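A sketch of assembling the options. The producer, aggregator, install_signal_handlers, and enable_shipper keyword names are assumptions for attaching the two configs and the toggles described above; map them to the actual DepthsLoggerOptions fields in your version.

```python
# Keyword names below are assumptions -- check DepthsLoggerOptions' fields.
options = DepthsLoggerOptions(
    producer=producer_cfg,
    aggregator=aggregator_cfg,
    auto_start=True,               # no explicit DepthsLogger.start() needed
    install_signal_handlers=True,  # flush to disk on termination where possible
    enable_shipper=False,          # keep everything local for this walkthrough
)
```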
Initialize depths.core.logger.DepthsLogger with options
The logger prepares the instance, merges and persists options, and starts the aggregators.
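A sketch of constructing the logger against the instance layout from earlier. The instance_dir and instance_id keyword names are assumptions about the constructor.

```python
# Constructor keywords are assumptions -- check DepthsLogger's signature.
logger = DepthsLogger(
    instance_dir=INSTANCE_DIR,
    instance_id=INSTANCE_ID,
    options=options,  # merged and persisted to <instance_root>/configs/options.json
)
```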
Ingest a sample batch
A tiny dataset with varying severities makes filtering obvious.
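A sketch of a tiny mixed-severity batch. The ingest call and the field layout are illustrative only; substitute whichever logging method your DepthsLogger version exposes.

```python
# The method name and field layout below are illustrative only.
sample = [
    {"severity": "INFO",  "message": "service started",      "service": "api"},
    {"severity": "WARN",  "message": "cache miss rate high", "service": "api"},
    {"severity": "ERROR", "message": "upstream timeout",     "service": "worker"},
    {"severity": "DEBUG", "message": "heartbeat ok",         "service": "worker"},
]
for row in sample:
    logger.log(**row)  # hypothetical ingest call
```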
Stop and flush
With an explicit stop(), we gracefully tell the DepthsLogger instance to flush the remaining in-memory data to disk. The flush="auto" mode lets the batch-age expiry come through gracefully. You can also pass flush="none" to shut down immediately without waiting for the batch age to expire. The key difference is that flush="none" does not drain the non-aggregated signals still in the producer's queue; it simply wraps up the queued disk-write tasks and shuts down.
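A sketch of the two shutdown modes described above. The flush values follow the text; everything else is an assumption.

```python
# Graceful shutdown: drain the producer queue and let age expiry flush batches.
logger.stop(flush="auto")

# Fast shutdown (alternative): finish queued disk writes only, without draining
# the producer queue -- non-aggregated signals may be lost.
# logger.stop(flush="none")
```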
Verify persisted rows
Use the named read helper to pull a small projection.
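A sketch of the named read helper with a small projection. Exposing read_logs as a method on the logger, and the columns and limit keyword names, are assumptions about its signature.

```python
# Keyword names are assumptions about read_logs' signature.
df = logger.read_logs(
    columns=["timestamp", "severity", "message"],  # small projection
    limit=10,                                      # row limit for a quick look
)
print(df)
```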
Verify options persistence (options.json)
Options are serialized in the instance configs. Load the file and check selected fields. The exact path is <instance_root>/configs/options.json.
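Checking the persisted file needs nothing beyond the standard library. Only the path layout comes from this guide; the selected keys are examples, so adjust them to however your options serialize.

```python
import json
from pathlib import Path

options_path = Path(INSTANCE_DIR) / INSTANCE_ID / "configs" / "options.json"
with options_path.open() as f:
    saved = json.load(f)

# Spot-check a few fields; key names depend on how options are serialized.
print(saved.get("auto_start"))
print(saved.get("aggregator"))
```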
Recreate logger without options and verify reload
Construct a new logger pointing at the same instance. The saved options are merged in automatically.
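A sketch of recreating the logger without passing options; the constructor keywords are the same assumptions as before.

```python
# No options passed: the saved options.json is merged in automatically.
logger2 = DepthsLogger(
    instance_dir=INSTANCE_DIR,
    instance_id=INSTANCE_ID,
)
```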
Quick peek query (WARN+ with substring)
Filter helpers are pushed down and collected only at the end.
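A sketch of the quick-peek filter. The min_severity and message_contains keyword names are assumptions about the severity and substring filters that read_logs pushes down.

```python
# Keyword names are assumptions about read_logs' filter arguments.
warn_df = logger2.read_logs(
    min_severity="WARN",         # WARN and above
    message_contains="timeout",  # substring filter, pushed down before collect
    columns=["timestamp", "severity", "message"],
)
print(warn_df)
```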
What just happened
- The producer enforced queueing and validation using LogProducerConfig
- The aggregator flushed on age or row thresholds using LogAggregatorConfig
- DepthsLoggerOptions persisted to <instance_root>/configs/options.json and were reloaded on the next run
- read_logs applied equality, substring, severity, and time-range filters with projection and row limits
Wrap-up and next steps
- Keep these knobs small at first, then widen max_batch_rows or max_age_s to trade latency for throughput
- Move on to Querying possibilities with Depths to explore grouped and lazy reads
- When you are ready for object storage, try S3 backups from scratch