Querying possibilities with Depths
Depths v0.1.1 lets you query persisted data through helpers on depths.core.logger.DepthsLogger.
This guide focuses on logs and shows four patterns: return rows as Python dicts, keep results lazy for Polars transforms, apply named predicates, and group results.
These patterns let you query your data efficiently whether it is stored on local disk or on S3.
What you will build
- A local dataset written via DepthsLogger
- A “rows as dicts” read for quick printing
- A LazyFrame read for chained transforms and a final collect
- Examples of named predicates and projection
- A grouped severity summary
Prerequisites
- Python 3.12+
- pip install depths
Imports and setup
We’ll set up a fresh instance and generate a small log dataset with varying severities and bodies. We capture today’s date as a timezone-aware UTC string so day filters are precise.
Create the logger and write a sample dataset
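A minimal sketch of the sample data, using only the standard library. The field names follow OTel log conventions but are assumptions here, as are the `make_rows` helper and the `YYYY-MM-DD` day format; the exact row shape and date format Depths expects are not shown in this guide.

```python
from datetime import datetime, timezone

# Today's date as a day string derived from a timezone-aware UTC datetime.
# Starting from an aware datetime avoids off-by-one-day filters near midnight.
today_utc = datetime.now(timezone.utc).strftime("%Y-%m-%d")

# (severity_text, severity_number) pairs using OTel severity numbers.
SEVERITIES = [("INFO", 9), ("WARN", 13), ("ERROR", 17)]


def make_rows(n: int) -> list[dict]:
    """Hypothetical helper: build n small OTel-shaped log rows."""
    rows = []
    for i in range(n):
        text, number = SEVERITIES[i % len(SEVERITIES)]
        rows.append(
            {
                "service_name": "demo",
                "severity_text": text,
                "severity_number": number,
                "body": f"request {i} finished with status {text.lower()}",
            }
        )
    return rows


rows = make_rows(6)
print(today_utc, len(rows))
```

Cycling through the severities gives every level at least one row, which keeps the later filters and group-by non-trivial.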
As shown in previous guides, we construct a logger, create minimal OTel-shaped log rows, ingest them, and then stop with an automatic flush to persist batches.
A basic read as dicts
We query today’s data using an aware UTC day string, select a few columns, and cap results with max_rows. Returning dicts is ergonomic for quick inspection and printing.
Keep it lazy for downstream transforms
Ask Depths for a LazyFrame to build a transform pipeline. Here we add a string-length column using str.len_chars() on the body column and then collect() to realize results. with_columns adds or replaces columns in a lazy plan. The LazyFrame option lets us prune the data down to the most relevant segments, minimizing disk/S3 reads.
Named predicates: time, equality, severity, substring
Depths read helpers expose common predicates that push down work. Here we constrain by day, project, and service; filter by minimum severity; and search for a substring in body via the body_like predicate.
We also project a small set of columns and limit rows.
Group-by summary on a lazy plan
We compute a quick severity distribution. In Polars, call group_by, then agg with an expression, and sort the result. We collect only once at the end.
Storage selector overview
read_logs accepts a storage selector. Use local to force local Delta tables, s3 to force object storage when S3 is configured, or auto to let Depths choose.
This guide used local storage only. The default value is auto, so you don’t have to select s3 explicitly once you start backing up telemetry to an S3 bucket; queries keep working without changes.
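The selector semantics can be pictured with a small sketch. The `resolve_storage` function and its fallback policy are hypothetical, written only to illustrate the three values; how Depths actually resolves auto is not described in this guide.

```python
from typing import Literal

Storage = Literal["auto", "local", "s3"]


def resolve_storage(selector: Storage, s3_configured: bool) -> str:
    """Hypothetical illustration of the selector, not Depths' implementation."""
    if selector == "local":
        return "local"
    if selector == "s3":
        if not s3_configured:
            raise ValueError("storage='s3' requires S3 to be configured")
        return "s3"
    if selector == "auto":
        # Assumed policy: use S3 when it is configured, otherwise local.
        return "s3" if s3_configured else "local"
    raise ValueError(f"unknown storage selector: {selector!r}")


print(resolve_storage("auto", s3_configured=False))
print(resolve_storage("auto", s3_configured=True))
```

Under this assumed policy, code that relies on the default keeps working unchanged after S3 is configured, which matches the behavior described above.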
What you learned
- Returning rows as dicts fits simple scripts and prints
- Returning a lazy frame lets you add Polars transforms and collect once
- Named predicates cut input size and speed up reads
- Group-by on a lazy plan gives quick summaries without building a separate ETL