Data Schema Reference¶
blq stores build logs and parsed events in a DuckDB database at .bird/blq.duckdb. This document describes the schema for users who want to write custom SQL queries.
Directory Structure¶
.bird/
├── blq.duckdb # DuckDB database (tables, macros)
├── blobs/ # Content-addressed output storage
│ └── content/
│ ├── ab/
│ │ └── {hash}.bin
│ └── ...
├── config.toml # Project configuration
└── commands.toml # Registered commands
Tables¶
attempts¶
Written when a command starts. Commands without a matching outcomes row are still running.
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
session_id |
VARCHAR | Session identifier |
timestamp |
TIMESTAMP | When command started |
cwd |
VARCHAR | Working directory |
cmd |
VARCHAR | Full command string |
executable |
VARCHAR | Extracted executable name |
pid |
INTEGER | Process ID |
format_hint |
VARCHAR | Detected format (gcc, pytest, etc.) |
client_id |
VARCHAR | Client identifier (blq-shell, blq-mcp) |
hostname |
VARCHAR | Machine hostname |
username |
VARCHAR | User who ran command |
tag |
VARCHAR | User-defined tag |
source_name |
VARCHAR | Registered command name |
source_type |
VARCHAR | run, exec, import, capture |
environment |
JSON | Captured environment variables |
platform |
VARCHAR | OS (Linux, Darwin, Windows) |
arch |
VARCHAR | Architecture (x86_64, arm64) |
git_commit |
VARCHAR | HEAD SHA |
git_branch |
VARCHAR | Current branch |
git_dirty |
BOOLEAN | Uncommitted changes present |
ci |
JSON | CI provider context |
extension_data |
JSON | Extension data (sandbox spec, grades, metrics) |
date |
DATE | Partition date |
outcomes¶
Written when a command completes. One-to-one with attempts.
| Column | Type | Description |
|---|---|---|
attempt_id |
UUID | References attempts.id |
completed_at |
TIMESTAMP | When command finished |
duration_ms |
BIGINT | Wall-clock duration in milliseconds |
exit_code |
INTEGER | Exit code (NULL = crashed) |
signal |
INTEGER | If killed by signal (15=SIGTERM, 9=SIGKILL) |
timeout |
BOOLEAN | If killed by timeout |
date |
DATE | Partition date |
events¶
Parsed diagnostics (errors, warnings, test results).
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
invocation_id |
UUID | References invocations.id |
event_index |
INTEGER | Index within invocation (1-based) |
client_id |
VARCHAR | Client identifier |
hostname |
VARCHAR | Machine hostname |
event_type |
VARCHAR | diagnostic, test_result, etc. |
severity |
VARCHAR | error, warning, info, note |
ref_file |
VARCHAR | Source file path |
ref_line |
INTEGER | Line number |
ref_column |
INTEGER | Column number |
message |
VARCHAR | Error/warning message |
code |
VARCHAR | Error code (e.g., E0308) |
rule |
VARCHAR | Rule name (e.g., no-unused-vars) |
tool_name |
VARCHAR | Tool that generated event |
category |
VARCHAR | Error category |
fingerprint |
VARCHAR | Unique identifier for deduplication |
log_line_start |
INTEGER | Start line in raw log |
log_line_end |
INTEGER | End line in raw log |
context |
VARCHAR | Surrounding context |
metadata |
JSON | Format-specific extras |
format_used |
VARCHAR | Parser format (gcc, cargo, pytest) |
date |
DATE | Partition date |
outputs¶
Captured stdout/stderr, with content stored in blobs.
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
invocation_id |
UUID | References invocations.id |
stream |
VARCHAR | stdout, stderr, combined |
content_hash |
VARCHAR | BLAKE2b hash (64 hex chars) |
byte_length |
BIGINT | Content size in bytes |
storage_type |
VARCHAR | inline or blob |
storage_ref |
VARCHAR | data: URI or file: path |
content_type |
VARCHAR | MIME type or format hint |
date |
DATE | Partition date |
blob_registry¶
Tracks content-addressed blobs for deduplication.
| Column | Type | Description |
|---|---|---|
content_hash |
VARCHAR | BLAKE2b hash (primary key) |
byte_length |
BIGINT | Content size |
compression |
VARCHAR | none, gzip, zstd |
ref_count |
INTEGER | Reference count |
first_seen |
TIMESTAMP | When first stored |
last_accessed |
TIMESTAMP | Last access time |
storage_path |
VARCHAR | Relative path in blobs/ |
sessions¶
Tracks invoker sessions (shell, CLI, MCP).
| Column | Type | Description |
|---|---|---|
session_id |
VARCHAR | Primary key |
client_id |
VARCHAR | Client identifier |
invoker |
VARCHAR | blq, blq-mcp |
invoker_pid |
INTEGER | Process ID |
invoker_type |
VARCHAR | cli, mcp, import, capture |
registered_at |
TIMESTAMP | Session start time |
cwd |
VARCHAR | Initial working directory |
date |
DATE | Partition date |
invocations (legacy)¶
Legacy table for completed runs. New code uses attempts + outcomes.
| Column | Type | Description |
|---|---|---|
id |
UUID | Primary key |
session_id |
VARCHAR | Session identifier |
timestamp |
TIMESTAMP | When command ran |
duration_ms |
BIGINT | Duration in milliseconds |
cwd |
VARCHAR | Working directory |
cmd |
VARCHAR | Full command string |
exit_code |
INTEGER | Exit code |
source_name |
VARCHAR | Registered command name |
| ... | ... | (same metadata as attempts) |
Blob Storage¶
Output content is stored using content-addressed storage:
- Hashing: Content is hashed with BLAKE2b (32 bytes = 64 hex chars)
- Deduplication: Identical outputs share the same blob
- Path structure:
blobs/content/{first-2-chars}/{full-hash}.bin
Example:
Storage types:
- inline: Small outputs (<4KB) stored as data: URIs in the database
- blob: Larger outputs stored as files with file: references
Reference Format¶
Events and runs use human-friendly references:
| Format | Example | Description |
|---|---|---|
source:run_serial |
build:3 |
Run reference |
source:run_serial:event_id |
build:3:1 |
Event reference |
run_serial |
3 |
Run without source (any command) |
run_serial:event_id |
3:1 |
Event without source |
source= registered command name (build, test, lint)run_serial= sequential run number (1, 2, 3...)event_id= event index within run (1, 2, 3...)
Macros¶
Table-returning macros for common queries. Use with SELECT * FROM macro_name().
Data Loading¶
| Macro | Description |
|---|---|
blq_load_events() |
All events with run metadata joined |
blq_load_runs() |
Completed runs with event counts |
blq_load_attempts() |
All attempts with status (pending/completed/orphaned) |
blq_load_source_status() |
Latest run per source with status badge |
Quick Queries¶
| Macro | Description |
|---|---|
blq_status() |
Status overview (latest run per source) |
blq_errors(n := 10) |
Recent errors |
blq_warnings(n := 10) |
Recent warnings |
blq_history(n := 20) |
Run history |
blq_running() |
Currently running commands |
blq_diff(run1, run2) |
Compare errors between two runs |
Utilities¶
| Macro | Description |
|---|---|
blq_status_badge(errors, warnings, exit_code) |
Format status badge |
blq_ref(run_id, event_id) |
Build reference string |
blq_parse_ref(ref) |
Parse reference into components |
blq_output(inv_id, stream := 'combined') |
Get output content |
blq_location(ref_file, ref_line, ref_column) |
Format location string (file:line:col) |
blq_blob_root() |
Get blob storage root path |
blq_errors_json(n := 10) |
Recent errors as JSON (for MCP/agents) |
blq_history_status(n := 20, status := NULL) |
Run history with optional status filter |
Sandbox¶
| Macro | Description |
|---|---|
blq_sandbox_summary() |
Sandbox specs and grades per command (from extension_data) |
Line Selection (requires read_lines extension)¶
| Macro | Description |
|---|---|
blq_read_lines(content, lines_spec, marks := []) |
Read lines with optional markers for highlighting |
blq_search_lines(content, pattern, ctx := 0, case_insensitive := true) |
Search lines with regex pattern and context |
Example Queries¶
Basic Queries¶
-- Recent errors
SELECT * FROM blq_errors(20);
-- All errors from a specific run
SELECT * FROM blq_load_events()
WHERE run_serial = 5 AND severity = 'error';
-- Status overview
SELECT * FROM blq_status();
Filtering Events¶
-- Errors in a specific file
SELECT ref, message, ref_line
FROM blq_load_events()
WHERE severity = 'error' AND ref_file LIKE '%parser.c';
-- Errors by fingerprint (deduplicated)
SELECT fingerprint, COUNT(*) as occurrences, MIN(message) as message
FROM blq_load_events()
WHERE severity = 'error'
GROUP BY fingerprint
ORDER BY occurrences DESC;
-- Events from the last hour
SELECT *
FROM blq_load_events()
WHERE started_at > now() - INTERVAL '1 hour';
Run Analysis¶
-- Failed runs in the last week
SELECT source_name, run_serial, exit_code, error_count, started_at
FROM blq_load_runs()
WHERE exit_code != 0 AND started_at > now() - INTERVAL '7 days'
ORDER BY started_at DESC;
-- Average duration by command
SELECT source_name,
AVG(duration_ms) / 1000.0 as avg_seconds,
COUNT(*) as run_count
FROM blq_load_runs()
GROUP BY source_name;
-- Commands currently running
SELECT source_name, command, elapsed_ms / 1000.0 as seconds
FROM blq_load_attempts()
WHERE status = 'pending';
Comparing Runs¶
-- Errors that appear in run 5 but not run 4
SELECT * FROM blq_diff(4, 5);
-- Manual diff using fingerprints
WITH run4 AS (
SELECT DISTINCT fingerprint FROM blq_load_events()
WHERE run_serial = 4 AND severity = 'error'
),
run5 AS (
SELECT DISTINCT fingerprint, message, ref_file, ref_line
FROM blq_load_events()
WHERE run_serial = 5 AND severity = 'error'
)
SELECT r5.* FROM run5 r5
LEFT JOIN run4 r4 ON r5.fingerprint = r4.fingerprint
WHERE r4.fingerprint IS NULL; -- New errors in run 5
Git Integration¶
-- Errors by commit
SELECT git_commit, git_branch, COUNT(*) as error_count
FROM blq_load_events()
WHERE severity = 'error'
GROUP BY git_commit, git_branch
ORDER BY error_count DESC;
-- Runs with uncommitted changes
SELECT source_name, run_serial, error_count
FROM blq_load_runs()
WHERE git_dirty = true;
Output Access¶
-- Get output info for a run
SELECT o.stream, o.byte_length, o.storage_type
FROM outputs o
JOIN invocations i ON o.invocation_id = i.id
JOIN (SELECT id, ROW_NUMBER() OVER (ORDER BY timestamp) as run_serial
FROM invocations) numbered ON i.id = numbered.id
WHERE numbered.run_serial = 5;
-- Using the macro
SELECT * FROM blq_output(
(SELECT id FROM invocations ORDER BY timestamp LIMIT 1 OFFSET 4),
'combined'
);
Advanced Analysis¶
-- Most common error messages
SELECT LEFT(message, 80) as message_prefix, COUNT(*) as count
FROM blq_load_events()
WHERE severity = 'error'
GROUP BY message_prefix
ORDER BY count DESC
LIMIT 10;
-- Error rate over time (by day)
SELECT date,
SUM(error_count) as total_errors,
COUNT(*) as run_count,
SUM(error_count) * 1.0 / COUNT(*) as errors_per_run
FROM blq_load_runs()
GROUP BY date
ORDER BY date DESC;
-- Files with most errors
SELECT ref_file, COUNT(*) as error_count
FROM blq_load_events()
WHERE severity = 'error' AND ref_file IS NOT NULL
GROUP BY ref_file
ORDER BY error_count DESC
LIMIT 10;