ADR Export Guide
Complete guide for exporting Architecture Decision Records (ADRs) in JSON/JSONL format.
Table of Contents
- Overview
- Format Comparison
- Filter Options
- Output Schema
- Integration Patterns
- Performance Considerations
- Troubleshooting
- Advanced Use Cases
- Best Practices
- Examples Gallery
Overview
The adr export command exports ADRs as knowledge fragments: RAG-optimized representations designed for semantic search, machine learning, and knowledge graph applications.
Key Features
- Knowledge Fragments: Not raw ADR dumps, but semantic extractions optimized for LLM ingestion
- Filtering: Status, project, classification, date range
- Streaming: JSONL format for constant-memory processing
- Metadata: Embedding priority, content hashing, relations graph
- FAISS-Ready: Compatible with vector databases and semantic search engines
Format Comparison
JSON Format
Use Case: Interactive exploration, small datasets (<100 ADRs), human-readable output
Characteristics:
- Single JSON array
- Pretty-printed by default (unless --compact)
- Requires loading entire file into memory
- Easy to manipulate with jq
Example:
adr export adr/accepted --format json > decisions.json
Output Structure:
[
  {
    "id": "ADR-0001",
    "type": "architecture_decision",
    ...
  },
  {
    "id": "ADR-0002",
    ...
  }
]
JSONL Format
Use Case: Pipeline ingestion, large datasets (>100 ADRs), streaming processing
Characteristics:
- One JSON object per line
- Streamable (constant memory)
- Line-by-line processing
- Ideal for ETL pipelines
Example:
adr export adr/accepted --format jsonl --compact > decisions.jsonl
Output Structure:
{"id": "ADR-0001", "type": "architecture_decision", ...}
{"id": "ADR-0002", "type": "architecture_decision", ...}
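A minimal sketch of line-by-line consumption in Python, assuming only that each line holds one complete JSON object as described above. The `iter_fragments` helper and the sample records are illustrative and not part of the adr tooling.

```python
import io
import json
from typing import Iterator, TextIO


def iter_fragments(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed fragment per non-blank line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines such as a trailing newline
            yield json.loads(line)


# Stand-in for open("decisions.jsonl"); the records are made up.
sample = io.StringIO(
    '{"id": "ADR-0001", "type": "architecture_decision"}\n'
    '{"id": "ADR-0002", "type": "architecture_decision"}\n'
)
ids = [fragment["id"] for fragment in iter_fragments(sample)]
print(ids)
```

Because the generator holds only one line at a time, memory use stays flat regardless of how many ADRs the file contains.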
Compact vs Pretty
| Option | Size | Use Case |
|---|---|---|
| Default (pretty) | Larger | Development, debugging, human reading |
| --compact | ~60% smaller | Production, pipelines, storage |
Example Comparison:
# Pretty (3.2MB)
adr export adr --format json > pretty.json
# Compact (1.2MB)
adr export adr --format json --compact > compact.json
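The size gap comes almost entirely from whitespace. The sketch below illustrates the effect with Python's standard `json` module; the indent width and separators are assumptions, since this guide does not state the exact serializer settings the CLI uses, and the ratio you see will depend on your data.

```python
import json

# Made-up fragment, trimmed to a few fields for brevity.
fragment = {
    "id": "ADR-0001",
    "type": "architecture_decision",
    "scope": {"projects": ["CEREBRO", "PHANTOM"], "layers": ["infrastructure"]},
}

# Pretty: indentation and newlines for human readers.
pretty = json.dumps([fragment], indent=2)
# Compact: no indentation, no spaces after separators.
compact = json.dumps([fragment], separators=(",", ":"))

print(len(pretty), len(compact))
```

Both strings parse back to the same data, so choosing compact output loses nothing but readability.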
Filter Options
Status Filter
Filter by ADR lifecycle status.
Syntax:
adr export <path> --filter-status <status>
Valid Values: proposed, accepted, rejected, deprecated, superseded
Multiple Status:
adr export adr --filter-status accepted --filter-status superseded
Use Cases:
- Export only accepted decisions for production knowledge base
- Analyze rejected alternatives for research
- Track deprecated decisions for migration planning
Project Filter
Filter by project scope (OR logic).
Syntax:
adr export <path> --filter-project <project>
Common Projects: CEREBRO, PHANTOM, SPECTRE, NEUTRON, GLOBAL
Multiple Projects (OR logic):
adr export adr --filter-project CEREBRO --filter-project PHANTOM
This returns ADRs that belong to CEREBRO OR PHANTOM (not necessarily both).
Use Cases:
- Export PHANTOM-specific decisions for ML ingestion
- Generate project-specific documentation
- Audit cross-project dependencies
Classification Filter
Filter by decision classification (severity/impact).
Syntax:
adr export <path> --filter-classification <class>
Valid Values: critical, major, minor, patch
Use Cases:
- Export critical decisions for security review
- Generate change logs by classification
- Prioritize knowledge base updates
Date Range Filter
Filter by decision date (YYYY-MM-DD format).
Syntax:
adr export <path> --since YYYY-MM-DD --until YYYY-MM-DD
Examples:
# Decisions from 2026
adr export adr --since 2026-01-01 --until 2026-12-31
# Recent decisions (last 30 days)
adr export adr --since $(date -d '30 days ago' +%Y-%m-%d)
# Future decisions (proposed for next quarter)
adr export adr --filter-status proposed --since 2026-04-01
Use Cases:
- Generate quarterly architecture reports
- Track decision velocity over time
- Export recent changes for onboarding
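The `date -d '30 days ago'` form used in the examples is GNU date syntax and is not available in BSD/macOS `date`. As a portable alternative, the cutoff can be computed in Python and passed to `--since`. The `since_cutoff` helper below is a hypothetical name used only for illustration.

```python
from datetime import date, timedelta
from typing import Optional


def since_cutoff(days: int, today: Optional[date] = None) -> str:
    """Return the YYYY-MM-DD date `days` days before `today`."""
    today = today or date.today()
    return (today - timedelta(days=days)).isoformat()


# Fixed reference date so the result is reproducible.
cutoff = since_cutoff(30, today=date(2026, 3, 31))
print(cutoff)  # 2026-03-01
```

The returned string can be substituted directly, for example `adr export adr --since "$(python3 cutoff.py)"` if the snippet is saved as a script (the file name is an assumption).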
Combined Filters
All filters use AND logic when combined.
Example:
adr export adr \
--filter-status accepted \
--filter-project CEREBRO \
--filter-classification major \
--since 2026-01-01 \
--format jsonl --compact
This returns ADRs that match ALL criteria:
- Status is accepted AND
- Project includes CEREBRO AND
- Classification is major AND
- Date >= 2026-01-01
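The filter semantics described above (OR across repeated values of one filter, AND across different filters) can be sketched as a predicate over exported fragments. This is an illustration of the documented behavior, not the CLI's actual implementation; the `matches` function is hypothetical, and treating classification as a single value is an assumption.

```python
from typing import Iterable, Optional


def matches(fragment: dict,
            statuses: Iterable[str] = (),
            projects: Iterable[str] = (),
            classification: Optional[str] = None,
            since: Optional[str] = None) -> bool:
    """AND across filters; OR within the status and project lists."""
    statuses, projects = set(statuses), set(projects)
    if statuses and fragment["status"] not in statuses:
        return False
    # OR logic: any overlap with the fragment's projects is enough.
    if projects and not projects & set(fragment["scope"]["projects"]):
        return False
    if classification and fragment["governance"]["classification"] != classification:
        return False
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    if since and fragment["metadata"]["date"] < since:
        return False
    return True


def make(adr_id, status, projects, classification, day):
    """Build a made-up fragment with only the fields the filters read."""
    return {"id": adr_id, "status": status, "scope": {"projects": projects},
            "governance": {"classification": classification},
            "metadata": {"date": day}}


fragments = [
    make("ADR-0001", "accepted", ["CEREBRO", "PHANTOM"], "major", "2026-02-10"),
    make("ADR-0002", "accepted", ["SPECTRE"], "major", "2026-03-01"),
    make("ADR-0003", "proposed", ["CEREBRO"], "major", "2025-12-01"),
]
selected = [f["id"] for f in fragments
            if matches(f, statuses=["accepted"], projects=["CEREBRO"],
                       classification="major", since="2026-01-01")]
print(selected)  # ['ADR-0001']
```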
Output Schema
Knowledge Fragment Structure
interface KnowledgeFragment {
  // Identity
  id: string;                    // ADR-0001
  type: "architecture_decision";
  title: string;
  status: "proposed" | "accepted" | "rejected" | "deprecated" | "superseded";
  summary: string;               // One-line summary

  // Scope
  scope: {
    projects: string[];          // ["CEREBRO", "PHANTOM"]
    layers: string[];            // ["infrastructure", "ml"]
  };

  // Knowledge (RAG-Optimized)
  knowledge: {
    what: string;                // Decision text
    why: string;                 // Context/rationale
    implications: {
      positive: string[];
      negative: string[];
    };
    alternatives_rejected: string[];
  };

  // Semantic Enrichment
  questions: string[];           // Questions this ADR answers
  keywords: string[];            // Searchable keywords
  concepts: string[];            // High-level concepts

  // Relations (Knowledge Graph)
  relations: {
    supersedes: string[];        // ADRs this replaces
    related: string[];           // Related decisions
    enables: string[];           // ADRs this enables
  };

  // Governance
  governance: {
    classification: "critical" | "major" | "minor" | "patch";
    compliance: string[];        // ["LGPD", "SOC2"]
  };

  // Metadata
  metadata: {
    date: string;                // YYYY-MM-DD
    version: number;
    hash: string;                // Content hash (change detection)
    embedding_priority: "low" | "normal" | "high";
  };
}
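Consumers may want to sanity-check fragments against this structure before ingesting them. The following is a minimal sketch that checks top-level keys and the enumerated values listed in the interface; `validate_fragment` is a hypothetical helper, and it deliberately does not validate nested string arrays.

```python
from typing import List

REQUIRED_KEYS = {
    "id", "type", "title", "status", "summary", "scope", "knowledge",
    "questions", "keywords", "concepts", "relations", "governance", "metadata",
}
STATUSES = {"proposed", "accepted", "rejected", "deprecated", "superseded"}
CLASSIFICATIONS = {"critical", "major", "minor", "patch"}
PRIORITIES = {"low", "normal", "high"}


def validate_fragment(fragment: dict) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = [f"missing key: {key}"
                for key in sorted(REQUIRED_KEYS - set(fragment))]
    if problems:
        return problems  # nested checks below need every key present
    if fragment["type"] != "architecture_decision":
        problems.append(f"invalid type: {fragment['type']}")
    if fragment["status"] not in STATUSES:
        problems.append(f"invalid status: {fragment['status']}")
    if fragment["governance"].get("classification") not in CLASSIFICATIONS:
        problems.append("invalid governance.classification")
    if fragment["metadata"].get("embedding_priority") not in PRIORITIES:
        problems.append("invalid metadata.embedding_priority")
    return problems


# Made-up fragment used only to exercise the checks.
good = {
    "id": "ADR-0001", "type": "architecture_decision", "title": "Example",
    "status": "accepted", "summary": "Example summary",
    "scope": {"projects": ["GLOBAL"], "layers": []},
    "knowledge": {"what": "", "why": "",
                  "implications": {"positive": [], "negative": []},
                  "alternatives_rejected": []},
    "questions": [], "keywords": [], "concepts": [],
    "relations": {"supersedes": [], "related": [], "enables": []},
    "governance": {"classification": "major", "compliance": []},
    "metadata": {"date": "2026-01-01", "version": 1, "hash": "abc",
                 "embedding_priority": "normal"},
}
bad = dict(good, status="draft")
print(validate_fragment(good), validate_fragment(bad))
```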
Field Descriptions
embedding_priority: Hints for semantic search engines
- high: Critical decisions, frequently queried
- normal: Standard decisions
- low: Historical, rarely queried
hash: SHA-256 prefix for change detection
- Computed from context + decision fields
- Use to detect updates without full comparison
questions: Natural language queries this ADR answers
- Used for query-to-document matching
- Example: "Why did we choose NixOS?" → ADR-0001
relations: Knowledge graph edges
- supersedes: This ADR replaces older decisions
- related: Cross-references to related decisions
- enables: This ADR is a prerequisite for others
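To show how the relations field maps onto graph edges, here is a small sketch that flattens fragments into (source, relation, target) triples. The `build_edges` helper and the sample data are illustrative; the only assumption taken from the schema is that `relations` maps each relation name to a list of ADR ids.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (source ADR id, relation name, target ADR id)


def build_edges(fragments: List[dict]) -> List[Edge]:
    """Flatten each fragment's relations into directed, labeled edges."""
    edges = []
    for fragment in fragments:
        for relation, targets in fragment["relations"].items():
            for target in targets:
                edges.append((fragment["id"], relation, target))
    return edges


# Made-up fragments carrying only the fields used here.
fragments = [
    {"id": "ADR-0001",
     "relations": {"supersedes": [], "related": [], "enables": ["ADR-0003"]}},
    {"id": "ADR-0003",
     "relations": {"supersedes": ["ADR-0002"], "related": ["ADR-0001"], "enables": []}},
]
edges = build_edges(fragments)

# ADRs that some other decision has replaced.
superseded = {target for _, relation, target in edges if relation == "supersedes"}
print(edges, superseded)
```

The same triples can be fed to a graph library or database loader of your choice.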
Integration Patterns
PHANTOM (Data Sanitizer)
PHANTOM processes ADRs into semantic chunks for FAISS indexing.
Export:
adr export adr/accepted --format jsonl --compact > /tmp/adr.jsonl
Ingestion:
import json

from phantom import CortexProcessor, FAISSVectorStore

# Assumption: both classes take no required constructor arguments;
# adjust to match your PHANTOM setup.
cortex = CortexProcessor()
faiss_store = FAISSVectorStore()

# Load JSONL (one fragment per line)
with open('/tmp/adr.jsonl', encoding='utf-8') as f:
    adrs = [json.loads(line) for line in f if line.strip()]

# Process each ADR
for adr in adrs:
    # Chunk semantic sections
    chunks = [
        {"text": adr["summary"], "type": "summary", "priority": "high"},
        {"text": adr["knowledge"]["why"], "type": "context", "priority": adr["metadata"]["embedding_priority"]},
        {"text": adr["knowledge"]["what"], "type": "decision", "priority": "critical"}
    ]
    # Embed
    embeddings = cortex.embed([c["text"] for c in chunks])
    # Index with metadata
    faiss_store.add_documents(
        chunks=chunks,
        embeddings=embeddings,
        metadata={
            "id": adr["id"],
            "status": adr["status"],
            "projects": adr["scope"]["projects"],
            "classification": adr["governance"]["classification"],
            "date": adr["metadata"]["date"]
        }
    )

# Save index
faiss_store.save("adr_ledger.faiss")
Query:
# Semantic search with metadata filtering
results = faiss_store.search(
query="Why NixOS?",
top_k=5,
filter={"status": "accepted", "projects": ["NEUTRON"]}
)
CEREBRO (Air-Gapped Knowledge Vault)
CEREBRO maintains isolated, RAG-optimized knowledge bases.
Export:
adr export adr/accepted --format json --filter-status accepted > cerebro_kb.json
Ingestion:
import json

from cerebro import KnowledgeVault

vault = KnowledgeVault.load_or_create("adr_vault")

# Import ADRs
with open('cerebro_kb.json', encoding='utf-8') as f:
    fragments = json.load(f)

for fragment in fragments:
    vault.add_decision(
        id=fragment["id"],
        content=fragment["knowledge"],
        metadata=fragment["metadata"],
        graph=fragment["relations"]
    )

# Query with graph traversal
response = vault.query("Explain our authentication strategy")
# Returns: Relevant ADRs + related decisions via graph
External Tools
PostgreSQL + pgvector
# Export
adr export adr/accepted --format jsonl --compact > adr.jsonl
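# Import to PostgreSQL (one JSON document per row).
# CSV mode with control-character QUOTE and DELIMITER keeps the commas,
# quotes, and backslashes inside each JSON line from being interpreted.
jq -c '{id: .id, content: .knowledge, meta: .metadata}' adr.jsonl | \
psql -c "COPY adr_knowledge (data) FROM STDIN WITH (FORMAT csv, QUOTE E'\x01', DELIMITER E'\x02');"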
Elasticsearch
# Bulk index
cat adr.jsonl | \
jq -c '{index: {_index: "adrs", _id: .id}}, .' | \
curl -X POST http://localhost:9200/_bulk -H 'Content-Type: application/x-ndjson' --data-binary @-
Neo4j (Knowledge Graph)
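# Export supersedes relations as Cypher statements
# (the ADR label and SUPERSEDES relationship type are illustrative names)
adr export adr/accepted --format json | \
jq -r '.[] | .id as $id | .relations.supersedes[] |
"MERGE (a:ADR {id: \"\($id)\"}) MERGE (b:ADR {id: \"\(.)\"}) MERGE (a)-[:SUPERSEDES]->(b);"' | \
cypher-shell --format plain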
Performance Considerations
Memory Usage
| Operation | JSON | JSONL |
|---|---|---|
| Parse 100 ADRs | ~5MB | ~5MB |
| Export 100 ADRs | ~10MB (array overhead) | ~2MB (streaming) |
| Large datasets (1000+) | Consider chunking | Constant memory |
Export Speed
Benchmarks (12-core, 32GB RAM, NVMe):
- Parse 100 ADRs: ~2 seconds
- Export 100 ADRs (JSON): ~0.5 seconds
- Export 100 ADRs (JSONL): ~0.3 seconds
- Filter 1000 ADRs by status: ~0.1 seconds
Optimization Strategies
1. Use JSONL for large datasets:
# Memory-efficient streaming (process_adr is a placeholder for your own handler)
adr export adr --format jsonl --compact | \
while IFS= read -r line; do
  process_adr "$line"
done
2. Filter early:
# Export only what you need
adr export adr/accepted --filter-status accepted --filter-project PHANTOM
3. Parallel processing:
# Split by project and process in parallel
for project in CEREBRO PHANTOM SPECTRE; do
  adr export adr --filter-project "$project" --format jsonl --compact > "${project}.jsonl" &
done
wait
Troubleshooting
Issue: Empty Result Set
Symptom:
adr export adr --filter-status accepted --filter-project NONEXISTENT
# Returns: []
Solution:
- Check filter values (case-sensitive)
- Verify projects exist in ADRs:
adr export adr --format json | jq -r '.[].scope.projects[]' | sort -u
Issue: Parse Errors
Symptom:
Warning: Failed to parse adr/proposed/ADR-0017.md: 'str' object has no attribute 'get'
Cause: ADR uses simplified YAML format (strings instead of objects)
Solution: Update to the latest version; the parser now accepts both formats automatically.
Issue: Invalid JSONL
Symptom:
cat output.jsonl | jq .
# jq: parse error: Invalid numeric literal at line 2
Cause: Log messages mixed with JSON output
Solution: Redirect stderr:
adr export adr --format jsonl 2>/dev/null
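When a file has already been contaminated, it helps to locate the offending lines before re-exporting. A minimal sketch, assuming only that every valid line is a standalone JSON document; `find_bad_lines` is a hypothetical helper name.

```python
import json
from typing import Iterable, List, Tuple


def find_bad_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every non-blank line that is not JSON."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are harmless
        try:
            json.loads(line)
        except json.JSONDecodeError:
            bad.append((number, line.rstrip("\n")))
    return bad


# Stand-in for the lines of output.jsonl, with a log message mixed in.
sample = [
    '{"id": "ADR-0001"}\n',
    "Warning: Failed to parse adr/proposed/ADR-0017.md\n",
    '{"id": "ADR-0002"}\n',
]
bad = find_bad_lines(sample)
print(bad)
```

To check a real file, pass an open file handle instead of the sample list, since file objects iterate line by line.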
Issue: Slow Export
Symptom: Export takes >30 seconds for 100 ADRs
Possible Causes:
1. Parsing overhead: Use --format jsonl instead of json
2. Large files: ADRs with huge markdown content
3. Disk I/O: Slow filesystem
Solution:
# Profile with time
time adr export adr --format jsonl --compact > /dev/null
# Use tmpfs for temp files
export TMPDIR=/dev/shm
Issue: Character Encoding
Symptom: Unicode characters garbled
Solution: Ensure UTF-8:
export LC_ALL=en_US.UTF-8
adr export adr --format json > output.json
Advanced Use Cases
Incremental Updates
Track ADR changes with content hashing:
import json

# Load previous export
with open('prev_export.json', encoding='utf-8') as f:
    prev = {adr['id']: adr['metadata']['hash'] for adr in json.load(f)}

# Load current export
with open('curr_export.json', encoding='utf-8') as f:
    curr = {adr['id']: adr['metadata']['hash'] for adr in json.load(f)}

# Find changes
new_ids = set(curr) - set(prev)
changed_ids = {adr_id for adr_id in curr if adr_id in prev and curr[adr_id] != prev[adr_id]}
deleted_ids = set(prev) - set(curr)

print(f"New: {len(new_ids)}, Changed: {len(changed_ids)}, Deleted: {len(deleted_ids)}")
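For intuition about why comparing hashes is sufficient, the sketch below derives a SHA-256 prefix from the context and decision text, as the field description states. The prefix length, the newline used to join the two fields, and the `content_hash` name are all assumptions; the exporter's exact recipe is not documented here, so do not expect this to reproduce its values.

```python
import hashlib


def content_hash(context: str, decision: str, length: int = 16) -> str:
    """Hex prefix of SHA-256 over the context and decision text."""
    # Assumption: fields joined with a newline and encoded as UTF-8.
    payload = f"{context}\n{decision}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:length]


original = content_hash("We need reproducible builds.", "Adopt NixOS.")
unchanged = content_hash("We need reproducible builds.", "Adopt NixOS.")
edited = content_hash("We need reproducible builds.", "Adopt NixOS on servers.")

print(original == unchanged, original == edited)
```

Any edit to either field yields a different prefix, while fields outside the hash (such as keywords) would not trigger a change under this scheme.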
Custom Transformations
Extract specific fields with jq:
# Extract decision timeline
adr export adr/accepted --format json | \
jq '.[] | {id, date: .metadata.date, status, projects: .scope.projects}' | \
jq -s 'sort_by(.date)'
# Generate Markdown table
adr export adr/accepted --format json | \
jq -r '.[] | "| \(.id) | \(.title) | \(.status) | \(.metadata.date) |"'
Compliance Reporting
Generate compliance reports:
# LGPD-tagged decisions
adr export adr --format json --filter-status accepted | \
jq '[.[] | select(.governance.compliance | contains(["LGPD"]))] |
{count: length, decisions: [.[] | {id, title}]}'
# Critical decisions by month (groups on the YYYY-MM prefix of the date)
adr export adr --filter-classification critical --format json | \
jq 'group_by(.metadata.date[0:7]) |
map({month: .[0].metadata.date[0:7], count: length})'
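The jq pipeline above buckets on the YYYY-MM prefix of the date. For calendar-quarter buckets, a short Python sketch over the exported JSON could look like the following; `quarter_of` is a hypothetical helper and the fragments are made up.

```python
from collections import Counter


def quarter_of(iso_date: str) -> str:
    """Map a YYYY-MM-DD date to a label such as 2026-Q1."""
    year, month = iso_date[:4], int(iso_date[5:7])
    return f"{year}-Q{(month - 1) // 3 + 1}"


# Stand-in for json.load() on an export filtered to critical decisions.
fragments = [
    {"id": "ADR-0004", "metadata": {"date": "2026-01-15"}},
    {"id": "ADR-0007", "metadata": {"date": "2026-03-31"}},
    {"id": "ADR-0009", "metadata": {"date": "2026-04-01"}},
]
counts = Counter(quarter_of(f["metadata"]["date"]) for f in fragments)
print(dict(counts))  # {'2026-Q1': 2, '2026-Q2': 1}
```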
Best Practices
- Use JSONL for pipelines: Streaming, constant memory
- Filter early: Export only what you need
- Version exports: Tag exports with timestamps
- Validate output: Always pipe to jq for validation
- Compress for storage: gzip JSONL files (70% reduction)
- Monitor hash changes: Track ADR evolution via metadata.hash
- Automate exports: Use git hooks for auto-sync
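Compressed JSONL can still be streamed without unpacking it to disk first. A minimal sketch using Python's standard `gzip` module; the file name and sample fragments are illustrative, and the compression ratio you get will depend on your ADR content.

```python
import gzip
import json
import os
import tempfile

# Made-up fragments standing in for an export.
fragments = [
    {"id": "ADR-0001", "status": "accepted"},
    {"id": "ADR-0002", "status": "superseded"},
]

path = os.path.join(tempfile.mkdtemp(), "decisions.jsonl.gz")

# Write compact JSONL straight into a gzip stream.
with gzip.open(path, "wt", encoding="utf-8") as f:
    for fragment in fragments:
        f.write(json.dumps(fragment, separators=(",", ":")) + "\n")

# Read it back line by line; decompression happens on the fly.
with gzip.open(path, "rt", encoding="utf-8") as f:
    restored = [json.loads(line) for line in f]

print(restored == fragments)  # True
```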
Examples Gallery
Daily Knowledge Sync
#!/bin/bash
# Sync accepted ADRs to CEREBRO daily
DATE=$(date +%Y-%m-%d)
adr export adr/accepted \
--format jsonl \
--filter-status accepted \
--compact > /tmp/adr_${DATE}.jsonl
# Upload to CEREBRO
cerebro-cli sync /tmp/adr_${DATE}.jsonl
# Cleanup old exports (keep 7 days)
find /tmp -name 'adr_*.jsonl' -mtime +7 -delete
Project Documentation
# Generate project-specific docs
for project in CEREBRO PHANTOM SPECTRE NEUTRON; do
  mkdir -p "docs/${project}"  # ensure the output directory exists
  adr export adr/accepted \
    --filter-project "$project" \
    --filter-status accepted \
    --format json > "docs/${project}/decisions.json"
done
Architecture Review
# Export critical decisions for review
adr export adr \
--filter-classification critical \
--since 2026-01-01 \
--format json | \
jq '.[] | {id, title, date: .metadata.date, projects: .scope.projects}' > review.json
For more examples, see the documentation home or run adr export --help.