Bridge Edges
What bridge edges are, the full edge type taxonomy, how they connect spec and code layers, and the AI-assisted traceability pipeline that constructs them for brownfield projects.
Bridge edges are what make the Unified Project Graph unified. Without them, the spec graph and the code graph are two disconnected islands — separate node sets with no links between them. Bridge edges connect spec-layer nodes (entities, operations, endpoints, user stories) to code-layer nodes (tables, functions, handlers, modules), enabling convergence checking, drift detection, and impact analysis.
Edge Type Taxonomy
| Edge Type | From | To | Meaning |
|---|---|---|---|
| implements | spec_operation | code_function | This function implements this operation |
| exposes | spec_endpoint | code_endpoint | This handler exposes this endpoint |
| persists | spec_entity | code_table | This table stores this entity |
| satisfies | spec_user_story | code_function | Coarse story-to-code link |
| tests | test_case | spec_acceptance_criterion | This test verifies this criterion |
| drifts_from | code_node | spec_node | Implementation has diverged from spec |
| validates | spec_guard | code_function | Guard function validates this spec rule |
| derived_from | impl_node | spec_node | Implementation node derived from spec node |
| generated_from | impl_node | spec_node | Written by the codegen pipeline |
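The taxonomy above can be captured as a small type definition. This is a minimal sketch; the interface and field names are illustrative, not Praetor's actual schema.

```typescript
// Hypothetical shape of a bridge edge record; field names are illustrative.
type BridgeEdgeType =
  | "implements" | "exposes" | "persists" | "satisfies"
  | "tests" | "drifts_from" | "validates"
  | "derived_from" | "generated_from";

interface BridgeEdge {
  type: BridgeEdgeType;
  from: string;       // spec- or code-layer node id, per the table above
  to: string;
  confidence: number; // 0.0–1.0; 1.0 for generation-time edges
}

const edge: BridgeEdge = {
  type: "persists",
  from: "spec_entity:Invoice",
  to: "code_table:invoices",
  confidence: 1.0,
};
```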
Two Scenarios: Greenfield and Brownfield
Greenfield
When Praetor generates the code, bridge edges are written with confidence 1.0 at generation time. The Kit system knows exactly which spec node each generated file implements. Bridge edges are part of the Kit's KitOutput — they are not inferred after the fact.
The Kit writes a generated_from edge from the impl_node to its spec_node as part of the REGISTER phase of Kit execution.
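A sketch of what that REGISTER-phase step might look like. The `KitOutput` shape and `registerGeneratedEdge` helper are assumptions for illustration; only the edge type and the confidence-1.0 rule come from the text above.

```typescript
// Hypothetical KitOutput shape; the real Kit schema may differ.
interface KitOutput {
  implNodeId: string;
  specNodeId: string;
  edges: Array<{ type: string; from: string; to: string; confidence: number }>;
}

function registerGeneratedEdge(output: KitOutput): KitOutput {
  // Generation-time provenance: the Kit knows exactly which spec node
  // produced this impl node, so confidence is 1.0, never inferred.
  output.edges.push({
    type: "generated_from",
    from: output.implNodeId,
    to: output.specNodeId,
    confidence: 1.0,
  });
  return output;
}
```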
Brownfield
When analyzing an existing codebase, bridge edges must be inferred: the code exists independently of any Praetor spec. The pipeline maps spec nodes to code nodes in four stages, combining deterministic matching, embedding similarity, structural re-ranking, and LLM classification.
The 4-Stage Bridge Edge Pipeline (Brownfield)
Stages 1 and 3 are gated behind feature flags (off by default). When all AI flags are off, the pipeline runs deterministic matching only — which handles 60%+ of bridge edges for well-structured TypeScript projects.
Deterministic Matching (ALWAYS ON)
Route path exact match, table name match, derived-from-code edges
Confidence 0.90–1.0, no AI calls
│
▼ (remaining unmatched spec nodes, if flags enabled)
Stage 1: Candidate Generation [flag: bridgeEmbeddingCandidates]
Embedding similarity — cast a wide net
Input: all spec nodes × all code nodes (type-filtered)
Output: top-10 candidate pairs per spec node
Threshold: cosine similarity > 0.3 (recall over precision)
│
▼
Stage 2: Structural Re-ranking (always runs if Stage 1 ran)
Exploit graph topology to boost/penalize candidates
Signals: name_similarity, signature_match, graph_topology,
table_access, route_match, module_colocation,
cdg_guard_match, dfg_data_path
│
▼
Stage 3: LLM Classification [flag: bridgeLlmClassification]
For top-3 candidates per spec node:
ask LLM "Does this code implement this spec?"
Chain-of-thought → match / no-match / partial
Cheap model (Haiku/DeepSeek) for bulk, Sonnet for ambiguous
If flag OFF: skip → use structural score alone
│
▼
Stage 4: Confidence Scoring + Emission (always runs)
Combine all signals → composite confidence
> 0.85 → auto-accept
0.50–0.85 → flag for human review
< 0.50 → discard
Write accepted edges to context_artifact_dependencies
Type Compatibility Matrix
The pipeline only compares spec/code node pairs that could plausibly link. The compatibility matrix:
| Spec Type | Compatible Code Types |
|---|---|
spec_entity | code_table, code_class, code_schema |
spec_operation | code_function, code_method |
spec_endpoint | code_endpoint, code_function |
spec_user_story | code_function, code_class, code_module |
spec_guard | code_function, code_method |
Pairs outside this matrix are never compared, which eliminates the bulk of the O(N×M) candidate space.
Structural Re-ranking Signals
Stage 2 produces eight signal types, each scoring 0.0–1.0:
| Signal | Description |
|---|---|
| name_similarity | Normalized Levenshtein after case normalization |
| signature_match | Parameter count and type alignment |
| graph_topology | If spec nodes A→B depend on each other, boost code pairs X→Y that mirror the same dependency |
| table_access | Code function accesses the spec entity's corresponding table |
| route_match | HTTP method + path similarity between spec endpoint and code handler |
| module_colocation | Spec operations in the same service map to code functions in the same module |
| cdg_guard_match | Control-dependence graph guard type matches spec guard type |
| dfg_data_path | Data flow graph path matches spec-defined intended data flow |
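The name_similarity signal is the simplest to sketch: normalized Levenshtein distance after stripping case and separator differences. The normalization rules here (dropping `_` and `-`, lowercasing) are an assumption about what "case normalization" covers.

```typescript
// Assumed normalization: strip snake/kebab separators, lowercase.
function normalize(name: string): string {
  return name.replace(/[_-]/g, "").toLowerCase();
}

// Standard single-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => i);
  for (let j = 1; j <= b.length; j++) {
    let prev = dp[0];
    dp[0] = j;
    for (let i = 1; i <= a.length; i++) {
      const tmp = dp[i];
      dp[i] = Math.min(
        dp[i] + 1,     // insertion
        dp[i - 1] + 1, // deletion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      prev = tmp;
    }
  }
  return dp[a.length];
}

// Signal score in 0.0–1.0: 1.0 means identical after normalization.
function nameSimilarity(specName: string, codeName: string): number {
  const a = normalize(specName);
  const b = normalize(codeName);
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}
```

With this normalization, `createInvoice` and `create_invoice` score 1.0 despite the casing difference.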
Confidence and Human Review
The final confidence score for each bridge edge determines its fate:
- High confidence (> 0.85): Auto-accepted, written to context_artifact_dependencies with confidence metadata.
- Medium confidence (0.50–0.85): Written to the graph but flagged for human review in the brownfield studio. Reviewers can accept, reject, or correct the link.
- Low confidence (< 0.50): Discarded. Not written to the graph.
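The three thresholds above reduce to a small classifier. A minimal sketch; the function and return labels are illustrative, the cutoffs come from Stage 4.

```typescript
type EdgeFate = "auto-accept" | "human-review" | "discard";

// Stage 4 thresholds: > 0.85 accept, 0.50–0.85 review, < 0.50 discard.
function classifyByConfidence(confidence: number): EdgeFate {
  if (confidence > 0.85) return "auto-accept";
  if (confidence >= 0.5) return "human-review";
  return "discard";
}
```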
All accepted edges carry provenance metadata recording which stage produced the match and the contributing signal scores. This makes the traceability itself traceable.
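An illustrative shape for that provenance payload. The field names are assumptions, not the actual schema; the intent (stage of origin plus contributing signal scores) is what the text above describes.

```typescript
// Hypothetical provenance metadata attached to an accepted bridge edge.
interface EdgeProvenance {
  stage: "deterministic" | "embedding" | "structural" | "llm";
  confidence: number;
  signalScores: Record<string, number>; // e.g. name_similarity, route_match
}

const provenance: EdgeProvenance = {
  stage: "structural",
  confidence: 0.91,
  signalScores: { name_similarity: 0.88, route_match: 1.0, table_access: 0.75 },
};
```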
Research Foundation
The pipeline design is grounded in automated traceability link recovery (TLR) research:
- Embedding similarity alone achieves ~45–55% F1 on spec-to-code mapping
- Adding structural signals improves F1 by 8–15 percentage points
- Adding LLM classification pushes accuracy above 85% for well-structured codebases
- Hierarchical Bayesian composition (COMET, ICSE 2020) improves average precision by 5–14%
The staged architecture (deterministic first, then embedding, then structural, then LLM) optimizes for cost: each stage processes only the pairs that earlier, cheaper stages could not resolve.