Bridge Edges

What bridge edges are, the full edge type taxonomy, how they connect spec and code layers, and the AI-assisted traceability pipeline that constructs them for brownfield projects.

Bridge edges are what make the Unified Project Graph unified. Without them, the spec graph and the code graph are two disconnected islands — separate node sets with no links between them. Bridge edges connect spec-layer nodes (entities, operations, endpoints, user stories) to code-layer nodes (tables, functions, handlers, modules), enabling convergence checking, drift detection, and impact analysis.


Edge Type Taxonomy

Edge Type        From               To                          Meaning
implements       spec_operation     code_function               This function implements this operation
exposes          spec_endpoint      code_endpoint               This handler exposes this endpoint
persists         spec_entity        code_table                  This table stores this entity
satisfies        spec_user_story    code_function               Coarse story-to-code link
tests            test_case          spec_acceptance_criterion   This test verifies this criterion
drifts_from      code_node          spec_node                   Implementation has diverged from spec
validates        spec_guard         code_function               Guard function validates this spec rule
derived_from     impl_node          spec_node                   Implementation node derived from spec node
generated_from   impl_node          spec_node                   Written by the codegen pipeline
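A minimal sketch of how such an edge might be represented in code — the type and field names here are illustrative, not Praetor's actual schema:

```typescript
// Illustrative sketch only — the real Praetor schema may differ.
type BridgeEdgeType =
  | "implements" | "exposes" | "persists" | "satisfies"
  | "tests" | "drifts_from" | "validates"
  | "derived_from" | "generated_from";

interface BridgeEdge {
  type: BridgeEdgeType;
  from: string;        // spec- or code-layer node id, per the taxonomy above
  to: string;
  confidence: number;  // 0.0–1.0; 1.0 for edges written at generation time
}

const edge: BridgeEdge = {
  type: "persists",
  from: "spec_entity:Order",
  to: "code_table:orders",
  confidence: 1.0,
};
```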

Two Scenarios: Greenfield and Brownfield

Greenfield

When Praetor generates the code, bridge edges are written with confidence 1.0 at generation time. The Kit system knows exactly which spec node each generated file implements. Bridge edges are part of the Kit's KitOutput — they are not inferred after the fact.

The Kit writes a generated_from edge from the impl_node to its spec_node as part of the REGISTER phase of Kit execution.
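As a hypothetical sketch of that REGISTER step — the KitOutput shape and function names below are assumptions for illustration, not Praetor's actual API:

```typescript
// Hypothetical shapes — Praetor's actual KitOutput API may differ.
interface GeneratedFromEdge {
  type: "generated_from";
  from: string;        // impl_node id of the generated file
  to: string;          // spec_node id it was generated from
  confidence: 1.0;     // known exactly at generation time, never inferred
}

interface KitOutput {
  files: string[];
  edges: GeneratedFromEdge[];
}

// REGISTER phase: record provenance for a generated file.
function register(implNode: string, specNode: string, out: KitOutput): void {
  out.edges.push({ type: "generated_from", from: implNode, to: specNode, confidence: 1.0 });
}

const out: KitOutput = { files: ["src/orders/create.ts"], edges: [] };
register("impl:src/orders/create.ts", "spec_operation:CreateOrder", out);
```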

Brownfield

When analyzing an existing codebase, bridge edges must be inferred: the code exists independently of any Praetor spec. A 4-stage pipeline maps spec nodes to code nodes, combining deterministic matching, embedding similarity, structural re-ranking, and LLM classification.


The 4-Stage Bridge Edge Pipeline (Brownfield)

Stages 1 and 3 are gated behind feature flags (off by default). When all AI flags are off, the pipeline runs deterministic matching only — which handles 60%+ of bridge edges for well-structured TypeScript projects.

Deterministic Matching (ALWAYS ON)
  Route path exact match, table name match, derived-from-code edges
  Confidence 0.90–1.0, no AI calls

        ▼  (remaining unmatched spec nodes, if flags enabled)

Stage 1: Candidate Generation [flag: bridgeEmbeddingCandidates]
  Embedding similarity — cast a wide net
  Input: all spec nodes × all code nodes (type-filtered)
  Output: top-10 candidate pairs per spec node
  Threshold: cosine similarity > 0.3 (recall over precision)

        ▼

Stage 2: Structural Re-ranking (always runs if Stage 1 ran)
  Exploit graph topology to boost/penalize candidates
  Signals: name_similarity, signature_match, graph_topology,
           table_access, route_match, module_colocation,
           cdg_guard_match, dfg_data_path

        ▼

Stage 3: LLM Classification [flag: bridgeLlmClassification]
  For top-3 candidates per spec node:
  ask the LLM "Does this code implement this spec?"
  Chain-of-thought → match / no-match / partial
  Cheap model (Haiku/DeepSeek) for bulk, Sonnet for ambiguous
  If flag OFF: skip → use structural score alone

        ▼

Stage 4: Confidence Scoring + Emission (always runs)
  Combine all signals → composite confidence
  > 0.85 → auto-accept
  0.50–0.85 → flag for human review
  < 0.50 → discard
  Write accepted edges to context_artifact_dependencies
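The flag gating above can be sketched as a pipeline skeleton. The two flag names come from the diagram; every stage body is a stub standing in for the real implementation:

```typescript
// Illustrative skeleton only — stage internals are stubbed to show control flow.
interface Flags { bridgeEmbeddingCandidates: boolean; bridgeLlmClassification: boolean }
interface Candidate { spec: string; code: string; score: number; stage: string }

// Stubs standing in for the real stages.
const deterministicMatch = (specs: string[]): Candidate[] =>
  specs.filter(s => s.startsWith("endpoint:")).map(s =>
    ({ spec: s, code: `code:${s}`, score: 1.0, stage: "deterministic" }));
const embeddingCandidates = (specs: string[]): Candidate[] =>
  specs.map(s => ({ spec: s, code: `code:${s}?`, score: 0.4, stage: "embedding" }));
const structuralRerank = (cs: Candidate[]): Candidate[] =>
  cs.map(c => ({ ...c, score: c.score + 0.2, stage: "structural" }));
const llmClassify = (cs: Candidate[]): Candidate[] =>
  cs.map(c => ({ ...c, score: c.score + 0.3, stage: "llm" }));
const scoreAndEmit = (cs: Candidate[]): Candidate[] => cs.filter(c => c.score >= 0.5);

function runPipeline(specNodes: string[], flags: Flags): Candidate[] {
  // Deterministic matching always runs: no AI calls, confidence 0.90–1.0.
  const matched = deterministicMatch(specNodes);
  if (!flags.bridgeEmbeddingCandidates) return matched;    // all AI stages gated off

  const unmatched = specNodes.filter(s => !matched.some(m => m.spec === s));
  let candidates = embeddingCandidates(unmatched);         // Stage 1
  candidates = structuralRerank(candidates);               // Stage 2: always follows Stage 1
  if (flags.bridgeLlmClassification) candidates = llmClassify(candidates); // Stage 3
  return [...matched, ...scoreAndEmit(candidates)];        // Stage 4
}
```

With both flags off, only the deterministic matches come back; with both on, the remaining spec nodes flow through all four stages.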

Type Compatibility Matrix

The pipeline only compares spec/code node pairs that could plausibly link. The compatibility matrix:

Spec Type         Compatible Code Types
spec_entity       code_table, code_class, code_schema
spec_operation    code_function, code_method
spec_endpoint     code_endpoint, code_function
spec_user_story   code_function, code_class, code_module
spec_guard        code_function, code_method

Pairs outside this matrix are never compared, which eliminates the bulk of the O(N×M) candidate space.
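A sketch of that pre-filtering step — the matrix contents come from the table above, while the helper's shape is an assumption:

```typescript
// Compatibility matrix from the table above; the helper shape is illustrative.
const COMPATIBLE: Record<string, string[]> = {
  spec_entity:     ["code_table", "code_class", "code_schema"],
  spec_operation:  ["code_function", "code_method"],
  spec_endpoint:   ["code_endpoint", "code_function"],
  spec_user_story: ["code_function", "code_class", "code_module"],
  spec_guard:      ["code_function", "code_method"],
};

// Only type-compatible (spec, code) pairs enter the O(N×M) candidate space.
function plausiblePairs(
  specs: { id: string; type: string }[],
  codes: { id: string; type: string }[],
): [string, string][] {
  const pairs: [string, string][] = [];
  for (const s of specs) {
    const allowed = COMPATIBLE[s.type] ?? [];
    for (const c of codes) if (allowed.includes(c.type)) pairs.push([s.id, c.id]);
  }
  return pairs;
}
```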


Structural Re-ranking Signals

Stage 2 produces eight signal types, each scoring 0.0–1.0:

Signal             Description
name_similarity    Normalized Levenshtein after case normalization
signature_match    Parameter count and type alignment
graph_topology     If spec nodes A→B depend on each other, boost code pairs X→Y that mirror the same dependency
table_access       Code function accesses the spec entity's corresponding table
route_match        HTTP method + path similarity between spec endpoint and code handler
module_colocation  Spec operations in the same service map to code functions in the same module
cdg_guard_match    Control-dependence graph guard type matches spec guard type
dfg_data_path      Data flow graph path matches spec-defined intended data flow
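As one concrete example, name_similarity might be computed roughly like this — a standard Levenshtein distance; the exact normalization Praetor applies is an assumption here:

```typescript
// Standard Levenshtein edit distance via dynamic programming.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),  // substitution
      );
  return dp[a.length][b.length];
}

// Case-normalize (e.g. createOrder vs create_order), then scale to 0.0–1.0.
// The normalization scheme is an assumption, not Praetor's documented rule.
function nameSimilarity(specName: string, codeName: string): number {
  const norm = (s: string) => s.replace(/[_-]/g, "").toLowerCase();
  const a = norm(specName), b = norm(codeName);
  const maxLen = Math.max(a.length, b.length) || 1;
  return 1 - levenshtein(a, b) / maxLen;
}
```

Under this scheme, `createOrder` and `create_order` normalize to the same string and score 1.0.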

Confidence and Human Review

The final confidence score for each bridge edge determines its fate:

  • High confidence (> 0.85): Auto-accepted, written to context_artifact_dependencies with confidence metadata.
  • Medium confidence (0.50–0.85): Written to the graph but flagged for human review in the brownfield studio. Reviewers can accept, reject, or correct the link.
  • Low confidence (< 0.50): Discarded. Not written to the graph.

All accepted edges carry provenance metadata recording which stage produced the match and the contributing signal scores. This makes the traceability itself traceable.
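The three-way routing above can be sketched as follows — the thresholds come from this section, while the edge and provenance shapes are illustrative:

```typescript
// Thresholds from this section; edge/provenance shapes are illustrative.
type Fate = "auto_accept" | "human_review" | "discard";

interface ScoredEdge {
  spec: string;
  code: string;
  confidence: number;
  provenance: { stage: string; signals: Record<string, number> };
}

function routeEdge(edge: ScoredEdge): Fate {
  if (edge.confidence > 0.85) return "auto_accept";   // written with confidence metadata
  if (edge.confidence >= 0.5) return "human_review";  // flagged in the brownfield studio
  return "discard";                                   // never written to the graph
}
```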


Research Foundation

The pipeline design is grounded in automated traceability link recovery (TLR) research:

  • Embedding similarity alone achieves ~45–55% F1 on spec-to-code mapping
  • Adding structural signals improves F1 by 8–15 percentage points
  • Adding LLM classification pushes accuracy above 85% for well-structured codebases
  • Hierarchical Bayesian composition (COMET, ICSE 2020) improves average precision by 5–14%

The staged architecture — deterministic first, then embedding, then structural, then LLM — optimizes for cost: each stage only processes what the previous stage could not resolve deterministically.
