GitHub - AdamManuel-dev/Ultra-Research: Deep research, turned up to 11 for utility instead of cost-savings.

Deep Research Cockpit

Steerable AI Research Platform for Mapping Knowledge in Real Time

An AI-powered research cockpit that explores web knowledge with real-time controls, prioritizes primary sources, and builds a verifiable graph of insights. Built for technical professionals who demand transparency and control: research engineers, PMs, data analysts, and tech leads.

🚧 Project Status: Deep Research Cockpit is under active development (Alpha - Phase 3: Advanced Features). The core research pipeline is functional, with Phase 1 (Foundations) and Phase 2 (Ranking & Core-First) complete. Currently implementing advanced features including graph memory and JS rendering.




Why This Matters

The Problem

Traditional research and search tools fall short for knowledge work:

  • Aggregation over exploration: Search results are ranked lists, not maps of adjacent concepts
  • Black-box prioritization: You don't know why one source ranked above another
  • Lost provenance: Claims are divorced from original sources and extraction context
  • No steering: Once a search is launched, you can't adjust depth, breadth, or source priority mid-flight
  • Opacity: No audit trail of how conclusions were reached

The Deep Research Cockpit Difference

1. Real-Time Steering

Adjust depth, breadth, source priority, verification intensity, and comparison scope live, with immediate visual feedback. Every control change shows its previewed impact within 400ms.

2. Core-First Architecture

Automatically prioritizes papers, specifications, and repositories (L0/L1) over blogs and forums (L3/L4), enforcing reading order with transparent reasoning for each ranking decision.

3. Graph Memory

Concepts, claims, and relationships guide exploration. Every discovered fact is linked to its sources and to previous findings. The knowledge graph evolves with your session, enabling discovery of contradictions and novel connections.

4. Complete Verifiable Lineage

Every claim has:

  • The exact source document with URL
  • Character span of the extracted evidence
  • Timestamp of when it was discovered
  • Why-ranked explanation of source prioritization
  • Independent corroborating sources

5. Deterministic Replay & Analysis

Replay any session exactly as it happened. Analyze how strategy changes affected outcomes. Compare A/B runs with synchronized timelines and metric deltas. Export complete evidence ledgers.

Use Cases

  • Literature reviews with primary source emphasis and contradiction detection
  • Technology spec hunts with origin tracing and related standards mapping
  • Controversy analysis with evidence clustering and disagreement discovery
  • Vendor comparison with specification matching and bias detection
  • Learning paths with curated reading queues and prerequisite ordering

Key Concepts

Source Tiers (L0 → L4)

Understanding source classification is central to Deep Research Cockpit's core-first approach:

Tier | Classification         | Examples
L0   | Foundational artifacts | Academic papers, technical specifications, official standards (RFCs, ISO), core repositories
L1   | Primary sources        | Conference talks, official documentation, white papers, working groups
L2   | Informed secondary     | Technical blogs, tutorials, conference proceedings, theses
L3   | Practitioner knowledge | StackOverflow, MathOverflow, professional forums, product docs
L4   | General discussion     | Twitter, Reddit, general forums, unvetted blogs
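Tier assignment can be sketched as a URL/metadata heuristic. A minimal TypeScript sketch; the regex patterns, the `classifyTier` name, and the DOI shortcut are illustrative assumptions, not the project's actual classifier:

```typescript
// Sketch: classify a source into a tier (0..4) from cheap signals.
type Tier = 0 | 1 | 2 | 3 | 4;

interface SourceMeta {
  url: string;
  hasDoi?: boolean; // presence of a DOI from metadata enrichment
}

// Hypothetical pattern list, checked most-authoritative first.
const TIER_PATTERNS: Array<[RegExp, Tier]> = [
  [/arxiv\.org|ietf\.org\/rfc|iso\.org|w3\.org\/TR/, 0],
  [/docs\.|whitepaper|\.acm\.org/, 1],
  [/medium\.com|dev\.to|tutorial/, 2],
  [/stackoverflow\.com|mathoverflow\.net/, 3],
];

function classifyTier(src: SourceMeta): Tier {
  if (src.hasDoi) return 0; // a DOI strongly suggests a paper/spec
  for (const [pattern, tier] of TIER_PATTERNS) {
    if (pattern.test(src.url)) return tier;
  }
  return 4; // default: general discussion
}
```

In practice the real classifier also uses venue and citation metadata; URL patterns alone only cover the easy cases.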

The Frontier

A priority queue of concepts, pages, and claims the system recommends exploring next. Scored by:

  • Novelty: How new this information is relative to existing memory
  • Centrality: How connected to core concepts
  • Disagreement: Potential contradictions with existing claims
  • Recency: When the source was published
  • User interest: Explicit knob adjustments
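The scoring above amounts to a weighted combination of the five factors. A minimal TypeScript sketch; the weight names and values are assumptions (the real scorer re-weights live as knobs change):

```typescript
// Sketch: frontier priority as a weighted sum over normalized factors.
interface FrontierItem {
  novelty: number;      // 0..1, vs. existing memory
  centrality: number;   // 0..1, connectedness to core concepts
  disagreement: number; // 0..1, contradiction potential
  recency: number;      // 0..1, time-decayed publish date
  userInterest: number; // 0..1, from explicit knob adjustments
}

type Weights = Record<keyof FrontierItem, number>;

// Illustrative default weights; they sum to 1 so scores stay in 0..1.
const DEFAULT_WEIGHTS: Weights = {
  novelty: 0.3, centrality: 0.25, disagreement: 0.15,
  recency: 0.1, userInterest: 0.2,
};

function frontierScore(item: FrontierItem, w: Weights = DEFAULT_WEIGHTS): number {
  return (Object.keys(w) as Array<keyof FrontierItem>)
    .reduce((sum, k) => sum + w[k] * item[k], 0);
}
```

Knob changes then reduce to swapping the weight vector and re-scoring the queue.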

Strategy Controls (The Knobs)

Real-time parameters that steer exploration:

  • Depth ↔ Breadth: Focus on depth (following chains of logic) vs. breadth (exploring diverse angles)
  • Core-First: Enforce reading order from L0→L4 vs. treat all sources equally
  • Verify: High corroboration requirements vs. accept single-source claims
  • Compare: Find contradicting sources and alternative viewpoints
  • Pivot: Jump to adjacent concepts vs. stay focused on current topic

GraphRAG (Graph-guided Retrieval)

Hybrid retrieval combining:

  • Text search: BM25 for keyword matching
  • Vector search: Dense embeddings for semantic similarity
  • Graph-guided: Follow edges in the knowledge graph to discover related claims
  • Community summaries: Each cluster of related concepts has an AI-generated summary
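These signals are typically fused with reciprocal rank fusion (RRF). A TypeScript sketch, assuming the conventional k=60 constant; `rrfFuse` is an illustrative name, not the project's API:

```typescript
// Sketch: reciprocal rank fusion over ranked document-ID lists
// (e.g. one list each from BM25, dense vectors, and graph expansion).
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, i) => {
      // Each signal contributes 1/(k + rank); lower ranks contribute more.
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + i + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}
```

A document ranked well by several signals can outscore one ranked first by only a single signal, which is the point of fusing rather than picking a winner.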

Event Stream & Replay

Every action (search, fetch, extract, rank, think, decide) emits a JSONL event with:

  • Timestamp and run ID
  • Action type and agent responsible
  • Input parameters and decision rationale
  • Output summary and cost (tokens, time, money)
  • Artifacts created

Full replay determinism enables exact reproduction of any session.
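Determinism of this kind hinges on seeded randomness: any tie-breaking or sampling draws from a generator seeded per run. A minimal TypeScript sketch using a mulberry32-style PRNG; the specific generator is an assumption for illustration:

```typescript
// Sketch: seeded PRNG so a replayed run makes identical random choices.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Same run seed, same sequence: replay reproduces every decision.
const liveRun = mulberry32(42);
const replayRun = mulberry32(42);
```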


Core Features

🎮 Run Mode: "Pilot View"

Real-time research exploration with full agency:

  • Command Palette: Macros for common patterns (Survey, Spec Hunt, Contradiction Hunt, Origin Trace, Video-First)
  • Strategy Compass: Adjustable control knobs with what-if previews (ghost re-ranking)
  • Frontier Map: Interactive concept graph with overlays showing:
    • Core proximity (distance to foundational sources)
    • Disagreement (potential contradictions)
    • Freshness (recent discoveries)
    • Memory writes (new concepts added this session)
  • Evidence & Reading Queue: L0→L4 columns with rank cards showing:
    • Why-ranked breakdown (relevance, authority, core proximity, recency, independence)
    • Source metadata (venue, citations, OA status)
    • Confidence scores and contradiction flags
  • Live Log: Filterable event stream grouped by decision phases
  • Status Strip: Real-time operational metrics
    • Latency, token usage, %JS renders, diversity index, contradiction density

📊 Review Mode: "Black-Box Recorder"

After-the-fact analysis of any completed research session:

  • Timeline Replay: Scrub through session snapshots; jump to landmarks
    • First core source discovered
    • Contradiction spike events
    • Cost inflection points
    • Strategy control changes
  • Effectiveness Dashboard: Key Performance Indicators
    • TTFC (Time-to-First-Core source)
    • Authority Mix (% L0/L1 sources)
    • Evidence Robustness (independent sources per claim)
    • Frontier Entropy (breadth measurement)
    • Operational metrics (latency, cost, cache hits)
  • Evidence Ledger: Complete claim inventory with
    • Confidence scores
    • Supporting sources and quotes
    • Timestamps and extraction spans
    • Contradiction flags and corrections
  • Learning Map: Diff of graph memory
    • New concepts and entities discovered
    • New relationships and claims added
    • Summary changes and retractions
  • Session Summary: Machine-generated brief with key findings
  • A/B Run Comparator: Compare two sessions side-by-side
    • Synchronized timelines
    • Metric deltas
    • Frontier divergence analysis
  • Exports: Multiple output formats
    • JSONL (complete event stream)
    • CSV (claims and sources)
    • GraphML (knowledge graph structure)
    • PDF (formatted research brief)

🔗 Orchestrator & Strategy Controller

Central planning brain that decomposes research goals into exploration task graphs:

  • Task Decomposition: Break objectives into search → fetch → extract → index → retrieve → synthesize tasks
  • Frontier Scoring: Maintain a priority queue of what to explore next using multi-factor scoring
  • Strategy Application: Apply user control changes to re-weight scoring and branching
  • Decision Logging: Every decision includes inputs, score vector, and outcomes
  • Deterministic Replay: Same inputs always produce same decisions (seeded randomness)
  • Budget Enforcement: Hard caps on tokens, time, domains, and JS renders

Response Times (P95):

  • Knob change → preview: ≤400ms
  • Knob change → scheduler impact: ≤800ms
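The hard caps above can be sketched as a guard the scheduler consults before expanding the frontier. All field and function names here are assumptions, not the project's API:

```typescript
// Sketch: hard per-run caps and a check against accumulated usage.
interface RunBudget {
  maxTokens: number;    // input + output tokens
  maxMs: number;        // wall-clock duration
  maxDomains: number;   // distinct domains touched
  maxJsRenders: number; // JS rendering escalations
}

interface RunUsage {
  tokens: number;
  elapsedMs: number;
  domains: Set<string>;
  jsRenders: number;
}

function withinBudget(b: RunBudget, u: RunUsage): boolean {
  return (
    u.tokens < b.maxTokens &&
    u.elapsedMs < b.maxMs &&
    u.domains.size < b.maxDomains &&
    u.jsRenders < b.maxJsRenders
  );
}
```

When the check fails, the orchestrator stops scheduling new work rather than cancelling in-flight tasks.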

๐ŸŒ Fetch & Extraction Pipeline

Cost-aware content acquisition with intelligent routing:

  • Non-JS Fetch (Default): HTTP with redirects, charset detection, and robots.txt compliance; main-content extraction via Trafilatura
  • Smart Router: Escalates to JS rendering when it detects:
    • Placeholder/empty DOM
    • SPA patterns (React, Vue, Angular)
    • Interstitial content gates
    • Dynamic loading patterns
  • JS Fetch Path: Browserbase + Stagehand + Playwright with
    • Site-specific playbooks for reliable extraction
    • Structured extraction hooks for tables, lists, metadata
    • Session management for multi-step authentication
  • HTML→Markdown Normalization: Turndown service with
    • Heading hierarchy preservation
    • Code block and table fidelity
    • Figure captions and link preservation
    • Boilerplate removal
  • Smart Caching: Idempotent by URL+ETag with configurable TTLs
  • Compliance: Robots.txt respect, rate limiting, user-agent identification

Performance (P95):

  • Basic fetch: ≤600ms
  • JS fetch: ≤3s
  • Cache hit rate target: ≥40%
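The router's escalation decision can be sketched as a post-fetch check on the basic result. The text-length threshold and the SPA marker list below are illustrative assumptions:

```typescript
// Sketch: decide whether a page needs the JS rendering path.
// Hypothetical markers for empty-shell SPAs (React/Vue/Angular patterns).
const SPA_MARKERS: RegExp[] = [
  /<div id="root">\s*<\/div>/,
  /<div id="app">\s*<\/div>/,
  /data-reactroot/,
  /ng-app/,
];

function needsJsRender(html: string, extractedText: string): boolean {
  // Placeholder/empty DOM: extraction yielded almost no text.
  if (extractedText.trim().length < 200) return true;
  // Otherwise escalate only if the markup looks like a client-rendered shell.
  return SPA_MARKERS.some((m) => m.test(html));
}
```

Keeping the check cheap matters: it runs on every basic fetch, and escalations are what the JS budget caps.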

🎯 Source Ranking & Core-First Prioritization

Multi-factor scoring with transparent reasoning:

  • Source Classification: Automatic categorization into tiers
    • Paper/spec/repo detection via metadata and URL patterns
    • Academic vs. practitioner vs. general source
    • Authority level via venue, citation count, DOI presence
  • Metadata Enrichment: Integration with scholarly databases
    • Crossref (DOI, venue, citations)
    • OpenAlex (comprehensive metadata)
    • Semantic Scholar (citation context)
    • Unpaywall (open access links)
    • Retraction Watch (correction flags)
  • Normalized Scoring: Combine factors with user-tunable weights
    • Text relevance (BM25 similarity)
    • Authority (venue rank, citation count, L-tier)
    • Core proximity (citation-graph distance to origins)
    • Recency (publish date with time decay)
    • Independence (avoiding duplication and bias)
    • Penalties for retractions/corrections
  • Why-Ranked Explanations: Human-readable breakdown showing
    • Which factors helped (green)
    • Which factors hurt (red)
    • What-if previews when knobs adjust
  • Reading Queue Builder: Enforce L0→L4 reading order
    • Quota per tier
    • Gap detection (e.g., missing specification)
    • Drag-to-override with audit trail

Performance (P95):

  • Enrichment latency: ≤1.5s (batched)
  • Scoring with explanations: ≤200ms
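The normalized scoring plus why-ranked breakdown can be sketched as follows. The weights, the 0.5 "helped" baseline, and the retraction penalty factor are assumptions for illustration:

```typescript
// Sketch: multi-factor score with a per-factor explanation record.
interface RankFactors {
  relevance: number;         // BM25 similarity, normalized 0..1
  authority: number;         // venue rank, citations, L-tier
  coreProximity: number;     // citation-graph distance to origins
  recency: number;           // time-decayed publish date
  independence: number;      // avoids duplication/bias
  retractionPenalty: number; // 0 (clean) .. 1 (retracted)
}

function scoreWithExplanation(f: RankFactors, w = {
  relevance: 0.3, authority: 0.25, coreProximity: 0.2,
  recency: 0.1, independence: 0.15,
}) {
  const breakdown = (Object.keys(w) as Array<keyof typeof w>).map((k) => ({
    factor: k,
    contribution: w[k] * f[k],
    helped: f[k] >= 0.5, // above-average value reads as "helped" (green)
  }));
  const score =
    breakdown.reduce((s, c) => s + c.contribution, 0) - 0.5 * f.retractionPenalty;
  return { score, breakdown };
}
```

The breakdown array is exactly what a why-ranked card renders; a what-if preview is the same call with a different weight object.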

📈 Graph Memory & GraphRAG Retrieval

Persistent knowledge graph with intelligent exploration:

  • Schema: Typed nodes and edges
    • Nodes: Concept, Entity, Claim, Source, Note
    • Edges: SUPPORTED_BY, ABOUT, RELATED_TO, CONTRADICTS, CITES
  • Merge Operations: Upsert claims with automatic deduplication
    • Concept linking
    • Source attribution
    • Relationship creation
    • ≤200ms per claim (P95)
  • Community Detection: Automatic clustering and summarization
    • Identify dense clusters of related concepts
    • Generate community-level summaries
    • Measure cluster coherence
  • Origin Path Calculation: Shortest path from any claim to foundational sources
    • Enables "why should I trust this?" answers
    • Visualize citation chains
    • Detect chain-of-logic breaks
  • Hybrid Retrieval: Combine multiple strategies
    • BM25 (keyword matching)
    • Dense vectors (semantic similarity)
    • Neural sparse (entity-aware keyword search)
    • Graph-guided expansion (follow edges to neighbors)
    • RRF (reciprocal rank fusion) to combine signals
  • Query Subgraph Latency: ≤400ms P95
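The deduplicating upsert can be sketched as a Cypher MERGE template plus a normalization rule for the dedup key. The node labels and the key rule are assumptions, though the SUPPORTED_BY edge matches the schema above:

```typescript
// Sketch: dedup key for claims, normalized so trivially different
// phrasings collapse to one node. A real system would likely hash this.
function claimKey(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Cypher template: MERGE makes the write idempotent, so re-extracting
// the same claim attaches another source instead of creating a duplicate.
const UPSERT_CLAIM = `
MERGE (c:Claim { key: $key })
  ON CREATE SET c.text = $text, c.confidence = $confidence
MERGE (s:Source { url: $url })
MERGE (c)-[:SUPPORTED_BY]->(s)
`;
```

Running the template once per extracted claim keeps the merge under the per-claim latency budget because MERGE matches on the indexed key.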

📚 Reading Queue & "Watch the Act" Pipeline

Core-first reading order with integrated video/talk content:

  • Origin Detection: Identify earliest highly-cited foundational works
    • Patent searches
    • Spec document discovery
    • GitHub repository analysis
    • Academic paper citation trails
  • L0→L4 Queue Building: Enforce reading order with
    • Tier quotas (e.g., 40% L0, 30% L1, 20% L2, 10% L3/L4)
    • Gap marking (missing prerequisites or standards)
    • Override capability with full audit trail
  • Video Pipeline:
    • Fetch captions (auto-transcription as fallback)
    • Align snippets to extracted claims
    • Add jump-to timestamps in UI
    • ≤3s per video with caching
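The quota-based queue builder can be sketched as follows. The 40/30/20/10 split mirrors the example above (split evenly across L3/L4 here), and all names are illustrative:

```typescript
// Sketch: build a reading queue that enforces L0→L4 order with tier quotas.
interface QueueItem {
  id: string;
  tier: number;  // 0..4
  score: number; // ranking score within the tier
}

function buildReadingQueue(
  items: QueueItem[],
  size: number,
  quotas: number[] = [0.4, 0.3, 0.2, 0.05, 0.05],
): QueueItem[] {
  const queue: QueueItem[] = [];
  for (let tier = 0; tier <= 4; tier++) {
    const slots = Math.round(size * quotas[tier]);
    const picked = items
      .filter((i) => i.tier === tier)
      .sort((a, b) => b.score - a.score) // best-scored first within the tier
      .slice(0, slots);
    queue.push(...picked); // appending tier by tier enforces L0→L4 order
  }
  return queue;
}
```

Drag-to-override would then mutate this queue after the fact, with the change recorded to the audit trail.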

📡 Observability & Event Schema

Complete audit trail enabling transparency and replay:

  • Event Stream: JSONL format with core fields
    • ts (ISO timestamp), run_id, step_id
    • agent (which component), action (operation type)
    • input, output_summary, artifacts
    • source (document URL or system)
    • cost_ms, tokens_in, tokens_out, decision
  • Real-Time Streaming: SSE/WebSocket for live views
  • Durable Storage: Object store + indexed queryable database
  • Snapshot Generator: Pre-computed snapshots for ≤2s replay load times
  • Queryable: By run ID, time range, action, agent, decision type
  • Deterministic: Same inputs always produce same decision log
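The core fields above can be sketched as a TypeScript type plus a JSONL serializer. Which fields are optional, and the helper name, are assumptions:

```typescript
// Sketch: one event per pipeline action, serialized one-per-line (JSONL).
interface ResearchEvent {
  ts: string;             // ISO timestamp
  run_id: string;
  step_id: string;
  agent: string;          // which component emitted this
  action: string;         // operation type (search, fetch, extract, ...)
  input: unknown;
  output_summary: string;
  artifacts: string[];
  source?: string;        // document URL, or absent for system events
  cost_ms: number;
  tokens_in: number;
  tokens_out: number;
  decision?: string;      // rationale, when the action made a choice
}

function toJsonl(events: ResearchEvent[]): string {
  // One compact JSON object per line: the append-only ledger format.
  return events.map((e) => JSON.stringify(e)).join("\n");
}
```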

🔒 Security, Privacy & Compliance

Enterprise-grade safety:

  • Site Compliance
    • Robots.txt and Terms of Service respect
    • Per-domain rate limiting
    • Appropriate user-agent identification
    • Retraction/correction flagging
  • Data Protection
    • PII minimization in storage
    • Encryption at rest and in transit
    • Secret vault for API keys and credentials
    • Data retention policies with automatic cleanup
  • Access Control
    • Role-based access (viewer, editor, admin)
    • Workspace isolation for teams
    • Audit logs for all actions
    • Export & deletion endpoints
  • Compliance Documentation
    • DPA/Terms of Service summaries
    • Opt-in for auto-transcription features
    • GDPR-ready data handling

💰 Cost & Performance Management

Predictable operational costs:

  • JS Rendering Budget: Policy enforcement
    • Auto/Conservative/Aggressive modes
    • Percentage-of-run caps
    • Fallback to cached content when budget exceeded
  • Per-Run Caps: Configurable limits
    • Token budget (input + output)
    • Time budget (wall-clock duration)
    • Domain count limit
    • JS render count limit
  • Caching Layers: Reduce redundant work
    • HTTP cache (ETag-aware)
    • HTML→Markdown reduction cache
    • Embeddings cache
    • Metadata cache (scholarly API results)
  • Batching & Optimization
    • Batch API requests to scholarly databases
    • Circuit breakers for failing services
    • Adaptive backoff and retry logic

🧪 Evaluation & QA Framework

Continuous quality assurance:

  • Golden Tasks: Reference queries with expected findings
    • Known sources to discover
    • Claims to validate
    • Contradictions to surface
  • Automated Scoring: RAGAS-style metrics
    • Faithfulness (claims match sources)
    • Answer relevancy (retrieved content answers question)
    • Context precision (no irrelevant sources)
    • Context recall (all relevant sources found)
  • Claim-Level Judgment: LLM-as-judge with evidence links
  • A/B Testing Infrastructure:
    • Vary knobs and component configurations
    • Track outcome differences
    • Pareto front analysis for trade-offs
  • Automated Reporting:
    • Weekly quality reports
    • Regression thresholds with alerts
    • Component performance breakdown

🔬 External Scholarly Integrations

Authority-based metadata and enrichment:

  • Integrated APIs:
    • Crossref: DOI lookup, venue metadata, citation counts
    • OpenAlex: Comprehensive scholarly metadata
    • Semantic Scholar: Citation context and author information
    • Unpaywall: Open access link discovery
    • Venue Databases: Journal/conference impact rankings
    • Video Caption APIs: YouTube, Vimeo, etc.
  • Normalized Schema: Consistent "source profile" structure across providers
  • Caching Strategy: Local caching with periodic refresh
  • Rate Limit & Retry: Respectful API usage with exponential backoff
  • 95%+ Enrichment Target: Scholarly metadata available for 95%+ of academic sources

Architecture Overview

System Diagram

┌──────────────────────────────────────────────┐
│ User UI (Run Mode / Review Mode)             │
│ (Pilot View / Black-Box Recorder)            │
└──────────────────────┬───────────────────────┘
                       │
        SSE/WebSocket ↔ Real-time Control
                       │
┌──────────────────────▼───────────────────────┐
│ Orchestrator (Planner + Strategy Controller) │
│ - Task decomposition                         │
│ - Frontier scoring & prioritization         │
│ - Budget enforcement                         │
│ - Decision logging                           │
└──────────┬─────────────────────┬─────────────┘
           │                     │
           ▼                     ▼
┌──────────────────┐    ┌──────────────────┐
│ Fetch Basic      │    │ Fetch JS         │
│ (httpx +         │    │ (Browserbase +   │
│  Trafilatura)    │    │  Stagehand)      │
└──────────┬───────┘    └────────┬─────────┘
           │                     │
           └──────────┬──────────┘
                      ▼
        ┌──────────────────────────────┐
        │ Turndown (HTML → Markdown)   │
        └─────────────┬────────────────┘
                      ▼
        ┌──────────────────────────────┐
        │ Indexer (Hybrid OpenSearch)  │
        │ - BM25 (keyword)             │
        │ - Dense vectors (semantic)   │
        │ - Neural sparse (entities)   │
        └─────────────┬────────────────┘
                      │
           ┌──────────┴──────────┐
           ▼                     ▼
┌──────────────────┐    ┌──────────────────────┐
│ Claim/Entity     │    │ Graph Memory         │
│ Extractor        │    │ (Neo4j/Memgraph)     │
│ (w/ citations)   │    │ - Concepts/Entities  │
└────────┬─────────┘    │ - Claims/Sources     │
         │              │ - Relationships      │
         └──────┬───────┤ - Community summaries│
                ▼       └──────────────────────┘
        ┌──────────────────────────┐
        │ Retrieval (Hybrid +      │
        │ Graph-Guided)            │
        └────────────┬─────────────┘
                     ▼
        ┌─────────────────────┐
        │ Synthesis           │
        │ (Notes/Reports)     │
        └──────────┬──────────┘
                   │
           ┌───────┴────────┐
           ▼                ▼
     ┌─────────┐      ┌──────────────┐
     │ Events  │      │ Snapshots    │
     │ (JSONL) │      │ (for replay) │
     └─────────┘      └──────────────┘

Key Technology Stack

Component       | Technology                         | Purpose
Graph Database  | Neo4j / Memgraph                   | Knowledge graph storage
Search Index    | OpenSearch                         | Hybrid retrieval (BM25 + vector)
Content Fetch   | httpx + Trafilatura                | Non-JS content acquisition
JS Rendering    | Browserbase + Stagehand            | Dynamic site rendering
HTML→Markdown   | Turndown                           | Content normalization
Embedding Model | (Configurable)                     | Semantic search vectors
Event Streaming | SSE / WebSocket                    | Real-time client updates
Event Storage   | Object store (S3/GCS) + OpenSearch | Queryable audit trail
Scheduling      | (DAG-based task runner)            | Orchestration engine
API Clients     | Scholarly integrations             | Metadata enrichment

Design Principles

  • Cost-Aware: Non-JS fetch by default; JS rendering only when necessary and within budget
  • Verifiable: Every claim links to source spans with timestamps and confidence
  • Deterministic: Exact replay of sessions with seeded randomness
  • Transparent: All decisions include reasoning; users see trade-offs
  • Scalable: Horizontal scaling of components; streaming architecture
  • Compliant: Respect robots.txt, TOS, rate limits, and privacy regulations


Quick Start

📚 Complete Setup Guide: For detailed installation instructions, troubleshooting, and verification steps, see the Getting Started Guide (30-45 minutes).

Prerequisites

System Requirements:

  • Node.js 18.0.0 or higher
  • npm 9.0.0 or higher
  • Docker Desktop (or Docker Engine + Docker Compose v2.0+)
  • 8GB RAM minimum (16GB recommended)
  • 20GB disk space for Docker volumes

Optional:

  • API keys for LLM providers (OpenAI, Anthropic) - for future integrations

Installation

5-Minute Setup:

# 1. Clone the repository
git clone https://github.com/yourusername/deep-research-cockpit.git
cd deep-research-cockpit

# 2. Install dependencies
npm install

# 3. Start infrastructure services (Neo4j, OpenSearch, Redis, MinIO)
npm run docker:up

# 4. Wait for services to be healthy (~1-2 minutes)
docker-compose ps  # All services should show "healthy"

# 5. Start development servers (backend + frontend)
npm run dev

Access the Application:

  • Frontend UI: http://localhost:3000
  • Backend API: http://localhost:3001

Verify Installation

# Check backend health
curl http://localhost:3001/health | jq

# Expected output:
{
  "status": "healthy",
  "services": {
    "neo4j": { "status": "connected", "latency_ms": 12 },
    "opensearch": { "status": "connected", "latency_ms": 45 },
    "redis": { "status": "connected", "latency_ms": 3 }
  },
  "uptime_seconds": 42,
  "version": "0.1.0"
}

First Research Session

Note: In the current alpha phase, many UI features are still in development. The backend event system and storage are functional.

  1. Open the UI at http://localhost:3000
  2. Create a new research session (when UI is ready)
  3. Monitor events in real-time:
    # Watch event stream
    curl -N http://localhost:3001/events/stream
  4. Check stored events:

Troubleshooting

Services won't start?

# Check Docker is running
docker ps

# Check port conflicts
lsof -i :7474  # Neo4j
lsof -i :9200  # OpenSearch

# View logs
npm run docker:logs

Need more help?

Next Steps

For Users:

  • Explore the system as features are implemented
  • Provide feedback on GitHub Discussions

For Developers:

Complete Documentation:


Core Metrics & Performance

Key Performance Indicators

Metric                 | Target   | Purpose
TTFC                   | ≤90s P95 | Time to First Core (L0/L1) source discovery
Authority Mix          | ≥60%     | Proportion of L0/L1 sources in core-first mode
Evidence Robustness    | ≥2.0     | Independent sources per high-impact claim
Control Responsiveness | ≤400ms   | Preview latency for knob changes
Commit Latency         | ≤800ms   | Strategy change application latency
JS Render Rate         | ≤25%     | Percentage of fetches escalated to JS rendering for general topics
Cache Hit Rate         | ≥40%     | HTTP/metadata caching effectiveness
Frontier Entropy       | Tunable  | Breadth measurement
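TTFC can be computed directly from the event stream: the delta between run start and the first event touching an L0/L1 source. A TypeScript sketch, assuming events carry a tier annotation (that field name is an assumption):

```typescript
// Sketch: TTFC from an ordered event stream.
interface TimedEvent {
  ts: number;    // ms since epoch
  tier?: number; // source tier, present on rank/fetch events
}

function ttfcMs(events: TimedEvent[]): number | null {
  if (events.length === 0) return null;
  const start = events[0].ts;
  // First event whose source is foundational (L0) or primary (L1).
  const firstCore = events.find((e) => e.tier !== undefined && e.tier <= 1);
  return firstCore ? firstCore.ts - start : null; // null: no core source found
}
```

The other KPIs in the table reduce to similar folds over the same stream, which is why the event ledger doubles as the metrics source of truth.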

Operational Characteristics

  • Deterministic Replay: 100% reproducibility of event sequence
  • Frontier Scoring: Measurable changes when controls shift
  • Domain Diversity: Tunable with visible impact on results
  • Contradiction Density: Observable clustering of disagreements

Success Criteria (MVP)

  • TTFC ≤120s with Authority Mix ≥50% L0/L1
  • Complete event logging and basic replay
  • Evidence Robustness ≥2.0 independent sources per claim

Documentation Map

📚 Complete Documentation Index: See docs/INDEX.md for all documentation organized by role and topic

This README provides a high-level overview. For detailed information:

  • System-level docs: See docs/ directory for guides, architecture, and PRDs
  • Package-level docs: Each package (backend/, frontend/, shared/) has its own detailed documentation in packages/{name}/docs/
  • API references: Available in both root ARCHITECTURE.md and package-specific API.md files

Quick Links by Role

👨‍💻 For Developers

Essential Guides:

Architecture Deep Dives:

🔬 For Research Engineers & Analysts

User Guides (Coming Soon):

  • Pilot View UI Guide - Master the live research interface
  • Strategy Controls Guide - Steer exploration effectively
  • Source Tiers & Core-First - Understanding authority ranking
  • Reading Queue Management - Building optimal reading order
  • Review Mode Deep Dive - Analyzing completed sessions

๐Ÿข For Technical Leaders & Product Managers

Strategic Documentation:

Future Documentation:

  • Evaluation Framework - Quality metrics and measurement
  • Cost & Performance Management - Budget controls and optimization
  • Security & Compliance - Privacy, compliance, and safety

📊 For Data Scientists

Technical Implementation (Planned):

  • GraphRAG Implementation - Graph-guided retrieval details
  • Ranking Algorithms - Scoring function design
  • Evaluation & Metrics - Measurement methodology
  • Source Classification - L0-L4 detection algorithms

Core Documentation

Document            | Description                      | Status
README.md           | Project overview and quick start | ✅ Current
ARCHITECTURE.md     | Complete system architecture     | ✅ Current
CONTRIBUTING.md     | Contribution guidelines          | ✅ Current
Getting Started     | Setup and installation           | ✅ Current
Development Guide   | Development workflow             | ✅ Current
Testing Guide       | Testing practices                | ✅ Current
Deployment Guide    | Deployment strategies            | ✅ Current
Documentation Index | Complete documentation map       | ✅ Current

Package Documentation

Each package has its own detailed documentation:

Backend (packages/backend/docs/):

Frontend (packages/frontend/docs/):

Shared (packages/shared/docs/):

Additional Resources


Roadmap & Status

Current Status

Alpha - Phase 3: Core research pipeline functional (Phases 1-2 complete); currently implementing advanced features (graph memory, JS rendering, video pipeline); evaluating user workflows and performance characteristics

Rollout Phases

✅ Phase 1: Foundations (Weeks 1–3)

  • Event schema and bus
  • Basic fetch + Turndown normalization
  • Hybrid search index (OpenSearch)
  • Orchestrator with frontier scoring
  • Pilot View shell (layout, basic log, queue)

Exit Criteria: MVP research pipeline works end-to-end

✅ Phase 2: Ranking & Core-First (Weeks 4–6)

  • Source classification and metadata enrichment
  • Multi-factor ranking with why-ranked explanations
  • Reading queue builder with L0→L4 enforcement
  • Router heuristics for JS escalation
  • HTTP and metadata caching

Exit Criteria: Authority Mix ≥50%, TTFC ≤120s

🚧 Phase 3: Advanced Features (Weeks 7–9)

  • JS rendering path (Browserbase + Stagehand)
  • Graph memory with community summaries
  • Graph overlays in Pilot View
  • Origin detection and reading queue optimization
  • Video caption alignment

Exit Criteria: ≤25% JS renders for general topics; graph structure validated

📅 Phase 4: Analysis & Exports (Weeks 10–12)

  • Review Mode implementation
  • Timeline replay with KPI dashboard
  • Evidence ledger and learning map
  • Export formats (JSONL, CSV, GraphML, PDF)
  • A/B run comparison
  • Evaluation framework and dashboards
  • RBAC and audit logs

Exit Criteria: Full replay works; exports verified; team workflow tested

📅 Phase 5 (V2): Advanced Analytics & Scaling

  • Advanced attribution analytics
  • Team multi-tenancy and shared workspaces
  • Failure autopsy and automated debugging
  • Extended scholarly integrations (arXiv, PubMed, etc.)
  • Multi-language support
  • Custom ranking rule builder

Contributing

We welcome contributions! Here's how to get involved:

Report Issues

Found a bug? Have a feature idea? → Open an issue

Contribute Code

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes following our style guide
  4. Add tests for new functionality
  5. Submit a pull request

See Contributing Guide for detailed setup and process.

Improve Documentation

  • Clarify existing docs
  • Add examples and tutorials
  • Report broken links or unclear sections
  • Translate documentation

Share Use Cases

Tell us how you're using Deep Research Cockpit! We'd love to feature your research or workflow.


Glossary

Authority Mix: Percentage of sources in L0/L1 tiers; target ≥60% for core-first runs.

Core Proximity: Citation-graph distance to foundational sources (papers, specs, repos).

Frontier: Priority queue of next concepts, pages, and claims to explore; scored by novelty, centrality, disagreement, and user interest.

GraphRAG: Retrieval-Augmented Generation combined with graph structure; uses the knowledge graph to guide discovery and provide context.

L0 / L1 / L2 / L3 / L4: Source tier classification from foundational (L0) to general discussion (L4).

TTFC: Time-to-First-Core source; how quickly the system discovers L0/L1 materials.

Why-Ranked: Human-readable breakdown of factors contributing to a source's rank; shows which factors helped (green) and hurt (red).

Deterministic Replay: Ability to reproduce any session exactly given the same inputs; enabled by seeded randomness and event logging.

Snapshot: Pre-computed state capture of a research session at a point in time; enables fast Review Mode loading.

Evidence Robustness: Average number of independent sources supporting each high-impact claim; target ≥2.0.

Frontier Entropy: Measure of breadth in exploration; higher values indicate more diverse concept coverage.


License & Support

License

This project is licensed under the MIT License; see the LICENSE file for details.

Support

Credits

Built with gratitude for these excellent projects:


Quick Links


Deep Research Cockpit: making research exploration transparent, verifiable, and steerable.