
Core Concepts

Ground is built around a few key concepts that work together to provide grounded retrieval.

Sources

A source is a repository or documentation site that Ground indexes. Each source has:
  • Type: repo (Git repository) or docs (documentation)
  • Format: For docs, either html (web pages) or openapi (API specs)
  • URL: The location to fetch content from
  • Status: pending, syncing, synced, or error
Sources are indexed asynchronously via sync jobs. After creating a source, you must trigger a sync to index its content.
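As a rough illustration, the source attributes above can be modeled as a small Python data type. This is a sketch, not Ground's API. The class and field names are hypothetical. It also assumes that a docs source must declare a format, that a repo source takes none, and that a new source starts as `pending` until a sync runs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SourceType(Enum):
    REPO = "repo"  # Git repository
    DOCS = "docs"  # documentation site


class DocsFormat(Enum):
    HTML = "html"        # web pages
    OPENAPI = "openapi"  # API specs


class SourceStatus(Enum):
    PENDING = "pending"
    SYNCING = "syncing"
    SYNCED = "synced"
    ERROR = "error"


@dataclass
class Source:
    """Hypothetical model of a Ground source (names are illustrative)."""

    name: str
    type: SourceType
    url: str                                      # location to fetch content from
    format: Optional[DocsFormat] = None           # only meaningful for docs sources
    status: SourceStatus = SourceStatus.PENDING   # assumed: nothing indexed until a sync runs

    def __post_init__(self) -> None:
        # Assumption: docs sources must say how to parse them; repos take no format.
        if self.type is SourceType.DOCS and self.format is None:
            raise ValueError("docs sources need a format: html or openapi")
        if self.type is SourceType.REPO and self.format is not None:
            raise ValueError("format applies only to docs sources")
```

Creating a `Source` only records where the content lives. In this sketch the status stays `pending` because indexing happens later, in a separate sync job.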

Jobs

A job represents a sync operation that fetches, parses, chunks, embeds, and indexes content from a source. Jobs progress through stages:
  1. queued → Waiting to be processed
  2. fetch → Downloading content from the source
  3. parse → Extracting text from files/pages
  4. chunk → Splitting content into searchable chunks
  5. embed → Generating vector embeddings
  6. index → Storing chunks in the database
  7. finalize → Updating source metadata
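The stage order can be sketched as a simple linear pipeline. The `JobStage` enum, `next_stage`, `run_job`, and the `handlers` mapping are hypothetical names for illustration only. The sketch assumes that stages always run in the listed order and that a failure in any stage stops the job.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class JobStage(Enum):
    # Stages in the order listed above.
    QUEUED = "queued"
    FETCH = "fetch"
    PARSE = "parse"
    CHUNK = "chunk"
    EMBED = "embed"
    INDEX = "index"
    FINALIZE = "finalize"


STAGE_ORDER: List[JobStage] = list(JobStage)  # Enum iteration keeps definition order


def next_stage(stage: JobStage) -> Optional[JobStage]:
    """Return the stage after `stage`, or None once finalize is done."""
    i = STAGE_ORDER.index(stage)
    return STAGE_ORDER[i + 1] if i + 1 < len(STAGE_ORDER) else None


def run_job(handlers: Dict[JobStage, Callable[[], None]]) -> List[JobStage]:
    """Run each stage's handler in order and return the stages completed.

    Stages without a handler are no-ops. If a handler raises, the exception
    propagates and later stages never run.
    """
    completed: List[JobStage] = []
    stage: Optional[JobStage] = JobStage.QUEUED
    while stage is not None:
        handlers.get(stage, lambda: None)()
        completed.append(stage)
        stage = next_stage(stage)
    return completed
```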

Chunks

A chunk is a piece of indexed content with:
  • Content: The actual text
  • Embedding: Vector representation for semantic search
  • Metadata: Path, language, line numbers, version reference
  • Extra metadata: For OpenAPI chunks, includes method, path, operation ID
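A chunk and a very simple chunker might look like the following. The field names are assumptions. Splitting into fixed-size windows of lines is a stand-in strategy chosen for illustration; this page does not describe how Ground actually splits content. The embedding is left empty because, in the job pipeline, it would be filled in later at the embed stage.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Chunk:
    """Hypothetical chunk shape; field names are illustrative."""

    content: str                                          # the actual text
    path: str                                             # file path or page URL
    embedding: List[float] = field(default_factory=list)  # filled at the embed stage
    language: Optional[str] = None
    start_line: Optional[int] = None                      # 1-based, inclusive
    end_line: Optional[int] = None
    version_ref: Optional[str] = None                     # commit SHA or doc version
    extra: Dict[str, Any] = field(default_factory=dict)   # e.g. OpenAPI method, path, operation ID


def chunk_by_lines(text: str, path: str, max_lines: int = 40,
                   language: Optional[str] = None,
                   version_ref: Optional[str] = None) -> List[Chunk]:
    """Split text into consecutive windows of at most `max_lines` lines."""
    lines = text.splitlines()
    chunks: List[Chunk] = []
    for start in range(0, len(lines), max_lines):
        window = lines[start:start + max_lines]
        chunks.append(Chunk(
            content="\n".join(window),
            path=path,
            language=language,
            start_line=start + 1,
            end_line=start + len(window),
            version_ref=version_ref,
        ))
    return chunks
```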

Search

Ground uses hybrid search, combining:
  1. Vector similarity: Finds semantically similar content
  2. Full-text search: Matches keywords and phrases
Each result is scored with a weighted combination of the two signals (by default, 70% vector similarity and 30% full-text match).
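A minimal sketch of the weighted combination follows. It assumes that both searches return per-chunk scores already normalized to [0, 1], and that a chunk missing from one result list scores 0 for that signal. The function name `hybrid_rank` and the chunk IDs are hypothetical.

```python
from typing import Dict, List, Tuple

VECTOR_WEIGHT = 0.7  # default weights stated above
TEXT_WEIGHT = 0.3


def hybrid_rank(vector_scores: Dict[str, float],
                text_scores: Dict[str, float],
                vector_weight: float = VECTOR_WEIGHT,
                text_weight: float = TEXT_WEIGHT) -> List[Tuple[str, float]]:
    """Merge per-chunk scores from both searches and rank by weighted sum.

    Assumes both score sets are normalized to [0, 1]; a chunk found by only
    one search contributes 0 for the other component.
    """
    ids = set(vector_scores) | set(text_scores)
    combined = {
        cid: vector_weight * vector_scores.get(cid, 0.0)
        + text_weight * text_scores.get(cid, 0.0)
        for cid in ids
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

With the default weights, a chunk with vector score 0.9 and no keyword match (0.63) outranks one with vector score 0.4 and a perfect keyword match (0.28 + 0.3 = 0.58). Semantic similarity dominates, but exact keyword hits can still lift a result.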

Citations

Every search result includes a citation with:
  • Source name and ID
  • File path or URL
  • Symbol (function/class name or section heading)
  • Line numbers (for code)
  • Version reference (commit SHA or doc version)
  • Language/chunk type
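The sketch below shows one way a client might render these citation fields as a compact, human-readable reference. The `Citation` class, the output format, and the sample values (such as `src/search.py` and `src_1`) are illustrative assumptions, not Ground's response schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """Hypothetical citation shape mirroring the fields listed above."""

    source_name: str
    source_id: str
    location: str                       # file path or URL
    symbol: Optional[str] = None        # function/class name or section heading
    start_line: Optional[int] = None    # code only
    end_line: Optional[int] = None
    version_ref: Optional[str] = None   # commit SHA or doc version
    chunk_type: Optional[str] = None    # language or chunk type


def format_citation(c: Citation) -> str:
    """Render a citation as 'source: location[:start-end] (symbol) @ version'."""
    head = f"{c.source_name}: {c.location}"
    if c.start_line is not None:
        end = c.end_line if c.end_line is not None else c.start_line
        head += f":{c.start_line}-{end}"
    parts = [head]
    if c.symbol:
        parts.append(f"({c.symbol})")
    if c.version_ref:
        parts.append(f"@ {c.version_ref}")
    return " ".join(parts)
```

For a code chunk at lines 10 to 24 of `src/search.py` in a source named `ground-repo`, this yields `ground-repo: src/search.py:10-24 (hybrid_rank) @ abc1234`.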

Freshness & Staleness

Ground tracks how recent each source’s content is:
  • Freshness: Days since last successful sync
  • Staleness: A source becomes stale once its freshness exceeds the configured staleness budget
  • Warnings: Stale results include warnings in the response
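Here is a sketch of the freshness check. It assumes that freshness is counted in whole days and that a source is stale only when it is strictly over budget. The function names and the warning text are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def freshness_days(last_synced_at: datetime, now: datetime) -> int:
    """Whole days elapsed since the last successful sync."""
    return (now - last_synced_at).days


def staleness_warning(source_name: str, last_synced_at: datetime,
                      budget_days: int,
                      now: Optional[datetime] = None) -> Optional[str]:
    """Return a warning if the source is over its staleness budget, else None."""
    if now is None:
        now = datetime.now(timezone.utc)
    age = freshness_days(last_synced_at, now)
    if age > budget_days:
        return (f"{source_name} was last synced {age} days ago, "
                f"exceeding the {budget_days}-day staleness budget")
    return None
```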

Trust Policy

The trust policy controls search behavior:
  • Staleness budget: How many days before content is considered stale
  • Source priorities: Weights for different source types (e.g., OpenAPI higher for API questions)
  • Refusal thresholds: Minimum evidence count and score required to answer; below them, Ground refuses rather than answering from weak evidence
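The following sketch shows how the policy settings could interact. It illustrates one set of assumptions and is not Ground's implementation:

- Source priorities are applied as multiplicative weights on each result's score.
- Only results at or above `min_score` count as evidence.
- Returning `None` stands in for a refusal.

The staleness budget is carried on the policy but not used by this function. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TrustPolicy:
    """Hypothetical policy object; field names are illustrative."""

    staleness_budget_days: int  # not used by apply_policy below
    min_evidence: int           # refusal threshold: minimum supporting results
    min_score: float            # refusal threshold: minimum score to count as evidence
    source_weights: Dict[str, float] = field(default_factory=dict)  # e.g. {"openapi": 1.5}


@dataclass
class Result:
    chunk_id: str
    source_kind: str  # e.g. "repo", "html", "openapi"
    score: float


def apply_policy(results: List[Result], policy: TrustPolicy) -> Optional[List[Result]]:
    """Reweight by source priority, then refuse (None) if evidence is too thin."""
    weighted = [
        Result(r.chunk_id, r.source_kind,
               r.score * policy.source_weights.get(r.source_kind, 1.0))
        for r in results
    ]
    evidence = [r for r in weighted if r.score >= policy.min_score]
    if len(evidence) < policy.min_evidence:
        return None  # refuse rather than answer from weak evidence
    return sorted(evidence, key=lambda r: r.score, reverse=True)
```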

Conflicts

When multiple sources define the same thing differently (e.g., same API endpoint with different schemas), Ground detects and surfaces the conflict.
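For the API endpoint case, conflict detection could be sketched as follows: group OpenAPI definitions by method and path, then flag any endpoint whose schemas differ across sources. The input shape and the function name are hypothetical. Comparing schemas by exact equality of their canonical JSON is a simplifying assumption; this page does not describe Ground's actual detection rules.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Endpoint = Tuple[str, str]  # (HTTP method, path)


def find_endpoint_conflicts(
    definitions: List[Dict[str, Any]],
) -> Dict[Endpoint, Dict[str, List[str]]]:
    """Return endpoints defined with more than one distinct schema.

    Each definition is a dict with hypothetical keys "source", "method",
    "path", and "schema". For every conflicting endpoint, the result maps
    each canonical schema (JSON with sorted keys) to the sources using it.
    """
    variants: Dict[Endpoint, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for d in definitions:
        key = (d["method"].upper(), d["path"])
        canonical = json.dumps(d["schema"], sort_keys=True)  # key order doesn't matter
        variants[key][canonical].append(d["source"])
    return {
        key: dict(by_schema)
        for key, by_schema in variants.items()
        if len(by_schema) > 1
    }
```

Canonicalizing with sorted keys means two sources that list the same fields in a different order are not reported as conflicting. Only a real difference, such as `id` typed as a string in one source and an integer in another, is flagged.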