Core Concepts

Ground is built around a few key concepts that work together to provide grounded retrieval.

Sources

A source is a repository, documentation site, or PDF that Ground indexes. Each source has:
  • Type: repo (Git), docs (documentation), or pdf (uploaded PDF)
  • Format: For docs, either html (web pages) or openapi (API specs)
  • URL: The location to fetch content from
  • Status: pending, syncing, synced, or error
Sources are indexed asynchronously via sync jobs. After creating a source, you must trigger a sync to index its content.
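As a rough illustration, the source attributes above could be modeled as follows. This is a minimal sketch, not Ground's actual schema: the class and field names (`Source`, `source_type`, `doc_format`) are hypothetical, and the assumption that a newly created source starts in the pending state is inferred from the text rather than confirmed by it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SourceType(str, Enum):
    REPO = "repo"   # Git repository
    DOCS = "docs"   # documentation site
    PDF = "pdf"     # uploaded PDF


class SourceStatus(str, Enum):
    PENDING = "pending"
    SYNCING = "syncing"
    SYNCED = "synced"
    ERROR = "error"


@dataclass
class Source:
    """Hypothetical model of a source; field names are illustrative only."""
    source_type: SourceType
    url: str
    doc_format: Optional[str] = None  # "html" or "openapi", docs sources only
    # Assumption: a new source stays pending until a sync is triggered.
    status: SourceStatus = SourceStatus.PENDING

    def __post_init__(self) -> None:
        if self.source_type is SourceType.DOCS:
            if self.doc_format not in ("html", "openapi"):
                raise ValueError("docs sources need format 'html' or 'openapi'")
        elif self.doc_format is not None:
            raise ValueError("format applies only to docs sources")


repo = Source(SourceType.REPO, "https://example.com/org/repo.git")
spec = Source(SourceType.DOCS, "https://example.com/openapi.json", doc_format="openapi")
```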

Jobs

A job represents a sync operation that fetches, parses, chunks, embeds, and indexes content from a source. Jobs progress through stages:
  1. queued → Waiting to be processed
  2. fetch → Downloading content from the source
  3. parse → Extracting text from files/pages
  4. chunk → Splitting content into searchable chunks
  5. embed → Generating vector embeddings
  6. index → Storing chunks in the database
  7. finalize → Updating source metadata
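The stage sequence above can be sketched as an ordered enum with a helper that returns the next stage. The names `JobStage` and `next_stage` are hypothetical, and the sketch assumes a strictly linear progression with no retries or failure states, which the text does not describe.

```python
from enum import Enum
from typing import Optional


class JobStage(str, Enum):
    # Declared in the order a job is expected to pass through them.
    QUEUED = "queued"
    FETCH = "fetch"
    PARSE = "parse"
    CHUNK = "chunk"
    EMBED = "embed"
    INDEX = "index"
    FINALIZE = "finalize"


# Enum iteration preserves declaration order.
STAGE_ORDER = list(JobStage)


def next_stage(stage: JobStage) -> Optional[JobStage]:
    """Return the stage that follows `stage`, or None after finalize."""
    position = STAGE_ORDER.index(stage)
    if position + 1 < len(STAGE_ORDER):
        return STAGE_ORDER[position + 1]
    return None


def progress(stage: JobStage) -> float:
    """Fraction of stages already completed when the job is in `stage`."""
    return STAGE_ORDER.index(stage) / len(STAGE_ORDER)
```

A client polling a job could use a helper like `progress` to render a simple progress indicator from the reported stage name.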

Chunks

A chunk is a piece of indexed content with:
  • Content: The actual text
  • Embedding: Vector representation for semantic search
  • Metadata: Path, language, line numbers, version reference
  • Extra metadata: For OpenAPI chunks, includes method, path, operation ID

Search

Ground uses hybrid search combining:
  1. Vector similarity: Finds semantically similar content
  2. Full-text search: Matches keywords and phrases
Results are scored using a weighted combination (70% vector, 30% text by default).
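The weighted combination can be sketched as a simple linear blend. This is an illustration only: the function name `hybrid_score` is hypothetical, and it assumes both input scores are already normalized to the range 0 to 1, which the text does not state.

```python
def hybrid_score(
    vector_score: float,
    text_score: float,
    vector_weight: float = 0.7,
    text_weight: float = 0.3,
) -> float:
    """Blend a vector-similarity score with a full-text score.

    Assumption: both scores are normalized to [0, 1] before blending.
    """
    return vector_weight * vector_score + text_weight * text_score


# Two hypothetical results: (vector_score, text_score).
candidates = {
    "semantic_match": (0.9, 0.2),  # close in meaning, few shared keywords
    "keyword_match": (0.6, 1.0),   # weaker semantically, exact keyword hit
}

ranked = sorted(
    candidates,
    key=lambda name: hybrid_score(*candidates[name]),
    reverse=True,
)
```

With the default weights, `semantic_match` scores about 0.69 and `keyword_match` about 0.72, so an exact keyword hit can outrank a result that is closer semantically.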

Citations

Every search result includes a citation with:
  • Source name and ID
  • File path or URL
  • Symbol (function/class name or section heading)
  • Line numbers (for code)
  • Version reference (commit SHA or doc version)
  • Language/chunk type
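To make the citation fields concrete, here is a minimal sketch of a citation record and a one-line formatter. The `Citation` class, its field names, and the output format are all hypothetical; Ground's actual response shape may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """Hypothetical citation record mirroring the fields listed above."""
    source_id: str
    source_name: str
    location: str                      # file path or URL
    symbol: Optional[str] = None       # function/class name or section heading
    start_line: Optional[int] = None   # code only
    end_line: Optional[int] = None     # code only
    version_ref: Optional[str] = None  # commit SHA or doc version
    language: Optional[str] = None     # language or chunk type


def format_citation(c: Citation) -> str:
    """Render a citation as a short human-readable string."""
    text = f"{c.source_name}: {c.location}"
    if c.start_line is not None:
        end = c.end_line if c.end_line is not None else c.start_line
        text += f":{c.start_line}-{end}"
    if c.symbol:
        text += f" ({c.symbol})"
    if c.version_ref:
        # Shorten long commit SHAs; short doc versions pass through unchanged.
        text += f" @ {c.version_ref[:7]}"
    return text


code_citation = Citation(
    source_id="src_1",
    source_name="api-server",
    location="src/auth.py",
    symbol="login",
    start_line=10,
    end_line=42,
    version_ref="0123456789abcdef",
    language="python",
)
```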

Freshness & Staleness

Ground tracks how recent each source’s content is:
  • Freshness: Days since last successful sync
  • Staleness: A source is stale when its freshness exceeds the configured staleness budget
  • Warnings: Stale results include warnings in the response
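The freshness and staleness definitions above can be sketched as follows. The function names and the warning text are hypothetical, and the sketch assumes that "exceeds" means strictly greater than the budget.

```python
from datetime import datetime, timezone
from typing import Optional


def freshness_days(last_synced_at: datetime, now: datetime) -> int:
    """Whole days elapsed since the last successful sync."""
    return (now - last_synced_at).days


def is_stale(last_synced_at: datetime, now: datetime, budget_days: int) -> bool:
    """Assumption: a source is stale only once freshness exceeds the budget."""
    return freshness_days(last_synced_at, now) > budget_days


def staleness_warning(
    source_name: str, last_synced_at: datetime, now: datetime, budget_days: int
) -> Optional[str]:
    """Return a warning string for stale sources, otherwise None."""
    if not is_stale(last_synced_at, now, budget_days):
        return None
    age = freshness_days(last_synced_at, now)
    return f"{source_name} was last synced {age} days ago (budget: {budget_days})"


last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)
today = datetime(2024, 1, 11, tzinfo=timezone.utc)
warning = staleness_warning("api-docs", last_sync, today, budget_days=7)
```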

Trust Policy

The trust policy controls search behavior:
  • Staleness budget: How many days before content is considered stale
  • Source priorities: Weights for different source types (e.g., OpenAPI higher for API questions)
  • Refusal thresholds: Minimum evidence count/score to answer
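One way to picture source priorities and refusal thresholds is the sketch below. Everything here is an assumption made for illustration: the `TrustPolicy` field names, the default values, the multiplicative use of priorities, and the reading of the refusal rule as "refuse unless enough results meet the minimum score".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrustPolicy:
    """Hypothetical policy object; all defaults are placeholders."""
    staleness_budget_days: int = 30
    source_priorities: Dict[str, float] = field(
        default_factory=lambda: {"openapi": 1.2, "repo": 1.0, "pdf": 0.9}
    )
    min_evidence_count: int = 2
    min_score: float = 0.5


def apply_priority(score: float, source_kind: str, policy: TrustPolicy) -> float:
    """Assumption: priorities act as multipliers; unknown kinds get 1.0."""
    return score * policy.source_priorities.get(source_kind, 1.0)


def should_refuse(scores: List[float], policy: TrustPolicy) -> bool:
    """Refuse when too few results reach the minimum score."""
    strong = [s for s in scores if s >= policy.min_score]
    return len(strong) < policy.min_evidence_count


policy = TrustPolicy()
enough_evidence = not should_refuse([0.9, 0.8, 0.2], policy)
too_little_evidence = should_refuse([0.9, 0.2], policy)
```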

Conflicts

When multiple sources define the same thing differently (e.g., same API endpoint with different schemas), Ground detects and surfaces the conflict.
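A simple form of conflict detection can be sketched by grouping definitions by a shared key, such as an HTTP method plus path, and flagging keys whose schemas differ across sources. This is not Ground's actual algorithm; the function `find_conflicts` and the tuple layout of its input are hypothetical.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Each definition is (source_name, key, schema), e.g. key = "GET /users".
Definition = Tuple[str, str, Dict[str, Any]]


def find_conflicts(definitions: List[Definition]) -> Dict[str, Dict[str, Any]]:
    """Return keys that sources define differently, mapped to each source's schema."""
    by_key: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for source_name, key, schema in definitions:
        by_key[key][source_name] = schema

    conflicts: Dict[str, Dict[str, Any]] = {}
    for key, per_source in by_key.items():
        # Serialize with sorted keys so dict ordering does not cause false positives.
        distinct = {json.dumps(s, sort_keys=True) for s in per_source.values()}
        if len(distinct) > 1:
            conflicts[key] = per_source
    return conflicts


definitions = [
    ("spec-v1", "GET /users", {"type": "array"}),
    ("spec-v2", "GET /users", {"type": "object"}),
    ("spec-v1", "GET /health", {"type": "string"}),
    ("spec-v2", "GET /health", {"type": "string"}),
]
conflicts = find_conflicts(definitions)
```

In this example only `GET /users` is reported, because both sources agree on `GET /health`.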
