Core Concepts

Ground is built around a few key concepts that work together to provide grounded retrieval.

Sources

A source is a repository, documentation site, or PDF that Ground indexes. Each source has:

Type: repo (Git), docs (documentation), or pdf (uploaded PDF)
Format: For docs, either html (web pages) or openapi (API specs)
URL: The location to fetch content from
Status: pending, syncing, synced, or error

Sources are indexed asynchronously via sync jobs. After creating a source, you must trigger a sync to index its content.
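The source fields above can be pictured as a small record. This is only an illustrative sketch, not Ground's actual schema: the `Source` class, its field names, the rule that `format` is required for docs sources, and the assumption that a new source starts as `pending` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

SOURCE_TYPES = {"repo", "docs", "pdf"}
DOC_FORMATS = {"html", "openapi"}
STATUSES = {"pending", "syncing", "synced", "error"}


@dataclass
class Source:
    """Hypothetical model of a source; not Ground's real schema."""
    type: str
    url: str
    format: Optional[str] = None  # only meaningful for docs sources
    status: str = "pending"       # assumption: new sources start unsynced

    def __post_init__(self) -> None:
        if self.type not in SOURCE_TYPES:
            raise ValueError(f"unknown source type: {self.type}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        # Assumption: docs sources must declare a format, others must not.
        if self.type == "docs" and self.format not in DOC_FORMATS:
            raise ValueError("docs sources need format 'html' or 'openapi'")
        if self.type != "docs" and self.format is not None:
            raise ValueError("format only applies to docs sources")


api_docs = Source(type="docs", url="https://example.com/openapi.json", format="openapi")
```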

Jobs

A job represents a sync operation that fetches, parses, chunks, embeds, and indexes content from a source. Jobs progress through stages:

queued → Waiting to be processed
fetch → Downloading content from the source
parse → Extracting text from files/pages
chunk → Splitting content into searchable chunks
embed → Generating vector embeddings
index → Storing chunks in the database
finalize → Updating source metadata
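As a rough sketch of the stage sequence above, the stages can be modeled as an ordered list. The helper names `next_stage` and `progress` are hypothetical and only illustrate how a client might reason about a job's position in the pipeline.

```python
from typing import Optional

# Stage names taken from the list above, in processing order.
STAGES = ["queued", "fetch", "parse", "chunk", "embed", "index", "finalize"]


def next_stage(current: str) -> Optional[str]:
    """Return the stage that follows `current`, or None after the last one."""
    i = STAGES.index(current)  # raises ValueError for an unknown stage
    return STAGES[i + 1] if i + 1 < len(STAGES) else None


def progress(current: str) -> float:
    """Fraction of the pipeline reached, from 0.0 (queued) to 1.0 (finalize)."""
    return STAGES.index(current) / (len(STAGES) - 1)
```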

Chunks

A chunk is a piece of indexed content with:

Content: The actual text
Embedding: Vector representation for semantic search
Metadata: Path, language, line numbers, version reference
Extra metadata: For OpenAPI chunks, includes method, path, operation ID
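To make the chunk fields concrete, here is a minimal sketch of what a chunk record might look like. The `Chunk` class and all field names are assumptions for illustration, and the embedding is shortened to three dimensions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Chunk:
    """Hypothetical chunk record; field names are illustrative only."""
    content: str
    embedding: List[float]
    path: str
    language: str
    version_ref: str
    start_line: Optional[int] = None
    end_line: Optional[int] = None
    extra: Dict[str, Any] = field(default_factory=dict)


# An OpenAPI chunk carries the operation details in its extra metadata.
openapi_chunk = Chunk(
    content="List all users.",
    embedding=[0.12, -0.40, 0.88],  # real embeddings have many more dimensions
    path="openapi.json",
    language="openapi",
    version_ref="v1",
    extra={"method": "GET", "path": "/users", "operation_id": "listUsers"},
)
```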

Search

Ground uses hybrid search combining:

Vector similarity: Finds semantically similar content
Full-text search: Matches keywords and phrases

Results are scored using a weighted combination (70% vector, 30% text by default).
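The default weighting can be sketched as a simple linear combination. This assumes both scores are already normalized to the range 0 to 1, which the text above does not state; the function and parameter names are hypothetical.

```python
def hybrid_score(
    vector_score: float,
    text_score: float,
    vector_weight: float = 0.7,
    text_weight: float = 0.3,
) -> float:
    """Weighted combination of a vector score and a full-text score.

    Assumption: both inputs are normalized to [0, 1].
    """
    return vector_weight * vector_score + text_weight * text_score


candidates = [
    {"id": "a", "vector": 0.90, "text": 0.10},  # semantically close, few keyword hits
    {"id": "b", "vector": 0.60, "text": 0.95},  # weaker semantic match, strong keywords
]

# Sort best-first by the combined score.
ranked = sorted(
    candidates,
    key=lambda c: hybrid_score(c["vector"], c["text"]),
    reverse=True,
)
```

With these sample numbers, candidate "b" scores 0.705 and ranks above "a" at 0.66, showing how a strong keyword match can outweigh a somewhat better semantic match.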

Citations

Every search result includes a citation with:

Source name and ID
File path or URL
Symbol (function/class name or section heading)
Line numbers (for code)
Version reference (commit SHA or doc version)
Language/chunk type
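A citation with these fields could be rendered into a one-line reference as follows. The `Citation` class, its field names, and the output format are all hypothetical; Ground's actual response shape may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """Hypothetical citation record mirroring the fields listed above."""
    source_name: str
    source_id: str
    location: str       # file path or URL
    version_ref: str    # commit SHA or doc version
    kind: str           # language or chunk type
    symbol: Optional[str] = None
    start_line: Optional[int] = None
    end_line: Optional[int] = None

    def render(self) -> str:
        """Format the citation as a single human-readable line."""
        text = f"{self.source_name}: {self.location}"
        if self.symbol:
            text += f" ({self.symbol})"
        if self.start_line is not None and self.end_line is not None:
            text += f" L{self.start_line}-L{self.end_line}"
        return f"{text} @ {self.version_ref}"


code_citation = Citation(
    source_name="example-repo",
    source_id="src_1",
    location="src/auth.py",
    version_ref="abc1234",
    kind="python",
    symbol="login",
    start_line=10,
    end_line=24,
)
```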

Freshness & Staleness

Ground tracks how recent each source’s content is:

Freshness: Days since last successful sync
Staleness: A source is stale when its freshness exceeds the configured staleness budget
Warnings: Stale results include warnings in the response
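The freshness and staleness checks can be sketched with plain date arithmetic. The function names and the warning text are hypothetical, and "exceeds" is read here as strictly greater than the budget.

```python
from datetime import datetime, timezone
from typing import Optional


def freshness_days(last_synced_at: datetime, now: datetime) -> int:
    """Whole days elapsed since the last successful sync."""
    return (now - last_synced_at).days


def staleness_warning(
    last_synced_at: datetime, now: datetime, budget_days: int
) -> Optional[str]:
    """Return a warning string if the source is stale, otherwise None."""
    age = freshness_days(last_synced_at, now)
    if age > budget_days:  # assumption: stale only when strictly over budget
        return f"source last synced {age} days ago (budget: {budget_days})"
    return None


last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)
today = datetime(2024, 1, 11, tzinfo=timezone.utc)
```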

Trust Policy

The trust policy controls search behavior:

Staleness budget: How many days before content is considered stale
Source priorities: Weights for different source types (e.g., OpenAPI higher for API questions)
Refusal thresholds: Minimum evidence count/score to answer
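One way to picture a trust policy is as a small settings object plus a refusal check. Everything here is an assumption for illustration: the `TrustPolicy` class, its default values, the multiplicative use of priorities, and the rule that a result counts as evidence only when its score meets the minimum.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TrustPolicy:
    """Hypothetical policy object; defaults are placeholders, not Ground's."""
    staleness_budget_days: int = 30
    source_priorities: Dict[str, float] = field(
        default_factory=lambda: {"openapi": 1.2, "repo": 1.0, "pdf": 0.8}
    )
    min_evidence_count: int = 2
    min_score: float = 0.5


def apply_priority(score: float, source_kind: str, policy: TrustPolicy) -> float:
    """Scale a result score by its source's priority weight (default 1.0)."""
    return score * policy.source_priorities.get(source_kind, 1.0)


def should_refuse(scores: List[float], policy: TrustPolicy) -> bool:
    """Refuse when too few results clear the minimum score."""
    strong = [s for s in scores if s >= policy.min_score]
    return len(strong) < policy.min_evidence_count


policy = TrustPolicy()
```

Under this sketch, a query with scores of 0.9 and 0.2 would be refused because only one result clears the 0.5 threshold, while scores of 0.9 and 0.6 would be answered.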

Conflicts

When multiple sources define the same item differently (e.g., the same API endpoint with different schemas), Ground detects the conflict and surfaces it.
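For the API endpoint case, conflict detection can be sketched by grouping definitions by method and path and comparing their schemas. This is not Ground's actual algorithm; the `find_conflicts` function and the input shape are hypothetical.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def find_conflicts(
    definitions: List[Dict[str, Any]],
) -> Dict[Tuple[str, str], List[str]]:
    """Map each endpoint with differing schemas to the sources that define it.

    Each definition is a dict with 'source', 'method', 'path', and 'schema'.
    """
    by_endpoint: Dict[Tuple[str, str], Dict[str, List[str]]] = defaultdict(dict)
    for d in definitions:
        key = (d["method"].upper(), d["path"])
        # Serialize with sorted keys so key order does not cause false conflicts.
        fingerprint = json.dumps(d["schema"], sort_keys=True)
        by_endpoint[key].setdefault(fingerprint, []).append(d["source"])
    return {
        key: sorted(src for group in variants.values() for src in group)
        for key, variants in by_endpoint.items()
        if len(variants) > 1  # more than one distinct schema for this endpoint
    }


definitions = [
    {"source": "spec-v1", "method": "get", "path": "/users", "schema": {"id": "integer"}},
    {"source": "spec-v2", "method": "GET", "path": "/users", "schema": {"id": "string"}},
    {"source": "spec-v2", "method": "POST", "path": "/users", "schema": {"name": "string"}},
]
conflicts = find_conflicts(definitions)
```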

Sources

Deep dive into source types and configuration

Search

How hybrid search works

Trust Policy

Configure staleness and refusal

OpenAPI

Index API specifications
