RAG
(26/02/18) Although I still primarily use the Lunr XML search feature, I occasionally use the AI helper and I want to get rid of the remote API calls. I'm adding this page for documenting the process.
Overview
Build a fully local RAG system that ingests ~313 markdown files from the Docusaurus site, stores them in ChromaDB with Ollama embeddings, and provides a web UI for semantic search and Q&A.
Stack
- Python 3 with FastAPI for the web server
- Ollama with nomic-embed-text for embeddings
- ChromaDB for vector storage (persisted to disk)
- Simple HTML/JS frontend served by FastAPI
Directory Structure
daw_til/
└── rag/
├── ingest.py # Markdown parser + ChromaDB ingestion
├── server.py # FastAPI web server + query endpoint
├── requirements.txt # Python dependencies
├── templates/
│ └── index.html # Web UI (search box + results)
└── chroma_db/ # ChromaDB persistence (gitignored)
Implementation Steps
- Create rag/requirements.txt
chromadb
fastapi
uvicorn[standard]
ollama
python-frontmatter
- Create
rag/ingest.py— Markdown Ingestion Pipeline
- Walk docs/, notes/, lists/, posts/ directories
- Parse each .md file:
- Extract YAML frontmatter (posts have title, slug, description, tags)
- For docs/notes/lists: derive title from first # heading
- Extract date from post filenames (YYYY-MM-DD pattern)
- Categorize by directory: docs, notes, lists, posts
- Chunking strategy:
- Split on markdown headers (##, ###) to keep semantic sections intact
- For sections exceeding ~1000 chars, further split on paragraphs
- Each chunk gets metadata: source (file path), category, section (header path), title, tags (if post)
- Generate embeddings via Ollama (nomic-embed-text)
- Upsert into ChromaDB collection with metadata
- Print progress (file count, chunk count, timing)
- Create
rag/server.py— FastAPI Query Server
GET /— serve the HTML UIPOST /query— accept a search query, embed it with Ollama, query ChromaDB for top-k results (default 8), return JSON with chunks + metadata + distancesGET /stats— return collection stats (doc count, chunk count)- CORS enabled for local dev
- Create
rag/templates/index.html— Web UI
- Clean, minimal search interface
- Search box with submit button
- Results displayed as cards showing:
- Matched text snippet
- Source file path (clickable link to Docusaurus page)
- Category badge (docs/notes/lists/posts)
- Relevance score
- Loading spinner during search
- Add
rag/to.gitignoreAddrag/chroma_db/to gitignore to keep vector data out of git
Metadata Schema (ChromaDB)
Each chunk stored with:
{
"source": "docs/lang/JavaScript.md", # relative file path
"category": "docs", # docs|notes|lists|posts
"title": "JavaScript", # doc/post title
"section": "Arrays > Map and Filter", # header breadcrumb
"tags": "tech", # comma-separated (posts only)
"date": "2025-03-03", # posts only
"url": "/til/docs/lang/JavaScript" # reconstructed Docusaurus URL
}
Prerequisites
- Ollama installed and running with nomic-embed-text model pulled
- Python 3.10+
Usage
cd rag
pip install -r requirements.txt
ollama pull nomic-embed-text
# Ingest all markdown files (run once, re-run to rebuild)
python ingest.py
# Start the web UI
python server.py
# Open http://localhost:8808
Verification
- Run
python ingest.py— should report ~300+ files processed and ~1000+ chunks created - Run
python server.py— should start on port 8808 - Open browser to
http://localhost:8808 - Search for "PostgreSQL" — should return relevant chunks from
docs/db/PostgreSQL.md - Search for "Docker" — should return chunks from docs/server/Docker.md
- Search for a conceptual query like "how to set up SSL certificates" — should return relevant Let's Encrypt / Apache / Nginx docs