drag: AI-Powered Documentation RAG Assistant

January 30, 2025

What Is This?

drag is a full-stack AI-powered documentation assistant. You point it at any documentation URL, it crawls the site, stores the content in a vector database, and then you can ask natural-language questions about that documentation — powered by Retrieval-Augmented Generation (RAG).

Tech Stack

Frontend: Next.js 14, TypeScript, Tailwind CSS, Shadcn UI

Backend: FastAPI, ChromaDB, OpenAI Embeddings (text-embedding-ada-002), Google Gemini 1.5 Flash, SQLite, crawl4ai, BeautifulSoup4

How It Works

1. Crawl & Index

When you submit a documentation URL, the system runs a two-layer crawling strategy:

  • BeautifulSoup performs a multiprocess breadth-first crawl to discover all internal links within the domain
  • crawl4ai then asynchronously crawls each discovered page and extracts clean Markdown content
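The discovery layer boils down to "extract every same-domain link from a page, then repeat breadth-first." A minimal sketch of that extraction step, using only the standard library's html.parser as a stand-in for BeautifulSoup (the function and class names are illustrative, not the project's actual code):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (stands in for BeautifulSoup here)."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def discover_internal_links(html, base_url):
    """Return absolute http(s) URLs on the same domain as base_url, deduplicated."""
    parser = LinkExtractor()
    parser.feed(html)
    domain = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.netloc == domain and parsed.scheme in ("http", "https"):
            clean = absolute.split("#")[0]  # drop fragments so /page#a == /page#b
            if clean not in links:
                links.append(clean)
    return links
```

The BFS loop then pops a URL from a queue, fetches it, runs this extraction, and enqueues any link it has not yet seen.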

The extracted content is chunked using tiktoken (matching the exact tokenizer used by text-embedding-ada-002), embedded via OpenAI, and stored in a persistent ChromaDB vector store.
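Token-aware chunking can be sketched as a sliding window over the token ids that tiktoken's cl100k_base encoder produces. The window size and overlap below are illustrative defaults, not the project's actual settings:

```python
def chunk_tokens(token_ids, max_tokens=512, overlap=64):
    """Split a token-id sequence into windows of at most max_tokens,
    with `overlap` tokens shared between consecutive chunks so context
    is not cut off at chunk boundaries."""
    if max_tokens <= overlap:
        raise ValueError("max_tokens must exceed overlap")
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + max_tokens])
        if start + max_tokens >= len(token_ids):
            break
        start += max_tokens - overlap
    return chunks

# In the real pipeline the ids would come from tiktoken, e.g.:
#   enc = tiktoken.get_encoding("cl100k_base")
#   chunks = [enc.decode(c) for c in chunk_tokens(enc.encode(markdown))]
```

Counting in the embedding model's own tokens (rather than characters or words) is what guarantees each chunk fits within ada-002's input limit.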

2. Chat & Retrieve

When you ask a question, the system queries ChromaDB for the top 5 semantically similar chunks, feeds them as context to Google Gemini 1.5 Flash, and returns a grounded answer based on the actual documentation.
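The retrieve-then-generate step is a nearest-neighbour search over stored vectors followed by a grounded prompt. ChromaDB performs the search in the real system; the logic can be sketched with plain cosine similarity (function names and the prompt wording here are assumptions for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec, store, k=5):
    """store: list of (chunk_text, embedding) pairs.
    Returns the k chunks most similar to the query embedding."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Assemble a grounded prompt from retrieved documentation excerpts."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the documentation excerpts below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The assembled prompt is what gets handed to Gemini, which keeps the answer anchored to the retrieved documentation rather than the model's general knowledge.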

3. Deduplication

Two layers of deduplication prevent redundant work:

  • SQLite tracks visited URLs at the network level, so pages are never re-crawled
  • ChromaDB checks existing URLs at the storage level, so content is never re-embedded
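The URL-level layer is essentially a visited set backed by SQLite. A minimal sketch, where the table and class names are assumptions rather than the project's actual schema:

```python
import sqlite3

class CrawlTracker:
    """Remembers which URLs have been crawled so reruns skip them."""
    def __init__(self, path=":memory:"):
        # A file path makes the visited set survive restarts;
        # ":memory:" is used here for illustration.
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS visited (url TEXT PRIMARY KEY)")

    def should_crawl(self, url):
        """True only the first time a URL is seen; records it as visited."""
        try:
            self.db.execute("INSERT INTO visited (url) VALUES (?)", (url,))
            self.db.commit()
            return True
        except sqlite3.IntegrityError:
            return False  # PRIMARY KEY violation: already visited
```

Letting the PRIMARY KEY constraint do the duplicate check keeps the insert-and-test atomic, which matters when multiple crawler processes share the database.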

Architecture

The frontend and backend run concurrently from a single pnpm dev command. Next.js proxies /api/py/* requests to the FastAPI backend, so the frontend can call the Python API as if it were same-origin.

User → Next.js (Chat UI) → FastAPI
                              ├── DocsCrawler (BFS link discovery)
                              ├── crawl4ai (async content extraction)
                              ├── ChromaDB (vector storage & search)
                              ├── OpenAI (embeddings)
                              ├── Gemini (generation)
                              └── SQLite (crawl tracking)

Key Engineering Decisions

  • Dual crawling: BeautifulSoup for broad link discovery, crawl4ai for high-quality Markdown extraction — each tool plays to its strength
  • Token-aware chunking: Using the exact cl100k_base tokenizer that ada-002 uses ensures chunks stay within embedding model limits
  • Dual AI provider split: OpenAI handles embeddings (strong retrieval quality), Gemini handles generation (cost-effective chat)
  • Persistent storage: ChromaDB on disk means the vector store survives server restarts — no need to re-crawl

What I Learned

Building this project deepened my understanding of the RAG pipeline end to end — from web crawling and text chunking to vector embeddings and LLM-grounded generation. It also gave me hands-on experience wiring a Python AI backend with a modern TypeScript frontend in a monorepo setup.

View the source code →