What Is This?
drag is a full-stack AI-powered documentation assistant. You point it at any documentation URL, it crawls the site, stores the content in a vector database, and then you can ask natural-language questions about that documentation — powered by Retrieval-Augmented Generation (RAG).
Tech Stack
Frontend: Next.js 14, TypeScript, Tailwind CSS, Shadcn UI
Backend: FastAPI, ChromaDB, OpenAI Embeddings (text-embedding-ada-002), Google Gemini 1.5 Flash, SQLite, crawl4ai, BeautifulSoup4
How It Works
1. Crawl & Index
When you submit a documentation URL, the system runs a two-layer crawling strategy:
- BeautifulSoup performs a multiprocess breadth-first crawl to discover all internal links within the domain (sketched after this list)
- crawl4ai then asynchronously crawls each discovered page and extracts clean Markdown content
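A minimal sketch of the link-discovery step, using a single-process BFS for brevity (the real crawler parallelises this with multiprocessing); the function name, page limit, and timeout are illustrative:

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def discover_links(start_url: str, max_pages: int = 200) -> set[str]:
    """Breadth-first crawl that collects internal links under the start URL's domain."""
    domain = urlparse(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip unreachable pages
        for anchor in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, anchor["href"]).split("#")[0]  # resolve relative links, drop fragments
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```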
The extracted content is chunked using tiktoken (matching the exact tokenizer used by text-embedding-ada-002), embedded via OpenAI, and stored in a persistent ChromaDB vector store.
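In outline, the chunk-and-store step could look like this (a sketch, not the project's exact code; the chunk size, collection name, and storage path are assumptions):

```python
import chromadb
import tiktoken
from openai import OpenAI

enc = tiktoken.get_encoding("cl100k_base")        # same tokenizer text-embedding-ada-002 uses
openai_client = OpenAI()                          # reads OPENAI_API_KEY from the environment
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("docs")


def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into chunks that stay within the embedding model's token limit."""
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


def index_page(url: str, markdown: str) -> None:
    """Embed a page's chunks with OpenAI and store them in the persistent ChromaDB collection."""
    chunks = chunk_text(markdown)
    response = openai_client.embeddings.create(model="text-embedding-ada-002", input=chunks)
    collection.add(
        ids=[f"{url}#{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=[item.embedding for item in response.data],
        metadatas=[{"url": url}] * len(chunks),
    )
```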
2. Chat & Retrieve
When you ask a question, the system queries ChromaDB for the top 5 semantically similar chunks, feeds them as context to Google Gemini 1.5 Flash, and returns a grounded answer based on the actual documentation.
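Roughly, that retrieval-and-generation step might look like the following sketch (prompt wording, key handling, and names are illustrative, not the project's actual code):

```python
import chromadb
import google.generativeai as genai
from openai import OpenAI

openai_client = OpenAI()
genai.configure(api_key="YOUR_GEMINI_API_KEY")    # illustrative; use an env var in practice
gemini = genai.GenerativeModel("gemini-1.5-flash")
collection = chromadb.PersistentClient(path="./chroma_db").get_or_create_collection("docs")


def answer(question: str) -> str:
    """Retrieve the top 5 chunks for the question and ask Gemini for a grounded answer."""
    # Embed the question with the same model used to embed the documentation.
    query_embedding = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=[question]
    ).data[0].embedding

    # Fetch the 5 most semantically similar chunks from ChromaDB.
    results = collection.query(query_embeddings=[query_embedding], n_results=5)
    context = "\n\n".join(results["documents"][0])

    # Ask Gemini to answer using only the retrieved documentation.
    prompt = (
        "Answer the question using only the documentation excerpts below.\n\n"
        f"Documentation:\n{context}\n\nQuestion: {question}"
    )
    return gemini.generate_content(prompt).text
```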
3. Deduplication
Two layers of deduplication prevent redundant work:
- SQLite tracks visited URLs at the network level, so pages are never re-crawled
- ChromaDB checks for existing URLs at the storage level, so content is never re-embedded (see the sketch below)
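A sketch of both checks, assuming a simple visited-URLs table and a url metadata field on each stored chunk (table, file, and field names are illustrative):

```python
import sqlite3

import chromadb

db = sqlite3.connect("crawl_state.db")            # illustrative file name
db.execute("CREATE TABLE IF NOT EXISTS visited (url TEXT PRIMARY KEY)")
collection = chromadb.PersistentClient(path="./chroma_db").get_or_create_collection("docs")


def already_crawled(url: str) -> bool:
    """Network-level check: skip URLs that have already been fetched."""
    return db.execute("SELECT 1 FROM visited WHERE url = ?", (url,)).fetchone() is not None


def mark_crawled(url: str) -> None:
    db.execute("INSERT OR IGNORE INTO visited (url) VALUES (?)", (url,))
    db.commit()


def already_embedded(url: str) -> bool:
    """Storage-level check: skip URLs whose chunks are already in ChromaDB."""
    return len(collection.get(where={"url": url}, limit=1)["ids"]) > 0
```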
Architecture
The frontend and backend run concurrently from a single pnpm dev command. Next.js proxies /api/py/* requests to the FastAPI backend, so the browser talks to a single origin and the two services work together as one app.
User → Next.js (Chat UI) → FastAPI
├── DocsCrawler (BFS link discovery)
├── crawl4ai (async content extraction)
├── ChromaDB (vector storage & search)
├── OpenAI (embeddings)
├── Gemini (generation)
└── SQLite (crawl tracking)
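On the FastAPI side of this diagram, the endpoints behind the /api/py proxy might look roughly like the sketch below (route paths and request models are assumptions, and the handler bodies are placeholders for the pipeline described above):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class CrawlRequest(BaseModel):
    url: str


class ChatRequest(BaseModel):
    question: str


@app.post("/api/py/crawl")
async def crawl(req: CrawlRequest):
    # Run link discovery, crawl4ai extraction, chunking, and embedding for the given docs URL.
    return {"status": "indexed", "url": req.url}


@app.post("/api/py/chat")
async def chat(req: ChatRequest):
    # Retrieve the top matching chunks from ChromaDB and generate a grounded answer with Gemini.
    return {"answer": "..."}
```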
Key Engineering Decisions
- Dual crawling: BeautifulSoup for broad link discovery, crawl4ai for high-quality Markdown extraction — each tool plays to its strength
- Token-aware chunking: Using the exact cl100k_base tokenizer that ada-002 uses ensures chunks stay within the embedding model's token limits
- Dual AI provider split: OpenAI handles embeddings (strong retrieval quality), Gemini handles generation (cost-effective chat)
- Persistent storage: ChromaDB on disk means the vector store survives server restarts — no need to re-crawl
What I Learned
Building this project deepened my understanding of the RAG pipeline end to end — from web crawling and text chunking to vector embeddings and LLM-grounded generation. It also gave me hands-on experience wiring a Python AI backend with a modern TypeScript frontend in a monorepo setup.