What Is This?
drag is a full-stack AI-powered documentation assistant. You point it at any documentation URL, it crawls the site, stores the content in a vector database, and then you can ask natural-language questions about that documentation — powered by Retrieval-Augmented Generation (RAG).
Tech Stack
Frontend: Next.js 14, TypeScript, Tailwind CSS, Shadcn UI
Backend: FastAPI, ChromaDB, OpenAI Embeddings (text-embedding-ada-002), Google Gemini 1.5 Flash, SQLite, crawl4ai, BeautifulSoup4
How It Works
1. Crawl & Index
When you submit a documentation URL, the system runs a two-layer crawling strategy:
- BeautifulSoup performs a multiprocess breadth-first crawl to discover all internal links within the domain (sketched after this list)
- crawl4ai then asynchronously crawls each discovered page and extracts clean Markdown content
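A minimal sketch of the link-discovery step, using a single-process BFS for brevity (the real crawler parallelises this with multiprocessing); the function name, page limit, and timeout are illustrative:

```python
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def discover_links(start_url: str, max_pages: int = 200) -> set[str]:
    """Breadth-first crawl that collects internal links under the start URL's domain."""
    domain = urlparse(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip unreachable pages
        for anchor in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, anchor["href"]).split("#")[0]  # resolve relative links, drop fragments
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```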
The extracted content is chunked using tiktoken (matching the exact tokenizer used by text-embedding-ada-002), embedded via OpenAI, and stored in a persistent ChromaDB vector store.
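In outline, the chunk-and-store step could look like this (a sketch, not the project's exact code; the chunk size, collection name, and storage path are assumptions):

```python
import chromadb
import tiktoken
from openai import OpenAI

enc = tiktoken.get_encoding("cl100k_base")        # same tokenizer text-embedding-ada-002 uses
openai_client = OpenAI()                          # reads OPENAI_API_KEY from the environment
chroma = chromadb.PersistentClient(path="./chroma_db")
collection = chroma.get_or_create_collection("docs")


def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into chunks that stay within the embedding model's token limit."""
    tokens = enc.encode(text)
    return [enc.decode(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


def index_page(url: str, markdown: str) -> None:
    """Embed a page's chunks with OpenAI and store them in the persistent ChromaDB collection."""
    chunks = chunk_text(markdown)
    response = openai_client.embeddings.create(model="text-embedding-ada-002", input=chunks)
    collection.add(
        ids=[f"{url}#{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=[item.embedding for item in response.data],
        metadatas=[{"url": url}] * len(chunks),
    )
```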
2. Chat & Retrieve
When you ask a question, the system queries ChromaDB for the top 5 semantically similar chunks, feeds them as context to Google Gemini 1.5 Flash, and returns a grounded answer based on the actual documentation.
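Roughly, that retrieval-and-generation step might look like the following sketch (prompt wording, key handling, and names are illustrative, not the project's actual code):

```python
import chromadb
import google.generativeai as genai
from openai import OpenAI

openai_client = OpenAI()
genai.configure(api_key="YOUR_GEMINI_API_KEY")    # illustrative; use an env var in practice
gemini = genai.GenerativeModel("gemini-1.5-flash")
collection = chromadb.PersistentClient(path="./chroma_db").get_or_create_collection("docs")


def answer(question: str) -> str:
    """Retrieve the top 5 chunks for the question and ask Gemini for a grounded answer."""
    # Embed the question with the same model used to embed the documentation.
    query_embedding = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=[question]
    ).data[0].embedding

    # Fetch the 5 most semantically similar chunks from ChromaDB.
    results = collection.query(query_embeddings=[query_embedding], n_results=5)
    context = "\n\n".join(results["documents"][0])

    # Ask Gemini to answer using only the retrieved documentation.
    prompt = (
        "Answer the question using only the documentation excerpts below.\n\n"
        f"Documentation:\n{context}\n\nQuestion: {question}"
    )
    return gemini.generate_content(prompt).text
```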
3. Deduplication
Two layers of deduplication prevent redundant work:
- SQLite tracks visited URLs at the network level, so pages are never re-crawled
- ChromaDB checks for existing URLs at the storage level, so content is never re-embedded (see the sketch below)
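A sketch of both checks, assuming a simple visited-URLs table and a url metadata field on each stored chunk (table, file, and field names are illustrative):

```python
import sqlite3

import chromadb

db = sqlite3.connect("crawl_state.db")            # illustrative file name
db.execute("CREATE TABLE IF NOT EXISTS visited (url TEXT PRIMARY KEY)")
collection = chromadb.PersistentClient(path="./chroma_db").get_or_create_collection("docs")


def already_crawled(url: str) -> bool:
    """Network-level check: skip URLs that have already been fetched."""
    return db.execute("SELECT 1 FROM visited WHERE url = ?", (url,)).fetchone() is not None


def mark_crawled(url: str) -> None:
    db.execute("INSERT OR IGNORE INTO visited (url) VALUES (?)", (url,))
    db.commit()


def already_embedded(url: str) -> bool:
    """Storage-level check: skip URLs whose chunks are already in ChromaDB."""
    return len(collection.get(where={"url": url}, limit=1)["ids"]) > 0
```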
Architecture
The frontend and backend run concurrently from a single pnpm dev command. Next.js proxies /api/py/* requests to the FastAPI backend, so the browser talks to a single origin and the two services work together as one app.
User → Next.js (Chat UI) → FastAPI
├── DocsCrawler (BFS link discovery)
├── crawl4ai (async content extraction)
├── ChromaDB (vector storage & search)
├── OpenAI (embeddings)
├── Gemini (generation)
└── SQLite (crawl tracking)
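On the FastAPI side of this diagram, the endpoints behind the /api/py proxy might look roughly like the sketch below (route paths and request models are assumptions, and the handler bodies are placeholders for the pipeline described above):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class CrawlRequest(BaseModel):
    url: str


class ChatRequest(BaseModel):
    question: str


@app.post("/api/py/crawl")
async def crawl(req: CrawlRequest):
    # Run link discovery, crawl4ai extraction, chunking, and embedding for the given docs URL.
    return {"status": "indexed", "url": req.url}


@app.post("/api/py/chat")
async def chat(req: ChatRequest):
    # Retrieve the top matching chunks from ChromaDB and generate a grounded answer with Gemini.
    return {"answer": "..."}
```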
Key Engineering Decisions
- Dual crawling: BeautifulSoup for broad link discovery, crawl4ai for high-quality Markdown extraction — each tool plays to its strength
- Token-aware chunking: Using the exact cl100k_base tokenizer that ada-002 uses ensures chunks stay within the embedding model's token limits
- Dual AI provider split: OpenAI handles embeddings (strong retrieval quality), Gemini handles generation (cost-effective chat)
- Persistent storage: ChromaDB on disk means the vector store survives server restarts — no need to re-crawl
What I Learned
Building this project deepened my understanding of the RAG pipeline end to end — from web crawling and text chunking to vector embeddings and LLM-grounded generation. It also gave me hands-on experience wiring a Python AI backend with a modern TypeScript frontend in a monorepo setup.