RAG - Beyond Basics
Rainer Stropek | time cockpit
Rainer Stropek
- Passionate software developer for 30+ years
- time cockpit
- Microsoft MVP, Regional Director
- Trainer, Teacher, Mentor
- 💕 community

Data Source
Unstructured, semi-structured, structured
Vector DB
Embedding vectors,
backlinks, metadata
Bot
Takes question,
performs retrieval,
builds prompt,
sends to LLM,
answers to user
LLM
Receives prompt with embedded content from retrieval, generates answer
User
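The flow in the diagram above (question, retrieval, prompt, LLM, answer) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the `Chunk` type, the word-overlap ranking, and the prompt wording are hypothetical stand-ins, and a real bot would query the vector DB with an embedding of the question.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # backlink to the original document (hypothetical format)


def retrieve(question: str, chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Toy retrieval: rank chunks by word overlap with the question.

    Assumption: stands in for a similarity search in the vector DB.
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda c: len(q_words & set(c.text.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, retrieved: list[Chunk]) -> str:
    """Embed the retrieved content in the prompt that goes to the LLM."""
    context = "\n".join(f"[{c.source}] {c.text}" for c in retrieved)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


chunks = [
    Chunk("The support contract was signed by Alice", "contracts/c1.md"),
    Chunk("The office is closed on public holidays", "handbook/h1.md"),
]
question = "Who signed the support contract"
prompt = build_prompt(question, retrieve(question, chunks, top_k=1))
print(prompt)  # this string would be sent to the LLM
```

Keeping the source next to each chunk in the prompt lets the answer point back to the original document.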
Make or Buy?
Document Processing
Garbage in, garbage out 😅
General Considerations
- Cleanup!
- How to deal with document versions?
- Governance hurdles
- Define use cases, involve end users
- Know your documents
- Set realistic expectations
- Consider permission requirements from day one
- Special case: Extract structured data from documents
Parsing
- Source format might not be text
- E.g. images, PDF, docx, xlsx, etc.
- Typically converted to Markdown
- Many options
- Using LLMs
- Use specialized online services
E.g. LlamaParse, Azure Document Intelligence, Mistral OCR
- Use libraries/packages
E.g. MarkItDown (MS), docling
- Use ETL platforms specialized on AI (e.g. unstructured.io)
- How to choose?
- Special requirements (features, security, governance, license form)?
- Know your documents (content, quality, structure, etc.)
- Test with real-world documents
- Involve your end users
Splitting
- Different document types require different chunking strategies
Examples:
- Legal contracts
- Technical documents
- Conversational transcripts
- Code
- Text splitters
- Simple: Length-based (tokens, characters)
- Usually: Document-aware splitters; examples:
- LangChain Text Splitters 🔗
- LlamaIndex Markdown Parser
- Advanced: Semantic splitting
- See also 5 Levels Of Text Splitting
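To make the difference between length-based and document-aware splitting concrete, here is a minimal sketch in plain Python. Both function names are hypothetical. The libraries listed above offer far more complete splitters; this sketch counts characters rather than tokens and only understands Markdown headings.

```python
import re


def split_by_length(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Simple length-based splitting on characters, with optional overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def split_markdown_by_heading(markdown: str) -> list[str]:
    """Document-aware splitting: start a new chunk at every Markdown heading."""
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
    return [p.strip() for p in parts if p.strip()]


print(split_by_length("abcdefghij", chunk_size=4, overlap=1))
print(split_markdown_by_heading("# Intro\nHello\n\n## Details\nMore text\n"))
```

The length-based variant can cut in the middle of a sentence, while the heading-based variant keeps each section intact at the cost of uneven chunk sizes.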
Keeping Context
- Challenge
- Chunks might lose important context about parent document
- Possible Solutions
- Prepend document metadata to each chunk
- Use a context window that includes neighboring chunks
- Attach summaries to chunks
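Two of the solutions above, prepending document metadata and including neighboring chunks, can be sketched as follows. The function name, the dictionary keys, and the "Document: ..." prefix are assumptions made for illustration only.

```python
def add_context(chunks: list[str], title: str, window: int = 1) -> list[dict]:
    """Attach document metadata and neighboring chunks to each chunk."""
    enriched = []
    for i, chunk in enumerate(chunks):
        # Neighbors: up to `window` chunks before and after the current one
        neighbors = chunks[max(0, i - window): i + window + 1]
        enriched.append({
            # Text to embed: the document title is prepended to the chunk
            "embed_text": f"Document: {title}\n\n{chunk}",
            # Text handed to the LLM after retrieval: chunk plus neighbors
            "context_text": "\n".join(neighbors),
            "position": i,
        })
    return enriched


enriched = add_context(["A", "B", "C"], title="Contract X")
print(enriched[1]["context_text"])
```

Embedding the small chunk while handing the wider window to the LLM keeps the search precise without starving the answer of context.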
Multi-Modal Content
- Challenge
- Documents with embedded images, charts, tables
- Specialized libraries/services available
Vectors
Embedding Model Selection
Choosing a Vector DB
- Examples (open and closed source)
- Specialized:
Qdrant, Azure AI Search, Pinecone, Weaviate, Milvus
- Unified data storage:
PostgreSQL (e.g. pgvector), SQL, Mongo, CosmosDB, etc.
- Location
- On-premises/cloud
- Cloud only
- How to choose?
- Specific requirements (features, scalability, auth, etc.)?
Look for specialized solutions
- Already using a DB? It probably already supports vector search
- What is available in your private/public cloud?
Beyond Vectors
- Content of Vector DB
- Embedding vectors
- Backlinks to original documents/fragments (e.g. PKs)
- Metadata (keyword search, faceted search, security filter)
- Metadata for document-level permissions
- E.g. Security Filter Pattern in Azure AI Search 🔗
- Challenge: Use permissions from source system
- Use standard component
E.g. Permissions-aware content retrieval with SharePoint and LlamaCloud 🔗
- Implement custom solutions
- Don't do it 😅
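As an illustration of what a single entry in the vector DB might carry beyond the embedding itself, consider the sketch below. The `IndexEntry` type and all of its field names are hypothetical; real products define their own schemas.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    vector: list[float]   # embedding vector of the chunk
    document_id: str      # backlink: primary key of the source document
    chunk_no: int         # backlink: fragment within the document
    metadata: dict[str, str] = field(default_factory=dict)  # keyword/facet fields
    allowed_groups: frozenset[str] = frozenset()  # document-level permissions


def facet_counts(entries: list[IndexEntry], key: str) -> Counter:
    """Faceted search helper: count entries per metadata value."""
    return Counter(e.metadata[key] for e in entries if key in e.metadata)


entries = [
    IndexEntry([0.1, 0.9], "doc-1", 0, {"doc_type": "contract"}, frozenset({"legal"})),
    IndexEntry([0.2, 0.8], "doc-1", 1, {"doc_type": "contract"}, frozenset({"legal"})),
    IndexEntry([0.7, 0.3], "doc-2", 0, {"doc_type": "manual"}, frozenset({"support"})),
]
print(facet_counts(entries, "doc_type"))
```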
Queries
- LLMs cannot magically answer everything!
- Work on real-world use cases
- Involve your end users
- Detail queries
- Look for a specific answer in a single document
- Examples
- Search answer in knowledge base
- Who signed contract X?
- Who was present at meeting Y?
- Aggregations
- Count, get aggregated data
- Examples
- How many contracts do we have with vendor X?
- What is the total worth of all our support contracts?
- Pre-compute statistics, use structured extraction
- Consider agent-based retrieval patterns
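The aggregation examples above are a poor fit for similarity search, because no single chunk contains the answer. A minimal sketch of the alternative, assuming contract data has already been extracted into structured records (the `ContractRecord` fields and the sample values are invented for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractRecord:
    contract_id: str
    vendor: str
    kind: str         # e.g. "support" or "license" (hypothetical categories)
    value_eur: float


def count_by_vendor(records: list[ContractRecord], vendor: str) -> int:
    """How many contracts do we have with vendor X?"""
    return sum(1 for r in records if r.vendor == vendor)


def total_value(records: list[ContractRecord], kind: str) -> float:
    """What is the total worth of all our contracts of a given kind?"""
    return sum(r.value_eur for r in records if r.kind == kind)


records = [
    ContractRecord("C-1", "Contoso", "support", 12000.0),
    ContractRecord("C-2", "Contoso", "license", 5000.0),
    ContractRecord("C-3", "Fabrikam", "support", 8000.0),
]
print(count_by_vendor(records, "Contoso"))
print(total_value(records, "support"))
```

In practice these records would typically live in a database and be queried with SQL; the point is that the counting happens in code, not in the LLM.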
Enhance Retrieval Results
- Rewrite queries with LLM; examples:
- Query expansion with synonyms/related terms
- Multi-query generation for different interpretations
- HyDE (Hypothetical Document Embeddings)
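- LLM "imagines" a hypothetical document that answers the query
- Search with the embedding of this hypothetical document
- Variant: "imagine" queries users might type, encode the documents with them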
- Dense retrieval with re-ranking
- Dense retrieval: Fast retrieval based on embedding vectors
- Re-ranking: Detailed analysis of documents, find few best fitting docs
- Graph-based document retrieval
- Store relationships between documents
- Explore "neighbors" of retrieved docs via graph
- Consider existing platforms like LlamaIndex for complex queries
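The two-stage pattern of dense retrieval followed by re-ranking can be sketched as below. The vectors are hand-written toy values, and `overlap_score` is only a stand-in (an assumption) for the re-ranking model that a production system would use.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def dense_retrieve(query_vec: list[float], docs: list[dict], k: int) -> list[dict]:
    """Stage 1: fast candidate selection based on embedding vectors."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vector"]), reverse=True)[:k]


def overlap_score(query: str, text: str) -> float:
    """Stand-in scorer: share of query words that occur in the text."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(query: str, candidates: list[dict], top_n: int) -> list[dict]:
    """Stage 2: slower, more detailed scoring of the few candidates."""
    return sorted(candidates, key=lambda d: overlap_score(query, d["text"]), reverse=True)[:top_n]


docs = [
    {"id": "a", "vector": [1.0, 0.0], "text": "contract signed by alice"},
    {"id": "b", "vector": [0.9, 0.1], "text": "who signed contract x bob signed contract x"},
    {"id": "c", "vector": [0.0, 1.0], "text": "holiday schedule"},
]
candidates = dense_retrieve([1.0, 0.0], docs, k=2)
best = rerank("who signed contract x", candidates, top_n=1)
print([d["id"] for d in candidates], [d["id"] for d in best])
```

The first stage narrows many documents down to a handful cheaply, so the expensive scorer only has to look at those.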
Document-Level Authorization
- Different users have access to different documents
- Make or buy?
- E.g. OSO 🔗
- Metadata filtering at retrieval time
- Row-level security in vector database
- Import permissions from source systems?
- Separate indexes per access level
- Suitable if only a few access levels
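Metadata filtering at retrieval time can be sketched as follows. It is a minimal example, assuming each index entry stores the groups allowed to read the source document; the dictionary keys and group names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def search(query_vec: list[float], index: list[dict], user_groups: set[str], top_k: int = 3) -> list[str]:
    """Return the ids of the best matching documents the user may read."""
    # Filter before ranking, so hidden documents can never reach the prompt
    visible = [e for e in index if e["allowed_groups"] & user_groups]
    ranked = sorted(visible, key=lambda e: cosine(query_vec, e["vector"]), reverse=True)
    return [e["document_id"] for e in ranked[:top_k]]


index = [
    {"document_id": "hr-1", "vector": [1.0, 0.0], "allowed_groups": {"hr"}},
    {"document_id": "eng-1", "vector": [0.9, 0.1], "allowed_groups": {"engineering", "hr"}},
    {"document_id": "eng-2", "vector": [0.0, 1.0], "allowed_groups": {"engineering"}},
]
print(search([1.0, 0.0], index, {"engineering"}, top_k=2))
```

The filter has to run inside the retrieval step. Asking the LLM to ignore documents that are already in the prompt is not an access control.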
Agentic RAG
Retrieval as Function Tool
- Retrieval step implemented as a function tool
- Replaces embedding documents in prompts
- Agent decides about tool use autonomously
- Different tools for different use cases
- Aggregation vs. detail queries
- Larger scope: Specialized agents
- Make tools widely available using MCPs
- Potentially no need for custom UI
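A minimal sketch of retrieval exposed as function tools, with one tool for detail queries and one for aggregations. The tool names, the in-memory data, and the shape of the tool call (a name plus JSON-encoded arguments) are assumptions loosely modeled on common function-calling APIs. The decision about which tool to call would come from the LLM and is not part of this sketch.

```python
import json

# Hypothetical in-memory data standing in for the vector DB and a contracts table
DOCUMENTS = {
    "contracts/c1.md": "Support contract with Contoso, signed by Alice",
    "contracts/c2.md": "License contract with Contoso, signed by Bob",
}
CONTRACT_VENDORS = ["Contoso", "Contoso", "Fabrikam"]


def search_documents(query: str) -> list[str]:
    """Detail tool: return ids of documents containing every query word."""
    words = query.lower().split()
    return [doc_id for doc_id, text in DOCUMENTS.items()
            if all(w in text.lower() for w in words)]


def count_contracts(vendor: str) -> int:
    """Aggregation tool: count contracts with the given vendor."""
    return CONTRACT_VENDORS.count(vendor)


# Tool registry: the descriptions are what the agent would see when choosing
TOOLS = {
    "search_documents": {"fn": search_documents,
                         "description": "Find documents for a detail question"},
    "count_contracts": {"fn": count_contracts,
                        "description": "Count contracts with a vendor"},
}


def execute_tool_call(call: dict) -> str:
    """Run the tool the agent selected and return the result as JSON text."""
    tool = TOOLS[call["name"]]
    arguments = json.loads(call["arguments"])
    return json.dumps(tool["fn"](**arguments))


print(execute_tool_call({"name": "count_contracts", "arguments": '{"vendor": "Contoso"}'}))
print(execute_tool_call({"name": "search_documents", "arguments": '{"query": "signed alice"}'}))
```

The tool result is returned to the agent as text, which then decides whether to answer or to call another tool.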
Q&A