Essays

24 essays · 5 also in Turkish

RAG from First Principles: runnable code and step-by-step notebooks for the 20-part series (code on GitHub)

Part 1
Why RAG Exists

Jun 9, 2026 19 min read EN · TR

Even a powerful LLM answers confidently and wrong about your own documents and yesterday's events. Part 1 of a from-scratch series on Retrieval-Augmented Generation: no code, just the four problems RAG solves and one durable mental model.
- #RAG
- #LLM
- #Retrieval
Part 2
Embeddings, Truly Understood

Jun 9, 2026 28 min read EN

RAG retrieves the 'relevant' information, but how does a computer decide what counts as relevant? Part 2 of a from-scratch series on Retrieval-Augmented Generation: how we turn meaning into numbers, why similar meanings land close together, and the quiet geometric trick that makes search by meaning possible.
- #RAG
- #Embeddings
- #Vectors
Part 3
Measuring Similarity

Jun 9, 2026 19 min read EN

Relevant just means close in embedding space, but how do you turn 'close' into a single number you can rank by? Part 3 of a from-scratch series on Retrieval-Augmented Generation: Euclidean distance, the dot product, and why cosine similarity, which measures direction and ignores length, is the default for scoring chunks in RAG.
- #RAG
- #Embeddings
- #Vector Search
Part 4
Vector Databases and Indexing

Jun 10, 2026 30 min read EN

You can score one chunk against a query, but doing it for every chunk is exact, brute-force k-NN: perfectly accurate and painfully O(n). Part 4 of a from-scratch series on Retrieval-Augmented Generation: why ordinary database indexes break in high dimensions, the speed-versus-recall trade-off behind approximate nearest-neighbor (ANN) search, the intuition for HNSW and IVF, and what a vector database actually stores and does.
- #RAG
- #Vector Database
- #ANN
Part 5
Documents and Chunking

Jun 11, 2026 18 min read EN

We have a retrieval engine, but it rests on one quiet assumption: that documents arrive as tidy chunks. Part 5 of a from-scratch series on Retrieval-Augmented Generation: getting clean text out of messy formats, why we chunk at all, the too-small versus too-large tension, the main splitting strategies (fixed-size, recursive, structure-aware, semantic), and the two dials that quietly decide retrieval quality, chunk size and overlap. Bad chunks poison everything downstream.
- #RAG
- #Chunking
- #Document Processing
Part 6
Build Your First RAG

Jun 12, 2026 17 min read EN

Five parts of theory, now one running program. Part 6 of a from-scratch series on Retrieval-Augmented Generation: build a complete chat-with-your-documents app by hand in Python, no framework hiding the mechanics. Embed with a local model, store vectors in plain NumPy, score by cosine similarity, retrieve top-k, ground the prompt, and generate, then swap in a real vector database. Every line ties back to a concept you already learned.
- #RAG
- #Python
- #Embeddings
Part 7
Retrieval Deep Dive

Jun 13, 2026 28 min read EN

Our Part 6 app works, but it retrieves naively: pure semantic search with a fixed top-k. Part 7 of a from-scratch series on Retrieval-Augmented Generation: why dense retrieval whiffs on exact codes and names, the sparse (keyword) retrieval that nails them, TF-IDF and BM25 explained by intuition, how hybrid search fuses the two (weighted sum and Reciprocal Rank Fusion), and why top-k is a real knob with a lost-in-the-middle trap. Dense and sparse fail in opposite directions; combine them.
- #RAG
- #Retrieval
- #BM25
Part 8
Making Retrieval Smarter

Jun 14, 2026 27 min read EN

Part 8 of a from-scratch series on Retrieval-Augmented Generation. First-pass retrieval is fast but only roughly right: the best chunk can sit at rank six. Sharpen it with three levers, in pipeline order. Before retrieval, transform the query (multi-query, HyDE, step-back, decomposition). During retrieval, filter by metadata. After retrieval, rerank a wide candidate set with a cross-encoder and keep the best few. Includes a focused code addition that adds reranking and a metadata filter to the app you built in Part 6.
- #RAG
- #Retrieval
- #Reranking
Part 9
Advanced Retrieval Patterns

Jun 15, 2026 21 min read EN

Eight parts in, your pipeline retrieves well. But it still assumes one unit of text does triple duty: the thing you embed, the thing you search, and the thing you hand the model. Part 9 of a from-scratch series on Retrieval-Augmented Generation breaks that assumption. The big idea is decoupling: the best unit to search on (small, sharp) is rarely the best unit to generate from (large, rich). Four patterns put it to work (parent-document, sentence-window, self-querying, and contextual compression), with one focused code addition on the running app.
- #RAG
- #Retrieval
- #Chunking
Part 10
Advanced RAG Architectures

Jun 16, 2026 26 min read EN

The leap from a fixed pipeline that runs the same way every time to a dynamic, decision-making loop that can choose whether to retrieve, judge what came back, and try again. Part 10 of a from-scratch series on Retrieval-Augmented Generation: a guided tour of Agentic RAG, Corrective RAG (CRAG), Self-RAG, GraphRAG, and Multi-Modal RAG, what control flow each one adds, and the sober cost of reaching for any of them.
- #RAG
- #Agentic RAG
- #GraphRAG
Part 11
Evaluating RAG

Jun 17, 2026 33 min read EN

How to replace vibes with numbers. Part 11 of a from-scratch series on Retrieval-Augmented Generation: the two failure surfaces of a RAG system, the core metrics that probe each one (context precision and recall, faithfulness, answer relevance), LLM-as-a-judge and its biases, the frameworks that automate it, how to build an evaluation set, and the disciplined loop that turns guessing into engineering.
- #RAG
- #Evaluation
- #RAGAS
Part 12
RAG in Production

Jun 18, 2026 34 min read EN

The finale. A RAG system that works in a notebook is about 20 percent of the job; the other 80 percent is making it fast, cheap, reliable, secure, and observable under real traffic. Part 12 of a from-scratch series on Retrieval-Augmented Generation: where latency and cost actually go and how to cut them, caching (including semantic caching), monitoring and tracing, failing gracefully, and the most underrated topic of all, security (prompt injection and data leakage). It closes with a capstone checklist for the whole series and a warm send-off.
- #RAG
- #Production
- #Latency
Part 13
Late-Interaction Retrieval

Jun 19, 2026 25 min read EN

Single-vector embeddings throw away token-level signal. Late interaction keeps a vector per token and scores with MaxSim, getting cross-encoder-quality matching at bi-encoder serving cost. Part 13 of a from-scratch series on Retrieval-Augmented Generation, opening the Frontier Track: ColBERT and ColBERTv2, MaxSim by hand in NumPy, the storage trade-off, and how ColPali extends late interaction to document page images without OCR or chunking.
- #RAG
- #Retrieval
- #ColBERT
Part 14
Context-Aware Chunking

Jun 20, 2026 25 min read EN

A chunk that reads fine in isolation can be uninterpretable once it leaves its document: 'she' no longer resolves to 'Alice', 'the policy' loses its antecedent. Part 14 of a from-scratch series on Retrieval-Augmented Generation, on the Frontier Track: two training-free fixes, late chunking (pool token spans after the transformer) and Anthropic's Contextual Retrieval (prepend an LLM-written situating sentence before embedding), built by hand and compared.
- #RAG
- #Chunking
- #Late Chunking
Part 15
Adaptive RAG

Jun 21, 2026 23 min read EN

Not every query needs the same machinery: a greeting needs no retrieval, a fact needs one lookup, a comparison needs several. Part 15 of a from-scratch series on Retrieval-Augmented Generation and the close of the Frontier Track: a small complexity classifier that routes each query to no-retrieval, single-step, or multi-step retrieval, unifying the pipelines built across Parts 6 to 10 into one adaptive system.
- #RAG
- #Adaptive RAG
- #Routing
Part 16
RAG vs Long-Context vs CAG

Jun 22, 2026 23 min read EN

Part 1 asked why RAG exists. Part 16 asks the harder follow-up: when do you even need retrieval? Context windows reach about a million tokens in 2026, so sometimes you can just stuff everything in, and Cache-Augmented Generation (CAG) preloads a small, stable corpus once and reuses the cached KV state instead of retrieving. This part works out the prompt-caching economics that decide between them and gives you a clear decision matrix: a massive, fast-moving, or private corpus points to RAG; a small, stable one to CAG or long-context; a mid-size one to long-context.
- #RAG
- #Long Context
- #CAG
Part 17
Securing RAG

Jun 23, 2026 24 min read EN

RAG widens the attack surface in a way ordinary apps do not: its whole premise is feeding external, often untrusted, content straight into a powerful model's prompt. Part 17 of a from-scratch series on Retrieval-Augmented Generation: the threats unique to RAG (indirect prompt injection through retrieved documents, knowledge-base poisoning, cross-tenant leakage) and the layered defensive pipeline that contains them, from input redaction and provenance scoring to a delimited untrusted-context wall, decline-if-not-grounded, output filtering, and identity-scoped access control.
- #RAG
- #Security
- #Prompt Injection
Part 18
Structured and SQL RAG

Jun 24, 2026 22 min read EN

Most enterprise knowledge does not live in documents, it lives in databases and tables, and dense passage retrieval cannot answer a question whose answer has to be computed. Part 18 of a from-scratch series on Retrieval-Augmented Generation: text-to-SQL with RAG (retrieve the schema, generate SQL, execute, answer), table retrieval and the scaling reality, and routing text-search versus SQL per query.
- #RAG
- #Text-to-SQL
- #Structured Data
Part 19
Building a RAG Agent

Jun 25, 2026 18 min read EN

Part 19 of a from-scratch series on Retrieval-Augmented Generation: take the agentic RAG that Part 10 only toured in prose (the ReAct loop, tool use, routing, multi-hop) and build a real agent by hand, with four tools, a reason/act/observe loop, an honest step budget, and three traces you can read line by line.
- #RAG
- #Agentic RAG
- #ReAct
Part 20
Conversational RAG
New — most recent essay

Jun 26, 2026 19 min read EN

Part 20 of a from-scratch series on Retrieval-Augmented Generation: give the one-shot agent a memory. Build multi-turn RAG by hand, where query condensation rewrites a context-dependent follow-up into a standalone question before retrieval, so 'what about damaged items?' finally finds the right chunk.
- #RAG
- #Conversational RAG
- #Multi-turn
When the Co-Pilot Is an Algorithm: How Europe Plans to Make Aviation AI Trustworthy

Jun 4, 2026 10 min read EN · TR

The EU's aviation regulator just published a 239-page concept paper on making AI safe enough to fly. From artificial narrow intelligence to learning assurance and the W-shape process, the mental model behind trustworthy aviation AI, explained for non-experts.
- #EASA
- #Aviation
- #AI Safety
ASELSAN TOYGUN: A Technical Review of the Platform-Integrated EOTS

Jun 1, 2026 21 min read EN · TR

Explains step by step, through the signal chain and subsystems, how TOYGUN sees, how it recognizes and tracks a target, how it ranges and designates with a laser, and how it does all of this without compromising the aircraft's radar signature.
- #ASELSAN
- #TOYGUN
- #EOTS
Fine-tuning an SLM on UAV combat doctrine

Jun 1, 2026 14 min read EN · TR

End-to-end notes from a $55 LoRA build on gpt-oss-20b: the data pipeline, the training run, the evaluation, and the surprise that the biggest win was teaching a reasoning model to stop reasoning.
- #LLMs
- #Fine-tuning
- #LoRA
The Silent Variable in Graph RAG: Why Corpus Language Matters More Than You Think

Apr 21, 2026 5 min read EN · TR

Chunk size, embeddings, re-rankers: the usual suspects. But the language of your corpus quietly shapes every layer of the pipeline, and reasoning models make it decisive.
- #Graph RAG
- #LLMs
- #Multilingual