Chibeze

Building a Production Vision RAG System with ColPali and Light-ColPali

We took ColPali (vision-language embeddings for documents) and Light-ColPali (token merging via hierarchical clustering) and built the production infrastructure around them. The system uses PostgreSQL + pgvector as a unified store, a lease-based job queue for resilient ingestion, and a two-stage retrieval pipeline that retrieves at patch granularity but ranks at page level.

The key insight: text extraction is lossy. For documents with complex layouts, charts, and tables, embedding the rendered page as an image solves problems that text-based RAG can’t touch.
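The page-level ranking step described above can be sketched as ColBERT-style late interaction (MaxSim), which is what ColPali uses: each query token takes its best-matching patch embedding, and a page's score is the sum of those maxima. This is a minimal sketch, assuming unit-normalized embeddings; the function names and the `candidate_pages` shape are illustrative, not the post's actual API.

```python
import numpy as np

def maxsim_page_score(query_tokens: np.ndarray, page_patches: np.ndarray) -> float:
    """Late-interaction score: for each query token, take the maximum
    cosine similarity over the page's patch embeddings, then sum."""
    # (num_query_tokens, num_patches) similarity matrix,
    # assuming both inputs are unit-normalized row vectors
    sims = query_tokens @ page_patches.T
    return float(sims.max(axis=1).sum())

def rank_pages(query_tokens: np.ndarray, candidate_pages: dict) -> list:
    """Stage 2: re-rank the pages whose patches were recalled in stage 1."""
    scored = [(page_id, maxsim_page_score(query_tokens, patches))
              for page_id, patches in candidate_pages.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

In stage 1, an approximate nearest-neighbour search over individual patch vectors (e.g. in pgvector) recalls candidate pages cheaply; the exact MaxSim above then runs only on that small candidate set.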

  • March 8, 2026

GPT and Transformer Models

Transformer models revolutionised AI by replacing recurrence with attention, allowing machines to understand context, relationships, and meaning across long sequences of data. This post explores how GPT models harness the Transformer architecture to generate coherent, intelligent text and why this innovation reshaped the entire AI landscape.
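The attention mechanism that replaced recurrence can be sketched in a few lines. This is a minimal NumPy rendering of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V, not an excerpt from the post:

```python
import numpy as np

def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Each query attends to every key at once, so context from anywhere
    in the sequence can flow to any position in a single step."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # pairwise query-key affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                       # weighted mix of values
```

Unlike a recurrent network, nothing here depends on processing positions one at a time, which is what makes long-range context tractable.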

  • November 5, 2025

Unlocking RAG Success with Semantic Chunking

Semantic chunking is a smarter, meaning-based approach to breaking text into pieces for Retrieval-Augmented Generation (RAG) systems. Instead of slicing by token count, it groups sentences that belong together, keeping the logic and flow intact. This helps retrieval systems pull coherent context rather than fragmented thoughts—a crucial advantage when dealing with complex documents like contracts or financial statements. While current research shows mixed evidence on its effectiveness, the theoretical promise is clear: semantic chunking preserves context, coherence, and clarity, making RAG systems feel less mechanical and more genuinely understanding.
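One common way to implement the idea above is a greedy pass over sentence embeddings: start a new chunk whenever the similarity between consecutive sentences drops below a threshold. This is a hedged sketch of that pattern, not the post's implementation; the embedding model and the threshold value are assumptions, and precomputed embeddings are passed in to keep it self-contained.

```python
import numpy as np

def semantic_chunks(sentences: list, embeddings: list, threshold: float = 0.6) -> list:
    """Group consecutive sentences into chunks, splitting wherever the
    cosine similarity between neighbouring sentence embeddings falls
    below `threshold` (an illustrative default)."""
    chunks, current = [], [sentences[0]]
    for prev, curr, sent in zip(embeddings, embeddings[1:], sentences[1:]):
        cos = prev @ curr / (np.linalg.norm(prev) * np.linalg.norm(curr))
        if cos < threshold:
            chunks.append(" ".join(current))  # topic shift: close the chunk
            current = [sent]
        else:
            current.append(sent)              # same topic: keep accumulating
    chunks.append(" ".join(current))
    return chunks
```

Compared with fixed-size token slicing, the chunk boundaries here track meaning, so a retrieved chunk is more likely to be a coherent thought than a fragment.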

  • November 1, 2025