Vector Databases & How They Work
Understanding why we need vector databases and exploring their internal implementations for efficient similarity search.
Why Do We Need Vector Databases?
Consider these two queries that might be sent to a search engine:
"what is an array"
"Array kya hota hai"
These queries are asking the same question, but traditional databases can't recognize their similarity. This is where vector databases come in.
SQL and NoSQL databases are designed to find exact matches based on known values, while vector databases are designed to find similar items based on conceptual meaning.
The Baseline: Brute Force (Exact Search)
The simplest approach is to convert text into vectors (numerical representations) and compare them using distance metrics like Euclidean distance or Cosine similarity.
Example of a vector: [0.9, 0.1, 0.0, 0.9, 0.1]
| Item | Vector |
|---|---|
| onion | [0.9, 0.1, 0.0, 0.9, 0.1] |
| tomato | [0.8, 0.2, 0.0, 0.8, 0.2] |
| dhaniya (coriander) | [0.7, 0.1, 0.0, 0.9, 0.0] |
| nimbu (lemon) | [0.1, 0.8, 0.1, 0.7, 0.3] |
| banana | [0.1, 0.9, 0.1, 0.1, 0.8] |
| protein | [0.0, 0.1, 0.9, 0.1, 0.7] |
| creatine | [0.0, 0.0, 0.95, 0.0, 0.1] |
| bcaa | [0.0, 0.0, 0.92, 0.0, 0.1] |
How Distance Calculation Works
For example, to find the distance between the points (2,3) and (3,4):
Euclidean distance: √((2-3)² + (3-4)²) = √2 ≈ 1.41
With brute force, the system compares the query vector with every single vector in the database, calculating the distance each time. It then returns the items with the shortest distances.
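Here is a minimal sketch of that brute-force scan in Python with NumPy, reusing the toy grocery vectors from the table above (the query vector is made up for illustration):

```python
import numpy as np

# The toy vectors from the table above (5 dimensions each).
items = {
    "onion":    [0.9, 0.1, 0.0, 0.9, 0.1],
    "tomato":   [0.8, 0.2, 0.0, 0.8, 0.2],
    "dhaniya":  [0.7, 0.1, 0.0, 0.9, 0.0],
    "nimbu":    [0.1, 0.8, 0.1, 0.7, 0.3],
    "banana":   [0.1, 0.9, 0.1, 0.1, 0.8],
    "protein":  [0.0, 0.1, 0.9, 0.1, 0.7],
    "creatine": [0.0, 0.0, 0.95, 0.0, 0.1],
    "bcaa":     [0.0, 0.0, 0.92, 0.0, 0.1],
}

def euclidean(a, b):
    # Square root of the sum of squared differences,
    # e.g. for (2,3) and (3,4): sqrt((2-3)^2 + (3-4)^2) ~= 1.41
    return float(np.sqrt(np.sum((np.array(a) - np.array(b)) ** 2)))

def brute_force_search(query, k=3):
    # Exact Nearest Neighbor: measure the distance to EVERY stored vector...
    scored = [(name, euclidean(query, vec)) for name, vec in items.items()]
    # ...then return the k items with the shortest distances.
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]

# A made-up query vector that sits near the "vegetable" items.
print(brute_force_search([0.85, 0.15, 0.0, 0.85, 0.15]))
```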
The Problem with Brute Force
A full scan is slow precisely because it is thorough: it calculates the distance to every single point, so it guarantees finding the absolute closest one. This is called Exact Nearest Neighbor (ENN) search.
While this gives perfect results, it becomes impractical when dealing with billions of vectors, as the computation time scales linearly with the number of items.
ANN Approach: Trading Accuracy for Speed
Approximate nearest neighbor (ANN) is an algorithm that finds a data point in a data set that's very close to the given query point, but not necessarily the absolute closest one.
It sacrifices a tiny amount of accuracy (it might return 9 of the top 10, plus the 11th) for a massive gain in speed (often orders of magnitude faster).
Let's explore several ANN approaches that make vector databases practical for large-scale applications.
Approach 1: Clustering / Inverted File Index (IVF)
Phase 1: Indexing (The one-time setup)
Find the 'Centers': The database first looks at all of your million vectors and uses an algorithm (like k-means clustering) to find, let's say, 100 'center points' or centroids. Each centroid is itself a vector that represents the average of a region in your data. Think of these as the signs for the sections in our library: "Fiction," "Science," etc.
Assign to Clusters: The database then goes through every single one of your million vectors and figures out which of the 100 centroids it's closest to. It then puts a "pointer" to that vector into a list kept for that centroid; these per-centroid lists are the "inverted file" that gives IVF its name.
Phase 2: Searching (when you make a query)
Find the Closest Section(s): The database does NOT compare your query to all one million vectors. Instead, it just compares it to the 100 centroids. This is super fast. It finds that your query is closest to, say, 'Centroid #42' (the "men's casual shirts" section).
Search Only Within That Section: Now, the database knows your best matches are probably in the list for Centroid #42. So it only searches through the 10,000 vectors in that one cluster.
Example
Consider vectors like:
- onion: [2,3]
- tomato: [3,4]
- grapes: [1,2]
- paali: [1,6]
By using centroids, we might find that 3 centroids summarize this small dataset well. At the scale of a million vectors and 100 centroids, each centroid points to a cluster of roughly 10,000 items, so a query is compared against the centroids and then just one small cluster instead of every vector.
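Here is a rough sketch of the two IVF phases in Python. It uses scikit-learn's KMeans and randomly generated vectors as stand-ins, since the section doesn't prescribe a particular clustering library or dataset (the dataset here is also much smaller than a million vectors so it runs quickly):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vectors = rng.random((10_000, 5))      # stand-in for the full dataset
n_clusters = 100                       # the "sections" / centroids

# Phase 1: Indexing (one-time setup)
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(vectors)
assignments = kmeans.predict(vectors)  # which centroid each vector belongs to
inverted_lists = {c: np.where(assignments == c)[0] for c in range(n_clusters)}

# Phase 2: Searching
def ivf_search(query, k=3, n_probe=1):
    # Step 1: compare the query to the 100 centroids only (cheap).
    centroid_dists = np.linalg.norm(kmeans.cluster_centers_ - query, axis=1)
    nearest_clusters = np.argsort(centroid_dists)[:n_probe]
    # Step 2: brute-force only inside the chosen cluster(s).
    candidates = np.concatenate([inverted_lists[c] for c in nearest_clusters])
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    order = np.argsort(dists)[:k]
    return candidates[order], dists[order]

ids, dists = ivf_search(rng.random(5))
print(ids, dists)
```

Raising n_probe (searching the two or three closest clusters instead of one) is the usual knob for trading a little speed back for accuracy.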
Approach 2: The Decision Tree Method (Binary Space Partitioning)
What if we just took the whole space and chopped it in half? And then chopped those halves in half? And so on.
How It Works:
First Split: We pick a dimension (e.g., the x-axis) and find the middle point (the median). We draw a vertical line there. All points to the left go into the "left" branch of our tree, and all points to the right go into the "right" branch.
Second Split: Now we look at the two new groups. For each group, we pick the other dimension (the y-axis) and split it in half with a horizontal line.
Keep Splitting: We keep repeating this process, alternating which dimension we split on, until we only have a few points left in each final, tiny rectangle.
Binary Tree Structure
The splits form a binary tree: each internal node stores a splitting line, and each leaf holds the handful of points left in one small rectangle. To search for the 3 nearest neighbors of a query such as Q = [13, 8], we walk down the tree to the leaf whose rectangle contains Q, measure the points there, and then walk back up, only descending into sibling branches whose rectangles could still contain something closer.
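A small sketch of this tree-based search using SciPy's KDTree (an assumed library choice), over the 2-D points from the IVF example and the query Q = [13, 8]:

```python
import numpy as np
from scipy.spatial import KDTree

# The 2-D example points from the IVF section.
names  = ["onion", "tomato", "grapes", "paali"]
points = np.array([[2, 3], [3, 4], [1, 2], [1, 6]])

# Building the tree recursively splits the space, alternating dimensions.
tree = KDTree(points)

# Find the 3 nearest neighbors of Q = [13, 8].
distances, indices = tree.query([13, 8], k=3)
for d, i in zip(distances, indices):
    print(names[i], points[i], round(float(d), 2))
```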
Approach 3: Hierarchical Navigable Small Worlds (HNSW)
How HNSW Works
HNSW organizes vectors in a hierarchical graph structure with multiple layers:
- Layer 2: Top layer with only a few nodes, used for quick, coarse jumps
- Layer 1: Mid-level layer with a small number of connections per node (e.g., 3)
- Layer 0: Base layer containing every vector, with more connections per node (e.g., 6)
HNSW works by creating a "skip list"-like structure where higher levels allow for quick, coarse navigation, and lower levels refine the search. This approach provides both fast search speeds and high accuracy.
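A short sketch with the hnswlib package (an assumed library choice, since the section doesn't name one), which exposes the number of connections per node (M) and the search breadth (ef) directly:

```python
import numpy as np
import hnswlib

dim, num_elements = 128, 10_000
data = np.random.random((num_elements, dim)).astype(np.float32)

# Build the layered graph. M controls connections per node;
# ef_construction controls how carefully links are chosen while building.
index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data, np.arange(num_elements))

# ef at query time: how widely the lower layers are explored
# (larger = more accurate, slower).
index.set_ef(50)

query = np.random.random((1, dim)).astype(np.float32)
labels, distances = index.knn_query(query, k=10)
print(labels[0], distances[0])
```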
Approach 4: The Compression Method (Product Quantization - PQ)
The Memory Problem
A single vector using 1536 dimensions of 32-bit floating-point numbers takes up 1536 * 4 bytes = 6,144 bytes. If you have a billion vectors, that's 6.1 Terabytes of RAM! That's incredibly expensive.
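Spelled out as a quick calculation:

```python
bytes_per_vector = 1536 * 4                      # 1536 dims x 4 bytes per float32
total_bytes = bytes_per_vector * 1_000_000_000   # one billion vectors
print(bytes_per_vector)                          # 6144 bytes per vector
print(total_bytes / 1e12)                        # ~6.1 terabytes
```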
Product Quantization solves this by compressing vectors while maintaining their similarity relationships.
How Product Quantization Works
It chops the big vector into smaller segments. For example, a 16-dimension vector could be chopped into 4 segments of 4 dimensions each: [v1, v2, ..., v16] → [chunk1], [chunk2], [chunk3], [chunk4]
Step A: Split V into chunks
V = [1.1, 2.3, 0.9, 3.4, 8.8, 7.6, 9.1, 6.5, 4.2, 5.5, 3.4, 5.1, 9.9, 8.1, 7.7, 9.3]
total size = 16*4 = 64 bytes
v_chunk1 = [1.1, 2.3, 0.9, 3.4] -> mapped to [10]
v_chunk2 = [8.8, 7.6, 9.1, 6.5] -> mapped to [23]
v_chunk3 = [4.2, 5.5, 3.4, 5.1] -> mapped to [123]
v_chunk4 = [9.9, 8.1, 7.7, 9.3] -> mapped to [16]
Resulting compressed vector: [10, 23, 123, 16]
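A tiny NumPy sketch of Step A, just to show the sizes involved (the centroid ids are the ones from the example above; how they are chosen is covered in Step B):

```python
import numpy as np

V = np.array([1.1, 2.3, 0.9, 3.4,  8.8, 7.6, 9.1, 6.5,
              4.2, 5.5, 3.4, 5.1,  9.9, 8.1, 7.7, 9.3], dtype=np.float32)
print(V.nbytes)              # 64 bytes: 16 dimensions x 4 bytes each

chunks = V.reshape(4, 4)     # Step A: four chunks of four dimensions each
print(chunks[0])             # [1.1 2.3 0.9 3.4]

# After each chunk is mapped to its nearest codebook centroid (Step B),
# only the centroid ids are stored:
code = np.array([10, 23, 123, 16], dtype=np.uint8)
print(code.nbytes)           # 4 bytes: a 16x reduction
```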
Step B: Create the Codebooks (k clusters/centroids per chunk)
For each chunk position, we create a codebook of representative vectors:
- Chunk 1 (dims 1-4): We take the first chunk of every vector, cluster them, and the resulting centroids become Codebook 1.
- Chunk 2 (dims 5-8): We do the same with the second chunks to get Codebook 2.
- Chunk 3 (dims 9-12): This gives us Codebook 3.
- Chunk 4 (dims 13-16): This gives us Codebook 4.
Codebook 1 might look like this (even with a billion records, a codebook typically holds just 256 centroids, so each chunk id fits in a single byte; every centroid has the same 4 dimensions as the chunk it summarizes):
0 : [1, 2, 3, 6]
1 : [1, 1, 3, 2]
2 : [2, 2, 3, 6]
3 : [1, 1, 2, 4]
...
255 : [4, 2, 3, 6]
How to Search with a Query
For a query Q = [1.5, 2.1, 0.8, 3.4, 8.1, 7.9, 9.0, 6.6, 4.8, 5.2, 4.1, 5.0, 9.5, 8.3, 7.9, 9.1]:
- Split Q into chunks, just like we did for vectors in our database
- For each chunk, compute distances to all centroids in the corresponding codebook
- Build a distance lookup table for fast comparisons
- Use these tables to estimate distances between Q and compressed vectors
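Putting the pieces together, here is a rough end-to-end sketch: training the codebooks with scikit-learn's KMeans (an assumed stand-in for whatever clustering a real database uses), encoding vectors to 4-byte codes, and searching with per-chunk distance lookup tables. The dataset is randomly generated for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
dim, n_chunks, n_centroids = 16, 4, 256
chunk_dim = dim // n_chunks                               # 4 dims per chunk
vectors = rng.random((5_000, dim)).astype(np.float32)     # stand-in dataset

# --- Indexing: train one codebook per chunk position (Step B) ---
codebooks = []
for c in range(n_chunks):
    chunk = vectors[:, c * chunk_dim:(c + 1) * chunk_dim]
    codebooks.append(KMeans(n_clusters=n_centroids, n_init=4, random_state=0).fit(chunk))

# --- Encoding: replace each chunk by the id of its nearest centroid (Step A) ---
codes = np.stack([
    codebooks[c].predict(vectors[:, c * chunk_dim:(c + 1) * chunk_dim])
    for c in range(n_chunks)
], axis=1).astype(np.uint8)                               # shape (5000, 4): 4 bytes per vector

# --- Searching: the query stays uncompressed, distances come from lookup tables ---
def pq_search(query, k=3):
    # One table per chunk: squared distance from the query chunk to all 256 centroids.
    tables = [
        ((codebooks[c].cluster_centers_ - query[c * chunk_dim:(c + 1) * chunk_dim]) ** 2).sum(axis=1)
        for c in range(n_chunks)
    ]
    # Estimated distance to every compressed vector = sum of 4 table lookups.
    approx = sum(tables[c][codes[:, c]] for c in range(n_chunks))
    order = np.argsort(approx)[:k]
    return order, np.sqrt(approx[order])

query = rng.random(dim).astype(np.float32)
ids, dists = pq_search(query)
print(ids, dists)
```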
This allows for extremely memory-efficient vector storage while maintaining reasonably accurate similarity searches.
Which Method is "Best"? A Comparison
| Method | Speed (Query Latency) | Accuracy (Recall) | Memory Usage |
|---|---|---|---|
| HNSW | Highest | Highest | Highest |
| IVFPQ (Hybrid) | High | High (with re-ranking) | Lowest by far |
| IVF (Standalone) | Medium | Medium | Low |
| Trees (e.g., KD-Tree) | Low (in high dimensions) | Low | Medium |
Database / Library Comparison
| Database/Library | Category | Key Feature / Philosophy | Primary Indexing Technique(s) |
|---|---|---|---|
| Milvus | Native | Open-source, highly scalable, MLOps-focused. Decoupled storage & compute. | Highly flexible: HNSW, IVFPQ, IVFFlat, and others. |
| Pinecone | Native | Managed service (SaaS), ease of use, performance-focused. | HNSW |
| Weaviate | Native | Open-source, GraphQL API, connects vectors in a knowledge graph. | HNSW |
| Qdrant | Native | Open-source, performance-focused with advanced filtering. Written in Rust. | HNSW with custom quantization. |
| PostgreSQL (pgvector) | Integrated | Keep your structured data and vectors together in a trusted relational DB. | IVFFlat and HNSW |
| Elasticsearch | Integrated | Combine traditional keyword search with semantic vector search in one query. | HNSW |
| Redis | Integrated | In-memory speed, combining vector search with other Redis data structures. | HNSW |
| Faiss (by Meta) | Library | The foundational C++/Python toolkit. Not a DB, but powers many others. | The reference: HNSW, IVFPQ, and many more. |
What Data is Actually Stored
Vector databases typically store three types of information:
// Example record in a vector database
{
  id: "product_456",                  // unique identifier
  vector: [0.98, 0.23, -0.11, ...],   // the numerical representation
  metadata: {
    product_name: "Red Running Shoes",
    price: 89.99,
    link: "https://example.com/shoes/456"
  }
}
Summary: Vector Databases and Their Implementation
1. Vector databases solve the similarity search problem
They allow us to find items based on meaning and similarity rather than exact matches.
2. Brute force is accurate but slow
Calculating distance to every vector gives perfect results but becomes impractical at scale.
3. Approximate methods offer speed-accuracy tradeoffs
IVF clustering, tree-based partitioning, HNSW, and product quantization all offer different approaches to the same problem.
4. Modern vector databases combine techniques
Most production systems use combinations of these methods to balance speed, accuracy, and memory usage.