Engineering February 2026

Vector Databases Explained: Storing AI's Memory

By Bartosz K. — Published: 19 February 2026 — Updated: 27 February 2026 — 11 min read

Contents

What Is a Vector?
What Is a Vector Database?
How Similarity Search Works
Why Vector Databases Matter for AI
Retrieval-Augmented Generation (RAG)
Choosing a Vector Database
Popular Vector Database Options
Limitations and Trade-offs

A quiet infrastructure revolution is happening underneath modern AI applications. Alongside the large language models and image generators that have captured public attention, a new category of database has emerged as essential plumbing: the vector database. If you are building AI systems in 2026, understanding vector databases is no longer optional. They are one of the main ways AI applications find relevant information quickly, and they underpin some of the most powerful patterns in applied AI.

What Is a Vector?

Before we can understand vector databases, we need to understand what a vector is in the machine learning sense. A vector is simply a list of numbers — for example, [0.12, -0.83, 0.44, 0.67, ...]. In AI systems, vectors are used to represent the meaning of data in a mathematical form that a computer can reason about.

When a text embedding model processes the sentence "the cat sat on the mat", it produces a vector (typically 384, 768, or 1536 numbers long, depending on the model) that encodes the semantic meaning of that sentence. The remarkable property of these vectors is that similar meanings produce similar vectors. "A feline rested on the rug" would produce a vector that is close to the first one under a similarity measure such as cosine similarity, even though the two sentences share almost no words.

This same idea applies to other types of data. Images can be encoded as vectors that capture visual similarity. Audio can be encoded so that similar sounds produce similar vectors. Products can be embedded so that similar items cluster together. In all cases, the embedding model transforms raw data into a point in high-dimensional space, where proximity in space corresponds to similarity in meaning or content.

What Is a Vector Database?

A vector database is a database optimised for storing, indexing, and querying these high-dimensional vectors. Traditional relational databases (PostgreSQL, MySQL) are optimised for exact matches and range queries on structured data. They can store vectors, but without a specialised index, finding the vectors nearest to a query means computing the distance from the query to every stored vector. The cost of this linear scan grows with both the number of vectors and their dimensionality, so it becomes prohibitively slow as the dataset grows.
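To make that cost concrete, here is a minimal pure-Python sketch of exact (brute-force) nearest-neighbour search. It is illustrative only: the function names and the three-dimensional toy vectors are hypothetical, and real embeddings have hundreds of dimensions. Every query touches every stored vector, which is exactly the work an index is designed to avoid.

```python
import heapq
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Exhaustive search: compares the query with EVERY stored vector,
    so each query costs O(N * d) for N vectors of dimension d."""
    scored = ((euclidean(query, vec), key) for key, vec in vectors.items())
    # nsmallest keeps only the k best candidates, but still scans all N.
    return [(key, dist) for dist, key in heapq.nsmallest(k, scored)]


store = {
    "doc-a": [0.1, 0.9, 0.0],
    "doc-b": [0.8, 0.1, 0.1],
    "doc-c": [0.2, 0.8, 0.1],
}
print([key for key, _ in exact_knn([0.12, 0.88, 0.02], store, k=2)])  # ['doc-a', 'doc-c']
```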

Vector databases solve this with specialised indexing structures, typically approximate nearest-neighbour (ANN) algorithms, that can find the vectors most similar to a query in milliseconds, even across hundreds of millions of entries (given suitable hardware and index tuning). The trade-off lies in the word "approximate": most of the true nearest neighbours are returned with high probability (high recall), but the search is not exhaustive, so an occasional true match may be missed.

How Similarity Search Works

The core operation of a vector database is nearest-neighbour search: given a query vector, find the k most similar vectors in the database. Similarity is typically measured using:

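Cosine similarity — measures the angle between two vectors, ignoring their magnitude (length). This is the most common metric for text embeddings.
Euclidean distance (L2) — measures the straight-line distance between two points in vector space; smaller means more similar. Common for image and audio embeddings.
Dot product — the sum of element-wise products. On vectors normalised to unit length it equals cosine similarity while being cheaper to compute; without normalisation, magnitude also affects the score, which recommendation systems sometimes exploit.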
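The three metrics fit in a few lines of Python. This sketch (plain lists, hypothetical helper names) also shows why the choice matters: two vectors pointing in the same direction but with different lengths are identical under cosine similarity yet far apart under Euclidean distance, and after normalisation the dot product matches the cosine.

```python
import math


def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Vector length (L2 norm)."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    """Straight-line (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalise(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the length

print(cosine_similarity(a, b))   # 1.0: identical direction
print(euclidean_distance(a, b))  # 5.0: yet the points are far apart
print(dot(normalise(a), normalise(b)))  # equals the cosine, up to float rounding
```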

The indexing algorithms that make fast approximate search possible include:

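HNSW (Hierarchical Navigable Small World) — a graph-based algorithm that links each vector to its near neighbours across several layers, from a sparse top layer down to a dense bottom layer. A search enters at the top and greedily moves towards the query, descending layer by layer. It is currently one of the most widely used algorithms because of its combination of speed and recall, at the cost of extra memory for the graph.
IVF (Inverted File Index) — divides the vector space into clusters (typically with k-means) and assigns each vector to its nearest cluster centroid. At query time, only the few clusters whose centroids are closest to the query are searched; in FAISS, for example, this number is set by a parameter called nprobe.
PQ (Product Quantization) — compresses vectors by splitting each one into sub-vectors and replacing each sub-vector with the ID of its nearest centroid from a small learned codebook. This greatly reduces memory usage at some cost in accuracy, and is often combined with IVF (IVF-PQ) for very large datasets.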
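The IVF idea can be illustrated with a toy, pure-Python index. This is a simplified sketch, not how FAISS or any particular database implements it: ToyIVFIndex and kmeans are hypothetical names, nprobe is borrowed as a concept, the data is random 2-D points, and PQ compression is omitted. Raising nprobe scans more clusters, trading speed for recall; with nprobe equal to the number of clusters the search becomes exhaustive.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, n_clusters, iterations=10, seed=0):
    """Plain Lloyd's k-means; returns the learned centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(iterations):
        buckets = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(n_clusters), key=lambda c: l2(v, centroids[c]))
            buckets[nearest].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the previous centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


class ToyIVFIndex:
    """Each vector is stored in the inverted list of its nearest centroid;
    a query scans only the lists of the `nprobe` closest centroids."""

    def __init__(self, vectors, n_clusters):
        self.centroids = kmeans(vectors, n_clusters)
        self.lists = [[] for _ in self.centroids]
        for i, v in enumerate(vectors):
            self.lists[self._closest_centroids(v, 1)[0]].append((i, v))

    def _closest_centroids(self, v, n):
        order = sorted(range(len(self.centroids)), key=lambda c: l2(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k, nprobe=1):
        candidates = []
        for c in self._closest_centroids(query, nprobe):
            candidates.extend(self.lists[c])
        # Exact ranking, but only within the probed clusters.
        candidates.sort(key=lambda item: l2(query, item[1]))
        return [i for i, _ in candidates[:k]]


rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(1000)]
index = ToyIVFIndex(data, n_clusters=16)
print(index.search([0.5, 0.5], k=5, nprobe=2))  # ids of (approximately) the 5 nearest points
```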

Why Vector Databases Matter for AI

The rise of vector databases is directly tied to the rise of large language models and embedding models. Embedding models produce rich representations of meaning, and language models can work fluently with text, but language models have a critical limitation: they only know what was in their training data. They cannot access your company's internal documents, your product catalogue, your customer records, or anything that changed after their training cutoff date.

Vector databases bridge this gap. By embedding your proprietary data and storing the vectors, you create a semantic search system that can find relevant information at query time — and then pass that information to a language model as context. This is the foundation of Retrieval-Augmented Generation.

Beyond RAG, vector databases power:

Semantic search — search that understands meaning rather than just keywords. A user searching for "how do I cancel my subscription" finds relevant results even if the documentation uses different phrasing.
Recommendation systems — finding products, articles, or users that are similar to a given item based on learned embeddings.
Deduplication and clustering — identifying near-duplicate content or grouping similar items without hand-crafted rules.
Anomaly detection — finding data points that are far from all known examples in embedding space.
Multi-modal search — searching across text, images, and audio using a shared embedding space.
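Two of these uses, deduplication and anomaly detection, reduce to simple threshold tests on similarity. A minimal sketch, assuming embeddings have already been computed: the three-dimensional vectors below are made up for illustration, and the thresholds are hypothetical values you would tune on your own data.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def near_duplicates(items, threshold=0.95):
    """Return pairs of ids whose embeddings have cosine similarity >= threshold.
    Compares every pair (O(N^2)); at scale an ANN index would find candidates."""
    ids = list(items)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(items[a], items[b]) >= threshold:
                pairs.append((a, b))
    return pairs


def is_anomaly(vector, known, threshold=0.5):
    """Flag a vector whose best similarity to every known example is below threshold."""
    return max(cosine(vector, k) for k in known) < threshold


embeddings = {
    "ticket-1": [0.9, 0.1, 0.0],
    "ticket-2": [0.88, 0.12, 0.01],
    "ticket-3": [0.0, 0.2, 0.95],
}
print(near_duplicates(embeddings))                                  # [('ticket-1', 'ticket-2')]
print(is_anomaly([-0.7, 0.0, -0.7], list(embeddings.values())))     # True
```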

Retrieval-Augmented Generation (RAG)

RAG deserves its own section because it has become one of the most important patterns in applied AI. The problem it solves is fundamental: large language models are powerful, but their knowledge is frozen at training time, and they cannot access your specific data.

RAG works in two phases. In the indexing phase, you take your documents — internal wikis, product manuals, support articles, contracts, research papers — split them into chunks, embed each chunk using an embedding model, and store the vectors in a vector database along with the original text.
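The chunking part of the indexing phase can be sketched as follows. This is one simple strategy among many, assuming word-based chunks with a fixed overlap; production systems often count tokens instead of words, or split on headings and paragraphs. The embed argument stands in for whatever embedding model you use, and all names here are hypothetical.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to `chunk_size` words. Consecutive chunks
    share `overlap` words, so a sentence cut at a boundary appears in both."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def build_index(documents, embed, chunk_size=200, overlap=40):
    """documents: {doc_id: text}; embed: any callable text -> list[float].
    Stores each chunk's vector together with its original text."""
    records = []
    for doc_id, text in documents.items():
        for n, chunk in enumerate(chunk_words(text, chunk_size, overlap)):
            records.append({"id": f"{doc_id}#{n}", "vector": embed(chunk), "text": chunk})
    return records


print(chunk_words(" ".join(f"w{i}" for i in range(10)), chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```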

In the query phase, when a user asks a question, you embed the question with the same embedding model and retrieve the k most semantically similar chunks from the vector database. You then pass those chunks to a language model along with the question and instruct it to answer based on the provided context. The result is an answer grounded in your actual data, with citations back to the source chunks possible. Hallucination is reduced rather than eliminated: the model can still misread the context or fall back on its training data, and retrieval can miss the relevant chunk entirely.
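A minimal sketch of the query phase, under loud assumptions: toy_embed is a hypothetical stand-in that counts shared words rather than capturing meaning (a real system would call the same embedding model used at indexing time), the in-memory list stands in for a vector database, and the prompt wording is just one possibility. Numbering the sources is what makes citations possible.

```python
import heapq
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a sparse bag-of-words
    vector. It matches shared words only, not meaning."""
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, records, k=3, embed=toy_embed):
    """Embed the question and return the k most similar chunk records."""
    q = embed(question)
    return heapq.nlargest(k, records, key=lambda r: cosine(q, r["vector"]))


def build_prompt(question, chunks):
    """Assemble a grounded prompt with numbered sources for citation."""
    context = "\n".join(f"[{i}] {c['text']}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the sources below and cite them as [n]. "
        "If the answer is not in the sources, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


records = [
    {"text": t, "vector": toy_embed(t)}
    for t in [
        "To cancel your subscription open Settings and choose Billing.",
        "Our office is closed on public holidays.",
        "Refunds are issued within 14 days of cancellation.",
    ]
]
question = "how do I cancel my subscription"
top = retrieve(question, records, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

Note that the toy embedder does not connect "cancel" with "cancellation"; bridging such gaps is exactly what a real embedding model contributes.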

RAG systems require careful engineering around chunking strategy, retrieval quality, context window management, and prompt design — but the core architecture is now well-established and production-proven.

Choosing a Vector Database

The right vector database depends on your scale, infrastructure preferences, performance requirements, and existing stack. Key dimensions to consider:

Scale — how many vectors do you need to store? Options range from in-memory libraries suitable for millions of vectors to distributed databases handling billions.
Query throughput and latency — how many queries per second do you need, and what is your acceptable response time?
Metadata filtering — do you need to combine vector similarity search with attribute filters (e.g., "find similar products from category X")?
Managed vs. self-hosted — do you prefer a managed cloud service or a self-hosted deployment?
Cost — vector database costs can vary significantly. At large scale, storage and query costs become material.
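Metadata filtering can be illustrated with a brute-force pre-filter. This sketch (hypothetical names, made-up two-dimensional vectors) applies the attribute filter first and ranks only the survivors by similarity. Dedicated vector databases generally integrate filtering into the index search itself, since filtering only after an ANN search can leave fewer than k results.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(query, items, k, **filters):
    """Pre-filtering: keep only items whose metadata matches every filter,
    then rank the survivors by cosine similarity to the query."""
    candidates = [
        item for item in items
        if all(item["meta"].get(field) == value for field, value in filters.items())
    ]
    candidates.sort(key=lambda item: cosine(query, item["vector"]), reverse=True)
    return [item["id"] for item in candidates[:k]]


products = [
    {"id": "p1", "vector": [0.9, 0.1], "meta": {"category": "shoes"}},
    {"id": "p2", "vector": [0.8, 0.3], "meta": {"category": "bags"}},
    {"id": "p3", "vector": [0.6, 0.4], "meta": {"category": "shoes"}},
]
# "Find similar products from category X", with X = shoes:
print(filtered_search([1.0, 0.2], products, k=2, category="shoes"))  # ['p1', 'p3']
```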

Popular Vector Database Options

The landscape has evolved rapidly. Here are the main options in 2026:

Pinecone — fully managed, easy to start with, good performance. Best for teams that want a hosted solution without operational overhead.
Weaviate — open-source with a managed cloud offering. Strong on hybrid search (combining vector and keyword search), with optional modules that generate embeddings by calling embedding models at import and query time.
Qdrant — open-source, written in Rust, fast and efficient. Good for self-hosted deployments. Excellent metadata filtering capabilities.
Chroma — lightweight, easy to embed in Python applications. Excellent for development and small-scale production use cases.
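FAISS — a similarity-search library from Meta's AI research group (originally Facebook AI Research). Not a database per se, but a library implementing many indexing algorithms (exact search, IVF, PQ, HNSW, with GPU support), some of which other systems build on. Suitable for high-performance in-process search.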
pgvector — a PostgreSQL extension that adds vector storage and similarity search. If you are already on PostgreSQL and your scale is moderate, this avoids introducing a separate service.

Limitations and Trade-offs

Vector databases are not a silver bullet. Important limitations to understand:

Approximate, not exact. ANN algorithms trade recall for speed. In most applications this is acceptable, but for use cases requiring exhaustive search, you may need exact nearest-neighbour methods that are slower at scale.
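A common way to quantify "approximate" is recall@k: run the same queries through the ANN index and through an exact search, and measure what fraction of the true top-k results the index found. A minimal sketch with hypothetical result lists for a single query:

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbours that the ANN search returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)


# Hypothetical results for one query with k = 5:
exact = ["d1", "d2", "d3", "d4", "d5"]   # from a brute-force scan
approx = ["d1", "d2", "d4", "d5", "d9"]  # from an ANN index
print(recall_at_k(approx, exact))  # 0.8
```

In practice you would average this over a representative sample of queries and tune index parameters until recall is acceptable for your use case.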

Embedding quality determines retrieval quality. A vector database is only as good as the embedding model producing the vectors. Using different embedding models (or model versions) for indexing and querying, whose vector spaces are not comparable, a mismatch between the model's training domain and your content, or generally poor-quality embeddings will all produce poor retrieval results regardless of the database's performance.
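One cheap safeguard against mixing models is to record which model built the index and check it on every write and query. The sketch below is hypothetical: VectorIndex and the model names are illustrative, not a real library API. The name check matters because a dimension check alone cannot tell apart two different models that happen to output vectors of the same size.

```python
from dataclasses import dataclass, field


@dataclass
class VectorIndex:
    """Hypothetical in-memory index that remembers which embedding model built it."""
    model_name: str
    dimension: int
    vectors: dict = field(default_factory=dict)

    def _check(self, vector, model_name):
        if model_name != self.model_name:
            raise ValueError(f"index built with {self.model_name!r}, got {model_name!r}")
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(vector)}")

    def add(self, key, vector, model_name):
        self._check(vector, model_name)
        self.vectors[key] = vector

    def validate_query(self, vector, model_name):
        self._check(vector, model_name)


index = VectorIndex(model_name="model-a", dimension=3)
index.add("doc-1", [0.1, 0.2, 0.3], model_name="model-a")
try:
    index.validate_query([0.3, 0.2, 0.1], model_name="model-b")
except ValueError as err:
    print(err)  # index built with 'model-a', got 'model-b'
```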

Stale data requires re-embedding. When your source documents change, the corresponding embeddings must be updated. Managing this pipeline — detecting changes, re-embedding, updating the index — is an operational responsibility that is easy to overlook in early development.
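Change detection is often done by storing a content hash per document at indexing time and comparing on each sync. A minimal sketch, assuming documents are keyed by a stable ID (the function names and sample documents are hypothetical):

```python
import hashlib


def content_hash(text):
    """Fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(current_docs, indexed_hashes):
    """Compare source documents with the hashes recorded at last indexing.
    Returns (to_embed, to_delete): new or changed ids, and ids that vanished."""
    to_embed = sorted(
        doc_id for doc_id, text in current_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    to_delete = sorted(set(indexed_hashes) - set(current_docs))
    return to_embed, to_delete


indexed = {"faq": content_hash("old answer"), "terms": content_hash("v1"), "legacy": content_hash("x")}
current = {"faq": "new answer", "terms": "v1", "pricing": "plans"}
print(plan_reindex(current, indexed))  # (['faq', 'pricing'], ['legacy'])
```

Only the returned IDs need re-embedding or deletion, so unchanged documents ("terms" here) cost nothing on each sync.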

Not a replacement for traditional databases. Vector databases complement rather than replace relational or document databases. Most production systems use both — the vector database for similarity search, and a traditional database for structured data, user records, and transactions.
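In a typical split, the vector store returns IDs and the relational database supplies the authoritative structured fields. A minimal sketch using Python's built-in sqlite3 module for the relational side and a plain dict standing in for the vector database; the table, IDs, and vectors are made up for illustration.

```python
import math
import sqlite3

# Structured data lives in a relational database (in-memory SQLite for brevity).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id TEXT PRIMARY KEY, name TEXT, price REAL, in_stock INTEGER)")
db.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [("p1", "Trail runner", 120.0, 1), ("p2", "Road racer", 150.0, 0), ("p3", "Hiking boot", 180.0, 1)],
)

# Embeddings live in the vector store (a plain dict stands in for one here).
vector_store = {"p1": [0.9, 0.2], "p2": [0.85, 0.3], "p3": [0.2, 0.9]}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def similar_in_stock(query_vector, k=2):
    # 1) Similarity search ranks ids only.
    ranked = sorted(vector_store, key=lambda pid: cosine(query_vector, vector_store[pid]), reverse=True)
    # 2) The relational database supplies current structured fields and stock status.
    rows = []
    for pid in ranked:
        row = db.execute(
            "SELECT id, name, price FROM products WHERE id = ? AND in_stock = 1", (pid,)
        ).fetchone()
        if row:
            rows.append(row)
        if len(rows) == k:
            break
    return rows


print(similar_in_stock([1.0, 0.25]))  # [('p1', 'Trail runner', 120.0), ('p3', 'Hiking boot', 180.0)]
```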

Key Takeaways

Vectors are numerical representations of meaning; similar meanings produce similar vectors.
Vector databases enable fast approximate nearest-neighbour search across millions of vectors.
RAG (Retrieval-Augmented Generation) uses vector databases to ground LLM responses in your specific data.
Embedding quality is the most important factor — garbage embeddings produce garbage retrieval.
Choose the right tool for your scale: pgvector for small/moderate workloads, dedicated vector DBs for larger scale.