Vector databases are not always the answer
A practical guide to embeddings, hybrid search, pgvector, reranking, and choosing the simplest retrieval architecture that works.

Vector databases became one of the default answers to almost every AI product question.
Building a chatbot over your documents? Use a vector database.
Building semantic search? Use a vector database.
Building RAG? Use a vector database.
Building product recommendations? Use a vector database.
That answer is sometimes correct. It is also incomplete.
A vector database is a useful tool, but it is not the whole retrieval system. It does not fix bad chunking. It does not understand your permissions model. It does not know which source is stale. It does not magically make retrieval accurate. It does not replace keyword search. It does not remove the need for ranking, filtering, evaluation, or good product design.
Most teams do not actually need "a vector database" first.
They need a search system.
That search system may include vectors. It may include Postgres. It may include BM25. It may include metadata filters. It may include reranking. It may include a dedicated vector database later. But if you start with the tool before understanding the retrieval problem, you can easily add infrastructure without improving the product.
This article is a practical guide to that decision.
Not anti-vector database.
Anti-overbuilding.
What vector databases actually solve
A vector database stores and searches embeddings.
An embedding is a list of numbers that represents the meaning of something: a sentence, paragraph, image, product, user profile, document, code snippet, support ticket, or audio clip. Items with similar meaning should end up close together in vector space.
That lets you search by meaning instead of only matching words.
If a user searches for:
How do I reset my password?
A semantic search system can still find documents that say:
Recover access to your account
A keyword search system might miss that because the exact words are different.
That is the real value of vector search.
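To make "close together in vector space" concrete, here is a minimal sketch of exact (brute-force) nearest-neighbor search by cosine similarity. The three-dimensional vectors are hand-made stand-ins, not output from any real embedding model. Real embeddings have hundreds or thousands of dimensions, and a vector index replaces this linear scan with approximate search.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Exact nearest-neighbor search: score every document, keep the top k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy "embeddings", hand-picked for illustration only.
docs = {
    "Recover access to your account": [0.9, 0.1, 0.0],
    "Update your billing address": [0.1, 0.9, 0.1],
    "Export data as CSV": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend embedding of "How do I reset my password?"

top_id, top_score = nearest(query, docs, k=1)[0]
print(top_id)  # Recover access to your account
```

The query and the winning document share no words. They match only because their vectors point in nearly the same direction.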
Vector databases are built to make this kind of search fast at scale.
They usually provide:
Vector storage
Approximate nearest neighbor search
Metadata filtering
Indexing
APIs for insert and query
Scaling and replication
Monitoring and operational tooling
Dedicated products like Pinecone, Weaviate, Qdrant, and Milvus exist because vector workloads can become large and demanding. Pinecone describes itself as a fully managed vector database built for AI, with automatic indexing and fast queries at scale. Qdrant supports dense and sparse vector approaches and documents hybrid search patterns. Weaviate supports vector search, keyword search, and hybrid search using BM25F and vector fusion.
So yes, vector databases solve real problems.
But they solve a specific layer of the problem.
They help you find similar vectors quickly.
That is not the same as building reliable retrieval.
Similarity is not relevance
This is the first trap.
Vector search finds semantic similarity. Product search, document search, and RAG need relevance.
Those are not the same thing.
A result can be semantically similar and still be wrong.
A support article can talk about billing, but not the billing issue the user has.
A code snippet can look similar, but use the wrong framework version.
A policy document can match the question, but be outdated.
A product can be similar, but unavailable in the user's country.
A paragraph can be close in vector space, but miss the exact keyword that matters.
This is why pure vector search often disappoints in production.
It feels impressive in demos because it understands fuzzy intent. But real users search for exact things too.
They search for:
Error codes
Invoice numbers
API names
Product SKUs
Policy IDs
Function names
Legal terms
Version numbers
Customer names
Acronyms
Dense embeddings are not always good at exact matching. A vector search system may understand that two documents are related while still failing to prioritize the one with the exact identifier the user typed.
That is why keyword search still matters.
PostgreSQL's full text search documentation describes the classic search pipeline: convert documents into a searchable tsvector, convert user input into a tsquery, rank results by relevance, and display the matches usefully. That may sound old compared with embeddings, but it solves a real problem that embeddings do not replace.
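Postgres ranks tsvector matches with its own functions, which are not BM25. For a self-contained picture of lexical scoring, here is a minimal BM25 sketch instead. BM25 is the keyword ranking function that hybrid setups like Weaviate's build on. The tokenizer is a deliberately naive assumption (lowercase; keep letters, digits, and hyphens), and k1 = 1.2, b = 0.75 are common defaults, not tuned values.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): lowercase, keep letters, digits, and hyphens.
    return re.findall(r"[a-z0-9-]+", text.lower())

class BM25:
    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.lengths = {doc_id: sum(tf.values()) for doc_id, tf in self.tfs.items()}
        self.avg_len = sum(self.lengths.values()) / len(self.lengths)
        # Document frequency: how many documents contain each term.
        self.df = Counter(term for tf in self.tfs.values() for term in tf)
        self.n = len(docs)

    def idf(self, term: str) -> float:
        # Lucene-style IDF, which stays positive even for common terms.
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query: str, doc_id: str) -> float:
        tf = self.tfs[doc_id]
        # Length normalization: long documents need more matches to score as high.
        norm = self.k1 * (1 - self.b + self.b * self.lengths[doc_id] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        scored = [(doc_id, self.score(query, doc_id)) for doc_id in self.tfs]
        scored = [pair for pair in scored if pair[1] > 0]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

docs = {
    "kb-1": "Error E-4012 means the payment token expired. Refresh the token.",
    "kb-2": "Payments can fail for many reasons, including network issues.",
    "kb-3": "How to rotate API keys for your organization.",
}
index = BM25(docs)
print(index.search("E-4012 payment failed"))
```

Only kb-1 scores here, because the exact identifier e-4012 appears nowhere else. The sketch also shows the flip side: without stemming, "failed" does not match "fail" and "payment" does not match "payments". Real engines add stemming, and vector search covers that kind of gap.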
Search is not one technique.
It is a ranking problem.
And ranking usually needs more than one signal.
Hybrid search is usually the better default
The better default for many products is hybrid search.
Hybrid search combines semantic search with keyword search. The goal is simple: use vectors to understand meaning and use lexical search to preserve exact matching.
Weaviate's documentation describes hybrid search as combining vector search and keyword search, often BM25, then fusing the result sets. Qdrant describes hybrid search as combining dense vectors for semantic understanding with sparse vectors for precise word matching. Elastic recommends reciprocal rank fusion, or RRF, for combining semantic and lexical result rankings.
The reason is practical.
Users do not search in one mode.
Sometimes they describe meaning.
Sometimes they type exact words.
Sometimes they do both.
Hybrid search handles both sides better.
| Query type | Keyword search | Vector search | Hybrid search |
|---|---|---|---|
| Exact error code | Strong | Weak to medium | Strong |
| Natural language question | Medium | Strong | Strong |
| Product SKU | Strong | Weak | Strong |
| Synonyms | Weak | Strong | Strong |
| Acronyms | Strong if indexed | Unreliable | Stronger |
| Conceptual search | Medium | Strong | Strong |
| Compliance wording | Strong | Risky alone | Stronger |
This matters a lot for RAG.
A language model can only answer well if the retrieval system gives it the right context. If retrieval misses the exact document, the final answer may still sound confident. That is worse than a normal search failure because the user may not know retrieval failed.
Hybrid search reduces that risk.
It does not guarantee correctness.
But it gives your system more ways to find the right thing.
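The reciprocal rank fusion approach mentioned above is simple enough to sketch in a few lines. Each document gets the sum of 1 / (k + rank) over every result list it appears in. That means it only needs ranks, not comparable scores from the keyword and vector engines. k = 60 is a commonly used default, not a tuned value, and the document IDs are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs (best first) with RRF."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Python's sort is stable, so ties keep first-seen order.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

keyword_results = ["kb-1", "kb-7", "kb-3"]  # e.g. from full text search
vector_results = ["kb-2", "kb-1", "kb-5"]   # e.g. from embedding similarity

fused = reciprocal_rank_fusion([keyword_results, vector_results])
print([doc_id for doc_id, _ in fused])  # ['kb-1', 'kb-2', 'kb-7', 'kb-3', 'kb-5']
```

kb-1 comes out on top because both retrievers found it. Agreement between signals is exactly what fusion rewards.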
Reranking is where quality often improves
Many teams stop at vector search.
A better retrieval pipeline usually has at least two stages.
First, retrieve a broad set of candidates.
Then rerank them.
The first stage needs speed. The second stage needs quality.
This is common in modern search systems because the fastest retrieval method is not always the best final judge.
A vector index is optimized for quickly finding approximate neighbors. A reranker can compare the query and candidate documents more carefully. It can account for task-specific relevance, exact wording, freshness, source quality, and other signals.
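A minimal sketch of the two-stage pattern follows. A cheap first stage keeps a broad shortlist, and a more careful scorer reorders it. `rerank_score` is a hypothetical heuristic standing in for a real reranker such as a cross-encoder model. Its weights and the `is_current` and `source_quality` fields are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    text: str
    first_stage_score: float  # e.g. vector similarity from the index
    is_current: bool          # hypothetical freshness flag
    source_quality: float     # hypothetical 0..1 trust score

def first_stage(candidates: list[Candidate], n: int) -> list[Candidate]:
    """Stage 1 (speed): keep the top n by the cheap first-stage score."""
    return sorted(candidates, key=lambda c: c.first_stage_score, reverse=True)[:n]

def rerank_score(query: str, c: Candidate) -> float:
    """Stage 2 (quality): hypothetical stand-in for a learned reranker."""
    query_terms = set(query.lower().split())
    doc_terms = set(c.text.lower().split())
    exact_overlap = len(query_terms & doc_terms) / max(len(query_terms), 1)
    return (0.5 * c.first_stage_score
            + 0.3 * exact_overlap
            + 0.1 * c.source_quality
            + 0.1 * (1.0 if c.is_current else 0.0))

def retrieve_and_rerank(query: str, candidates: list[Candidate], n: int = 50, k: int = 3) -> list[Candidate]:
    shortlist = first_stage(candidates, n)
    shortlist.sort(key=lambda c: rerank_score(query, c), reverse=True)
    return shortlist[:k]

candidates = [
    Candidate("old-guide", "configure sso with saml v1", 0.92, False, 0.5),
    Candidate("new-guide", "configure sso with saml v2", 0.90, True, 0.9),
    Candidate("blog", "why single sign-on matters", 0.88, True, 0.3),
]
print([c.doc_id for c in retrieve_and_rerank("configure saml v2", candidates, k=2)])
# ['new-guide', 'old-guide']
```

The vector stage alone would put the outdated v1 guide first. The second stage has room to weigh exact terms, freshness, and source quality.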
Reranking is especially useful when:
Many chunks are semantically similar
Documents are long
Search results come from multiple sources
Queries are ambiguous
The answer depends on exact context
You need fewer but higher quality chunks for an LLM
Without reranking, teams often stuff too many chunks into the context window. That increases cost and can reduce answer quality because the model has to sort through noisy context.
A smaller set of better chunks is usually better than a larger set of weak chunks.
This is one reason "just use a vector database" is not enough.
The vector database may get you candidates.
It does not finish the ranking job by itself.
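After reranking, the last step is deciding how much to send to the model. Here is a minimal sketch of "fewer but better chunks": take chunks best-first, stop at a score floor, and stop before a token budget overflows. Counting tokens as whitespace-separated words is a rough assumption. A real system would use the model's own tokenizer, and the threshold and budget values are illustrative.

```python
def pack_context(ranked: list[tuple[str, float]], max_tokens: int, min_score: float) -> list[str]:
    """Keep high-scoring chunks, best first, until the token budget would overflow."""
    selected: list[str] = []
    used = 0
    for text, score in ranked:  # assumed sorted best-first by reranker score
        if score < min_score:
            break               # everything after this is weaker still
        cost = len(text.split())  # rough token estimate (assumption)
        if used + cost > max_tokens:
            break
        selected.append(text)
        used += cost
    return selected

ranked = [
    ("SSO setup requires an admin role.", 0.91),
    ("SAML v2 metadata must be uploaded first.", 0.84),
    ("Our company was founded in 2015.", 0.22),
]
print(pack_context(ranked, max_tokens=50, min_score=0.5))
# ['SSO setup requires an admin role.', 'SAML v2 metadata must be uploaded first.']
```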
pgvector is enough more often than people think
If your application already uses Postgres, start by seriously considering pgvector.
pgvector adds vector similarity search to Postgres. Its official repository describes support for exact and approximate nearest neighbor search, including HNSW and IVFFlat indexes. This means you can store embeddings next to relational data and query them with SQL.
That is a big deal.
For many teams, the hardest part of retrieval is not vector math. It is product logic.
You need to filter by tenant, user permissions, document status, language, region, product, organization, plan, freshness, and visibility.
That data often already lives in Postgres.
Keeping vectors in the same database can simplify the system.
The advantages are boring but important.
| Advantage | Why it matters |
|---|---|
| One database | Less infrastructure to operate |
| SQL joins | Combine vector results with business data |
| Transactions | Keep documents and embeddings consistent |
| Existing backups | Use the backup strategy you already trust |
| Existing permissions | Easier tenant and access filtering |
| Lower operational load | Fewer systems for a small team to manage |
This is especially useful for:
Internal tools
Early RAG systems
SaaS knowledge search
Support search
Product catalogs with moderate scale
Documentation search
Multi-tenant apps that already depend on Postgres permissions
Teams that do not yet know their retrieval workload shape
Does pgvector replace Pinecone, Qdrant, Weaviate, Milvus, or OpenSearch for every use case?
No.
But it can delay the need for another moving part. That delay has value.
Every new database adds operational work: migrations, backups, security, observability, access control, local development, incident response, data consistency, and cost monitoring.
If Postgres can serve the first version well, start there.
You can move later when the problem is clearer.
A dedicated vector database makes sense at real scale
Dedicated vector databases exist for a reason.
There are situations where Postgres is not the right retrieval engine anymore.
You may need a dedicated vector database when:
You have tens or hundreds of millions of vectors
You need very low latency at high query volume
You need heavy concurrent writes and reads
You need vector-native scaling and replication
You need advanced filtering at vector scale
You need multi-modal search across text, image, audio, or video
You need managed operations for a large retrieval workload
You need search features your current database cannot provide cleanly

The mistake is not choosing a dedicated vector database.
The mistake is choosing one before you know why.
If the reason is "we are building AI", that is not enough.
Better reasons sound like this:
Our vector index no longer fits our latency target.
We need higher recall at much larger scale.
We need operational isolation from the primary database.
Our retrieval workload is growing faster than the app database.
We need hybrid search features that are better supported elsewhere.
We need managed scaling because our team cannot operate this layer.
That is a real decision.
Not hype.
Chunking can matter more than the database
Many RAG systems fail before the vector database is even involved.
They fail during chunking.
Chunking is the process of splitting documents into smaller pieces before embedding them. If chunks are too small, they lose context. If chunks are too large, they become noisy and expensive. If chunks ignore document structure, retrieval becomes messy.
A vector database can only search what you give it.
Bad chunks create bad retrieval.
Good chunking usually respects structure.
For example:
Keep headings with their sections.
Avoid splitting tables carelessly.
Preserve code blocks.
Keep policy clauses intact.
Add document title and source metadata.
Include timestamps and version information.
Consider parent-child retrieval for long documents.
Bad chunking looks like this:
Split every 500 characters blindly.
Remove headings.
Mix unrelated sections.
Drop source metadata.
Ignore document versions.
Embed duplicate or stale content.
This is why teams can switch vector databases and see no improvement.
The problem was never the database.
The problem was the shape of the indexed knowledge.
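Several of the structure-preserving rules listed above can be sketched in a small chunker. It splits on markdown-style `#` headings, keeps each heading with its section, and splits long sections on paragraph boundaries rather than every N characters. It also attaches title and version metadata. This is a hypothetical sketch assuming markdown-like input. It does not attempt the harder cases, such as keeping tables and code blocks intact.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    metadata: dict

def _make_chunk(title: str, heading: str, paragraphs: list[str], version: str) -> Chunk:
    # Prefix title and heading so the chunk still makes sense on its own.
    text = f"{title} > {heading}\n\n" + "\n\n".join(paragraphs)
    return Chunk(text, {"title": title, "heading": heading, "version": version})

def chunk_by_headings(doc_title: str, body: str, version: str, max_words: int = 200) -> list[Chunk]:
    """Split on '#' headings, then split long sections on paragraph boundaries."""
    sections: list[tuple[str, list[str]]] = []
    heading, lines = "", []
    for line in body.splitlines():
        if line.startswith("#"):
            if lines:
                sections.append((heading, lines))
            heading, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    if lines:
        sections.append((heading, lines))

    chunks: list[Chunk] = []
    for heading, lines in sections:
        paragraphs = [p.strip() for p in "\n".join(lines).split("\n\n") if p.strip()]
        current: list[str] = []
        for para in paragraphs:
            # Start a new chunk rather than cut a paragraph in half.
            if current and len(" ".join(current + [para]).split()) > max_words:
                chunks.append(_make_chunk(doc_title, heading, current, version))
                current = []
            current.append(para)
        if current:
            chunks.append(_make_chunk(doc_title, heading, current, version))
    return chunks

body = "# Reset password\nGo to Settings.\n\nClick Reset.\n# Billing\nInvoices are monthly."
for chunk in chunk_by_headings("Help Center", body, version="v3"):
    print(chunk.metadata["heading"], "|", chunk.text.splitlines()[0])
```

Each chunk carries its own heading and version, so a retrieved chunk can still be attributed, filtered, and cited after it leaves the original document.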
Metadata is not optional
A vector without metadata is rarely enough.
Real retrieval needs filters.
Metadata lets you ask not only "what is similar?" but "what is similar and allowed, current, relevant, and useful?"
Useful metadata includes:
Tenant ID
User access level
Source system
Document type
Created date
Updated date
Version
Language
Region
Product
Department
Sensitivity level
Status
Authoritative source flag

Metadata is also where security enters retrieval.
If a user cannot access a document directly, your RAG system should not expose it through an LLM answer. This is one of the easiest ways to create an accidental data leak.
The permission filter must happen before the answer is generated.
Not after.
A bad pipeline retrieves everything, lets the model answer, and hopes the answer does not reveal sensitive information.
A safer pipeline filters by access before the model sees the content.
This is another reason Postgres can be a strong starting point. Many applications already model users, roles, organizations, and permissions in Postgres. Keeping retrieval near that data can reduce accidental mismatch.
Dedicated vector databases can also support metadata filtering, but you still need to design the permission model carefully.
A vector database is not an authorization strategy.
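Here is a minimal sketch of "filter before the model sees the content". Access is checked before ranking, so restricted documents never reach the context builder. The tenant-plus-group rule in `can_read` is a hypothetical stand-in for a real permission model, and the relevance scores are assumed to be precomputed.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    doc_id: str
    tenant_id: str
    allowed_groups: set[str]
    score: float  # relevance from retrieval (assumed precomputed here)

@dataclass
class User:
    user_id: str
    tenant_id: str
    groups: set[str] = field(default_factory=set)

def can_read(user: User, doc: Doc) -> bool:
    # Hypothetical rule: same tenant AND at least one shared group.
    return doc.tenant_id == user.tenant_id and bool(doc.allowed_groups & user.groups)

def retrieve_for_user(user: User, docs: list[Doc], k: int = 5) -> list[Doc]:
    # Filter FIRST, then rank: top-k is computed only over documents the user may see.
    visible = [d for d in docs if can_read(user, d)]
    return sorted(visible, key=lambda d: d.score, reverse=True)[:k]

docs = [
    Doc("hr-salaries", "acme", {"hr"}, 0.95),
    Doc("handbook", "acme", {"all-staff"}, 0.80),
    Doc("other-tenant-faq", "globex", {"all-staff"}, 0.99),
]
alice = User("alice", "acme", {"all-staff", "engineering"})
print([d.doc_id for d in retrieve_for_user(alice, docs)])  # ['handbook']
```

In a real system the same condition belongs inside the query itself, as a SQL `WHERE` clause next to the vector ordering or as the vector database's metadata filter. Filtering a top-k list after the fact can silently return too few results.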
Freshness beats cleverness
RAG systems often fail because they retrieve old information.
The answer may be semantically perfect and practically wrong.
A pricing document from last year.
An old API guide.
A deprecated policy.
A support article that was replaced.
A legal clause that only applies to one region.
A vector database will not automatically understand that one document is stale unless you model freshness.
Add freshness signals.
| Signal | Why it matters |
|---|---|
| created_at | When content entered the system |
| updated_at | Whether the content changed recently |
| effective_from | When a policy starts applying |
| effective_until | When a policy expires |
| version | Which version is authoritative |
| status | Draft, published, deprecated, archived |
| source_priority | Which source wins during conflict |
Freshness should affect ranking and filtering.
Sometimes you should keep older documents, but label them clearly.
For example, if a user asks about policy history, old documents are relevant. If the user asks what rule applies now, old documents are dangerous.
Search systems need product logic.
Vector similarity alone cannot answer "which source should win?"
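The policy example above can be sketched with the `status`, `effective_from`, and `effective_until` signals from the freshness table. A "current" query keeps only documents in effect on a given date. A "history" query keeps older versions but labels them. The `PolicyDoc` type, the two modes, and treating `effective_until` as exclusive are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class PolicyDoc:
    doc_id: str
    status: str                      # "draft", "published", "deprecated", "archived"
    effective_from: date
    effective_until: Optional[date]  # exclusive end date; None means open-ended
    similarity: float

def is_in_effect(doc: PolicyDoc, as_of: date) -> bool:
    ends_after = doc.effective_until is None or as_of < doc.effective_until
    return doc.status == "published" and doc.effective_from <= as_of and ends_after

def rank_policies(docs: list[PolicyDoc], as_of: date, mode: str = "current") -> list[tuple[str, str]]:
    if mode == "current":
        # "What rule applies now?": only documents in effect on as_of.
        candidates = [d for d in docs if is_in_effect(d, as_of)]
    elif mode == "history":
        # "How did this policy change?": keep old versions, never drafts.
        candidates = [d for d in docs if d.status != "draft"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    ranked = sorted(candidates, key=lambda d: (d.similarity, d.effective_from), reverse=True)
    # Label every result so older versions are never presented as current.
    return [(d.doc_id, "in effect" if is_in_effect(d, as_of) else "not in effect") for d in ranked]

docs = [
    PolicyDoc("travel-2023", "deprecated", date(2023, 1, 1), date(2024, 1, 1), 0.93),
    PolicyDoc("travel-2024", "published", date(2024, 1, 1), None, 0.90),
    PolicyDoc("travel-2025-draft", "draft", date(2025, 1, 1), None, 0.91),
]
as_of = date(2024, 6, 1)
print(rank_policies(docs, as_of, mode="current"))  # [('travel-2024', 'in effect')]
print(rank_policies(docs, as_of, mode="history"))
# [('travel-2023', 'not in effect'), ('travel-2024', 'in effect')]
```

The most similar document, the 2023 policy, is exactly the one a "what applies now" question must not return.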
The retrieval stack has layers
A serious retrieval system has more layers than most diagrams show.
The vector database is only one box.
Each layer can fail.
| Layer | Common failure |
|---|---|
| Ingestion | Missing or duplicate documents |
| Cleaning | Important structure removed |
| Chunking | Context split badly |
| Embedding | Wrong model or outdated embeddings |
| Storage | Missing metadata or permissions |
| Retrieval | Wrong candidates |
| Filtering | Data leakage or over-filtering |
| Fusion | Bad ranking blend |
| Reranking | Cost or latency too high |
| Context building | Too much noise sent to model |
| Generation | Hallucination or unsupported answer |
| Evaluation | No feedback loop |
When a RAG system performs badly, do not immediately blame the vector database.
Inspect the pipeline.
Most failures hide upstream.
Evaluate retrieval before changing databases
A common mistake is to switch infrastructure when the team has not measured retrieval quality.
Before moving from Postgres to a vector database, or from one vector database to another, build an evaluation set.
You need examples.
Start with 100 to 300 realistic queries.
For each query, label:
The ideal document
The ideal chunk
Acceptable alternative documents
Bad but tempting documents
Whether exact keywords matter
Whether freshness matters
Whether permissions matter
Then measure the retrieval system.
Useful metrics include:
| Metric | What it tells you |
|---|---|
| Recall@k | Did the right result appear in the top k? |
| Precision@k | How noisy are the top results? |
| MRR | How high does the first good result appear, averaged across queries? |
| nDCG | How good is the whole ranked list, giving more credit to relevant results near the top? |
| Latency | Can users wait for it? |
| Cost | Is it affordable at expected traffic? |
| Permission errors | Did restricted content leak? |
| Freshness errors | Did stale content rank too high? |
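The ranking metrics in the table are small functions once you have labeled queries. Here is a minimal sketch: binary relevance for Recall@k, Precision@k, and MRR, and graded relevance for nDCG. It uses linear gains, one of two common nDCG formulations; the other uses 2^grade - 1. The document IDs and grades are made up for illustration.

```python
import math

def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    # Assumes at least one relevant document per query.
    return len(set(ranked[:k]) & relevant) / len(relevant)

def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    return len(set(ranked[:k]) & relevant) / k

def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)

def ndcg_at_k(ranked: list[str], grades: dict[str, int], k: int) -> float:
    def dcg(gains: list[int]) -> float:
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    actual = dcg([grades.get(d, 0) for d in ranked[:k]])
    ideal = dcg(sorted(grades.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0

ranked = ["doc-9", "doc-2", "doc-4", "doc-7"]  # system output for one query
relevant = {"doc-2", "doc-7"}                  # labeled good results
grades = {"doc-2": 2, "doc-7": 1}              # doc-2 is the ideal chunk

print(recall_at_k(ranked, relevant, 3))     # 0.5
print(reciprocal_rank(ranked, relevant))    # 0.5
print(round(ndcg_at_k(ranked, grades, 3), 3))  # 0.48
```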
This gives you a basis for decisions.
If recall is bad, maybe chunking or embeddings are wrong.
If exact matches are bad, add keyword search.
If good results are retrieved but ranked poorly, add reranking.
If latency is bad at scale, improve indexes or evaluate a dedicated vector database.
If stale results appear, fix metadata and freshness ranking.
If restricted content appears, fix permissions immediately.
Without evals, architecture discussions become opinions.
With evals, they become engineering.
Cost is part of the search design
Vector search has costs that are easy to ignore early.
There is embedding cost.
There is storage cost.
There is indexing cost.
There is query cost.
There is reranking cost.
There is context window cost when retrieved chunks are sent to an LLM.
A bad retrieval system can be expensive even if the database is cheap.
For example, if your search returns noisy chunks, you may send too much context to the model. If you embed duplicate documents, you pay more to store and search them. If your reranker runs on too many candidates, latency and cost rise. If you choose a managed service too early, you may pay for scale you do not yet need.
Cost-aware retrieval design includes:
Deduplicating documents
Not embedding drafts unless needed
Re-embedding only changed content
Keeping chunks meaningful but not huge
Retrieving enough candidates, not too many
Reranking only the top candidates
Caching common queries when appropriate
Measuring context token usage
Using smaller embedding models when quality is acceptable
Starting with existing infrastructure when it is good enough
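Two items on that list, deduplicating and re-embedding only changed content, come down to bookkeeping with content hashes. Here is a minimal sketch, assuming you store the hash recorded when each chunk was last embedded. `plan_embedding_work` is a hypothetical helper. It only decides what needs embedding and does not call any embedding API.

```python
import hashlib

def content_hash(text: str) -> str:
    # Normalize whitespace so trivial formatting changes don't trigger re-embedding.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def plan_embedding_work(chunks: dict[str, str], stored_hashes: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (chunk_ids_to_embed, duplicate_chunk_ids).

    chunks: chunk_id -> current text
    stored_hashes: chunk_id -> hash recorded when that chunk was last embedded
    """
    to_embed: list[str] = []
    duplicates: list[str] = []
    seen: set[str] = set()
    for chunk_id, text in chunks.items():
        h = content_hash(text)
        if h in seen:
            duplicates.append(chunk_id)  # same content as an earlier chunk in this batch
            continue
        seen.add(h)
        if stored_hashes.get(chunk_id) != h:
            to_embed.append(chunk_id)    # new, or changed since the last run
    return to_embed, duplicates

stored = {"a": content_hash("Refunds take 5 days."), "b": content_hash("Old text")}
chunks = {
    "a": "Refunds  take 5 days.",  # whitespace-only change: skip
    "b": "New text",                # changed: re-embed
    "c": "Brand new chunk",         # new: embed
    "d": "New text",                # duplicate of "b": skip
}
print(plan_embedding_work(chunks, stored))  # (['b', 'c'], ['d'])
```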
This is where "vector databases are not always the answer" connects to architecture.
The best retrieval system is not the fanciest one.
It is the one that gives good enough results at a cost and complexity your team can operate.
A practical decision framework
Here is a simple way to decide.
Start with the question:
What kind of search problem do we actually have?
Then choose based on real conditions.
| Situation | Good default |
|---|---|
| Small internal RAG | Postgres plus pgvector |
| Existing SaaS app on Postgres | pgvector plus full text search |
| Documentation search | Hybrid search, often Postgres first |
| Product search | Hybrid search with business ranking |
| Exact code or error search | Keyword search plus optional semantic layer |
| Large-scale semantic search | Dedicated vector database |
| Multi-modal search | Dedicated vector database or search platform |
| Heavy search relevance needs | Search engine plus vectors and reranking |
| Strict permissions | Keep retrieval close to auth data or design filters carefully |
A good rule:
Start with the simplest retrieval system that can be evaluated.
Not the simplest one that feels good.
The simplest one that can be measured.
If pgvector plus Postgres full text search gets strong retrieval metrics, keep it. If it fails for clear reasons, upgrade the right layer.
Common mistakes teams make
Here are the mistakes I see most often.
Mistake 1: Treating RAG as vector search plus prompt
RAG is not just retrieve chunks and paste them into a prompt.
It is ingestion, cleaning, chunking, embedding, indexing, retrieval, filtering, ranking, context construction, generation, citation, evaluation, and monitoring.
Skipping those layers creates fragile systems.
Mistake 2: Ignoring keyword search
Vector search is not a replacement for exact matching.
If users search for error codes, order numbers, SKUs, function names, or policy clauses, keyword search matters.
Mistake 3: No permissions at retrieval time
Filtering after generation is too late.
The model should not see documents the user cannot access.
Mistake 4: No eval dataset
Without labeled queries, you cannot tell whether a database migration, embedding model change, chunking change, or reranking step improved anything.
Mistake 5: Bad metadata
Metadata is how retrieval becomes useful in real products.
Without metadata, you cannot filter by tenant, source, freshness, language, status, or permissions.
Mistake 6: Overusing the context window
Long context windows are helpful, but they do not remove the need for retrieval quality.
More context can mean more noise.
Mistake 7: Choosing infrastructure before understanding workload
A dedicated vector database may be the right answer.
But not because a tutorial used one.
Choose it when scale, latency, features, or operations justify it.
What a sane first version looks like
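For many teams, a sane first version looks like this:
Postgres as the main application database
pgvector for embeddings stored next to the source rows
Postgres full text search for exact terms
Metadata and permission filters in the same SQL query
A simple fusion step, such as reciprocal rank fusion, to combine keyword and vector results
A small labeled eval set to measure retrieval quality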
This version is not glamorous.
It is useful.
It gives you:
Fewer systems
SQL joins
Permission-aware retrieval
Hybrid search
Measurable results
A migration path later
Start here when the scale is reasonable and your data already lives in Postgres.
Then improve based on evidence.
Possible next steps:
Add better chunking
Add query rewriting
Add reranking
Add feedback capture
Add freshness ranking
Add source quality weights
Add caching for common queries
Move hot retrieval paths to a dedicated search system
Move vectors to a dedicated vector database if needed
This is how you avoid premature infrastructure.
You do not refuse complexity.
You make it earn its place.
When the dedicated vector database wins
A dedicated vector database wins when vector search is not a feature inside your product, but a major workload of your product.
For example:
An AI search product
A recommendation platform
A multi-modal media search engine
A large RAG platform serving many teams
A marketplace with huge semantic search volume
A security analytics product using embeddings at scale
A company-wide knowledge platform with strict latency targets
At that point, the vector layer deserves specialized infrastructure.
You may want:
Better vector indexing at scale
Easier horizontal scaling
Managed operations
Vector-native APIs
Advanced hybrid search
Better observability for retrieval
Separation from the transactional database
Search-specific replication and availability
That is a good reason.
The point is not "never use Pinecone" or "always use pgvector".
The point is to avoid confusing categories.
Postgres is a great application database with vector search capabilities.
A dedicated vector database is a specialized retrieval system.
A search engine is a ranking and discovery system.
A RAG pipeline is an application architecture.
Those are related, but they are not the same thing.
The architecture should match the product
Search for a legal document is different from search for a sneaker.
Search for code is different from search for support tickets.
Search for medical policies is different from search for blog posts.
Search for internal company knowledge is different from public ecommerce search.
The architecture should match the risk, user expectation, and data shape.
| Product type | Retrieval priority |
|---|---|
| Support assistant | Correct source, freshness, escalation path |
| Developer docs | Exact API names, versions, examples |
| Ecommerce | Availability, price, personalization, ranking |
| Legal docs | Exact wording, citation, jurisdiction, version |
| Internal knowledge | Permissions, source quality, freshness |
| Code search | Symbols, paths, identifiers, semantic similarity |
| Media search | Multi-modal embeddings and metadata |
This is why generic RAG tutorials can be dangerous.
They make every retrieval problem look the same.
Real retrieval is domain-specific.
A legal assistant should be conservative.
A shopping search can be fuzzy.
A code assistant needs exact symbols.
A support assistant needs escalation logic.
A vector database cannot decide those priorities for you.
Your architecture has to encode them.
The boring answer is usually better
The boring answer is not "do not use vectors".
The boring answer is:
Use keyword search where exact matching matters.
Use vector search where meaning matters.
Use metadata where product rules matter.
Use reranking where quality matters.
Use evals where decisions matter.
Use a dedicated vector database when scale or features justify it.
Use Postgres when it is good enough and already central to your app.
That is less exciting than a new database logo.
It is also how better systems get built.
The teams that win with RAG and semantic search will not be the teams that collected the most AI infrastructure. They will be the teams that understood their retrieval problem clearly.
They will know what users search for.
They will know which documents are trusted.
They will know how permissions work.
They will know whether exact terms matter.
They will know when freshness beats similarity.
They will know how to measure quality.
Only then will the database choice become obvious.
Vector databases are useful.
They are just not always the answer.
Sometimes the answer is Postgres.
Sometimes it is hybrid search.
Sometimes it is better metadata.
Sometimes it is reranking.
Sometimes it is better chunking.
Sometimes it is an eval set.
And sometimes, yes, it is a dedicated vector database.
The hard part is knowing which problem you actually have.
That is where architecture starts.



