Weaviate

Open-source vector database with built-in vectorization modules, native hybrid search, and Kubernetes-ready deployment.

Overview

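Weaviate is an open-source (BSD-3-Clause) vector database written in Go that differentiates itself through built-in vectorization modules, native hybrid search (BM25 + vector), and a GraphQL API. Unlike Pinecone (managed-only) or Qdrant (bring-your-own-embeddings), Weaviate can generate embeddings internally via pluggable modules, so applications can send raw text and skip a separate embedding pipeline. Modules such as text2vec-openai still call the provider's API at import and query time, while modules such as text2vec-transformers run a model container alongside the database.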

For DevOps and platform teams, Weaviate is notable for its Kubernetes-native architecture: it ships as a Helm chart with StatefulSet deployment, configurable replication, and multi-tenancy support. This makes it a natural choice for teams that want full control over data residency and infrastructure.

Architecture

┌────────────────────────────────────────────────────────┐
│                   Weaviate Cluster                      │
│                                                         │
│  ┌───────────────────────────────────────────────────┐ │
│  │                 API Layer                          │ │
│  │  ┌──────────┐  ┌───────────┐  ┌───────────────┐  │ │
│  │  │ REST API │  │ GraphQL   │  │ gRPC API      │  │ │
│  │  │          │  │ API       │  │ (high-perf)   │  │ │
│  │  └──────────┘  └───────────┘  └───────────────┘  │ │
│  └───────────────────────┬───────────────────────────┘ │
│                          │                              │
│  ┌───────────────────────┼───────────────────────────┐ │
│  │              Core Engine                           │ │
│  │  ┌──────────────┐  ┌────────────────────────────┐ │ │
│  │  │ HNSW Vector  │  │ Inverted Index (BM25)      │ │ │
│  │  │ Index        │  │ Keyword search             │ │ │
│  │  └──────────────┘  └────────────────────────────┘ │ │
│  │                                                    │ │
│  │  ┌──────────────┐  ┌────────────────────────────┐ │ │
│  │  │ Object Store │  │ Metadata / Property Index  │ │ │
│  │  │ (LSM)        │  │                            │ │ │
│  │  └──────────────┘  └────────────────────────────┘ │ │
│  └────────────────────────────────────────────────────┘ │
│                                                         │
│  ┌───────────────────────────────────────────────────┐ │
│  │           Module Layer                             │ │
│  │  ┌──────────────┐  ┌────────────┐  ┌───────────┐ │ │
│  │  │ text2vec-    │  │ text2vec-  │  │ img2vec-  │ │ │
│  │  │ openai       │  │ transformers  │ neural    │ │ │
│  │  ├──────────────┤  ├────────────┤  ├───────────┤ │ │
│  │  │ generative-  │  │ multi2vec- │  │ reranker- │ │ │
│  │  │ openai       │  │ clip       │  │ cohere    │ │ │
│  │  └──────────────┘  └────────────┘  └───────────┘ │ │
│  └───────────────────────────────────────────────────┘ │
│                                                         │
│  ┌───────────────────────────────────────────────────┐ │
│  │     Clustering & Multi-tenancy                     │ │
│  │  • Sharding across nodes                           │ │
│  │  • Configurable replication factor                 │ │
│  │  • Tenant isolation (hot/cold/frozen states)       │ │
│  └───────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────┘

Key Architecture Features

Feature	Description
Vectorizer Modules	Built-in embedding generation via pluggable modules (OpenAI, Cohere, Hugging Face, local transformers)
Hybrid Search	Native combination of BM25 keyword search and vector similarity in a single query
GraphQL API	Complex queries with filters, aggregations, and cross-references
Multi-tenancy	Native tenant isolation with per-tenant activity states (hot, cold, frozen)
Replication	Configurable replication factor for read throughput and fault tolerance
Generative Modules	Built-in RAG — retrieve and generate in a single API call

Use Cases

Self-Hosted RAG with Built-in Vectorization

Eliminate the external embedding pipeline:

import os

import weaviate
import weaviate.classes.config as wc
from weaviate.classes.query import Filter

# text2vec-openai and generative-openai call the OpenAI API, so the key must be
# available to Weaviate, either configured on the server or sent as a header
client = weaviate.connect_to_local(
    headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
)

# Create collection with built-in vectorizer
collection = client.collections.create(
    name="KnowledgeBase",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small",
    ),
    generative_config=wc.Configure.Generative.openai(
        model="gpt-4o",
    ),
    properties=[
        wc.Property(name="content", data_type=wc.DataType.TEXT),
        wc.Property(name="source", data_type=wc.DataType.TEXT),
        wc.Property(name="department", data_type=wc.DataType.TEXT),
    ],
)

# Insert objects; Weaviate generates embeddings automatically
collection.data.insert({"content": "Kubernetes HPA scales pods...", "source": "docs"})

# Hybrid search (vector + keyword), restricted to objects whose source is "docs"
results = collection.query.hybrid(
    query="how does pod autoscaling work",
    alpha=0.75,  # 0.75 = 75% vector, 25% keyword
    limit=5,
    filters=Filter.by_property("source").equal("docs"),
)
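
To make the alpha parameter more concrete, the following is a minimal sketch of score blending in plain Python. It assumes a relative-score style of fusion (min-max normalize each result set, then take a weighted sum). The function names normalize and fuse and the sample scores are hypothetical, and Weaviate's own fusion algorithms may differ in detail.

```python
def normalize(scores):
    """Min-max scale a {doc_id: score} map into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(vector_scores, keyword_scores, alpha=0.75):
    """Blend normalized scores: alpha weights vector, (1 - alpha) weights keyword."""
    v = normalize(vector_scores)
    k = normalize(keyword_scores)
    docs = set(v) | set(k)
    # A document missing from one result set contributes 0 for that side
    combined = {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0) for d in docs}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores (higher is better in both maps)
vector_scores = {"doc-a": 0.92, "doc-b": 0.80, "doc-c": 0.55}
keyword_scores = {"doc-b": 7.1, "doc-c": 9.4}

ranked = fuse(vector_scores, keyword_scores, alpha=0.75)
```

With alpha=0.75 the best vector match (doc-a) ranks first even though it has no keyword hit. Lowering alpha toward 0 moves the best keyword match (doc-c) to the top.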

RAG in a Single API Call

Use generative modules for retrieve-and-generate:

# Search + generate in one call
response = collection.generate.near_text(
    query="kubernetes autoscaling best practices",
    limit=5,
    grouped_task="Summarize these documents into a concise answer about autoscaling.",
)

print(response.generated)  # LLM-generated summary based on retrieved docs
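
Conceptually, a grouped task retrieves the top results and then sends them to the LLM together in one prompt. The sketch below illustrates that flow with stand-in functions. The names grouped_rag, fake_retrieve, and fake_generate are hypothetical, and the prompt layout is an assumption, not the format Weaviate uses internally.

```python
from typing import Callable, List


def grouped_rag(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    task: str,
    limit: int = 5,
) -> str:
    """Retrieve up to `limit` passages, then make one LLM call over all of them."""
    passages = retrieve(query, limit)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = f"{task}\n\nDocuments:\n{context}"
    return generate(prompt)


# Stand-ins for the vector search and the LLM (both hypothetical)
def fake_retrieve(query: str, limit: int) -> List[str]:
    corpus = ["HPA scales pods on CPU.", "VPA adjusts resource requests."]
    return corpus[:limit]


def fake_generate(prompt: str) -> str:
    # Counts the numbered passage markers to show what reached the "model"
    return f"summary of {prompt.count('[')} documents"


answer = grouped_rag("autoscaling", fake_retrieve, fake_generate, task="Summarize.")
```

The point of the design is that the application makes one call and the database handles both steps, so the retrieved text never has to round-trip through the client.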

Multi-Tenant SaaS

Isolate customer data with native multi-tenancy:

from weaviate.classes.tenants import Tenant, TenantActivityStatus

# Enable multi-tenancy on collection
collection = client.collections.create(
    name="CustomerDocs",
    multi_tenancy_config=wc.Configure.multi_tenancy(enabled=True),
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(),
)

# Create tenants
collection.tenants.create([
    Tenant(name="customer-abc", activity_status=TenantActivityStatus.HOT),
    Tenant(name="customer-xyz", activity_status=TenantActivityStatus.HOT),
])

# Search within a specific tenant
tenant_collection = collection.with_tenant("customer-abc")
results = tenant_collection.query.hybrid(query="deployment patterns", limit=5)
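
The isolation and activity-state behavior can be pictured with a toy model in plain Python. This is only an illustration under simplifying assumptions: the TenantStore class, its methods, and the two-state enum are hypothetical and do not reflect how Weaviate stores tenant shards.

```python
from enum import Enum


class Status(Enum):
    HOT = "HOT"    # loaded and queryable
    COLD = "COLD"  # kept on disk, not queryable until reactivated


class TenantStore:
    """Toy model: each tenant owns a separate list of documents."""

    def __init__(self):
        self._docs = {}
        self._status = {}

    def create(self, tenant, status=Status.HOT):
        self._docs[tenant] = []
        self._status[tenant] = status

    def set_status(self, tenant, status):
        self._status[tenant] = status

    def insert(self, tenant, text):
        self._require_hot(tenant)
        self._docs[tenant].append(text)

    def search(self, tenant, term):
        self._require_hot(tenant)
        # Only this tenant's documents are ever scanned
        return [d for d in self._docs[tenant] if term.lower() in d.lower()]

    def _require_hot(self, tenant):
        if self._status[tenant] is not Status.HOT:
            raise RuntimeError(f"tenant {tenant!r} is not active")


store = TenantStore()
store.create("customer-abc")
store.create("customer-xyz")
store.insert("customer-abc", "Blue-green deployment patterns")
store.insert("customer-xyz", "Canary deployment checklist")

abc_hits = store.search("customer-abc", "deployment")
store.set_status("customer-xyz", Status.COLD)  # later searches on this tenant raise
```

A search scoped to customer-abc never sees customer-xyz data, and a tenant moved out of the hot state stops consuming query resources until it is reactivated.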

Kubernetes Production Deployment

Deploy Weaviate as a StatefulSet with Helm:

helm repo add weaviate https://weaviate.github.io/weaviate-helm
helm repo update

# Commas inside a --set value must be escaped (\,); otherwise Helm reads them
# as separators between keys
helm install weaviate weaviate/weaviate \
  --set replicas=3 \
  --set storage.size=100Gi \
  --set env.AUTHENTICATION_APIKEY_ENABLED=true \
  --set env.AUTHENTICATION_APIKEY_ALLOWED_KEYS="read-key\,write-key" \
  --set env.AUTHENTICATION_APIKEY_USERS="reader\,writer" \
  --set env.ENABLE_MODULES="text2vec-openai\,generative-openai\,reranker-cohere" \
  --set env.DEFAULT_VECTORIZER_MODULE="text2vec-openai" \
  --set env.REPLICATION_FACTOR=3
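
After the release is installed, a deployment script may want to wait until Weaviate reports ready before running imports. The sketch below polls the /v1/.well-known/ready endpoint using only the standard library. The helper names is_ready and wait_until_ready, the retry defaults, and the injectable opener and check parameters are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url, opener=urllib.request.urlopen, timeout=2.0):
    """Return True if GET {base_url}/v1/.well-known/ready answers with HTTP 200."""
    try:
        with opener(f"{base_url}/v1/.well-known/ready", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and non-2xx responses all count as "not ready"
        return False


def wait_until_ready(base_url, attempts=30, delay=2.0, check=is_ready):
    """Poll the readiness endpoint; return True as soon as one check succeeds."""
    for _ in range(attempts):
        if check(base_url):
            return True
        time.sleep(delay)
    return False
```

For example, with the service forwarded to a local port (the port and URL here are assumptions about the local setup), calling wait_until_ready("http://localhost:8080") returns True once the node accepts traffic and False if it never becomes ready within the retry budget.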

Pros and Cons

Pros

Open-source — BSD-3 license, full self-hosted control
Built-in vectorization — No external embedding pipeline needed
Native hybrid search — BM25 + vector in a single query
Multi-tenancy — Per-tenant activity states (hot/cold/frozen) for cost management
GraphQL API — Complex queries with filters, aggregations, cross-references
Generative modules — Built-in RAG (retrieve + generate in one call)
Kubernetes-native — Helm chart with StatefulSet, replication, and PVCs

Cons

Memory-intensive — HNSW indexes require significant RAM for large datasets
Operational complexity — Self-hosted requires cluster management expertise
Go codebase — Harder to debug for Python-centric AI teams
Module configuration — Many modules and settings create configuration complexity
Cloud pricing — Weaviate Cloud Services pricing is not transparent
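
To put the memory point in numbers, here is a rough sizing sketch. It assumes float32 vectors and a flat overhead factor of 2 for HNSW graph links and runtime headroom, which is a common rule of thumb and not a measured figure. The function name and defaults are hypothetical, and real usage depends on index settings and any vector compression in use.

```python
def estimate_hnsw_memory_gib(num_vectors, dimensions, bytes_per_value=4, overhead_factor=2.0):
    """Rough RAM estimate: raw vector bytes times a flat overhead factor."""
    raw_bytes = num_vectors * dimensions * bytes_per_value
    return raw_bytes * overhead_factor / (1024 ** 3)


# 10 million 1536-dimensional vectors (the size text-embedding-3-small produces)
estimate = estimate_hnsw_memory_gib(10_000_000, 1536)
print(f"{estimate:.1f} GiB")  # about 114 GiB under these assumptions
```

An estimate like this is mainly useful for choosing node sizes and shard counts before load testing confirms the real footprint.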

Integration with AI Infrastructure

Kubernetes: Deploy on Kubernetes AI infrastructure as a StatefulSet alongside model serving
RAG: Core retrieval component in production RAG systems
Security: Row-level access control for secure LLM pipelines
Monitoring: Query metrics feed into the AI observability stack
