ragwise¶
The pip-installable Python RAG library built for 2026 — hybrid search, retrieval observability, temporal filtering, and agent tools on by default.
No Docker. No background server. No framework lock-in.
import asyncio

from ragwise import RAG, QueryConfig


async def main():
    # The async context manager opens and closes the index for you.
    async with RAG(llm="openai/gpt-4o-mini", reranker="flashrank") as rag:
        result = await rag.ingest("./docs/")
        print(result)  # IngestResult(chunks_created=42, skipped=0, failed_files=[])

        answer = await rag.query("What is the refund policy?")
        print(answer.text)
        print(answer.citations[0].text)   # passage text, not just filename
        print(answer.trace.retrieval_ms)  # 34 — always populated
        print(answer.trace.cost_usd)      # 0.00021


asyncio.run(main())
Hybrid BM25+dense retrieval runs automatically on every query, and `answer.trace` is always populated with no setup needed.
Why ragwise?¶
| Feature | ragwise | LangChain | LlamaIndex | RAGFlow |
|---|---|---|---|---|
| Lines to get started | 4 | 40+ | 20+ | Docker setup |
| Hybrid search by default | ✅ | ❌ | opt-in | ✅ (Docker) |
| pip install, no server | ✅ | ✅ | ✅ | ❌ |
| Async-first | ✅ | partial | partial | ❌ |
| Streaming responses | ✅ | partial | partial | ❌ |
| Retrieval trace (always-on) | ✅ | ❌ | ❌ | ❌ |
| Passage-level citations | ✅ | ❌ | partial | ❌ |
| Temporal filtering (`as_of`) | ✅ | ❌ | ❌ | ❌ |
| Agent tool built-in | ✅ | ❌ | ❌ | ❌ |
| Multi-tenant isolation | ✅ | ❌ | ❌ | ❌ |
| Built-in eval | ✅ | ❌ | partial | ❌ |
Chunking accuracy: matches the FloTorch Feb 2026 benchmark winner at 69% end-to-end accuracy using RecursiveChunker.
Core Features¶
Hybrid search — on by default¶
BM25+dense retrieval fused with Reciprocal Rank Fusion. Catches exact keywords and semantic meaning on every query, no configuration needed.
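As a rough illustration of the fusion step, Reciprocal Rank Fusion scores each document by summing `1 / (k + rank)` over every ranked list it appears in. The sketch below is standalone Python, not ragwise's internal code: the function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from a keyword retriever and a dense retriever.
bm25_hits = ["doc_refunds", "doc_shipping", "doc_terms"]
dense_hits = ["doc_returns", "doc_refunds", "doc_terms"]

fused = rrf_fuse([bm25_hits, dense_hits])
print([doc_id for doc_id, _ in fused])
```

Documents that appear in both lists (`doc_refunds`, `doc_terms`) rise above documents that only one retriever found, which is the behavior that lets hybrid search catch both exact keywords and semantic matches.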
Retrieval observability¶
answer = await rag.query("...")
print(answer.trace.retrieval_ms) # 34
print(answer.trace.cost_usd) # 0.00021
print(answer.citations[0].text) # passage text
print(answer.citations[0].score) # 0.91
Temporal filtering¶
# Query as of any date — no competitor has this
answer = await rag.query(
    "refund policy?",
    config=QueryConfig(as_of="2024-06-15"),
)
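One way to picture an `as_of` query is as a validity-window filter applied before ranking. The following is a minimal sketch under assumed names: the `Chunk` dataclass and the `visible_as_of` helper are hypothetical, and the `valid_from` / `valid_until` fields are borrowed from the Temporal Filtering guide's title rather than from ragwise's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    """Hypothetical chunk record with a validity window."""
    text: str
    valid_from: date
    valid_until: Optional[date] = None  # None means "still valid"


def visible_as_of(chunks: list[Chunk], as_of: str) -> list[Chunk]:
    """Keep chunks whose half-open window [valid_from, valid_until) contains as_of."""
    cutoff = date.fromisoformat(as_of)
    return [
        c for c in chunks
        if c.valid_from <= cutoff and (c.valid_until is None or cutoff < c.valid_until)
    ]


chunks = [
    Chunk("Refunds within 14 days.", date(2023, 1, 1), date(2024, 9, 1)),
    Chunk("Refunds within 30 days.", date(2024, 9, 1)),
]

print([c.text for c in visible_as_of(chunks, "2024-06-15")])
```

With this data, a query as of `2024-06-15` sees only the older 14-day policy, while a query dated after `2024-09-01` would see only the newer one.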
Semantic cache¶
async with RAG(cache=True, cache_threshold=0.92, ...) as rag:
    a = await rag.query("How do refunds work?")
    print(a.trace.cache_hit)  # True on similar queries — <10ms
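The idea behind a similarity threshold such as `cache_threshold=0.92` can be sketched as a nearest-neighbor lookup over cached query embeddings. This is an illustrative toy, not ragwise's implementation: the `SemanticCache` class, the `cosine` helper, and the three-dimensional embeddings are all assumptions.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache keyed by query embedding instead of exact query text."""

    def __init__(self, threshold: float = 0.92) -> None:
        self.threshold = threshold
        self._entries: list[tuple[list[float], str]] = []

    def put(self, embedding: list[float], answer: str) -> None:
        self._entries.append((list(embedding), answer))

    def get(self, embedding: list[float]) -> Optional[str]:
        # Linear scan for the most similar cached query.
        best_score, best_answer = 0.0, None
        for cached_embedding, answer in self._entries:
            score = cosine(cached_embedding, embedding)
            if score > best_score:
                best_score, best_answer = score, answer
        # A hit requires the best match to clear the threshold.
        return best_answer if best_score >= self.threshold else None


cache = SemanticCache(threshold=0.92)
cache.put([0.9, 0.1, 0.0], "Refunds are issued within 14 days.")

near = cache.get([0.88, 0.15, 0.02])  # close paraphrase of the cached query
far = cache.get([0.0, 0.2, 0.9])      # unrelated query
print(near, far)
```

A higher threshold trades hit rate for safety: fewer paraphrases are served from cache, but unrelated questions are less likely to receive a stale answer.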
Streaming¶
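Token streaming in an async-first library generally follows the async-iterator pattern: the caller consumes tokens with `async for` and renders each one as it arrives. The sketch below shows that pattern only. `fake_token_stream` is a hypothetical stand-in generator, not a ragwise call, so refer to the Streaming guide for the library's real method.

```python
import asyncio
from typing import AsyncIterator


async def fake_token_stream(text: str, delay: float = 0.0) -> AsyncIterator[str]:
    """Hypothetical stand-in for a model that emits one token at a time."""
    for token in text.split(" "):
        await asyncio.sleep(delay)  # simulates network / generation latency
        yield token + " "


async def collect(stream: AsyncIterator[str]) -> str:
    """Print tokens as they arrive and return the assembled text."""
    parts: list[str] = []
    async for token in stream:
        print(token, end="", flush=True)  # incremental UI update
        parts.append(token)
    print()
    return "".join(parts).strip()


streamed = asyncio.run(collect(fake_token_stream("Refunds are issued within 14 days.")))
```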
Agent tools (stateful)¶
from ragwise.agent import as_claude_tool_suite
tools = as_claude_tool_suite(rag, max_iterations=5)
# search_documents + get_document_context + check_context_budget
Multi-tenant isolation¶
await rag.ingest("./org_a/", tenant_id="org_a")
await rag.ingest("./org_b/", tenant_id="org_b")
answer = await rag.query("policy?", config=QueryConfig(tenant_id="org_a"))
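Conceptually, tenant isolation means a query can only ever scan the partition that belongs to its own `tenant_id`. The toy store below sketches that guarantee. The `TenantStore` class and its substring search are hypothetical and stand in for whatever filtering ragwise applies internally.

```python
from collections import defaultdict


class TenantStore:
    """Hypothetical in-memory store that partitions chunks by tenant."""

    def __init__(self) -> None:
        self._chunks: dict[str, list[str]] = defaultdict(list)

    def add(self, tenant_id: str, text: str) -> None:
        self._chunks[tenant_id].append(text)

    def search(self, tenant_id: str, term: str) -> list[str]:
        # Only the caller's partition is scanned; other tenants are never read.
        partition = self._chunks.get(tenant_id, [])
        return [text for text in partition if term.lower() in text.lower()]


store = TenantStore()
store.add("org_a", "Org A policy: refunds within 14 days.")
store.add("org_b", "Org B policy: no refunds.")

print(store.search("org_a", "policy"))
```

Partitioning at the storage layer, rather than filtering results after retrieval, keeps another tenant's text out of the candidate set entirely.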
Store upgrade path — no code changes¶
store="memory" # dev / CI — zero setup, volatile
store="lance://./ragwise-index" # persistent dev — no server
store="postgresql://..." # production — pgvector
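Because the backend is selected by a single string, switching stores can be thought of as dispatching on the URI scheme. The sketch below shows one way such a dispatch might look. The `store_backend` function and its return labels are assumptions for illustration, not ragwise's actual resolver.

```python
def store_backend(store: str) -> str:
    """Map a store string to a backend label based on its URI scheme."""
    scheme, separator, _rest = store.partition("://")
    if not separator:
        # No "://" present: only the bare keyword "memory" is recognized here.
        if store == "memory":
            return "memory"
        raise ValueError(f"unrecognized store: {store!r}")
    backends = {"lance": "lance", "postgresql": "pgvector"}
    if scheme not in backends:
        raise ValueError(f"unsupported store scheme: {scheme!r}")
    return backends[scheme]


# The three store strings listed above, resolved to backend labels.
for store in ("memory", "lance://./ragwise-index", "postgresql://..."):
    print(store, "->", store_backend(store))
```

Keeping the choice in one string means it can come from configuration or an environment variable, so application code stays identical across development, CI, and production.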
Next Steps¶
- Getting Started — install, first query, config
- Stores — full store documentation
- API Reference — `RAG`, `QueryConfig`, `Answer`
- Configuration Reference — full `RAGConfig` + `QueryConfig` field table
- Observability — `answer.trace`, citations, `ragwise doctor`
- Temporal Filtering — `as_of`, `valid_from/until`, TTL
- Agent Tools — `AgentSession`, `as_claude_tool_suite`
- Testing Guide — VCR cassettes, `FakeEmbedder`, pytest plugin
- FastAPI Integration — `RAGLifespan`, `Depends(get_rag)`, streaming
- Guides: Hybrid Search — how BM25+dense+RRF works
- Guides: Streaming — token streaming for UIs
- Guides: Evaluation — measure and gate retrieval quality