Play 11: Knowledge Base Q&A

Play 11 Complete Implementation Guide

Full walkthrough: document audit, vector DB setup, internal knowledge portal, semantic search, citation system.

You need a knowledge base that actually works. Not a document graveyard where information goes to die, but a system that surfaces the right answer in under 30 seconds. This guide walks you through building a production-grade Q&A system with semantic search, an internal knowledge portal, and proper citation tracking.

Timeline: 3-4 weeks for initial deployment. Budget: $500-2,000/month for tooling at 50-person firm scale.

1. Document Audit and Preparation

Start with an honest inventory. Most firms have knowledge scattered across SharePoint, Google Drive, Confluence, email threads, and partner hard drives. You need it all in one place.

Week 1: Discovery and Collection

  1. Map every knowledge repository in your firm. Create a spreadsheet with columns: Location, Owner, Document Count, Last Updated, Access Level.
  2. Prioritize by usage frequency. Start with client deliverables, training materials, and process documentation. Skip marketing collateral and expired proposals.
  3. Export everything to a staging folder. Use native export tools (SharePoint migration API, Google Takeout, Confluence export). Maintain original folder structure for now.
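
Once the exports land in the staging folder, part of the inventory spreadsheet from step 1 can be filled in automatically. The following is a minimal sketch, assuming one top-level subfolder per source repository inside the staging folder; the function name and folder layout are illustrative, and the Owner and Access Level columns still have to be completed by hand.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["Location", "Owner", "Document Count", "Last Updated", "Access Level"]

def inventory_staging(staging_dir, out_csv):
    """Write one inventory row per top-level folder in the staging directory."""
    rows = []
    for repo in sorted(p for p in Path(staging_dir).iterdir() if p.is_dir()):
        files = [f for f in repo.rglob("*") if f.is_file()]
        latest = max((f.stat().st_mtime for f in files), default=None)
        last_updated = (
            datetime.fromtimestamp(latest, tz=timezone.utc).date().isoformat()
            if latest is not None else ""
        )
        rows.append({
            "Location": repo.name,
            "Owner": "",          # filled in manually
            "Document Count": len(files),
            "Last Updated": last_updated,
            "Access Level": "",   # filled in manually
        })
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

File modification times only approximate "Last Updated", since some export tools reset them; treat the column as a starting point for the audit.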

Document Cleaning Checklist

Run every document through this filter:

  • Remove headers, footers, page numbers, and watermarks using Adobe Acrobat batch processing or Docparser.
  • Strip out boilerplate sections that appear in multiple documents (standard disclaimers, signature blocks).
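  • Redact client names, financial figures, and confidential data. Regex patterns give you a first pass: \$\d+(?:,\d{3})*(?:\.\d+)? for dollar amounts (including cents), (?:[A-Z][\w&'-]*\s+)+(?:LLC|Inc\.|Corporation) for company names of one or more words. Regex misses names that lack a corporate suffix, so spot-check a sample by hand.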
  • Convert all files to plain text or Markdown. PDFs go through OCR if needed (use Tesseract or Adobe's built-in OCR).
  • Normalize filenames: YYYY-MM-DD_DocumentType_Topic.txt (example: 2024-01-15_Memo_Section1031Exchange.txt).
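
Two of the checklist items, regex redaction and filename normalization, are easy to script. The sketch below is illustrative only: the helper names and the [AMOUNT] and [CLIENT] placeholders are assumptions, and the patterns are written to handle decimal amounts and multi-word company names. The company pattern can also swallow a capitalized word that happens to precede the name (such as a sentence-opening "The"), so review the output.

```python
import re
from datetime import date

# Dollar amounts such as $500, $1,250,000 or $1,250,000.50
DOLLAR_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")
# One or more capitalized words followed by a corporate suffix
COMPANY_RE = re.compile(r"(?:[A-Z][\w&'-]*\s+)+(?:LLC\b|Inc\.|Corporation\b)")

def redact(text):
    """Replace dollar amounts and suffixed company names with placeholders."""
    text = DOLLAR_RE.sub("[AMOUNT]", text)
    return COMPANY_RE.sub("[CLIENT]", text)

def normalized_filename(created, doc_type, topic):
    """Build a YYYY-MM-DD_DocumentType_Topic.txt name from its parts."""
    words = re.findall(r"[A-Za-z0-9]+", topic)
    camel = "".join(w[0].upper() + w[1:] for w in words)
    return f"{created.isoformat()}_{doc_type}_{camel}.txt"

print(redact("We advised Acme Holdings LLC on a $1,250,000.50 exchange."))
# We advised [CLIENT] on a [AMOUNT] exchange.
print(normalized_filename(date(2024, 1, 15), "Memo", "section 1031 exchange"))
# 2024-01-15_Memo_Section1031Exchange.txt
```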

Metadata Extraction

Your vector database needs rich metadata for filtering. Extract and standardize:

  • Document type (memo, brief, training guide, process doc)
  • Practice area or department
  • Author and reviewer names
  • Creation and last-modified dates
  • Client matter number (if applicable)
  • Confidence level (draft, reviewed, approved), recorded in the status column

Store metadata in a CSV with columns: filename, doc_type, practice_area, author, date_created, date_modified, matter_id, status. This becomes your source of truth.
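
Because every later step trusts this CSV, it is worth validating before anything is embedded. The following is a minimal sketch of such a check, assuming ISO dates (YYYY-MM-DD) and the three status values listed above; the function name and the exact rules are illustrative, not a fixed standard.

```python
import csv
import io
from datetime import date

REQUIRED_COLUMNS = ["filename", "doc_type", "practice_area", "author",
                    "date_created", "date_modified", "matter_id", "status"]
ALLOWED_STATUS = {"draft", "reviewed", "approved"}

def validate_metadata(csv_file):
    """Return (line_number, problem) tuples; an empty list means the CSV is clean."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [(1, f"missing columns: {', '.join(missing)}")]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row["filename"] or "").strip():
            problems.append((line_no, "empty filename"))
        status = (row["status"] or "").strip()
        if status.lower() not in ALLOWED_STATUS:
            problems.append((line_no, f"unknown status {status!r}"))
        for col in ("date_created", "date_modified"):
            try:
                date.fromisoformat((row[col] or "").strip())
            except ValueError:
                problems.append((line_no, f"{col} is not YYYY-MM-DD"))
    return problems
```

Pass an open file handle, for example `validate_metadata(open("metadata.csv", newline="", encoding="utf-8"))`, and fix every reported row before uploading.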

2. Vector Database Configuration

Skip the analysis paralysis. For professional services firms under 100 people, use Pinecone (easiest) or Qdrant (self-hosted option). Over 100 people or handling 50,000+ documents, evaluate Weaviate.

Pinecone Setup (Recommended Path)

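  1. Create an account at pinecone.io. Start with the Starter plan ($70/month, 100K vectors); plan names, prices, and limits change, so confirm them on the pricing page before you budget.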
  2. Create index named firm-knowledge with dimensions=1536 (matches OpenAI ada-002 embeddings), metric=cosine, pod-type=p1.
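  3. Install the Python clients: pip install "pinecone-client<3" "openai<1". The version pins matter: the code in this guide uses the legacy pinecone.init and openai.Embedding interfaces, which were removed in later major versions of both SDKs.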
  4. Generate embeddings and upload:
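import csv
import os
from pathlib import Path

import openai
import pinecone

# Legacy SDK interfaces: requires pinecone-client<3 and openai<1.
# openai reads OPENAI_API_KEY from the environment on its own.
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-west1-gcp")
index = pinecone.Index("firm-knowledge")

def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunk_size-word segments that share `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def get_metadata_from_csv(filename, csv_path="metadata.csv"):
    """Look up one document's row in the metadata CSV (the path is an assumption)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["filename"] == filename:
                return row
    raise KeyError(f"no metadata row for {filename}")

def embed_and_upload(text_file, metadata):
    content = text_file.read_text(encoding="utf-8")

    # Chunk into 500-word segments with 50-word overlap
    chunks = chunk_text(content, chunk_size=500, overlap=50)

    vectors = []
    for i, chunk in enumerate(chunks):
        embedding = openai.Embedding.create(
            input=chunk,
            model="text-embedding-ada-002"
        )['data'][0]['embedding']
        vectors.append((
            f"{text_file.stem}_chunk{i}",
            embedding,
            {**metadata, "text": chunk, "chunk_id": i}
        ))

    # Upsert in batches of 100 instead of one request per chunk
    for start in range(0, len(vectors), 100):
        index.upsert(vectors=vectors[start:start + 100])

# Process all documents
for doc in Path("cleaned_docs").glob("*.txt"):
    metadata = get_metadata_from_csv(doc.name)
    embed_and_upload(doc, metadata)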

Chunking Strategy

Don't embed entire documents. Break into logical segments:

  • 500 words per chunk for general knowledge
  • 200 words per chunk for dense technical content
  • 50-word overlap between chunks to preserve context
  • Store chunk position in metadata for reassembly
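
To show why chunk position belongs in the metadata, here is a minimal sketch that records each chunk's starting word offset and then rebuilds a passage from consecutive chunks by dropping the overlapping prefix. The function names are illustrative, and whitespace is normalized to single spaces along the way.

```python
def chunk_with_positions(text, chunk_size=500, overlap=50):
    """Return (chunk_id, start_word, chunk_text) tuples with overlapping neighbors."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [
        (chunk_id, start, " ".join(words[start:start + chunk_size]))
        for chunk_id, start in enumerate(starts)
    ]

def reassemble(chunks, overlap=50):
    """Rebuild text from consecutive chunks of one document.

    Every chunk after the first repeats the last `overlap` words of its
    predecessor, so that prefix is skipped.
    """
    words = []
    for n, (_, _, chunk) in enumerate(sorted(chunks)):  # sorted by chunk_id
        chunk_words = chunk.split()
        words.extend(chunk_words if n == 0 else chunk_words[overlap:])
    return " ".join(words)
```

For dense technical content, call `chunk_with_positions(text, chunk_size=200)`; the same `overlap` value must be passed to `reassemble`.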

Index Optimization

Create metadata filters for common query patterns:

  • practice_area filter: Allows "show me only tax documents"
  • date_created range filter: "documents from last 6 months"
  • doc_type filter: "only show process guides"
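
These filters map onto Pinecone's metadata filter syntax ($in, $gte, $lte). One caveat: Pinecone's range operators compare numbers, not strings, so the sketch below assumes a hypothetical numeric date_created_ts field (a Unix timestamp) stored next to the string date at upload time. The function names are illustrative.

```python
from datetime import date, datetime, timezone

def to_timestamp(d):
    """Convert a date to a UTC Unix timestamp in seconds."""
    return int(datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp())

def build_filter(practice_areas=None, doc_types=None,
                 created_after=None, created_before=None):
    """Build a Pinecone-style metadata filter; returns None when nothing is selected."""
    flt = {}
    if practice_areas:
        flt["practice_area"] = {"$in": list(practice_areas)}
    if doc_types:
        flt["doc_type"] = {"$in": list(doc_types)}
    date_range = {}
    if created_after:
        date_range["$gte"] = to_timestamp(created_after)
    if created_before:
        date_range["$lte"] = to_timestamp(created_before)
    if date_range:
        flt["date_created_ts"] = date_range  # hypothetical numeric field
    return flt or None

print(build_filter(practice_areas=["Tax"], created_after=date(2024, 1, 1)))
# {'practice_area': {'$in': ['Tax']}, 'date_created_ts': {'$gte': 1704067200}}
```

The result can be passed as the filter argument of index.query; multiple keys in one filter are combined with AND.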

Test retrieval with 20 sample queries spanning your practice areas. Relevant results should appear in top 3 for 80% of queries. If not, adjust chunk size or re-evaluate document cleaning.
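
The 80%-in-top-3 check is easy to automate so it can be rerun after every chunking change. A minimal sketch, assuming you supply a search_fn that returns filenames ranked best-first (for example, a thin wrapper around index.query) and a list of (query, expected filename) pairs; both names are illustrative.

```python
def hit_rate_at_k(test_cases, search_fn, k=3):
    """Return (hit_rate, missed_queries) for (query, expected_filename) pairs.

    search_fn(query) must return a list of filenames ranked best-first.
    """
    if not test_cases:
        raise ValueError("need at least one test case")
    misses = [
        query for query, expected in test_cases
        if expected not in search_fn(query)[:k]
    ]
    hit_rate = (len(test_cases) - len(misses)) / len(test_cases)
    return hit_rate, misses
```

A hit rate below 0.8 on your 20 queries is the signal to revisit chunk size or cleaning; the returned misses tell you which queries to inspect first.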

3. Internal Knowledge Portal

Build a simple web portal at a memorable internal URL (kb.yourfirm.com or yourfirm.com/kb behind SSO). One search box, one answer pane, no chat client to install. Operators already live in their browser - meet them there.

Portal Stack

  • Frontend: Next.js + Tailwind CSS, hosted on Vercel behind your SSO provider (Okta, Azure AD, or Google Workspace).
  • Backend: FastAPI service exposing one POST endpoint at /api/ask, hosted on Railway, Render, or your existing infra.
  • Auth: SSO via NextAuth or a reverse-proxy header. No public access.

Search Endpoint

import os

from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel
import pinecone
import openai

# Legacy SDK interfaces: requires pinecone-client<3 and openai<1.
# openai reads OPENAI_API_KEY from the environment on its own.
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-west1-gcp")
index = pinecone.Index("firm-knowledge")

app = FastAPI()

class AskRequest(BaseModel):
    question: str

@app.post("/api/ask")
def ask(req: AskRequest, x_sso_user: str | None = Header(default=None)):
    # X-SSO-User must be set by the SSO reverse proxy, which should strip
    # any copy of the header sent by the client.
    if not x_sso_user:
        raise HTTPException(status_code=401, detail="not authenticated")

    question = req.question.strip()
    if not question:
        raise HTTPException(status_code=400, detail="question required")

    embedding = openai.Embedding.create(
        input=question,
        model="text-embedding-ada-002",
    )["data"][0]["embedding"]

    results = index.query(vector=embedding, top_k=3, include_metadata=True)

    sources = []
    passages = []
    for match in results["matches"]:
        md = match["metadata"]
        text = md["text"]
        passages.append(text)
        sources.append({
            "filename": md["filename"],
            "excerpt": text[:300] + ("..." if len(text) > 300 else ""),
            "practice_area": md.get("practice_area"),
            "date_created": md.get("date_created"),
            "relevance": round(match["score"], 2),
        })

    # Synthesize an answer with citations. The model reads the full chunks;
    # the 300-character excerpts are only for display in the UI.
    context = "\n\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    completion = openai.ChatCompletion.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer the firm's question using only the provided sources. Cite source numbers inline as [1], [2], [3]. If the sources do not contain the answer, say so."},
            {"role": "user", "content": f"Question: {question}\n\nSources:\n{context}"},
        ],
        temperature=0.1,
    )

    return {
        "answer": completion["choices"][0]["message"]["content"],
        "sources": sources,
        "user": x_sso_user,
    }

Frontend Behavior

  • Single search box, focus on page load.
  • Submit posts to /api/ask. Render the answer first, then each cited source as a collapsible card with filename, excerpt, practice area, creation date, and relevance score (the fields the endpoint returns).
  • Log every question with the SSO user, timestamp, top source IDs, and a 1-click thumbs-up / thumbs-down. That feedback table becomes the input for weekly tuning sessions.
  • "Copy answer" button at the bottom of every result for partners who want to paste into a client email.
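
The question log and feedback table described above could be sketched as follows. This is an assumption-laden illustration: it uses SQLite for brevity, and the query_log table, its columns, and the function names are hypothetical. The sources argument is the list returned by /api/ask.

```python
import json
import sqlite3
from datetime import datetime, timezone

def init_log(conn):
    """Create the (hypothetical) query_log table if it does not exist."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS query_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               sso_user TEXT NOT NULL,
               asked_at TEXT NOT NULL,
               question TEXT NOT NULL,
               top_source_ids TEXT NOT NULL,  -- JSON list of filenames
               top_score REAL,                -- NULL when nothing was retrieved
               feedback INTEGER               -- 1 thumbs-up, -1 thumbs-down, NULL none
           )"""
    )

def log_question(conn, sso_user, question, sources):
    """Store one question and return its row id for later feedback."""
    top_score = max((s["relevance"] for s in sources), default=None)
    cur = conn.execute(
        "INSERT INTO query_log (sso_user, asked_at, question, top_source_ids, top_score)"
        " VALUES (?, ?, ?, ?, ?)",
        (sso_user, datetime.now(timezone.utc).isoformat(), question,
         json.dumps([s["filename"] for s in sources]), top_score),
    )
    conn.commit()
    return cur.lastrowid

def record_feedback(conn, log_id, thumbs_up):
    """Attach a thumbs-up or thumbs-down to a logged question."""
    conn.execute("UPDATE query_log SET feedback = ? WHERE id = ?",
                 (1 if thumbs_up else -1, log_id))
    conn.commit()
```

Returning the row id from log_question lets the frontend send the thumbs-up or thumbs-down later without resubmitting the question.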

Email Fallback

For operators who do not want to open a browser, set up an internal mailbox like kb@yourfirm.com. An n8n workflow watches the inbox, runs the same /api/ask endpoint, and replies with the answer plus cited sources in the email body. Same brain, different surface.

Usage and Tuning

  • Daily: scan the query log for low-relevance questions (top score under 0.7). Those are knowledge gaps.
  • Weekly: publish the top 3 unanswered queries to the document owner via email so they get answered or the source document gets fixed.
  • Monthly: review thumbs-down feedback for prompt or chunking changes.
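
The daily and weekly reviews reduce to two small queries over the question log. A minimal sketch, assuming each log row is a dict with question and top_score keys (hypothetical field names) and using the 0.7 threshold from the daily check:

```python
from collections import Counter

LOW_RELEVANCE = 0.7  # threshold from the daily review

def knowledge_gaps(log_rows, threshold=LOW_RELEVANCE):
    """Rows whose best source scored below the threshold, or that had no source."""
    return [
        r for r in log_rows
        if r["top_score"] is None or r["top_score"] < threshold
    ]

def top_unanswered(log_rows, n=3, threshold=LOW_RELEVANCE):
    """Most frequent low-relevance questions, ignoring case and outer whitespace."""
    counts = Counter(
        r["question"].strip().lower()
        for r in knowledge_gaps(log_rows, threshold)
    )
    return counts.most_common(n)
```

The output of top_unanswered is the list to email to document owners each week.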

4. Semantic Search Interface

The single-question portal in section 3 handles ~70% of queries. Build a richer search interface for complex research sessions where operators need to browse multiple sources and apply filters.

Search UI Stack

  • Frontend: Next.js with Tailwind CSS
  • Backend: FastAPI Python service
  • Hosting: Vercel (frontend) + Railway (backend)

Search Endpoint

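import os

from fastapi import FastAPI
from pydantic import BaseModel, Field
import pinecone
import openai

# Legacy SDK interfaces: requires pinecone-client<3 and openai<1.
# openai reads OPENAI_API_KEY from the environment on its own.
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-west1-gcp")
index = pinecone.Index("firm-knowledge")

app = FastAPI()

class SearchRequest(BaseModel):
    query: str
    filters: dict = Field(default_factory=dict)
    top_k: int = Field(default=10, ge=1, le=50)

@app.post("/search")
def search(request: SearchRequest):
    # Plain def: FastAPI runs it in a worker thread, so the blocking
    # OpenAI and Pinecone calls do not stall the event loop.
    query_embedding = openai.Embedding.create(
        input=request.query,
        model="text-embedding-ada-002"
    )['data'][0]['embedding']

    results = index.query(
        vector=query_embedding,
        top_k=request.top_k,
        filter=request.filters or None,  # send no filter when none is selected
        include_metadata=True
    )

    # Group chunks from same document
    grouped = {}
    for match in results['matches']:
        md = match['metadata']
        doc_id = md['filename']
        if doc_id not in grouped:
            grouped[doc_id] = {
                'filename': doc_id,
                'chunks': [],
                'max_score': 0,
                # Document-level fields only; chunk fields live under 'chunks'
                'metadata': {k: v for k, v in md.items()
                             if k not in ('text', 'chunk_id')}
            }
        grouped[doc_id]['chunks'].append({
            'text': md['text'],
            'score': match['score']
        })
        grouped[doc_id]['max_score'] = max(
            grouped[doc_id]['max_score'],
            match['score']
        )

    # Sort by best chunk score
    ranked = sorted(
        grouped.values(),
        key=lambda x: x['max_score'],
        reverse=True
    )

    return {"results": ranked}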

Advanced Search Features

Implement these filters in your UI:

  • Date range picker: "Documents created between [start] and [end]"
  • Practice area dropdown: Multi-select with your firm's practice areas
  • Document type checkboxes: Memo, Brief, Guide, Process Doc
  • Author search: Autocomplete from your staff directory
  • Confidence filter: Show only "Approved" status documents

Query Suggestions

Track all searches in a PostgreSQL table: searches(id, query, user_id, timestamp, clicked_result). Generate suggestions:

SELECT query, COUNT(*) as frequency
FROM searches
WHERE clicked_result IS NOT NULL
GROUP BY query
ORDER BY frequency DESC
LIMIT 10;

Display these as "Popular searches" on the homepage.

5. Citation System

Every answer needs a source. Build citation tracking into the retrieval flow.

Citation Format Standards

Use this format for internal documents:

[Author Last Name], [Document Title], [Practice Area], [Date Created]. Internal Doc ID: [filename]

Example: Chen, Section 1031 Exchange Memo, Tax, 2024-01-15. Internal Doc ID: 2024-01-15_Memo_Section1031Exchange

Automatic Citation Generation

Add a citation button to every search result:

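import re
from pathlib import Path

def generate_citation(metadata):
    doc_id = Path(metadata.get('filename', '')).stem  # filename without extension
    parts = doc_id.split('_', 2)  # YYYY-MM-DD, DocumentType, Topic
    if len(parts) == 3:
        # "Section1031Exchange" -> "Section 1031 Exchange", then append the doc type
        topic = re.sub(r'(?<=[a-z])(?=[A-Z0-9])|(?<=[0-9])(?=[A-Za-z])', ' ', parts[2])
        title = f"{topic} {parts[1]}"
    else:
        title = doc_id.replace('_', ' ')
    author = (metadata.get('author') or 'Unknown').split()[-1]  # last name only
    practice = metadata.get('practice_area') or 'General'
    date = metadata.get('date_created') or 'n.d.'
    return f"{author}, {title}, {practice}, {date}. Internal Doc ID: {doc_id}"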

Display the citation with a one-click copy button in your UI.

Citation Analytics Dashboard

Track which documents drive the most value:

  • Most cited documents (monthly leaderboard)
  • Citation count by practice area
  • Authors with highest citation rates
  • Documents with zero citations in 90 days (candidates for archival)

Build a simple dashboard with Metabase or Grafana connected to your search logs database.
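
Before wiring up a dashboard, the two most useful views (the leaderboard and the archival candidates) can be prototyped in a few lines. A minimal sketch, assuming citation events are available as a list of (filename, cited_on date) tuples exported from your logs; the function names are illustrative.

```python
from collections import Counter
from datetime import date, timedelta

def citation_leaderboard(citations, top_n=10):
    """Most cited documents, given a list of (filename, cited_on) tuples."""
    return Counter(filename for filename, _ in citations).most_common(top_n)

def archival_candidates(all_filenames, citations, today, window_days=90):
    """Documents with no citation inside the last `window_days` days."""
    cutoff = today - timedelta(days=window_days)
    recently_cited = {
        filename for filename, cited_on in citations if cited_on >= cutoff
    }
    return sorted(set(all_filenames) - recently_cited)
```

Filter the citation list to one month before calling citation_leaderboard to get the monthly view.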

Usage Metrics to Monitor

Week 1: Track baseline query volume and response time.

Week 4: Measure these KPIs:

  • Average time to first result: Target under 2 seconds
  • Click-through rate on top result: Target above 60%
  • Queries with zero results: Target under 10%
  • Daily active users: Target 40% of firm within 30 days
  • Repeat usage rate: Target 70% of users return within 7 days
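
Most of these KPIs fall out of one day's search events. A minimal sketch, assuming each event is a dict with user_id, seconds_to_first_result, result_count, and clicked_rank (1 for the top result, None for no click); these field names are hypothetical. Repeat usage needs a multi-day window and is left out.

```python
def daily_kpis(events, firm_size):
    """Compute KPI values from one day's search events, or None if there are none."""
    total = len(events)
    if total == 0:
        return None
    return {
        # Target: under 2 seconds
        "avg_seconds_to_first_result":
            sum(e["seconds_to_first_result"] for e in events) / total,
        # Target: above 0.60
        "top_result_ctr":
            sum(1 for e in events if e["clicked_rank"] == 1) / total,
        # Target: under 0.10
        "zero_result_rate":
            sum(1 for e in events if e["result_count"] == 0) / total,
        # Target: 0.40 of the firm within 30 days
        "active_user_share":
            len({e["user_id"] for e in events}) / firm_size,
    }
```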

Set up weekly review meetings. Adjust chunking strategy, add missing documents, and refine metadata based on actual usage patterns.

Bottom Line

This system replaces the "email the senior associate" workflow with instant, cited answers. Expect 15-20 hours per week saved across a 50-person firm once adoption hits 60%. The ROI shows up in faster client response times and reduced duplicate work.

Reviewed by Revenue Institute

This guide is actively maintained and reviewed by the implementation experts at Revenue Institute. As the creators of The AI Workforce Playbook, we test and deploy these exact frameworks for professional services firms scaling without new headcount.
