# Unified Knowledge Base Implementation

## Project Overview
This is a comprehensive organizational knowledge base that centralizes all internal company knowledge, including process documentation, best practices, project archives, and employee expertise. The system provides advanced search capabilities using both traditional keyword search and modern semantic search powered by LLM embeddings.
## Features

### 🔍 Advanced Search
- Keyword Search: Traditional text matching with fuzzy search
- Semantic Search: AI-powered understanding using sentence embeddings
- Hybrid Search: Combines both approaches for optimal results
- Real-time search suggestions and auto-complete
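The hybrid approach above can be sketched as a simple weighted score fusion. This is a minimal illustration, not the project's actual ranking code: the function names and the `alpha` weighting parameter are assumptions, and it presumes both score sets have already been normalized to [0, 1].

```python
def hybrid_score(keyword_score, semantic_score, alpha=0.5):
    """Blend normalized keyword and semantic scores.

    alpha weights the keyword side; 1 - alpha weights the semantic side.
    Both inputs are assumed to be pre-normalized to [0, 1].
    """
    return alpha * keyword_score + (1 - alpha) * semantic_score


def rank_hybrid(results, alpha=0.5):
    """results: list of (doc_id, keyword_score, semantic_score) tuples."""
    return sorted(
        results,
        key=lambda r: hybrid_score(r[1], r[2], alpha),
        reverse=True,
    )


hits = [("doc-a", 0.9, 0.2), ("doc-b", 0.4, 0.8), ("doc-c", 0.6, 0.6)]
ranked = rank_hybrid(hits, alpha=0.5)
```

Real systems often use reciprocal rank fusion instead of raw score blending, since keyword and embedding scores live on different scales; the normalization assumption here sidesteps that.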
### 📊 Data Sources
- Confluence pages and spaces
- Jira tickets and projects
- SharePoint documents
- Network drives and file shares
- Git repositories (README files and wikis)
- Internal SQL databases
### 🚀 Key Capabilities
- Natural language querying
- Content versioning and history
- User and group-based access control
- Analytics and usage reporting
- Automated content synchronization
- Intelligent content recommendations
## Technology Stack

### Backend
- API: FastAPI (Python)
- Database: PostgreSQL with pgvector extension
- Search Engine: Elasticsearch
- Cache: Redis
- ETL: Apache Airflow
- LLM: OpenAI API + Sentence Transformers
### Frontend
- Framework: React.js
- Styling: CSS3 with modern design
- State Management: React Context/Hooks
- HTTP Client: Axios
### Infrastructure
- Containerization: Docker & Docker Compose
- Orchestration: Kubernetes
- Cloud: AWS/GCP ready
- Infrastructure as Code: Terraform
- Monitoring: Prometheus + Grafana
## Quick Start

### Prerequisites
- Docker and Docker Compose
- Node.js 18+ (for frontend development)
- Python 3.9+ (for backend development)
### 1. Clone and Setup

```bash
cd knowledge-base
cp .env.example .env
# Edit .env with your configuration
```
### 2. Start Services

```bash
# Start all services
docker-compose up -d

# Check service status
docker-compose ps
```
### 3. Initialize Database

```bash
# Run database migrations
docker-compose exec backend alembic upgrade head

# Create initial admin user
docker-compose exec backend python scripts/create_admin.py
```
### 4. Access Applications
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Airflow: http://localhost:8080
- Elasticsearch: http://localhost:9200
## Development Setup

### Backend Development

```bash
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app:app --reload --host 0.0.0.0 --port 8000
```
### Frontend Development

```bash
cd frontend
npm install
npm start
```

### ETL Development

```bash
cd etl

# Set up Airflow
export AIRFLOW_HOME=$(pwd)
airflow db init
airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
airflow webserver -p 8080
```
## Project Structure

```
knowledge-base/
├── README.md
├── requirements.txt
├── docker-compose.yml
├── .env.example
├── config/              # Configuration files
├── backend/             # FastAPI backend
│   ├── models/          # Database models
│   ├── api/             # API endpoints
│   ├── services/        # Business logic
│   └── utils/           # Utility functions
├── etl/                 # Data pipeline
│   ├── dags/            # Airflow DAGs
│   ├── extractors/      # Data extractors
│   └── transformers/    # Data transformers
├── frontend/            # React frontend
│   └── src/
│       ├── components/  # React components
│       ├── services/    # API services
│       └── utils/       # Utility functions
├── infrastructure/      # Deployment configs
│   ├── terraform/       # Infrastructure as Code
│   └── kubernetes/      # K8s manifests
└── tests/               # Test suites
    ├── unit/
    ├── integration/
    └── e2e/
```
## Configuration

### Environment Variables

Copy `.env.example` to `.env` and configure:

```bash
# Database
DATABASE_URL=postgresql://kb_user:kb_password@localhost:5432/knowledge_base

# Elasticsearch
ELASTICSEARCH_URL=http://localhost:9200

# Redis
REDIS_URL=redis://localhost:6379

# LLM API Keys
OPENAI_API_KEY=your_openai_api_key_here

# Data Source Credentials
CONFLUENCE_URL=https://your-company.atlassian.net
CONFLUENCE_USERNAME=your_username
CONFLUENCE_API_TOKEN=your_api_token
JIRA_URL=https://your-company.atlassian.net
JIRA_USERNAME=your_username
JIRA_API_TOKEN=your_api_token
SHAREPOINT_URL=https://your-company.sharepoint.com
SHAREPOINT_CLIENT_ID=your_client_id
SHAREPOINT_CLIENT_SECRET=your_client_secret
```
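On the backend side, these variables might be read with a small settings loader. This is an illustrative sketch only; the `load_settings` function and the structure of the returned dict are assumptions, and the defaults mirror the local values in `.env.example` above.

```python
import os


def load_settings(env=os.environ):
    """Hypothetical settings loader; variable names match .env.example.

    Falls back to the local-development defaults when a variable is unset.
    """
    return {
        "database_url": env.get(
            "DATABASE_URL",
            "postgresql://kb_user:kb_password@localhost:5432/knowledge_base",
        ),
        "elasticsearch_url": env.get("ELASTICSEARCH_URL", "http://localhost:9200"),
        "redis_url": env.get("REDIS_URL", "redis://localhost:6379"),
        "openai_api_key": env.get("OPENAI_API_KEY", ""),
    }


settings = load_settings()
```

In the real codebase this would more likely live in a typed config module (for example pydantic's `BaseSettings`, which FastAPI projects commonly use), but the lookup-with-default pattern is the same.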
## Data Pipeline

### ETL Process
- Extract: Pull data from various sources (Confluence, Jira, etc.)
- Transform: Clean, normalize, and chunk content
- Load: Store in PostgreSQL and index in Elasticsearch
- Embed: Generate vector embeddings for semantic search
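The transform step's chunking can be sketched as follows. This is a minimal word-based illustration, not the pipeline's actual transformer: the function name, the 500-word chunk size, and the 50-word overlap are assumptions (overlap preserves context across chunk boundaries so embeddings don't lose sentences split at an edge).

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping word-based chunks for embedding.

    chunk_size and overlap are word counts; consecutive chunks share
    `overlap` words so context isn't lost at chunk boundaries.
    """
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Production pipelines usually chunk by tokens (to respect the embedding model's context limit) and split on sentence or paragraph boundaries rather than raw word counts; the sliding-window idea is the same.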
### Scheduling
- Full Sync: Weekly (Sundays at 2 AM)
- Incremental Sync: Every 4 hours
- Real-time: Webhook-triggered updates
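The core of an incremental sync is selecting only documents modified since the last run, so the pipeline re-extracts and re-embeds a fraction of the corpus. The sketch below is illustrative; the function name and document shape are assumptions, not the project's actual DAG code.

```python
from datetime import datetime


def select_for_sync(documents, last_sync):
    """Return only documents modified since the last incremental sync.

    Each document is assumed to carry an `updated_at` datetime from its
    source system (Confluence, Jira, etc.).
    """
    return [doc for doc in documents if doc["updated_at"] > last_sync]


last_sync = datetime(2024, 1, 1, 8, 0)
docs = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 30)},  # changed after sync
    {"id": 2, "updated_at": datetime(2024, 1, 1, 7, 0)},   # unchanged
]
changed = select_for_sync(docs, last_sync)
```

The weekly full sync exists precisely because this filter can miss deletions and sources with unreliable modification timestamps.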
## API Documentation

### Search Endpoints

- `POST /api/search/` - Perform search with various types
- `GET /api/search/suggestions` - Get search suggestions
- `GET /api/search/analytics` - Search analytics
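A search request might be constructed like this. The request body's field names (`query`, `type`, `limit`) are assumptions about the endpoint's schema, not a confirmed contract; check the live schema at http://localhost:8000/docs.

```python
import json


def build_search_request(query, search_type="hybrid", limit=10):
    """Build a JSON body for POST /api/search/ (hypothetical schema)."""
    assert search_type in {"keyword", "semantic", "hybrid"}
    return json.dumps({"query": query, "type": search_type, "limit": limit})


body = build_search_request("onboarding checklist", search_type="hybrid")
```

This body would then be sent with any HTTP client, e.g. `requests.post("http://localhost:8000/api/search/", data=body, headers={"Content-Type": "application/json"})`.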
### Document Endpoints

- `GET /api/documents/` - List documents
- `GET /api/documents/{id}` - Get document details
- `POST /api/documents/` - Create new document
- `PUT /api/documents/{id}` - Update document

### Authentication

- `POST /api/auth/login` - User login
- `POST /api/auth/logout` - User logout
- `GET /api/auth/me` - Get current user
## Deployment

### Production Deployment

```bash
# Build and deploy
docker-compose -f docker-compose.prod.yml up -d

# Or use Kubernetes
kubectl apply -f infrastructure/kubernetes/
```
### Monitoring

- Health checks: `/health`
- Metrics: `/metrics` (Prometheus format)
- Logs: Centralized logging with structured JSON
## Testing

### Run Tests

```bash
# Backend tests
cd backend
pytest tests/

# Frontend tests
cd frontend
npm test

# Integration tests
docker-compose -f docker-compose.test.yml up --abort-on-container-exit
```
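A backend unit test in `tests/unit/` might look like the following. Both the `normalize_query` helper and the test case are hypothetical examples of the style, not code from this repository.

```python
import unittest


def normalize_query(q: str) -> str:
    """Lowercase and collapse whitespace before searching (illustrative helper)."""
    return " ".join(q.lower().split())


class TestNormalizeQuery(unittest.TestCase):
    def test_collapses_whitespace(self):
        self.assertEqual(normalize_query("  Hello   World "), "hello world")

    def test_empty_query(self):
        self.assertEqual(normalize_query(""), "")
```

pytest (used above) discovers and runs `unittest`-style test cases as-is, so either `pytest tests/` or `python -m unittest` will pick this up.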
## Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
## Project Phases

### Phase 1: Foundation (Weeks 1-3) ✅
- Infrastructure setup
- Database schema design
- Basic LLM integration
### Phase 2: Core Services (Weeks 4-7) 🔄
- Embedding service
- Natural language query parser
- Vector search service
### Phase 3: Advanced Features (Weeks 8-11) ⏳
- Intelligent query router
- Content generation engine
- Real-time embedding updates
### Phase 4: Production (Weeks 12-15) ⏳
- Performance optimization
- Security hardening
- Production deployment
## Support
For questions and support:
- Technical Issues: Create an issue in this repository
- Documentation: Check the `/docs` folder
- API Questions: Visit http://localhost:8000/docs
## License
This project is proprietary software for internal company use only.