GitHub - tmustafiz/pg-vector

Fraud Detection System

A system for detecting fraudulent merchant applications using vector similarity search and pattern matching.

System Architecture

The system consists of three main components:

Training Module (fraud_detection_training):
- Generates synthetic training data
- Creates and populates the database with merchant records
- Generates vector embeddings and stores them for similarity search
API Module (fraud_detection_api):
- Provides REST API endpoints for fraud detection
- Implements pattern matching and similarity search
- Returns detailed fraud analysis results
Common Module (fraud_detection_common):
- Shared utilities and configurations
- Database operations
- Dynamic model generation
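The README does not describe how the Common Module does "dynamic model generation". A minimal sketch of the idea, assuming the models are plain dataclasses built from a JSON-style field list: the config shape, the `TYPE_MAP` table and the `build_model` helper are hypothetical, and the real module may well generate Pydantic or SQLAlchemy models instead.

```python
from dataclasses import field, fields, make_dataclass
from typing import Any, Dict, Optional

# Hypothetical mapping from type names used in a config file to Python types.
TYPE_MAP = {"str": str, "int": int, "float": float, "bool": bool}


def build_model(name: str, config: Dict[str, Any]) -> type:
    """Create a dataclass whose fields are listed in config["fields"].

    Every field is Optional with a default of None, so a partially filled
    application can still be instantiated.
    """
    specs = []
    for spec in config["fields"]:
        py_type = TYPE_MAP[spec.get("type", "str")]
        specs.append((spec["name"], Optional[py_type], field(default=None)))
    return make_dataclass(name, specs)


# Example config reusing field names from the /predict request example.
MERCHANT_CONFIG = {
    "fields": [
        {"name": "merchant_id", "type": "str"},
        {"name": "owner_ssn", "type": "str"},
        {"name": "email", "type": "str"},
        {"name": "zip_code", "type": "str"},
    ]
}

MerchantApplication = build_model("MerchantApplication", MERCHANT_CONFIG)
application = MerchantApplication(merchant_id="test-merchant-1", email="merchant@example.com")
print(application)
print([f.name for f in fields(MerchantApplication)])
```

Making every field optional suits an endpoint that may receive incomplete applications; a stricter variant would mark identifier fields as required.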


Project Structure

The code is organized into three installable Python packages, one per component:

fraud_detection_common - Shared utilities and models
- Database operations with pgvector
- Custom embedding generation using feature engineering and PCA
- Dynamic model generation based on configuration
- Common data models and types
fraud_detection_training - Training and embedding generation
- Processes training data from JSON or CSV files
- Generates custom embeddings
- Stores embeddings in the database
fraud_detection_api - API service
- FastAPI-based REST API
- Evaluates new applications
- Returns fraud detection results
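The common package builds embeddings with feature engineering followed by PCA. The sketch below shows one way those two steps can fit together, under assumptions: it hashes each normalized `field=value` pair into a fixed-size vector (feature hashing, chosen here only for illustration) and computes principal components by power iteration with deflation so that it runs on the standard library alone, whereas the project itself credits scikit-learn for PCA. `DIM`, `featurize`, `fit_pca` and `transform` are hypothetical names.

```python
import hashlib
import math
import random
from typing import Dict, List, Tuple

DIM = 16  # hypothetical size of the hashed feature space


def normalize(value: str) -> str:
    # Lowercase and keep only letters and digits, so "+1-555-123-4567"
    # and "15551234567" produce the same token.
    return "".join(ch for ch in value.lower() if ch.isalnum())


def featurize(record: Dict[str, str], dim: int = DIM) -> List[float]:
    """Feature hashing: each (field, normalized value) pair adds 1.0 to one bucket."""
    vec = [0.0] * dim
    for name, value in record.items():
        digest = hashlib.sha256(f"{name}={normalize(value)}".encode()).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec


def fit_pca(rows: List[List[float]], k: int, iters: int = 500) -> Tuple[List[float], List[List[float]]]:
    """Return (mean, components) with the top-k principal directions of rows."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("PCA needs at least two rows")
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    xs = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix C = X^T X / (n - 1) of the centered rows.
    cov = [[sum(x[i] * x[j] for x in xs) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(0)  # fixed seed so components are reproducible
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            # Power iteration: repeatedly apply C and renormalize.
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; deflating C removes this
        # direction so the next pass finds the next-largest component.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return mean, components


def transform(vec: List[float], mean: List[float], components: List[List[float]]) -> List[float]:
    """Project a feature vector onto the fitted components (the stored embedding)."""
    centered = [x - m for x, m in zip(vec, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


training = [
    {"email": "a@example.com", "state": "NY", "zip_code": "10001"},
    {"email": "b@example.com", "state": "NY", "zip_code": "10002"},
    {"email": "c@example.com", "state": "CA", "zip_code": "94105"},
    {"email": "d@example.com", "state": "TX", "zip_code": "73301"},
]
mean, comps = fit_pca([featurize(r) for r in training], k=2)
embedding = transform(featurize(training[0]), mean, comps)
print([round(x, 3) for x in embedding])
```

With scikit-learn available, `sklearn.decomposition.PCA` would replace `fit_pca`. Whatever computes it, the fitted mean and components have to be reused unchanged for new applications, since the flow applies the same PCA transformation at prediction time and the vectors are only comparable in one shared space.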

System Flow

graph TD
    A[Training Data] --> B[Training Module]
    B --> C[Feature Engineering]
    C --> D[PCA Transformation]
    D --> E[Store Embeddings]
    
    F[New Application] --> G[API Service]
    G --> H[Feature Engineering]
    H --> I[PCA Transformation]
    I --> J[Vector Similarity Search]
    J --> K[Field Matching]
    K --> L[Decision Making]
    L --> M[Return Result]
    
    E --> J
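The right-hand path of the flow (vector similarity search, field matching, decision making) can be sketched in a few functions. This is an illustration under stated assumptions, not the API's code: it assumes stored records are known fraudulent merchants, scores neighbors with cosine similarity in memory where the real service queries pgvector, treats an exact match on an identifier field as a hit, and flags an application when a sufficiently similar neighbor also shares an identifier. `MATCH_FIELDS`, the 0.9 threshold, `top_k` and `decide` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Identifier fields from the /predict example; which fields the real service
# compares is an assumption.
MATCH_FIELDS = [
    "owner_ssn", "business_fed_tax_id", "owner_drivers_license",
    "business_phone_number", "owner_phone_number", "email",
]

# merchant_id -> (embedding, record); assumed to hold known fraudulent merchants.
Store = Dict[str, Tuple[List[float], Dict[str, str]]]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query: List[float], store: Store, k: int = 3) -> List[Tuple[float, str]]:
    """In-memory stand-in for a pgvector query such as (hypothetical names)
    SELECT merchant_id FROM merchants ORDER BY embedding <=> %s LIMIT k
    where <=> is pgvector's cosine-distance operator."""
    scored = [(cosine_similarity(query, vec), mid) for mid, (vec, _) in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


def matched_fields(application: Dict[str, str], record: Dict[str, str]) -> List[str]:
    """Identifier fields whose values are present and identical in both records."""
    return [f for f in MATCH_FIELDS if application.get(f) and application.get(f) == record.get(f)]


@dataclass
class Decision:
    is_fraud: bool
    reasons: List[str]


def decide(application: Dict[str, str], query: List[float], store: Store,
           sim_threshold: float = 0.9, k: int = 3) -> Decision:
    """Flag the application if a close neighbor also shares an identifier."""
    reasons = []
    for sim, mid in top_k(query, store, k):
        shared = matched_fields(application, store[mid][1])
        if sim >= sim_threshold and shared:
            reasons.append(f"{mid}: similarity {sim:.2f}, shared {', '.join(shared)}")
    return Decision(is_fraud=bool(reasons), reasons=reasons)


store: Store = {
    "fraud-001": ([1.0, 0.0, 1.0], {"owner_ssn": "123-45-6789", "email": "x@example.com"}),
    "fraud-002": ([0.0, 1.0, 0.0], {"owner_ssn": "999-99-9999", "email": "y@example.com"}),
}
application = {"merchant_id": "test-merchant-1", "owner_ssn": "123-45-6789",
               "email": "merchant@example.com"}
result = decide(application, [0.9, 0.1, 1.0], store)
print(result)
```

Requiring both vector similarity and a shared identifier keeps a merely similar-looking application from being flagged on its own; the actual thresholds and matched fields used by the service are not documented in this README.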

Prerequisites

Python 3.9+
PostgreSQL 15+ with pgvector extension
Docker and Docker Compose (for containerized deployment)
Make (optional, for using Makefile commands)
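A quick local check for some of these prerequisites can be scripted with the standard library. This sketch only verifies the Python version and that the named executables are on PATH; it does not check Docker Compose, or the PostgreSQL and pgvector versions, and `missing_prerequisites` is a hypothetical helper.

```python
import shutil
import sys
from typing import Iterable, List


def missing_prerequisites(tools: Iterable[str] = ("docker",)) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 9):
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python 3.9+ required, found {found}")
    for tool in tools:
        if shutil.which(tool) is None:  # executable not found on PATH
            problems.append(f"{tool} not found on PATH")
    return problems


for problem in missing_prerequisites(("docker", "make")):
    print("missing:", problem)
```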

Local Development Setup

1. Create and Activate Virtual Environment

# Create virtual environment
python -m venv .venv

# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

2. Install Dependencies

# Install common module
pip install -e fraud_detection_common/

# Install training module
pip install -e fraud_detection_training/

# Install API module
pip install -e fraud_detection_api/

3. Database Setup

# Start PostgreSQL with pgvector
docker compose up -d db

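# Wait until the database accepts connections (pg_isready checks only once, so retry it; macOS/Linux shell)
until docker compose exec db pg_isready -U postgres; do sleep 1; done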
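For scripts or CI jobs, a cross-platform option is to poll the database port from Python before starting training. This sketch assumes the compose file publishes PostgreSQL on localhost:5432 (not confirmed in this README) and only tests that the TCP port accepts connections, which is weaker than `pg_isready`: that tool also reports whether the server is rejecting connections, for example during startup. `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 5432,
                  timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires.

    Returns True as soon as a connection opens, False once the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` before running the training module lets a script stop with a clear error when the database never comes up, instead of failing later on its first query.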

4. Run Training Module

# Generate training data and populate database
python -m fraud_detection_training.train
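The training module is described as generating synthetic data before populating the database. The generator below is a hypothetical illustration, not the module's code: it produces records with the same fields and formats as the `/predict` example in the API Usage section, and makes a configurable fraction reuse an earlier record's SSN as a stand-in for a fraud ring sharing identities. `STATES`, `synthetic_merchant` and `generate` are assumptions.

```python
import json
import random
import string
from typing import Dict, List

# Hypothetical sample of states, each with one matching city.
STATES = {"NY": "New York", "CA": "San Francisco", "TX": "Austin"}


def _digits(rng: random.Random, n: int) -> str:
    return "".join(rng.choice(string.digits) for _ in range(n))


def synthetic_merchant(rng: random.Random, i: int) -> Dict[str, str]:
    """One record with the same fields and formats as the /predict example."""
    state = rng.choice(sorted(STATES))
    return {
        "merchant_id": f"merchant-{i}",
        "owner_ssn": f"{_digits(rng, 3)}-{_digits(rng, 2)}-{_digits(rng, 4)}",
        "business_fed_tax_id": f"{_digits(rng, 2)}-{_digits(rng, 7)}",
        "owner_drivers_license": f"DL{_digits(rng, 8)}",
        "business_phone_number": f"+1-555-{_digits(rng, 3)}-{_digits(rng, 4)}",
        "owner_phone_number": f"+1-555-{_digits(rng, 3)}-{_digits(rng, 4)}",
        "email": f"merchant{i}@example.com",
        "address_line1": f"{rng.randint(1, 9999)} Main St",
        "city": STATES[state],
        "state": state,
        "zip_code": _digits(rng, 5),
        "country": "US",
        "website": f"https://merchant{i}.example.com",
    }


def generate(n: int, fraud_rate: float = 0.1, seed: int = 42) -> List[Dict[str, str]]:
    """Generate n records; roughly fraud_rate of them reuse an earlier record's
    SSN, a hypothetical stand-in for a fraud ring sharing identities."""
    rng = random.Random(seed)
    records: List[Dict[str, str]] = []
    for i in range(n):
        record = synthetic_merchant(rng, i)
        if records and rng.random() < fraud_rate:
            record["owner_ssn"] = rng.choice(records)["owner_ssn"]
        records.append(record)
    return records


print(json.dumps(generate(3)[0], indent=2))
```

Because the generator is seeded, repeated runs produce identical records, which keeps training runs comparable. `json.dump(generate(1000), fh)` would write the records to a JSON file; whether that layout matches what the training module reads is not documented here.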

5. Run API Module

# Start the API server
python -m fraud_detection_api.api

Docker-based Deployment

1. Build and Start Services

# Build all services
docker compose build

# Start all services
docker compose up -d

2. Check Service Status

# View logs
docker compose logs -f

# Check service status
docker compose ps

3. Stop Services

# Stop all services
docker compose down

# Stop and remove volumes (this deletes any database data stored in them)
docker compose down -v

API Usage

1. Check API Schema

# Get API schema
curl http://localhost:8000/schema

2. Submit Merchant Application

# Submit a merchant application for fraud detection
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "merchant_id": "test-merchant-1",
    "owner_ssn": "123-45-6789",
    "business_fed_tax_id": "12-3456789",
    "owner_drivers_license": "DL12345678",
    "business_phone_number": "+1-555-123-4567",
    "owner_phone_number": "+1-555-987-6543",
    "email": "merchant@example.com",
    "address_line1": "123 Main St",
    "city": "New York",
    "state": "NY",
    "zip_code": "10001",
    "country": "US",
    "website": "https://example.com"
  }'
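The same request can be sent from Python with only the standard library. A sketch, assuming the API runs at http://localhost:8000 as in the curl example; the response is returned as parsed JSON without assuming its fields, which this README does not document. `build_predict_request` and `predict` are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "http://localhost:8000"  # assumed local address, as in the curl example


def build_predict_request(application: Dict[str, Any], base_url: str = API_URL) -> urllib.request.Request:
    """Build the same POST /predict request that the curl command sends."""
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(application).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(application: Dict[str, Any], base_url: str = API_URL, timeout: float = 10.0) -> Any:
    """Send the request and return the parsed JSON response body."""
    request = build_predict_request(application, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload testable without a running server. `predict(application)` raises `urllib.error.HTTPError` for non-2xx responses, such as a 422 validation error returned by FastAPI.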

Development Workflow

1. Local Development

# Start database
docker compose up -d db

# Install dependencies
pip install -e fraud_detection_common/
pip install -e fraud_detection_training/
pip install -e fraud_detection_api/

# Run training
python -m fraud_detection_training.train

# Run API
python -m fraud_detection_api.api

2. Docker Development

# Build and start services
docker compose up --build

# View logs
docker compose logs -f

# Stop services
docker compose down

Configuration

The system uses a centralized configuration in the config/ directory:

database_config.json: Database connection settings
database_config.local.json: Local development overrides

To modify configuration:

Copy the base configuration as a starting point:

cp config/database_config.json config/database_config.local.json

Edit config/database_config.local.json with your settings.
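One way application code might honor the local override is to load `database_config.json` and merge `database_config.local.json` over it key by key. This is a sketch under assumptions: the README does not say whether overrides merge with or replace the base file, and `deep_merge` and `load_database_config` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_database_config(config_dir: str = "config") -> Dict[str, Any]:
    """Load the base config, then apply the local override file if it exists."""
    directory = Path(config_dir)
    config = json.loads((directory / "database_config.json").read_text())
    local = directory / "database_config.local.json"
    if local.exists():
        config = deep_merge(config, json.loads(local.read_text()))
    return config
```

With a base file containing `{"host": "db", "port": 5432}` and a local file containing `{"host": "localhost"}` (hypothetical keys), the merged result keeps port 5432 and uses localhost, so the local file only needs the settings that differ.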

Troubleshooting

Database Connection Issues

# Check database status
docker compose exec db psql -U postgres -c "\l"

# Check table structure
docker compose exec db psql -U postgres -d fraud_detection -c "\dt"

Service Logs

# View API logs
docker compose logs -f api

# View training logs
docker compose logs -f training

# View database logs
docker compose logs -f db

Contributing

Fork the repository
Create a feature branch
Make your changes
Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

pgvector for vector similarity search
FastAPI for the API framework
scikit-learn for feature engineering and PCA