Fraud Detection System
A system for detecting fraudulent merchant applications using vector similarity search and pattern matching.
System Architecture
The system consists of three main components:
- Training Module (fraud_detection_training):
  - Generates synthetic training data
  - Creates and populates the database with merchant records
  - Handles vector embeddings for similarity search
- API Module (fraud_detection_api):
  - Provides REST API endpoints for fraud detection
  - Implements pattern matching and similarity search
  - Returns detailed fraud analysis results
- Common Module (fraud_detection_common):
  - Shared utilities and configurations
  - Database operations
  - Dynamic model generation
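As an illustration of what "dynamic model generation" can look like, the following minimal sketch builds a record type from a field configuration at runtime. It uses the standard library's `dataclasses.make_dataclass`; the `FIELD_CONFIG` format, `TYPE_MAP`, and `build_model` are hypothetical names chosen for this example and are not taken from fraud_detection_common.

```python
from dataclasses import asdict, field, fields, make_dataclass
from typing import Optional

# Hypothetical field configuration; the project's real config format is not shown here.
FIELD_CONFIG = [
    {"name": "merchant_id", "type": "str", "required": True},
    {"name": "email", "type": "str", "required": False},
    {"name": "zip_code", "type": "str", "required": False},
]

# Maps type names used in the configuration to Python types.
TYPE_MAP = {"str": str, "int": int, "float": float}


def build_model(name, config):
    """Create a dataclass whose fields mirror the configuration entries."""
    required, optional = [], []
    for entry in config:
        py_type = TYPE_MAP[entry["type"]]
        if entry.get("required"):
            required.append((entry["name"], py_type))
        else:
            optional.append((entry["name"], Optional[py_type], field(default=None)))
    # Dataclass fields without defaults must come before fields with defaults.
    return make_dataclass(name, required + optional)


MerchantApplication = build_model("MerchantApplication", FIELD_CONFIG)
app = MerchantApplication(merchant_id="test-merchant-1", email="merchant@example.com")
print(asdict(app))
```

Adding a field to the configuration then changes the generated type without touching the code that consumes it.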
Navigation
- Project Structure
- System Flow
- Prerequisites
- Local Development Setup
- Docker-based Deployment
- API Usage
- Development Workflow
- Configuration
- Troubleshooting
- Contributing
- License
- Acknowledgments
Project Structure
The system is divided into three main components:
- fraud_detection_common: Shared utilities and models
  - Database operations with pgvector
  - Custom embedding generation using feature engineering and PCA
  - Dynamic model generation based on configuration
  - Common data models and types
- fraud_detection_training: Training and embedding generation
  - Processes training data from JSON or CSV files
  - Generates custom embeddings
  - Stores embeddings in the database
- fraud_detection_api: API service
  - FastAPI-based REST API
  - Evaluates new applications
  - Returns fraud detection results
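To give a feel for how a merchant record can become a fixed-length vector, here is a minimal, dependency-free sketch of the feature engineering step. It hashes character trigrams into a small count vector and L2-normalizes the result. This is an assumed stand-in: the project's own features are not documented here, and the PCA step (done with scikit-learn in the project) is deliberately left out. `DIM`, `featurize`, and `embed` are hypothetical names.

```python
import math
import zlib

DIM = 16  # hypothetical embedding size, chosen only for this example


def featurize(record, dim=DIM):
    """Hash character trigrams of every field into a fixed-length count vector."""
    vec = [0.0] * dim
    for key in sorted(record):
        text = f"{key}={str(record[key]).lower()}"
        for i in range(len(text) - 2):
            # crc32 is deterministic across runs, unlike the built-in hash() for str.
            bucket = zlib.crc32(text[i:i + 3].encode("utf-8")) % dim
            vec[bucket] += 1.0
    return vec


def l2_normalize(vec):
    """Scale the vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def embed(record):
    """Turn a merchant record (dict) into a unit-length embedding."""
    return l2_normalize(featurize(record))


e = embed({"email": "merchant@example.com", "zip_code": "10001"})
print(len(e))
```

Unit-length vectors make cosine similarity equal to a plain dot product, which keeps the later similarity search simple.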
System Flow
graph TD
A[Training Data] --> B[Training Module]
B --> C[Feature Engineering]
C --> D[PCA Transformation]
D --> E[Store Embeddings]
F[New Application] --> G[API Service]
G --> H[Feature Engineering]
H --> I[PCA Transformation]
I --> J[Vector Similarity Search]
J --> K[Field Matching]
K --> L[Decision Making]
L --> M[Return Result]
E --> J
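The evaluation path in the flow above (similarity search, then field matching, then a decision) can be sketched in plain Python as follows. This is an illustration under assumptions: in the real system the nearest-neighbour lookup is delegated to pgvector, and the `MATCH_FIELDS` list, the `0.9` threshold, and the decision rule are hypothetical values chosen for the example.

```python
import math

# Hypothetical set of fields compared for exact matches.
MATCH_FIELDS = ("owner_ssn", "business_fed_tax_id", "email")


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def evaluate(application, app_vec, stored, threshold=0.9):
    """Compare an application against stored (record, vector, is_fraud) rows."""
    if not stored:
        return {"decision": "approve", "similarity": 0.0, "matched_fields": []}
    # Vector similarity search: pick the closest stored record.
    record, vec, is_fraud = max(stored, key=lambda row: cosine(app_vec, row[1]))
    score = cosine(app_vec, vec)
    # Field matching: identical non-empty values on sensitive fields.
    matched = [f for f in MATCH_FIELDS
               if application.get(f) and application.get(f) == record.get(f)]
    # Decision (assumed rule): flag when the nearest record is known fraud and
    # is either very similar or shares a sensitive field.
    flagged = is_fraud and (score >= threshold or bool(matched))
    return {"decision": "review" if flagged else "approve",
            "similarity": score, "matched_fields": matched}


stored = [
    ({"email": "bad@example.com", "owner_ssn": "111-11-1111"}, [1.0, 0.0], True),
    ({"email": "ok@example.com"}, [0.0, 1.0], False),
]
result = evaluate({"email": "bad@example.com"}, [0.9, 0.1], stored)
print(result["decision"], result["matched_fields"])
```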
Prerequisites
- Python 3.9+
- PostgreSQL 15+ with pgvector extension
- Docker and Docker Compose (for containerized deployment)
- Make (optional, for using Makefile commands)
Local Development Setup
1. Create and Activate Virtual Environment
# Create virtual environment
python -m venv .venv

# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
2. Install Dependencies
# Install common module
pip install -e fraud_detection_common/

# Install training module
pip install -e fraud_detection_training/

# Install API module
pip install -e fraud_detection_api/
3. Database Setup
# Start PostgreSQL with pgvector
docker compose up -d db

# Wait for database to be ready
docker compose exec db pg_isready -U postgres
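When the training step is launched from a script, it can help to poll for the database instead of checking by hand. The sketch below is one possible way to do that with the standard library only. `tcp_probe` and `wait_until_ready` are hypothetical helpers, and `localhost:5432` assumes the Compose file publishes PostgreSQL's default port. An open TCP port is a weaker signal than `pg_isready`, so treat this as a convenience rather than a replacement.

```python
import socket
import time


def tcp_probe(host="localhost", port=5432, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until_ready(probe, attempts=30, delay=1.0, sleep=time.sleep):
    """Call probe() until it returns True; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"service not ready after {attempts} attempts")


# Demonstration with a fake probe that succeeds on the third call.
responses = iter([False, False, True])
attempts_used = wait_until_ready(lambda: next(responses), sleep=lambda _: None)
print(attempts_used)
```

Against a real database, `wait_until_ready(tcp_probe)` would be called before starting the training module.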
4. Run Training Module
# Generate training data and populate database
python -m fraud_detection_training.train
5. Run API Module
# Start the API server
python -m fraud_detection_api.api
Docker-based Deployment
1. Build and Start Services
# Build all services
docker compose build

# Start all services
docker compose up -d
2. Check Service Status
# View logs
docker compose logs -f

# Check service status
docker compose ps
3. Stop Services
# Stop all services
docker compose down

# Stop and remove volumes
docker compose down -v
API Usage
1. Check API Schema
# Get API schema
curl http://localhost:8000/schema
2. Submit Merchant Application
# Submit a merchant application for fraud detection
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "merchant_id": "test-merchant-1",
    "owner_ssn": "123-45-6789",
    "business_fed_tax_id": "12-3456789",
    "owner_drivers_license": "DL12345678",
    "business_phone_number": "+1-555-123-4567",
    "owner_phone_number": "+1-555-987-6543",
    "email": "merchant@example.com",
    "address_line1": "123 Main St",
    "city": "New York",
    "state": "NY",
    "zip_code": "10001",
    "country": "US",
    "website": "https://example.com"
  }'
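The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above. `build_request` and `predict` are hypothetical helper names; the response is simply parsed as JSON, since its exact fields are not documented in this README.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # same endpoint as the curl example


def build_request(application, url=API_URL):
    """Build a POST request carrying the application as a JSON body."""
    body = json.dumps(application).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(application, url=API_URL, timeout=10):
    """Send the application to the API and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(application, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


application = {
    "merchant_id": "test-merchant-1",
    "email": "merchant@example.com",
    "zip_code": "10001",
    "country": "US",
}
# Only the request is built here; call predict(application) once the API is running.
request = build_request(application)
print(request.get_method(), request.full_url)
```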
Development Workflow
1. Local Development
# Start database
docker compose up -d db

# Install dependencies
pip install -e fraud_detection_common/
pip install -e fraud_detection_training/
pip install -e fraud_detection_api/

# Run training
python -m fraud_detection_training.train

# Run API
python -m fraud_detection_api.api
2. Docker Development
# Build and start services
docker compose up --build

# View logs
docker compose logs -f

# Stop services
docker compose down
Configuration
The system uses a centralized configuration in the config/ directory:
- database_config.json: Database connection settings
- database_config.local.json: Local development overrides
To modify configuration:
- Copy the template:
cp config/database_config.json config/database_config.local.json
- Edit the local configuration file with your settings
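For orientation, here is a minimal sketch of how a base file with local overrides might be loaded. It assumes a shallow merge in which keys from database_config.local.json replace keys from database_config.json; the project's actual merge behaviour is not documented here. `load_config` and the `host` and `port` keys are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_config(config_dir="config"):
    """Load database_config.json and overlay database_config.local.json if present."""
    base_path = Path(config_dir) / "database_config.json"
    local_path = Path(config_dir) / "database_config.local.json"
    config = json.loads(base_path.read_text(encoding="utf-8"))
    if local_path.exists():
        # Shallow merge (assumption): local values replace base values key by key.
        config.update(json.loads(local_path.read_text(encoding="utf-8")))
    return config


# Demonstration in a temporary directory so no real config files are touched.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "database_config.json").write_text(
        json.dumps({"host": "db", "port": 5432}), encoding="utf-8")
    Path(tmp, "database_config.local.json").write_text(
        json.dumps({"host": "localhost"}), encoding="utf-8")
    merged = load_config(tmp)
print(merged)
```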
Troubleshooting
Database Connection Issues
# Check database status
docker compose exec db psql -U postgres -c "\l"

# Check table structure
docker compose exec db psql -U postgres -d fraud_detection -c "\dt"
Service Logs
# View API logs
docker compose logs -f api

# View training logs
docker compose logs -f training

# View database logs
docker compose logs -f db
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- pgvector for vector similarity search
- FastAPI for the API framework
- scikit-learn for feature engineering and PCA