
📊 Travel Itinerary Analyzer - Complete Documentation

An AI-powered platform for comparing and analyzing travel itineraries, with Hybrid RAG (Retrieval-Augmented Generation) capabilities.


🌟 Features Overview

Core Features

| Feature | Description |
| --- | --- |
| 📄 Multi-format Upload | Supports PDF, Word (DOCX), and Images |
| 🔍 AI Analysis | Extracts structured data from itineraries |
| ⚖️ Comparison | Side-by-side comparison of multiple itineraries |
| 💡 Smart Insights | RAG-enhanced strategic recommendations |
| 💬 Q&A Chat | Ask questions about your documents |
| 🌍 Bilingual Support | English (Roboto Slab) & Thai (Sarabun) fonts |

Advanced RAG Features

| Feature | Description |
| --- | --- |
| 📚 Knowledge Base | Store and index documents for retrieval |
| 🔗 Hybrid Search | Combines vector similarity + graph traversal |
| 🇹🇭 Thai RAG | Optimized embedding for Thai documents |
| 🖼️ Multimodal | Process images with GPT-4 Vision |

🏗️ Architecture

┌──────────────────────────────────────────────────────────────┐
│                     Frontend (React + Vite)                  │
├──────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐   │
│  │  Upload     │  │  Analysis   │  │  Knowledge Base     │   │
│  │  Component  │  │  Output     │  │  Manager            │   │
│  └─────────────┘  └─────────────┘  └─────────────────────┘   │
├──────────────────────────────────────────────────────────────┤
│                      Services Layer                          │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────────┐    │
│  │ AI       │ │ File     │ │ Arango   │ │ Thai RAG      │    │
│  │ Service  │ │ Parser   │ │ Service  │ │ Service       │    │
│  └──────────┘ └──────────┘ └──────────┘ └───────────────┘    │
├──────────────────────────────────────────────────────────────┤
│                     External Services                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐    │
│  │   OpenAI     │  │   ArangoDB   │  │   ChromaDB       │    │
│  │   GPT-4o     │  │   (Hybrid)   │  │   (Backup)       │    │
│  └──────────────┘  └──────────────┘  └──────────────────┘    │
└──────────────────────────────────────────────────────────────┘

📁 Project Structure

itin-analyzer/
├── components/              # React UI components
│   ├── AnalysisOutput.tsx   # Main output container
│   ├── InsightsView.tsx     # RAG-enhanced insights display
│   ├── ComparisonView.tsx   # Side-by-side comparison
│   ├── QnaView.tsx          # Q&A chat interface
│   ├── KnowledgeBase.tsx    # KB management UI
│   ├── RagProgressOverlay.tsx # Loading overlay
│   └── icons/               # SVG icon components
│
├── services/                # Business logic services
│   ├── aiService.ts         # OpenAI API integration
│   ├── arangoService.ts     # ArangoDB Hybrid RAG
│   ├── thaiRagService.ts    # Thai language processing
│   ├── fileParser.ts        # PDF/DOCX/Image parsing
│   ├── multimodalRagService.ts # GPT-4 Vision
│   ├── dbService.ts         # IndexedDB local storage
│   └── exportService.ts     # PDF/Excel export
│
├── utils/                   # Utility functions
│   └── markdownRenderer.tsx # Markdown to React
│
├── App.tsx                  # Main application component
├── types.ts                 # TypeScript type definitions
├── index.html               # HTML entry point
├── index.css                # Global styles
├── vite.config.ts           # Vite configuration
├── docker-compose.yml       # Docker services
└── .env                     # Environment variables

🔄 How It Works

1. Document Upload & Parsing

graph LR
    A[Upload File] --> B{File Type?}
    B -->|PDF| C[pdfjs-dist]
    B -->|DOCX| D[mammoth.js]
    B -->|Image| E[GPT-4 Vision]
    C --> F[Extracted Text]
    D --> F
    E --> F
    F --> G[AI Analysis]

Supported formats:

  • PDF: Uses pdfjs-dist for text extraction
  • Word: Uses mammoth for DOCX parsing
  • Images: Uses GPT-4 Vision for OCR
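
The dispatch by file type can be sketched as a small pure function. This is a minimal illustration, not the code in `fileParser.ts`: the name `pickParser`, the `ParserKind` type, and the list of image extensions are assumptions made for this example.

```typescript
// Hypothetical helper: maps a file name / MIME type to the parser named in this README.
type ParserKind = "pdfjs-dist" | "mammoth" | "gpt-4-vision";

// Assumed list of image extensions; the real app may accept a different set.
const IMAGE_EXTENSIONS = ["png", "jpg", "jpeg", "webp", "gif"];

const DOCX_MIME =
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

function pickParser(fileName: string, mimeType = ""): ParserKind {
  // Take the text after the last dot as the extension (lower-cased).
  const ext = fileName.toLowerCase().split(".").pop() ?? "";
  if (mimeType === "application/pdf" || ext === "pdf") return "pdfjs-dist";
  if (mimeType === DOCX_MIME || ext === "docx") return "mammoth";
  if (mimeType.startsWith("image/") || IMAGE_EXTENSIONS.includes(ext)) {
    return "gpt-4-vision";
  }
  throw new Error(`Unsupported file type: ${fileName}`);
}
```

Checking the MIME type first and falling back to the extension keeps the function usable for files whose browser-reported type is empty.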

2. AI Analysis Flow

When you click "Analyze & Compare":

  1. Text Extraction → Parse uploaded files
  2. Structured Analysis → GPT-4o extracts:
    • Tour name, duration, destinations
    • Pricing breakdown
    • Inclusions/exclusions
    • Day-by-day itinerary
  3. Comparison → Generate side-by-side table
  4. Geocoding → Map destinations
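
To make the structured-analysis step concrete, here is a rough sketch of what the extracted record could look like and how a model reply might be checked for missing fields. The real `ItineraryData` type lives in `types.ts` and may differ; `ItinerarySketch`, its field names, and `parseItineraryReply` are hypothetical and only mirror the fields listed above.

```typescript
// Hypothetical shape mirroring the fields listed in step 2; not the project's ItineraryData.
interface ItinerarySketch {
  tourName: string;
  durationDays: number;
  destinations: string[];
  pricing: { label: string; amount: number; currency: string }[];
  inclusions: string[];
  exclusions: string[];
  dailyPlan: { day: number; summary: string }[];
}

const REQUIRED_KEYS: (keyof ItinerarySketch)[] = [
  "tourName",
  "durationDays",
  "destinations",
  "pricing",
  "inclusions",
  "exclusions",
  "dailyPlan",
];

// Parses a JSON reply and reports which expected fields are absent.
// Throws if the reply is not valid JSON or is not a JSON object.
function parseItineraryReply(reply: string): {
  data: Partial<ItinerarySketch>;
  missing: string[];
} {
  const parsed: unknown = JSON.parse(reply);
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Expected a JSON object");
  }
  const data = parsed as Partial<ItinerarySketch>;
  const missing = REQUIRED_KEYS.filter((key) => data[key] === undefined);
  return { data, missing };
}
```

Reporting missing fields instead of throwing lets the caller decide whether a partial extraction is still good enough to show in the comparison table.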

3. Knowledge Base & RAG

┌──────────────────────────────────────────────────────────┐
│                    RAG Pipeline                          │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  1. INDEXING (Upload to KB)                              │
│     ┌─────────┐   ┌─────────┐   ┌─────────┐              │
│     │ Chunk   │ → │ Embed   │ → │ Store   │              │
│     │ Text    │   │ Vectors │   │ ArangoDB│              │
│     └─────────┘   └─────────┘   └─────────┘              │
│          ↓                                               │
│     ┌─────────┐   ┌─────────┐                            │
│     │ Extract │ → │ Build   │                            │
│     │ Entities│   │ Graph   │                            │
│     └─────────┘   └─────────┘                            │
│                                                          │
│  2. RETRIEVAL (Get Insights / Q&A)                       │
│     ┌─────────┐   ┌─────────────────────┐                │
│     │ Query   │ → │ Hybrid Search       │                │
│     │ Embed   │   │ (Vector + Graph)    │                │
│     └─────────┘   └─────────────────────┘                │
│                           ↓                              │
│     ┌─────────────────────────────────────┐              │
│     │ LLM Generate Answer with Context    │              │
│     └─────────────────────────────────────┘              │
│                                                          │
└──────────────────────────────────────────────────────────┘
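
The core operations in this pipeline (chunking, vector similarity, and merging vector and graph scores) can be sketched in a few lines. This is an illustration under stated assumptions, not the logic in `arangoService.ts`: the chunk sizes, the `ScoredChunk` shape, and the weighted-sum merge are all hypothetical, since this README does not say how the two scores are combined.

```typescript
// Split text into overlapping character windows (sizes are illustrative defaults).
function chunkText(text: string, size = 500, overlap = 50): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical result shape: one score from vector search, one from graph traversal.
interface ScoredChunk {
  id: string;
  vectorScore: number;
  graphScore: number;
}

// Assumed merge strategy: weighted sum of both scores, highest first.
function rankHybrid(
  chunks: ScoredChunk[],
  topK = 5,
  vectorWeight = 0.7
): ScoredChunk[] {
  const score = (c: ScoredChunk) =>
    vectorWeight * c.vectorScore + (1 - vectorWeight) * c.graphScore;
  return [...chunks].sort((x, y) => score(y) - score(x)).slice(0, topK);
}
```

The overlap between chunks reduces the chance that a sentence relevant to a query is split across two windows and missed by both.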

4. Language Detection

The app automatically detects Thai text and applies appropriate processing:

| Detection | Embedding Model | Font |
| --- | --- | --- |
| 🇺🇸 English | OpenAI text-embedding-3-small | Roboto Slab |
| 🇹🇭 Thai | OpenThaiGPT or Thai-preprocessed | Sarabun |
| 🌐 Mixed | Smart hybrid embedding | Both |
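
One plausible way to implement this detection is to count characters in the Unicode Thai block (U+0E00 to U+0E7F) against Latin letters. The sketch below is an assumption about the approach, not the code in `thaiRagService.ts`; the function name and the 10% threshold for "mixed" are made up for illustration.

```typescript
type DetectedLanguage = "english" | "thai" | "mixed";

// Classifies text by the share of Thai-block characters among Thai + Latin letters.
// mixedThreshold is a hypothetical tuning knob: shares within it of 0 or 1 count as pure.
function detectLanguage(text: string, mixedThreshold = 0.1): DetectedLanguage {
  const thai = (text.match(/[\u0E00-\u0E7F]/g) ?? []).length;
  const latin = (text.match(/[A-Za-z]/g) ?? []).length;
  const total = thai + latin;
  if (total === 0 || thai === 0) return "english"; // default when no Thai is present
  const thaiShare = thai / total;
  if (thaiShare >= 1 - mixedThreshold) return "thai";
  if (thaiShare <= mixedThreshold) return "english";
  return "mixed";
}
```

Digits, punctuation, and whitespace are ignored, so prices and dates in an itinerary do not skew the result.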

⚙️ Configuration

Environment Variables (.env)

# Required: OpenAI API Key
OPENAI_API_KEY=sk-proj-your_key_here

# ArangoDB (Hybrid RAG)
ARANGO_URL=http://localhost:8529
ARANGO_USER=root
ARANGO_PASSWORD=password123
ARANGO_DATABASE=itinerary_kb

# ChromaDB (Backup vector store)
CHROMA_URL=http://localhost:8000

# Optional: Thai RAG
OPENTHAI_ENABLED=false
OPENTHAI_API_URL=http://localhost:5000
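
As a sketch of how these variables might be validated at startup, the function below takes a plain key/value record and applies the example values above as defaults. How the project actually reads its environment is not shown in this README, so `loadConfig` and `AppConfig` are hypothetical names, and the defaults are simply copied from the sample `.env`.

```typescript
// Hypothetical config shape built from the variables documented above.
interface AppConfig {
  openaiApiKey: string;
  arangoUrl: string;
  arangoUser: string;
  arangoPassword: string;
  arangoDatabase: string;
  chromaUrl: string;
  openThaiEnabled: boolean;
  openThaiApiUrl: string;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): AppConfig {
  const openaiApiKey = env.OPENAI_API_KEY;
  if (!openaiApiKey) {
    // The only variable marked as required in the sample file.
    throw new Error("OPENAI_API_KEY is required");
  }
  return {
    openaiApiKey,
    arangoUrl: env.ARANGO_URL ?? "http://localhost:8529",
    arangoUser: env.ARANGO_USER ?? "root",
    arangoPassword: env.ARANGO_PASSWORD ?? "", // no default password is assumed
    arangoDatabase: env.ARANGO_DATABASE ?? "itinerary_kb",
    chromaUrl: env.CHROMA_URL ?? "http://localhost:8000",
    openThaiEnabled: (env.OPENTHAI_ENABLED ?? "false").toLowerCase() === "true",
    openThaiApiUrl: env.OPENTHAI_API_URL ?? "http://localhost:5000",
  };
}
```

Failing fast on a missing API key surfaces the problem at startup instead of at the first analysis request.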

Docker Services

services:
  dev:           # Vite dev server (port 3000)
  arangodb:      # Hybrid RAG database (port 8529)
  chromadb:      # Vector backup (port 8000)

🚀 Getting Started

Prerequisites

  • Node.js 18+
  • Docker & Docker Compose
  • OpenAI API Key

Quick Start

# 1. Clone the repository
git clone https://github.com/SSaksit23/package-tour-comparison.git
cd package-tour-comparison

# 2. Copy environment file
cp env.example .env
# Edit .env and add your OPENAI_API_KEY

# 3. Start with Docker
docker-compose up -d

# 4. Open browser
# http://localhost:3000

Development Mode

# Install dependencies
npm install

# Start dev server only
npm run dev

# Or start with databases
docker-compose up arangodb chromadb -d
npm run dev

📊 Usage Guide

Basic Workflow

  1. Upload Itineraries

    • Drag & drop PDF/DOCX/Images into upload zones
    • Add competitor names for each itinerary
  2. Analyze

    • Click "Analyze & Compare"
    • View structured data, comparison table, insights
  3. Build Knowledge Base (for RAG)

    • Click "Knowledge Base" button
    • Upload reference documents
    • Wait for indexing (check console logs)
  4. Get Enhanced Insights

    • With KB populated, click "Get Insights"
    • System searches KB for relevant context
    • Generates RAG-enhanced recommendations
  5. Q&A

    • Ask questions about your documents
    • System uses hybrid search for answers

Tips for Best Results

| Tip | Description |
| --- | --- |
| 📚 Populate KB First | Upload similar itineraries to KB before analysis |
| 🇹🇭 Thai Documents | System auto-detects Thai and uses optimized processing |
| 📄 File Size | Keep files under 10MB for best performance |
| 💬 Q&A Context | More KB documents = better Q&A answers |

🔧 Troubleshooting

Common Issues

| Issue | Solution |
| --- | --- |
| "No KB docs used" | Upload documents to Knowledge Base first |
| 409 Conflict errors | Normal - collections already exist |
| Embedding stuck | Check OpenAI API key and rate limits |
| Thai text garbled | Ensure Sarabun font is loaded |
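
Since 409 responses are expected when collections already exist, setup code can treat them as success rather than logging them as failures. The predicate below is a hypothetical sketch: the error shape (a numeric `code` or `status` property carrying the HTTP status) is an assumption and should be checked against whatever client the project uses.

```typescript
// Hypothetical check: does this error represent an HTTP 409 Conflict?
// Assumes the client attaches the HTTP status as `code` or `status`.
function isAlreadyExists(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const { code, status } = err as { code?: unknown; status?: unknown };
  return code === 409 || status === 409;
}

// Usage idea (createCollection is a placeholder for the real setup call):
//   try { await createCollection(name); }
//   catch (err) { if (!isAlreadyExists(err)) throw err; }
```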

Checking Logs

# View Docker logs
docker-compose logs -f dev
docker-compose logs -f arangodb

# Browser console shows:
# ✅ ArangoDB Hybrid RAG initialized
# 📦 Knowledge Base: X chunks total
# 📚 RAG Context: Found X relevant documents

📝 API Reference

AI Service Functions

// Analyze itinerary text
analyzeItinerary(text: string, language: string): Promise<ItineraryData>

// Generate comparison table
getComparison(competitors: Competitor[], language: string): Promise<string>

// Get recommendations (with optional RAG context)
getRecommendations(
  competitors: Competitor[], 
  history: AnalysisRecord[], 
  language: string,
  ragContext?: string
): Promise<string>

// Q&A with RAG
generateAnswer(
  contexts: {name: string, text: string}[],
  question: string,
  chatHistory: ChatMessage[],
  language: string,
  ragContext?: string
): Promise<string>

ArangoDB Service Functions

// Index document in knowledge base
indexDocumentInArango(doc: Document): Promise<{chunks: number, entities: number}>

// Hybrid search (vector + graph)
hybridSearch(query: string, topK?: number): Promise<HybridSearchResult[]>

// RAG query with chat
arangoHybridQuery(
  question: string,
  chatHistory: ChatMessage[],
  language: string
): Promise<HybridRAGResponse>
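
Both `getRecommendations` and `generateAnswer` accept an optional `ragContext` string. One way such a string could be assembled from search results is sketched below. The fields of `HybridSearchResult` are not documented here, so `SearchHit`, `formatRagContext`, and the character budget are hypothetical stand-ins.

```typescript
// Hypothetical stand-in for a search result; the real HybridSearchResult may differ.
interface SearchHit {
  source: string; // e.g. the document name
  text: string;   // the retrieved chunk
  score: number;  // higher means more relevant
}

// Joins the best-scoring hits into one context string, stopping once the
// combined chunk length would exceed maxChars (separators are not counted).
function formatRagContext(hits: SearchHit[], maxChars = 4000): string {
  const parts: string[] = [];
  let used = 0;
  const byScore = [...hits].sort((a, b) => b.score - a.score);
  for (const hit of byScore) {
    const block = `[${hit.source}] ${hit.text.trim()}`;
    if (used + block.length > maxChars) break;
    parts.push(block);
    used += block.length;
  }
  return parts.join("\n\n");
}
```

Prefixing each chunk with its source name gives the model something to cite when it answers, and the budget keeps the prompt within a predictable size.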

🌐 Technologies Used

| Category | Technology |
| --- | --- |
| Frontend | React 18, TypeScript, Tailwind CSS |
| Build | Vite |
| AI | OpenAI GPT-4o, GPT-4 Vision |
| Vector DB | ArangoDB (primary), ChromaDB (backup) |
| Graph | ArangoDB Graph |
| PDF | pdfjs-dist |
| Word | mammoth.js |
| Maps | Leaflet |
| Fonts | Roboto Slab, Sarabun |
| Container | Docker, Docker Compose |

📄 License

MIT License - See LICENSE file for details.


👨‍💻 Author

Created by Saksit Saelow


Last updated: December 2024