GitHub - vision-dev1/PhishX: Detects phishing links before they detect you.

Advanced AI-Powered Phishing Detection

PhishX is a full-stack cybersecurity web application that uses Machine Learning to detect phishing threats across emails, URLs, and QR codes — with real-time results and confidence scoring.


✨ Features

| Feature | Description |
| --- | --- |
| 📧 Email Analysis | Detects phishing language, urgency tactics, and deceptive patterns using TF-IDF + Logistic Regression |
| 🔗 URL Checker | Analyzes 15+ URL features (domain, special characters, HTTPS, IP usage) with Random Forest |
| 📱 QR Code Scan | Decodes QR images, extracts embedded URLs, and runs them through the phishing model |
| 🌗 Dark Mode | Full dark/light theme toggle, with the preference saved to localStorage |
| 📊 Live Counters | Animated scan, user, and organization statistics |

🛠️ Tech Stack

Backend: Python 3.8+, Flask, scikit-learn, OpenCV, pyzbar, joblib, NumPy

Frontend: HTML5, TailwindCSS (CDN), Vanilla JavaScript

ML Models: TF-IDF Vectorization, Logistic Regression (email), Random Forest (URL)

📁 Project Structure

PhishX/
├── backend/
│   ├── app.py                    # Flask API server
│   ├── train_email_model.py      # Email model trainer
│   ├── train_url_model.py        # URL model trainer
│   ├── requirements.txt          # Python dependencies
│   └── models/                   # Auto-generated after training
│       ├── email_model.pkl
│       ├── email_vectorizer.pkl
│       ├── url_model.pkl
│       └── url_scaler.pkl
├── frontend/
│   ├── index.html                # Main UI
│   ├── style.css                 # Custom styles
│   ├── script.js                 # Frontend logic & API calls
│   └── assets/                   # Images & icons
├── utils/
│   └── qr_scanner.py             # QR code decoder
└── README.md

🚀 Installation

Prerequisites

Python 3.8+ and pip
System library for QR scanning (pyzbar):

| OS | Command |
| --- | --- |
| Ubuntu / Debian | `sudo apt-get install libzbar0` |
| macOS | `brew install zbar` |
| Windows | Download from ZBar SourceForge |

Step 1 — Clone & Enter the Project

git clone https://github.com/vision-dev1/PhishX.git
cd PhishX

Step 2 — Create & Activate a Virtual Environment

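Create the environment first, then activate it with the command for your platform:

python -m venv venv

| Platform | Activation Command |
| --- | --- |
| Linux / macOS | `source venv/bin/activate` |
| Windows (CMD) | `venv\Scripts\activate` |
| Windows (PowerShell) | `.\venv\Scripts\Activate.ps1` |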

Step 3 — Install Dependencies

pip install -r backend/requirements.txt

Step 4 — Train the ML Models

⚠️ Required before first run. This generates the .pkl model files.

python backend/train_email_model.py
python backend/train_url_model.py
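To confirm that training produced everything the server needs, a small check like the following may help. It is a sketch only: the file names come from the project structure above, while the helper name `missing_model_files` is hypothetical and not part of the repository.

```python
from pathlib import Path

# File names taken from the models/ folder in the project structure.
EXPECTED_MODEL_FILES = (
    "email_model.pkl",
    "email_vectorizer.pkl",
    "url_model.pkl",
    "url_scaler.pkl",
)


def missing_model_files(models_dir):
    """Return the expected .pkl files that are not present in models_dir."""
    models_dir = Path(models_dir)
    return [name for name in EXPECTED_MODEL_FILES if not (models_dir / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_model_files("backend/models")
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files are present.")
```

If any file is reported missing, rerun the corresponding training script before starting the server.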

Step 5 — Start the Server

Start the Flask API server (backend/app.py), then open your browser at http://localhost:5000

📖 Usage

📧 Email Analysis

Click the Email Analysis card
Paste the full email (headers + body) into the textarea
Press Ctrl + Enter or click Analyze Email

Example phishing email:
URGENT: Your account will be suspended. Click here immediately to verify your credentials.

🔗 URL Checker

Click the Link Checker card
Enter the suspicious URL
Press Enter or click Check URL

Example phishing URL:
http://paypal-secure-login.com/verify-account?user=you

📱 QR Code Scanner

Click the QR Code Scan card
Drag & drop or upload a QR image (PNG, JPG, SVG)
Click Scan Image
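Behind these steps, the scanner decodes the image and extracts any embedded URL before classification. The sketch below shows one way this could look with OpenCV and pyzbar, both listed in the tech stack. The function names are hypothetical, and the actual utils/qr_scanner.py may be organized differently.

```python
def extract_urls(payloads):
    """Keep only decoded QR payloads that look like http(s) URLs."""
    urls = []
    for raw in payloads:
        text = raw.decode("utf-8", errors="replace").strip()
        if text.lower().startswith(("http://", "https://")):
            urls.append(text)
    return urls


def decode_qr_image(path):
    """Decode every QR code in an image file and return the embedded URLs."""
    # Imported lazily so extract_urls works without the native ZBar library.
    import cv2
    from pyzbar.pyzbar import decode

    # Read as grayscale; cv2.imread returns None when the file cannot be read.
    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise ValueError(f"Could not read image: {path}")
    return extract_urls(symbol.data for symbol in decode(image))
```

Each URL returned by `decode_qr_image` could then be passed to the URL model in the same way as a link typed into the Link Checker.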

🔌 API Reference

Base URL: http://localhost:5000

`POST /detect/email`

// Request
{ "text": "Your account will be suspended..." }

// Response
{ "result": "Phishing", "confidence": "92.5%", "raw_confidence": 92.5 }

`POST /detect/url`

// Request
{ "url": "https://suspicious-site.com/login" }

// Response
{ "result": "Legitimate", "confidence": "87.3%", "raw_confidence": 87.3 }
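The two JSON endpoints above can be called from any HTTP client. As a minimal sketch using only the Python standard library, and assuming the server is running locally on the base URL given above, a client might look like this. The helper names `build_request`, `parse_verdict`, and `detect` are illustrative and not part of PhishX.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # Base URL from the API reference


def build_request(endpoint, payload):
    """Build a JSON POST request for an endpoint such as /detect/email."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_verdict(response_body):
    """Extract (result, numeric confidence) from a JSON response body."""
    data = json.loads(response_body)
    return data["result"], float(data["raw_confidence"])


def detect(endpoint, payload, timeout=10):
    """Send the request and return the parsed verdict (needs a running server)."""
    request = build_request(endpoint, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_verdict(response.read().decode("utf-8"))
```

For example, `detect("/detect/url", {"url": "https://suspicious-site.com/login"})` would return a tuple such as the result label and its numeric confidence, based on the response shape documented above.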

`POST /detect/qr`

Multipart form data with an image field.

// Response
{ "decoded_url": "http://phishing-site.com", "result": "Phishing", "confidence": "95.2%" }

`GET /health`

{ "status": "healthy", "email_model_loaded": true, "url_model_loaded": true }
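A deployment script or monitor could use this response to decide whether the service is ready to accept scans. The following is a small sketch under that assumption; `is_ready` is a hypothetical helper, and the field names are taken from the example response above.

```python
import json


def is_ready(health_body):
    """Return True only if the service is healthy and both models are loaded."""
    health = json.loads(health_body)
    return (
        health.get("status") == "healthy"
        and health.get("email_model_loaded") is True
        and health.get("url_model_loaded") is True
    )
```

A `False` result with `"status": "healthy"` would suggest that the training step was skipped for one of the models.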

🧠 Machine Learning

Email Model

Algorithm: Logistic Regression
Vectorizer: TF-IDF (5,000 features)
Accuracy: ~95% on the test set (trained on synthetic data; see Improving for Production)
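To illustrate the approach, here is a minimal scikit-learn sketch that combines a TF-IDF vectorizer (capped at 5,000 features) with Logistic Regression. It is not the code in train_email_model.py: the toy training texts are invented for illustration, the `classify_email` helper is hypothetical, and a single pipeline is used here for brevity even though the project stores the model and vectorizer as separate .pkl files.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, hand-written examples for illustration only (1 = phishing, 0 = legitimate).
texts = [
    "URGENT: your account will be suspended, verify your credentials now",
    "Click here immediately to confirm your password",
    "You have won a prize, send your bank details to claim it",
    "Security alert: unusual login, verify your account immediately",
    "Meeting moved to 3pm, see the updated agenda attached",
    "Thanks for the report, I will review it tomorrow",
    "Lunch on Friday? The usual place works for me",
    "Here are the notes from the project call this morning",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = make_pipeline(
    TfidfVectorizer(max_features=5000),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)


def classify_email(text):
    """Return (label, confidence in percent) for one email body."""
    probabilities = model.predict_proba([text])[0]
    # Look up the column that corresponds to the phishing class (label 1).
    phishing_probability = float(probabilities[list(model.classes_).index(1)])
    if phishing_probability >= 0.5:
        return "Phishing", round(phishing_probability * 100, 1)
    return "Legitimate", round((1 - phishing_probability) * 100, 1)
```

With so few training examples the scores are not meaningful; the point is only to show how a predicted probability can map to the result and confidence fields returned by the API.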

URL Model

Algorithm: Random Forest (100 estimators)
Features (15+): URL length, dot/dash/slash counts, HTTPS flag, IP address detection, suspicious keyword matches, domain length, redirect patterns
Accuracy: ~95% on the test set (trained on synthetic data; see Improving for Production)
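The feature list above can be sketched with the Python standard library alone. This is an assumption-laden illustration rather than the extractor in train_url_model.py: the feature names, the keyword list, and the redirect heuristic are all hypothetical choices, and it covers only a subset of the 15+ features.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical keyword list; the project's actual list is not documented here.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")


def host_is_ip(host):
    """Return True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url):
    """Turn a URL into a dictionary of simple numeric features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "dot_count": url.count("."),
        "dash_count": url.count("-"),
        "slash_count": url.count("/"),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(host_is_ip(host)),
        "keyword_hits": sum(keyword in lowered for keyword in SUSPICIOUS_KEYWORDS),
        "domain_length": len(host),
        # Crude redirect heuristic: a doubled slash in the path or a URL in the query.
        "has_redirect": int("//" in parsed.path or "http" in parsed.query.lower()),
    }
```

The resulting values, in a fixed order, would form the numeric vector that is scaled and passed to the Random Forest.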

Improving for Production

Replace synthetic training data with real-world datasets (PhishTank, OpenPhish, Enron corpus)
Integrate VirusTotal or Google Safe Browsing API
Use BERT/LSTM for deeper email NLP
Add WHOIS and SSL certificate validation

🔮 Roadmap

Docker containerization
Browser extension for automatic URL checking
Real-time threat intelligence feed (VirusTotal API)
Advanced NLP models (BERT, GPT)
API key authentication & rate limiting
Email client plugin (Gmail / Outlook)
Mobile app for QR scanning
Multi-language support

🤝 Contributing

Contributions are welcome! Areas of interest:

Improved / real-world ML training datasets
UI/UX enhancements
Additional phishing detection heuristics
Bug fixes and performance improvements

Please open an issue first to discuss major changes.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

👨‍💻 Author

Vision KC
Portfolio
GitHub

Built as a cybersecurity portfolio project demonstrating full-stack development, machine learning, and security awareness.