Advanced AI-Powered Phishing Detection
PhishX is a full-stack cybersecurity web application that uses Machine Learning to detect phishing threats across emails, URLs, and QR codes — with real-time results and confidence scoring.
Features • Installation • Usage • API • ML Details
✨ Features
| Feature | Description |
|---|---|
| 📧 Email Analysis | Detects phishing language, urgency tactics, and deceptive patterns using TF-IDF + Logistic Regression |
| 🔗 URL Checker | Analyzes 15+ URL features — domain, special characters, HTTPS, IP usage — with Random Forest |
| 📱 QR Code Scan | Decodes QR images, extracts embedded URLs, and runs them through the phishing model |
| 🌗 Dark Mode | Full dark/light theme toggle, preference saved to localStorage |
| 📊 Live Counters | Animated scan, user, and organization statistics |
🛠️ Tech Stack
Backend: Python 3.8+, Flask, scikit-learn, OpenCV, pyzbar, joblib, NumPy
Frontend: HTML5, TailwindCSS (CDN), Vanilla JavaScript
ML Models: TF-IDF Vectorization, Logistic Regression (email), Random Forest (URL)
📁 Project Structure
PhishX/
├── backend/
│ ├── app.py # Flask API server
│ ├── train_email_model.py # Email model trainer
│ ├── train_url_model.py # URL model trainer
│ ├── requirements.txt # Python dependencies
│ └── models/ # Auto-generated after training
│ ├── email_model.pkl
│ ├── email_vectorizer.pkl
│ ├── url_model.pkl
│ └── url_scaler.pkl
├── frontend/
│ ├── index.html # Main UI
│ ├── style.css # Custom styles
│ ├── script.js # Frontend logic & API calls
│ └── assets/ # Images & icons
├── utils/
│ └── qr_scanner.py # QR code decoder
└── README.md
🚀 Installation
Prerequisites
- Python 3.8+ and pip
- The zbar system library, which pyzbar needs for QR scanning:
| OS | Command |
|---|---|
| Ubuntu / Debian | sudo apt-get install libzbar0 |
| macOS | brew install zbar |
| Windows | Download from ZBar SourceForge |
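After installing the system library, a quick check like the one below can indicate whether a zbar shared library is visible to the system loader. This is a minimal sketch using only the Python standard library; the helper name `zbar_available` is hypothetical and not part of PhishX. Treat a negative result as a hint rather than proof, since pyzbar may locate the library in other ways on some platforms.

```python
from ctypes.util import find_library


def zbar_available() -> bool:
    """Return True if the system loader can locate a zbar shared library."""
    # find_library returns a name or path string, or None when nothing is found.
    return find_library("zbar") is not None


if __name__ == "__main__":
    if zbar_available():
        print("zbar found: pyzbar should be able to load it")
    else:
        print("zbar not found: install it with the command for your OS")
```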
Step 1 — Clone & Enter the Project
git clone https://github.com/yourusername/PhishX.git
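cd PhishX
Step 2 — Create & Activate a Virtual Environment
Create a virtual environment named venv (for example, with python -m venv venv), then activate it: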
| Platform | Activation Command |
|---|---|
| Linux / macOS | source venv/bin/activate |
| Windows (CMD) | venv\Scripts\activate |
| Windows (PowerShell) | .\venv\Scripts\Activate.ps1 |
Step 3 — Install Dependencies
pip install -r backend/requirements.txt
Step 4 — Train the ML Models
⚠️ Required before first run. This generates the .pkl model files.
python backend/train_email_model.py
python backend/train_url_model.py
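To confirm that training produced everything the server expects, a small check along these lines may help. It is a sketch, not part of the repository: the file names come from the project structure listed earlier, while the helper `missing_model_files` and its default path (relative to the repository root) are assumptions.

```python
from pathlib import Path

# File names taken from the project structure section of this README.
EXPECTED_MODEL_FILES = (
    "email_model.pkl",
    "email_vectorizer.pkl",
    "url_model.pkl",
    "url_scaler.pkl",
)


def missing_model_files(models_dir="backend/models"):
    """Return the expected model files that are not present in models_dir."""
    base = Path(models_dir)
    return [name for name in EXPECTED_MODEL_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files are present.")
```

An empty list means both training scripts have written their output; any names returned point to the script that still needs to run.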
Step 5 — Start the Server
Start the Flask API server (backend/app.py), then open your browser at http://localhost:5000
📖 Usage
📧 Email Analysis
- Click the Email Analysis card
- Paste the full email (headers + body) into the textarea
- Press Ctrl + Enter or click Analyze Email
Example phishing email:
URGENT: Your account will be suspended. Click here immediately to verify your credentials.
🔗 URL Checker
- Click the Link Checker card
- Enter the suspicious URL
- Press Enter or click Check URL
Example phishing URL:
http://paypal-secure-login.com/verify-account?user=you
📱 QR Code Scanner
- Click the QR Code Scan card
- Drag & drop or upload a QR image (PNG, JPG, SVG)
- Click Scan Image
🔌 API Reference
Base URL: http://localhost:5000
POST /detect/email
// Request
{ "text": "Your account will be suspended..." }
// Response
{ "result": "Phishing", "confidence": "92.5%", "raw_confidence": 92.5 }
POST /detect/url
// Request
{ "url": "https://suspicious-site.com/login" }
// Response
{ "result": "Legitimate", "confidence": "87.3%", "raw_confidence": 87.3 }
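The two JSON endpoints above can be called from Python with the standard library alone. The sketch below assumes the server accepts an application/json request body, which the README implies but does not state; the helper names `build_json_request` and `send` are hypothetical.

```python
import json
from urllib import request

BASE_URL = "http://localhost:5000"  # base URL from the API reference


def build_json_request(path, payload, base_url=BASE_URL):
    """Build (but do not send) a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        base_url + path,
        data=body,
        # Assumption: the server reads the body as application/json.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=10):
    """Send a prepared request and decode the JSON response."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send(build_json_request("/detect/url", {"url": "https://suspicious-site.com/login"}))` should return a dictionary with the `result`, `confidence`, and `raw_confidence` keys shown in the responses above. Building and sending are kept separate so the request can be inspected without a live server.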
POST /detect/qr
Multipart form data with an image field.
// Response
{ "decoded_url": "http://phishing-site.com", "result": "Phishing", "confidence": "95.2%" }
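A multipart upload to this endpoint can also be assembled with the standard library. This is a rough sketch under two assumptions: that the form field is literally named `image`, and that a PNG content type is acceptable. The helper `build_qr_request` is hypothetical.

```python
import uuid
from urllib import request


def build_qr_request(image_bytes, filename="qr.png", content_type="image/png",
                     base_url="http://localhost:5000"):
    """Build (but do not send) a multipart/form-data POST with one file field."""
    boundary = uuid.uuid4().hex  # random separator between form parts
    head = (
        f"--{boundary}\r\n"
        # Assumption: the server looks for a form field named "image".
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return request.Request(
        base_url + "/detect/qr",
        data=head + image_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` while the server is running should yield a JSON body shaped like the response above.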
GET /health
{ "status": "healthy", "email_model_loaded": true, "url_model_loaded": true }
🧠 Machine Learning
Email Model
- Algorithm: Logistic Regression
- Vectorizer: TF-IDF (5,000 features)
- Accuracy: ~95% on test set
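The combination described above can be sketched with scikit-learn as follows. This is an illustration, not the contents of train_email_model.py: the six training sentences are made up, the helper `classify_email` is hypothetical, and only the 5,000-feature cap comes from this README.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set for illustration only; not the project's data.
texts = [
    "URGENT: verify your account now or it will be suspended",
    "Click here immediately to confirm your password",
    "Your payment failed, confirm your bank credentials today",
    "Meeting moved to 3pm, see you in the usual room",
    "Here are the notes from yesterday's project review",
    "Lunch on Friday? Let me know what works for you",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = phishing, 0 = legitimate

# TF-IDF turns each email into a weighted term vector; Logistic Regression
# then learns one weight per term.
model = make_pipeline(
    TfidfVectorizer(max_features=5000),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)


def classify_email(text):
    """Return a (label, confidence in percent) pair for one email."""
    probabilities = model.predict_proba([text])[0]
    best = probabilities.argmax()
    label = "Phishing" if model.classes_[best] == 1 else "Legitimate"
    return label, round(float(probabilities[best]) * 100, 1)
```

Here the confidence is the probability of the predicted class, which is one plausible way to arrive at values like the 92.5 shown in the API responses; the project may compute it differently.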
URL Model
- Algorithm: Random Forest (100 estimators)
- Features (15+): URL length, dot/dash/slash counts, HTTPS flag, IP address detection, suspicious keyword matches, domain length, redirect patterns
- Accuracy: ~95% on test set
Improving for Production
- Replace synthetic training data with real-world datasets (PhishTank, OpenPhish, Enron corpus)
- Integrate VirusTotal or Google Safe Browsing API
- Use BERT/LSTM for deeper email NLP
- Add WHOIS and SSL certificate validation
🔮 Roadmap
- Docker containerization
- Browser extension for automatic URL checking
- Real-time threat intelligence feed (VirusTotal API)
- Advanced NLP models (BERT, GPT)
- API key authentication & rate limiting
- Email client plugin (Gmail / Outlook)
- Mobile app for QR scanning
- Multi-language support
🤝 Contributing
Contributions are welcome! Areas of interest:
- Improved / real-world ML training datasets
- UI/UX enhancements
- Additional phishing detection heuristics
- Bug fixes and performance improvements
Please open an issue first to discuss major changes.
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
👨‍💻 Author
Built as a cybersecurity portfolio project demonstrating full-stack development, machine learning, and security awareness.
