Machine Learning Practice
Practice implementations of statistical machine learning techniques on various datasets.
For more details and examples on deep learning, check out my Deep Learning repository.
Environment
- Using Python 3
(most relative links are resolved from the repository root)
Dependencies
- numpy: For low-level math operations
- pandas: For data manipulation
- sklearn (scikit-learn): For evaluation metrics and some data preprocessing
For comparison purposes
- sklearn: For machine learning models
- cvxopt: For convex optimization problems (for SVM)
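As a minimal sketch of how these dependencies fit together, the block below writes a small k-nearest-neighbors classifier with numpy and checks it against sklearn's KNeighborsClassifier on the Iris dataset. The function name knn_predict, the dataset, and k=3 are assumptions chosen for illustration; this is not code taken from this repository.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_test, k=3):
    """Predict labels by majority vote among the k nearest training points."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train)
    diff = X_test[:, None, :] - X_train[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    # Indices of the k closest training points for every test point
    nearest = np.argsort(dist, axis=1)[:, :k]
    # Majority vote; ties go to the smallest label
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Hand-written model versus the sklearn reference model
own_pred = knn_predict(X_train, y_train, X_test, k=3)
ref_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

own_acc = accuracy_score(y_test, own_pred)
ref_acc = accuracy_score(y_test, ref_pred)
print(f"numpy kNN accuracy:   {own_acc:.3f}")
print(f"sklearn kNN accuracy: {ref_acc:.3f}")
```

The two accuracies should be close; small differences can come from how distance ties are broken.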
NLP related
- gensim: Topic Modelling
- hmmlearn: Hidden Markov Models in Python, with a scikit-learn-like API
- jieba: Chinese text segmentation library
- pyHanLP: Chinese NLP library (Python API)
- nltk: Natural Language Toolkit
Projects
Machine Learning Categories
Consider the learning task
- Supervised Learning
  - Classification - Discrete
  - Regression - Continuous
- Unsupervised Learning
  - Clustering - Discrete
  - Dimensionality Reduction - Continuous
  - Association Rule Learning
- Semi-supervised Learning
- Reinforcement Learning
Consider the desired output of an ML system
- Classification
  - Logistic Regression (optimization algo.)
  - k-Nearest Neighbors (kNN)
  - Support Vector Machine (SVM) - Deduction (optimization algo.)
  - Naive Bayes
  - Decision Tree (ID3, C4.5, CART)
- Regression
  - Linear Regression (optimization algo.)
  - Tree (CART)
- Clustering
  - k-Means
  - Hierarchical Clustering
- Association Rule Learning
- Dimensionality Reduction
  - Principal Component Analysis (PCA)
  - Singular Value Decomposition (SVD) - LSA, LSI, Recommendation System
  - ISOMAP
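PCA and SVD in the dimensionality reduction entry are closely related: the principal components of a centered data matrix are its right singular vectors. The block below is a minimal numpy sketch of that relationship. The function name pca and the toy data are assumptions made for illustration, not this repository's implementation.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using SVD."""
    # Center each feature so the components describe variance, not the mean
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Variance explained by each kept component
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance


rng = np.random.default_rng(0)
# Toy data: 200 points stretched mostly along the first axis
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
Z, components, variance = pca(X, n_components=2)
print(Z.shape)  # (200, 2)
```

The explained variances returned here equal the largest eigenvalues of the sample covariance matrix, which is why PCA can also be derived through an eigendecomposition.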
Ensemble Method (Meta-algorithm)
- Bagging
  - Random Forests
- Boosting
  - AdaBoost <- With some basic boosting notes
  - Gradient Boosting
  - Gradient Boosting Decision Tree (GBDT) (aka Multiple Additive Regression Tree (MART))
  - XGBoost
Others
- Hidden Markov Model (HMM)
- Bayesian Network (aka Probabilistic Directed Acyclic Graphical Model)
- Conditional Random Field (CRF)
- Probabilistic Latent Semantic Analysis (PLSA)
- Latent Dirichlet Allocation (LDA)
- Vector Space Model (VSM)
Heuristic Algorithm
Machine Learning Concepts
General Case
Categorized
- Classification
  - Data Preprocessing
  - Real-world Problem
  - Evaluation Metrics
  - Binary to Multi-class Expansion
- Regression
  - Evaluation Metrics
- Clustering
  - Evaluation Metrics
Specific Field
- Data Mining - Knowledge Discovering
- Recommendation System
- Collaborative Filtering
- Information Retrieval - Topic Modelling
- Latent Semantic Analysis (LSA/LSI/SVD)
- Latent Dirichlet Allocation (LDA)
- Random Projections (RP)
- Hierarchical Dirichlet Process (HDP)
- word2vec
Machine Learning Mathematics
Topic
- Kernel Usages
- Convex Optimization
Categories
- Linear Algebra
- Orthogonality
- Eigenvalues
- Hessian Matrix
- Quadratic Form
- Markov Chain - HMM
- Calculus
- Multivariable Derivatives
- Quadratic Approximations
- Lagrange Multipliers and Constrained Optimization - SVM SMO
- Lagrange Duality
- Multivariable Derivatives
- Probability and Statistics
- Statistical Estimation
Basics
- Algebra
- Trigonometry
Application
(from A to Z)
- Decision Tree
  - Entropy
- HMM
  - Markov Chain
- Naive Bayes
  - Bayes' Theorem
- PCA
  - Orthogonal Transformations
  - Eigenvalues
- SVD
  - Eigenvalues
- SVM
  - Convex Optimization
  - Constrained Optimization
  - Lagrange Multipliers
  - Kernel
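To make one of these pairings concrete, the sketch below shows how entropy feeds into a decision tree: ID3 scores a candidate split by information gain, the drop in entropy after the split. The function names entropy and information_gain and the toy labels are assumptions made for illustration, using only the Python standard library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(labels, groups):
    """Entropy reduction after splitting `labels` into `groups`."""
    total = len(labels)
    # Weighted average entropy of the child groups
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder


labels = ["yes", "yes", "no", "no"]
print(entropy(labels))  # 1.0
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0
```

A split that separates the classes perfectly recovers the full entropy of the parent node, as in the second call; a split that leaves both groups evenly mixed would give a gain of zero.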
Book Recommendations
Machine Learning
- Machine Learning in Action
- 統計學習方法 (Statistical Learning Methods) (李航, Li Hang)
- 機器學習 (Machine Learning) (周志華, Zhou Zhihua)
Mathematics
- Linear Algebra with Applications (Steven Leon)
- Convex Optimization (Stephen Boyd & Lieven Vandenberghe)
- Numerical Linear Algebra (L. Trefethen & D. Bau III)
Resources
Tutorial
Videos
- Google - Machine Learning Recipes with Josh Gordon
- YouTube - Machine Learning Fun and Easy
- Siraj Raval - The Math of Intelligence
- bilibili - 機器學習 - 白板推導系列 (Machine Learning - Whiteboard Derivation Series)
- bilibili - 機器學習升級版 (Machine Learning, Upgraded Edition)
Documentations
- ApacheCN (ML, DL, NLP)
- Machine learning 101 (infographics)
Interactive Learning
- Google Machine Learning Crash Course
- Kaggle Learn Machine Learning
- Microsoft Professional Program - Artificial Intelligence track
MOOC
GitHub
Textbook Implementation
- Machine Learning in Action
- Learning From Data
- 統計學習方法 (Statistical Learning Methods) (李航, Li Hang)
- Stanford Andrew Ng CS229
Datasets
- UCI Machine Learning Repository
- Awesome Public Datasets
- Kaggle Datasets
- The MNIST Database of handwritten digits
- 資料集平台 (Dataset Platform) Data Market
- AI Challenger Datasets
- Peking University Open Research Data
- Open Images Dataset
- Alibaba Cloud Tianchi Data Lab
Machine Learning Platform
Machine Learning Tool
(Online) Development Environment
- Extension plugin - pip install jupyter_contrib_nbextensions
  - VIM binding
  - Codefolding
  - ExecuteTime
  - Notify
- Jupyter Theme - pip install --upgrade jupyterthemes