GitHub - lidalei/DataMining: Various data mining algorithms implemented with sklearn and tensorflow.
MPNN.py
Multiprocessing nearest-neighbor algorithm based on cosine similarity.
MTNN.py
Multithreaded nearest-neighbor algorithm based on cosine similarity.
NN.py
Nearest-neighbor algorithm based on cosine similarity.
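
As a rough illustration of the idea (not the code in NN.py), a cosine-similarity nearest-neighbor classifier can be sketched with NumPy. The function name `cosine_nearest_neighbor` and the toy data are assumptions made for this example, and rows are assumed to have non-zero norm.

```python
import numpy as np

def cosine_nearest_neighbor(train_X, train_y, query_X):
    """Label each query row with the label of its most cosine-similar training row."""
    # Normalize rows to unit length so that a dot product equals cosine similarity.
    # Assumption: no row is the zero vector.
    train_unit = train_X / np.linalg.norm(train_X, axis=1, keepdims=True)
    query_unit = query_X / np.linalg.norm(query_X, axis=1, keepdims=True)
    similarity = query_unit @ train_unit.T  # shape: (n_queries, n_train)
    return train_y[np.argmax(similarity, axis=1)]

# Toy data (hypothetical): two training points along the two axes.
train_X = np.array([[1.0, 0.0], [0.0, 1.0]])
train_y = np.array([0, 1])
queries = np.array([[2.0, 0.1], [0.2, 3.0]])
predictions = cosine_nearest_neighbor(train_X, train_y, queries)
```

Computing the full similarity matrix at once is simple, but memory grows with the product of query and training set sizes. Splitting the queries into chunks is one way to spread the work over processes or threads.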
SGDDataset.py
Provides next_batch method, useful in Neural Network Mini-batch training.
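
A `next_batch` method of this kind might look like the following minimal sketch. The class name `MiniBatchDataset`, its constructor arguments, and the reshuffle-per-epoch behavior are assumptions for illustration and may differ from SGDDataset.py.

```python
import numpy as np

class MiniBatchDataset:
    """Hypothetical stand-in for a dataset wrapper with a next_batch method."""

    def __init__(self, features, labels, seed=0):
        self.features = np.asarray(features)
        self.labels = np.asarray(labels)
        self._rng = np.random.default_rng(seed)
        self._order = self._rng.permutation(len(self.features))
        self._cursor = 0

    def next_batch(self, batch_size):
        # Reshuffle and start a new epoch once the current one is exhausted.
        if self._cursor >= len(self._order):
            self._order = self._rng.permutation(len(self.features))
            self._cursor = 0
        idx = self._order[self._cursor:self._cursor + batch_size]
        self._cursor += batch_size
        # The last batch of an epoch may be smaller than batch_size.
        return self.features[idx], self.labels[idx]

dataset = MiniBatchDataset(np.arange(10).reshape(5, 2), np.arange(5))
batch_X, batch_y = dataset.next_batch(2)
```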
ada_learning_rate_nn.py
Neural network with one hidden layer and one softmax output layer, implemented with TensorFlow.
challenge.py
Used to attempt task 14951 on OpenML.
dataloader_1b.py
Loads the files in data1b/. Provided by the course professor.
decision_tree.py
Experiments with CART and randomized trees under a set of hyperparameter settings.
ensembles.py
Experiments with random forests.
evaluate_NN.py
Evaluates nearest-neighbor algorithms with different distance functions using a confusion matrix.
k_means.py
k-means with different initialization methods, incl. the first k points, k uniformly sampled points, k-means++, and Gonzalez's algorithm.
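
Of the initialization methods listed, k-means++ is the least obvious, so here is a minimal NumPy sketch of it. The function name `kmeans_plus_plus_init` and the toy data are assumptions for illustration, not the code in k_means.py. The sketch also assumes the data contain at least two distinct points.

```python
import numpy as np

def kmeans_plus_plus_init(X, k, seed=0):
    """Pick k initial centers: the first uniformly, the rest proportional to squared distance."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance from every point to its closest already-chosen center.
        diffs = X[:, None, :] - np.array(centers)[None, :, :]
        d2 = np.min(np.sum(diffs ** 2, axis=2), axis=1)
        # Sample the next center with probability proportional to d2.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

# Toy data (hypothetical): two tight groups far apart.
X = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 10.0], [10.0, 10.0]])
initial_centers = kmeans_plus_plus_init(X, k=2)
```

Gonzalez's algorithm (farthest-first traversal) would replace the random draw with `np.argmax(d2)`, always taking the point farthest from the current centers.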
k_medians.py
k-median clustering.
kernel_selection.py
Support Vector Machines (SVM) with different kernels, incl. linear, rbf and polynomial kernels.
kernel_selection2.py
Experiments with the parameters of an SVM with an RBF kernel, namely gamma and C.
kernel_selection3.py
Grid search of SVM with rbf kernel, using AUC as metric.
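
A grid search of this kind can be sketched with scikit-learn's `GridSearchCV`. The synthetic dataset, the parameter grid, and the 3-fold split below are assumptions chosen only to keep the example small. They are not the settings used in kernel_selection3.py.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic binary classification data (an assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical grid over the two RBF-SVM parameters.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

# scoring="roc_auc" ranks parameter settings by cross-validated AUC.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="roc_auc", cv=3)
search.fit(X, y)

best_params = search.best_params_
best_auc = search.best_score_
```

The mean AUC for each (C, gamma) pair is available in `search.cv_results_["mean_test_score"]`, which can be reshaped into a grid for a heat map.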
landscape_analysis.py
Grid search of SVM with rbf kernel. Plot the AUC = f(gamma, C) heat map.
max_margin_classifier.py
A simple example to explain support vectors and maximal margin linear classifier.
mnist_dataloader.py
Loads the MNIST dataset (data1a/). Provided by the course professor.
model_selection.py
Computes bias and variance using bootstrapping for k-nearest-neighbor (different values of k) or decision trees (different max_depth or max_leaf_nodes).
nn_mnist.py
Neural Network with sklearn.
nn_with_alpha.py
Neural network with different alphas (the l2-norm penalty), implemented with TensorFlow.
nn_with_learning_rate.py
Neural network with different learning rates, implemented with TensorFlow.
nn_with_momentum.py
Neural network with different momentum values, implemented with TensorFlow.
nn_with_nodes.py
Neural network with a hidden layer and a softmax output layer, implemented with TensorFlow.
optimization.py
Experiments with different hyperparameter tuning techniques, incl. random search and grid search (with cross-validation).
random_forests.py
Demonstrates how random forests reduce variance without increasing bias (much), thereby reducing the classification error.
random_projection.py
Implements random projection for dimensionality reduction. The result is compared with MPNN.py.
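
One common form of random projection multiplies the data by a scaled Gaussian matrix. The sketch below assumes that variant; the function name `random_projection` and the toy dimensions are illustrative, and random_projection.py may use a different construction.

```python
import numpy as np

def random_projection(X, n_components, seed=0):
    """Project rows of X to n_components dimensions with a Gaussian random matrix."""
    rng = np.random.default_rng(seed)
    # Entries drawn from N(0, 1) and scaled by 1/sqrt(n_components),
    # so squared norms are preserved in expectation.
    R = rng.normal(size=(X.shape[1], n_components)) / np.sqrt(n_components)
    return X @ R

# Toy data (hypothetical): 20 points in 1000 dimensions, reduced to 500.
X = np.random.default_rng(1).normal(size=(20, 1000))
X_low = random_projection(X, n_components=500)

# Ratio of projected to original norms; values near 1 mean lengths are roughly preserved.
norm_ratio = np.linalg.norm(X_low, axis=1) / np.linalg.norm(X, axis=1)
```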
roc_curves.py
Demonstrates the convex hull of many classifiers in a ROC diagram.
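
If each classifier is represented by a single (false positive rate, true positive rate) point, the ROC convex hull is the upper hull of those points together with the trivial classifiers at (0, 0) and (1, 1). The following is a minimal sketch under that assumption. The function names and the example points are hypothetical and not taken from roc_curves.py.

```python
def _cross(o, a, b):
    """Cross product of vectors o->a and o->b; negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def roc_convex_hull(points):
    """Return the upper convex hull of (fpr, tpr) points, including (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment to p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull

# Hypothetical classifiers as (fpr, tpr) points.
classifiers = [(0.2, 0.7), (0.5, 0.6), (0.6, 0.9)]
hull = roc_convex_hull(classifiers)
```

Here the classifier at (0.5, 0.6) falls below the hull, which means some combination of the other classifiers dominates it.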
tensor_flow_softmax_mnist.py
Softmax regression implemented in TensorFlow, used as a TensorFlow practice exercise.
unit_circles.py
Demonstrates the unit circles of different norms, incl. l1, l2, l10, and l-infinity.
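
The points on a unit circle of a given norm can be computed by taking directions at evenly spaced angles and rescaling each one to have norm 1. The sketch below assumes this approach; the function name `unit_circle` is illustrative and plotting is left out.

```python
import numpy as np

def unit_circle(p, n_points=200):
    """Return points (x, y) whose l_p norm equals 1, for p = 1, 2, 10, np.inf, etc."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_points)
    directions = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    # Divide each direction by its l_p norm so that the result lies on the unit circle.
    norms = np.linalg.norm(directions, ord=p, axis=1, keepdims=True)
    return directions / norms

circles = {p: unit_circle(p) for p in (1, 2, 10, np.inf)}
```

Plotting the four point sets on the same axes shows the shape going from a diamond (l1) through a circle (l2) toward a square (l-infinity).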