GitHub - lidalei/DataMining: Various data mining algorithms implemented with sklearn and tensorflow.

MPNN.py: Multiprocessing nearest-neighbor based on cosine similarity.
MTNN.py: Multithreaded nearest-neighbor based on cosine similarity.
NN.py: Nearest-neighbor based on cosine similarity.
SGDDataset.py: Provides a next_batch method, useful for neural network mini-batch training.
ada_learning_rate_nn.py: Neural network with one hidden layer and one softmax output layer, implemented in TensorFlow.
challenge.py: Used for the challenge on OpenML task 14951.
dataloader_1b.py: Loads the files in data1b/. Provided by the course professor.
decision_tree.py: Experiments with CART and randomized trees over a set of hyperparameter settings.
ensembles.py: Experiments with Random Forests.
evaluate_NN.py: Evaluates nearest-neighbor algorithms with different distance functions, using confusion matrices.
k_means.py: k-means with different initialization methods, incl. first k points, uniformly sampled k points, k-means++, and the Gonzalez algorithm.
k_medians.py: k-median clustering.
kernel_selection.py: Support Vector Machines (SVM) with different kernels, incl. linear, rbf, and polynomial kernels.
kernel_selection2.py: Experiments with the parameters of an SVM with rbf kernel, namely gamma and C.
kernel_selection3.py: Grid search for an SVM with rbf kernel, using AUC as the metric.
landscape_analysis.py: Grid search for an SVM with rbf kernel. Plots the AUC = f(gamma, C) heat map.
max_margin_classifier.py: A simple example to explain support vectors and the maximal-margin linear classifier.
mnist_dataloader.py: Loads the MNIST dataset (data1a/). Provided by the course professor.
model_selection.py: Computes bias and variance using bootstrapping for k-nearest-neighbor (different values of k) or decision trees (different max_depth or max_leaf_nodes).
nn_mnist.py: Neural network with sklearn.
nn_with_alpha.py: Neural network with different alphas, i.e., the L2-norm penalty, implemented in TensorFlow.
nn_with_learning_rate.py: Neural network with different learning rates, implemented in TensorFlow.
nn_with_momentum.py: Neural network with different momentum values, implemented in TensorFlow.
nn_with_nodes.py: Neural network with one hidden layer and a softmax output layer, implemented in TensorFlow.
optimization.py: Experiments with different hyperparameter tuning techniques, incl. random search and grid search (with cross-validation).
random_forests.py: Demonstrates how Random Forests reduce variance without increasing bias (much), and so reduce the classification error.
random_projection.py: Implements random projection for dimensionality reduction. The result is compared with MPNN.py.
roc_curves.py: Demonstrates the convex hull of many classifiers in a ROC diagram.
tensor_flow_softmax_mnist.py: Softmax regression implemented in TensorFlow. Used to practice with TensorFlow.
unit_circles.py: Demonstrates the unit circles of different norms, incl. l1, l2, l10, and l-infinity.
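
Several of the files above (NN.py, MTNN.py, MPNN.py) are described as nearest-neighbor classifiers based on cosine similarity. The idea can be sketched as follows. This is a minimal illustration in plain Python and is not the repository's code: the function names cosine_similarity and nearest_neighbor_predict, the toy data, and the convention of treating zero vectors as having similarity 0 are all assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training point most similar to `query`."""
    best_index = max(
        range(len(train_x)),
        key=lambda i: cosine_similarity(train_x[i], query),
    )
    return train_y[best_index]


# Hypothetical toy data: three labeled 2-D points.
train_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_y = ["a", "b", "c"]

prediction = nearest_neighbor_predict(train_x, train_y, [0.9, 0.1])
print(prediction)
```

Because cosine similarity depends only on direction, scaling a query vector does not change its nearest neighbor. Each query is scored against the training set independently of the others, which is presumably why the work splits easily across threads or processes, as the MTNN.py and MPNN.py descriptions suggest.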