cancer_visualize_pca, a scikit-learn code which uses principal component analysis (PCA) of the cancer dataset to visualize the difference between malignant and benign cases.
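The PCA projection described above can be sketched as follows. This is a minimal illustration, not the code distributed on this page: it assumes the dataset is loaded with scikit-learn's load_breast_cancer(), that the features are standardized before PCA, and that two components are kept. The function names cancer_pca_projection and plot_projection, and the output filename, are hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cancer_pca_projection(n_components=2):
    """Project the 30 cancer features onto the leading principal components."""
    cancer = load_breast_cancer()
    # Assumption: standardize each feature so that no single feature
    # dominates the principal components because of its units.
    X_scaled = StandardScaler().fit_transform(cancer.data)
    pca = PCA(n_components=n_components)
    X_pca = pca.fit_transform(X_scaled)
    return X_pca, cancer.target, pca


def plot_projection(X_pca, y, filename="cancer_pca.png"):
    """Scatter plot of the first two components (hypothetical helper)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    # In scikit-learn's copy of the dataset, 0 = malignant and 1 = benign.
    for label, name in [(0, "malignant"), (1, "benign")]:
        mask = y == label
        plt.scatter(X_pca[mask, 0], X_pca[mask, 1], s=10, label=name)
    plt.xlabel("First principal component")
    plt.ylabel("Second principal component")
    plt.legend()
    plt.savefig(filename)
    plt.close()


X_pca, y, pca = cancer_pca_projection()
print("projected shape:", X_pca.shape)
print("explained variance ratio:", pca.explained_variance_ratio_)
```

Calling plot_projection(X_pca, y) writes the scatter plot to a file; matplotlib is imported only inside that helper, so the projection itself needs nothing beyond scikit-learn.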
The computer code and data files described and made available on this web page are distributed under the MIT license.
cancer_classify_decision, a scikit-learn code which uses a decision tree algorithm to classify the breast cancer dataset, comparing the training and testing accuracy as the depth of the tree is varied.
cancer_classify_forest, a scikit-learn code which uses the random forest algorithm to classify the breast cancer dataset.
cancer_classify_gradboost, a scikit-learn code which uses the gradient boosting algorithm to classify the breast cancer dataset.
cancer_classify_knn, a scikit-learn code which uses the k-nearest neighbor algorithm to classify the breast cancer dataset, comparing the training and testing accuracy as the number of neighbors is increased.
cancer_classify_logistic, a scikit-learn code which uses logistic regression to classify the breast cancer dataset, investigating the influence of the C parameter.
cancer_classify_mlp, a scikit-learn code which uses a multilayer perceptron to classify the breast cancer dataset.
cancer_classify_svm_rbf, a scikit-learn code which uses the support vector machine (SVM) algorithm with an RBF kernel to classify the breast cancer dataset, showing that the data should be rescaled to avoid overfitting.
cancer_scale_minmax, a scikit-learn code which uses min-max scaling to preprocess the breast cancer dataset.
cancer_visualize_histogram, a scikit-learn code which displays all 30 features of the cancer dataset as histograms of feature frequency for malignant versus benign cases.
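As a rough illustration of how two of the codes listed above fit together, the following sketch trains an RBF-kernel support vector classifier on the raw features and again on min-max scaled features, then prints training and testing accuracy for each. It is not the code distributed on this page. The train/test split, the random seed, and the default SVC parameters are assumptions, and the size of the gap between the two cases depends on the scikit-learn version, in particular on the default value of gamma.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

cancer = load_breast_cancer()

# Assumption: a stratified 75/25 split with a fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=0
)

# Case 1: RBF-kernel SVM on the raw, unscaled features.
svm_raw = SVC(kernel="rbf").fit(X_train, y_train)
raw_train = svm_raw.score(X_train, y_train)
raw_test = svm_raw.score(X_test, y_test)

# Case 2: fit the scaler on the training data only, then apply the same
# transformation to both sets, so each training feature lies in [0, 1].
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

svm_scaled = SVC(kernel="rbf").fit(X_train_scaled, y_train)
scaled_train = svm_scaled.score(X_train_scaled, y_train)
scaled_test = svm_scaled.score(X_test_scaled, y_test)

print(f"raw features:    train {raw_train:.3f}, test {raw_test:.3f}")
print(f"scaled features: train {scaled_train:.3f}, test {scaled_test:.3f}")
```

The scaler is fitted on the training data alone so that no information from the test set leaks into the preprocessing step; as a result, scaled test values may fall slightly outside [0, 1].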