Course Proposal
A proposal for a prospective course to be called
Mathematics of Machine Learning.
An initial version of the course will be offered in the spring semester of 2022,
as a directed study course, perhaps meeting once a week, for one credit hour,
open to students only with the approval of the instructor. During this
trial version of the class, a limited selection of topics and assignments
will be covered.
This course introduces machine learning to interested students.
It presents several datasets and applies various
algorithms to show how underlying patterns and structure can be
discovered. The mathematical basis for the algorithms is presented,
particularly the use of linear algebra. Applicable concepts from
probability and statistics will be discussed. A programming component
is included in the class, so that students can try out some of
the algorithms.
It is expected that both undergraduate and graduate students could
master the course.
Students will be graded on:
- five homework assignments,
- five programming tasks,
- a final project.
The algorithms to be presented will be selected from:
- Apriori and other recommendation systems;
- binning, clustering, k-means, Voronoi diagrams;
- correlation, covariance matrix;
- data wrangling;
- decision trees;
- dimensionality reduction (projection, principal component analysis);
- expectation maximization;
- logistic, linear, multilinear regression;
- Markov methods, PageRank;
- naive Bayes;
- nearest neighbors, k nearest neighbors, weighted nearest neighbors;
- neural networks, training and testing;
- optimization by Newton's method, gradient descent, stochastic gradient descent;
- perceptron, support vector machine;
- singular value decomposition.
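As an illustration of the kind of short programming task envisioned for these topics, the following is a minimal sketch of gradient descent applied to simple linear regression. It uses only plain Python. The function name `fit_line_gradient_descent`, the learning rate, the step count, and the five data points are all assumptions made up for this example, not course materials.

```python
def fit_line_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y ~ w*x + b by minimizing the mean squared error with gradient descent.

    Hypothetical helper for illustration; the defaults are arbitrary choices
    that happen to converge on the small example below.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y          # residual for this point
            grad_w += 2.0 * err * x / n    # d(MSE)/dw contribution
            grad_b += 2.0 * err / n        # d(MSE)/db contribution
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up data lying exactly on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line_gradient_descent(xs, ys)
print(f"slope = {w:.3f}, intercept = {b:.3f}")
```

Because the made-up points lie exactly on a line, the fitted slope and intercept should approach 2 and 1. With real data, the learning rate would typically need tuning, and a vectorized NumPy version would be preferable.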
Datasets to be considered will include:
- baseball teams: total salary versus win/loss record;
- basketball teams: player salary, position, age, height, weight, sponsorship;
- Caesarian birth operations, given the patient's age and medical history;
- college admission/rejection versus Math/English SAT scores;
- consumer grocery purchases;
- facial recognition;
- golden gopher versus hopping gopher classification;
- house prices in Boston;
- medical costs for patients, given age, sex, weight, smoking status, etc.;
- high school hockey teams: month of birth;
- movie reviews to be classified as positive or negative;
- Old Faithful eruption intervals;
- Titanic survival.
Programming tools (some of these will be taught in class):
- Python;
- NumPy;
- SciPy;
- pandas;
- seaborn;
- scikit-learn.
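To suggest how these tools connect to the linear algebra emphasized in the course, here is a minimal sketch of principal component analysis computed with NumPy's singular value decomposition. The data are synthetic points generated near the line y = x, purely for illustration. The variable names are assumptions for this example, and it does not use any of the datasets listed above.

```python
import numpy as np

# Synthetic 2-D points scattered close to the line y = x (made up for illustration).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, t + 0.1 * rng.normal(size=200)])

# Center the data so each column has mean zero.
Xc = X - X.mean(axis=0)

# SVD of the centered data: the rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of the total variance captured by each principal component.
explained = s**2 / np.sum(s**2)

# Project every point onto the first principal direction (2-D reduced to 1-D).
first_direction = Vt[0]
projected = Xc @ first_direction

print("first principal direction:", first_direction)
print("explained variance ratios:", explained)
```

For data like these, the first principal direction should be close to the diagonal direction (up to sign) and should account for nearly all of the variance. scikit-learn offers a `PCA` class that wraps the same idea.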
Last revised on 19 October 2021.