COMP 469 – Artificial Intelligence / Machine Learning – Fall 2024

AI and ML have recently taken the world by storm, with applications multiplying from autonomous vehicles to ChatGPT, recommender systems, and automated software development. This course covers the latest developments in ML from an applied (rather than theoretical) perspective, motivated by the conviction that everyone can, and should, be familiar with a technology that will deeply affect the coming decades.

Course Outline: The course covers the many aspects of how human intelligence might be encoded in computer programs and machines such as robots. This includes topics in Natural Language Processing, Computer Vision, Expert Systems, and Automated Problem Solving. The course will especially concentrate on the latest developments in Machine Learning (ML), using Amazon Web Services (AWS) ML services.

Student Prerequisites: This course is intended to be advanced but self-contained. However, some familiarity (not expertise!) with the following areas will be helpful:

Linear Algebra
Statistics and probability
Fundamentals of programming, especially Python
Basic concepts of cloud computing (e.g., the material covered by the AWS Certified Cloud Practitioner certification; at CSUCI this content is taught in COMP 347)
Jupyter Notebook, the preeminent interactive lab environment for data analytics and machine learning prototyping. It will be the main tool used in the course; prior familiarity is helpful but not strictly necessary, as students will have an opportunity to learn it.

Course Delivery Method: Weekly online meetings will outline the content, and students will be expected to learn asynchronously using the materials provided: lecture slides, student guides, coding lab activities in Jupyter Notebooks, and knowledge checks. All materials will be provided to the student at no cost. We will use Jupyter Notebook on two platforms: Amazon SageMaker Studio Lab and Google Colab. Canvas pages:

Grade: TBD

Partners

Because CSUCI partners with AWS Academy and AWS Machine Learning University, the course materials are provided at no cost to the student.

Student Learning Outcomes

Determine if a business problem is a good candidate for a machine learning solution based on problem goal, available data, scalability, and other factors.
Gain hands-on experience with Jupyter notebooks, Amazon SageMaker, Google Colab, and Python ML libraries, which are powerful tools for developing and deploying machine learning models.
Learn about AutoML and AutoGluon and how they can automate the tedious parts of the machine learning pipeline, freeing up time for more important tasks.
Understand the importance of data preprocessing and feature engineering, recognize overfitting and underfitting, and learn how regularization techniques help avoid overfitting.
Learn about different types of machine learning models, including tree-based models, regression models, and ensemble models, and how to select and evaluate the best model for a given task.

Content

Part 1

Introduction: what is Machine Learning?
Jupyter Notebooks and Amazon SageMaker
Exploratory Data Analysis
Responsible ML
Types of ML
Overfitting and Underfitting
AutoML and AutoGluon
Generating batch predictions from AutoGluon models
Basic Feature Engineering
Tree-based models
Optimization and regression
Hyperparameter tuning
Ensembling and Boosting
Exploring bias in data and fairness metrics
Implementing an ML pipeline
Introducing forecasting
Introducing Natural Language Processing (NLP)
Introducing Computer Vision

Part 2

Introduction to Deep Learning on Text and Images
Introduction to Neural Networks: Layers and Activations
How Neural Networks Learn
First Examples of Neural Networks
Building an End-to-End Neural Network Solution
Neural Network Engineering
Challenges of Textual Data and Domains of NLP
Processing Text
Word Embeddings
Recurrent Neural Networks
RNN Example with a Practical Dataset
Transformers
How Are Images Stored in a Computer?
The Concept of Convolution
Convolutional Neural Networks
ResNet: The Trade-Offs of Depth and Model Performance
Modern Architectures
Transfer Learning

Schedule

MLTA stands for Machine Learning Through Application, which is Part 1 of the course, and ADLTID stands for Application of Deep Learning to Text and Image Data, which is Part 2 of the course.

Date	Topic	Module	Labs
Aug 27	Intro to ML	MLTA – M1	Lab 1,2
Sep 3	Intro to ML	MLTA – M1	Lab 3,4,5
Sep 10	Tabular Data	MLTA – M2	Lab 1,2
Sep 17	Tabular Data	MLTA – M2	Lab 3,4
Sep 24	Tabular Data	MLTA – M2	Lab 5,6
Oct 1	Responsible ML	MLTA – M3	Lab 1,2
Oct 8	Responsible ML	MLTA – M3	Lab 3,4
Oct 15	Neural Networks	ADLTID – M1	Lab 1,2
Oct 22	Neural Networks	ADLTID – M1	Lab 3,4
Oct 29	Text Data	ADLTID – M2	Lab 1
Nov 5	Text Data	ADLTID – M2	Lab 2,3
Nov 12	Text Data	ADLTID – M2	Lab 4,5
Nov 19	Computer Vision	ADLTID – M3	Lab 1,2
Nov 26	Computer Vision	ADLTID – M3	Lab 3,4
Dec 3	Computer Vision	ADLTID – M3	Lab 5,6