AI and ML have recently taken the world by storm, and applications are multiplying: autonomous vehicles, ChatGPT, recommender systems, automated software development. This course aims to cover the latest developments in ML from an applications (rather than a theoretical) perspective, motivated by the conviction that everyone can, and should, be familiar with a technology that will deeply affect the coming decades.
Course Outline: The course covers the many aspects of how human intelligence might be encoded in computer programs and machines such as robots. This includes topics in Natural Language Processing, Computer Vision, Expert Systems, and Automated Problem Solving. The course will especially concentrate on the latest developments in Machine Learning (ML), using Amazon Web Services (AWS) ML services.
Student Prerequisites: This course is intended to be advanced but self-contained. However, some familiarity (not expertise!) with the following areas will be helpful:
- Linear Algebra
- Statistics and probability
- Fundamentals of programming, especially Python
- Basic concepts of Cloud Computing (e.g., the material covered by the AWS Certified Cloud Practitioner certification; at CSUCI this content is taught in COMP 347)
- Jupyter Notebook, the preeminent interactive lab environment for data analytics and Machine Learning prototyping. It will be the main tool we use in the course, so some familiarity is helpful, but it is not strictly necessary, as students will have an opportunity to learn it during the course.
Course Delivery Method: Weekly online meetings will outline the content, and students will be expected to learn asynchronously using the materials provided. There will be lecture slides, student guides, coding lab activities using Jupyter Notebooks, and knowledge checks. All materials will be provided to the student at no cost. We will use Jupyter Notebook on two platforms: Amazon SageMaker Studio Lab and Google Colab. Canvas pages:
- AWS MLU Machine Learning through Application
- AWS MLU Application of Deep Learning to Text and Image Data
Grade: TBD
Partners
As CSUCI is a partner with AWS Academy and AWS Machine Learning University, there is no cost to the student.
Student Learning Outcomes
- Determine if a business problem is a good candidate for a machine learning solution based on problem goal, available data, scalability, and other factors.
- Gain hands-on experience with Jupyter notebooks, Amazon SageMaker, Google Colab and Python ML libraries, which are powerful tools for developing and deploying machine learning models.
- Learn about AutoML and AutoGluon and how they can be used to automate the tedious parts of the machine learning pipeline, freeing up time for more important tasks.
- Understand the importance of data pre-processing and feature engineering, and learn how to recognize overfitting and underfitting and address them with techniques such as regularization.
- Learn about different types of machine learning models, including tree-based, regression, and ensemble models, and how to select and evaluate the best model for a given task.
Content
Part 1
- Introduction: what is Machine Learning?
- Jupyter Notebooks and Amazon SageMaker
- Exploratory Data Analysis
- Responsible ML
- Types of ML
- Overfitting and Underfitting
- AutoML and AutoGluon
- Generating batch predictions from AutoGluon models
- Basic Feature Engineering
- Tree-based models
- Optimization and regression
- Hyperparameter tuning
- Ensembling and Boosting
- Exploring bias in data and fairness metrics
- Implementing an ML pipeline
- Introducing forecasting
- Introducing Natural Language Processing (NLP)
- Introducing Computer Vision
Part 2
- Introduction to Deep Learning on Text and Images
- Introduction to Neural Networks: Layers and Activations
- How Neural Networks Learn
- First Examples of Neural Networks
- Building an End-to-End Neural Network Solution
- Neural Network Engineering
- Challenges of Textual Data and Domains of NLP
- Processing Text
- Word Embeddings
- Recurrent Neural Networks
- RNN Example with a Practical Dataset
- Transformers
- How Are Images Stored in a Computer?
- The Concept of Convolution
- Convolutional Neural Networks
- ResNet: The Trade-Offs of Depth and Model Performance
- Modern Architectures
- Transfer Learning
Schedule
MLTA stands for Machine Learning Through Application, which is Part 1 of the course, and ADLTID stands for Application of Deep Learning to Text and Image Data, which is Part 2 of the course.
Date | Topic | Module | Labs |
Aug 27 | Intro to ML | MLTA – M1 | Lab 1,2 |
Sep 3 | Intro to ML | MLTA – M1 | Lab 3,4,5 |
Sep 10 | Tabular Data | MLTA – M2 | Lab 1,2 |
Sep 17 | Tabular Data | MLTA – M2 | Lab 3,4 |
Sep 24 | Tabular Data | MLTA – M2 | Lab 5,6 |
Oct 1 | Responsible ML | MLTA – M3 | Lab 1,2 |
Oct 8 | Responsible ML | MLTA – M3 | Lab 3,4 |
Oct 15 | Neural Networks | ADLTID – M1 | Lab 1,2 |
Oct 22 | Neural Networks | ADLTID – M1 | Lab 3,4 |
Oct 29 | Text Data | ADLTID – M2 | Lab 1 |
Nov 5 | Text Data | ADLTID – M2 | Lab 2,3 |
Nov 12 | Text Data | ADLTID – M2 | Lab 4,5 |
Nov 19 | Computer Vision | ADLTID – M3 | Lab 1,2 |
Nov 26 | Computer Vision | ADLTID – M3 | Lab 3,4 |
Dec 3 | Computer Vision | ADLTID – M3 | Lab 5,6 |