We are very happy to have been selected for an SageMaker Pilot for AWS Educate Classrooms! Machine Learning (ML) is a top hard skill for graduates, and it is also becoming a premier tool for research in all areas. SageMaker Studio is a complete development environment for ML.
The theory of ML can always be taught, but in order to have hands on experience with ML, a computing infrastructure is required that is beyond the means of most educational institutions. Our students will have access to AWS Educate accounts with credits to use the SageMaker Studio environment, and access to to powerful CPU/GPU resources (ml.m5.xlarge
, ml.c5.xlarge
, and ml.g4dn.xlarge
) for training ML models.
ML use cases include SPAM filtering for emails, recommender systems, e.g., Netflix show recommendations, and uncovering credit card fraud. There are three types of ML: supervised, where the data is labeled and the expected outputs are well understood (is an, is this email SPAM or not); unsupervised, where the ML algorithm has to discover the salient properties of the data; and, reinforcement, where some agent (e.g., RoboMaker) interacts with an environment and learns to navigate it through a system of rewards.
SageMaker supports many leading deep learning frameworks, including: TensorFlow, PyTorch, Apache MXNet, Chainer, Keras, Gluon, Horovod, Scikit-learn, and Deep Graph Library.
We applied last July to be part of the AWS pilot program to make SageMaker available to our students, and we were approved to start this fall 2020. We have a group of about 10 students who are going to be learning to use under my supervision.
We are building on our growing expertise in Artificial Intelligence. This fall term, professor Reza Abdolee is teaching a graduate class in AI (COMP569) and professor Bahareh Abbasi is teaching both an undergraduate course in AI (COMP469) and a graduate class in Neural Networks (COMP572).
ML is one of the areas of AWS certification.
Students will learn a variety of auxiliary tools; as you will see from this list, the Python programming language is central to Data Analytics:
- Jupyter Notebook and Jupyter Lab: an open-source web application that allows the creation and sharing of documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, etc.
- Pandas: a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
- Seaborn: a library for making statistical graphics in Python. It is built on top of Matplotlib and closely integrated with Pandas data structures.
- Scikit-learn: a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms.
- Matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python.
- NumPy: a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
- PyTorch (AWS testimonials): an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook’s AI Research lab.
In the words of Jose Cahue:
One of the major hurdles to learn ML as a student is having access to a machine optimized for model training. Cloud computing can be one practical solution to provide the computation resources needed to learn ML.
Jose Cahue