In this paper a Machine Learning framework for predicting enrollment is proposed. The framework consists of Amazon Web Services SageMaker together with standard Python tools for Data Analytics, including Pandas, NumPy, Matplotlib and Scikit-Learn. The tools are deployed with Jupyter Notebooks running on AWS SageMaker. Based on three years of enrollment history, a model is built to compute (individually or in batch mode) probabilities of enrollment for given applicants. These probabilities can then be used during the admission period to target undecided students. The audience for this paper is both SEM practitioners and technical practitioners in the area of data analytics. By reading this paper, enrollment management professionals will be able to understand what goes into the preparation of a Machine Learning model to help with predicting admission rates. Technical experts, on the other hand, will gain a blueprint for what is required of them.
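As a rough illustration of computing enrollment probabilities individually or in batch mode, here is a minimal sketch using Pandas, NumPy and Scikit-Learn. It is not the model from the paper: the feature names (gpa, distance_miles, visited_campus), the synthetic data, and the choice of logistic regression are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature columns; a real admissions data set will differ.
FEATURES = ["gpa", "distance_miles", "visited_campus"]


def make_synthetic_history(n_rows=600, seed=0):
    """Generate made-up applicant records standing in for enrollment history."""
    rng = np.random.default_rng(seed)
    history = pd.DataFrame({
        "gpa": rng.uniform(2.0, 4.0, n_rows),
        "distance_miles": rng.uniform(1.0, 300.0, n_rows),
        "visited_campus": rng.integers(0, 2, n_rows),
    })
    # Assumed relationship, for illustration only: nearby applicants who
    # visited campus are more likely to enroll.
    logit = 1.5 * history["visited_campus"] - 0.01 * history["distance_miles"] + 0.3
    prob = 1.0 / (1.0 + np.exp(-logit))
    history["enrolled"] = (rng.uniform(0.0, 1.0, n_rows) < prob).astype(int)
    return history


def train_enrollment_model(history):
    """Fit a scaled logistic regression on the historical records."""
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(history[FEATURES], history["enrolled"])
    return model


def enrollment_probabilities(model, applicants):
    """Return P(enrolled = 1) for each row of `applicants`."""
    return model.predict_proba(applicants[FEATURES])[:, 1]


history = make_synthetic_history()
model = train_enrollment_model(history)

# Batch mode: score several applicants at once.
batch = pd.DataFrame({
    "gpa": [3.5, 3.5],
    "distance_miles": [10.0, 250.0],
    "visited_campus": [1, 0],
})
batch_probs = enrollment_probabilities(model, batch)

# Individual mode: score a single applicant as a one-row frame.
single_prob = enrollment_probabilities(model, batch.iloc[[0]])[0]
```

Applicants with mid-range probabilities would be the natural candidates for outreach to undecided students; where to draw that range is a policy choice rather than something the model decides.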
Sarah Hassan is a 2021 graduate with a BS in Computer Science and minors in both Visual Media Communication and Mathematics. During her years at CSUCI, Sarah worked part-time at her local Apple Store as a Technical Specialist. While working at Apple as a fresh graduate, she was granted the opportunity to partake in what is known as a Career Experience, an opportunity for employees to experience a new role while contributing to important projects at Apple. Her role is Siri Experience Prototyper in the Siri Conversational Interaction team. Sarah believes that her Capstone project (an iOS application), along with her graphic design knowledge, left a good impression during her interviews. She was able to share a link to her Capstone project and discuss the technical and design challenges she faced, while also sharing graphic design work she did at CSUCI.
We are very happy to have been selected for a SageMaker Pilot for AWS Educate Classrooms! Machine Learning (ML) is a top hard skill for graduates, and it is also becoming a premier tool for research in all areas. SageMaker Studio is a complete development environment for ML.
The theory of ML can always be taught, but hands-on experience with ML requires a computing infrastructure that is beyond the means of most educational institutions. Our students will have access to AWS Educate accounts with credits to use the SageMaker Studio environment, and access to powerful CPU/GPU resources (ml.m5.xlarge, ml.c5.xlarge, and ml.g4dn.xlarge) for training ML models.
ML use cases include SPAM filtering for emails, recommender systems (e.g., Netflix show recommendations), and uncovering credit card fraud. There are three types of ML: supervised, where the data is labeled and the expected outputs are well understood (e.g., is this email SPAM or not); unsupervised, where the ML algorithm has to discover the salient properties of the data; and reinforcement, where some agent (e.g., RoboMaker) interacts with an environment and learns to navigate it through a system of rewards.
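To make the contrast between the first two types concrete, here is a small sketch using Scikit-learn on made-up two-dimensional data. The data, the choice of logistic regression for the supervised case, and k-means for the unsupervised case are assumptions made for illustration; reinforcement learning is left out because it needs an environment to interact with.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Two well-separated groups of synthetic points (illustrative data only).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
group_b = rng.normal(loc=[4.0, 4.0], scale=0.5, size=(100, 2))
points = np.vstack([group_a, group_b])
labels = np.array([0] * 100 + [1] * 100)

# Supervised: the algorithm is given the labels and learns to reproduce them.
classifier = LogisticRegression().fit(points, labels)
supervised_accuracy = classifier.score(points, labels)

# Unsupervised: the algorithm sees only the points and must find the groups.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)

# Cluster ids are arbitrary (0/1 may be swapped), so compare up to relabeling.
agreement = np.mean(clusters == labels)
unsupervised_agreement = max(agreement, 1.0 - agreement)
```

On data this cleanly separated both approaches recover the same grouping; the difference is that only the supervised model was ever told which group each point belongs to.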
SageMaker supports many leading machine learning and deep learning frameworks, including TensorFlow, PyTorch, Apache MXNet, Chainer, Keras, Gluon, Horovod, Scikit-learn, and Deep Graph Library.
We applied last July to be part of the AWS pilot program to make SageMaker available to our students, and we were approved to start in fall 2020. We have a group of about 10 students who are going to be learning to use SageMaker under my supervision.
We are building on our growing expertise in Artificial Intelligence. This fall term, Professor Reza Abdolee is teaching a graduate class in AI (COMP569) and Professor Bahareh Abbasi is teaching both an undergraduate course in AI (COMP469) and a graduate class in Neural Networks (COMP572).
Students will learn a variety of auxiliary tools; as you will see from this list, the Python programming language is central to Data Analytics:
Jupyter Notebook and Jupyter Lab: an open-source web application that allows the creation and sharing of documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, etc.
Pandas: a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
Seaborn: a library for making statistical graphics in Python. It is built on top of Matplotlib and closely integrated with Pandas data structures.
Scikit-learn: a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms.
Matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python.
NumPy: a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
PyTorch (AWS testimonials): an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook’s AI Research lab.
One of the major hurdles to learning ML as a student is having access to a machine optimized for model training. Cloud computing can be a practical way to provide the computational resources needed to learn ML.
What is interesting about this is the serendipitous manner in which results build on each other: our result consisted of a partial solution to an original problem in combinatorics posed in the mid-1960s by the itinerant mathematician Paul Erdős, which we then used to partially solve a problem related to string indeterminates (also in this case building on previous results of Joel Helling [post]), which are related to genetics. Now, our work is being used to solve the problem of satellite allocation.
CSUCI Master of Computer Science students were successful in submitting two papers to KES 2020, the 24th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, which this year was to take place in Verona, Italy, in September 2020. However, due to the COVID pandemic, the conference will be held virtually. The papers are the following:
Malware Persistence Mechanisms, co-authored by Zane Gittins and Michael Soltys. Zane Gittins is a master’s student in Computer Science at CSUCI, and this paper is the result of his master’s thesis. Zane Gittins has worked as a Cybersecurity expert at HAAS, and is currently working at Meissner Filtration. (This paper will be presented in the General Track session G3b: Cybersecurity.)
Voyager: Tracking with a Click, co-authored by Samuel Decanio, Kimo Hildreth and Michael Soltys. Sam Decanio is a master’s student in Computer Science at CSUCI, and this paper is the result of his master’s thesis and a fruitful collaboration between Computer Science at CI and the SoCal High Technology Task Force. Sam Decanio is currently working for the Navy. (This paper will be presented in the General Track session G3b: Cybersecurity.)
Point 1: Don’t think of this move to online teaching as a one-off; this is the new normal. At California State University we have had to move to online teaching practically every year in the last five years: fires (twice), shootings, and now the pandemic. So think of the COVID-19 pandemic as an opportunity to build an online offering that can serve your department and students for years. You should have an online version for all your classes, not only for emergencies, but also to be responsive to the current reality where so many students want online offerings.
Point 2: There are two initial “shifts” in the move to online teaching. First, the pedagogical shift to not teaching in the classroom, where it is easy to connect with students physically present, to read facial expressions and adjust your teaching accordingly, to chat with some of them in person after class. Second, the shift to a different usage of tools, or a different set of tools altogether: Zoom, Canvas, Piazza, MyITLab, Slack, Microsoft Teams, and of course AWS Educate offerings. Both “shifts” require some time; e.g., think of how you are going to compensate for lack of physical presence, and do not start learning Zoom half an hour before the first class.
Point 3: In Point 2 we mentioned the challenge of not having the students physically present; how are you going to compensate for the lack of interaction that you are used to? I use Slack to create a collaborative environment in the class. I dedicate a channel to the course, and include all the students in the channel. Students can interact with me (the instructor), but even more importantly, they can interact with each other, and they do! Here appears one advantage of online teaching: often, as the students sit to write down a difficulty they encounter in the course, by the act of writing it in a public forum, they concentrate more than they do when asking verbally in class, and the question is better formed and often the answer appears in the process. Also, having those interactions recorded in the channel allows us to point them out later if the question comes up again. Further interaction comes by using Zoom on a regular basis, both to teach, and to have office hours / question periods.
Point 4: In Point 2 we mentioned the challenge of shifting to a new set of tools. For Computer Science faculty this is relatively easy from the technical perspective. We are familiar with cloud-based tools, and our students like IT tools, and so the move is seamless. What can be problematic is how these tools are deployed; that is, heavy reliance on these tools can make the course about them instead of making them ancillary to the objective of the course. The solution here is to explain, or even better automate, the aspects of the tools that are not intrinsic to the topic being taught. For example, we use AWS Educate accounts to teach our Computer Architecture class (COMP 262), a sophomore course where students learn about different microprocessor architectures and assembly-level programming. Being able to deploy AMIs (Amazon Machine Images) with certain architectures frees the student to concentrate on the point of the exercise: the differences in architecture.
Point 5: It is important to be creative. More material can be taught successfully online than one would expect. For example, we have a senior elective in “mobile robotics” (COMP 470), which includes a lot of hands-on lab work. It may seem hopeless to simulate such a course online, but it is not: we used the material in the AWS Educate RoboMaker class to create virtual labs. Students can be given relatively inexpensive robots (e.g., AWS DeepRacer, ~$300 each), and participate in a lab by doing the hands-on activity at home, but testing and competing in a virtual environment in the cloud.
Point 6: Do not think of online teaching as simulating classroom teaching. It is a different entity, with its advantages and disadvantages; concentrate on the advantages. For example, simply using Zoom to deliver a lecture at the same time as a regular lecture won’t do. Your lecture will be dry, and you will feel frustrated, as if you were talking into your own screen instead of to a classroom full of students. Use Zoom to create an interactive environment, including quizzes (there are some nice tools to deliver interactive quizzes, which always awaken a sense of fun competition among students; e.g., Kahoot!, Quizizz), Zoom breakout rooms, question and answer sessions, presentations by students, etc.
Point 7: Grading has to be changed. For example, rely more on assignments, as in a final assignment rather than a final exam. Tests and exams can still be given, but I would suggest giving them as multiple-choice quizzes with limited time per question, so that they do not become exercises in who can Google-search faster.
Point 8: In my experience online teaching has to be very well structured and organized, and the communication with the class has to be excellent: frequent, repetitive and complete. Students should know exactly what they need to do each week, and where to go with questions.
Point 9: Communicate enjoyment, passion and enthusiasm for the material. One of the most important roles of a teacher is to reassure the student that time spent with you, and the effort required to master your difficult material, is a worthy pursuit. Tell the students what treasure they will possess upon completion, what we dryly call SLO (Student Learning Outcomes), but which is the raison d’être for your course. Present your online offering not as “the 2nd best given the circumstances”, but rather as a great opportunity to work with others in an online setting. Remember, this is the direction in which the IT world is moving, and students will benefit greatly from having the experience of being self-motivating, accountable and working with others online.
Point 10 (Bonus for Comp Sci instructors): Some material can be taught very easily online. For example, I prefer to teach programming classes in a blended online environment, even when we do not have a crisis! The reason is that AWS Cloud9 is a perfect cloud-based IDE (Integrated Development Environment) that has many advantages over a machine-in-a-lab IDE: first, everyone has exactly the same environment, which I can customize to the needs of the course as precisely as I choose, and everyone can access this environment independently of the type of computer they have, as all it requires is a Wi-Fi connection and a browser. It also allows me to enter the environment from the “outside”, and code with the student watching my changes. This is really fantastic!
Alfred Camposagrado is a Principal Embedded Software Engineer at Northrop Grumman. He received his Bachelor’s in Computer Science at CSUCI in 2014. He started his journey in Camarillo working as a Software Engineer for Crescendo Interactive shortly after graduation. He gained valuable experience by starting as a front-end developer and was later promoted to Full-Stack developer focusing on Java. His experience in Java landed him a job at Northrop Grumman. Located in Point Mugu, he supports the US Navy with various projects, from software development to system integration tests. He also continues his education at CSUCI in the Masters of Computer Science Program (MSCS). http://linkedin.com/in/alfredcamposagrado
I am very happy to be part of the California Governor’s Cybersecurity Task Force (GCTF), serving on the Workforce Development and Education Subcommittee. The main objective of this subcommittee is to address the growing workforce gap; currently, there are 37,000 available cybersecurity positions in California, and 314,000 in the nation. About 70% of those positions require a four-year degree or more.
The aim of our subcommittee is threefold: to enrich and standardize the educational pathway from K12 to PhD/Certification; to teach general cyber hygiene, both to the workforce and the public; and to help military personnel, especially veterans, transition into civilian careers in Cybersecurity.
Computer Science at CI is well positioned to address some of the challenges:
A thriving program in Computer Science, with a minor in Cybersecurity; we are part of CyberWatchWest, we have a Cybersecurity student club, and we teach courses in Cybersecurity at the undergraduate and graduate level.
Experience in “hands-on” education, which is one of the aims of the workforce development. We have strong connections with the industry and the public sector (such as the SoCal High Technology Task Force).
An ongoing collaboration with the Navy; we have worked with both Navy officers and civilians as instructors and collaborators.
In the summer of 2017, while I was teaching COMP 524 (Cybersecurity) at California State University Channel Islands, the students were introduced to a project based on R&D from the SoCal High Technology Task Force (HTTF). The requirements and specifications asked for a device that could automate the search through vast amounts of data contained in portable devices (such as hard disks and thumb drives), looking for pre-established patterns in file names.
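The core of such a search can be sketched in a few lines of Python using only the standard library. This is an illustration of the idea, not the students’ device or the HTTF’s tooling: the function name find_matching_files, the shell-style wildcard patterns, and the sample file names are all hypothetical.

```python
import fnmatch
import os
import tempfile


def find_matching_files(root, patterns):
    """Walk `root` and return paths whose file name matches any pattern.

    Patterns are shell-style wildcards (e.g. "ledger_*"); matching is
    case-insensitive because both sides are lowercased first.
    """
    lowered_patterns = [pattern.lower() for pattern in patterns]
    matches = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            lowered = name.lower()
            if any(fnmatch.fnmatchcase(lowered, p) for p in lowered_patterns):
                matches.append(os.path.join(folder, name))
    return sorted(matches)


# Demonstration on a temporary directory standing in for a mounted drive.
with tempfile.TemporaryDirectory() as mount_point:
    os.makedirs(os.path.join(mount_point, "photos"))
    for relative in ["notes.txt", "photos/IMG_0001.JPG", "photos/ledger_2017.xlsx"]:
        with open(os.path.join(mount_point, *relative.split("/")), "w") as handle:
            handle.write("")
    found = find_matching_files(mount_point, ["img_*.jpg", "ledger_*"])
    found_names = [os.path.basename(path) for path in found]
```

A real forensic workflow would also have to mount the device read-only and preserve evidence integrity, which this sketch does not attempt.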
In September of 2018, a group of CI students, working on their senior capstone project under my supervision, started to build a machine capable of massively parallel computing. We christened the machine “The Beast.” We undertook to build the machine following the specification of the SoCal High Technology Task Force (HTTF) digital forensics lab in Ventura County.
The Beast was built with five EVGA GeForce GTX 1080Ti graphics cards, capable of massive computational parallelism, an MSI Z370-A-Pro motherboard, an i5-8400 CPU, a Hydra II 8 GPU 6U Server Mining Rig Case, and power supplies capable of maintaining four big fans; cooling The Beast was an important part of the project.
The students who participated in the project were, in alphabetical order, Noelle Abe, Benjamin Alcazar, Matthew Atcheson, Joshua Buckley, Joshua Carter, John Miller, Scott Slocum, Ryan Torres and Devon Trammell (the team leader). On May 2nd, after working on the project during both terms of 2018/19, and having overcome many technical difficulties, the team presented The Beast at the Computer Science Advisory Board Meeting and the Computer Science Capstone Showcase; following these presentations, The Beast was handed over to the SoCal HTTF digital forensics lab. As you can see from the first picture above, The Beast has settled in its new home, a cooling room at the HTTF lab.