Algorithmic work of Tyler King from Newbury Park Highschool

During the summer 2021, I was contacted by a high-school student from Newbury Park High School, Tyler King. In his email Tyler said that over the previous summer (2020) he completed a research project on pathfinding algorithms, and that he wanted to share his results with me.

Tyler and I discussed his paper online, and I was really impressed with his work which he titled: Minimum Path Star Topology Algorithms for Weighted Regions and Obstacles. I gave Tyler some feedback, but the work was original and interesting – Tyler’s incredible mathematical maturity made me think that I was lucky to be engaging with a modern day Évariste Galois!

We met a few times, and Tyler invited me to be his co-author, although the work in the paper is his. Tyler is now a student at Cornell university, and his work can be read here: https://arxiv.org/abs/2109.06944.

Enrollment Predictions with Machine Learning

Happy to announce that our joint paper, Enrollment Predictions with Machine Learning, co-authored with Hung Dang, Ginger Reyes Reilly and Katharine Soltys, has appeared in Volume 9, number 2, of the Strategic Enrollment Management Quarterly (SEMQ).

In this paper a Machine Learning framework for predicting enrollment is proposed. The framework consists of Amazon Web Services SageMaker together with standard Python tools for Data Analytics, including Pandas, NumPy, MatPlotLib and Scikit-Learn. The tools are deployed with Jupyter Notebooks running on AWS SageMaker. Based on three years of enrollment history, a model is built to compute — individually or in batch mode — probabilities of enrollments for given applicants. These probabilities can then be used during the admission period to target undecided students. The audience for this paper is both SEM practitioners and technical practitioners in the area of data analytics. Through reading this paper, enrollment management professionals will be able to understand what goes into the preparation of Machine Learning model to help with predicting admission rates. Technical experts, on the other hand, will gain a blue-print for what is required from them.

This paper has been made possible in part by the AWS Pilot Program in Machine Learning where California State University Channel Islands was one if the participating institutions.

Our paper with Neerja Mhaskar on logic of stringology appeard in Discrete Applied Math

In this paper, A new formal framework for Stringology is proposed, which consists of a three-sorted logical theory $S$ designed to capture the combinatorial reasoning about finite strings. We propose a language $L_S$ for expressing assertions about strings, and study in detail two sets of formulas $\Sigma_0^B$, a set of formulas decidable in polytime, and $\Sigma_1^B$, a set of formulas with the property that those provable in $S$ yield polytime algorithms.

Microsoft Open Sources Homomorphic Encryption Library

My student Dragan Rakas and I worked on an early version of this technology in 2013, and we found it to be a very interesting but difficult problem. You can read our about it here.

Microsoft has open sourced a homomorphic encryption library developed by its Cryptography Research group, saying it “strongly believes” the technology is ripe for use in real-world applications, as it makes the source code available on GitHub. (Here is the link to the GitHub repository.)

Geetanjali Agarwal successfully defended her MSCS thesis on image recognition

My student Geetanjali (Geet) Agarwal defended her masters thesis titled Aneka – Wavelet Image Hashing Algorithm, see announcement, where the contribution is a framework of hashing algorithms for image recognition. This important work is done in collaboration with the SoCal High Technology Task Force (HTTF). Geet deployed the AWS to accomplish her results, including EC2 instances and MySQL databases used to run experiments on thousands of images. Geet’s thesis will be available after the final draft is ready.

Iddo Tzameret and Stephen Cook strengthened our results from 2004 in the computational complexity of linear algebra

During my PhD years (1997-2001) in the Computer Science department, at the University of Toronto, my advisor Stephen Cook and I worked on laying the computational complexity foundations of Linear Algebra. To that end we deployed Berkowitz’s algorithm for computing the characteristic polynomial, as it allowed us to state major theorems of linear algebra in the theory NC2 (fast parallel computations). We published the final version of our result in the Annals of Pure and Applied Logic. Recently, Iddo Tzameret and Stephen Cook strengthened those results considerably in this paper.

Using AWS on a project in collaboration with SoCal HTTF to decrypt a password

Anyone working in the field of Digital Forensics is aware that a substantial portion of time is dedicated to reverse engineering passwords. That is, in most cases a digital forensics investigator receives a password-protected handheld device, or a laptop with an encrypted hard disk, or a Microsoft Word document which has been password protected.

It is then the task of the investigator to try to retrieve the evidence, and that in turns requires reverse engineering the password; in some cases this can be achieved by recovering the hash of the password, which is stored somewhere (the locations are often known) on the device’s memory.

In order to obtain the password from the hash, we have to run a brute-force search algorithm that guesses passwords (the guesses can be more or less educated, depending on what is known about the case). Sometimes we get lucky. There are two programs that are used extensively for this purpose: John the Ripper and hashcat.

As we have been studying methods for recovering passwords from hashes, we have been using AWS EC2 instances in order to run experiments and help HTTF with their efforts. Together with senior capstone students as well as graduate students in Cybersecurity, we have been creating a set of guidelines and best practices to help in the recovery of passwords from hashes. AWS EC2 instances are ideal as they can be crafted to the needs and resources of a particular case. For example we are currently running a `t2.2xlarge` instance on a case where we have to recover the password of a Microsoft Word document; we have also used a `p2.16xlarge` with GPU-based parallel compute capabilities, but it costs \$14/hour of usage, and so we deploy it in a very surgical manner.

Paper with Ryan McIntyre to appear in the Journal of Discrete Algorithms

Happy to announce that Ryan McIntyre’s masters thesis results, An improved upper bound and algorithm for clique covers (prelim), will be published as our joint paper in Journal of Discrete Algorithms.

Our paper is on indeterminate strings, which are important for their applicability in bioinformatics. (They have been considered, for example, in Christodoulakis 2015  and Helling 2017.)

An interesting feature of indeterminate strings is the natural correspondence with undirected graphs. One aspect of this correspondence is the fact that the minimal alphabet size of indeterminates representing any given undirected graph corresponds to the size of the minimal clique cover of this graph. This paper first considers a related problem proposed in Helling 2017: characterize \$latex\Theta_n(m)\$, which is the size of the largest possible minimal clique cover (i.e., an exact upper bound), and hence alphabet size of the corresponding indeterminate, of any graph on $n$ vertices and $m$ edges. We provide improvements to the known upper bound for $\Theta_n(m)$. Helling 2017 also presents an algorithm which finds clique covers in polynomial time. We build on this result with an original heuristic for vertex sorting which significantly improves their algorithm’s results, particularly in dense graphs.

This work was the result of building on Helling 2017 (see this post) and of a year of research undertaken by Ryan McIntyre under my (Michael Soltys) supervision at the California State University Channel Islands.

iSprinkle – a topical capstone project @CSUCI @cagomez805

iSprinkle is a Raspberry Pi powered irrigation controller which will allow a user to set an initial irrigation schedule for a sprinkler system using a web interface, after which it will use the local weather forecast to adjust the base watering schedule as needed. iSprinkle is the result of a senior Capstone project (COMP499)  at California State University Channel Islands, undertaken by student Carlos Gomez in 2016, advised by Michael Soltys. A detailed write up of the project, where we partnered with Prof. Adam Sędziwy (who visited CI in June 2016), can be found here:

A short version of the above paper will be presented at INDOTEC2017:

iSprinkle was also presented at SCCUR 2016, the Southern California Council for Undergraduate Research Conference at UC Riverside on November 12, 2016.

The design of a system such as iSprinkle requires a holistic approach that is very different from most class assignments. The former usually span a few files that are to be turned in within a week or two, making it difficult to implement a system with many “moving parts.” However, iSprinkle’s functionality is divided between the front-end and back-end, both of which need to communicate so that the user’s requests are fulfilled. Designing such a system requires taking into consideration many aspects; from major decisions such as deciding on a backend language to use, to minutiae such as the date and time formats to use across the backend and front end to maintain consistency.

By doing so, iSprinkle will be able to irrigate more efficiently compared to a fixed schedule; by progrmmatically modifying the user’s watering schedule, iSprinkle will increase/decrease the amount of watering that the schedule dictates depending on data that it receives from a weather API. iSprinkle hopes to make it easier for homeowners to conserve water by automating adjustments to their irrigation schedule.

Carlos Adrian Gomez has since graduated from CI, and is working as a Software Engineer in the Ventura area.