In this paper, A new formal framework for Stringology is proposed, which consists of a three-sorted logical theory designed to capture the combinatorial reasoning about finite strings. We propose a language for expressing assertions about strings, and study in detail two sets of formulas , a set of formulas decidable in polytime, and , a set of formulas with the property that those provable in yield polytime algorithms.

# Category: Research

## Eric Gentry presenting SEAKER at FICC 2019

## Microsoft Open Sources Homomorphic Encryption Library

*My student Dragan Rakas and I worked on an early version of this technology in 2013, and we found it to be a very interesting but difficult problem. You can read our about it here.*

Microsoft has open sourced a homomorphic encryption library developed by its Cryptography Research group, saying it “strongly believes” the technology is ripe for use in real-world applications, as it makes the source code available on GitHub. (*Here is the link to the GitHub repository*.)

Source: *Microsoft Open Sources Homomorphic Encryption Library*

## Geetanjali Agarwal successfully defended her MSCS thesis on image recognition

My student Geetanjali (Geet) Agarwal defended her masters thesis titled ** Aneka – Wavelet Image Hashing Algorithm, **see announcement, where the contribution is a framework of hashing algorithms for image recognition. This important work is done in collaboration with the SoCal High Technology Task Force (HTTF). Geet deployed the AWS to accomplish her results, including EC2 instances and MySQL databases used to run experiments on thousands of images. Geet’s thesis will be available after the final draft is ready.

## Iddo Tzameret and Stephen Cook strengthened our results from 2004 in the computational complexity of linear algebra

During my PhD years (1997-2001) in the Computer Science department, at the University of Toronto, my advisor Stephen Cook and I worked on laying the computational complexity foundations of Linear Algebra. To that end we deployed Berkowitz’s algorithm for computing the characteristic polynomial, as it allowed us to state major theorems of linear algebra in the theory NC2 (fast parallel computations). We published the final version of our result in the Annals of Pure and Applied Logic. Recently, Iddo Tzameret and Stephen Cook strengthened those results considerably in this paper.

## Using AWS on a project in collaboration with SoCal HTTF to decrypt a password

Anyone working in the field of *Digital Forensics* is aware that a substantial portion of time is dedicated to reverse engineering passwords. That is, in most cases a digital forensics investigator receives a password-protected handheld device, or a laptop with an encrypted hard disk, or a Microsoft Word document which has been password protected.

It is then the task of the investigator to try to retrieve the evidence, and that in turns requires reverse engineering the password; in some cases this can be achieved by recovering the *hash* of the password, which is stored somewhere (the locations are often known) on the device’s memory.

In order to obtain the password from the hash, we have to run a brute-force search algorithm that guesses passwords (the guesses can be more or less educated, depending on what is known about the case). Sometimes we get lucky. There are two programs that are used extensively for this purpose: John the Ripper and hashcat.

As we have been studying methods for recovering passwords from hashes, we have been using AWS EC2 instances in order to run experiments and help HTTF with their efforts. Together with senior capstone students as well as graduate students in Cybersecurity, we have been creating a set of guidelines and best practices to help in the recovery of passwords from hashes. AWS EC2 instances are ideal as they can be crafted to the needs and resources of a particular case. For example we are currently running a `t2.2xlarge`

instance on a case where we have to recover the password of a Microsoft Word document; we have also used a `p2.16xlarge`

with GPU-based parallel compute capabilities, but it costs $14/hour of usage, and so we deploy it in a very surgical manner.

## Paper with Ryan McIntyre to appear in the Journal of Discrete Algorithms

Happy to announce that Ryan McIntyre’s masters thesis results, ** An improved upper bound and algorithm for clique covers (prelim)**, will be published as our joint paper in Journal of Discrete Algorithms.

Our paper is on ** indeterminate strings, **which are important for their applicability in bioinformatics. (They have been considered, for example, in Christodoulakis 2015 and Helling 2017.)

An interesting feature of indeterminate strings is the natural correspondence with undirected graphs. One aspect of this correspondence is the fact that the minimal alphabet size of indeterminates representing any given undirected graph corresponds to the size of the minimal clique cover of this graph. This paper first considers a related problem proposed in Helling 2017: characterize $latex\Theta_n(m)$, which is the size of the largest possible minimal clique cover (i.e., an exact upper bound), and hence alphabet size of the corresponding indeterminate, of any graph on vertices and edges. We provide improvements to the known upper bound for . Helling 2017 also presents an algorithm which finds clique covers in polynomial time. We build on this result with an original heuristic for vertex sorting which significantly improves their algorithm’s results, particularly in dense graphs.

This work was the result of building on Helling 2017 (see this post) and of a year of research undertaken by Ryan McIntyre under my (Michael Soltys) supervision at the California State University Channel Islands.

An article from the CSUCI news center about this work can be found here.

## iSprinkle – a topical capstone project @CSUCI @cagomez805

iSprinkle is a Raspberry Pi powered irrigation controller which will allow a user to set an initial irrigation schedule for a sprinkler system using a web interface, after which it will use the local weather forecast to adjust the base watering schedule as needed. iSprinkle is the result of a senior Capstone project (COMP499) at California State University Channel Islands, undertaken by student Carlos Gomez in 2016, advised by Michael Soltys. A detailed write up of the project, where we partnered with Prof. Adam Sędziwy (who visited CI in June 2016), can be found here:

*iSprinkle: Design and implementation of an internet-enabled sprinkler timer*, by Carlos Adrian Gomez, Adam Sedziwy and Michael Soltys.

A short version of the above paper will be presented at INDOTEC2017:

*iSprinkle: when education, innovation and application meet*, by Carlos Adrian Gomez, Adam Sedziwy and Michael Soltys, to be presented at the*5th International Conference on Educational Innovation in Technical Careers,*INDOTEC 2017.

iSprinkle was also presented at SCCUR 2016, the Southern California Council for Undergraduate Research Conference at UC Riverside on November 12, 2016.

The design of a system such as iSprinkle requires a holistic approach that is very different from most class assignments. The former usually span a few files that are to be turned in within a week or two, making it difficult to implement a system with many “moving parts.” However, iSprinkle’s functionality is divided between the front-end and back-end, both of which need to communicate so that the user’s requests are fulfilled. Designing such a system requires taking into consideration many aspects; from major decisions such as deciding on a backend language to use, to minutiae such as the date and time formats to use across the backend and front end to maintain consistency.

By doing so, iSprinkle will be able to irrigate more efficiently compared to a fixed schedule; by progrmmatically modifying the user’s watering schedule, iSprinkle will increase/decrease the amount of watering that the schedule dictates depending on data that it receives from a weather API. iSprinkle hopes to make it easier for homeowners to conserve water by automating adjustments to their irrigation schedule.

Carlos Adrian Gomez has since graduated from CI, and is working as a Software Engineer in the Ventura area.

## Running experiments on AWS

I can’t say how happy I am to have AWS. I just got the account set up, started my first instance, and run a simulation for a very interesting project that I am working on with Ryan McIntyre (a student in CS). What took about 15min on my Mac Pro quad core, took 1m40s on the AWS instance.

This is a brave new world! 🙂 . Here us the summary of the experiment:

~/EdgeGraph/EdgeGraph$ time python3 cover_vs_edges.py How many vertices? 7 Generating graphs... Filtering isomorphisms... Sorting graphs... Checking up to 21 edges... 0 / 21 edges complete. 1 / 21 edges complete. 2 / 21 edges complete. 3 / 21 edges complete. 4 / 21 edges complete. 5 / 21 edges complete. 6 / 21 edges complete. 7 / 21 edges complete. 8 / 21 edges complete. 9 / 21 edges complete. 10 / 21 edges complete. 11 / 21 edges complete. 12 / 21 edges complete. 13 / 21 edges complete. 14 / 21 edges complete. 15 / 21 edges complete. 16 / 21 edges complete. 17 / 21 edges complete. 18 / 21 edges complete. 19 / 21 edges complete. 20 / 21 edges complete. 21 / 21 edges complete. elapsed time: --- 96.4 seconds --- real 1m40.000s user 1m36.812s sys 0m0.113s

## Rabin Miller algorithm: witnesses of compositness

The Rabin Miller algorithm for primality testing relies on a random selection of a’s for which it performs some basic tests in order to determine whether the input n is prime or composite. If n is prime, all a’s work, but if n is composite it is possible to show that at least half of the a’s work, i.e., at least half of the a’s are witnesses of compositness. But in fact, it seems like a lot more a’s than just half of those in the set {1,2,…,n-1} work; see the following diagram where I test all composites under about 60,000. It is clear that there are more good witnesses than just half the numbers.