My student Geetanjali (Geet) Agarwal defended her master’s thesis, titled Aneka – Wavelet Image Hashing Algorithm (see announcement), whose contribution is a framework of hashing algorithms for image recognition. This important work was done in collaboration with the SoCal High Technology Task Force (HTTF). Geet used AWS to obtain her results, including EC2 instances and MySQL databases used to run experiments on thousands of images. Geet’s thesis will be available after the final draft is ready.
It was a pleasure to speak at the AWS/CSU Research in the Cloud series. By nature I am not a strong promoter of any technology, and the browser, OS or editor “wars” frankly bore me; I sometimes use a “lesser” technology because it happens to be more convenient, or because I don’t have the time to learn a “better” technology, or for many other good reasons.
However, as a researcher and teacher I am absolutely thrilled with what AWS has to offer. I regularly give tours of our computer labs at CSU CI (to local companies, prospective graduate students, CSU trustees, fundraising prospects, etc.), and I explain that three things make it possible for a relatively small and unknown campus like ours to compete in scientific & engineering output in the national and international arena:
- How cheap embedded systems have become; a Raspberry Pi is $35, and it comes with Linux and GPIO pins that make it into a universal controller.
- How cheap 3D printing has become, and in turn this frees us to some extent from having to build an expensive manufacturing lab.
- And AWS: Amazon Web Services. Instead of buying, maintaining, cooling and powering expensive servers, we can immediately use the services we need, and pay as we go. This works very well for a university because we do not have to make up-front capital investments, and our usage is not always the same (e.g., practically no classes in the summer).
Material related to the talk
- Examples of AWS related projects that my students and I have undertaken over the last year: http://prof.msoltys.com/?tag=aws.
- AWS presentation slides.
- Video of the presentation (my talk starts at about 12 min)
Voyager is software that implements what is called an invisible bit (also known as a tracking bit), which can be used to track certain activities. Voyager uses the AWS network infrastructure, and its database runs on the Relational Database Service (RDS). Voyager has been implemented at CI by a group of Computer Science students, as a Research & Development project for the HTTF. From the AWS website:
Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups. It frees you to focus on your applications so you can give them the fast performance, high availability, security and compatibility they need.
For this project, we are also using the following tools: EC2, S3 and Route 53.
Anyone working in the field of Digital Forensics is aware that a substantial portion of time is dedicated to reverse engineering passwords. That is, in most cases a digital forensics investigator receives a password-protected handheld device, or a laptop with an encrypted hard disk, or a Microsoft Word document which has been password protected.
It is then the task of the investigator to try to retrieve the evidence, and that in turn requires reverse engineering the password; in some cases this can be achieved by recovering the hash of the password, which is stored somewhere (the locations are often known) in the device’s memory.
In order to obtain the password from the hash, we have to run a brute-force search algorithm that guesses passwords (the guesses can be more or less educated, depending on what is known about the case). Sometimes we get lucky. There are two programs that are used extensively for this purpose: John the Ripper and hashcat.
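The guessing search described above can be illustrated with a minimal sketch. It assumes an unsalted SHA-256 hash and a few made-up mangling rules; the function names and the rules are hypothetical, and real tools such as John the Ripper and hashcat handle format-specific, salted and iterated hashes far more efficiently.

```python
import hashlib
from typing import Iterable, Iterator, Optional


def sha256_hex(password: str) -> str:
    """Return the hex SHA-256 digest of a UTF-8 encoded password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def educated_guesses(words: Iterable[str]) -> Iterator[str]:
    """Yield simple variations of case-related words (hypothetical rules)."""
    for word in words:
        for base in (word, word.capitalize()):
            yield base
            for suffix in ("1", "123", "2019"):
                yield base + suffix


def recover_password(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the first one that matches, else None."""
    target = target_hash.lower()
    for guess in candidates:
        if sha256_hex(guess) == target:
            return guess
    return None


if __name__ == "__main__":
    # Stand-in for a hash recovered from a device (assumption for the demo).
    recovered_hash = sha256_hex("Ventura123")
    print(recover_password(recovered_hash, educated_guesses(["ventura", "channel"])))
```

The more is known about the case, the better the word list passed to `educated_guesses`, and the sooner the search "gets lucky".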
As we have been studying methods for recovering passwords from hashes, we have been using AWS EC2 instances in order to run experiments and help HTTF with their efforts. Together with senior capstone students as well as graduate students in Cybersecurity, we have been creating a set of guidelines and best practices to help in the recovery of passwords from hashes. AWS EC2 instances are ideal as they can be crafted to the needs and resources of a particular case. For example, we are currently running a t2.2xlarge instance on a case where we have to recover the password of a Microsoft Word document; we have also used a p2.16xlarge with GPU-based parallel compute capabilities, but it costs $14/hour of usage, and so we deploy it in a very surgical manner.
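To see why an expensive instance has to be used sparingly, here is a rough back-of-the-envelope sketch. The $14/hour figure comes from the text; the alphabet, maximum length and guess rate are purely hypothetical assumptions chosen for illustration, not measurements.

```python
def keyspace(alphabet_size: int, max_length: int) -> int:
    """Number of candidate passwords of length 1..max_length."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


def worst_case_cost(candidates: int, guesses_per_second: float,
                    dollars_per_hour: float) -> float:
    """Dollar cost of trying every candidate at a fixed hourly price."""
    hours = candidates / guesses_per_second / 3600
    return hours * dollars_per_hour


if __name__ == "__main__":
    # Assumptions: lowercase letters only, at most 6 characters,
    # and a made-up rate of 10,000 guesses per second.
    total = keyspace(26, 6)
    cost = worst_case_cost(total, 10_000, 14.0)
    print(f"{total} candidates, worst case about ${cost:.2f}")
```

Under these assumptions an exhaustive search costs roughly $125; each additional character multiplies the keyspace, and hence the bill, by about 26, which is why educated guesses matter so much.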
As I am working through the AWS Academy Cloud Computing Architecture – Instructor Accreditation, we are going to offer COMP 529, our Cloud Computing course in the Computer Science master’s program, using the AWS curriculum. This is a service offered through the AWS Academy. The students who complete the course will be ready to take the AWS Cloud Solutions Architect certification.
The first lecture will be on Thursday January 24, 2019, in Sierra Hall 1131 (the Computer Science Networking & Security Lab).
“Storage Evaluator And Knowledge Extraction Reader”
On Monday August 7, at 6pm, in DEL NORTE 1530, the COMP 524 (Cybersecurity) students will present their final project, a technical solution for the SoCal High Technology Task Force in Ventura. This project implements a digital forensic tool with strict performance requirements.
We used GitHub as the software repository, Dropbox Paper for the documentation Wiki, and AWS S3 for distribution of the production version of the software.
You are cordially invited to attend; the presentation will take about two hours, and there will be snacks (Short link to this post: https://wp.me/p7D4ee-FJ).
This tool can be used to upload a target file, directory, or URL to VirusTotal, a website that scans the target with around 60 virus scanners from the industry. If the target is not already in the VirusTotal database, the scan will be queued and completed shortly. As this is an asynchronous process, the tool is useful for uploading jobs, checking whether jobs have completed, and displaying reports on completed jobs. The system also keeps track of all files uploaded, checks for already uploaded files to save bandwidth, and saves all completed reports in one list and all positive reports in a separate list.
Using Amazon Web Services (AWS), specifically the Elastic Compute Cloud (EC2) and the Simple Storage Service (S3), this system can be set up to allow users to place files into an S3 bucket, which will then be scanned automatically, and the user can be notified of any possible positives found.
- The User places a file they wish to scan into the S3 bucket, such as
- A dedicated EC2 instance watches the bucket, detects the new file, and uploads the file to Virus Total.
- The EC2 instance waits until Virus Total returns a completed report.
- If any positives are found the instance notifies the user, otherwise the report is added to the completed list.
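The steps above can be sketched as a small bookkeeping class. This is only an illustration under stated assumptions, not the actual tool: `submit`, `fetch_report` and `notify` are hypothetical callables standing in for the scanning service and the notification channel, and the bucket contents are passed in as a plain dictionary rather than read from S3.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Set, Tuple


class BucketScanner:
    """Tracks uploads, pending jobs, completed reports and positive reports."""

    def __init__(self,
                 submit: Callable[[bytes], str],                # hypothetical: returns a job id
                 fetch_report: Callable[[str], Optional[int]],  # hypothetical: None while pending
                 notify: Callable[[str, int], None]) -> None:   # hypothetical: alerts the user
        self.submit = submit
        self.fetch_report = fetch_report
        self.notify = notify
        self.seen: Set[str] = set()                 # SHA-256 digests already uploaded
        self.pending: Dict[str, str] = {}           # job id -> file name
        self.completed: List[Tuple[str, int]] = []  # every finished report
        self.positives: List[Tuple[str, int]] = []  # reports with detections

    def upload_new(self, files: Dict[str, bytes]) -> None:
        """Submit only files whose content has not been uploaded before."""
        for name, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest in self.seen:
                continue  # same content already sent: save bandwidth
            self.seen.add(digest)
            self.pending[self.submit(data)] = name

    def poll(self) -> None:
        """Collect finished reports and notify the user about positives."""
        for job_id, name in list(self.pending.items()):
            count = self.fetch_report(job_id)
            if count is None:
                continue  # scan still queued
            del self.pending[job_id]
            self.completed.append((name, count))
            if count > 0:
                self.positives.append((name, count))
                self.notify(name, count)
```

In a deployment along the lines described, the instance would call `upload_new` with the current bucket listing and `poll` on a timer; identifying files by content hash is one simple way to avoid re-uploading duplicates.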
VirusTotal has a public API that is limited to 6 uploads per minute, but CSU Channel Islands was granted research API access, which is limited to 600 uploads per minute.
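A client has to stay under whichever limit applies. One way to do that is a sliding-window throttle, sketched below; the class name and its parameters are hypothetical and this is not how the tool itself is known to work. The clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class UploadThrottle:
    """Allow at most `limit` uploads in any window of `period` seconds."""

    def __init__(self, limit: int, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit = limit
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.sent: Deque[float] = deque()  # timestamps of recent uploads

    def wait_for_slot(self) -> None:
        """Block until another upload is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget uploads that have left the window.
            while self.sent and now - self.sent[0] >= self.period:
                self.sent.popleft()
            if len(self.sent) < self.limit:
                self.sent.append(now)
                return
            # Window is full: sleep until the oldest upload expires.
            self.sleep(self.period - (now - self.sent[0]))
```

Calling `UploadThrottle(6).wait_for_slot()` before each upload would respect the public limit quoted above, and `UploadThrottle(600)` the research limit.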
Mattias is going to make this tool available for everyone through GitHub.
I can’t say how happy I am to have AWS. I just got the account set up, started my first instance, and ran a simulation for a very interesting project that I am working on with Ryan McIntyre (a student in CS). What took about 15 min on my quad-core Mac Pro took 1m40s on the AWS instance.
This is a brave new world! 🙂 Here is the summary of the experiment:
~/EdgeGraph/EdgeGraph$ time python3 cover_vs_edges.py
How many vertices? 7
Generating graphs...
Filtering isomorphisms...
Sorting graphs...
Checking up to 21 edges...
0 / 21 edges complete.
1 / 21 edges complete.
2 / 21 edges complete.
3 / 21 edges complete.
4 / 21 edges complete.
5 / 21 edges complete.
6 / 21 edges complete.
7 / 21 edges complete.
8 / 21 edges complete.
9 / 21 edges complete.
10 / 21 edges complete.
11 / 21 edges complete.
12 / 21 edges complete.
13 / 21 edges complete.
14 / 21 edges complete.
15 / 21 edges complete.
16 / 21 edges complete.
17 / 21 edges complete.
18 / 21 edges complete.
19 / 21 edges complete.
20 / 21 edges complete.
21 / 21 edges complete.
elapsed time: --- 96.4 seconds ---
real 1m40.000s
user 1m36.812s
sys 0m0.113s