Project topic exploration
One-term comps goes fast, so during this first week our goal is to get you assigned to teams and working on your projects. The process will go like this:
- Step 1 (due 8:30AM Wednesday, Jan 7): explore a possible project with a randomly assigned team.
- Step 2 (due 11:59PM Thursday, Jan 8): fill out a survey expressing team and project preferences.
- Step 3 (in class Friday, Jan 9 and over the weekend): meet with your team.
- Step 4 (due 11:59PM Monday, Jan 12): propose two candidate projects with lightweight project plans.
- Step 5: after feedback from me, you'll be able to get to work by Wednesday, Jan 14.
This page describes the Step 1 assignment.
For Wednesday: Explore one possible topic
Your job for Wednesday is to work with your random team to prepare a short report on a single malware-related topic.
NOTE: you are not committing to work on this project for the whole term. You're just practicing some initial research, thinking about possible projects related to malware, and learning what the other teams have investigated.
- Meet your randomly assigned team in class on Monday. Pick a topic.
- Spend a couple hours on your own investigating your topic.
- Prepare a personal report containing the following information.
- A short description of the problem/topic in your own words.
- A list of deliverables for the final version of your project. That is, what do you imagine this project's code, data, and documentation consisting of by the end of the term?
- A few of your main questions about the topic (so far). What words or concepts are confusing? Is imagining appropriate deliverables difficult? Can you foresee barriers to success? etc.
- A list of the best links you found on this topic.
- Meet with your team to discuss your personal reports. Merge those reports and record the result in a 3-slide Google Slides presentation:
- Slide 1: Brief description of the project. In the "speaker notes" portion of this slide, put your list of useful links.
- Slide 2: List of essential deliverables.
- Slide 3: Your group's subjective evaluation. Cool project or no? Which concepts sound most interesting to learn about? etc.
- Make your slides public.
- By 8:30AM Wednesday, post a link to your slides in the #general Slack channel.
A few topic ideas for your consideration
- Old-school reverse engineering. For this project, you would use an open-source reverse-engineering tool (probably Ghidra) to study the assembly-level code of a collection of known malware samples and give detailed reports on what each piece of malware does, how it does it, and how you figured it out. (Possible malware to study can be found in a variety of online archives like this one. You could also look for samples of historically significant malware to study in this way if that grabs your team's interest.) This project would require you to set up a sandbox in which it is safe to run the malware and study its effects as you pursue your reverse-engineering efforts.
- Browser extension scanner. Browser extensions are a great vector for distributing malware. For this project, you would research the techniques used by malicious browser extensions and then develop a security scanner to produce security reports on each installed extension. Essentially, you would be replicating some of the features of existing scanners like this one. (There used to be a widely used scanner called CRXcavator, but it went offline a year ago--if you work on this project, you're likely to run into lots of info about CRXcavator, some of which might be useful.)
- Phishing URL Detector. There has been a lot of work on phishing detection. For example, an old paper (2015) Know Your Phish: Novel Techniques for Detecting Phishing Sites and their Targets describes techniques for identifying phishing via analyses of the malicious URLs and their corresponding web page content. For this project, you would identify the best literature describing client-side phishing-detection techniques, implement some of those techniques, and build your implementations into a browser plugin to provide users with phishing scores/reports on each visited URL. Test your detection techniques using existing datasets of malicious and benign websites.
- Canaries. One form of security software is called a "canary", in analogy to the idiom "canary in a coal mine". That is, a canary is a mechanism for getting an early warning that your network is being attacked. For example, Thinkst Canary sells a variety of "canary tokens" that they claim are easy to install and effective at detecting network breaches early. Another example is this ransomware detection whitepaper from the managed security company Huntress. For this project, you would learn how canaries work, implement a bunch of them yourselves, and set up a network and simulated attacks to evaluate your canaries' effectiveness.
- YARA rule generation and application. YARA is a system of formal rules that describe characteristics of particular pieces of malware. YARA rules can be used to scan email attachments, hard drives, and network traffic for malware (among other things). For this project, you would start by studying how YARA works, what tools are available for making use of YARA rules, etc. After that, you would have a lot of possible directions to go. You could develop a system for generating YARA rules automatically from malware samples (e.g., based on papers like this one or this one). Or you could write and test a virus scanner based on existing collections of YARA rules, among other options.
- Malicious activity monitor. There are many tools that try to detect malicious activity on your computer or network while the activity is occurring. For this project, you would do a survey of the available kinds of tools, choose a set of activity detection features you want to implement, and set up an environment for running and testing your tool.
- Malicious activity forensics. Similar to the previous idea, but this time focusing on collecting data after an attack has occurred.
- Using image processing to classify malware. (This one might be of interest if you have some experience with data mining and/or machine learning.) One problem with malware analysis and detection is that attackers often make small changes to their executables to dodge detection techniques that previously thwarted their attacks. So it's important for defenders to have a way to identify variants of the same malware. There's a cool and very influential paper from 2011 that turns a malware executable into a grayscale image. It turns out that this technique tends to produce visually similar images (detectable both by people and image processing software) for variants of the same malware. For this project, you would implement and test the techniques described in this paper.
- Other ML for classifying malware. If you want to start more from scratch on the malware-classification problem, I'm open to proposals from teams who have some machine learning expertise. (I wouldn't choose this for Wednesday's assignment--but maybe for your term-long project once your team has been determined.)
- Malware-testing sandbox. One technique for analyzing and eventually fighting malware is to watch the malware in action. To do that safely, you need a sandbox. That is, you need a system for installing, running, and observing malware during and after its execution. One such sandbox system is known as CAPE. For this project, you would create your own simple sandbox system. This system would launch a virtual machine or container to act as the sandbox, upload a piece of malware to the sandbox, run the malware for a short time, collect data on the malware's behavior, and generate a report.
- DoS mitigator. For this project, you would study the techniques used to mitigate denial-of-service (DoS) attacks, either of the single-source or distributed (DDoS) variety, and then implement and test some of those mitigation techniques. Setting up a suitable testing environment would be a big part of this project, since of course you don't want to execute a DoS attack on anybody else's computers or networks.
- Memory scraper. There's a famous and dangerous tool called mimikatz that, when run with administrator privileges on a Windows computer, can scrape passwords, keys, etc., from the computer's memory, sometimes leading to the compromise of other Windows computers on the same network. Related memory-scraping tools have been used in other famous breaches to extract things like credit card numbers from memory. For this project, you would create and test your own memory scraper for the operating system of your choice.
- ...
- A topic of your own choosing. Maybe you have an idea of your own, maybe you get something promising from an internet search or a chat with an LLM. But if it involves making sense of some malicious software, it's fair game.
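To make the "Phishing URL Detector" idea above a little more concrete, here is a minimal sketch of the kind of URL-only feature extraction a client-side detector might start from. The function name `url_features` and the particular features are illustrative assumptions, not techniques taken from the paper mentioned above; a real project would choose features based on the literature and test them against labeled datasets.

```python
import ipaddress
from urllib.parse import urlsplit


def url_features(url: str) -> dict:
    """Compute a few simple lexical features of a URL (hypothetical feature set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # Raw IP addresses in place of a domain name are a common warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "host_is_ip": host_is_ip,
        # An "@" in the authority part can hide the real host from a casual reader.
        "has_at_sign": "@" in parts.netloc,
        "uses_https": parts.scheme == "https",
    }


if __name__ == "__main__":
    print(url_features("http://192.168.0.1/login"))
```

Features like these would typically feed a scoring rule or classifier; on their own they say nothing definitive about whether a URL is malicious.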
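For the "Using image processing to classify malware" idea above, the core bytes-to-pixels step can be sketched in a few lines. This is only a rough illustration under stated assumptions: the fixed width of 256, the zero padding of the last row, and the helper names `bytes_to_grayscale_rows` and `to_pgm` are my choices for the example and may not match what the paper does.

```python
def bytes_to_grayscale_rows(data: bytes, width: int = 256) -> list:
    """Lay out raw bytes as rows of pixel intensities (0-255), one byte per pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        # Pad the final row with zeros (black) so every row has the same width.
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows


def to_pgm(rows: list) -> bytes:
    """Serialize the rows as a binary PGM (P5) grayscale image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixel for row in rows for pixel in row)


if __name__ == "__main__":
    # Use any harmless file here; this script reads its own source as sample input.
    with open(__file__, "rb") as f:
        image = to_pgm(bytes_to_grayscale_rows(f.read(), width=64))
    with open("sample.pgm", "wb") as out:
        out.write(image)
```

The resulting PGM file opens in many image viewers, which is enough to eyeball whether related files produce similar-looking textures before investing in a real classifier.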
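For the "DoS mitigator" idea above, one widely used building block is rate limiting. The sketch below shows a token-bucket limiter, chosen here purely as an example; the class name `TokenBucket` and its parameters are hypothetical, and a real mitigation project would likely keep one bucket per source address and combine this with other techniques.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits the budget, else False."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10.0)
    accepted = sum(bucket.allow() for _ in range(100))
    print(f"accepted {accepted} of 100 back-to-back requests")
```

Testing something like this belongs in an isolated environment you control, as the topic description notes.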