Last summer I spent a few months as a research intern with the SOTERIA project at the University of Chicago, working on the security side of containerized systems. Going in, I had a decent grasp of Python and not much else relevant to the job. By the end, I'd shipped a working tool, learned a stack I'd only read about, and come away with a much clearer sense of what security research actually looks like day to day.
The core problem
The main task was deceptively simple to state: pull data out of Harbor, a container registry, and turn it into something a person could actually read and reason about. Container images carry a lot of metadata, including known vulnerabilities, and most of it sits locked away behind an API rather than in any human-friendly form.
I wrote a Python script that talked to Harbor's API and pulled the details that mattered: artifact names, sizes, hashes, tags, pull times, and the CVEs attached to each image. Early versions made me type in the project and repository names by hand. Over a few iterations I got it to walk the registry on its own, which made it far more useful and a lot less tedious to run.
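Here is a rough idea of what "walking the registry on its own" involves. This is a minimal stdlib-only sketch, not the script itself. It assumes Harbor's v2.0 REST API (`/projects`, `/projects/{project}/repositories`, `.../artifacts`, and the `additions/vulnerabilities` report), so check those paths against your Harbor version's API docs. The helper names (`make_getter`, `paged`, `cve_ids`, `walk_registry`) are hypothetical. The HTTP call is injected as a function, so the traversal logic can be tested without a live registry.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Iterator, Optional

# Assumption: Harbor's v2.0 REST API; verify paths against your Harbor version's docs.
API_PREFIX = "/api/v2.0"

GetJson = Callable[[str, dict], Any]


def make_getter(base_url: str, user: str, password: str) -> GetJson:
    """Return a function that GETs a Harbor API path with basic auth and decodes the JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def get_json(path: str, params: dict) -> Any:
        query = urllib.parse.urlencode(params)
        url = f"{base_url}{API_PREFIX}{path}" + (f"?{query}" if query else "")
        req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    return get_json


def paged(get_json: GetJson, path: str, params: Optional[dict] = None,
          page_size: int = 100) -> Iterator[dict]:
    """Yield every item from a paginated list endpoint, one page at a time."""
    page = 1
    while True:
        batch = get_json(path, {**(params or {}), "page": page, "page_size": page_size})
        yield from batch
        if len(batch) < page_size:  # a short page means there is nothing more to fetch
            return
        page += 1


def cve_ids(get_json: GetJson, artifact_path: str) -> list[str]:
    """Collect sorted CVE IDs from an artifact's vulnerability report (assumed {} if unscanned)."""
    report = get_json(f"{artifact_path}/additions/vulnerabilities", {})
    # The report is keyed by a MIME type; each value holds a "vulnerabilities" list.
    return sorted({v["id"] for rep in report.values() for v in rep.get("vulnerabilities") or []})


def walk_registry(get_json: GetJson) -> Iterator[dict]:
    """Visit every project, repository, and artifact with no names typed in by hand."""
    for project in paged(get_json, "/projects"):
        pname = project["name"]
        for repo in paged(get_json, f"/projects/{pname}/repositories"):
            # Repository names come back prefixed with the project, e.g. "library/nginx".
            rname = repo["name"].split("/", 1)[1]
            # Names containing "/" must be URL-encoded twice when used in the path.
            enc = urllib.parse.quote(urllib.parse.quote(rname, safe=""), safe="")
            repo_path = f"/projects/{pname}/repositories/{enc}/artifacts"
            for art in paged(get_json, repo_path, {"with_tag": "true"}):
                yield {
                    "project": pname,
                    "repository": rname,
                    "digest": art["digest"],
                    "size": art["size"],
                    "tags": [t["name"] for t in art.get("tags") or []],
                    "pull_time": art.get("pull_time"),
                    "cves": cve_ids(get_json, f"{repo_path}/{art['digest']}"),
                }
```

Against a real registry you would iterate over `walk_registry(make_getter(url, user, password))`, where the URL and credentials are your own. Injecting `get_json` keeps the paging and traversal logic separate from the network, which makes it easy to test with canned responses.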
Learning the stack as I went
A good chunk of the internship was just getting comfortable with tools I'd never touched. Elasticsearch was the big one. I worked through the official guide, then filled the gaps with whatever tutorials and documentation I could find, until I understood enough to feed it real data and query it sensibly.
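Feeding records like these into Elasticsearch mostly comes down to its bulk API, which takes newline-delimited JSON: an action line followed by the document, repeated for each record. Below is a minimal stdlib-only sketch, assuming each artifact is a flat dict with `digest`, `project`, and `cves` keys. The index name `harbor-artifacts` and the helper names are hypothetical, and authentication is left out. The query assumes Elasticsearch's default dynamic mapping, which gives string fields a `.keyword` sub-field for exact matching and aggregations.

```python
import json
import urllib.request

# Hypothetical index name; choose whatever fits your cluster.
INDEX = "harbor-artifacts"


def bulk_body(rows: list[dict], index: str = INDEX) -> str:
    """Build an NDJSON body for the _bulk API: one action line, then the document."""
    lines = []
    for row in rows:
        # Using the image digest as _id turns re-indexing the same artifact into an overwrite.
        lines.append(json.dumps({"index": {"_index": index, "_id": row["digest"]}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def post_bulk(es_url: str, body: str) -> dict:
    """POST an NDJSON body to /_bulk (no auth shown; add it if your cluster needs it)."""
    req = urllib.request.Request(
        f"{es_url}/_bulk",
        data=body.encode(),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def top_cves_query(project: str, n: int = 10) -> dict:
    """Search body: the n most common CVEs across one project's artifacts."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "query": {"term": {"project.keyword": project}},
        "aggs": {"top_cves": {"terms": {"field": "cves.keyword", "size": n}}},
    }
```

The query body would be sent to `/<index>/_search`. If you define an explicit mapping with `keyword` fields, drop the `.keyword` suffixes.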
Docker and Kubernetes came next. Reading about containers only gets you so far, so once the script was solid I packaged it into a Docker image and deployed it onto the lab's Kubernetes cluster. Actually getting it running in that environment taught me more than any video had: the gap between "this works on my laptop" and "this runs on the cluster" is where most of the real learning happened.
What the weeks looked like
The work built on itself week over week, roughly like this:
- Got my footing with Elasticsearch and started poking at Docker.
- Wrote the first version of the Harbor script — names, tags, and sizes.
- Extended it to pull hashes, pull times, tag lists, and vulnerability data, and made it run without manual input.
- Containerized the script into a Docker image, ready for deployment.
- Built a Kibana dashboard so the Harbor data was actually visual and not just rows in a terminal.
- Got the container running on the cluster, then went back to make the script faster and to handle pulling RPMs out of images more efficiently.
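A script that makes one API request per artifact typically spends most of its time waiting on network round trips. One common way to speed that up is to issue the requests concurrently on a thread pool. The sketch below illustrates that general technique and does not record the exact change made. The name `fetch_all` and the default of 8 workers are illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fetch_all(fetch: Callable[[T], R], items: Iterable[T], workers: int = 8) -> list[R]:
    """Run an I/O-bound fetch for every item on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, even when calls finish out of order.
        return list(pool.map(fetch, items))
```

Threads suit this case because the work is I/O-bound and the GIL is released while waiting on sockets. The worker count should stay modest so the registry isn't flooded with parallel requests.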
Bringing it all together
By the end of the internship, I had Docker, Kubernetes, and Elasticsearch working as one system rather than three separate pieces. People using the SOTERIA project could upload their code into the SOTERIA Docker space and, in return, get a clear readout of what they'd submitted: file sizes, current usage, and the versions of everything their code depended on, from the Python version to individual libraries.
The system ran on a nightly cycle. Every night it scanned the Docker space, picked up any code that had changed, and pushed the updated details into Elasticsearch, where graphs and visualizations made the whole picture easy to read at a glance instead of buried in raw output.
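A cheap way to "pick up any code that had changed" is to compare each image's content digest with the one recorded the night before. Images are content-addressed, so the digest changes whenever the image content does. Below is a minimal sketch, assuming a small JSON state file (`last_run_digests.json`, a hypothetical name) and that the nightly scheduling itself (for example a Kubernetes CronJob) is handled elsewhere.

```python
import json
from pathlib import Path

# Hypothetical state file recording each image's digest from the previous run.
STATE_FILE = Path("last_run_digests.json")


def load_previous(path: Path = STATE_FILE) -> dict[str, str]:
    """Return {image name: digest} from the last run, or {} on the very first run."""
    return json.loads(path.read_text()) if path.exists() else {}


def changed_images(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Names whose digest is new or different since the last run, sorted."""
    return sorted(name for name, digest in current.items() if previous.get(name) != digest)


def save_current(current: dict[str, str], path: Path = STATE_FILE) -> None:
    """Persist tonight's digests so the next run can diff against them."""
    path.write_text(json.dumps(current, indent=2, sort_keys=True))
```

Only the images returned by `changed_images` would then need to be re-scanned and re-indexed. Everything else already has up-to-date documents in Elasticsearch.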
The next step I was aiming for was a scanner that could look at the dependency versions a project was running and flag any that were known to carry security risks. I didn't get to fully build that piece, but the groundwork was there: the pipeline already knew what versions everyone was using, so checking those against known vulnerabilities was the natural thing to add on top.
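A minimal sketch of that check is below. A hand-written advisory list stands in for a real vulnerability database, and its entries, package names, and `EXAMPLE-` IDs are made up. A real scanner would pull advisories from a source such as the OSV database and compare versions with proper PEP 440 rules (for example the `packaging` library). The naive numeric parser here is only a placeholder for that.

```python
from typing import NamedTuple


class Advisory(NamedTuple):
    """A hypothetical advisory: versions >= introduced and < fixed are affected."""
    package: str
    introduced: str
    fixed: str
    advisory_id: str


def parse_version(v: str) -> tuple[int, ...]:
    """Naive numeric parse ("2.25.0" -> (2, 25)); not a substitute for PEP 440 parsing."""
    parts = [int(p) for p in v.split(".")]
    while len(parts) > 1 and parts[-1] == 0:  # so "2.25.0" compares equal to "2.25"
        parts.pop()
    return tuple(parts)


def flag_vulnerable(installed: dict[str, str],
                    advisories: list[Advisory]) -> list[tuple[str, str, str]]:
    """Return (package, version, advisory id) for each installed version in an affected range."""
    hits = []
    for adv in advisories:
        version = installed.get(adv.package)
        if version is None:
            continue  # package not used by this project
        if parse_version(adv.introduced) <= parse_version(version) < parse_version(adv.fixed):
            hits.append((adv.package, version, adv.advisory_id))
    return hits
```

Because the pipeline already records each project's dependency versions, the `installed` mapping would come straight from the data already stored in Elasticsearch.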
What I took away
More than any single tool, the internship taught me how to learn an unfamiliar system quickly and how to keep going when something doesn't work the first, fifth, or tenth time. I left a better programmer, more comfortable with security data, and genuinely interested in the questions that container security raises. It's a big part of why I'm still working in this space today.