Visualizing Amass Data Over Time

I want to share an idea I’m really excited about!

If you’re running Amass periodically and only tracking changes between the two most recent enumerations, there is a lot more you can do with all the historical data you’re collecting.[1]

How about getting your data to look like this…

Figure 1. Domain Name → Host Count Over Time. The chart plots the Amass enumeration index along the x-axis, domain name along the y-axis, and the dot size and shade indicate the number of hosts behind that domain for that enumeration. The data has been shortened and domain names obfuscated for this post.

…instead of like this?

Found: about.example.com XXX.XXX.XXX.XXX
Found: analytics.example.com XXX.XXX.XXX.XXX
Found: api.example.com XXX.XXX.XXX.XXX,XXX.XXX.XXX.XXX
Found: app.example.com XXX.XXX.XXX.XXX
Found: blog.example.com XXX.XXX.XXX.XXX

Figure 2. A sample of Amass's `track` output.

AFAICT, the status quo is to use a line-based diff to capture changes to an asset space between the two most recent enumerations, and then to get notified about those changes.[2] The idea is that this keeps us up to date with new assets, which might be more vulnerable because they haven’t been exposed for as long: any issues are yet to be discovered, and yet to be fixed.
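To make that status quo concrete, here is a minimal Python sketch of a two-enumeration comparison. It assumes each enumeration has been saved as text in the `Found: <name> <addresses>` shape shown in Figure 2. The function names `parse_found` and `diff_latest` are hypothetical and not part of Amass, and the sample data is made up.

```python
def parse_found(text):
    """Map each name to its set of addresses from 'Found: name a,b' lines."""
    assets = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not shaped like the sample in Figure 2.
        if len(parts) != 3 or parts[0] != "Found:":
            continue
        name, addresses = parts[1], parts[2]
        assets[name] = set(addresses.split(","))
    return assets


def diff_latest(previous, latest):
    """Compare two enumerations and return (added, removed, changed) names."""
    added = sorted(latest.keys() - previous.keys())
    removed = sorted(previous.keys() - latest.keys())
    changed = sorted(
        name
        for name in previous.keys() & latest.keys()
        if previous[name] != latest[name]
    )
    return added, removed, changed


# Made-up data using documentation-only addresses, for illustration.
previous = parse_found(
    "Found: api.example.com 192.0.2.1\n"
    "Found: blog.example.com 192.0.2.2\n"
)
latest = parse_found(
    "Found: api.example.com 192.0.2.1,192.0.2.9\n"
    "Found: app.example.com 192.0.2.3\n"
)
added, removed, changed = diff_latest(previous, latest)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Note that the comparison only ever sees two snapshots, so it has no way of telling a genuinely new name from one that is returning.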

Sticking to just the two most recent enumerations seems extremely limiting. Why compare only the most recent data, line by line, when the Amass DB stores historical data that stretches much further back, and far more of it than we can get from `amass track`? Admittedly, making good sense of connected categorical data (domain names, IP addresses, etc.) over time is difficult, but it’s clear from the chart that there’s a lot more going on than a line-by-line comparison can show us. Not least that what looks like a new asset when comparing just two enumerations had in fact existed historically: it was there, it went, and it came back.
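The "it was there, it went, and it came back" case can be sketched in a few lines of Python. This assumes the names seen in each enumeration have already been loaded into a list of sets, oldest first. How to get them out of the Amass DB is left open here, and the function name `reappearances` is hypothetical.

```python
def reappearances(history):
    """Return {name: [indexes]} for names that were seen, vanished, and returned.

    `history` is a list of sets of names, oldest enumeration first.
    Each index marks an enumeration where the name came back.
    """
    result = {}
    all_names = set().union(*history)
    for name in sorted(all_names):
        seen = [name in enumeration for enumeration in history]
        # A return is: present now, absent in the previous enumeration,
        # and present in at least one enumeration before that.
        returns = [
            i
            for i in range(1, len(seen))
            if seen[i] and not seen[i - 1] and any(seen[:i])
        ]
        if returns:
            result[name] = returns
    return result


# Made-up history of three enumerations, for illustration only.
history = [
    {"api.example.com", "blog.example.com"},
    {"api.example.com"},
    {"api.example.com", "blog.example.com", "app.example.com"},
]
came_back = reappearances(history)
print(came_back)
```

In this made-up history a two-enumeration diff would report both `blog.example.com` and `app.example.com` as new in the last run, while only `app.example.com` really is.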

A line-based diff is excellent for looking at the details (e.g. exactly which domain and address need closer examination, and at which point in time), but it makes it much harder to get the big picture and discern patterns, like what’s normal and what’s not.[3] A view over time like the chart above might actually help us learn about an organization’s routines, habits, and historical mistakes.
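As a rough illustration of the data behind such a big-picture view, the Python sketch below builds one row per name and one column per enumeration. It is not the notebook's code. It assumes each enumeration is already available as a mapping from name to a set of addresses, and it treats "number of hosts" as the number of distinct addresses, which is an assumption. The names `host_counts` and `render` are hypothetical.

```python
def host_counts(history):
    """Build {name: [count per enumeration]} from a list of {name: addresses}.

    `history` is oldest first; a count of 0 means the name was absent.
    """
    names = sorted(set().union(*(e.keys() for e in history)))
    return {
        name: [len(e.get(name, ())) for e in history]
        for name in names
    }


def render(counts):
    """Render a rough text grid: one row per name, one column per enumeration."""
    width = max((len(name) for name in counts), default=0)
    lines = []
    for name, row in counts.items():
        # A dot stands in for "absent" so gaps are easy to spot.
        cells = " ".join(str(n) if n else "." for n in row)
        lines.append(f"{name.ljust(width)}  {cells}")
    return "\n".join(lines)


# Made-up history using documentation-only addresses, for illustration.
history = [
    {"api.example.com": {"192.0.2.1"}, "blog.example.com": {"192.0.2.2"}},
    {"api.example.com": {"192.0.2.1", "192.0.2.9"}},
    {"api.example.com": {"192.0.2.1"}, "blog.example.com": {"192.0.2.2"}},
]
counts = host_counts(history)
print(render(counts))
```

Even in plain text, a gap in a row stands out in a way that a list of `Found:` lines does not. A plotting library could map the same counts to dot size and shade.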

Spotting a pattern in the past and learning to exploit it in the future means that we can be proactive in trying to predict or narrow our focus. No doubt, notifications are great for keeping up to date, but as real-time as they might be, they will only ever be a reactive mechanism.

I have a few more ideas working along this theme, but more than anything, I’d love your thoughts and feedback. I don’t have as much time as I’d like to validate this idea, but perhaps you do. If you have Amass set up and running periodically, you can point the code (and write-up) in this notebook at your Amass data directory to get the same chart as above.

Write to me at joseph@yiasemides.com.

[1] In short, Amass is a tool used by security professionals to gather information about a target from the Web, including domains, IP addresses, etc.
[2] I learnt about reconnaissance over time from Codingo’s video, sw33tLie’s post, and Hakluke’s post. I searched the Web (Google, Bing, and DuckDuckGo), consulted ChatGPT, searched on forums and chat servers, and took a look at every product I could find in the space. I couldn’t find anything providing an overview over time. Let me know if there’s something out there.
[3] The chart generated by the notebook below does in fact provide a diff when hovering over the dots.
