GreyNoise | Creation and utilization of GreyNoise tags

How are GreyNoise Tags created?

GreyNoise tags are described in the documentation as “a signature-based detection method used to capture patterns and create subsets in our data.” The GreyNoise Research team is responsible for creating tags for vulnerabilities and activities seen in the wild by GreyNoise sensors. GreyNoise researchers have two main methods for tagging: a data-driven approach, and an emerging threats-driven approach. Each of these approaches has three main stages:

Discovery
Research
Implementation

Data-driven approach

When using a data-driven approach, researchers work backward from the data collected by GreyNoise sensors. Researchers will manually browse data or create tooling to aid in finding previously untagged and interesting data. This method relies heavily upon intuition and prior expertise and has led to non-vulnerability-related discoveries such as a Holiday PrintJacking campaign. Using this approach, GreyNoise steadily works toward providing some kind of context for every bit of data opportunistically transmitted over the internet to our sensors.

During the discovery phase, researchers browse raw sensor data, manually or with tooling, to identify interesting data that does not appear to be tagged. Researchers will simply query the data lake for interesting words or patterns, using instinct to drive exploration of the data. Once they have identified and collected an interesting set of data, they begin the research phase.
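
As a rough illustration of this kind of exploration, the Python sketch below filters a small in-memory list of records for untagged payloads that match a pattern of interest. The `SensorRecord` class, its fields, and the sample data are hypothetical; they are not GreyNoise's real schema or query tooling.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class SensorRecord:
    """Hypothetical shape of one captured request (an assumption for this sketch)."""
    source_ip: str
    payload: str
    tags: List[str] = field(default_factory=list)


def find_untagged(records, pattern):
    """Return records that carry no tags and whose payload matches the regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [r for r in records if not r.tags and rx.search(r.payload)]


# Made-up sample data using documentation-only IP addresses.
records = [
    SensorRecord("192.0.2.10", "GET /robots.txt HTTP/1.1", ["Example Crawler"]),
    SensorRecord("192.0.2.11", "POST /ipp/print HTTP/1.1"),
    SensorRecord("192.0.2.12", "GET /index.html HTTP/1.1"),
]

for hit in find_untagged(records, r"/ipp/"):
    print(hit.source_ip, hit.payload)
```

In practice the pattern would be refined iteratively as the researcher's intuition about the data develops.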

During the research phase, the researcher works to identify what the data is. This could be anything from CVE-related traffic to a signature for a particular tool. They do this by searching the internet for the paths, strings, and byte sequences seen in the raw traffic to find related documentation. This often requires the researcher to be adept at reading formal standards, like Requests for Comments (RFCs), as well as reading source code in a variety of programming languages. Once they have identified the data, the researcher will gather and document their findings before moving on to the implementation phase.

During the implementation phase, the researcher uses their findings to write the signature and populate the metadata that together make up a GreyNoise tag. Once complete, a peer will review the work, looking for errors in the signature and false positives in the results before clearing it for production.
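
The pairing of a signature with metadata, and the reviewer's check for false positives, can be sketched roughly as follows. This is a minimal illustration that assumes a signature can be expressed as a regular expression over a payload. The `Tag` class, its field names, and the example tag are hypothetical and do not reflect GreyNoise's internal tag format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Hypothetical tag: metadata plus a regex signature (an assumption)."""
    name: str
    intention: str      # assumed values, e.g. "malicious", "benign", "unknown"
    description: str
    signature: re.Pattern

    def matches(self, payload: str) -> bool:
        return self.signature.search(payload) is not None


def review(tag, should_match, should_not_match):
    """Return (missed, false_positives) so a reviewer can inspect both lists."""
    missed = [p for p in should_match if not tag.matches(p)]
    false_positives = [p for p in should_not_match if tag.matches(p)]
    return missed, false_positives


# Made-up tag: anchoring on method and path keeps the signature narrowly scoped.
tag = Tag(
    name="Example Admin Path Scanner",
    intention="unknown",
    description="Requests the hypothetical /example-admin/ path.",
    signature=re.compile(r"^GET /example-admin/"),
)

missed, false_positives = review(
    tag,
    should_match=["GET /example-admin/login HTTP/1.1"],
    should_not_match=["GET /index.html HTTP/1.1", "POST /example-admin/ HTTP/1.1"],
)
print("missed:", missed, "false positives:", false_positives)
```

Keeping known-good and known-bad samples next to the signature gives the reviewer something concrete to run, rather than relying on reading the pattern alone.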

Emerging threats approach

When using an emerging threats-driven approach, researchers seek out emerging threats observable by GreyNoise sensors. For the most part, GreyNoise only observes network-related vulnerability and scanning traffic. This rules out vulnerabilities like local privilege escalations. Using this method, GreyNoise can provide early warning of mass scanning or in-the-wild exploitation of vulnerabilities such as CVE-2022-26134, an Atlassian Confluence Server RCE.

During the discovery phase, researchers monitor a wide variety of sources such as tech news outlets, social media, US CISA’s Known Exploited Vulnerabilities Catalog, and customer/community requests. Researchers identify and prioritize CVEs that customers and community members may be interested in due to their magnitude, targeted appliances, etc.

During the research phase, as in the data-driven approach, researchers gather publicly available information about the emerging threat that will allow them to write a signature. Proof-of-Concept (PoC) code is often the most useful piece of information. On the rare occasions when no PoC is available, researchers may attempt to reproduce the vulnerability independently. Researchers will often validate vulnerabilities by setting up testbeds to better understand which elements of the vulnerability should be used to create a unique and narrowly scoped signature.

Finally, during the implementation phase, the researcher uses all of the collected information to write the signature that becomes a tag. When doing this, researchers focus on eliminating false positives and tightly scoping the signature to the targeted data or vulnerability. When relevant for emerging threats, GreyNoise researchers will run this signature across all of GreyNoise’s historical data to determine the date of the first occurrence. This allows GreyNoise to publish information about when a vulnerability first saw mass exploitation in the wild and, occasionally, whether a vulnerability, like OMIGOD, was exploited before exploit details were publicly available.
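
The historical lookup described above amounts to finding the earliest timestamp at which the signature matches. The Python sketch below shows that idea over an in-memory list. The `first_seen` helper, the `(timestamp, payload)` record shape, and the sample signature are assumptions made for illustration, not GreyNoise's actual tooling or data.

```python
import re
from datetime import datetime, timezone


def first_seen(signature, history):
    """Return the earliest timestamp whose payload matches, or None if none do.

    `history` is any iterable of (timestamp, payload) pairs and does not need
    to be sorted.
    """
    matching = (ts for ts, payload in history if signature.search(payload))
    return min(matching, default=None)


# Made-up signature and history.
signature = re.compile(r"/example-vuln")
history = [
    (datetime(2020, 3, 5, tzinfo=timezone.utc), "GET /example-vuln?x=1 HTTP/1.1"),
    (datetime(2020, 1, 2, tzinfo=timezone.utc), "GET /index.html HTTP/1.1"),
    (datetime(2020, 2, 1, tzinfo=timezone.utc), "POST /example-vuln HTTP/1.1"),
]

print(first_seen(signature, history))
```

Comparing the returned date with the date exploit details became public is what would reveal exploitation that predates disclosure.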

How to use GreyNoise tags

GreyNoise provides insight into IP addresses that are scanning the internet or attempting to opportunistically exploit hosts across the internet. Tag data associated with a specific IP address provides an overview of the activity that GreyNoise has observed from that IP, as well as insight into the intention of the activity originating from it. For example, an IP may be classified as malicious in GreyNoise because, in addition to performing reconnaissance, it has tags associated with malicious activity such as attempts to brute force credentials, as well as traffic identifying it as part of the Mirai botnet.

GreyNoise tags are also a great way to identify multiple hosts that are scanning for particular items or CVEs. For example, querying for a tag and filtering data can show activity related to a CVE that is originating from a certain country, ASN, or organization. This gives a unique perspective on activity originating from these different sources.
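
To make the filtering idea concrete, the sketch below selects records that carry a given tag and then narrows and groups them by source attributes. It runs on a local list of dictionaries. The field names (`ip`, `tags`, `country`, `asn`, `organization`), the helper functions, and the sample values are hypothetical; they stand in for whatever the GreyNoise interface or API actually returns.

```python
from collections import Counter


def sources_for_tag(records, tag, **filters):
    """Return records carrying `tag` whose fields equal every given filter."""
    return [
        r for r in records
        if tag in r["tags"] and all(r.get(k) == v for k, v in filters.items())
    ]


def count_by(records, key):
    """Count records per distinct value of `key`, e.g. per ASN or country."""
    return Counter(r[key] for r in records)


# Made-up records; the IPs and ASNs come from ranges reserved for documentation.
records = [
    {"ip": "192.0.2.1", "tags": ["Example CVE Scanner"], "country": "Exampleland",
     "asn": "AS64500", "organization": "Example Hosting"},
    {"ip": "192.0.2.2", "tags": ["Example CVE Scanner"], "country": "Exampleland",
     "asn": "AS64501", "organization": "Example ISP"},
    {"ip": "192.0.2.3", "tags": ["Example Crawler"], "country": "Exampleland",
     "asn": "AS64500", "organization": "Example Hosting"},
    {"ip": "198.51.100.7", "tags": ["Example CVE Scanner"], "country": "Samplestan",
     "asn": "AS64502", "organization": "Sample Cloud"},
]

tagged = sources_for_tag(records, "Example CVE Scanner")
print(count_by(tagged, "country"))
print(sources_for_tag(records, "Example CVE Scanner", asn="AS64500"))
```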