It’s well known that the window between CVE disclosure and active exploitation has narrowed. But what happens before a CVE is even disclosed?
In our latest research “Early Warning Signals: When Attacker Behavior Precedes New Vulnerabilities,” GreyNoise analyzed hundreds of spikes in malicious activity — scanning, brute forcing, exploit attempts, and more — targeting edge technologies. We discovered a consistent and actionable trend: in the vast majority of cases, these spikes were followed by the disclosure of a new CVE affecting the same technology within six weeks.
This recurring behavior led us to ask:
Could attacker activity offer defenders an early warning signal for vulnerabilities that don’t exist yet — but soon will?
The Six-Week Critical Window
Across 216 spikes observed by our Global Observation Grid (GOG) since September 2024, we found:
80 percent of spikes were followed by a new CVE within six weeks.
50 percent were followed by a CVE disclosure within three weeks.
These patterns were exclusive to enterprise edge technologies like VPNs, firewalls, and remote access tools — the same kinds of systems increasingly targeted by advanced threat actors.
Why This Matters
Exploit activity may be more than what it seems. Some spikes appear to reflect reconnaissance or exploit-based inventorying. Others may represent probing that ultimately results in new CVE discovery. Either way, defenders can take action.
Blocking attacker infrastructure involved in these spikes may reduce the chances of being inventoried — and ultimately targeted — when a new CVE emerges. Just as importantly, these trends give CISOs and security leaders a credible reason to harden defenses, request additional resources, or prepare strategic responses based on observable signals — not just after a CVE drops, but weeks before.
What’s Inside the Report
The full report includes:
A breakdown of the vendors, products, and GreyNoise tags where these patterns were observed.
Analysis of attacker behavior leading up to CVE disclosure.
The methodology used to identify spikes and establish spike-to-CVE relationships.
Clear takeaways for analysts and CISOs on how to operationalize this intelligence.
This research builds on our earlier work on resurgent vulnerabilities, offering a new lens for defenders to track vulnerability risk based on what attackers do — not just what’s been disclosed.
Computers don’t understand words. They don’t understand verbs, nouns, prepositions, or even adjectives. They kinda understand conjunctions (and/or/not), but not really. Computers understand numbers.
To make computers do things with words, you have to make them numbers. Welcome to the wild world of text embedding!
In this blog I want to teach you about text embedding, why it’s useful, and a couple ways to do it yourself to make your pet project just a little bit better or get a new idea off the ground. I’ll also describe how we’re using it at GreyNoise.
Use Cases
With LLMs in the mix, modern use cases of text embedding are all over the place.
Can’t fit all your text into a context window? Try embedding and searching the relevant parts.
Working on sentiment analysis? Try building a classifier on embedded text.
Looking to find similar texts? Create a search index on embedded values.
Want to build a recommendation system? Figure out how things are similar without building a classification model.
In all of these, we’re encoding a piece of text into a numerical vector in order to do basic machine learning tasks against it, such as nearest neighbors or classification. If you’ve been paying attention in class, this is just feature engineering, but it’s unsupervised and on unstructured data, which has previously been a really hard problem.
How to method 1: Cheat mode
Lots of large models offer sentence-level embedding APIs. One of the most popular is OpenAI’s (https://platform.openai.com/docs/guides/embeddings). It doesn’t cost a ton, probably under $100 for most data sets, but you’re dependent on the latency of external API calls and the whims of another company. Plus, since it’s a GPT model, the embedding is based on the encoding of the last word in your text (with the cumulative words before it), which doesn’t feel as cohesive as what I’m going to suggest next. (This is foreshadowing a BERT vs. GPT discussion.)
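For illustration, here’s a minimal sketch of that API using OpenAI’s Python client (the model name and the sample input are assumptions, not a recommendation):

from openai import OpenAI

client = OpenAI()  ## reads OPENAI_API_KEY from the environment
response = client.embeddings.create(
    model="text-embedding-ada-002",
    input="GET /setup.cgi?next_file=netgear.cfg HTTP/1.1",
)
vector = response.data[0].embedding  ## a plain list of floats, ready for similarity math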
How to method 2: Build your own
Side quest: GPT vs BERT
GPT stands for Generative Pre-trained Transformer. It is built to predict the next word in a sequence, and then the next word, and then the next word. Alternatively, BERT stands for Bidirectional Encoder Representations from Transformers. It is built to predict any word within a set piece of text.
Because they both use Transformers, the main difference between them is where they mask data while training. During the training process, a piece of the text is masked, or obscured, from the model, and the model is asked to predict it. When the model gets it right, hooray! When the model gets it wrong, booo! These outcomes either reinforce or change the weights of the neural network to hopefully better predict in the future.
GPT models only mask the last word in a sequence. They are trying to learn and predict that word. This makes them generative. If your sequence is “The cat jumped” it might predict “over”. Then your sequence would be “The cat jumped over” and it might predict “the”, then “dog”, etc.
BERT models mask random words in the sequence, so they take the entire sequence and try to figure out the word based on what came before and after (bidirectional!!!). For this reason, I believe they are better for text embedding. Note: the biggest GPT models are orders of magnitude bigger than the biggest BERT models because there is more money in generation than encoding/translation, so it is possible GPT-4 does a better job at generic sentence encoding than a home-grown BERT, but let’s all collectively stick it to the man and build our own; it’s easy.
Figure 1: BERT based masking
Figure 2: GPT based masking
Main quest: Building a text encoder
If your data is not just basic English text, building your own encoder and model might be the right decision. At GreyNoise, we have a ton of HTTP payloads that don’t exactly follow typical English language syntax. For that reason, we decided to build our own payload model and wanted to share the knowledge.
There are two parts to an LLM, the same parts you’ll see in HuggingFace models (https://huggingface.co/models) and everywhere else: a Tokenizer and a Model.
Tokenizer
The tokenizer takes your input text and translates it to a base numerical representation. You can train a tokenizer to learn vocabulary directly from your dataset or use the one attached to a pre-trained model. If you are training a model from scratch you might as well train a tokenizer (it takes minutes), but if you are using a pre-trained model you should stick with the one attached.
Tokens are approximately words, but if a word is over 4-5 characters it might get broken up. “Fire” and “fly” could each be one token, but “firefly” would be broken into 2 tokens. This is why you might often hear that tokens are “about ¾ of a word”; it’s the average ratio of words to tokens. Once you have a tokenizer, it can translate a text into integers representing indices into the tokenizer’s vocabulary.
“The cat jumped over” -> 456, 234, 452, 8003
Later, supposing we have a model, if you have the output 8003, 456, 234, 452 (I reordered on purpose) you could translate that back to “over the cat jumped”
The tokenizer is the translation of a numeric representation to a word (or partial word) representation.
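In code, that round trip looks something like this (the token IDs are illustrative; a real tokenizer will produce different integers and may add special tokens):

ids = tokenizer.encode("The cat jumped over")  ## e.g. [456, 234, 452, 8003]
text = tokenizer.decode([8003, 456, 234, 452])  ## "over The cat jumped" (reordered on purpose)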
Model
With a tokenizer, we can pass numerical data to a model and get numerical data out, and then re-encode that to text data.
We could discuss the models, but others have done that before (https://huggingface.co/blog/bert-101). All of these LLM models are beasts. They have basic (kinda) components, but they have a lot of them, which makes for hundreds of millions to billions of parameters. For 98% of people, you want to know what a model does, its pitfalls, and how to use it without knowing the inner workings of how transformer layers are connected to embedding, softmax, and other layers. We’re going to leave that to another discussion. We’ll focus on what it takes to train and get a usable output.
The models can be initialized with basic configs and trained with easy prompts, thanks to the folks at Huggingface (you the real MVP!). For this we are going to use a RoBERTa model (https://huggingface.co/docs/transformers/model_doc/roberta). You could use a pre-trained model and fine-tune it; however, we’re just going to use the config and build the whole model from scratch with random weights. A very similar workflow works if you want to use a pre-trained model and tokenizer, though. I promise I won’t judge.
Create your own list of text you want to train the encoder and model on. It should be at least 100k samples.
If you have created your data set as `str_data` and set a model path as a folder where you want to save the model and tokenizer, you can just do:
tokenizer = create_tokenizer(model_path, str_data[0:50000]) ## you don’t really need more than 50k to train the tokenizer
model = create_model_from_scratch(model_path, tokenizer)
This will create the tokenizer and model. The tokenizer is usable at this state. The model is just random untrained garbage though.
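For reference, here’s a minimal sketch of what those two helpers might look like, built on HuggingFace’s tokenizers and transformers libraries (the function bodies and config sizes are assumptions, not the exact values GreyNoise used):

from tokenizers import ByteLevelBPETokenizer
from transformers import RobertaConfig, RobertaForMaskedLM

def create_tokenizer(model_path, texts):
    ## RoBERTa uses a byte-level BPE vocabulary, so train one on the raw samples
    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train_from_iterator(
        texts,
        vocab_size=30_000,
        min_frequency=2,
        special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
    )
    tokenizer.save_model(model_path)  ## writes vocab.json and merges.txt
    return tokenizer

def create_model_from_scratch(model_path, tokenizer):
    ## Randomly initialized weights; layer/head counts here are placeholders
    config = RobertaConfig(
        vocab_size=tokenizer.get_vocab_size(),
        max_position_embeddings=514,
        num_attention_heads=12,
        num_hidden_layers=6,
        type_vocab_size=1,
    )
    model = RobertaForMaskedLM(config=config)
    model.save_pretrained(model_path)
    return model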
When you’re ready for the first training run, get into the habit of loading the tokenizer and model you created and then training them; these saved snapshots are what people call “checkpoints”.
tokenizer = RobertaTokenizerFast.from_pretrained(model_path, max_len=512)
model = RobertaForMaskedLM.from_pretrained(model_path)
model = train_model(tokenizer, model, model_path, str_data[0:100000]) ## train on however much you want at a time, there is a whole other discussion about this, but give it at least 100k samples.
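A plausible `train_model` sketch using HuggingFace’s Trainer (again, an assumption; the hyperparameters are placeholders):

import torch
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

class TextDataset(torch.utils.data.Dataset):
    ## Wraps tokenized samples so the Trainer can iterate over them
    def __init__(self, encodings):
        self.encodings = encodings
    def __len__(self):
        return len(self.encodings["input_ids"])
    def __getitem__(self, idx):
        return {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}

def train_model(tokenizer, model, model_path, texts):
    encodings = tokenizer(texts, truncation=True, padding="max_length", max_length=512)
    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=0.15  ## BERT-style random masking
    )
    args = TrainingArguments(
        output_dir=model_path,
        num_train_epochs=1,
        per_device_train_batch_size=16,
        save_steps=10_000,
    )
    trainer = Trainer(
        model=model, args=args, data_collator=collator, train_dataset=TextDataset(encodings)
    )
    trainer.train()
    trainer.save_model(model_path)  ## the checkpoint for your next fine-tuning round
    return model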
When you want to retrain or further train, which at this point is also called fine-tuning, just load it up and go again with new or the same data. Nobody is your boss and nobody really knows what is best right here.
Note: You’re going to want to use GPUs for training. Google Colab and Huggingface Notebooks have some free options. All in, this particular model will require 9-10GB of GPU memory, easily attainable by commodity hardware.
Evaluating
Large Language Models do not have a great list of sanity checks; ironically, most benchmarks are against other LLMs. For embeddings, we can do a little better for your personal model. Take two samples that you think are similar, run them through the model to get their embeddings, and calculate how far apart they are with either cosine or Euclidean distance. This gives you a sanity check of whether your model is performing as expected or just off the rails.
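To produce the embeddings in the first place, one common approach (an assumption here; the post doesn’t specify a pooling strategy) is to load the trained checkpoint as a plain encoder and average the last hidden state across tokens:

import torch
from transformers import RobertaModel, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained(model_path, max_len=512)
encoder = RobertaModel.from_pretrained(model_path)  ## same checkpoint, minus the MLM head

def embed(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = encoder(**inputs)
    ## Mean-pool the token vectors into one fixed-length sentence vector
    return outputs.last_hidden_state.mean(dim=1).squeeze().numpy()

embedding1 = embed("GET /setup.cgi?next_file=netgear.cfg")
embedding2 = embed("GET /setup.cgi?next_file=netgear.cfg HTTP/1.1")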
For Euclidean distance use:
import numpy as np
euclidean_dist = np.linalg.norm(embedding1 - embedding2)
For cosine distance use:
from sklearn.metrics.pairwise import cosine_similarity
cos_sim = cosine_similarity(embedding1.reshape(1, -1), embedding2.reshape(1, -1))
How We’re Using it at GreyNoise
We’re early adopters of LLM tech at GreyNoise, but it is hard to put it in the hands of users responsibly. We basically don’t want to F up. We have an upcoming feature called NoiseGPT that takes natural language text and turns it into GNQL queries. Gone are the days of learning a new syntax just to figure out what the hell is going on.
We also have an in-development feature called Sift, a way to tease out the new traffic on the internet and describe it for users. It takes the hundreds of thousands of HTTP payloads we see every day, reduces them to the ~15 new and relevant ones, and describes what they are doing. EAP coming on that soon.
Plus, if you think of any great ideas we should be doing, please hit us up. We have a community slack and my email is below. We want to hear from you.
Fin
With these tips I hope you’re able to create your own LLM for your projects or at least appreciate those that do. If you have any questions please feel free to reach out to daniel@greynoise.io, give GreyNoise a try (https://viz.greynoise.io/), and look out for features using these techniques in the very near future.
On Monday, May 1, 2023, CISA added CVE-2021-45046, CVE-2023-21839, and CVE-2023-1389 to the Known Exploited Vulnerabilities (KEV) list. For all three CVEs, GreyNoise users had visibility into which IPs were attempting mass exploitation prior to their addition to the KEV list. GreyNoise tags allow organizations to monitor and prioritize the handling of alerts regarding benign and, in this case, malicious IPs.
CVE-2021-45046
Apache Log4j2 contains a deserialization of untrusted data vulnerability due to the incomplete fix of CVE-2021-44228, where the Thread Context Lookup Pattern is vulnerable to remote code execution in certain non-default configurations.
First observed by GreyNoise: December 9, 2021. Added to KEV: May 1, 2023.

CVE-2023-21839
Oracle WebLogic Server contains an unspecified vulnerability that allows an unauthenticated attacker with network access via T3 or IIOP to compromise Oracle WebLogic Server.
First observed by GreyNoise: March 6, 2023. Added to KEV: May 1, 2023.

CVE-2023-1389
TP-Link Archer AX-21 contains a command injection vulnerability that allows for remote code execution.
First observed by GreyNoise: April 25, 2023. Added to KEV: May 1, 2023.
Bonus Update:
On Thursday, April 27, 2023, GreyNoise released a tag for the critically scored CVE-2023-21554, QueueJumper, a Microsoft message queuing remote code execution vulnerability.
As of this publication, we have not observed mass exploitation attempts, but have observed >600 IPs that are attempting to discover Internet-facing Microsoft Windows devices that respond over Microsoft Message Queuing (MSMQ) binary protocol.
Crawlers that find public, unsecured environment files continue to be used to compromise organizations.
On Tuesday, April 25, 2023, GreyNoise is changing how we classify environment file crawlers from unknown intent to malicious intent. At the time of publication, this change will result in the reclassification of over 11,000 IPs as malicious. Users who use GreyNoise’s malicious tag to block IPs based on malicious intent will see an increase in blocked IPs.
Background
An environment file crawler is a bot that scours the internet for publicly available env files. The use of these files has been popular for over a decade; they are used to pass dynamic environment variables to software and services.
Environment files are dotfiles: files hidden from the user by default but editable in any text editor, containing configuration settings for various applications. An example of an environment file is:
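(An illustrative stand-in; every value below is invented:)

APP_ENV=production
APP_DEBUG=false
DB_HOST=db.internal.example.com
DB_USERNAME=appuser
DB_PASSWORD=hunter2
AWS_ACCESS_KEY_ID=AKIAEXAMPLEEXAMPLE
AWS_SECRET_ACCESS_KEY=wJalrEXAMPLEKEYEXAMPLEKEY
MAIL_PASSWORD=smtp-secret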
They almost always contain sensitive data such as authentication information (e.g., keys or passwords) and often the specific connection paths those credentials unlock. For this reason, env files should never be exposed publicly; anyone who obtains the file can potentially access sensitive information. Adding insult to injury, organizations are often unaware that they are exposing these files to the public, and these crawlers have historically been overlooked.
What is GreyNoise changing?
For years, GreyNoise has monitored env scanners and classified them as unknown intent. However, we continuously strive to enhance our datasets to safeguard organizations and increase the effectiveness of SOCs; thus, we have decided to reclassify these crawlers as malicious.
Click/tap here for more information on GreyNoise classifications.
The reclassification of intent will affect the following tags:
These files should never be publicly exposed since they typically contain sensitive information; the internet noise generated by the constant searching for these files is indicative of the scale of opportunistic attackers looking for credentials.
Using environment files to compromise organizations is a well-established tactic
Organizations should take proactive measures to regularly look for exposed .env files; scanning once won’t cut it as they can appear at any time. Searching for unsecured env files should be a part of an organization's vulnerability management program. If you do find a publicly available .env file for your organization, it is imperative that you immediately remediate the exposure and rotate any credentials that were leaked. GreyNoise will continue to review the classifications of our tags to ensure their efficacy.
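As a starting point, a minimal sketch of such a self-check (the candidate paths are common guesses, not an exhaustive list; only scan assets you own):

import requests

## Common locations where environment files end up exposed (illustrative)
CANDIDATE_PATHS = ["/.env", "/.env.backup", "/app/.env", "/config/.env"]

def check_host(base_url):
    for path in CANDIDATE_PATHS:
        resp = requests.get(base_url + path, timeout=5)
        ## A 200 response full of KEY=value lines is a strong hint of an exposed env file
        if resp.status_code == 200 and "=" in resp.text:
            print(f"Possible exposed env file: {base_url}{path}")

check_host("https://www.example.com")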
Sign up for a free GreyNoise account or request a demo to see how GreyNoise can help provide immediate protection from threats like these, especially when activity mutates from “unknown” or “benign” to “malicious.”
OpenAI recently released a ChatGPT feature that allows plugins to fetch live data from various providers. This feature has been designed with “safety as a core design principle,” which means the OpenAI team has taken steps to ensure that the data being accessed is secure and private.
However, there are some concerns about the security of the example code provided by OpenAI for developers who want to integrate their plugins with the new feature. Specifically, the code examples utilize a docker image for MinIO RELEASE.2022-03-17. This version of MinIO is vulnerable to CVE-2023-28432, which is a security vulnerability resulting in information disclosure of all environment variables, including MINIO_SECRET_KEY and MINIO_ROOT_PASSWORD.
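As an illustration, the disclosure can be checked with a single unauthenticated request (the endpoint is per the public advisory for CVE-2023-28432; the response matching is an assumption, and this should only be run against systems you own):

import requests

## Vulnerable clustered MinIO deployments answer this bootstrap endpoint with
## their environment, including MINIO_SECRET_KEY and MINIO_ROOT_PASSWORD.
resp = requests.post("http://127.0.0.1:9000/minio/bootstrap/v2/verify", timeout=5)
if resp.ok and "MinioEnv" in resp.text:
    print("Instance discloses environment variables; upgrade to a patched release")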
While we have no information suggesting that any specific actor is targeting ChatGPT example instances, we have observed this vulnerability being actively exploited in the wild. When attackers attempt mass-identification and mass-exploitation of vulnerable services, “everything” is in scope, including any deployed ChatGPT plugins that utilize this outdated version of MinIO.
To avoid any potential data breaches, it is recommended that users upgrade to a patched version of MinIO (RELEASE.2023-03-20T20-16-18Z) and integrate security tooling such as docker-cli-scan, or use GitHub’s built-in monitoring for supply chain vulnerabilities, which already includes a record referencing this vulnerability.
While the new feature released by OpenAI is a valuable tool for developers who want to access live data from various providers in their ChatGPT integration, security should remain a core design principle.
To ensure we have as much visibility into activity on the internet as possible, we regularly deploy new sensors in different “geographical” network locations. We’ve selected two sensors for a short “week in the life” series to give practitioners a glimpse into what activity new internet nodes see. This series should help organizations understand the opportunistic and random activity that awaits newly deployed services, plus help folks understand just how little time you have to ensure internet-facing systems are made safe and resilient.
We initially took a look at what the sources of "benign" traffic are slinging your way. Today, we're going to look at the opposite end of the spectrum. We stripped away all the benign sources from the same dataset and focused on incoming tagged traffic from malicious or unknown sources. After all, these detection rules are the heart and soul of the GreyNoise platform, and are also what our customers and community members depend upon to keep their organizations safe.
The "not-so-benign" perspective
If we hearken back to our previous episode, it took over an hour for even the best-of-the-best of the "benigns" to discover our freshly deployed nodes. This makes sense, since there aren't too many legitimate organizations conducting scans, and they do not have infinite resources. Sure, they could likely spare some change to scan more frequently, but they really don't need to.
In contrast, we tagged 8,697 incoming IP addresses in that first week, and the first packet of possible ill-intent appeared ten seconds after the nodes were fully armed and operational. However, the first tagged event — RDP Alternative Port Crawler — was seen three hours later. The difference between those two events lies in one of our core promises: our tags are 100% reliable. We don't just take every IP address hitting our unannounced sensor nodes and shove it into a list of indicators of compromise (IoCs).
Tagged Malicious Traffic Started Coming In As Soon As The Sensors Were Functional
217,852 total malicious/unknown events encountered during the ~7.8 day sampling period.
The above chart is the raw, non-benign connection data to those sensors for that week. You're likely wondering what those spikes are. We did too!
Let's look at some summary data by tallies of:
autonomous system organization (aso) — the name of the network the connections came from
geolocated country
destination port
source IPv4
total connections
The Four Largest "Spike" Hours Had Mostly Similar Characteristics
The August 28th malicious traffic spike (for an hour) focused mainly on SMB exploits, and originated from the "Data Communication Business Group" autonomous system in Taiwan. It is odd that we saw so little other activity, and that the port volume was an order of magnitude less. There are any number of reasons for this. Given where these sensors are (which we're not disclosing), it could have been a day of deliberate country network isolation. Or, it could just mean that the botnet herders were super-focused on SMB.
Over the course of those seven days, non-benign nodes hit over thirteen thousand ports, and you can likely guess which ones made the top of the list.
We Saw The Usual Suspects Rise To The Top Of 13,576 Ports
Port numbers associated with Telnet, RDP, SSH, and SMB were, by far, the most common.
Since we called out one autonomous system, we should be fair and call them all out.
If You've Ever Stared At An IPv4 IoC List, You Definitely Recognize These Folks
These are very common autonomous systems to see in malicious attack logs. What should concern you, however, is that "high reputation" sources such as Linode, OVH SAS, Google LLC, and Microsoft Corporation are all in the visible treemap cells. Even if you don't consider them high-reputation sources, you cannot permanently block communication to/from them: these are all hosting providers, and we see other hosting providers with decent IP reputation also hosting malicious traffic sources. The hourly updated nature of our API-downloadable block lists, combined with the scheduled roll-off of IP addresses once they stop exhibiting malicious activity, means you can keep your organization safe from these IPs while they are trying to do harm.
It's all about the tags
What you, and we, truly care about is the tagged, non-benign traffic.
The Tagged Traffic Distribution Takes A Familiar Shape
Of the 115 identified tags, those associated with RDP, SMB, and MS SQL attacks topped the list, along with scrapers looking for useful information. If you still haven't removed RDP from your perimeter, please stop reading and take the opportunity to do so now.
Head on over to the Observable notebook that houses all these charts and data for a more interactive version of the chart and data tables.
You can see the aforementioned yellow SMBv1 Crawler August 28th spike right after "12 PM".
Key Takeaways
When you deploy a new internet-facing system, you have only seconds until unwanted traffic comes knocking on your door. After that, there's a constant drumbeat, and sometimes even an entire off-key orchestra, of unsought after:
benign traffic attempting to maintain an inventory of internet-connected devices and services;
malicious traffic with a laundry list of goals to achieve; and,
unknown traffic that could be something we and the rest of the cybersecurity community have not yet identified as malicious, but is definitely something your apps and other services might not be able to handle.
You can sign up for a GreyNoise account to start exploring the tags and networks identified in this post, and check out some of our freshly minted new features such as IP Similarity — which lets you hunt for bad actors exhibiting behavior similar to ones we've tagged — and, IP Timeline — where you can see what sources have been up to, and how their behavior has changed over time.
Microsoft’s Patch Tuesday (Valentine’s Edition) released information on four remote code execution vulnerabilities in Microsoft Exchange, impacting the following versions:
Exchange Server 2019
Exchange Server 2016
Exchange Server 2013
Attackers must have functional authentication to attempt exploitation. If they are successful, they may be able to execute code on the Exchange server as SYSTEM, a mighty Windows account.
Exchange remote code execution vulnerabilities have a bit of a pattern in their history, which is notable because authentication is also a requirement for exploiting these newly announced vulnerabilities.
CVE-2023-21529, CVE-2023-21706, and CVE-2023-21707 are similar to CVE-2022-41082 (which GreyNoise covered back in September 2022) in that they all require authentication to achieve remote code execution. Readers may know those September 2022 vulnerabilities under the "ProxyNotShell" moniker, since an accompanying Server-Side Request Forgery (SSRF) vulnerability was leveraged to bypass the authentication constraint. "As per our last email," we noted this historical pattern of Exchange exploitation in prior blogs, and we track recent related activity under the Exchange ProxyNotShell Vuln Check tag, which sees regular activity.
Shadowserver, a nonprofit organization which proactively scans the internet and notifies organizations and regional emergency response centers of outstanding exposed vulnerabilities, noted that there were over 87,000 Exchange instances vulnerable to CVE-2023-21529 (the most likely vulnerability entry point of the four new weaknesses).
As of the publishing date of this post, there are no known, public proof-of-concept exploits for these new Exchange vulnerabilities. Unless attackers are attempting to bypass web application firewall signatures that protect against the previous server-side request forgery (SSRF) weakness, it is unlikely we will see any attempts to mass exploit these new weaknesses any time soon. Furthermore, determined attackers have been more stealthy when it comes to attacking self-hosted Exchange servers, amassing solid IP address and domain inventories of these systems, and retargeting them directly for new campaigns.
GreyNoise does not have a tag for any of the four new Exchange vulnerabilities, but is continuing to watch for emergent proof-of-concept code and monitoring activity across our multi-thousand-node sensor network for anomalous Exchange exploitation. Specifically, we are keeping a keen eye on any activity related to an SSRF bypass or Exchange credential brute-forcing meant to meet the authentication constraints an attacker needs to leverage these vulnerabilities.
GreyNoise researchers will update this post if and when new information becomes available.
Given the likely targeted nature of new, malicious Exchange exploit campaigns, you may be interested in how GreyNoise can help you identify targeted attacks, so you can focus on what matters to your organization.
UPDATE 2023-02-14: in response to an inquiry, GreyNoise researchers went back in time to see if there were exploit attempts closer to when CVE-2021-21974 was released.
From January 1, 2021 through June 1, 2021, two IPs were observed exploiting CVE-2021-21974, each active (as observed by GreyNoise sensors) for a single day.
Active 2021-05-25, 45[.]112[.]240[.]81
Active 2021-05-31, 77[.]243[.]181[.]196
In recent days, CVE-2021-21974, a heap-overflow vulnerability in VMware ESXi’s OpenSLP service, has been prominently mentioned in the news in relation to a wave of ransomware affecting numerous organizations. The relationship between CVE-2021-21974 and the ransomware campaign may be blown out of proportion. We do not currently know what the initial access vector is, and it is possible it could be any of the vulnerabilities related to ESXi’s OpenSLP service.
The security community seems to be focusing on a single vulnerability. GreyNoise believes that CVE-2021-21974 makes sense as an initial access vector, but we are not aware of any first-party sources confirming that to be the case. We encourage defenders to remain vigilant and not take every vendor at their word (including us).
The objective of the following document is to provide clarity to network defenders surrounding the ransomware campaign as it relates to the following items:
Attribution of exploitation vector to a specific CVE
Metrics of vulnerable hosts
Metrics of compromised hosts
How do you “GreyNoise” an unknown attack vector?
Attribution to a specific CVE
CVE-2021-21974 is a heap-overflow vulnerability in ESXi’s OpenSLP service.
CVE-2020-3992 is a use-after-free vulnerability in ESXi’s OpenSLP service.
CVE-2019-5544 is a heap overwrite vulnerability in ESXi’s OpenSLP service.
Back in October 2022, Juniper Networks wrote a blog regarding the potential usage of CVE-2019-5544 / CVE-2020-3992 as part of an exploitation campaign. Due to log retention limits on the compromised server, they were unable to confidently attribute which specific vulnerability resulted in successful exploitation. Instead, they focused their blog on the details of the backdoor that was installed post-exploitation.
On February 3rd, 2023, the cloud hosting provider OVH published a notice regarding an active ransomware campaign affecting many of their ESXi customers, hereafter referred to as “ESXiArgs” due to the ransomware creating files with an extension of .args. As part of their notice, they provided the following quote:
According to experts from the ecosystem as well as authorities, the malware is probably using CVE-2021-21974 as compromission vector. Investigation are still ongoing to confirm those assumptions.
On February 6th, 2023, VMware published a security blog acknowledging the “ESXiArgs” campaign and stated:
VMware has not found evidence that suggests an unknown vulnerability (0-day) is being used to propagate the ransomware used in these recent attacks.
In summary, while many third-party intelligence sources directly attribute this ransomware campaign to CVE-2021-21974, the first-party sources do not.
Tl;dr:
There are several high-profile OpenSLP vulnerabilities in various versions of ESXi
These vulnerabilities have been exploited in the past to install malicious backdoors
No CVE is being concretely attributed as the initial access vector for the ESXiArgs campaign by first-party sources
Metrics of Vulnerable Hosts
There are many companies that scan the internet with benign intentions for inventory, research, and actionable intelligence. GreyNoise sees these companies on a very regular basis, since we operate “sensors” similar to a honeypot. They scan, and we (GreyNoise) listen.
Without going into too much depth, there is a significant complexity jump between “determining if a port is open on a server” and “determining what protocol is operating on a port”.
For scanners, high interaction protocols such as those used by the ESXi OpenSLP service may be checked on a weekly/monthly basis, whereas more common protocols such as HTTP(s) on common ports like 80/443 may be checked nearly constantly.
Much like the variety of benign internet-wide scanning companies, GreyNoise is not the only organization operating honeypots on the internet. This causes biases in reported metrics of potentially vulnerable servers on the public internet.
Once an incident such as the ESXiArgs campaign has begun, “scanning” organizations will ramp up scanning, and “honeypot” organizations will ramp up honeypots. By that point, the ESXiArgs campaign is already underway, and more accurate metrics can be drawn from other attributes.
Tl;dr:
Metrics regarding vulnerable host counts have biases
Metrics regarding vulnerable host counts are scoped estimates
These metrics are still the most accurate reports available
Metrics of Compromised Hosts
One of the publicly visible aspects of the “ESXiArgs” campaign is that a ransom note is made available on a host’s public IP with a title of How to Restore Your Files.
By performing a query for How to Restore Your Files, we can generate a list of the autonomous system organizations and countries affected by this campaign, complete with a generated timestamp, since this number continually fluctuates and is only accurate as a point-in-time metric.
OVH is the predominantly affected hosting provider
France (where OVH is primarily located) is the most impacted region
The estimated count of compromised hosts at the time of writing is between 1,500 and 2,000 nodes. Censys noted that the initial high water mark of compromised nodes was over 3,500.
How do you “GreyNoise” an unknown attack vector?
GreyNoise has had a tag available for tracking and blocking CVE-2021-21974 since June 2021:
At this moment in time, we’re seeing a Log4j-style conundrum 😬; the majority of CVE-related activity comes from benign cybersecurity companies checking for the presence of the vulnerability.
As described above, there are no confirmed reports of the initial CVE exploit vector, so how can GreyNoise help defenders when the attack vector is unknown?
As we explained in our “How to know if I am being targeted” blog, IPs that return no results when searched on GreyNoise are traffic that is targeted at your organization.
If your organization observes a connection from an IP that returns no search results in GreyNoise, you should almost certainly prioritize that investigation, because it’s targeting your organization instead of the entire internet. If you find that an IP not known to GreyNoise was connecting to your organization’s VMware ESXi on TCP/427, you should definitely prioritize that investigation.
In cases where initial access vectors are unknown, using GreyNoise as a filter for internet background noise can help prioritize the things that matter the most, because absence of a signal is a signal itself.
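That triage can be automated; here’s a minimal sketch with the GreyNoise Python SDK (pip install greynoise; the response fields used below are assumptions based on the public API docs):

from greynoise import GreyNoise

gn = GreyNoise(api_key="YOUR_API_KEY")

for ip in ["203.0.113.10", "198.51.100.7"]:  ## IPs pulled from your own connection logs
    results = gn.quick(ip)  ## fast check: is this IP internet-wide background noise?
    if not results or not results[0].get("noise"):
        ## Not scanning the whole internet; it may be targeting you specifically
        print(f"{ip}: no GreyNoise results, prioritize this investigation")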
You may have noticed an anomalous uptick of PUT requests in the GreyNoise sensors these past couple of days (2023-01-22 → 2023-01-24). For those interested, we’ve put together a quick summary of the details for you to dive into.
The majority of PUT requests occurred from January 15, 2023, to January 26, 2023. During this period, 2,927 payloads were observed containing HTTP paths of randomly generated letters and either a “.txt” or “.jsp” file extension. Similarly, the body of the PUT requests contained one randomly generated string as plain text and another randomly generated string inside a Jakarta Server Pages (JSP) comment. We believe this to be a methodology for inserting a unique identifier into the target server to determine the potential for further exploitation, such as the ability to upload arbitrary files, retrieve the contents of the uploaded file, and determine whether the JSP page was rendered (indicative of possible remote code execution).
Sample of Decoded HTTP PUT Payload
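Based on the description above, an illustrative reconstruction of such a request (all values invented) looks like:

PUT /qWzXcVbN.jsp HTTP/1.1
Host: <target>
Content-Length: 41

tHiSiSrAnDoM<%-- aNoThErRaNdOmStRiNg --%>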
The remaining path counts can be seen here:
Table of HTTP PUT Path Counts Observed by GreyNoise Sensors
The most common is "/api/v2…", a path often found in FortiOS authentication bypass attempts. Check out our blog to learn more about tracking this exploit. We’ve also seen variations of "/FxCodeShell.jsp…", which is indicative of Tomcat backdoor usage. Each has its respective packet example below:
FortiOS Authentication Bypass Attempt
Tomcat Backdoor CVE-2017-12615
Inquiring into these paths led to a discovery for us as well! Having been formerly unfamiliar with the "/_users/org.couchdb.user:..." path, we did some digging, which led to a new signature for CVE-2017-12635.
Instances of Apache CouchDB Remote Priv Esc Attempts
This highlights novel ways attackers are attempting to fingerprint exposed services using known vulnerabilities, and is a starting point for hunting for additional malicious activity related to these requests.
When digging into these anomalies, GreyNoise researchers noticed a pattern of randomly generated JSP files being used to check whether attackers can upload, and then access, their uploaded files.
The FortiOS authentication bypass used the "/api/v2" HTTP path prefix along with a "User-Agent: Node.js" header.
To ensure we have as much visibility into activity on the internet as possible, we regularly deploy new sensors in different “geographical” network locations. We’ve selected two sensors for a short “week in the life” series to give practitioners a glimpse into what activity new internet nodes see. This series should help organizations understand the opportunistic and random activity that awaits newly deployed services, plus help folks understand just how little time you have to ensure internet-facing systems are made safe and resilient.
The “Benign” perspective
Presently, there are three source IPv4 classifications defined in our GreyNoise Query Language (GNQL): benign, malicious, and unknown. Cybersecurity folks tend to focus quite a bit on the malicious ones since they may represent a clear and present danger to operations. However, there are many benign sources of activity that are always on the lookout for new nodes and services, so we’re starting our new sensor retrospective by looking at those sources first. Don’t worry, we’ll spend plenty of time in future posts looking at the “bad guys."
While likely far from a comprehensive summary, there are at least 74 organizations regularly conducting internet service surveys of some sort (we’ll refer to them as ‘scanners’ moving forward):
AdScore
Ahrefs
Alpha Strike Labs
Ampere Innotech
ANT Lab
Applebot
Arbor Observatory
Archive.org
BinaryEdge.io
BingBot
Bit Discovery
Bitsight
BLEXBot
Caida
Censys
CERT-FR
Cloud System Networks
Cloudflare
Cortex Xpanse
CriminalIP
cyber.casa
CyberGreen
cymru
dataplane
DomainTools
Dutch Institute for Vulnerability Disclosure
errata
ESET
Facebook Crawler
FH Muenster University
GoogleBot
Internet Census
InterneTTL
Intrinsec
IPinfo.io
ipip.net
ipqualityscore
Knoq
LeakIX
Mail.RU
Max Planck Inst.
Moz DotBot
Net Systems Research
NetCraft
netsystems
ONYPHE
OpenIntel.nl
openportstats
Palo Alto Crawler
Petalbot
pnap
Project Sonar
Project25499
Quadmetrics.com
Qualys
Qwant
Recyber
RWTH AACHEN University
scorecardresearch
SecurityTrails
Seznam
ShadowServer.org
Shodan.io
Sogou
spyse
Stanford Univ.
stretchoid
Technical University of Munich
threatsinkhole
UMich
University of Colorado
VeriSign
WithSecure
Yandex Search Engine
We were curious as to how long it took these scanners to find our new nodes after they came online and were ready to accept connections. We capped our discovery period exploration at a week for this analysis but may dig into longer time periods in future updates.
Out of the 74 known scanners, only 18 (24%) contacted our nodes within the first week.
As the above chart shows, some of the more well-known scanners found our new sensor nodes within just about an hour after being booted up. A caveat to this data is that other scanners in the main list may have just tried contacting the IP addresses of these nodes before we booted them up.
One reason organizations should care about this metric is that some of these scanners are run by “cyber hygiene” rating organizations, and you only get one chance to make a first impression that could negatively impact, say, your cyber insurance premiums. So, don’t deploy poorly configured services if you want to keep the CFO happy.
Benign infrastructure
It’s pretty “easy” to scan the entire internet these days, thanks to tools such as Rob Graham’s masscan, provided you like handling abuse complaints and can afford the bandwidth costs on providers that allow such activity. We identified each of these scanning organizations via their published list of IPs. We decided to see just how many unique IPs of each scanner we saw within the first week:
Bitsight dedicates a crazy amount of infrastructure to poke at internet nodes. Same for the Internet Census. By the end of the week, we saw 346 unique benign scanner IPs contact our sensors, which means your internet-facing nodes likely did as well. While you may not want these organizations probing your perimeter, the reality is that, while you may be able to ask them to opt you out of scanning, you cannot do the same for attackers (abuse complaints aren’t a great solution either). Some organizations, ShadowServer in particular, are also there to help you by letting you understand your “attack surface” better, so you are likely better off using our benign classified IPs to help thin down the alerts these services likely generate (more on that in a bit).
The chart above also shows that some services have definite “schedules” for these scans, and others rarely make contact. Just how many contacts can you expect per day?
Hopefully, you are using some intelligent alert filtering to keep your defenders from being overloaded.
What are the scanners looking for?
Web servers may rule the top spot of deployed internet-facing services, but they aren’t the only exposed services and they aren’t just hosted on ports 443 and 80 anymore. Given how many IP addresses some scanners use and how many times the node in the above example was contacted by certain scanners, it’s likely a safe assumption that the port/service coverage was broad for some of them. It turns out, that assumption was spot-on:
At least when it comes to this observation experiment, Censys clearly has the most port/service coverage out of all the benign scanners. It was a bit surprising to see such a broad service coverage in the top seven providers, although most have higher concentrations below port 20000.
If you think you’re being “clever” by hosting an internet-facing service on a port “nobody will look at," think again. Censys (and others’) scans are also protocol-aware, meaning they’ll try to figure out what’s running based on the initial connection response. That means you can forget about hiding your SSH and SMB servers from their watchful eyes. As we’ll see in future posts, non-benign adversaries also know you try to hide services, and are just as capable of looking for your hidden high port treasure.
Going beyond benign
If we strip away all the benign scanner activity, we’re left with the real noise:
We’ll have follow-up posts looking at “a week in the life” of these sensors to help provide more perspectives on what defenders are facing when working to keep internet-facing nodes safe and sound.
Remember: you can use GreyNoise to help separate internet noise from threats as an unauthenticated user on our site. For additional functionality and IP search capacity, create your own GreyNoise Community (free) account today.
GreyNoise tags are described in the documentation as “a signature-based detection method used to capture patterns and create subsets in our data.” The GreyNoise Research team is responsible for creating tags for vulnerabilities and activities seen in the wild by GreyNoise sensors. GreyNoise researchers have two main methods for tagging: a data-driven approach, and an emerging threats-driven approach. Each of these approaches has three main stages:
Discovery
Research
Implementation
Data-driven approach
When using a data-driven approach, researchers work backward from the data collected by GreyNoise sensors. Researchers will manually browse data or create tooling to aid in finding previously untagged and interesting data. This method relies heavily upon intuition and prior expertise and has led to non-vulnerability-related discoveries such as a Holiday PrintJacking campaign. Using this approach, GreyNoise steadily works toward providing some kind of context for every bit of data opportunistically transmitted over the internet to our sensors.
During the discovery phase, researchers identify interesting data that does not appear to be tagged by manual or tool-assisted browsing of raw sensor data. Researchers will simply query the data lake for interesting words or patterns, using instinct to drive exploration of the data. Once they have identified and collected an interesting set of data, they begin the research phase.
During the research phase, the researcher works to identify what the data is. This could be anything from CVE-related traffic to a signature for a particular tool. They do this by scouring the internet for various paths, strings, and bytes to find documentation relating to the raw traffic. This often requires the researcher to be adept at reading formal standards, like Requests for Comments, as well as reading source code in a variety of programming languages. Once they have identified the data, the researcher will gather and document their findings before moving on to the implementation phase.
Using their research, the researcher will implement a tag by actually writing the signature and populating the metadata that makes it into a GreyNoise tag. Once complete, a peer will review the work, looking for errors in the signature and false positives in the results before clearing it for production.
Emerging threats approach
When using an emerging threats-driven approach, researchers seek out emerging threats observable by GreyNoise sensors. For the most part, GreyNoise only observes network-related vulnerability and scanning traffic. This rules out vulnerabilities like local privilege escalations. Using this method, GreyNoise can provide early warning for mass scanning or exploitation activity in the wild of things like CVE-2022-26134, an Atlassian Confluence Server RCE.
During the discovery phase, researchers monitor a wide variety of sources such as tech news outlets, social media, US CISA’s Known Exploited Vulnerabilities Catalog, and customer/community requests. Researchers identify and prioritize CVEs that customers and community members may be interested in due to their magnitude, targeted appliances, etc.
Similar to the data-driven approach, researchers will gather publicly available information regarding the emerging threat that will allow them to write a signature. Proof-of-Concept (PoC) code is often the most useful piece of information. On rare occasions, lacking a PoC, researchers will attempt to independently reproduce the vulnerability. Researchers will often attempt to validate vulnerabilities by setting up testbeds to better understand what elements of the vulnerability should be used to create a unique and narrowly scoped signature.
Finally, using all collected information, the researcher will seek to write the signature that becomes a tag. When doing this, researchers focus on eliminating false positives and tightly scoping the signature to the targeted data or vulnerability. When relevant for emerging threats, GreyNoise researchers will run this signature across all of GreyNoise’s historical data to determine the date of the first occurrence. This allows GreyNoise to publish information regarding when a vulnerability has first seen mass exploitation in the wild and, occasionally, if a vulnerability, like OMIGOD, was exploited before exploit details were publicly available.
How to use GreyNoise tags
GreyNoise provides insight into IP addresses that are scanning the internet or attempting to opportunistically exploit hosts across the internet. Tag data associated with a specific IP address provides an overview of the activity that GreyNoise has observed from a particular IP, as well as insight into the intention of the activity originating from it. For example, we can see that this IP is classified as malicious in GreyNoise because it is doing some reconnaissance but also has tags associated with malicious activity attempting to brute force credentials as well as traffic identifying it as part of the Mirai botnet.
GreyNoise tags are also a great way to identify multiple hosts that are scanning for particular items or CVEs. For example, querying for a tag and filtering data can show activity related to a CVE that is originating from a certain country, ASN, or organization. This gives a unique perspective on activity originating from these different sources.
Finally, tag data is accessible via the GreyNoise API and allows integrations to add this tag data easily into other products.
Practical takeaways from CISA's Cyber Safety Review Board Log4j report
The Cybersecurity and Infrastructure Security Agency (CISA)'s Cyber Safety Review Board (CSRB) was established in May 2021 and is charged with reviewing major cybersecurity incidents and issuing guidance and recommendations where necessary. GreyNoise is cited as a primary source of ground truth in the CSRB's first report (direct PDF), published on July 11, 2022, covering the December 2021 Log4j event. GreyNoise employees testified to the Cyber Safety Review Board on our early observations of Log4j. In this post, we'll examine key takeaways from the report with a GreyNoise lens and discuss what present-day Log4j malicious activity may portend.
Log4J retrospective
It may seem as if it's been a few years since the Log4j event ruined vacations and caused quite a stir across the internet. In reality, it's been just a little over six months since we published our "Log4j Analysis - What To Do Before The Next Big One" review of this mega-incident. Since that time, we've worked with the CSRB and provided our analyses and summaries of pertinent telemetry to help them craft their findings.
The most perspicacious paragraph in the CSRB's report may be this one:
Most importantly, however, the Log4j event is not over. The Board assesses that Log4j is an “endemic vulnerability” and that vulnerable instances of Log4j will remain in systems for many years to come, perhaps a decade or longer. Significant risk remains.
In fact, it reinforces their primary recommendation: "Organizations should be prepared to address Log4j vulnerabilities for years to come and continue to report (and escalate) observations of Log4j exploitation."(CSRB Log4j Report, pg. 6)
The CSRB further recommends (CSRB Log4j Report, pg. 7) that organizations:
develop strong configuration, asset, and vulnerability management practices
invest resources in the open-source ecosystems they depend upon
follow the lead of the Federal government and require and use Software Bill of Materials (SBOM) when sourcing software and components
This is all sound advice, but if your organization is starting from ground zero in any of those bullets, getting to a maturity level where they will each be effective will take time. Meanwhile, we've published three blogs on emerging, active exploit campaigns since the Log4j event:
Furthermore, CISA has added over 460 new Known Exploited Vulnerabilities (KEV) to their ever-growing catalog. That's quite a jump in the known attack surface, especially if you're still struggling to know if you have any of the entries in the KEV catalog in your environment.
While you're working on honing your internal asset, vulnerability, and software inventory telemetry, you can improve your ability to defend from emerging attacks by keeping an eye on attacker activity (something our tools and APIs are superb at), and ensuring your incident responders and analysts identify, block, or contain exploit attempts as quickly as possible. That's one area the CSRB took a light touch on in their report, but that we think is a crucial component of your safety and resilience practices.
Log4j today
Log4j is (sadly, unsurprisingly) alive and well:
This hourly view over the past two months shows regular, and often large, amounts of activity from a handful (~50) of internet sources. In essence, Log4j has definitely become one of the many permanent and persistent components of internet "noise," further confirming the CSRB's assessment that Log4j is here for the long haul. As if on cue, news broke of Iranian state-sponsored attackers using the Log4j exploit in very recent campaigns just as we were preparing this post for publication.
If we take a look at what other activity is present from those source nodes, we make an interesting discovery:
While most of the nodes are only targeting the Log4j vulnerability, some are involved in SSH exploitation, hunting for nodes to add to the ever-expanding Mirai botnet clusters, or focusing on a more recent Atlassian vulnerability from this year.
However, one node has merely added Log4j to the inventory of exploits it has been using. It's not necessary to see all the tag names here, but you can explore these IPs in-depth at your convenience.
Building on the conclusion of the previous section, you can safely block all IPs associated with the Apache Log4j RCE Exploit Attempt tag or other emerging tags to give you and your operations teams breathing room to patch.
You are always welcome to use the GreyNoise product to help you separate internet noise from threats as an unauthenticated user on our site. For additional functionality and IP search capacity, create your own GreyNoise Community (free) account today.
The GreyNoise research team has reviewed a ton of IPv6 research to provide a roadmap for the future of GreyNoise sensors and data collection. IPv6 is, without a doubt, a growing part of the Internet’s future. Google’s survey shows that adoption rates for IPv6 are on the rise and will continue to grow; the United States government has established an entire program and set dates for migrating all government resources to IPv6; and, most notably, the IPv4 exhaustion apocalypse continues to be an issue. As we approach a bright new future for IPv6, we must also expect IPv6 noise to grow. For GreyNoise, this presents a surprisingly difficult question: where do we listen from?
According to ZMap, actors searching for vulnerable devices can scan all 4.2 billion IPv4 addresses in less than an hour. Unlike IPv4 space, IPv6 is unfathomably large, weighing in at approximately 340 × 10^36 addresses. Quick math lets us estimate roughly 6.523 × 10^24 years to scan all of IPv6 space at the same rate one might use to scan IPv4 space. Sheer size prevents actors from surveying IPv6 space for vulnerabilities in the same way as IPv4.
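That back-of-the-envelope math, as a sketch (the one assumption being that a full IPv4 sweep takes an hour):

ipv6_addresses = 2 ** 128            ## ~3.4 x 10^38 addresses
rate_per_hour = 2 ** 32              ## one full IPv4-sized sweep per hour
hours = ipv6_addresses / rate_per_hour
years = hours / (24 * 365)
print(f"{years:.3e} years")          ## ~9.0e24, the same order of magnitude as the estimate above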
But there’s a Hitlist?
Since actors cannot simply traverse the entire address space as they can with IPv4 space, determining where responsive devices might reside in IPv6 space is a difficult and time-consuming endeavor – as demonstrated by the IPv6 Hitlist Project. Projects like the Hitlist are critical as they allow academic researchers to measure the internet and provide context for the environment of IPv6. Without projects like this, we wouldn’t know adoption rates or understand the vastness of the IPv6 space.
Research scanning is one of the internet’s most important types of noise. It also happens to be the only noise that GreyNoise marks as benign. Unfortunately, researchers aren’t the only ones leveraging things like the Hitlist to survey IPv6 space. Malicious actors also use these “found” responsive IPv6 address databases to hunt vulnerable hosts. To better observe and characterize the landscape of IPv6 noise, GreyNoise must ensure that our sensors end up on things like the IPv6 Hitlist.
One strategy is to place sensors inside reserved IPv6 space. IPv6 addresses can be up to 39 characters long, proving a challenge to memorize over IPv4’s maximum of 15. Reliance on DNS will become even more prevalent as more organizations adopt IPv6, exposing reverse DNS as a primary method for enumerating devices. Following the Nmap ip6.arpa scan logic, appending a nibble to an IPv6 prefix and performing a reverse DNS lookup will return one of two results: NXDOMAIN, indicating no entry under that prefix, or NOERROR, indicating a registered host. This method can efficiently reduce the number of hosts scanned in an IPv6 prefix, but has the prerequisite of knowing an appropriate IPv6 prefix to extend. Since GreyNoise already places sensors in multiple data centers and locations, any database, like the IPv6 Hitlist, will already include us.
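A minimal sketch of that enumeration step, assuming the dnspython library (the prefix below is illustrative documentation space):

import dns.resolver

def subtree_exists(arpa_name):
    ## NXDOMAIN means nothing lives under this prefix; NOERROR (even with
    ## no PTR answer) means the subtree exists and is worth descending into
    try:
        dns.resolver.resolve(arpa_name, "PTR")
        return True
    except dns.resolver.NXDOMAIN:
        return False
    except dns.resolver.NoAnswer:
        return True

## Check one nibble deeper under 2001:db8::/32 (nibbles are reversed in ip6.arpa)
base = "8.b.d.0.1.0.0.2.ip6.arpa"
live = [n for n in "0123456789abcdef" if subtree_exists(f"{n}.{base}")]
print(live)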
Another method is to reside inside of providers that are IPv6-routed. BGP announcements provide a direct route to IPv6 networks, but an enumeration of responsive hosts is still an undertaking. Scanners will need to find a way to catalog and call back to the responsive hosts since there could still be many results (and the size of the address is much larger). Providers with IPv6 routing are growing and affordable, making it worthwhile for us to deploy sensors and work with widely used providers to determine who is already getting scanned using this method.
Our current IPv6 status
What we currently see in our platform begins with reliable identification of IPv6-in-IPv4 encapsulation, commonly referred to as 6in4, which rides over IP protocol 41. None of our sensors currently sit on IPv6-only providers, so any IPv6 traffic we observe arrives IPv4-encapsulated.
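For illustration, here is a hedged sketch of spotting 6in4 packets in a capture using the third-party scapy library; the capture filename is a placeholder:

```python
# Flag 6in4 traffic: IPv4 packets whose protocol number is 41, meaning the
# payload is an encapsulated IPv6 packet. Requires scapy.
from scapy.all import rdpcap
from scapy.layers.inet import IP
from scapy.layers.inet6 import IPv6

for pkt in rdpcap("sensor.pcap"):        # placeholder capture file
    if IP in pkt and pkt[IP].proto == 41 and IPv6 in pkt:
        inner = pkt[IPv6]
        print(f"6in4: outer {pkt[IP].src} carries {inner.src} -> {inner.dst}")
```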
We also see users querying for IPv6 addresses in the GreyNoise Visualizer, but many of these queries are for addresses we could never observe, and GreyNoise can do a better job of explaining why. Users regularly query link-local addresses, which are meant only for communication within a local network. Other queries arrive in sets suggesting users are looking up IPv6 addresses within their own provider's prefix; they may be querying their own address, or nodes attempting neighbor discovery. We are looking at ways to educate and notify users when they input these types of addresses, to help them better understand the IPv6 landscape.
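A check like the following (a minimal sketch using only Python's standard-library ipaddress module; the function name and messages are ours, not the Visualizer's) shows how straightforward this kind of triage is:

```python
# Classify a queried address before searching for it: link-local and private
# space will never show up in internet-wide scan telemetry.
import ipaddress

def triage(raw: str) -> str:
    addr = ipaddress.ip_address(raw)
    if addr.is_link_local:
        return "link-local: only meaningful on your own network segment"
    if addr.is_private:
        return "private/ULA: not observable in internet-wide telemetry"
    return "globally routable: worth looking up"

print(triage("fe80::1"))                  # link-local
print(triage("2001:4860:4860::8888"))     # globally routable
```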
The future of IPv6
Though the technicalities of scanning IPv6 are more complicated than one might expect, GreyNoise looks to the academic research being done in the IPv6 field to inform future product strategy. As the attack landscape evolves, GreyNoise sensors placed in opportunistic paths will continue to gather and share meaningful IPv6 knowledge with researchers around the world.
This year marks the fifteenth anniversary of the Verizon Data Breach Investigations Report (DBIR). If you're not familiar with this annual publication, it is a tome produced by the infamous cyber data science team over at Verizon. Their highly data-driven approach (referencing 914,547 incidents and 234,638 breaches, plus 8.9 TB of cybersecurity data) helps practitioners understand malicious cyber activity across the industry. The Verizon DBIR shows how threats are trending and evolving, as well as the impact these malicious actions have on organizations of every shape and size.
This year, as in years gone by, GreyNoise researchers contributed our insights-infused, planetary-scale, opportunistic attacker sensor fleet data to the Verizon DBIR. This is the same data that fuels our platform and helps defenders mitigate threats, understand adversaries, and focus on what matters.
Let’s take a look at the key findings from the report, what our data has to say about the current threat landscape, and how you can use insights from our data to help keep your organization safe and resilient.
What Say You, DBIR?
The DBIR team provided five key elements in their overall summary, and we’ll take a modest dive into each of them.
Barbarians At The Gates
First up is how attackers breach your defenses. It will come as no shock to most readers that the use of credentials, phishing attacks, vulnerability exploitation, and botnets are all initial techniques that attackers use to breach the defenses of organizations.
Given the prevalence of credential use in initial access, you may wonder why attackers bother with other means of gaining a foothold. While there will always be internet-facing services with default credentials left intact, stolen user credentials do not age well and need to be replenished regularly (usually by breaching an organization and stealing them en masse). Make no mistake, though: they work far too often, especially against juicy services such as Microsoft's Remote Desktop Protocol, which is why they are used in the first place.
Creds are noisy, and phishing takes real effort to do well, even with phishing kits or phishing-as-a-service providers. Scouring the internet for vulnerable services, by contrast, is almost risk-free, relatively cheap, and can lead to remote code execution on a decent percentage of nodes, as Figure 43 of the DBIR shows (GreyNoise provided the data behind the chart for the DBIR team to work their magic on):
If you ensure you have safe and resilient configurations on your internet-exposed assets and mission-critical internal systems, plus have empowered your workforce to be co-defenders of your organization, you may avoid becoming a statistic, at least in this category.
Always Be enCrypting
Ransomware also plagued more organizations than ever this past year, with a 13% increase from 2020, as shown in our reimagining of Figure 6 in the DBIR. The DBIR’s ransomware corpus is far from complete, but aligns in proportion with statistics from other sources of ransomware incidents.
As noted in the text of the report, ransomware starts with some action, usually one of the initial access techniques noted above. Ransomware actors often take advantage of the latest and greatest exploits for recent CVEs, which is activity you can track in the GreyNoise platform (a sketch follows below) to help you frame the need for speed when it comes to mitigating and patching.
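For instance, a query like the following surfaces IPs currently observed exploiting a given CVE. This is a hedged sketch assuming the community pygreynoise SDK and its GNQL query interface; the API key and CVE are placeholders:

```python
# Pull currently-observed malicious IPs for a specific CVE via GNQL.
# Requires the pygreynoise package and a GreyNoise API key.
from greynoise import GreyNoise

api = GreyNoise(api_key="YOUR_API_KEY")                 # placeholder key
results = api.query("cve:CVE-2021-36260 classification:malicious")

for record in results.get("data", []):
    print(record["ip"], record.get("last_seen"))
```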
Prioritizing patching actively exploited vulnerabilities should be at the top of your to-do list.
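One hedged way to operationalize that advice is to cross-reference your own vulnerability list against CISA's Known Exploited Vulnerabilities (KEV) catalog; the sketch below uses only the standard library, and your_cves is a placeholder for your scanner output:

```python
# Intersect locally discovered CVEs with CISA's KEV feed to find the
# actively exploited ones that deserve the top of the patch queue.
import json
import urllib.request

KEV_URL = ("https://www.cisa.gov/sites/default/files/feeds/"
           "known_exploited_vulnerabilities.json")

with urllib.request.urlopen(KEV_URL) as resp:
    kev_ids = {item["cveID"] for item in json.load(resp)["vulnerabilities"]}

your_cves = {"CVE-2021-36260", "CVE-2021-22502"}        # placeholder list
print("patch first:", sorted(your_cves & kev_ids))
```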
Attackers Getting High On Your Supply [Chain]
Supply chain attacks have been making headlines ever since the highly disruptive SolarWinds incident came to light in late 2020 (though there were numerous documented supply chain attacks long before that mega-event). The DBIR documented over 3,400 "System Intrusion" events this year, showing you need to be as vigilant on the inside as you are on your internet-facing attack surface; this ensures you aren't a conduit to other organizations for criminals. Furthermore, you should have a solid third-party risk management program and some way to track software development dependencies, to help prevent a breach via those you trust.
A Bucketful Of Errors
In this year’s corpus, the DBIR team found that 13% of breaches were caused by errors, often when it comes to securing cloud storage. So, make sure you mind your buckets, but take some comfort: this particular disheartening statistic appears to be stabilizing.
To Err Is Still Human
Humans likely helped cause some (most?) of the aforementioned misconfigurations as well as many other incidents that ended up as breaches. As the DBIR researchers themselves note: “Use of stolen credentials, phishing, misuse, or simply an error, people continue to play a very large role in incidents and breaches alike.”
How To Avoid Being an Accidental DBIR Contributor Next Year
GreyNoise has tools, data, and insights that integrate easily into your comprehensive cybersecurity program to help keep your organization safe and resilient.
We’re almost halfway through the year, and if you’ve managed to avoid a major incident or breach so far, you’re doing pretty well. But (there’s always a “but”), we should note that we’re also likely to see more groups like LAPSUS$ pop up to use their smash-and-grab model. Plus, you’ve also got all the old-school attacks to worry about.
If you and your team can filter out the noise, figure out what you don't need to do, and get visibility into the areas you do need to focus on (while ensuring you have a spot-on incident response program), you may just make it another year without adding to the 8.9 terabytes of data the DBIR team already has to crunch for each report.
Apache Storm Supervisor RCE Attempt [Intention: Malicious]
CVE-2021-40865
This IP address has been observed attempting to exploit CVE-2021-40865, a pre-auth remote code execution vulnerability in the Apache Storm supervisor server.
Hikvision IP Camera RCE Attempt [Intention: Malicious]
CVE-2021-36260
This IP address has been observed attempting to exploit CVE-2021-36260, a remote command execution vulnerability in Hikvision IP cameras and NVR firmware.
SonicWall SMA100 Arbitrary File Delete Attempt [Intention: Malicious]
CVE-2021-20034
This IP address has been observed attempting to exploit CVE-2021-20034, an arbitrary file deletion vulnerability that allows performing a factory reset on SonicWall SMA100 devices.
Micro Focus Operations Bridge Reporter RCE Attempt [Intention: Malicious]
CVE-2021-22502
This IP address has been observed attempting to exploit CVE-2021-22502, a remote command execution vulnerability in Micro Focus Operations Bridge Reporter software.
Yealink Device Management Platform RCE Attempt [Intention: Malicious]
CVE-2021-27561
This IP address has been observed attempting to exploit CVE-2021-27561, a remote command execution vulnerability in the Yealink Device Management Platform.
This IP address has been observed scanning the internet for WSMan PowerShell providers without an Authorization header, attempting to exploit an unauthenticated root RCE in Azure's Open Management Infrastructure (OMI).
Sources: Wiz, Microsoft Security Response Center
This IP address has been observed scanning the internet for WSMan PowerShell providers without an Authorization header, but it has not provided a valid SOAP XML Envelope payload.
Sources: Wiz, Microsoft Security Response Center
Cisco IMC Supervisor and UCS Director Backdoor [Intention: Malicious]
CVE-2019-1935
This IP address has been observed attempting to authenticate via SSH using default credentials for Cisco IMC Supervisor and Cisco UCS Director products.
Our research team is always looking for ways to improve our tagging methodology to enable GreyNoise users to understand actor behavior and tooling. GreyNoise already identifies clients with JA3 and HASSH data.
To expand on this work, GreyNoise recently added three new tags to shed more light on how the HTTP clients behind internet background noise manage their internal state. The tags below improve client fingerprinting for HTTP-based protocols; a short client-side sketch follows the list.
Carries HTTP Referer: This tag identifies HTTP clients that include a "Referer" header, which indicates the page or site from which the HTTP request was referred.
Stores HTTP Cookies: This tag identifies HTTP clients that accept cookies set by the server, store them, and send them with subsequent requests.
Follows HTTP Redirects: This tag identifies HTTP clients that follow 301 (Moved Permanently) redirects to the page or site given in the "Location" header.
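Here is a minimal client-side sketch of the three behaviors using the third-party requests library; the httpbin.org endpoints are just a convenient echo service, not anything GreyNoise-specific:

```python
# Demonstrate the three stateful behaviors the new tags look for, from the
# perspective of an HTTP client. Requires the requests package.
import requests

s = requests.Session()

# Stores HTTP Cookies: the session keeps server-set cookies and replays them.
s.get("https://httpbin.org/cookies/set/seen/1", allow_redirects=False)
print("stored cookie:", s.cookies.get("seen"))                    # -> 1

# Follows HTTP Redirects: requests chases the Location header by default.
r = s.get("https://httpbin.org/redirect-to?url=/get&status_code=301")
print("redirect chain:", [hop.status_code for hop in r.history])  # -> [301]

# Carries HTTP Referer: only present if the client chooses to send it.
r = s.get("https://httpbin.org/headers",
          headers={"Referer": "https://example.com/"})
print("referer echoed:", r.json()["headers"].get("Referer"))
```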
On their own, each individual tag contributes a small indication of how the HTTP client manages its internal state. While that alone has value in helping to profile the actor behind the IP and possibly track them across IPs, the more interesting insights can be seen when these tags are viewed holistically.
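The overlaps in the figures below are just set intersections; a toy version with hypothetical tag-to-IP sets makes the idea concrete:

```python
# Toy Venn arithmetic over hypothetical per-tag IP sets (documentation IPs).
referer   = {"192.0.2.1", "192.0.2.2", "192.0.2.3"}
cookies   = {"192.0.2.2", "192.0.2.3", "192.0.2.4"}
redirects = {"192.0.2.3", "192.0.2.4", "192.0.2.5"}

print("all three (browser-like):", referer & cookies & redirects)
print("referer only (minimal tooling):", referer - cookies - redirects)
```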
Figure 1: Venn Diagram representing IPs that match each tag and their respective overlaps, data pulled on Aug. 25, 2021.
Figure 2: Venn Diagram representing IPs that match each tag and their respective overlaps, data pulled on Sep. 10, 2021.
As seen above, the tagged activity is not homogeneous, giving us a glimpse into the diversity of tooling and techniques used in scanning and opportunistic exploitation. While many actors may use the same exploit vector or payload, they may launch it from tools that support different HTTP features. These new tags can help an analyst determine whether two IPs appear to be using the same tooling.
Figure 3: IP Details page for 42.236.10.75. See it in the GreyNoise Viz.
For example, in Figure 3 we can determine with a high degree of confidence that the IP shown above is driving a full-featured web browser (for instance via an automation framework such as Puppeteer) to scan the internet. We see this because the IP exhibits browser-like behavior across all three tags: it carries a Referer header, accepts cookies, and follows redirects.
We hope these new tags offer our users greater insight into the tooling and libraries utilized by internet background noise-makers. Let us know what you think by sharing your feedback on the GreyNoise Community Slack channel (must have a GreyNoise account).