Insights

Blog posts in the Insights category.

Checking It Twice: Profiling Benign Internet Scanners — 2024 Edition

This is a follow-up to our October 2022 post, Sensors and Benign Scanner Activity.

Throughout the year, GreyNoise tends to focus quite a bit on the “naughty” connections coming our way. After all, that’s how we classify IP addresses as malicious so organizations can perform incident triage at light speed, avoid alert fatigue, and get a leg up on opportunistic attackers by using our IP-based block-lists.

At this time of year, we usually take some time to don our Santa hats and review the activities of the “nice” (a.k.a., “benign”) sources that make contact with our fleet.

Scanning the entire internet now drives both cybersecurity attack strategies and defense tactics. Every day, multiple legitimate organizations perform mass scanning of IPv4 space to gather data about exposed services, vulnerabilities, and general internet health. In November 2024, we deployed 24 new GreyNoise sensors across diverse network locations to study the behavior and patterns of these benign scanners.

Why This Matters

When organizations deploy new internet-facing assets, they typically experience a flood of inbound connection attempts within minutes. While many security teams focus on malicious actors, understanding benign scanning activity is equally crucial for several reasons:

  1. These scans generate significant amounts of log data that can obscure actual threats
  2. Security teams waste valuable time investigating legitimate scanning activity
  3. Benign scanners often discover and report vulnerable systems before malicious actors

The Experiment

We positioned 24 freshly baked sensors across five separate autonomous systems and eight distinct geographies and began collecting data on connection attempts from known benign scanning services. We narrowed the focus down to the top ten actors with the most tags in November. The analyzed services included major players in the internet scanning space, such as Shodan, Censys, and BinaryEdge, along with newer entrants like CriminalIP and Alpha Strike Labs.

Today, we’ll examine these services' scanning patterns, protocols, and behaviors when they encounter new internet-facing assets. Understanding these patterns helps security teams better differentiate between routine internet background noise and potentially malicious reconnaissance activity. There’s a “Methodology” section at the tail end of this post if you want the gory details of how the sausage was made.

The Results

We’ll first take a look at the fleet size of the in-scope benign scanners.

The chart below plots the number of observed IP addresses from each organization for the entire month of November vs. the total tagged interactions from those sources (as explained in the Methodology section). Take note of the tiny presence of both Academy for Internet Research and BLEXBot, as you won’t see them again in any chart. While they made the cut for the month, they also made no effort to scan the sensors used in this study.

As we’ll see, scanner fleet size does not necessarily guarantee nimbleness or completeness when it comes to surveying services on the internet.

Contact Has Been Made

The internet scanner/attack surface management (ASM) space is pretty competitive. One area where speed makes a difference is how quickly new nodes are added to the various inventories. All benign scanners save for ONYPHE (~9 minutes) and CriminalIP (~17 minutes) hit at least one of the target sensors within five minutes of the sensor coming online.
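As a rough sketch of how a time-to-first-contact figure like the ones above could be derived, assume each logged contact is a (scanner, sensor, timestamp) record and each sensor has a known online time. The record layout, the names, and the sample values below are hypothetical; this is not GreyNoise's actual pipeline or data.

```python
from datetime import datetime, timedelta

def minutes_to_first_contact(sensor_online, events):
    """Return {scanner: minutes between a sensor coming online and that
    scanner's quickest contact with any sensor}.

    sensor_online: {sensor_id: datetime the sensor came online}
    events: iterable of (scanner, sensor_id, datetime of contact)
    """
    best = {}
    for scanner, sensor_id, seen_at in events:
        delay = (seen_at - sensor_online[sensor_id]).total_seconds() / 60
        if scanner not in best or delay < best[scanner]:
            best[scanner] = delay
    return best

# Hypothetical sample data, for illustration only.
t0 = datetime(2024, 11, 19, 0, 0)
online = {"sensor-a": t0, "sensor-b": t0 + timedelta(hours=1)}
events = [
    ("scanner-x", "sensor-a", t0 + timedelta(minutes=12)),
    ("scanner-x", "sensor-b", t0 + timedelta(hours=1, minutes=3)),
    ("scanner-y", "sensor-a", t0 + timedelta(minutes=20)),
]
first_contact = minutes_to_first_contact(online, events)
```

The delay is measured per sensor and then minimized per scanner, which matches the "hit at least one of the target sensors" framing rather than an average across the fleet.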

BinaryEdge and ONYPHE display similar dense clustering patterns, with significant activity bursts occurring around the 1-week mark. Their sensor networks appear to capture a high volume of unique IP contacts, forming distinctive cone-shaped distributions that suggest systematic scanning behavior.

Censys and Bitsight exhibit comparable behavioral patterns, though Bitsight’s first contacts appear more concentrated in recent timeframes. This could indicate a more aggressive or efficient scanning methodology for discovering new hosts.

ShadowServer shows a more dispersed pattern of first contacts, with clusters forming across multiple time intervals rather than concentrated bursts. This suggests a different approach to host discovery, possibly employing more selective or targeted scanning strategies.

Alpha Strike Labs and Shodan.io demonstrate sparser contact patterns, indicating either more selective scanning criteria or potentially smaller sensor networks. Their distributions show periodic clusters rather than continuous streams of new contacts.

CriminalIP presents the most minimal contact pattern, with occasional first contacts spread across the timeline. This could reflect a highly selective approach to host identification or a more focused scanning methodology.

The above graph also shows just how extensive some of the scanner fleets are (each dot is a single IP address making contact with one of the sensors; dot colors distinguish one sensor node from another).

If we take all that distinct data and whittle it down to count which benign scanners hit the most sensors first, we see that ONYPHE is the clear winner, followed by Censys — demonstrating strong but more focused scanning capabilities — with BinaryEdge coming in third.
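One way to picture that tally, as a minimal sketch: find the earliest contact on each sensor, then count how many sensors each scanner reached first. The event tuples and scanner names here are made up for illustration and do not reflect the study's data.

```python
from collections import Counter

def first_scanner_per_sensor(events):
    """Map each sensor to the scanner with the earliest contact timestamp.

    events: iterable of (scanner, sensor_id, unix_timestamp)
    """
    earliest = {}
    for scanner, sensor_id, ts in events:
        if sensor_id not in earliest or ts < earliest[sensor_id][0]:
            earliest[sensor_id] = (ts, scanner)
    return {sensor: scanner for sensor, (_, scanner) in earliest.items()}

def tally_first_hits(events):
    """Count how many sensors each scanner reached before any other scanner."""
    return Counter(first_scanner_per_sensor(events).values())

# Hypothetical sample data, for illustration only.
events = [
    ("scanner-x", "sensor-a", 100),
    ("scanner-y", "sensor-a", 90),
    ("scanner-x", "sensor-b", 50),
    ("scanner-y", "sensor-b", 70),
    ("scanner-y", "sensor-c", 10),
]
wins = tally_first_hits(events)
```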

The chart below digs a bit deeper into the first contact scenarios. We identified the very first contacts to each of the 24 sensor nodes from each benign scanner. ONYPHE shows a concentrated burst of activity in the 6-12 hour window, while Bitsight’s contacts are more evenly distributed throughout the observation period. Censys demonstrates a mixed pattern, with clusters in the early hours followed by sporadic contacts. ShadowServer exhibits a notably consistent spread of first contacts across multiple time windows.

BinaryEdge’s pattern suggests coordinated scanning activity, with tight groupings of contacts that could indicate automated discovery processes. Alpha Strike Labs shows a selective, possibly more targeted approach to first contact, while CriminalIP has minimal but distinct touchpoints. Shodan rounds out the observation set with periodic contacts that suggest a methodical scanning approach.

Speed Versus Reach

While speed is a critical competitive edge, coverage may be an even more important one. It’s fine to be the first to discover, but if you’re not making a comprehensive inventory, are you even scanning?

We counted up all the ports these benign scanners probed over the course of a week. Censys leads the pack with an impressive 36,056 ports scanned, followed by ShadowServer scanning 19,166 ports, and Alpha Strike Labs covering 14,876 ports.

ONYPHE, Shodan, BinaryEdge, and Bitsight all seem to take similar approaches when it comes to probing for services on midrange and higher ports. All of the scanners, save for CriminalIP, definitely know when you’ve been naughty and tried to hide some service outside traditional port ranges.

Before moving on to our last section, it is important to remind readers that we are only showing a 7-day view of activity. Some scanners, notably Censys, have much broader port coverage than a mere 55% of port space. The internet is a very tough environment to perform measurements in. Routes break, cables are cut, and even one small connection hiccup could mean a missed port hit. Plus, it’s not very nice to rapidly clobber a remote node that one is not responsible for.
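The 55% figure follows from simple arithmetic on the seven-day port counts quoted earlier. The sketch below assumes a denominator of 65,535 usable ports (1 through 65535); that denominator is our assumption, since the post does not state which one was used.

```python
TOTAL_PORTS = 65_535  # assumption: ports 1-65535, with port 0 excluded

# Seven-day port counts quoted in this post.
ports_seen = {
    "Censys": 36_056,
    "ShadowServer": 19_166,
    "Alpha Strike Labs": 14_876,
}

def coverage_pct(count, total=TOTAL_PORTS):
    """Percentage of the port space covered, rounded to one decimal place."""
    return round(100 * count / total, 1)

coverage = {name: coverage_pct(n) for name, n in ports_seen.items()}
```

Under that assumption, Censys lands at roughly 55% of the port space, ShadowServer at about 29%, and Alpha Strike Labs at about 23%.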

Tag Time

The vast majority of benign contacts have no real payloads. Some of them do make checks for specific services or for the presence of certain weaknesses. When they do, the GreyNoise Global Observation Grid records a tag for that event. We wanted to see just how many tags these benign scanners sling our way.

Given ShadowServer’s mission, it makes sense that they’d be looking for far more weaknesses than the other benign scanners. The benign scanner organizations that also have an attack surface management (ASM) practice will usually perform targeted secondary scans for customers who have signed up for such inspections.

In Conclusion

We hope folks enjoyed this second look at what benign scanners are up to and what their strategies seem to be when it comes to measuring the state of the internet.

If you have specific questions about the data or would like to see different views, please do not hesitate to contact us in our community Slack or via email at research@greynoise.io.

Methodology

Sensors were deployed between 2024-11-19 and 2024-11-26 (UTC) across five autonomous systems and in the IP space of the following countries:

  • Croatia
  • Estonia
  • Ghana
  • Kenya
  • Luxembourg
  • Norway
  • Slovenia
  • South Africa
  • Sweden

The in-scope benign actors (based on total tag hits across all of November):

Both Palo Alto’s Cortex Expanse and ByteSpider were in the original top ten, but were removed as candidates. Each of those services is prolific/noisy (one might even say “rude”), would have skewed the results, and would have made it impossible to compare the performance of the more traditional scanners. Furthermore, while ByteSpider may be (arguably) benign, it has more of a web crawling mission that differs from the intents of the services on the rest of the actor list.

We measured the inbound traffic from the in-scope benign actors for a 7-day period.

Unfortunately, neither Academy for Internet Research nor BLEXBot reached out and touched these 24 new sensor nodes, so they have no presence in the results.


GreyNoise Use Cases: Twitter Edition V2

Andrew Morris got on a roll the other day and whacked out this tweetstorm describing the three key use cases for GreyNoise. Enjoy!


How Internet Noise Makes Security Harder

Defining Internet Noise

Every machine connected to the internet is exposed to a constant barrage of communications from tens of thousands of unique IP addresses per day. A percentage of these communications are malicious attacks and web crawls; some are non-malicious scans and pings; some are legitimate business services; and still others are unknown. Taken together, this massive volume of unsolicited traffic is a challenge for security organizations because these communications trigger security tools to generate thousands of events to be analyzed, with little context on the potential threats.

Sources of Internet Noise

Let’s take a look at the different kinds of internet communications traffic that create this “noise” for security organizations:

Internet Scanners (aka Internet Background Noise)

Scanning the internet means reaching out and trying to initiate communications with a wide range of devices that are directly connected to the internet. At a technical level, mass scanning the internet means requesting a slight amount of information (specifically a TCP SYN, UDP/ICMP packet, or banner grab) from addresses across the entire IPv4 space, which holds roughly 4.3 billion addresses in total. And it turns out that tens of thousands of devices are scanning the internet constantly, generating a tremendous amount of internet “noise.”
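To put the size of the IPv4 space in concrete terms, the sketch below uses Python's standard `ipaddress` module to count the full address space and to skip a handful of well-known blocks that are not publicly routable. The exclusion list is illustrative and deliberately incomplete, and the helper name `is_scannable` is our own; real scanners maintain much longer exclusion lists.

```python
import ipaddress

# The full IPv4 space is 2**32 addresses (about 4.29 billion).
total_ipv4 = ipaddress.ip_network("0.0.0.0/0").num_addresses

# A few well-known non-routable blocks (illustrative, not exhaustive).
non_routable = [
    ipaddress.ip_network(block)
    for block in (
        "10.0.0.0/8",      # private
        "127.0.0.0/8",     # loopback
        "172.16.0.0/12",   # private
        "192.168.0.0/16",  # private
        "224.0.0.0/4",     # multicast
        "240.0.0.0/4",     # reserved
    )
]
excluded = sum(net.num_addresses for net in non_routable)

def is_scannable(ip):
    """Return True if the address falls outside the excluded blocks above."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in non_routable)
```

Even this short exclusion list removes several hundred million addresses, which is why the publicly routable space is noticeably smaller than the full 2**32.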

Who scans the Internet?

Good guys scan the internet to measure the exposure of vulnerabilities, take inventory of software market share, and find botnet command & control servers. In fact, there are entire websites and companies that act as "search engines" devoted to mass scanning the internet. Examples of this include companies like Shodan and Censys, as well as researchers and universities, who scan in good faith to help uncover vulnerabilities for network defense.

Bad guys scan the internet with malicious intent to find vulnerable devices that they can compromise and use for nefarious purposes. So while benign mass-scanner IP addresses might check if a port is running and then go away, malicious scanners might attempt to compromise the target machine by brute-forcing login credentials or launching a remote exploit. A good example was a recently discovered vulnerability in F5 network devices - in this case, malicious IPs scanned for F5 BIG-IP devices, checked if the device was vulnerable, and attempted to exploit the vulnerability.

Unknown groups scan the internet for unclear or covert reasons. Unknown actors could be individual researchers, companies, or nation-state actors that are attempting to remain anonymous, and everything in between.

At the end of the day, web crawlers, port scanners, researchers, and malware such as worms and botnets are all part of the activities that contribute to Internet Noise. The challenge for security organizations is differentiating which of these scans are malicious signs of a targeted attack, and which are just “noise.”

Common Business Services

Another increasingly challenging source of Internet Noise is legitimate network communications with common business applications like Microsoft O365, Google Workspace, and Slack, as well as services like CDNs and public DNS servers. These applications often communicate through unpublished or dynamic IPs, making them difficult to identify. The result is a storm of log events from “unknown” IP addresses that are, in reality, from well-known and benign business services. Without context, this harmless communication distracts security teams from investigating true threats.

Security Challenges of Internet Noise

The goal for security teams is to identify malicious internet traffic that represents a potential threat to the organization, so they can focus research and remediation efforts quickly. Internet Noise ends up being a huge tax on SOC teams: it takes away analyst time that could be spent addressing true threats, inflates log volumes, increases storage costs, and contributes to analyst burnout.

GreyNoise Identifies Internet Noise So Security Teams Can Focus on Targeted Threats

GreyNoise tracks two distinct sets of Internet Noise today, making them available through our API, integrations, and visualizer:

  • Internet Background Noise: At GreyNoise, we deploy and manage hundreds of servers in multiple data centers and countries around the world to listen to Internet Background Noise. Our purpose is to sit back and soak up all the opportunistic traffic generated by anyone mass scanning the internet. GreyNoise analyzes and enriches this data to identify behavior, methods, and intent. The goal is to give analysts the context they need to answer questions like: How many people are scanning the internet right now? What IP addresses is the traffic coming from? What are they scanning for?
  • RIOT: RIOT provides context to communications between your users and common business applications (e.g., Microsoft O365, Google Workspace, and Slack) or services like CDNs and public DNS servers. These applications communicate through unpublished or dynamic IPs, making them difficult for security teams to track. Without context, this harmless behavior distracts security teams from investigating true threats.

The data GreyNoise collects can be used by security analysts to identify and de-prioritize traffic from omnidirectional scanners and common business services, allowing them to focus on targeted scan and attack traffic. They can use the data to:

  • Track opportunistic botnets and other compromised devices
  • Understand what software vulnerabilities the bad guys are actively scanning for
  • Automatically enrich and prioritize alerts in SIEM and SOAR systems
  • And, if so inclined, opt out of many malicious mass-scanners altogether by blocking them preemptively and dynamically at the firewall
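To make the de-prioritization idea concrete, here is a minimal triage sketch. It assumes an alert's source IP has already been enriched with three pieces of context: whether the IP was seen mass scanning (`noise`), whether it belongs to a common business service (`riot`), and a `classification` of benign, malicious, or unknown. Those field names, the `triage` function, and the priority labels are assumptions made for illustration; check the GreyNoise API documentation for the real response format before relying on them.

```python
def triage(context):
    """Return a priority label for an alert, given context about its source IP.

    context: dict with assumed keys
      "noise" (bool): IP has been seen mass scanning the internet
      "riot" (bool): IP belongs to a common business service
      "classification" (str): "benign", "malicious", or "unknown"
    """
    if context.get("riot"):
        return "deprioritize: common business service"
    if context.get("noise"):
        classification = context.get("classification", "unknown")
        if classification == "malicious":
            return "block candidate: opportunistic malicious scanner"
        if classification == "benign":
            return "deprioritize: benign scanner"
        return "low priority: unclassified mass scanner"
    # Not seen in background noise, so the traffic may be targeted.
    return "investigate: possibly targeted"

# Hypothetical enriched alerts, keyed by documentation-range IP addresses.
alerts = {
    "192.0.2.10": {"noise": True, "riot": False, "classification": "benign"},
    "198.51.100.7": {"noise": True, "riot": False, "classification": "malicious"},
    "203.0.113.5": {"noise": False, "riot": False, "classification": "unknown"},
}
priorities = {ip: triage(ctx) for ip, ctx in alerts.items()}
```

The ordering of the checks is a design choice: known business services are ruled out first, mass scanners are sorted by classification next, and anything absent from both sets is what an analyst looks at first.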

Viewing Internet Noise with GreyNoise

If you’re interested in learning more about what Internet Noise is and how much of it is happening on the internet right now, please check out the GreyNoise Visualizer. Free to use, the Visualizer can show you:

  • Overall volume of Internet Noise
  • New IPs generating Internet Noise
  • Classification of Internet Noise into malicious, benign, and unknown actors
  • Top organizations that are sources of Internet Noise
  • Trends and anomalies in Internet Noise traffic over the past month
  • Detailed behavioral information about specific IP addresses running scans
  • Emerging threat data about vulnerabilities being actively exploited

And if you find this information interesting or useful, please sign up for a free Community account, which includes access to our API for a subset of the “noise” data we collect. Our community of 10,000+ security analysts is a tremendous source of insight into Internet Noise and other InfoSec knowledge. If you are interested in joining, please reach out to community@greynoise.io.

Also, please follow us on Twitter and LinkedIn!
