A week in the life of a GreyNoise sensor

To ensure we have as much visibility into activity on the internet as possible, we regularly deploy new sensors in different “geographical” network locations. We’ve selected two sensors for a short “week in the life” series to give practitioners a glimpse into what activity new internet nodes see. This series should help organizations understand the opportunistic and random activity that awaits newly deployed services, plus help folks understand just how little time you have to ensure internet-facing systems are made safe and resilient.

The “Benign” perspective

Presently, there are three source IPv4 classifications defined in our GreyNoise Query Language (GNQL): benign, malicious, and unknown. Cybersecurity folks tend to focus quite a bit on the malicious ones since they may represent a clear and present danger to operations. However, there are many benign sources of activity that are always on the lookout for new nodes and services, so we’re starting our new sensor retrospective by looking at those sources first. Don’t worry, we’ll spend plenty of time in future posts looking at the “bad guys."

While likely far from a comprehensive summary, there are at least 74 organizations regularly conducting internet service surveys of some sort (we’ll refer to them as ‘scanners’ moving forward):

AdScore

Ahrefs

Alpha Strike Labs

Ampere Innotech

ANT Lab

Applebot

Arbor Observatory

Archive.org

BinaryEdge.io

BingBot

Bit Discovery

Bitsight

BLEXBot

Caida

Censys

CERT-FR

Cloud System Networks

Cloudflare

Cortex Xpanse

CriminalIP

cyber.casa

CyberGreen

cymru

dataplane

DomainTools

Dutch Institute for Vulnerability Disclosure

errata

ESET

Facebook Crawler

FH Muenster University

GoogleBot

Internet Census

InterneTTL

Intrinsec

IPinfo.io

ipip.net

ipqualityscore

Knoq

LeakIX

Mail.RU

Max Planck Inst.

Moz DotBot

Net Systems Research

NetCraft

netsystems

ONYPHE

OpenIntel.nl

openportstats

Palo Alto Crawler

Petalbot

pnap

Project Sonar

Project25499

Quadmetrics.com

Qualys

Qwant

Recyber

RWTH AACHEN University

scorecardresearch

SecurityTrails

Seznam

ShadowServer.org

Shodan.io

Sogou

spyse

Stanford Univ.

stretchoid

Technical University of Munich

threatsinkhole

UMich

University of Colorado

VeriSign

WithSecure

Yandex Search Engine

We were curious as to how long it took these scanners to find our new nodes after they came online and were ready to accept connections. We capped our discovery period exploration at a week for this analysis but may dig into longer time periods in future updates.

Out of the 74 known scanners, only 18 (24%) contacted our nodes within the first week.

As the above chart shows, some of the more well-known scanners found our new sensor nodes within just about an hour after being booted up. A caveat to this data is that other scanners in the main list may have just tried contacting the IP addresses of these nodes before we booted them up.

One reason organizations should care about this metric is that some of these scanners are run by “cyber hygiene” rating organizations, and you only get one chance to make a first impression that could negatively impact, say, your cyber insurance premiums. So, don’t deploy poorly configured services if you want to keep the CFO happy.

Benign infrastructure

It’s pretty “easy” to scan the entire internet these days, thanks to tools such as Rob Graham’s masscan, provided you like handling abuse complaints and can afford the bandwidth costs on providers that allow such activity. We identified each of these scanning organizations via their published list of IPs. We decided to see just how many unique IPs of each scanner we saw within the first week:

Bitsight dedicates a crazy amount of infrastructure to poke at internet nodes. Same for the Internet Census. By the end of the week, we saw 346 unique benign scanner IPs contact our sensors, which means your internet-facing nodes likely did as well. While you may not want these organizations probing your perimeter, the reality is that, while you may be able to ask them to opt you out of scanning, you cannot do the same for attackers (abuse complaints aren’t a great solution either). Some organizations, ShadowServer in particular, are also there to help you by letting you understand your “attack surface” better, so you are likely better off using our benign classified IPs to help thin down the alerts these services likely generate (more on that in a bit).

The chart above also shows that some services have definite “schedules” for these scans, and others rarely make contact. Just how many contacts can you expect per day?

Hopefully, you are using some intelligent alert filtering to keep your defenders from being overloaded.

What are the scanners looking for?

Web servers may rule the top spot of deployed internet-facing services, but they aren’t the only exposed services and they aren’t just hosted on ports 443 and 80 anymore. Given how many IP addresses some scanners use and how many times the node in the above example was contacted by certain scanners, it’s likely a safe assumption that the port/service coverage was broad for some of them. It turns out, that assumption was spot-on:

At least when it comes to this observation experiment, Censys clearly has the most port/service coverage out of all the benign scanners. It was a bit surprising to see such a broad service coverage in the top seven providers, although most have higher concentrations below port 20000.

If you think you’re being “clever” by hosting an internet-facing service on a port “nobody will look at," think again. Censys (and others’) scans are also protocol-aware, meaning they’ll try to figure out what’s running based on the initial connection response. That means you can forget about hiding your SSH and SMB servers from their watchful eyes. As we’ll see in future posts, non-benign adversaries also know you try to hide services, and are just as capable of looking for your hidden high port treasure.

Going beyond benign

If we strip away all the benign scanner activity, we’re left with the real noise:

We’ll have follow-up posts looking at “a week in the life” of these sensors to help provide more perspectives on what defenders are facing when working to keep internet-facing nodes safe and sound.

Remember: you can use GreyNoise to help separate internet noise from threats as an unauthenticated user on our site. For additional functionality and IP search capacity, create your own GreyNoise Community (free) account today.

This article is a summary of the full, in-depth version on the GreyNoise Labs blog.
GreyNoise Labs logo
Link to GreyNoise Twitter account
Link to GreyNoise Twitter account