The Future of IPv6 at GreyNoise

The GreyNoise research team has reviewed a ton of IPv6 research and reading to provide a roadmap for the future of GreyNoise sensors and data collection. IPv6 is, without a doubt, a growing part of the Internet’s future. Google’s survey shows that adoption rates for IPv6 are on the rise and will continue to grow; the United States government has established an entire program and set dates for migrating all government resources to IPv6; and, most notably, the IPv4 exhaustion apocalypse continues to be an issue. As we approach a bright new future for IPv6, we must also expect IPv6 noise to grow. For GreyNoise, this presents a surprisingly difficult question: where do we listen from?

According to zMap, actors searching for vulnerable devices can scan all 4.2 billion IPv4 addresses in less than 1 hour. Unlike IPv4 space, IPv6 is unfathomably large, weighing in at approximately 340x10³⁶addresses. Quick math allows us to estimate 6.523 × 10^24 years to scan all IPv6 space at the same rate as one might use to scan IPv4 space. Sheer size prevents actors from surveying IPv6 space for vulnerabilities in the same way as IPv4.

But there’s a Hitlist?

Since actors cannot simply traverse the entire address space as they can with IPv4 space, determining where responsive devices might reside in IPv6 space is a difficult and time-consuming endeavor – as demonstrated by the IPv6 Hitlist Project. Projects like the Hitlist are critical as they allow academic researchers to measure the internet and provide context for the environment of IPv6. Without projects like this, we wouldn’t know adoption rates or understand the vastness of the IPv6 space.

Research scanning is one of the internet’s most important types of noise. It also happens to be the only noise that GreyNoise marks as benign. Unfortunately, researchers aren’t the only ones leveraging things like the Hitlist to survey IPv6 space. Malicious actors also use these “found” responsive IPv6 address databases to hunt vulnerable hosts. To better observe and characterize the landscape of IPv6 noise, GreyNoise must ensure that our sensors end up on things like the IPv6 Hitlist.

One strategy is to place sensors inside of reserved IPv6 space. IPv6 addresses can be up to 39 characters long, proving a challenge to memorize over IPv4’s maximum of 15. The reliance on DNS for devices will become even more prevalent as more organizations adopt IPv6, exposing reverse DNS as a primary method for the enumeration of devices. Following the Nmap ARPA scan logic, adding an octet to an IPv6 prefix and performing a reverse DNS lookup will return one of two results: an NXDOMAIN indicating no entry at the address or NOERROR indicating a reserved host. This method can efficiently reduce the number of hosts scanned in an IPv6 prefix, but does have the prerequisite of knowing the appropriate IPv6 prefix to add octets to check. Since GreyNoise already places sensors in multiple data centers and locations, any database, like the IPv6 Hitlist, will already include us.

Another method is to reside inside of providers that are IPv6-routed. BGP announcements provide a direct route to IPv6 networks, but an enumeration of responsive hosts is still an undertaking. Scanners will need to find a way to catalog and call back to the responsive hosts since there could still be many results (and the size of the address is much larger). Providers with IPv6 routing are growing and affordable, making it worthwhile for us to deploy sensors and work with widely used providers to determine who is already getting scanned using this method.

Our current IPv6 status

What we currently see in our platform begins with reliable identification of IPv6 in IPv4 encapsulation, often referred to as 6in4. None of our sensors are currently located on providers using solely IPv6; therefore, the packets will always be IPv4 encapsulated.

We also see users querying for IPv6 addresses in the GreyNoise Visualizer, but these queries are problematic; GreyNoise currently can do better when a user queries for an IPv6 address. Users regularly query for link-local addresses, which are addresses meant for internal network communications. Other queried addresses are often in sets that indicate users are querying IPv6 addresses in their same provider prefix. They may be querying their own IPv6 address or nodes that are attempting neighbor discovery. We are looking at ways to educate and notify users when they input these types of addresses to help them further understand the IPv6 landscape.