GreyNoise API Documentation Guide with Brad Chiappetta

Summary

At GreyNoise, we collect, analyze and label data on IPs that saturate security tools with noise. This unique perspective helps analysts waste less time on irrelevant or harmless activity, and spend more time focused on targeted and emerging threats.

GreyNoise data is made available through our web-based Visualizer and the GreyNoise APIs. The Community API provides a free resource to members to allow for quick IP lookups in the GreyNoise datasets. There are two basic APIs:

The Community API can be used by analysts with a Community account, and returns basic information on what GreyNoise knows about that IP.

The GreyNoise Enterprise APIs require an active paid subscription or Enterprise Trial to access, and provide rich contextual information on what GreyNoise knows about that IP.

Read the transcript

Hello, everyone. My name is Brad Chiappetta, and in this session we're going to be going through and reviewing the GreyNoise API documentation and the basic components of each one of our API endpoints. I'm going to be showing a few examples and sort of listing out all of the gotchas and things that you need to know about interfacing with and using the GreyNoise API. So the best way to get all of the documentation about the GreyNoise API is to start on the GreyNoise documentation hub, which can be found at docs.greynoise.io. This is a great resource, if you haven't used it before, to find all of the documentation you might need about GreyNoise, what it is, what our datasets provide, and to answer any other questions that you might have about the service.

Today, we're going to be mostly focused on the API reference component that is listed right here. In the API reference section, there is a listing of all of the available API endpoints that exist for the GreyNoise API. And you can see them all listed out here. And we're gonna go through each one of those and sort of talk through what they do and what data you can expect to get back from each one of them. So we're going to start at the beginning with the Community API. The way that the documentation site works is that it gives you the ability to drop in your API key here, and actually go through and test the API and its responses, you know, right through the interface here.

By default, it's going to show you shell commands and what it would look like to go ahead and curl the data directly. And I'm going to use that for some of my examples. But if you're looking for other programming languages to sort of copy and reuse this, you can see there's a variety of programming languages available. So you can simply go through and select which one you would like, and be able to get examples for that particular language of how this particular request is formatted. And when you do things like add in a particular IP address or other input for that, it's going to go ahead and actually append that to the sample.

So you can see where that data is added in here. And then additionally, you'll be able to hit the Try It button and actually see what the response is as well. Alright, so the first endpoint that we're going to be reviewing today is the Community endpoint. This is the free endpoint that all users, both Community and paid users, are able to access. And this endpoint is designed to provide a basic response for what we know about an IP address. So what this is going to give you back, and you can see it sort of in the sample response that I have up on the screen here, is two Booleans, which are going to be sort of the most critical pieces of information. The first is going to indicate noise. And you'll see this response in a lot of our APIs.

We may also use this interchangeably with the term "seen" (s-e-e-n). So they mean the same thing when you're seeing those in the responses. In this particular case, this Boolean has come back as false. And what that means is that this particular IP address that we've queried is not an internet scanner, it's not in our internet scanning database. If we then look to the next one, we have the RIOT Boolean. And this is basically saying, hey, in this case, this IP address is in the RIOT dataset. So this IP address is an IP address that belongs to a common business service. This endpoint also spits out a classification and a name. In this particular case, the name is either going to be the provider name for those IP addresses in RIOT, or the actor name for those that are an internet scanner. We're also going to provide you with a prebuilt link back to this data on the Visualizer, information about the last time that we've seen this particular IP address, and whether or not the request was actually successful.
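The two Booleans described above can be combined into a single verdict. The following is a minimal sketch, assuming the response is already parsed into a dictionary and that its fields are called `ip`, `noise`, `riot`, and `name`; those field names, the helper name, and the sample values are illustrative assumptions rather than a definitive schema.

```python
def summarize_community(record: dict) -> str:
    """Turn a Community-style record into a one-line verdict.

    The keys "ip", "noise", "riot" and "name" are assumed field names.
    """
    ip = record.get("ip", "unknown IP")
    noise = bool(record.get("noise"))
    riot = bool(record.get("riot"))

    if noise and riot:
        verdict = "internet scanner and common business service"
    elif noise:
        verdict = "internet scanner"
    elif riot:
        verdict = "common business service"
    else:
        verdict = "not in either dataset"

    # "name" is the provider (RIOT) or actor (noise) when one is known.
    name = record.get("name")
    suffix = f" ({name})" if name else ""
    return f"{ip}: {verdict}{suffix}"


# Hypothetical sample record using a documentation-range address.
sample = {"ip": "192.0.2.1", "noise": False, "riot": True, "name": "Example DNS"}
print(summarize_community(sample))
```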

To give you a couple of different examples of what these IP addresses might look like, I'm going to go ahead and put in a couple of different IP addresses here just so that we can see some other examples. So by copying this IP in here, I'm gonna go ahead and click the Try It button again. And we can see in this case, this is an IP address that is both noise and RIOT. And we can see some additional information about it to understand what this looks like by actually going ahead and copying this URL and opening it up in a new tab. And then we can see that this is an Akamai CDN IP address that is also out there doing mass internet scanning. And then we can see that last seen information and the other information that's available. Let's bring in one more sample as well. We'll go ahead and pop this in here. Alright, and we go ahead and see that this is an internet scanner, in which case it's also classified as malicious, and we don't have an actor associated. So again, giving you some useful but very basic details on each one of the IP addresses.

If you want to see some samples of what this looks like, this is one of the few endpoints that support unauthenticated requests. So in this particular case, here, I'm just doing a simple curl command on a particular IP address. And you're going to see that this is actually successful, and returns the full response from the API.

You know, as a note, this unauthenticated version of the API does behave differently than if you have signed up for a free Community account, which we do encourage everybody to do. And once you have that free Community account, you'll have an API key that you can use as well. And so you just sort of pass that along with the request, and you get the same response back, but you'll have different rate limits that you can use for the day. Okay, we're going to sort of move on to the next endpoint.
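The difference between the unauthenticated and authenticated request comes down to whether an API key is sent along. Here is a minimal sketch that only builds the request parts without sending anything. The URL path and the `key` header name are assumptions not stated in this session, so check them against the API reference before relying on them.

```python
from typing import Dict, Optional, Tuple

# Assumed URL pattern; confirm the real path in the API reference.
COMMUNITY_URL = "https://api.greynoise.io/v3/community/{ip}"


def community_request(ip: str, api_key: Optional[str] = None) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for a Community lookup.

    Without api_key the request is unauthenticated; with it, the same
    lookup is made under the account's own daily rate limit.
    """
    headers = {"Accept": "application/json"}
    if api_key:
        headers["key"] = api_key  # assumed header name
    return COMMUNITY_URL.format(ip=ip), headers


url, headers = community_request("192.0.2.10")
print(url, headers)
```

The returned pair can be handed to whichever HTTP client you already use.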

So our next endpoint is going to be our ping endpoint. This is mostly designed to be used with integrations and other things along that line. And this is really helpful to sort of determine what access level an API key has, and if it is still actually valid. So I'm simply going to go in and pass in a curl command example over here. I'll clear this out so we have a better picture here, and then paste this in. And what this is going to do is return back, hey, when's my expiration date, and then sort of what level I'm at. So I get this, you know, pong message back to indicate that this was a successful request. And it's also showing me that I'm currently on an Enterprise trial.

So what this is showing is that this is a Community account that has opted into a free Enterprise trial, to get 14 days of access at that level. And so that gives you the ability to temporarily access all of those Enterprise APIs to review them and sort of determine if you'd like to move forward with a paid subscription. If you're not in the Enterprise trial, this offering field will come back and just say "community" to indicate that your key is a Community account. This is a very basic, very straightforward setup here. And we get sort of the same response from the API reference documentation as well.
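An integration would typically check the ping response before doing anything else. The sketch below assumes the response carries `message`, `offering`, and `expiration` fields, as described above. The exact field names and the `enterprise_trial` value are illustrative assumptions.

```python
def describe_key(ping: dict) -> str:
    """Summarize a ping-style response for logging or a health check.

    "message", "offering" and "expiration" are assumed field names.
    """
    if ping.get("message") != "pong":
        return "key check failed"
    offering = ping.get("offering", "unknown")
    expiration = ping.get("expiration", "unknown")
    return f"valid key, offering={offering}, expires={expiration}"


# Hypothetical responses; real values may differ.
print(describe_key({"message": "pong", "offering": "community", "expiration": "2030-01-01"}))
print(describe_key({"message": "forbidden"}))
```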

Next up, we're going to just continue on here and move on to the multi IP context. So context, in terms of GreyNoise, is whether or not we have any internet scanning data on this particular IP address. So anytime you are querying a context endpoint, you are looking to see exclusively if that IP address is an internet scanner. This particular endpoint is a POST endpoint that takes in a string of IP addresses to go through and actually query them all at once. So in this particular example, here, we can go ahead and put a couple of different IP addresses in here. So I'm gonna just add a sample here. And then I'm going to add another one here. And then I'm going to go ahead and try it out. And then we can see the data returned here.

Alright, so this is giving me back a response that is showing me all of the IP addresses where the seen flag, as I mentioned before, is true. So it's going to strip out any IP addresses that are unknown or not noise for us. So if they're unidentified to GreyNoise and are not part of our internet scanning dataset, they are not going to be returned in this response. That is a little bit of a gotcha and a slight variation from some of our other endpoints. So it's good to note that those will be filtered out. And you can expect just to get the positive responses back from this endpoint. But this is going to give you all of the IP document information that we have, including everything that we know, all the tagging information, the metadata information, all of the raw scanning data, for pretty much every IP address that is an internet scanner.

Another slight gotcha is that if you submit a request where every IP address is seen false, meaning that we don't have data on any of them, it will come back and say, hey, this entire payload is all IP addresses that are not internet scanners. So in this particular case, I submitted a single IP address, and I still got that result back, even though seen is false. So that will be the only time that you get back all of the results: if they are all seen false. So I wanted to go through and sort of point that out, so that you can see what the difference is there.

I'm gonna go ahead and show, again, a couple of these similar curl examples here. So I'm going to pass this in with that single not-found address in there, and you can see it does return the document for that one with that one response. And if everything in this payload was in that seen false category, it would go ahead and return all of them for you. But if I paste in another example where I've got four IP addresses in here, we're gonna go ahead and send this over. Alright, and in this case, I'm only getting back, again, those two addresses where seen is true. So in this case, this "23..." address here was seen. And then in addition, this "85..." address was seen as well.
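Because addresses that are not seen are dropped from the multi IP context response, a script usually has to compare what it submitted with what came back. The following is a minimal sketch of that reconciliation, assuming each returned record is a dictionary with `ip` and `seen` keys (assumed field names). The helper name and sample addresses are hypothetical.

```python
from typing import Dict, Iterable, List


def reconcile(submitted: Iterable[str], records: List[dict]) -> Dict[str, dict]:
    """Map every submitted IP to a record.

    IPs missing from the response are treated as not seen, which also
    covers the case where the API returns explicit seen-false records.
    """
    by_ip = {record["ip"]: record for record in records}
    return {ip: by_ip.get(ip, {"ip": ip, "seen": False}) for ip in submitted}


# Four submitted addresses, two returned (documentation-range samples).
submitted = ["192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.9"]
returned = [
    {"ip": "192.0.2.2", "seen": True},
    {"ip": "203.0.113.9", "seen": True},
]
for ip, record in reconcile(submitted, returned).items():
    print(ip, record["seen"])
```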

Alright, and so you want to make sure that you are doing a POST on this endpoint. The majority of our API endpoints are GETs, but this one is different. And so it's good to sort of call that out and make sure that you've captured that. Alright, I'm gonna go ahead and move on to the next endpoint, which is our single IP context lookup. So this is meant to do just a single IP, and to give you back information on whether or not it is an internet scanner. Alright, so in this particular case, if we do just a simple example, and we go ahead and try it out, this API is going to give you back that seen false for IP addresses that are not in our internet scanning database, right? However, when you go ahead and enter an IP address that we do know about...

Alright, what you should see is all of the data that we have on that particular IP address as it relates to internet scanning traffic. So again, when was the first time it hit our sensor network, when was the last time it hit our sensor network, associated tags that we've been able to apply to it, if we've been able to identify an actor, whether it's spoofable, what its classification is, if it has associated CVEs, whether or not it has known bot activity, whether it belongs to a VPN, and all that geo information and raw scanning data as well. Alright, taking a look at these, you know, from the curl command perspective as well, we'll just give you some examples over here on the console. So we'll clear this out, and go ahead and give you that negative response as well so that we can see what that looks like.

Okay, and then we'll go ahead and pull back one, again, that is a positive response. And you can see the way that lays out with the same information there as well. So very straightforward, very easy to use. And this is where the majority of folks will do those sorts of quick single lookups. I'm going to continue on now to our quick endpoint. So the quick endpoint is designed to be sort of similar to the Community API, but it's going to give you just the Boolean response of: is this IP address in our RIOT dataset, or is this IP address in our noise dataset? It also returns back a code, which can then be translated using the code messages listed here.

Alright, so we're gonna go ahead and take a look at an example here. So again, I'm just going to put quad1 in here (1.1.1.1), and I'm going to go ahead and execute this. In this particular case, we can see that we get that noise false flag here to say it's not in our internet scanning database. It is in RIOT, which indicates that it is a common business service IP address. And when we go ahead and look at this code, we can translate it to say that, hey, this IP address was found in RIOT, to go ahead and pass along that information as well.

So as an additional example here, let's go ahead and use this same IP address that we've been using. We'll go ahead and enter that in here and try it out. In this particular case, we can see that this IP address is in the noise dataset, but it's not in the RIOT dataset, and it gives us back this code of "0x01", which basically says this IP address has been observed by the GreyNoise sensor network and nothing more specific than that. This can be very useful if you are trying to pull back information on an IP address from both of our datasets. Because the datasets use different API endpoints to query for the full information for noise or the full information for RIOT, this is a great thing to check first, because this will tell you right away which endpoints you then need to do sort of the secondary lookups for.

And so in the case where both noise and RIOT come back false on this endpoint, that would indicate that you can stop right there. And you don't actually need to query the other context endpoints or the RIOT endpoints to go ahead and pull back the sort of more comprehensive data that you're looking for.
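The "check quick first, then decide" pattern can be sketched as a small routing function. It assumes the quick response is a dictionary with Boolean `noise` and `riot` keys, as described in this session. The returned labels "context" and "riot" are just illustrative names for the two follow-up lookups.

```python
from typing import List


def follow_up_lookups(quick: dict) -> List[str]:
    """Decide which detailed lookups are worth making after a quick check."""
    lookups = []
    if quick.get("noise"):
        lookups.append("context")  # full internet scanning record
    if quick.get("riot"):
        lookups.append("riot")     # full common business service record
    return lookups  # an empty list means you can stop right there


print(follow_up_lookups({"noise": False, "riot": True}))
print(follow_up_lookups({"noise": False, "riot": False}))
```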

And just to give a couple of quick examples over here on the command line as well, we'll go ahead and pass over this sort of unknown IP address. And so in this case, here, we get that false-false response back. Alright, but if we go ahead and pass over our quad1 example again, we can see that sort of false-true example. And then if we go ahead one more time and look at the inverse of this one, we sort of get that indication of noise true, RIOT false to indicate that this is an internet scanner. So again, pretty straightforward: a single IP address gives you back the information that we have just on that sort of quick lookup response. And we do have, as our next endpoint here, a way to do a multi IP check, meaning that you can pass a variety of IP addresses all at once, and get the response for each one of these IP addresses without having to do all those individual API calls for each IP address.

Now, there are two versions of the multi IP check endpoint. You can use either a GET or a POST request, the difference being that the GET request will be limited to 500 IP addresses, and the POST will be limited to 1000 IP addresses in a single request. So if we go ahead and build out an example of what these are going to look like here, we can go ahead and put a couple of IP addresses into our sample. So we'll add one here, then we'll add another one, add a third one here, and then we'll go ahead and try it out. And we can see again, we get that same quick response back for each one. So in this case, this one is noise, it's not RIOT. This one here is neither noise nor RIOT. In this case, this one is not noise, but is in RIOT. And so that gives you the ability to go through, submit that list of IP addresses, and be able to sort of parse out, you know, whatever you're looking for to determine which ones we want to go and do an additional context lookup for and which ones we might want to do additional RIOT lookups for as well.
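With the limits of 500 addresses per GET and 1000 per POST, a longer list has to be split into batches before submission. This is a minimal sketch of that batching only. The function and constant names are hypothetical, and sending each batch is left to your own HTTP client.

```python
from typing import List

# Per-request limits as stated in this session.
BATCH_LIMITS = {"GET": 500, "POST": 1000}


def batches(ips: List[str], method: str = "POST") -> List[List[str]]:
    """Split a list of IPs into chunks that fit one request each."""
    limit = BATCH_LIMITS[method.upper()]
    return [ips[start:start + limit] for start in range(0, len(ips), limit)]


# 2300 placeholder entries: three POST requests or five GET requests.
placeholder_ips = [f"ip-{n}" for n in range(2300)]
print([len(b) for b in batches(placeholder_ips, "POST")])
print([len(b) for b in batches(placeholder_ips, "GET")])
```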

And to show these examples again, you know, on the command line itself, here's what these are going to look like. So we'll go ahead and curl that. So that's that single one. So you can still do a single IP address if you want to just consistently use it. And again, I'm showing these as a POST example, but the GET works exactly the same way and in the same format. It's just the number of IP addresses that you can submit at once that's limited. And then one gotcha to note about this endpoint is that it will give a response back for any valid IP address, meaning that it's also going to give you a response back for IP addresses that are non-routable addresses. So these are going to be local addresses that you wouldn't necessarily want to send to the API, but you will get that sort of false-false response. And that 07 code here ("0x07") is basically saying, hey, this is an invalid IP address for our service, because we don't have data or intelligence on local addresses, so that wouldn't apply there.
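Since local addresses only ever produce the false-false response, it can be worth dropping them before calling the API. The sketch below uses Python's standard `ipaddress` module and its `is_global` property as one reasonable way to do that. The helper name is hypothetical, and this is a client-side convenience, not something the API requires.

```python
import ipaddress
from typing import Iterable, List, Tuple


def split_routable(candidates: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate globally routable IPs from everything else.

    Returns (routable, skipped); skipped holds private, loopback and
    other non-global addresses as well as strings that are not IPs.
    """
    routable, skipped = [], []
    for text in candidates:
        try:
            address = ipaddress.ip_address(text.strip())
        except ValueError:
            skipped.append(text)
            continue
        (routable if address.is_global else skipped).append(text)
    return routable, skipped


routable, skipped = split_routable(["8.8.8.8", "10.0.0.5", "192.168.1.1", "not-an-ip"])
print(routable, skipped)
```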

Alright, so we'll go ahead and clear this out, and move on to the next one. So in the documentation, the next thing will be the samples of, again, the multi IP quick check. This is just the GET version, which is limited to 500 IPs in a single request. So we'll just skip over this, as the examples and the way that it works are exactly the same as the one we just did.

So next up, we're gonna go ahead and review the RIOT endpoint. Alright, so RIOT is a different dataset. So similar to looking up an IP address in the context of the internet scanning database, this, again, is looking up an IP address directly in our RIOT database to see if it belongs to a common business service. So we'll go ahead and we'll do an example here. Again, we'll put quad8 in (8.8.8.8), we'll go ahead and try it, and we'll see the response back here. Alright, so we get the IP address, and we get that RIOT true flag. But now we also get the category information to say, hey, this is categorized as a public DNS server, the name of the provider, and then a basic description of who the provider is and why we added them to the RIOT dataset. Then also the last time this record was updated in RIOT. The "logo_url" can generally be ignored; that's used internally and doesn't provide a lot of value downstream. And then there's some reference information as well. And then also our trust level, in terms of whether this is a trust level one or two IP address within the RIOT dataset. And you can find out more about those trust levels in the documentation.

For IP addresses, again, that we don't necessarily have in this dataset, we'll go ahead and pull an example in. And this works very similarly to the context endpoint, where it's going to go ahead and actually return the response where it just says RIOT is false, to say that this IP address is not in RIOT, it's not an IP address belonging to a common business service that we are monitoring. And so you get that sort of simple response back there. And again, we can look at some of the very simple curl requests to see those responses. So again, this is that not-found one, the one that's not in that dataset. Again, if I'm looking up an IP address that was noisy, and I want to see if it was also in RIOT, in this case, this one is not; it's only in that single dataset. Then we can go ahead and look up one of those IP addresses that we know is in here. And we get that full payload with all the details that we know about this particular IP address in the context of the RIOT dataset.
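Downstream tooling usually only needs a subset of the RIOT record. Here is a minimal sketch that keeps the fields discussed above and drops `logo_url`, which the session says can generally be ignored. Apart from `logo_url`, the field names (`riot`, `category`, `name`, `description`, `last_updated`, `trust_level`) are assumptions based on the spoken description, and the sample record is made up.

```python
from typing import Optional

# Assumed field names; "logo_url" is deliberately left out.
RIOT_FIELDS = ("ip", "category", "name", "description", "last_updated", "trust_level")


def riot_summary(record: dict) -> Optional[dict]:
    """Return the useful RIOT fields, or None when the IP is not in RIOT."""
    if not record.get("riot"):
        return None
    return {field: record[field] for field in RIOT_FIELDS if field in record}


# Hypothetical record for illustration only.
sample = {
    "ip": "192.0.2.53",
    "riot": True,
    "category": "public_dns",
    "name": "Example DNS",
    "logo_url": "https://example.invalid/logo.png",
    "trust_level": "1",
}
print(riot_summary(sample))
print(riot_summary({"ip": "192.0.2.54", "riot": False}))
```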

Okay, moving right along. Now we're going to take a look at the GNQL query endpoint. Alright, so this is similar to how we go ahead and build queries on the GreyNoise Visualizer. So doing things like last seen one day within here, and then perhaps filtering to say, hey, I specifically want to see Log4j attack tags here. We can go ahead and build this as essentially a GreyNoise query, or GNQL, and our API endpoint supports the same exact query language. And so you can use the examples that you build within the Visualizer to then support what you're building in the actual query endpoint itself. The query endpoint does allow you to search on a variety of different facets. And you can see those all listed out here. We also have additional documentation in our guide section that talks about using the GNQL language, and sort of going through and searching for a variety of different things.

Alright, so just to run through a couple of examples, you can do something as simple as putting in a query that looks like a single IP address. So if I come in here, and I drop in an IP address, and then I go ahead and give this a go, this is basically going to give me a response for an IP record that matches the query. So we looked up a single IP address, here's the query that was submitted, and this came back okay, meaning that it was valid. And then we got back that sort of IP details for the IP address that matches that query. Now, again, we looked up a single IP address, so not too much there that we're going to dig into. But let's expand this out to actually go ahead and look into a larger group. So in this particular case, we'll go ahead and put an ASN in, and we'll go ahead and submit this. Let's see here. I got an extra quote in here. So I'm gonna go ahead and remove that real quick.

Alright, so now we've gone and we've provided this query here. Alright, so we can see, in this case, this complete is true, and so I've got all of the records available. And so we have 79 IP addresses that match this query. So that means GreyNoise is currently tracking 79 addresses for this particular ASN. And then in the data section, we give you details on each one of those IP addresses. So like you would get from the context endpoint, you get that full IP details record back for each one of these internet scanner IP addresses. Right.

And so that's essentially how this sort of builds out. And again, you can see in the samples in the code snippets here what those look like. I'll drop a few of those in here, just so that we can dig into them a little bit more on the command line. Alright, so here's that first one using that IP query. Alright, so just saying, hey, I want to actually query this particular IP, and getting that single one back. Alright, then I'm going to go ahead and use our ASN query. And at this point, again, we see this was completed, and we get, you know, 79 results here. And then we sort of get that data back for each one, and we can scroll through it and see that variety of different IP addresses there.

Alright, so I'm gonna go ahead and exit out of that. And then the last thing I'm going to do is query for that same exact ASN, but this time, I'm going to go ahead and look for the classification malicious. What you'll notice here is that I'm passing in a %20 instead of a space to be able to actually pass this to the API. So in any cases where you would have a space in a query that you're running here, you're just going to go ahead and replace those with that %20, to help ensure that the API can interpret that request correctly. And again, we can see now that we've filtered down to these 12 addresses, and those are all included in the data payload here as well.
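Rather than replacing spaces by hand, the query can be percent-encoded with the standard library. The sketch below uses `urllib.parse.quote`, keeping the colon unescaped so the query stays readable. The helper name is hypothetical, `AS64496` is a documentation-reserved ASN standing in for the one used in the session, and many HTTP clients do this encoding for you when you pass query parameters separately.

```python
from urllib.parse import quote


def encode_gnql(query: str) -> str:
    """Percent-encode a GNQL query for use in a URL.

    Spaces become %20; ":" is left as-is for readability.
    """
    return quote(query, safe=":")


encoded = encode_gnql("metadata.asn:AS64496 classification:malicious")
print(encoded)
```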

Now, this endpoint does have a couple of different attributes that you can set. It has this size attribute here, and what you can do is actually say, hey, I might only want the first 10 records, or I only want 10 records per lookup. So if I go ahead and pass that in here, what you're going to see now is that we're only going to get the first 10 records in the response, even though the count is still the same. Now this complete is set to false, because I've only got a partial response here. Alright, and so sometimes that's helpful, based on how you're coding or how you're building out whatever scripting it is that you're going to be working with, to have just a few responses at a time, rather than getting them all at once. The endpoint will return a maximum of 1000 IP results per page. So for those responses where you have more than 1000, you will need to pass through this scroll token, which will help you paginate through the results so that you can pull them all back.
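Paging with the scroll token is a simple loop: fetch a page, yield its records, and pass the returned token into the next request until the response is complete. The following is a minimal sketch that takes the page-fetching function as a parameter so it stays independent of any HTTP client. The response keys `data`, `complete`, and `scroll` are assumed names based on the description above.

```python
from typing import Callable, Iterator, Optional


def iter_gnql(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every record across pages.

    fetch_page(scroll) must return one parsed page; it receives None for
    the first page and the previous page's scroll token afterwards.
    """
    scroll = None
    while True:
        page = fetch_page(scroll)
        yield from page.get("data", [])
        scroll = page.get("scroll")
        # Stop when the API reports completion or gives no further token.
        if page.get("complete") or not scroll:
            break


# Fake two-page result set standing in for real API calls.
fake_pages = {
    None: {"data": [{"ip": "192.0.2.1"}], "complete": False, "scroll": "token-1"},
    "token-1": {"data": [{"ip": "192.0.2.2"}], "complete": True},
}
print([record["ip"] for record in iter_gnql(lambda token: fake_pages[token])])
```

In real use, `fetch_page` would wrap the HTTP request and pass along the query, the size, and the scroll token.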

The GNQL endpoint is not meant to, you know, work as a feed. You should not do broad searches, like looking for last seen one day, or a classification of malicious or benign, from this endpoint. It's designed to sort of help you correlate data in searches like you would within the Visualizer. So do be cautious that your usage of this endpoint is in line with whatever kind of contract or subscription you have with us, to make sure that you're not going through and pulling too much data down from this particular endpoint. Alright, and that covers everything on the query endpoint here. And so I'm going to go ahead and move forward to the stats endpoint.

The stats endpoint is built exactly the same as the query endpoint. However, instead of giving all of the IP records, it will go ahead and give us just basic stats on that particular query. So if I come in here and put in my ASN query again, what I'm going to get is sort of that total record count. But then we're going to break it down by all the different classifications. So in this case, there are 67 unknown and 12 malicious. Then there's the breakdown of whether or not the IP addresses are spoofable, and the breakdown of organizations, actors, countries, tags, operating systems, categories, and ASNs, so that you can sort of go through and just get a high level overview of what that looks like. This is equivalent to putting this particular query into the Visualizer, and then seeing all of the different stats data that is included in the sidebar here in the Visualizer. So this is sort of mimicking what you're seeing in this view here.

Alright, so again, if we look at this, we can take this same query and say, hey, I want to also filter this down on classification. So I'm gonna go ahead and replace this with the classification information as well. So we pass that in. And now you can see here is the query that I passed in. So I've got metadata ASN, and I'm also doing classification malicious. So this way, now, I'm only getting those 12 IP addresses back, and I'm getting stats just on the malicious ones. So I can see the different information that's associated with those as well. And in this particular case, you can see an interesting facet, that pretty much every one of the IP addresses in this ASN is tagged with Mirai. And so that could be an interesting data point for you to sort of go with, right? And that is essentially what comes down from the stats endpoint as well. So very, very similar to the full query endpoint, but again, just sort of giving those high level statistics.
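The classification breakdown from the stats response can be flattened into a simple mapping and checked against the total count. This is a minimal sketch under an assumed response layout: a top-level `count` plus `stats.classifications` as a list of `{"classification", "count"}` entries. The real structure may differ, so treat the key names as placeholders. The sample numbers are the ones quoted in this session (67 unknown and 12 malicious out of 79).

```python
from typing import Dict


def classification_counts(stats_response: dict) -> Dict[str, int]:
    """Flatten the classification breakdown into {classification: count}.

    Assumes stats_response["stats"]["classifications"] is a list of
    entries with "classification" and "count" keys (assumed layout).
    """
    entries = stats_response.get("stats", {}).get("classifications", [])
    return {entry["classification"]: entry["count"] for entry in entries}


sample = {
    "count": 79,
    "stats": {
        "classifications": [
            {"classification": "unknown", "count": 67},
            {"classification": "malicious", "count": 12},
        ]
    },
}
counts = classification_counts(sample)
print(counts, sum(counts.values()) == sample["count"])
```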

And we have one final endpoint to cover. This is the metadata endpoint. And this endpoint doesn't take any inputs, but when you query it, it just goes ahead and provides you with all of the current tags that GreyNoise has in the dataset. So I'm going to go ahead and query this one over here. And what you can see is that you get back the sort of tag metadata for each and every one of the tags that GreyNoise has in our dataset. So if you want to search through it, or you want to pull out some additional tag information, you can query this and then pull in a particular tag for whatever you're building, and pull out, you know, what the intention of that tag is, if we have any additional references, and if we're recommending it's blocked. You can also see things like the associated CVEs.

And then at the very end of this, we also include a list of all of the VPN services that we are currently tracking within the dataset as well. So as you're probably familiar, we do VPN enrichment through Spur. And so you can get an example of what that looks like. So I'm gonna go ahead and actually just pull up the example here. And so this is the format that you get: you sort of get this metadata with each one of the tags. And then down at the end, there's going to be this VPN services response that has the list of all of the current VPNs that we have seen internet scanning behavior from as well.

So that brings us pretty close to the end of this. There are a couple of different things that I want to go ahead and mention. So in the guide section here, there are a couple of articles that might be helpful. There is the Using the GreyNoise Enterprise API guide, which gives some basic scripting examples, both with the GreyNoise SDK and in some Python requests format. These might be helpful for some basic patterns on how we typically recommend that the APIs be used. And then there's also additional documentation on using the GNQL endpoint, and how you can go ahead and build out particular queries about information that you might be interested in as well. And then there are some shortcuts, some behavior notes, and some examples in here that might be useful. So as you're building through and experimenting with the API endpoints, you can use this as a reference.

As always, if you have any follow up questions or you need any additional information, you can reach out to us at support@greynoise.io, or you can reach out to us on our community Slack. So thanks for tuning in today. We hope that you found this helpful, and enjoy your day.