In the previous post we covered API2:2019 Broken User Authentication, which was the second post in this series. If you want to start from the beginning, go to the first post, API1:2019 Broken Object Level Authorization.
Excessive data exposure is something that web applications can face, not just APIs. That said, because web-based APIs are basically services on the web, they can be abused even more easily to exfiltrate sensitive data than a regular web app. It’s easy for an attacker to find APIs (just connect to any web app or mobile app using a web proxy and see the API calls for yourself!), to call them, and then look at the responses to see if anything being sent looks potentially sensitive. For instance, if the data in a field passed back is named “password”, “sin” or “secret”, you’re most likely onto something.
Using a web proxy to watch the API calls go back and forth is sometimes called “sniffing”, but no matter what you call it, it’s easy to do! Anyone with the tiniest amount of web-app hacking training can do this on day one. This means this threat is prevalent (happens all the time) and very dangerous (because unsophisticated attackers can easily execute it).
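To make the "look at the responses" step concrete, here is a minimal sketch of how you might scan a captured JSON response for suspicious field names. The word list comes from the examples above; the function names and the sample response body are made up for illustration, so treat this as a starting point and not a real scanner.

```python
import json
import re

# Field-name words taken from the examples in this post; extend for your own domain.
SUSPICIOUS_WORDS = {"password", "sin", "secret"}


def _words(key):
    # Split snake_case, kebab-case and camelCase keys into lowercase words,
    # so "sin" matches "sin" or "user_sin" but not "business".
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", key)
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", spaced) if w]


def find_suspicious_fields(data, path=""):
    """Return the paths of all keys whose name contains a suspicious word."""
    hits = []
    if isinstance(data, dict):
        for key, value in data.items():
            child = f"{path}.{key}" if path else key
            if SUSPICIOUS_WORDS.intersection(_words(key)):
                hits.append(child)
            hits.extend(find_suspicious_fields(value, child))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            hits.extend(find_suspicious_fields(item, f"{path}[{index}]"))
    return hits


# Hypothetical response body, as you might copy it out of a web proxy.
sample_body = (
    '{"id": 7, "name": "Ada", "passwordHash": "x", '
    '"profile": {"sin": "000-000-000"}, '
    '"orders": [{"total": 10, "card_secret": "y"}]}'
)
print(find_suspicious_fields(json.loads(sample_body)))
# ['passwordHash', 'profile.sin', 'orders[0].card_secret']
```

A name-based check like this only catches the obvious cases. It cannot tell you whether a field called "notes" happens to contain something sensitive, so you still need to review the responses yourself.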
Some APIs are *supposed* to return sensitive data. The vulnerability occurs when sensitive data is exposed to someone who should not see it: for instance, someone who is not a valid user, a user seeing another user’s sensitive data, or a user whose role within the system should not give them access to that specific data. Since automated testing tools cannot easily tell whether data is sensitive, this can be a bit more difficult to identify than other types of vulnerabilities.
* Note: occasionally the vulnerability rears its head via poorly-generated and/or overly-populated responses. For instance, the API delivers an entire table’s worth of data, which includes sensitive information, and the client-side front end then sifts through it and reveals only the non-sensitive/appropriate data to the end user. Unfortunately, this means anyone watching the API responses through a web proxy sees all of the data, no matter what the front end displays. If the API call is not encrypted in transit, a malicious actor sniffing the network at the time could see it too.
How do we avoid this?
Let’s look at some great advice from the project team (I may have added a bit onto their list):
- Never rely on the client side to filter sensitive data. By this we mean, only return the data you need to return! Don’t send a ton of stuff you do not need to, then let the GUI/front end decide what to show the user. Make these important decisions on the server.
- Classify, then label, all your data. If you know immediately when you look at something that it is sensitive, it’s automatic to treat it in a certain way. Educate your developers and other areas of IT on how to classify data, and to ask the security team if they aren’t sure.
- In the design of your API, add user stories and/or threat models around this potential vulnerability. Make protecting sensitive data part of your design.
- Review the responses from the API to make sure they contain only legitimate data: data that the specific user (or any user with that role inside your system) is allowed to access.
- Back-end engineers should always ask themselves “who is the consumer of the data?” before exposing a new API endpoint. Or better yet, perform threat modelling on your data flows, THEN design.
- Avoid using generic methods such as to_json() and to_string(). Instead, cherry-pick specific properties you really want to return. You do not need to return everything. In fact, it’s better for your cloud bills to return only what you need, even if it requires a bit more programming.
- Classify the sensitive and personally identifiable information (PII) that your application stores and works with, and review all API calls returning such information to see if these responses pose a security issue.
- Implement a schema-based response validation mechanism as an extra layer of security. As part of this mechanism, define and enforce the data returned by all API methods, including errors.
- Perform strict linting on your API definition file, to ensure you have input validation built-in, by default, for every variable.
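Two of the items above, cherry-picking properties and checking responses before they leave the server, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the roles, field names, and the `serialize_user` and `check_response` helpers are all hypothetical. The allowlist check is only a stand-in for real schema-based validation, which you would more likely drive from your API definition file.

```python
# Hypothetical per-role allowlists: the only fields each role may ever receive.
ALLOWED_FIELDS = {
    "customer": ("id", "display_name"),
    "support": ("id", "display_name", "email"),
}


def serialize_user(record, role):
    """Build the response from an explicit allowlist instead of dumping the whole record."""
    allowed = ALLOWED_FIELDS.get(role, ())  # unknown roles get no fields at all
    return {field: record[field] for field in allowed if field in record}


def check_response(payload, role):
    """Extra layer: refuse to send a payload that contains fields outside the allowlist."""
    unexpected = set(payload) - set(ALLOWED_FIELDS.get(role, ()))
    if unexpected:
        raise ValueError(
            f"response has fields not allowed for role {role!r}: {sorted(unexpected)}"
        )
    return payload


# Hypothetical database row, including fields that must never leave the server.
db_row = {
    "id": 42,
    "display_name": "Ada",
    "email": "ada@example.com",
    "password_hash": "not-a-real-hash",
    "sin": "000-000-000",
}

print(check_response(serialize_user(db_row, "customer"), "customer"))
# {'id': 42, 'display_name': 'Ada'}
```

The point of doing both is that the serializer decides on the server what to send, while the check catches the day someone bypasses it and returns the raw record by accident. In that case `check_response(db_row, "customer")` raises a `ValueError` instead of leaking the data.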
In the next blog post we will be talking about API4:2019 Lack of Resources & Rate Limiting!