Pushing Left, Like a Boss — Part 5.1 — Input Validation, Output Encoding and Parameterized Queries
The next several posts break the secure coding guideline from We Hack Purple into smaller pieces and explain each one in more detail than our secure coding infographic, but in far less detail than our secure coding course.
Any input that you receive, from anywhere, must be validated to ensure that it is what you are expecting. For instance:
- Is it the right type of data? Date, string, integer, float, etc.
- Is it within the appropriate range for size? Is it too long? Too short? Does that day actually exist? (June 31st is not a real day.)
- Is the data appropriate? If you are expecting a username, why does it contain characters other than a-z, A-Z, 0–9? If the field is for the date of a future event, why is the date entered in the past? Business logic should be applied here.
- Is the data in the correct format? If it’s a call to an API, does the call follow the protocol's expected input? Is the XML in the correct format? Is it MM/DD/YY, DD/MM/YY or YYYY/DD/MM?
The most important thing is ensuring that the data you are receiving is *valid*. If it is not valid, reject it, then issue an error to the user. Do not try to sanitize it; that is where many programmers get into trouble. Just tell the user that what they entered was wrong and let them try again.
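The checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `validate_event_date`, the `ValidationError` class, and the choice of a `YYYY-MM-DD` format are all hypothetical, picked to show type, size, format and business-logic checks in one place. Note that invalid input is rejected with a message, never "fixed".

```python
from datetime import date, datetime


class ValidationError(ValueError):
    """Raised when input fails validation; the message is safe to show the user."""


def validate_event_date(raw, today: date) -> date:
    """Validate the date of a future event, supplied as YYYY-MM-DD text (assumed format)."""
    # Type: we expect text and nothing else.
    if not isinstance(raw, str):
        raise ValidationError("The date must be text.")
    # Size: YYYY-MM-DD is exactly 10 characters long.
    if len(raw) != 10:
        raise ValidationError("The date must look like YYYY-MM-DD.")
    # Format and existence: strptime rejects malformed text and
    # impossible days such as June 31st.
    try:
        parsed = datetime.strptime(raw, "%Y-%m-%d").date()
    except ValueError:
        raise ValidationError("That is not a real calendar date.") from None
    # Business logic: the event has to be in the future.
    if parsed <= today:
        raise ValidationError("The event date must be in the future.")
    return parsed
```

A caller would catch `ValidationError`, show the message, and let the user try again, rather than attempting to repair the value.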
Note #2: An approved list is always recommended when performing input validation.
Approved list versus block list: A block list is a list of characters that you do not want to allow (for instance, tags that you think might be part of a script). It is a list of “known bad” characters, which is very difficult to get right and often simple for an attacker to avoid. An approved list is a list of “known good” characters that you will accept. For instance, when you want someone to create a username, you only allow [a-z, A-Z, 0–9]. If a character is not in the list of “known good”, then it is rejected, plain and simple.
There are many ways for malicious actors to circumvent block lists, as illustrated in detail in the OWASP SQLi Filter Evasion Cheat Sheet.
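To make the difference concrete, here is a minimal sketch in Python. The function names and the 3 to 30 character length limit are assumptions made for illustration; the block list filter is deliberately naive, to show how easily this kind of filter is sidestepped.

```python
import re

# Approved list: only these characters are accepted.
# The 3 to 30 length limit is an assumption for this example.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9]{3,30}")


def is_valid_username(candidate: str) -> bool:
    """Accept the username only if every character is on the approved list."""
    return USERNAME_PATTERN.fullmatch(candidate) is not None


def naive_block_list_filter(text: str) -> str:
    """A deliberately flawed block list: strip one 'known bad' string."""
    return text.replace("<script>", "")


# The block list is bypassed by nesting the bad string inside itself:
# removing the inner "<script>" leaves a fresh "<script>" behind.
bypassed = naive_block_list_filter("<scr<script>ipt>alert(1)")

# The approved list simply rejects the same input.
accepted = is_valid_username("<scr<script>ipt>alert(1)")
```

Here `bypassed` still contains a script tag, while `accepted` is `False`: the approved list never had to anticipate the attack in order to stop it.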
When displaying information to the screen, if it was received from a data source (rather than being part of the labels and other information programmed into the interface of the application), it needs to be output encoded. When something is output encoded, any ‘power’ it has is stripped away, and it is treated only as text. This means that if a script was accidentally passed into the application, API or database, it would be rendered as text, not as a script, when it is output by the program.
This is a perfect example of “Defense in Depth,” the layering of security measures, in practice: only valid input should be accepted into the program, and then we output encode it just to be sure. Please do both.
Some programming frameworks, such as .NET Core, apply output encoding automatically.
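If your framework does not do this for you, the idea can be sketched with Python's standard library. The `render_comment` function is a hypothetical helper for illustration, and it only covers text placed in the body of an HTML page; other contexts (HTML attributes, JavaScript, URLs) need their own encoders.

```python
import html


def render_comment(comment: str) -> str:
    """Output encode untrusted text before placing it inside an HTML paragraph."""
    # html.escape turns <, >, & and quotes into harmless HTML entities,
    # so the browser displays them as text instead of interpreting them.
    encoded = html.escape(comment, quote=True)
    return f"<p>{encoded}</p>"


page_fragment = render_comment("<script>alert(1)</script>")
```

The browser receives `<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>` and shows the script as plain text rather than running it.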
When sending queries to the database, it is important that we use parameterized queries (also known as prepared statements) rather than inline/dynamic SQL or other database languages. Stored procedures offer similar protection, as long as they take parameters and do not build dynamic queries internally. Dynamic queries are made by pasting user input together with database query language and then submitting the result directly to the database for execution, which is a highly dangerous activity. It often means no input validation, and there's a reason input validation was the first thing on this list.
Parameterized queries are safer because user input placed into parameters will either 1) be the correct data type and function normally, or 2) be incorrect and fail. For instance, if you inject a script into a date field, it will cause the query to fail. Using parameterized queries also strips any special powers that some characters may have from the data within the parameters, similar to output encoding. This strategy of using parameterized queries is a huge win against any sort of database injection attack.
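Here is a small sketch of the difference, using Python's built-in `sqlite3` module with an in-memory database. The `users` table, the function names and the payload are made up for this example; the `?` placeholder is the parameter syntax `sqlite3` uses, and other database drivers use their own.

```python
import sqlite3

# A throwaway in-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(username: str) -> list:
    # Dangerous: user input is pasted straight into the query text.
    query = "SELECT username FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user(username: str) -> list:
    # Safe: the input travels as a parameter and is only ever treated as data.
    query = "SELECT username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


payload = "nobody' OR '1'='1"
leaked = find_user_unsafe(payload)  # the injected OR clause matches every row
safe = find_user(payload)           # no user has this literal name
```

With the dynamic query, the payload rewrites the `WHERE` clause and every row comes back. With the parameterized query, the same payload is just an odd username that matches nothing.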
* For those of you who are unaware, injection attacks are the #1 most damaging and dangerous type of web application attack, and are generally rated as “critical” if found in a live application.