PHP – Securing Your Web Application: Filter Input

Previously in this series, we saw why security is important for PHP web applications and for the extensions we develop for various CMSs. In this article, we will learn how to filter user input so that it can be used safely.

One of the most fundamental things to understand when developing a secure site is this: all information not generated within the application itself is potentially tainted. This includes data from forms, files, and databases.

When data is described as being tainted, this doesn’t mean it’s necessarily malicious. It means it might be malicious. You can’t trust the source, so you should inspect it to make sure it’s valid. This inspection process is called filtering, and you only want to allow valid data to enter your application.

There are a few best practices regarding the filtering process:

  • Use a whitelist approach. This means you err on the side of caution and assume data to be invalid unless you can prove it to be valid.
  • Never correct invalid data. History has proven that attempts to correct invalid data often result in security vulnerabilities due to errors.
  • Use a naming convention to help distinguish between filtered and tainted data. Filtering is useless if you can’t reliably determine whether something has been filtered.
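The second rule is easy to underestimate. As a minimal sketch, the following compares an attempt to repair a path by stripping "../" with a whitelist check that simply rejects anything unexpected. The helper names sanitize_path() and filter_filename() and the sample input are hypothetical, chosen only for illustration:

```php
<?php
// Hypothetical helper: tries to "correct" input by stripping "../" sequences.
function sanitize_path(string $tainted): string
{
    // str_replace() makes a single pass, so removing one "../" can
    // splice the surrounding characters into a new "../".
    return str_replace('../', '', $tainted);
}

// Hypothetical helper: accepts only names that match a strict format and
// returns null instead of trying to repair anything.
function filter_filename(string $tainted): ?string
{
    return preg_match('/\A[A-Za-z0-9_]+\.txt\z/', $tainted) === 1 ? $tainted : null;
}

$tainted = '....//secret.txt';

var_dump(sanitize_path($tainted));      // string(13) "../secret.txt"
var_dump(filter_filename($tainted));    // NULL
var_dump(filter_filename('notes.txt')); // string(9) "notes.txt"
```

The "corrected" value still contains a directory traversal sequence, while the whitelist check rejects the input outright.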

In order to solidify these concepts, consider a simple HTML form allowing a user to select among three colors:

<form action="process.php" method="POST">
	<p>Please select a color:

	<select name="color">
		<option value="red">red</option>
		<option value="green">green</option>
		<option value="blue">blue</option>
	</select>

	<input type="submit" /></p>
</form>

It’s easy to appreciate the desire to trust $_POST['color'] in process.php. After all, the form seemingly restricts what a user can enter. However, experienced developers know HTTP requests have no restriction on the fields they contain—client-side validation is never sufficient by itself. There are numerous ways malicious data can be sent to your application, and your only defense is to trust nothing and filter your input:

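$clean = array();

// Fall back to an empty string when the field is missing from the request.
switch ($_POST['color'] ?? '') {
	case 'red':
	case 'green':
	case 'blue':
		// The value matched the whitelist, so it is safe to store.
		$clean['color'] = $_POST['color'];
		break;
	default:
		/* ERROR */
		break;
}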

This example demonstrates a simple naming convention. You initialize an array called $clean. For each input field, validate the input and store the validated input into the array. This reduces the likelihood of tainted data being mistaken for filtered data, because you should always err on the side of caution and consider everything not stored in this array to be tainted.
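When the list of valid values grows, the same whitelist check could be written with in_array() in strict mode, which keeps the allowed values in one place. This is only a sketch: the helper name filter_choice() and the simulated $post array are assumptions made for the example, not part of the original code.

```php
<?php
// Hypothetical helper: returns the value only if it is one of the allowed
// strings, otherwise null. It never modifies the input.
function filter_choice($tainted, array $allowed): ?string
{
    // The third argument enables strict comparison, so an integer, a
    // boolean, or an array can never match one of the allowed strings.
    return in_array($tainted, $allowed, true) ? $tainted : null;
}

// Simulated request data; in process.php this would be $_POST.
$post = ['color' => 'purple'];

$clean = array();

$color = filter_choice($post['color'] ?? null, ['red', 'green', 'blue']);

if ($color !== null) {
    $clean['color'] = $color;
}

var_dump(isset($clean['color'])); // bool(false)
```

Because 'purple' is not in the whitelist, nothing is stored in $clean, and the rest of the application never sees the tainted value.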

Your filtering logic depends entirely upon the type of data you’re inspecting, and the more restrictive you can be, the better. For example, consider a registration form that asks the user to provide a desired username. Clearly, there are many possible usernames, so the previous example doesn’t help. In these cases, the best approach is to filter based on format. If you want to require a username to be alphanumeric (consisting of only alphabetic and numeric characters), your filtering logic can enforce this:

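$clean = array();

// A missing field becomes an empty string, which ctype_alnum() rejects.
$username = $_POST['username'] ?? '';

// Request fields can arrive as arrays, so confirm the type before checking the format.
if (is_string($username) && ctype_alnum($username)) {
	$clean['username'] = $username;
}
else {
	/* ERROR */
}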

Of course, this doesn’t ensure any particular length. Use mb_strlen() to inspect a string’s length and enforce a minimum and maximum:

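$clean = array();

$username = $_POST['username'] ?? '';

// mb_strlen() expects a string, so treat any other type as zero length.
$length = is_string($username) ? mb_strlen($username) : 0;

if (($length > 0) && ($length <= 32) && ctype_alnum($username)) {
	$clean['username'] = $username;
}
else {
	/* ERROR */
}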
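The format and length rules can also be combined in a single anchored regular expression. The following is a minimal sketch under that assumption; the helper name filter_username() is hypothetical, and the pattern restricts input to ASCII letters and digits.

```php
<?php
// Hypothetical helper: returns the username if it consists of 1 to 32
// ASCII letters or digits, otherwise null.
function filter_username($tainted): ?string
{
    // Request fields can arrive as arrays (for example username[]=x),
    // so check the type before applying the pattern.
    if (!is_string($tainted)) {
        return null;
    }

    // \A and \z anchor the pattern to the entire string, and {1,32}
    // enforces the minimum and maximum length.
    return preg_match('/\A[A-Za-z0-9]{1,32}\z/', $tainted) === 1 ? $tainted : null;
}

var_dump(filter_username('alice42'));           // string(7) "alice42"
var_dump(filter_username(''));                  // NULL
var_dump(filter_username('bob smith'));         // NULL
var_dump(filter_username(str_repeat('a', 33))); // NULL
```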

Frequently, the characters you want to allow don’t all belong to a single group (such as alphanumeric), and this is where regular expressions can help. For example, consider the following filtering logic for a last name:

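$clean = array();

$last_name = $_POST['last_name'] ?? '';

// Reject non-string input and any string containing a character outside the whitelist.
if (!is_string($last_name) || preg_match('/[^A-Za-z \'\-]/', $last_name)) {
	/* ERROR */
}
else {
	$clean['last_name'] = $last_name;
}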

This only allows alphabetic characters, spaces, hyphens, and single quotes (apostrophes), and it uses a whitelist approach as described earlier. In this case, the whitelist is the list of valid characters.
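Note that a negated character class only detects forbidden characters, so an empty string also passes the check above. One possible variation, sketched here with a hypothetical helper named filter_last_name(), matches the whitelist positively and requires at least one character:

```php
<?php
// Hypothetical helper: returns the last name if it contains only letters,
// spaces, hyphens, and apostrophes, and is not empty; otherwise null.
function filter_last_name($tainted): ?string
{
    if (!is_string($tainted)) {
        return null;
    }

    // The + quantifier requires at least one allowed character, and
    // \A ... \z ensure the whole string is checked.
    return preg_match('/\A[A-Za-z \'\-]+\z/', $tainted) === 1 ? $tainted : null;
}

var_dump(filter_last_name("O'Reilly"));      // string(8) "O'Reilly"
var_dump(filter_last_name('Smith-Jones'));   // string(11) "Smith-Jones"
var_dump(filter_last_name('Smith<script>')); // NULL
var_dump(filter_last_name(''));              // NULL
```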

In general, filtering is a process that ensures the integrity of your data. Although filtering alone can prevent many web application security vulnerabilities, most are due to a failure to escape data, and neither is a substitute for the other.
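To illustrate why filtering does not replace escaping, consider a last name that passes the filter yet still contains a character with special meaning in HTML. This sketch uses htmlspecialchars() for the HTML context; the $html array is an assumed naming convention for data that has been both filtered and escaped for HTML output.

```php
<?php
// A value that has already passed the last-name filter.
$clean = array();
$clean['last_name'] = "O'Reilly";

// Escape for the HTML context at output time. $html is an assumed name
// for "filtered and escaped for HTML".
$html = array();
$html['last_name'] = htmlspecialchars($clean['last_name'], ENT_QUOTES, 'UTF-8');

echo "<input type='text' name='last_name' value='{$html['last_name']}' />", PHP_EOL;
// Prints: <input type='text' name='last_name' value='O&#039;Reilly' />
```

Without the escaping step, the apostrophe would end the attribute value early. The same filtered value would need different treatment in other contexts, for example a bound parameter in a prepared SQL statement.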

If you liked this article, please share it so your friends can learn about PHP security. Please leave any suggestions or questions in the comments.

Thanks to Kevin Tatroe, Peter MacIntyre, and Rasmus Lerdorf. Special thanks to O’Reilly.