CGI scripting made marginally safer

[Edit: there is an earlier version available here]
At the moment, a massive amount of "blog spam" is being generated. There is also a widespread problem with (at least) PHP-based scripts having security holes. One of the reasons is that CGI parameters have static names, which makes it easy to write exploit scripts that spam blogs or subvert remote code.

Here I will explore some methods to make this slightly harder.

Before I go much further, I should explain that in this essay I will use the term "CGI" to refer to any web page generated by code on the server side. I will refer to the parameters sent in a GET or POST as "CGI parameters". This might not be technically correct in every single case, but the method for handling said parameters on the server side will most probably not differ noticeably, even if they're served by mod_{perl,php,python,lisp}.

Solution the first

The first obvious solution is to randomize parameter names. This will always work, but it imposes some hard restrictions on the code (the randomization would have to happen on build/install of the code) and on pluggability. Most CGIs of this kind are just part of a framework, so it would require touching multiple files and would make installing something later more than just a little painful. Also, though of minor consequence for all but the developer(s), code readability suffers.

Thus, completely random parameter names aren't really an option in the long run.

Solution the second

In brief

Luckily, there is a method to combine the goodness of the first solution with the convenience of "predictable" parameter names. With judicious use of one-way hashes, we can have per-installation "random" parameters, without the hassle of complete source-code mangling on install and still have the ability to have plug-in modules (as long as they obey the same mangling protocol).

The simplest widely-available hash algorithm suitable for our purpose is the unix crypt algorithm (the basic, DES-based, password hasher). It's not the be-all and end-all of cryptographic hashes, but it's easy to work with and for our purposes, that is good enough.

The method

What we want to accomplish is to have the ability to map a descriptive name in the code to a token used in a web page and in subsequent requests. There is a small risk of collisions, since the DES-based crypt only looks at the low seven bits of (up to) the first eight characters of the code-internal descriptive name. I will provide example source code in Python and in C (the C code will target my cgilib CGI interaction library). Adapting this to any other C CGI library should be fairly simple, though.
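
To make the collision risk concrete, here is a small Python sketch of which part of a name a traditional DES-based crypt actually looks at. The helper name effective_key is made up for illustration; it only models the truncation and does not compute a hash.

```python
def effective_key(name: str) -> bytes:
    """Return the key material DES-based crypt would see for `name`:
    the low seven bits of each of (up to) the first eight bytes."""
    raw = name.encode("utf-8")[:8]
    return bytes(b & 0x7F for b in raw)


# Two descriptive names that share their first eight characters collide.
print(effective_key("comment_text") == effective_key("comment_title"))  # True

# Names that differ within the first eight characters do not.
print(effective_key("name") == effective_key("email"))  # False
```

In practice this means the descriptive names used in the code should differ somewhere in their first eight characters.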

To accomplish this, we need one secret shared by every part of our CGI application bundle (I will call this salts). It is simply a string of random characters, generated at install time. Build scripts must be adapted to generate and/or change it on rebuild, since it absolutely needs to be installation-specific.

Each time we need to print the name of a parameter (as part of the generated HTML), we will feed it through a cryptographic transformation. Likewise, every time we need to find the value of a submitted parameter, we will feed it through one (or more) cryptographic transformations.
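
A minimal Python sketch of both directions follows. Note that it swaps in HMAC-SHA256 from the standard library in place of crypt, since Python's crypt module was removed in Python 3.13. The names SALTS, mangle and lookup are illustrative only and are not taken from cgilib or from the original source code.

```python
import hashlib
import hmac

# Assumption: a real installation generates this value at install time.
SALTS = b"example-installation-secret"


def mangle(name: str, salts: bytes = SALTS) -> str:
    """Map a descriptive parameter name to its per-installation token."""
    digest = hmac.new(salts, name.encode("utf-8"), hashlib.sha256).hexdigest()
    # Leading letter keeps the token usable as a form field name;
    # 16 hex digits keep it short while making collisions unlikely.
    return "p" + digest[:16]


def lookup(submitted: dict, name: str, default=None):
    """Fetch the value that was submitted under the mangled form of `name`."""
    return submitted.get(mangle(name), default)


# Emitting a form field, then reading it back from a simulated request.
field = '<input type="text" name="%s">' % mangle("comment")
request = {mangle("comment"): "hello"}
print(lookup(request, "comment"))  # hello

# A spam script posting to the well-known static name gets nowhere.
print(lookup({"comment": "spam"}, "comment"))  # None
```

The rest of the application keeps using the descriptive name "comment"; only the HTML output and the request lookup ever see the token.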

Some details

The only non-trivial part is making sure that any transformation is done identically every other time we need it, so that the semantics of whatever code actually uses our little library are preserved (this minimizes adoption cost).

The slightly-annotated C source code is available for viewing. It should be fairly obvious what is happening.

We can do the same type of transformation in Python, where we can easily leverage the built-in hash tables ("dictionaries") and thus avoid O(n) searches whenever we need to generate a new memoized transformation.
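
As a rough sketch of the memoization idea, the class below caches each token in a dictionary the first time it is computed. It is not the original Python code: the class name ParamNames is hypothetical, and HMAC-SHA256 stands in for crypt (which is no longer in the Python standard library as of 3.13).

```python
import hashlib
import hmac


class ParamNames:
    """Memoized mapping from descriptive names to per-installation tokens."""

    def __init__(self, salts: bytes):
        self._salts = salts
        self._cache = {}  # descriptive name -> token

    def token(self, name: str) -> str:
        # Dictionary lookup is O(1) on average, so repeated use of the
        # same name never recomputes the hash or scans a list.
        if name not in self._cache:
            digest = hmac.new(
                self._salts, name.encode("utf-8"), hashlib.sha256
            ).hexdigest()
            self._cache[name] = "p" + digest[:16]
        return self._cache[name]


names = ParamNames(b"example-installation-secret")
first = names.token("email")
second = names.token("email")
print(first == second)     # True
print(len(names._cache))   # 1
```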

Generating a shared salts secret

The main difficulty of this method is generating the shared secret. This can be accomplished in several ways, though one that borders on "sufficiently easy" is to simply generate random data (under Linux, I prefer using dd and /dev/random) and feed it through a MIME base64 encoder. This gives you data with good randomness, it is easy to do, and I can't off-hand see any drawbacks.
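
For installations where a shell pipeline is inconvenient, roughly the same thing can be done from Python. This is a sketch under the assumption that the operating system's random source, as exposed by os.urandom, is an acceptable stand-in for reading /dev/random with dd; the function name make_salts is made up for this example.

```python
import base64
import os


def make_salts(nbytes: int = 32) -> str:
    """Return `nbytes` of OS-provided random data, base64-encoded."""
    return base64.b64encode(os.urandom(nbytes)).decode("ascii")


# Each call yields a fresh secret; run once per installation.
print(make_salts())
```

Base64 output contains only letters, digits, "+", "/" and "=", so it can be pasted into a C or Python string literal without escaping.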

For C, this should then be inserted into the template

char salts[]= "<random data goes here>";
in a separate file and this file included in the relevant place(s) in the codebase needing access to it. A similar method (using import) can be done in Python.
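
For the Python case, one possible arrangement is for the install script to write a tiny module holding the secret, which the application then loads. The sketch below is an assumption about how that could look, not the author's build script; the file name salts.py and both function names are illustrative.

```python
import base64
import importlib.util
import os
import tempfile


def write_salts_module(path: str) -> None:
    """Install-time step: write a module defining `salts`."""
    secret = base64.b64encode(os.urandom(32)).decode("ascii")
    with open(path, "w", encoding="ascii") as fh:
        fh.write("# Generated at install time; keep private.\n")
        fh.write("salts = %r\n" % secret)


def load_salts(path: str) -> str:
    """Run-time step: load the generated module and return its secret."""
    spec = importlib.util.spec_from_file_location("salts", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.salts


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "salts.py")
    write_salts_module(target)
    print(len(load_salts(target)))  # 44
```

When the generated file sits on the normal module search path, a plain "from salts import salts" in the application does the same job as load_salts.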

Possible downsides of this method

This is not a be-all and end-all of CGI security.

Easily-identified problems are:

On the other hand, even wide adoption of this should add little to run times and CPU usage. It is easy to wrap up in a library, and it raises the bar to exploitation noticeably.

Even so, it is no replacement for all other security measures (input validation, submission throttling, IP-based bans, sensible authentication mechanisms), but as an easily rolled-out measure, it should do some good.

The author

Ingvar Mattsson can be reached as < ingvar -at- hexapodia . net >. His day-job is as senior network specialist at a UK ISP. Previous jobs have included designing secure EDI infrastructures for public healthcare.

This is one of Ingvar's essays
