CGI scripting made marginally safer
[Edit: there is an earlier version available here]
There is at the moment a massive amount of "blog spam" being generated.
There's also a widespread problem with (at least) PHP-based scripts having
security problems. One of the reasons is that CGI parameters have
static names, making it easy to write exploit scripts to spam blogs or
subvert remote code.
Here I will explore some methods to make this slightly harder.
Before I go much further, I will explain that in this essay I will use
the term "CGI" to refer to any web page generated by code on the server
side. I will refer to the parameters sent in a GET or POST as "CGI
parameters". This might not be technically correct in every single
case, but the method to handle said parameters on the server side will
most probably not differ noticeably, even if they're served by
mod_{perl,php,python,lisp}.
Solution the first
The first obvious solution is to randomize parameter names. This will
always work, but imposes some hard restrictions on the code (the
renaming would have to happen when the code is built or installed) and
on pluggability.
Most CGIs of this kind are just part of a framework, so it'd require
touching multiple files and it'd make installing something later more
than just a little painful. Code readability also suffers, though that
is of minor consequence for all but the developer(s).
Thus, completely random parameter names aren't really an option in the
long run.
Solution the second
In brief
Luckily, there is a method to combine the goodness of the first
solution with the convenience of "predictable" parameter names. With
judicious use of one-way hashes, we can have per-installation "random"
parameters, without the hassle of complete source-code mangling on
install and still have the ability to have plug-in modules (as long as
they obey the same mangling protocol).
The simplest widely-available hash algorithm suitable for our purpose
is the unix crypt algorithm (the basic, DES-based, password
hasher). It's not the be-all and end-all of cryptographic hashes, but
it's easy to work with and for our purposes, that is good enough.
The method
What we want to accomplish is to have the ability to map a descriptive
name in the code to a token used in a web page and in subsequent
requests. There is a small risk of collisions, since crypt only uses
the low seven bits of the (up to) eight first characters of the
code-internal descriptive name. I will provide example source code in
Python and in C (the C code will target my cgilib CGI interaction
library). Adapting this to any other C CGI library should be fairly
simple, though.
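To make the collision risk concrete, here is a minimal Python sketch. It is not the essay's C or Python code. The helper names (crypt_effective_key, mangle) and the SALTS value are hypothetical, and HMAC-SHA256 from the standard library is swapped in for crypt, since the crypt module was removed from Python's standard library in 3.13. The truncation step mimics what DES-based crypt does to its input.

```python
import hashlib
import hmac

# Hypothetical per-installation secret; a real one is generated at install time.
SALTS = "bm90IGEgcmVhbCBzZWNyZXQ"


def crypt_effective_key(name):
    """Return the part of `name` that DES-based crypt actually looks at:
    the low seven bits of each of the first eight bytes."""
    return bytes(b & 0x7F for b in name.encode("utf-8")[:8])


def mangle(name, secret=SALTS):
    """Map a descriptive name to a per-installation token.

    HMAC-SHA256 stands in for crypt here (an assumption, not the
    essay's implementation); the input is truncated the way crypt would.
    """
    mac = hmac.new(secret.encode("ascii"), crypt_effective_key(name),
                   hashlib.sha256)
    # Leading letter keeps the token usable as a form field name.
    return "p" + mac.hexdigest()[:12]
```

With this truncation, "comment_author" and "comment_body" share the same first eight characters and so map to the same token, which is the collision the text warns about. Descriptive names should therefore differ within their first eight characters.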
To accomplish this, we will need one secret shared between the parts
of our CGI application bundle (I will call this salts). This is simply
a string of random characters, generated at install time (build
scripts must be adapted to generate and/or change this on rebuild,
since it absolutely needs to be installation-specific).
Each time we need to print the name of a parameter (as part of the
generated HTML), we will feed it through a cryptographic
transformation. Likewise, every time we need to find the value of a
submitted parameter, we will feed it through one (or more)
cryptographic transformations.
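The C source is not reproduced here, so the following Python sketch is only one plausible reading of "one (or more) transformations": a two-character salt is picked at random from the characters of salts when a name is printed, and on submission every candidate salt is tried until a matching token is found. All names (mangle, candidate_salts, emit_name, lookup) and the SALTS value are hypothetical, and HMAC-SHA256 again stands in for crypt.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-installation secret; a real one is generated at install time.
SALTS = "bm90IGEgcmVhbCBzZWNyZXQ"


def mangle(name, salt, secret=SALTS):
    """Token for `name` under a given two-character salt (HMAC stands in for crypt)."""
    digest = hmac.new(secret.encode("ascii"),
                      (salt + ":" + name).encode("utf-8"),
                      hashlib.sha256).hexdigest()
    # Like crypt output, the salt is carried at the front of the token.
    return salt + digest[:11]


def candidate_salts(secret=SALTS):
    """Every two-character salt that can be built from characters of the secret."""
    chars = set(secret)
    return {a + b for a in chars for b in chars}


def emit_name(name, secret=SALTS):
    """Called when printing a form field: pick a random salt and mangle the name."""
    salt = secrets.choice(secret) + secrets.choice(secret)
    return mangle(name, salt, secret)


def lookup(submitted, name, secret=SALTS):
    """Called on submission: try each candidate salt until a token matches."""
    for salt in candidate_salts(secret):
        token = mangle(name, salt, secret)
        if token in submitted:
            return submitted[token]
    return None
```

A page would print something like `<input name="...">` using emit_name("author") and the handler would call lookup(form, "author"). A request that uses the plain name "author" finds nothing.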
Some details
The only non-trivial part is to make sure that any transformation is
done identically every other time we need it, to preserve the
semantics of whatever code actually uses our little library (this
minimizes adoption cost).
The slightly-annotated C source code is
available for viewing. It should be fairly obvious what is happening.
Similarly, we can do the same type of transformation in Python, though
in the Python case we can easily leverage the built-in hash tables
("dictionaries") and thus avoid O(n) searches whenever we need to
generate a new memoized transformation.
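A minimal sketch of what dictionary-based memoization could look like in Python follows. The class ParamMangler and its methods are hypothetical names, not the essay's code, and HMAC-SHA256 is used in place of crypt as an assumption. Two dictionaries give constant-time lookups in both directions.

```python
import hashlib
import hmac


class ParamMangler:
    """Memoized mapping between descriptive names and mangled tokens."""

    def __init__(self, secret):
        self._key = secret.encode("utf-8")
        self._to_token = {}   # descriptive name -> token
        self._to_name = {}    # token -> descriptive name

    def token_for(self, name):
        """Return the token for `name`, computing it only on first use."""
        token = self._to_token.get(name)
        if token is None:
            digest = hmac.new(self._key, name.encode("utf-8"),
                              hashlib.sha256).hexdigest()
            token = "p" + digest[:12]
            self._to_token[name] = token
            self._to_name[token] = name
        return token

    def name_for(self, token):
        """Return the descriptive name for a known token, or None."""
        return self._to_name.get(token)

    def demangle(self, submitted, known_names):
        """Translate a submitted {token: value} dict back to {name: value},
        silently dropping anything that is not one of our tokens."""
        for name in known_names:
            self.token_for(name)
        return {self._to_name[t]: v
                for t, v in submitted.items() if t in self._to_name}
```

The rest of the application can keep using descriptive names: it asks token_for() when printing HTML and demangle() when reading a request.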
Generating a shared salts secret
The main difficulty of this method is to generate a shared
secret. This can be accomplished in several ways, though one that is
bordering on "sufficiently easy" is to simply generate random data
(under Linux, I prefer using dd and /dev/random) and feed it through a
MIME base64-encoder. This gives you data with good randomness, it's
easy to do and I can't off-hand see any drawbacks.
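The same idea can be sketched in Python without shelling out to dd. This assumes the standard secrets module (which draws from the operating system's random source) is an acceptable substitute for reading /dev/random directly; the function name generate_salts and the default of 32 bytes are my own choices for illustration.

```python
import base64
import secrets


def generate_salts(nbytes=32):
    """Return `nbytes` of OS-provided randomness, base64-encoded as text."""
    raw = secrets.token_bytes(nbytes)
    return base64.b64encode(raw).decode("ascii")
```

Calling generate_salts() once at install time yields a printable string that can be pasted into a configuration file.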
For C, this should then be inserted into the template
char salts[]= "<random data goes here>";
in a separate file, and this file included in the relevant place(s) in the
codebase needing access to it. A similar method (using import) can be used
in Python.
Possible downsides of this method
This is not a be-all and end-all of CGI security.
Easily-identified problems are:
- We use crypt. This is a one-way hash whose reversal has had some
serious study.
- We use rand() to generate random numbers. This is not ideal, but
it's fast.
- It's non-trivial to generate sufficiently-random salts secrets
and keep tabs on what needs access to them and what shouldn't have
access to them.
- If the shared secret is too short, it will be trivial to exhaust
the parameter space by repeated requests and data scraping. If it is
too long, all possible combinations of salt will be available and it
will at that point be useless. It's currently unclear what a good
trade-off is.
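The trade-off in the last item can be put into numbers with a small Python sketch. DES-based crypt takes a two-character salt from a 64-character alphabet, so there are 4096 possible salts in total. The sketch assumes, as an illustration only, that a salt is any ordered pair of characters occurring in the shared secret; the function names are hypothetical.

```python
import string

# The 64 characters DES-based crypt accepts in a salt.
SALT_ALPHABET = "./" + string.digits + string.ascii_uppercase + string.ascii_lowercase
TOTAL_SALTS = len(SALT_ALPHABET) ** 2  # 64 * 64 = 4096


def available_salts(secret):
    """Number of two-character salts that can be built from the secret's
    distinct, crypt-valid characters (assumed selection scheme)."""
    usable = set(secret) & set(SALT_ALPHABET)
    return len(usable) ** 2


def coverage(secret):
    """Fraction of the whole crypt salt space reachable from this secret."""
    return available_salts(secret) / TOTAL_SALTS
```

A secret with only two distinct characters allows four salts, which is quickly exhausted by scraping. A secret containing all 64 characters reaches every possible salt, so the secret no longer narrows anything down.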
On the other hand, even wide adoption of this should not add much to
run times or CPU usage. It's easy to wrap up in a
library. It raises the bar to exploitation noticeably.
Even at that, it is no replacement for all other security measures
(input validation, submission throttling, IP-based bans, sensible
authentication mechanisms), but as an easily rolled-out measure, it
should do some good.
The author
Ingvar Mattsson can be reached as <ingvar -at- hexapodia . net>.
His day-job is as senior network specialist at a UK
ISP. Previous jobs have included designing secure EDI infrastructures
for public healthcare.
This is one of Ingvar's essays