How can I avoid buffer overflows with C?
June 9, 2007 6:05 AM   Subscribe

Robust usage of sscanf and sprintf - how to avoid buffer overflows?

I am sure this is embarrassingly simple, but all the tutorials I read always gloss over this aspect of sscanf and sprintf. How do I go about ensuring I don't encounter buffer overflows when using these two functions? I've read links like these that suggest truncating the data, but I don't want to do this. This link suggests using the "a" flag, but also says it's a GNU-only addition.

How do programmes that don't have fixed-length strings operate? Surely our chat clients don't use fixed-length buffers? Help, Mefi!
posted by PuGZ to Computers & Internet (23 answers total)
I forgot an example!
char cmd[1024];
sprintf(cmd, "INSERT INTO users (user, time) VALUES ('%s', %d);", buddy->name, time);
Clearly, there's a possibility that I might use more than 1023 characters. What should I do in such a circumstance?
posted by PuGZ at 6:08 AM on June 9, 2007

Check sizes of name & time and reject if too large before you do these kinds to things? Dyanamically allocate "cmd" base on those sizes? Either seem safer.
posted by R343L at 6:15 AM on June 9, 2007

I'd wrap the whole thing inside an if statement checking before calling the sprintf.

The other option is to move to c++ and use a resizeable container like string.
posted by handee at 6:32 AM on June 9, 2007

I'd avoid sprintf and sscanf altogether unless I was just writing a quick hack. Most *printf's allow a maximum field width that you could use to avoid buffer overflows. As an example:

printf("%0.8s", "hello, world this is a long string");

will have it's output limited to 8 characters. The problem with this approach is that it silently truncates the string.
posted by rdr at 6:34 AM on June 9, 2007

"Surely our chat clients don't use fixed-length buffers?"

Why not? Protocols like this invariably have specific limits on field sizes. You don't want 1k usernames and the utility of even >4k message bodies is generally dubious.

"char cmd[1024];

sprintf(cmd, "INSERT INTO users (user, time) VALUES ('%s', %d);", buddy->name, time);"

The FreeBSD sprintf manpage suggests:

char cmd[1024];
snprintf(cmd, sizeof(cmd), "INSERT INTO users (user, time) VALUES ('%s', %d);", buddy->name, time);

You can check the return value fits in cmd if you want to handle inputs too large to fit as an error condition.

You can also use "%500s" or so in your format string to limit to a specific number of characters (this applies to *scanf() too). Of course in this case using bind parameters is more sensible; this should be your SQL library's job, not yours.

There's also asprintf(), which is a GNU libc extension, but also found in some other systems, and you can always include your own to link with if necessary. This will make its own buffer big enough to fit everything in, which may or may not be a good idea.
posted by Freaky at 6:46 AM on June 9, 2007

Oops, "%.500s" or so. Of course your unit tests will spot silly errors like this ;)
posted by Freaky at 6:50 AM on June 9, 2007

Doesn't basically everything have snprintf() by now?

sscanf() lets you put a maximum field width, like "%10s".

Also, I like to use my own modified strcpy that's like strncpy but guarrantees termination.
posted by sfenders at 6:52 AM on June 9, 2007

Are you aware that the code snippet that you've posted also suggests that SQL injection attacks are unknown to you?

How do programmers without fixed length strings operate? Simple, they don't use sprintf() and friends when dealing with data from potentially untrusted sources. Most of them adopt a belt and braces approach and don't use sprintf() at all. Or they use a decent string handling library in the first place, probably in a language other than C.

If you're committee to using C then in general, you're better off using functions which are guaranteed not to cause buffer overflows in the first place: otherwise at some point you will make a mistake and open yourself up to security problems.

Oh, and if you aren't already aware of them then please go away and read up on SQL injection attacks & how to prevent them. A hint: sprintf(ptr,"SELECT * FROM table WHERE name = '%s' ;",string) will allow an attacker to destroy your database if they can control the content of "string" (so long as the user id you use to access the database and execute SQL statements under has the power to alter data of course).
posted by pharm at 6:53 AM on June 9, 2007

Argh: s/committee/committed/
posted by pharm at 7:00 AM on June 9, 2007

I've read links like [one that suggests snprintf] that suggest truncating the data, but I don't want to do this.

Oh, right. Well, the idea is that you make the size large enough that the data will never be truncated. If that's not possible it's necessary to either use one of the many libraries of string functions that handles it for you, or calculate the required size before you allocate space.
posted by sfenders at 7:01 AM on June 9, 2007

The database already has fixed field lengths, anyway. So using fixed-length buffers isn't so bad. I think the simplest way to do it, which might be acceptable if this isn't a really serious project but does need to work, is:

int sqlclean(char *s, int n)
// scan string for sql injection attacks
// truncate string at n characters.
char cmd[SQL_BUFFER_SIZE];

if (snprintf(cmd, SQL_BUFFER_SIZE, "INSERT INTO users (user, time) VALUES ('%s', %d);", sqlclean(buddy->name, FIELD_SIZE), time)>=SQL_BUFFER_SIZE)
// your sql command buffer is too small.

If it was a larger project you'd probably want to have table structure data and functions that use it to construct the sql commands in a more general way, so you could call db_insert(object) where object could be any of the various things you need to insert into the database. Or something like that.
posted by sfenders at 7:33 AM on June 9, 2007

I meant char *sqlclean(), of course.
posted by sfenders at 7:36 AM on June 9, 2007

And it would be better to call it from somewhere else, naturally.

If it's a XMPP chat client for example, the name is coming from an xml parser. Anyway, its source should most likely be something way more complicated than scanf().
posted by sfenders at 7:41 AM on June 9, 2007

If you can't truncate user input at all, then your best bet would be to allocate the buffer on the heap based on the length of user input. i.e. something like this:

char *cmd = malloc(sizeof(char) * (strlen(buddy->name)+some_number);
if(cmd) sprintf(cmd,...); else die_badly('Out of memory!')

As for the SQL injection, most databases provide built-in escaping in their client libraries, generally in the form of a parameterized query call that (much like sprintf) takes a format string followed by a variable number of other arguments. But you don't need to allocate a buffer and you don't need to worry about escaping user data.

All the same, the original question was more general than this particular SQL example, so in other situations, the above should help you. Just remember: every malloc() should have a free()!
posted by goingonit at 7:58 AM on June 9, 2007

There's never an excuse to use either sprintf or scanf.

A correct and portable way to printf an text of unknown length is the make_message function in the Linux snprintf manpage.
posted by reynaert at 8:25 AM on June 9, 2007

You need a proper string library from the sounds of it. As quirky as it sounds, the Tcl library has a good one (called Tcl_DString), which is fairly easy to extract from the rest of the language. It doesn't have a printf function built in, but it's trivially easy to add your own. I recommend Tcl 7.6 because they later versions are all full of garbage that you don't need.

I suspect there are other libraries out there that might give you what you need with even less effort.
posted by jewzilla at 8:26 AM on June 9, 2007

Safe C String Library.
posted by scalefree at 9:57 AM on June 9, 2007

Even snprintf() is not safe, because it does not guarantee that it terminates the output string with NULL if it reached the end of the buffer before finishing output. And hence there is no way to tell if NULL must be appended, since strlen() afterwards is not safe. It's a real mess.

vasnprintf is what you want. It's LGPL.
posted by Rhomboid at 1:51 PM on June 9, 2007

Hi guys,

To those pointing out parameterised query calls and SQL injection: you're entirely correct. I am aware of these (and the utmost importance of preventing the latter) and so perhaps this was a bad example!

The responses have more-or-less summed up what I thought in the first place. Thanks for the confirmation that I hadn't overlooked anything and for a round-up of the situation. All excellent answers!
posted by PuGZ at 5:21 PM on June 9, 2007

Rhomboid: What crappy snprintf() implementations don't null terminate?

The snprintf() and vsnprintf() functions will write at most size-1 of the
characters printed into the output string (the size'th character then
gets the terminating `\0'); if the return value is greater than or equal
to the size argument, the string was too short and some of the printed
characters were discarded. The output is always null-terminated.

Let me guess, GPL ones again? ;)

Not exactly hard to ensure there is a NULL anyway, since you already know how big the buffer can possibly be. And if the libc string functions are broken, you can always link in your own working functions.
posted by Freaky at 6:18 AM on June 10, 2007

You can call them crappy, but it is the standard bevavior and if you write code that assumes the nonstandard quoted behavior, you are in for a world of hurt. I checked both glibc and MSVC++ and neither of them work that way. You're apparently quoting a BSD-ish implementation. It's nice that they would be security minded in that way, but since this is a traditional standard function that has dozens of implementations on dozens of platforms it's a wasted effort, since no portable program could ever get away with using snprintf and assuming that behavior.

And if the libc string functions are broken, you can always link in your own working functions.

Here "broken" would be "almost any libc in the world except for whatever pet one you're referring to." And if you're going to use a replacement function, then you might as well drop the outdated snprintf interface and use something more modern that doesn't require the inefficiency of going through the entire output/parsing process twice if the buffer was too small the first time, which is exactly what I was advocating when suggesting vasnprintf.
posted by Rhomboid at 7:13 AM on June 10, 2007

Actually they're C99ish:

Page 290/291: The snprintf function .. output characters beyond the n-1st are
discarded rather than being written to the array, and a null character is written at the end
of the characters actually written into the array.
The snprintf function returns the number of characters that would have been written
had n been sufficiently large, not counting the terminating null character, or a neg ative
value if an encoding error occurred. Thus, the null-terminated output has been
completely written if and only if the returned value is nonnegative and less than n.

Were they in any actual standard prior to C99? And defined differently, or just ad-hoc?
posted by Freaky at 8:25 AM on June 10, 2007

Er, page 302/303, sorry.
posted by Freaky at 8:29 AM on June 10, 2007

« Older Haxx0red?   |   Do you know what this game is that I found at a... Newer »
This thread is closed to new comments.