Git string manipulation functions wrong?

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Git string manipulation functions wrong?
@ 2007-05-21 13:11 Erik Mouw
  2007-05-21 14:36 ` Petr Baudis
  0 siblings, 1 reply; 4+ messages in thread
From: Erik Mouw @ 2007-05-21 13:11 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 896 bytes --]

Hi,

I got this forwarded from a friend who is subscribed to the Dovecot
mailing lists (dovecot is a pop3/imap server).

  http://www.dovecot.org/list/dovecot/2007-May/022853.html
  http://www.dovecot.org/list/dovecot/2007-May/022856.html

The Dovecot author claims there are "basic string manipulation errors"
in the git code and that's a reason for him not to use git.

I can see his problem with *snprintf() functions in the case where the
amount of output is larger than the buffer size: *snprintf() will
return the number of characters written if there would have been enough
space to write them, which will lead to problems with code like "len +=
snprintf(buf, max, bla, ...)". I don't see his problems with strncpy(),
though.

Erik

-- 
They're all fools. Don't worry. Darwin may be slow, but he'll
eventually get them. -- Matthew Lammers in alt.sysadmin.recovery

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Git string manipulation functions wrong?
  2007-05-21 13:11 Git string manipulation functions wrong? Erik Mouw
@ 2007-05-21 14:36 ` Petr Baudis
  2007-05-21 14:59   ` Karl Hasselström
  0 siblings, 1 reply; 4+ messages in thread
From: Petr Baudis @ 2007-05-21 14:36 UTC (permalink / raw)
  To: Erik Mouw; +Cc: git

On Mon, May 21, 2007 at 03:11:03PM CEST, Erik Mouw wrote:
> Hi,
> 
> I got this forwarded from a friend who is subscribed to the Dovecot
> mailing lists (dovecot is a pop3/imap server).
> 
>   http://www.dovecot.org/list/dovecot/2007-May/022853.html
>   http://www.dovecot.org/list/dovecot/2007-May/022856.html
> 
> The Dovecot author claims there are "basic string manipulation errors"
> in the git code and that's a reason for him not to use git.
> 
> I can see his problem with *snprintf() functions in the case where the
> amount of output is larger than the buffer size: *snprintf() will
> return the number of characters written if there would have been enough
> space to write them, which will lead to problems with code like "len +=
> snprintf(buf, max, bla, ...)". I don't see his problems with strncpy(),
> though.

It's the opposite for me - we don't properly set the NUL byte for smoe
of our strncpy() calls, but I don't really see his problem with
snprintf(), we seem to handle its return value correctly everywhere
(except diff.c, but there the buffer sizes should be designed in such a
way that an overflow should be impossible).

-- 
				Petr "Pasky the Sleepy" Baudis
Stuff: http://pasky.or.cz/
Ever try. Ever fail. No matter. // Try again. Fail again. Fail better.
		-- Samuel Beckett

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Git string manipulation functions wrong?
  2007-05-21 14:36 ` Petr Baudis
@ 2007-05-21 14:59   ` Karl Hasselström
  2007-05-23  3:22     ` Kyle Moffett
  0 siblings, 1 reply; 4+ messages in thread
From: Karl Hasselström @ 2007-05-21 14:59 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Erik Mouw, git

On 2007-05-21 16:36:16 +0200, Petr Baudis wrote:

> It's the opposite for me - we don't properly set the NUL byte for
> smoe of our strncpy() calls, but I don't really see his problem with
> snprintf(), we seem to handle its return value correctly everywhere
> (except diff.c, but there the buffer sizes should be designed in
> such a way that an overflow should be impossible).

I think this kind of detailed case-by-case analysis defeats Timo's
point, though: that the C library functions make it too easy to write
bugs. If it's necessary to do non-trivial bounds checking etc. at
every call site, it doesn't really matter if we currently do get them
all right; at some point, we _are_ going to miss one. Instead of using
our collective C-fu to get difficult calls right, we should be using
it to construct string routines that have low enough overhead that
it's lost in the noise, and are dead simple to use (and, of course,
that can be cleanly bypassed in the 1% of cases where it's necessary).

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Git string manipulation functions wrong?
  2007-05-21 14:59   ` Karl Hasselström
@ 2007-05-23  3:22     ` Kyle Moffett
  0 siblings, 0 replies; 4+ messages in thread
From: Kyle Moffett @ 2007-05-23  3:22 UTC (permalink / raw)
  To: Karl Hasselström; +Cc: Petr Baudis, Erik Mouw, git

On Mon, 21 May 2007 16:59:25 +0200 Karl Hasselström <kha@treskal.com> wrote:
> On 2007-05-21 16:36:16 +0200, Petr Baudis wrote:
> > It's the opposite for me - we don't properly set the NUL byte for
> > smoe of our strncpy() calls, but I don't really see his problem with
> > snprintf(), we seem to handle its return value correctly everywhere
> > (except diff.c, but there the buffer sizes should be designed in
> > such a way that an overflow should be impossible).
> 
> I think this kind of detailed case-by-case analysis defeats Timo's
> point, though: that the C library functions make it too easy to write
> bugs. If it's necessary to do non-trivial bounds checking etc. at
> every call site, it doesn't really matter if we currently do get them
> all right; at some point, we _are_ going to miss one. Instead of using
> our collective C-fu to get difficult calls right, we should be using
> it to construct string routines that have low enough overhead that
> it's lost in the noise, and are dead simple to use (and, of course,
> that can be cleanly bypassed in the 1% of cases where it's necessary).

That would be mostly true, except for the fact that without snprintf()
returning how many bytes _would_ have been written, it's much harder to
reliably allocate buffers for the result on the first pass.  For
example, this is a trivial implementation of an function which returns
a freshly-allocated formatted string:

	char *data;
	unsigned long len;
	len = snprintf(NULL, 0, some_fmt, arg1, arg2, arg3);
	if (!len)
		return NULL;
	data = malloc(len+1);
	if (!data)
		return NULL;
	data[len] = '\0';
	snprintf(data, len, some_fmt, arg1, arg2, arg3);
	return data;

You can't do that without a loop if it returns how many bytes were
actually written (although some braindead platforms do that already).
Here's a function which handles both use-cases in an optimal way:

	char *data = NULL;
	unsigned long datalen = 0, len;
	do {
		len = snprintf(data, datalen, some_fmt, arg1, arg2, arg3);
		if (!datalen) {
			datalen = len ? len : 16;
			data = malloc(datalen);
			if (!data)
				return NULL;
		} else if (len >= datalen) {
			void *newmem;
			datalen = (len > datalen)?(len + 1):(datalen +16);
			newmem = realloc(data, datalen);
			if (!newmem) {
				free(data);
				return NULL
			}
		}
	} while (len >= datalen);
	data[len] = '\0';
	return data;

Hopefully, on a nice modern platform, the first iteration will have len
equal to the ideal actual required length and so it will hit the first
case and carefully allocate exactly enough bytes, then on the second
loop through it will fill in exactly the required bytes and return
success.  On one of the abovementioned dain-bramaged systems, this will
loop until snprintf doesn't use all the space in the buffer,
incrementing by some fixed value each time (in this implementation,
16).  It should be obvious that correctly-implemented systems will be
significantly more performant than ones without the useful "feature" of
POSIX-compliance. :-D

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2007-05-23  3:38 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-21 13:11 Git string manipulation functions wrong? Erik Mouw
2007-05-21 14:36 ` Petr Baudis
2007-05-21 14:59   ` Karl Hasselström
2007-05-23  3:22     ` Kyle Moffett

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).