git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Reece Dunn" <msclrhd@googlemail.com>
To: "Timo Sirainen" <tss@iki.fi>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	git@vger.kernel.org
Subject: Re: Buffer overflows
Date: Thu, 30 Aug 2007 23:34:26 +0100	[thread overview]
Message-ID: <3f4fd2640708301534k40f07a1cva90a59d12ace6138@mail.gmail.com> (raw)
In-Reply-To: <6F219888-6F48-4D56-8FA9-BE63EB6E1D95@iki.fi>

On 30/08/2007, Timo Sirainen <tss@iki.fi> wrote:
> On 31.8.2007, at 0.35, Reece Dunn wrote:
>
> >> The problem is that the git code is full of these random cases. It's
> >> simply a huge job to even try to verify the correctness of it. Even
> >> if someone did that and fixed all the problems, tomorrow there would
> >> be new ones because noone bothers to even try to avoid them. So there
> >> really isn't any point in trying to make git secure until the coding
> >> style changes.
> >
> > You don't want a manual check to do these kinds of checks. Not only is
> > it a huge job, you have the human factor: people make mistakes. This
> > is (in part) what the review process is for, but understanding how to
> > identify code that is safe from buffer overruns, integer overflows and
> > the like is a complex task. Also, it may work on 32-bit which has been
> > verified, but not on 64-bit.
> >
> > It would be far better to specify the rules on how to detect these
> > issues into a static analysis tool and have that do the checking for
> > you. Therefore, it is possible to detect when new problems have been
> > added into the codebase. Does sparse support identifying these issues?
>
> Yes, it is a complex task. But if there did exist such a static
> analyzer tool already, it would probably show that half of the strcpy
> () calls (and others) in git are currently unsafe. Wouldn't help all
> that much I think.

Let's see:

symlinks.c(39): In has_symlink_leading_path, the last_symlink input
argument needs to be large enough to hold MAX_PATH characters, as this
is sizeof(path). This is a potential risk, depending on how this
method is used.

sideband.c(17): buf is large enough to copy the "remote:" string
literal into. Also, the buffer is large enough for the input being
read, as packet_read_line specifies the size of the remaining buffer
with an extra character reserved for the null. Therefore, this usage
is safe.

sha1_file.c(550): open_pack_index does a runtime calculation that
essentially results in strcpy( ".pack", ".idx" ). As these are
literals, this usage is safe, provided that strlen(idx_name) >
strlen(".pack") which should be ensured by the caller. Hence, this
usage is safe.

So far, these are looking safe to me. I'd need to check the remaining
uses to be sure and to find the strcpy usage that you have identified
as an issue.

It would also be worth creating a test case that triggers this bug, as
that can then be used to ensure that it does not reappear once fixed.
That is the best way to deal with these, as not all uses of "unsafe"
API are actually unsafe.

> >> The code should be easy to verify to be secure, and with some kind of
> >> a safe string API it's a lot easier than trying to figure out corner
> >> cases where strcpy() calls break.
> >
> > Why is it easier? If you have a fixed-size buffer, why not use
> > strncpy, which is what a safe string API is essentially doing anyway?
>
> Well, strncpy() is a pretty good example actually. A lot of people
> use it wrong, because they don't realize that it doesn't necessarily
> NUL-terminate the strings. So it's another example of a bad API that
> can be easily used wrong. And besides that, it also fills the rest of
> the buffer with NULs, which is almost always pointless waste of CPU.

Ok, I take your point. However, there is nothing preventing these API
from being used correctly. Granted, this is more work and like the
strncpy example, have subtle behaviour, but that does not make it
impossible.

As an example, do your safe API do null pointer checks. This is
because strcpy, strlen and the like don't, which is one of the reasons
why they are considered unsafe. But then, if you guarantee that you
are not passing a null pointer to one of these API, why take the hit
of the additional checks when you know that these are safe. It is also
easier to debug the problem if it happens at the point of call,
instead of silently working.

One of the problems with a safe string API is that there are different
possible things to do when an overrun is detected. The best thing to
do would be to cap the input to the buffer size and return some error
status. For example, you could have:

    STR_STATIC(str, 1);
    sstr_append(str, "/foo/bar");
    // run "rm -rf ${str}"

which would hose your system. This is worse than a buffer overrun.

> And why is safe string API easier to verify? Here's an example:
>
> // see how easily you can use strncpy() to cause a buffer overflow:
> char buf[1024];
> strncpy(buf, input, 2048);

Yes it can, as above. A subtler case would be when the space for a
null terminator is missed. But then I'd write something like:

    char buf[1024];
    strncpy(buf, input, sizeof(buf));

Here, the compiler does the work for me, so if I change 1024 to 512,
it will still be safe.

> // see how impossible it is to cause a buffer overflow with my static
> string API:
> STR_STATIC(str, 1024);
> sstr_append(str, input);

How is str defined? Looking at the code, I can't _see_ that this is
safe. All I see as inputs are `str` and `input`, so how can I verify
that this is correct without looking at how these are defined.

> Of course the above example is a simple one, but often when using
> libc string handling functions for building strings the code gets
> complex and there are all kinds of "is the buffer full already? what
> about now? and now? and now?" and with all of those checks it's easy
> to make mistakes.

But also, those API are well known, along with their associated
problems. By creating a new API, there are new (unknown) ways to
abuse/misuse it.

> The point is that if the APIs are (nearly) impossible to use
> insecurely, it's very easy to verify that the code is safe. The code
> doesn't get safe by lots of checks everywhere, it gets safe by
> placing a minimal amount of checks to small area of the code. The
> correctness of a few checks is a lot easier to verify.

But what about that nearly impossible case?

Also, how do you use the safe string API with other API that don't
have a "safe" equivalent?

- Reece

  reply	other threads:[~2007-08-30 22:34 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-08-30 19:26 Buffer overflows Timo Sirainen
2007-08-30 20:26 ` Lukas Sandström
2007-08-30 20:46 ` Linus Torvalds
2007-08-30 21:08   ` Timo Sirainen
2007-08-30 21:35     ` Reece Dunn
2007-08-30 21:51       ` Timo Sirainen
2007-08-30 22:34         ` Reece Dunn [this message]
2007-08-31 10:52           ` Wincent Colaiuta
2007-08-31 12:48             ` Simon 'corecode' Schubert
2007-08-30 22:14       ` Junio C Hamano
2007-08-30 22:36         ` Pierre Habouzit
2007-08-30 22:41         ` Timo Sirainen
2007-09-02 13:42         ` Johan Herland
2007-09-02 15:11           ` Reece Dunn
2007-09-02 15:19             ` David Kastrup
2007-09-02 15:35               ` Reece Dunn
2007-09-03  0:19               ` Jakub Narebski
2007-09-03  0:31                 ` Junio C Hamano
2007-09-02 17:17           ` René Scharfe
2007-09-02 17:39             ` Lukas Sandström
2007-08-31  4:09     ` Linus Torvalds
2007-08-31  5:00       ` Timo Sirainen
2007-08-31  9:53         ` Andreas Ericsson
2007-08-31 10:06         ` Johannes Schindelin
2007-08-30 21:48 ` [PATCH] Temporary fix for stack smashing in mailinfo Alex Riesen
2007-08-30 22:53   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3f4fd2640708301534k40f07a1cva90a59d12ace6138@mail.gmail.com \
    --to=msclrhd@googlemail.com \
    --cc=git@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tss@iki.fi \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).