Re: [patch 27/95] scanf: fix type range overflow

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [patch 27/95] scanf: fix type range overflow
       [not found] <55f0b472.ReLomj/XdFgiHSkY%akpm@linux-foundation.org>
@ 2015-09-09 23:52 ` Linus Torvalds
  2015-09-10 11:28   ` Alexey Dobriyan
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2015-09-09 23:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexey Dobriyan, Rasmus Villemoes, Linux Kernel Mailing List

On Wed, Sep 9, 2015 at 3:36 PM,  <akpm@linux-foundation.org> wrote:
> From: Alexey Dobriyan <adobriyan@gmail.com>
> Subject: scanf: fix type range overflow

This is another patch that claims to fix a bug, but looks
fundamentally broken unless I'm misreading something.

Like it or not, people write code like

    unsigned int x;

and then may well initialize it with "-1" or with 0xffffffff depending
on how they feel and what helper macros etc they used.

And I see no reason to believe that the same wouldn't be true of
sscanf().  In fact, I'm pretty sure it happens exactly for things like
initializing bitmasks etc.

But you seem to want to make this an error, because your seem to
verify that the underlying numerical value it fits in whatever
(signed/unsigned) type that is being converted.

Maybe I misread the code, but that's what it looks like.

And it's wrong. People may use "%d" for "unsigned int" exactly because
they want the "-1" to work, but "4294967294" should work too.

> Fun fact:
>
>         uint8_t val;
>         sscanf("256", "%hhu", &val);
>
> will return 1 and make val=0 (clearly bogus).

Fun fact: overflows are not at all always clearly bogus.

Come to think of it, I think that the parse_integer() interface may
have had the same bug.

Too much "I know better than you" is actually a bug.

                  Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch 27/95] scanf: fix type range overflow
  2015-09-09 23:52 ` [patch 27/95] scanf: fix type range overflow Linus Torvalds
@ 2015-09-10 11:28   ` Alexey Dobriyan
  2015-09-10 18:01     ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Alexey Dobriyan @ 2015-09-10 11:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, Rasmus Villemoes, Linux Kernel Mailing List

On Wed, Sep 09, 2015 at 04:52:29PM -0700, Linus Torvalds wrote:
> On Wed, Sep 9, 2015 at 3:36 PM,  <akpm@linux-foundation.org> wrote:
> > From: Alexey Dobriyan <adobriyan@gmail.com>
> > Subject: scanf: fix type range overflow
> 
> This is another patch that claims to fix a bug, but looks
> fundamentally broken unless I'm misreading something.
> 
> Like it or not, people write code like
> 
>     unsigned int x;
> 
> and then may well initialize it with "-1" or with 0xffffffff depending
> on how they feel and what helper macros etc they used.
> 
> And I see no reason to believe that the same wouldn't be true of
> sscanf().  In fact, I'm pretty sure it happens exactly for things like
> initializing bitmasks etc.
> 
> But you seem to want to make this an error, because your seem to
> verify that the underlying numerical value it fits in whatever
> (signed/unsigned) type that is being converted.
> 
> Maybe I misread the code, but that's what it looks like.

It does exactly that (which I think is saner default).

If you have very strict rules as default, someone who wants to implement
more relaxed parsing can easily do that. In case of parse_integer()
simple take next wider type. If it is "unsigned long long", well, look
for sign "-". It is certainly doable (and made explicit).

But if relaxed parsing is the foundation someone who wants very strict
parsing can't do that and forced to reimplement from scratch.
If all you have is strict_strtoull() what do you do? You scrap it.

> And it's wrong. People may use "%d" for "unsigned int" exactly because
> they want the "-1" to work, but "4294967294" should work too.
> 
> > Fun fact:
> >
> >         uint8_t val;
> >         sscanf("256", "%hhu", &val);
> >
> > will return 1 and make val=0 (clearly bogus).
> 
> Fun fact: overflows are not at all always clearly bogus.

> Come to think of it, I think that the parse_integer() interface may
> have had the same bug.
> 
> Too much "I know better than you" is actually a bug.

Maybe, but what to do with the example?
Should kernel write 0 when there are exactly no "0" digits
present in the string? It does that currently. We can't even ask
userspace for advice because glibc will return ERANGE for "unsigned int"
and higher and silently overflow for "unsigned char" and "unsigned short".

Overall, I don't think it is a problem at all.
kstrto*() and kstrto*_from_user() functions were merged 4.5 years ago and
all of them derive valid input range from type of the argument. There were
no problems so far.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch 27/95] scanf: fix type range overflow
  2015-09-10 11:28   ` Alexey Dobriyan
@ 2015-09-10 18:01     ` Linus Torvalds
  2015-09-10 18:37       ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2015-09-10 18:01 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Andrew Morton, Rasmus Villemoes, Linux Kernel Mailing List

On Thu, Sep 10, 2015 at 4:28 AM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
>>
>> Maybe I misread the code, but that's what it looks like.
>
> It does exactly that (which I think is saner default).

No. You think *WRONG*.

Nobody in the history of the world has really cared about integer
parsing overflow. Really. It's just not an issue. It's not a security
issue (the possibly truncated number that is returned *could* have
been returned by just parsing that numebr in the first place), and
it's not a practical worry either.

Seriously. When was the last time you heard somebody complain about

   "I wrote the number 9223372036854775809, and the computer thought I
meant '1' - I'm really really upset"

Yeah, outside of mathematicians and bignum people, that complaint just
simply doesn't happen.

But the example I gave was *real*. Seriously. I didn't make some
theoretical argument. People really do write

   unsigned int x = -1;

and I just checked in the kernel - a quick grep seems to indicate that
we have around a hundred cases of that.

And that code isn't wrong.

YOUR CODE IS WRONG, AND REALITY SHOWS THAT YOUR DEFAULT IS CRAP.

Really. The whole "-1" thing is just a very common and convenient way
to say "all bits set" even for unsigned values.

I'd argue that it's an even *bigger* deal when parsing numbers in the
kernel, because in source code you could write it as

    unsigned long val = ~0ul;

if you are pedantic. You can't do anything similar when you have
something simple like a number parser rather than a real language
parser.

> If you have very strict rules as default, someone who wants to implement
> more relaxed parsing can easily do that.

BS.

The only reason for your interface was that it was simpler to use. You
broke that.

And you broke that for no good reason. As mentioned, the type size
overflow has never ever been a security issue, and simply _cannot_ be
one. And your incorrectly strict range check violates all
expectations, and all other numerical parsers. I just checked glibc,
and it is perfectly happy to use "%d" with an unsigned int, and it is
also perfectly happy to take MAX_UINT rather than MAX_INT.

In fact, you can happily feed scanf("%d") in user space something like

     -9223372036854775803

and it will return the integer "5". Nobody has ever cared. And that's
still a *better* end result than erroring out, because while the above
example is insane, examples like "-1" etc are not.

So your "default" is not actually safe. It breaks real cases, and
doesn't add any security.

It's broken.

                         Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch 27/95] scanf: fix type range overflow
  2015-09-10 18:01     ` Linus Torvalds
@ 2015-09-10 18:37       ` Linus Torvalds
  2015-09-11 20:52         ` Alexey Dobriyan
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2015-09-10 18:37 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Andrew Morton, Rasmus Villemoes, Linux Kernel Mailing List

On Thu, Sep 10, 2015 at 11:01 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Nobody in the history of the world has really cared about integer
> parsing overflow. Really. It's just not an issue. It's not a security
> issue (the possibly truncated number that is returned *could* have
> been returned by just parsing that numebr in the first place), and
> it's not a practical worry either.

Btw, while the type-based range checking cannot possibly be a security
issue (because by definition, you could get the same value some other
way by just picking a valid number instead of relying on overflow),
there could be real security advantages to an interface that made the
actual accepted range be something explicitly given.

If course, the caller can always do *that* check trivially themselves,
but making it part of the interface might force people to think about
their expected valid ranges.

Also, as was shown by the "parse_integer()" failure, the error cases
really must not be mixed up with the returned parsed value, or with
the number of characters used.

The problem, of course, is that the more arguments you add to the
parsing function, the less convenient it gets.

But I could imagine something like

     int parse_integer(string, base, T *, <error-statement>);

could still be usable as an interface. Note that the return value
would be guaranteed to be positive (not zero, not negative), because
the error condition would execute the passed-in statement, so you
could write code like

    str += parse_integer(str, 0, &variable, return -EINVAL);

and it would basically mean

 - parse the integer in "str" with any base
 - if any error happens (no actual integer found), do "return -EINVAL"
 - otherwise, put the result in "variable" and return the length
(which is guaranteed positive)
 - it would not check the range in any shape or form.

and then we could have a helper function somthing like

   #define parse_integer_range(str,base,T,min,max,err) ({
        int __len; \
        typeof *T __tmp; \
        __len = parse_integer(str,base,&__tmp,0,0,err); \
        if (__tmp < (min) || __tmp > (max)) err; else *(T) = __tmp; \
        __len; \
    })

and notice how in the range-checking version, the value still gets
truncated to the type (which fundamentally cannot be a security
issue), but we then - within that type - check for the given actual
range.

So with that kind of setup you can write

     str += parse_integer_range(str, 10, &variable, 0, 5, return -EINVAL);

and know that "variable" will be set to a value between 0 and 5, or
we'll be returning an error.

I dunno. But basically the above seems to me to be *much* more useful
semantics for some actual range checking than any crazy type-based one
that by definition cannot help security, since it doesn't actually
limit the real range that might be assigned.

We might have special rules for "what happens if the error statement
is a fall-through statement". Maybe we can make the rule be that the
return value from "parse_integer()" in that case is zero, so that you
can do things like

     int i = 0, array[MAX] = { 0, };

    while (i < MAX) {
        str += parse_integer(str, 0, array+i, /*nothing*/);
        if (*str != ',')
            break;
        str++;
    }
    if (*str)
        return -EINVAL;

to fill in an array of integers where a missing integer keeps the
previous value (in this case 0). So the above could parse things like
"5,,,6".

The above seems kind of a special case, but I made it up more as an
example of how robust the semantics can be when you *don't* mix up the
value (or the number of chatacters eaten) with the error case.

              Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch 27/95] scanf: fix type range overflow
  2015-09-10 18:37       ` Linus Torvalds
@ 2015-09-11 20:52         ` Alexey Dobriyan
  2015-09-11 21:19           ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Alexey Dobriyan @ 2015-09-11 20:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, Rasmus Villemoes, Linux Kernel Mailing List

On Thu, Sep 10, 2015 at 11:37:53AM -0700, Linus Torvalds wrote:

1. Please use more caps, thread is rapidly dropping off Hacker News
   front page.

2. Security was never a concern here (or a very small one), this is all
   your imagination thinking strlcpy 2.0 is coming. Security was a concern
   when kstrto*_from_user() was added because people were copying and
   parsing numbers by hand so the opportunities for mistakes were real.
   I've checked grsec patches, they don't touch kstrto*_from_user(),
   which gives some hope that the code is all right. But then some idiot
   decided that kernel should also parse whitespace so the code like
   below couldn't be simplified:

	memset(buffer, 0, sizeof(buffer));
	if (count > sizeof(buffer) - 1)
		count = sizeof(buffer) - 1;
	if (copy_from_user(buffer, buf, count))
		return -EFAULT;
	rv = kstrtoint(strstrip(buffer), 0, &make_it_fail);
	if (rv < 0)
		return rv;
	if (make_it_fail < 0 || make_it_fail > 1)
		return -EINVAL;

   Returning to parse_integer(), whole thing is a matter of general
   robustness (no "be liberal in what you accept" rubbish) and
   hopefully nicer interface which is pleasant to use (1 function
   instead of 4, etc).

3. Your point that unchecked errors will result in bogus pointer is
   a valid one and I admit this is the case. Compiler warning on
   unchecked error in this case require gcc plugin or external checker
   or something non-trivial which kernel doesn't use. Quite sad.

4. Overall notion that overflow integer can't be security issue is
   a laughable one. I can't give you kernel example (kernel doesn't
   munge integers much obviously) but I can give you userspace example.
   with libpcre.

   Version 8.37 has notation (?[0-9]+) for forward/backward reference
   group. Internally group number is parsed into variable "int recno;"
   at pcre_compile.c:7324

	recno = 0;
	while(IS_DIGIT(*ptr))
		recno = recno * 10 + *ptr++ - CHAR_0;

   Typical code without overflow check. Later forward reference groups
   (starting with sign "+" or signless) are distinguished from backward
   reference groups (those starting with "-"). And there is very final
   check for overflowing past the number of match groups:

	if (recno != 0)
		called = PRIV(find_bracket)(cd->start_code, utf, recno);

	/* Forward reference */
	if (called == NULL)
	{
===>		if (recno > cd->final_bracount)
		{
			*errorcodeptr = ERR15;
			goto FAILED;
		}
		/* Fudge the value of "called" so that when it is inserted as an
		offset below, what it actually inserted is the reference number
		of the group. Then remember the forward reference. */

		called = cd->start_code + recno;

   So without type overflow check it is possible to supply seemingly
   positive number which will be parsed as negative and sneak in negative
   "recno" past all checks (signed variables for the win!).

   Real example: the following regex will crach pcre-8.37 with NULL
   pointer dereference (which is fallout, real bug happens earlier)
   if you try to pcre_compile()+pcre_study(PCRE_STUDY_JIT_COMPILE)

   	((?='))(?=(/(='D))?(?<!.(?2))(?=E(?2937177884))(?#(?0))

  Now I don't know if this particular crash is exploitable but exploit
  community never fail to impress.

  So, yes it can be a problem, and if someone wants to accept "-1" as
  a better user interface let him be explicit about it.

  As a matter of statistics there are 792+50=842 simple_strtoul/ll callers
  and 120+8=128 simple_strtol/ll callers, so the ratio is 1:6.6 not in
  favour of "-1" users because simple_strtoul/ll will not accept "-1".

5. The slight problem with your kitchen-sink proposal that C doesn't have
   optional/named macro and function arguments neither it does have real
   preprocessor. So people will have to specify INT_MIN/INT_MAX/...
   every invocation even if they don't care or if type information by itself
   is enough:

	uid_t loginuid;

	rv = kstrtou32_from_user(buf, count, 10, &loginuid);
	if (rv < 0)
		return rv;

   For there reasons parse_integer() leaves bound checking to the user.

   Adding "return -EINVAL" seems cool but, well, not with this language.
   Control flow changing macros are frowned upon, right?
   parse_integer() tries to behave like real C function to not create
   problems in this area.

At least it is good to know you aren't against __builtin_choose_expr() per se. :^)

	Alexey

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch 27/95] scanf: fix type range overflow
  2015-09-11 20:52         ` Alexey Dobriyan
@ 2015-09-11 21:19           ` Linus Torvalds
  0 siblings, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2015-09-11 21:19 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Andrew Morton, Rasmus Villemoes, Linux Kernel Mailing List

On Fri, Sep 11, 2015 at 1:52 PM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
>
>    Returning to parse_integer(), whole thing is a matter of general
>    robustness (no "be liberal in what you accept" rubbish) and
>    hopefully nicer interface which is pleasant to use (1 function
>    instead of 4, etc).

The thing is, that whole argument is totally invalidated by the fact
that you then convert existing users that have different semantics
than the ones you are introducing.

If this was a *new* interface with purely *new* users, then
"robustness" would be a possible argument for it.

But as it is, it is documented (and used) as a replacement interface.

Which means that you can't just make up BS about "it's more robust".
It has clearly different semantics, and those clearly different
semantics are

 (a) bad, as explained by my example of "unsigned int x = -1".

      I don't understand how you can just dismiss this very simple
case. It's not theoretical.

 (b) incompatible and leading to actual bugs, as exemplified by the
broken conversion.

The thing is, your very own conversion proved me right. That whole

    if (isdigit(*str))
        str += parse_integer(str ...)

would actually have been ok if it wasn't for the wrongheaded range
checking. The range checking directly led to bugs, because it
introduced a subtle error case that you clearly didn't think about.

Your arguments all are entirely irrelevant to the fundamental issue.

And then when I suggest a *sane* interface that doesn't have this
problem, your arguments are crap - again. Here:

> 5. The slight problem with your kitchen-sink proposal that C doesn't have
>   optional/named macro and function arguments neither it does have real
>   preprocessor. So people will have to specify INT_MIN/INT_MAX/...
>   every invocation even if they don't care or if type information by itself
>   is enough:

Bullshit. You clearly didn't even read my proposal. My proposal was
that the fundamental unit of conversion would be the *sane* one that
didn't check ranges at all. So if you don't care about a particular
range, you just do "parse_integer()".

And if you actually care about a particular range (which is almost
*never* the range of the type itself, but can be things like "the size
of an array" etc), you do "parse_integer_range()", and give the actual
range you care about.

So no. People would never ever specify INT_MIN/INT_MAX. Because if
that's the range they care about, then they clearly don't care about
the range at all.

Anyway, I'm not discussing this. You are clearly unwilling to just
admit that your patch-series was broken, and that assigning "-1" to an
unsigned entity is perfectly normal and common-place. As such, why
bother arguing?

                  Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-09-11 21:19 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <55f0b472.ReLomj/XdFgiHSkY%akpm@linux-foundation.org>
2015-09-09 23:52 ` [patch 27/95] scanf: fix type range overflow Linus Torvalds
2015-09-10 11:28   ` Alexey Dobriyan
2015-09-10 18:01     ` Linus Torvalds
2015-09-10 18:37       ` Linus Torvalds
2015-09-11 20:52         ` Alexey Dobriyan
2015-09-11 21:19           ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox