* Re: [patch 27/95] scanf: fix type range overflow [not found] <55f0b472.ReLomj/XdFgiHSkY%akpm@linux-foundation.org> @ 2015-09-09 23:52 ` Linus Torvalds 2015-09-10 11:28 ` Alexey Dobriyan 0 siblings, 1 reply; 6+ messages in thread From: Linus Torvalds @ 2015-09-09 23:52 UTC (permalink / raw) To: Andrew Morton Cc: Alexey Dobriyan, Rasmus Villemoes, Linux Kernel Mailing List On Wed, Sep 9, 2015 at 3:36 PM, <akpm@linux-foundation.org> wrote: > From: Alexey Dobriyan <adobriyan@gmail.com> > Subject: scanf: fix type range overflow This is another patch that claims to fix a bug, but looks fundamentally broken unless I'm misreading something. Like it or not, people write code like unsigned int x; and then may well initialize it with "-1" or with 0xffffffff depending on how they feel and what helper macros etc they used. And I see no reason to believe that the same wouldn't be true of sscanf(). In fact, I'm pretty sure it happens exactly for things like initializing bitmasks etc. But you seem to want to make this an error, because your seem to verify that the underlying numerical value it fits in whatever (signed/unsigned) type that is being converted. Maybe I misread the code, but that's what it looks like. And it's wrong. People may use "%d" for "unsigned int" exactly because they want the "-1" to work, but "4294967294" should work too. > Fun fact: > > uint8_t val; > sscanf("256", "%hhu", &val); > > will return 1 and make val=0 (clearly bogus). Fun fact: overflows are not at all always clearly bogus. Come to think of it, I think that the parse_integer() interface may have had the same bug. Too much "I know better than you" is actually a bug. Linus ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch 27/95] scanf: fix type range overflow 2015-09-09 23:52 ` [patch 27/95] scanf: fix type range overflow Linus Torvalds @ 2015-09-10 11:28 ` Alexey Dobriyan 2015-09-10 18:01 ` Linus Torvalds 0 siblings, 1 reply; 6+ messages in thread From: Alexey Dobriyan @ 2015-09-10 11:28 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, Rasmus Villemoes, Linux Kernel Mailing List On Wed, Sep 09, 2015 at 04:52:29PM -0700, Linus Torvalds wrote: > On Wed, Sep 9, 2015 at 3:36 PM, <akpm@linux-foundation.org> wrote: > > From: Alexey Dobriyan <adobriyan@gmail.com> > > Subject: scanf: fix type range overflow > > This is another patch that claims to fix a bug, but looks > fundamentally broken unless I'm misreading something. > > Like it or not, people write code like > > unsigned int x; > > and then may well initialize it with "-1" or with 0xffffffff depending > on how they feel and what helper macros etc they used. > > And I see no reason to believe that the same wouldn't be true of > sscanf(). In fact, I'm pretty sure it happens exactly for things like > initializing bitmasks etc. > > But you seem to want to make this an error, because your seem to > verify that the underlying numerical value it fits in whatever > (signed/unsigned) type that is being converted. > > Maybe I misread the code, but that's what it looks like. It does exactly that (which I think is saner default). If you have very strict rules as default, someone who wants to implement more relaxed parsing can easily do that. In case of parse_integer() simple take next wider type. If it is "unsigned long long", well, look for sign "-". It is certainly doable (and made explicit). But if relaxed parsing is the foundation someone who wants very strict parsing can't do that and forced to reimplement from scratch. If all you have is strict_strtoull() what do you do? You scrap it. > And it's wrong. People may use "%d" for "unsigned int" exactly because > they want the "-1" to work, but "4294967294" should work too. > > > Fun fact: > > > > uint8_t val; > > sscanf("256", "%hhu", &val); > > > > will return 1 and make val=0 (clearly bogus). > > Fun fact: overflows are not at all always clearly bogus. > Come to think of it, I think that the parse_integer() interface may > have had the same bug. > > Too much "I know better than you" is actually a bug. Maybe, but what to do with the example? Should kernel write 0 when there are exactly no "0" digits present in the string? It does that currently. We can't even ask userspace for advice because glibc will return ERANGE for "unsigned int" and higher and silently overflow for "unsigned char" and "unsigned short". Overall, I don't think it is a problem at all. kstrto*() and kstrto*_from_user() functions were merged 4.5 years ago and all of them derive valid input range from type of the argument. There were no problems so far. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch 27/95] scanf: fix type range overflow 2015-09-10 11:28 ` Alexey Dobriyan @ 2015-09-10 18:01 ` Linus Torvalds 2015-09-10 18:37 ` Linus Torvalds 0 siblings, 1 reply; 6+ messages in thread From: Linus Torvalds @ 2015-09-10 18:01 UTC (permalink / raw) To: Alexey Dobriyan Cc: Andrew Morton, Rasmus Villemoes, Linux Kernel Mailing List On Thu, Sep 10, 2015 at 4:28 AM, Alexey Dobriyan <adobriyan@gmail.com> wrote: >> >> Maybe I misread the code, but that's what it looks like. > > It does exactly that (which I think is saner default). No. You think *WRONG*. Nobody in the history of the world has really cared about integer parsing overflow. Really. It's just not an issue. It's not a security issue (the possibly truncated number that is returned *could* have been returned by just parsing that numebr in the first place), and it's not a practical worry either. Seriously. When was the last time you heard somebody complain about "I wrote the number 9223372036854775809, and the computer thought I meant '1' - I'm really really upset" Yeah, outside of mathematicians and bignum people, that complaint just simply doesn't happen. But the example I gave was *real*. Seriously. I didn't make some theoretical argument. People really do write unsigned int x = -1; and I just checked in the kernel - a quick grep seems to indicate that we have around a hundred cases of that. And that code isn't wrong. YOUR CODE IS WRONG, AND REALITY SHOWS THAT YOUR DEFAULT IS CRAP. Really. The whole "-1" thing is just a very common and convenient way to say "all bits set" even for unsigned values. I'd argue that it's an even *bigger* deal when parsing numbers in the kernel, because in source code you could write it as unsigned long val = ~0ul; if you are pedantic. You can't do anything similar when you have something simple like a number parser rather than a real language parser. > If you have very strict rules as default, someone who wants to implement > more relaxed parsing can easily do that. BS. The only reason for your interface was that it was simpler to use. You broke that. And you broke that for no good reason. As mentioned, the type size overflow has never ever been a security issue, and simply _cannot_ be one. And your incorrectly strict range check violates all expectations, and all other numerical parsers. I just checked glibc, and it is perfectly happy to use "%d" with an unsigned int, and it is also perfectly happy to take MAX_UINT rather than MAX_INT. In fact, you can happily feed scanf("%d") in user space something like -9223372036854775803 and it will return the integer "5". Nobody has ever cared. And that's still a *better* end result than erroring out, because while the above example is insane, examples like "-1" etc are not. So your "default" is not actually safe. It breaks real cases, and doesn't add any security. It's broken. Linus ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch 27/95] scanf: fix type range overflow 2015-09-10 18:01 ` Linus Torvalds @ 2015-09-10 18:37 ` Linus Torvalds 2015-09-11 20:52 ` Alexey Dobriyan 0 siblings, 1 reply; 6+ messages in thread From: Linus Torvalds @ 2015-09-10 18:37 UTC (permalink / raw) To: Alexey Dobriyan Cc: Andrew Morton, Rasmus Villemoes, Linux Kernel Mailing List On Thu, Sep 10, 2015 at 11:01 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Nobody in the history of the world has really cared about integer > parsing overflow. Really. It's just not an issue. It's not a security > issue (the possibly truncated number that is returned *could* have > been returned by just parsing that numebr in the first place), and > it's not a practical worry either. Btw, while the type-based range checking cannot possibly be a security issue (because by definition, you could get the same value some other way by just picking a valid number instead of relying on overflow), there could be real security advantages to an interface that made the actual accepted range be something explicitly given. If course, the caller can always do *that* check trivially themselves, but making it part of the interface might force people to think about their expected valid ranges. Also, as was shown by the "parse_integer()" failure, the error cases really must not be mixed up with the returned parsed value, or with the number of characters used. The problem, of course, is that the more arguments you add to the parsing function, the less convenient it gets. But I could imagine something like int parse_integer(string, base, T *, <error-statement>); could still be usable as an interface. Note that the return value would be guaranteed to be positive (not zero, not negative), because the error condition would execute the passed-in statement, so you could write code like str += parse_integer(str, 0, &variable, return -EINVAL); and it would basically mean - parse the integer in "str" with any base - if any error happens (no actual integer found), do "return -EINVAL" - otherwise, put the result in "variable" and return the length (which is guaranteed positive) - it would not check the range in any shape or form. and then we could have a helper function somthing like #define parse_integer_range(str,base,T,min,max,err) ({ int __len; \ typeof *T __tmp; \ __len = parse_integer(str,base,&__tmp,0,0,err); \ if (__tmp < (min) || __tmp > (max)) err; else *(T) = __tmp; \ __len; \ }) and notice how in the range-checking version, the value still gets truncated to the type (which fundamentally cannot be a security issue), but we then - within that type - check for the given actual range. So with that kind of setup you can write str += parse_integer_range(str, 10, &variable, 0, 5, return -EINVAL); and know that "variable" will be set to a value between 0 and 5, or we'll be returning an error. I dunno. But basically the above seems to me to be *much* more useful semantics for some actual range checking than any crazy type-based one that by definition cannot help security, since it doesn't actually limit the real range that might be assigned. We might have special rules for "what happens if the error statement is a fall-through statement". Maybe we can make the rule be that the return value from "parse_integer()" in that case is zero, so that you can do things like int i = 0, array[MAX] = { 0, }; while (i < MAX) { str += parse_integer(str, 0, array+i, /*nothing*/); if (*str != ',') break; str++; } if (*str) return -EINVAL; to fill in an array of integers where a missing integer keeps the previous value (in this case 0). So the above could parse things like "5,,,6". The above seems kind of a special case, but I made it up more as an example of how robust the semantics can be when you *don't* mix up the value (or the number of chatacters eaten) with the error case. Linus ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch 27/95] scanf: fix type range overflow 2015-09-10 18:37 ` Linus Torvalds @ 2015-09-11 20:52 ` Alexey Dobriyan 2015-09-11 21:19 ` Linus Torvalds 0 siblings, 1 reply; 6+ messages in thread From: Alexey Dobriyan @ 2015-09-11 20:52 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, Rasmus Villemoes, Linux Kernel Mailing List On Thu, Sep 10, 2015 at 11:37:53AM -0700, Linus Torvalds wrote: 1. Please use more caps, thread is rapidly dropping off Hacker News front page. 2. Security was never a concern here (or a very small one), this is all your imagination thinking strlcpy 2.0 is coming. Security was a concern when kstrto*_from_user() was added because people were copying and parsing numbers by hand so the opportunities for mistakes were real. I've checked grsec patches, they don't touch kstrto*_from_user(), which gives some hope that the code is all right. But then some idiot decided that kernel should also parse whitespace so the code like below couldn't be simplified: memset(buffer, 0, sizeof(buffer)); if (count > sizeof(buffer) - 1) count = sizeof(buffer) - 1; if (copy_from_user(buffer, buf, count)) return -EFAULT; rv = kstrtoint(strstrip(buffer), 0, &make_it_fail); if (rv < 0) return rv; if (make_it_fail < 0 || make_it_fail > 1) return -EINVAL; Returning to parse_integer(), whole thing is a matter of general robustness (no "be liberal in what you accept" rubbish) and hopefully nicer interface which is pleasant to use (1 function instead of 4, etc). 3. Your point that unchecked errors will result in bogus pointer is a valid one and I admit this is the case. Compiler warning on unchecked error in this case require gcc plugin or external checker or something non-trivial which kernel doesn't use. Quite sad. 4. Overall notion that overflow integer can't be security issue is a laughable one. I can't give you kernel example (kernel doesn't munge integers much obviously) but I can give you userspace example. with libpcre. Version 8.37 has notation (?[0-9]+) for forward/backward reference group. Internally group number is parsed into variable "int recno;" at pcre_compile.c:7324 recno = 0; while(IS_DIGIT(*ptr)) recno = recno * 10 + *ptr++ - CHAR_0; Typical code without overflow check. Later forward reference groups (starting with sign "+" or signless) are distinguished from backward reference groups (those starting with "-"). And there is very final check for overflowing past the number of match groups: if (recno != 0) called = PRIV(find_bracket)(cd->start_code, utf, recno); /* Forward reference */ if (called == NULL) { ===> if (recno > cd->final_bracount) { *errorcodeptr = ERR15; goto FAILED; } /* Fudge the value of "called" so that when it is inserted as an offset below, what it actually inserted is the reference number of the group. Then remember the forward reference. */ called = cd->start_code + recno; So without type overflow check it is possible to supply seemingly positive number which will be parsed as negative and sneak in negative "recno" past all checks (signed variables for the win!). Real example: the following regex will crach pcre-8.37 with NULL pointer dereference (which is fallout, real bug happens earlier) if you try to pcre_compile()+pcre_study(PCRE_STUDY_JIT_COMPILE) ((?='))(?=(/(='D))?(?<!.(?2))(?=E(?2937177884))(?#(?0)) Now I don't know if this particular crash is exploitable but exploit community never fail to impress. So, yes it can be a problem, and if someone wants to accept "-1" as a better user interface let him be explicit about it. As a matter of statistics there are 792+50=842 simple_strtoul/ll callers and 120+8=128 simple_strtol/ll callers, so the ratio is 1:6.6 not in favour of "-1" users because simple_strtoul/ll will not accept "-1". 5. The slight problem with your kitchen-sink proposal that C doesn't have optional/named macro and function arguments neither it does have real preprocessor. So people will have to specify INT_MIN/INT_MAX/... every invocation even if they don't care or if type information by itself is enough: uid_t loginuid; rv = kstrtou32_from_user(buf, count, 10, &loginuid); if (rv < 0) return rv; For there reasons parse_integer() leaves bound checking to the user. Adding "return -EINVAL" seems cool but, well, not with this language. Control flow changing macros are frowned upon, right? parse_integer() tries to behave like real C function to not create problems in this area. At least it is good to know you aren't against __builtin_choose_expr() per se. :^) Alexey ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [patch 27/95] scanf: fix type range overflow 2015-09-11 20:52 ` Alexey Dobriyan @ 2015-09-11 21:19 ` Linus Torvalds 0 siblings, 0 replies; 6+ messages in thread From: Linus Torvalds @ 2015-09-11 21:19 UTC (permalink / raw) To: Alexey Dobriyan Cc: Andrew Morton, Rasmus Villemoes, Linux Kernel Mailing List On Fri, Sep 11, 2015 at 1:52 PM, Alexey Dobriyan <adobriyan@gmail.com> wrote: > > Returning to parse_integer(), whole thing is a matter of general > robustness (no "be liberal in what you accept" rubbish) and > hopefully nicer interface which is pleasant to use (1 function > instead of 4, etc). The thing is, that whole argument is totally invalidated by the fact that you then convert existing users that have different semantics than the ones you are introducing. If this was a *new* interface with purely *new* users, then "robustness" would be a possible argument for it. But as it is, it is documented (and used) as a replacement interface. Which means that you can't just make up BS about "it's more robust". It has clearly different semantics, and those clearly different semantics are (a) bad, as explained by my example of "unsigned int x = -1". I don't understand how you can just dismiss this very simple case. It's not theoretical. (b) incompatible and leading to actual bugs, as exemplified by the broken conversion. The thing is, your very own conversion proved me right. That whole if (isdigit(*str)) str += parse_integer(str ...) would actually have been ok if it wasn't for the wrongheaded range checking. The range checking directly led to bugs, because it introduced a subtle error case that you clearly didn't think about. Your arguments all are entirely irrelevant to the fundamental issue. And then when I suggest a *sane* interface that doesn't have this problem, your arguments are crap - again. Here: > 5. The slight problem with your kitchen-sink proposal that C doesn't have > optional/named macro and function arguments neither it does have real > preprocessor. So people will have to specify INT_MIN/INT_MAX/... > every invocation even if they don't care or if type information by itself > is enough: Bullshit. You clearly didn't even read my proposal. My proposal was that the fundamental unit of conversion would be the *sane* one that didn't check ranges at all. So if you don't care about a particular range, you just do "parse_integer()". And if you actually care about a particular range (which is almost *never* the range of the type itself, but can be things like "the size of an array" etc), you do "parse_integer_range()", and give the actual range you care about. So no. People would never ever specify INT_MIN/INT_MAX. Because if that's the range they care about, then they clearly don't care about the range at all. Anyway, I'm not discussing this. You are clearly unwilling to just admit that your patch-series was broken, and that assigning "-1" to an unsigned entity is perfectly normal and common-place. As such, why bother arguing? Linus ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2015-09-11 21:19 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <55f0b472.ReLomj/XdFgiHSkY%akpm@linux-foundation.org>
2015-09-09 23:52 ` [patch 27/95] scanf: fix type range overflow Linus Torvalds
2015-09-10 11:28 ` Alexey Dobriyan
2015-09-10 18:01 ` Linus Torvalds
2015-09-10 18:37 ` Linus Torvalds
2015-09-11 20:52 ` Alexey Dobriyan
2015-09-11 21:19 ` Linus Torvalds
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox