* bug in sscanf()?
@ 2014-01-09 18:57 Allan, Bruce W
2014-01-13 23:30 ` Al Viro
0 siblings, 1 reply; 6+ messages in thread
From: Allan, Bruce W @ 2014-01-09 18:57 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org; +Cc: Allan, Bruce W
The kernel version of sscanf() does not allow for scanning a string with multiple concatenated integers even when maximum field widths are specified to stop reading the string to delineate the individual integers. For example, trying to read the (3) 32-bit hexadecimal integers deadbeef, 12345678 and deadc0de when provided as a concatenated string with:
int num;
u32 i, j, k;
num = sscanf("deadbeef12345678deadc0de", "%8x%8x%8x", &i, &j, &k);
will return the number of input items successfully matched and assigned as 3, the 32-bit integers j and k will have the expected values, but i will be 0.
The libc version of sscanf(), however, will set the value of i to deadbeef.
Should this be expected with the kernel version or is it a bug?
Thanks,
Bruce.
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: bug in sscanf()? 2014-01-09 18:57 bug in sscanf()? Allan, Bruce W @ 2014-01-13 23:30 ` Al Viro 2014-01-14 0:22 ` Linus Torvalds 0 siblings, 1 reply; 6+ messages in thread From: Al Viro @ 2014-01-13 23:30 UTC (permalink / raw) To: Allan, Bruce W Cc: linux-kernel@vger.kernel.org, Linus Torvalds, Jan Beulich, Alexey Dobriyan On Thu, Jan 09, 2014 at 06:57:29PM +0000, Allan, Bruce W wrote: > The kernel version of sscanf() does not allow for scanning a string with multiple concatenated integers even when maximum field widths are specified to stop reading the string to delineate the individual integers. For example, trying to read the (3) 32-bit hexadecimal integers deadbeef, 12345678 and deadc0de when provided as a concatenated string with: > > int num; > u32 i, j, k; > > num = sscanf("deadbeef12345678deadc0de", "%8x%8x%8x", &i, &j, &k); > > will return the number of input items successfully matched and assigned as 3, the 32-bit integers j and k will have the expected values, but i will be 0. > > The libc version of sscanf(), however, will set the value of i to deadbeef. > > Should this be expected with the kernel version or is it a bug? It's a bug, all right, and rather amusing one, at that. What happens is that vsscanf() does optimistic strtoul(), _then_ checks if we'd eaten more than field_width characters, then essentially divides by base^overshot. In your case it gets the last 16 digits from the string (the first 8 got lost on overflow in simple_strtoul()) and proceeds to divide by 16^16 (aka 2^64), discarding excessive digits it has eaten. Which is to say, all that hadn't been lost to overflow... Next time we take the same last 16 digits (this time - without overflows) and divides by 16^8, which yields the correct result. The last one takes last 8 digits and doesn't bother with divisions at all - again, correct result. It's definitely a bug, introduced in commit 538097. Behaviour prior to that had also been buggy, but in a more straightforward way - width used to be completely ignored, so this sscanf() call would've returned 1, with i equal to 0x12345678deadc0de after it. From commit message: " This is another step towards better standard conformance. Rather than adding a local buffer to store the specified portion of the string (with the need to enforce an arbitrary maximum supported width to limit the buffer size), do a maximum width conversion and then drop as much of it as is necessary to meet the caller's request." This "drop as much of it as necessary" is where it went wrong; we are dealing with finite-width type... FWIW, there's enough weirdness in userland sscanf() - handling of overflows is, er, underspecified both in C99 and in C11. Userland implementations tend to fail conversion with ERANGE in errno if the value wouldn't fit into intmax_t/uintmax_t and silently downcast in other cases, but according to ANSI C it's all nasal daemon territory, which makes the sucker rather useless on unsanitized inputs... As far as I understand the rationale is more or less "if that happens on fscanf(3), you are fscked anyway - no way to unread more than one character", which wouldn't apply to sscanf(3), but since it's modelled after fscanf/scanf and not the other way round... In any case, field width semantics *is* clear - conversion should act as if there had been no input beyond that many characters. Overflows or no overflows, %8x should never step into one on targets where int is >= 32 bits. Other fun there: * j is a valid size modifier, meaning intmax_t/uintmax_t. Easily fixed - just turn it into 'L'. * %ln and friends are legitimate; the meaning of size modifier is the same as for %d - next argument is treated as signed long * and result is stored there. Valid modifiers: h/hh/l/ll/j/z/t. Again, easily fixed - instead of buggering off in "if (*fmt == 'n')", put the value into val.s, set is_sign to 1, decrement the assignment count to balance the increment to be done downstream and jump to the handling of modifiers. * we do not support wide chars (%lc, %ls) and floating point stuff. Fair enough. * we do not support %[<scanset>] either. OK, we'd lived without it so far and there's not too many *scanf() users in the kernel. * our handling of %* assumes that input items cannot contain whitespace; that's obviously not true for e.g. %*c. While we are at it, handling of %*d is also broken - "%*d%c" on "12345z" should, AFAICS, return 1 and store 'z'. What we ought to do is remember whether there'd been a * and skip the va_arg(), stores and increment of conversion count in handling of 'c', 's' and integer conversions past the switch. Not pretty for %*n, but that's a nasal daemon country anyway. * "%x%s" on "0xz". AFAICS, we share a bug with glibc - split the input into "0x" and "z" (correct from standard POV - "0x" is a prefix of admissible string for %x) and pretend that conversion succeeded. BTW, FreeBSD splits into "0" and "xz" instead, which is not good for another reason - fine for sscanf(), but scanf(3) would have to unget two characters here and ANSI explicitly talks about consuming the longest prefix of a matching string and failing conversion if that prefix doesn't match by itself. Not sure if we care about that mess... * leading '-' or '+' appears to be allowed on *all* integer input items. Yes, including %u et.al. What standard says for %u is "Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of strtoul function with the value 10 for the base argument". And yes, userland implementations will accept "-1" for "%u" and convert it to ~0U. The same goes for strtoul(3) and friends - basically, they consider everything from -ULONG_MAX to ULONG_MAX as representable in unsigned long. Hell knows whether we want to allow that - kstrto...() family has different calling conventions anyway, so it's hard to confuse for strto...(). In principle, things like "-1" for %u are sometimes convenient... I'm not sure what's the right approach here; the real PITA is overflow semantics. AFAICS, kernel-side there are only two sensible options - fail the conversion (as if the string had been truncated at the point before that input item) or silently convert the value, overflows or no overflows. We should honour the field width in any case, of course; this bug is a separate story and it needs to be fixed. What we obviously can't do is reporting details of overflow - errno doesn't exist in the kernel and returning a negative value would confuse the hell out of all callers. How about the following semantics: the longest string no longer than field width and matching a prefix of [-+]?(0[xX][0-9a-fA-F]*|[1-9][0-9]*|0[0-7]*) (for %i) [-+]?[0-9]* (for %d and %u) [-+]?[0-7]* (for %o) [-+]?[0-9a-fA-F]* (for %x) is chosen. If it doesn't match the regex by itself (e.g. "0x" for %i or "-" for any of them), we fail the conversion. If it does, the string represents some integer (say, X). The rest depends on the target type - for signed ones we fail unless X is in range for that type (and store X otherwise), for unsigned ones we fail unless |X| is in range (and store X cast to the target type otherwise). It's reasonably consistent and cheap to implement - we'd need a variant of _parse_integer() that would take maximal length as explicit argument, parse the sign right in vsscanf() and use that sucker to get the absolute value. Overflow of u64 => conversion failure, otherwise we need to compare absolute_value - is_negative with limit for type, i.e. check if there are non-zero bits beyond type_width - is_type_signed... Comments? I'll put together something along those lines once I dig myself from under the pile of mail that had been growing since mid-December when I got sick ;-/ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bug in sscanf()? 2014-01-13 23:30 ` Al Viro @ 2014-01-14 0:22 ` Linus Torvalds 2014-01-14 0:32 ` Allan, Bruce W 2014-01-14 2:47 ` Al Viro 0 siblings, 2 replies; 6+ messages in thread From: Linus Torvalds @ 2014-01-14 0:22 UTC (permalink / raw) To: Al Viro Cc: Allan, Bruce W, linux-kernel@vger.kernel.org, Jan Beulich, Alexey Dobriyan On Tue, Jan 14, 2014 at 6:30 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Comments? Do we have actual users of this? Because I'd almost be inclined to say "we just don't support field widths on sscanf() and will warn" unless there are users. We've done that before. The kernel has various limited functions. See the whole snprint() issue with %n, which we decided that supporting the full semantics was actually a big mistake and we actively *removed* code that had been misguidedly added just because people thought we should do everything a standard user library does.. Limiting our problem space is a *good* thing, not a bad thing. If it's possible, of course, and we don't have nasty users. Linus ^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: bug in sscanf()? 2014-01-14 0:22 ` Linus Torvalds @ 2014-01-14 0:32 ` Allan, Bruce W 2014-01-14 2:47 ` Al Viro 1 sibling, 0 replies; 6+ messages in thread From: Allan, Bruce W @ 2014-01-14 0:32 UTC (permalink / raw) To: Linus Torvalds, Al Viro Cc: linux-kernel@vger.kernel.org, Jan Beulich, Alexey Dobriyan [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1398 bytes --] > -----Original Message----- > From: linus971@gmail.com [mailto:linus971@gmail.com] On Behalf Of Linus > Torvalds > Sent: Monday, January 13, 2014 4:23 PM > To: Al Viro > Cc: Allan, Bruce W; linux-kernel@vger.kernel.org; Jan Beulich; Alexey > Dobriyan > Subject: Re: bug in sscanf()? > > On Tue, Jan 14, 2014 at 6:30 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Comments? > > Do we have actual users of this? Because I'd almost be inclined to say > "we just don't support field widths on sscanf() and will warn" unless > there are users. > > We've done that before. The kernel has various limited functions. See > the whole snprint() issue with %n, which we decided that supporting > the full semantics was actually a big mistake and we actively > *removed* code that had been misguidedly added just because people > thought we should do everything a standard user library does.. > > Limiting our problem space is a *good* thing, not a bad thing. > > If it's possible, of course, and we don't have nasty users. > > Linus I was hoping to use sscanf() in this way for a driver I'm working on to support Thunderbolt device authentication, but if it's too much to ask for I could probably work around this. Bruce. ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bug in sscanf()? 2014-01-14 0:22 ` Linus Torvalds 2014-01-14 0:32 ` Allan, Bruce W @ 2014-01-14 2:47 ` Al Viro 2014-01-14 2:52 ` Al Viro 1 sibling, 1 reply; 6+ messages in thread From: Al Viro @ 2014-01-14 2:47 UTC (permalink / raw) To: Linus Torvalds Cc: Allan, Bruce W, linux-kernel@vger.kernel.org, Jan Beulich, Alexey Dobriyan On Tue, Jan 14, 2014 at 07:22:49AM +0700, Linus Torvalds wrote: > On Tue, Jan 14, 2014 at 6:30 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Comments? > > Do we have actual users of this? Because I'd almost be inclined to say > "we just don't support field widths on sscanf() and will warn" unless > there are users. > > We've done that before. The kernel has various limited functions. See > the whole snprint() issue with %n, which we decided that supporting > the full semantics was actually a big mistake and we actively > *removed* code that had been misguidedly added just because people > thought we should do everything a standard user library does.. > > Limiting our problem space is a *good* thing, not a bad thing. > > If it's possible, of course, and we don't have nasty users. We do, unfortunately... Not the trigger for that bug, but yes, we do have places that pass things like %2hhx to sscanf(). Or stuff like sscanf(oh->name, "timer%2d", &id); Or if (sscanf(location, "%03d%c%02d^%02d#%d", rack, &type, bay, slot, slab) != 5) return -1; (nice cargo-culting there, BTW - somebody got too used to printf). So no, we can't just drop all field width support - in-tree code will break. Actually, it looks like correct version wouldn't be more complex than what we have now. Something like [completely untested] while ((c = *fmt) != '\0') { /* whitespace matches any amount of whitespace */ if (isspace(c)) { str = skip_spaces(str); continue; } /* non-% matches itself, %% matches solitary % */ if (c != '%' || (c = *fmt++) == '%') { if (c == *str++) continue; break; } /* %*conversion means "convert but don't store" */ suppress = c == '*'; if (suppress) c = *fmt++; /* optional field width */ width = -1; if (isdigit(c)) { width = c; while (isdigit(c = *fmt++)) { width = width * 10 + c - '0'; if (width > MAX_SHRT) goto Invalid; } if (!width) goto Invalid; } /* qualifier */ switch (c) { case 'h': if ((c = *fmt++) == 'h') qualifier = 'H'; /* hh: char */ else qualifier = 'h'; /* h: short */ break; case 'l': if ((c = *fmt++) == 'l') qualifier = 'L'; /* ll: long long */ else qualifier = 'l'; /* l: long */ break; case 'j': /* j: intmax_t, aka long long */ case 'L': /* L: long long */ case 'z': /* z: size_t */ case 'Z': /* Z: alias for 'z' */ case 't': /* t: ptrdiff_t */ qualifier = c; c = *fmt++; break; default: qualifier = 0; } if (c == 'n') { if (suppress) continue; /* maybe goto Invalid */ negative = 0; is_signed = 1; val = str - buf; num--; goto Store_it; } /* %c */ if (c == 'c') { /* might be worth complaining about qualifiers? */ /* %c is %1c */ if (width == -1) width = 1; if (!suppress) { char *p = (char *)va_arg(args, char*); if (!*str) break; do { *p++ = *str++; } while (--width && *str); num++; } else { while (*str++ && --width) ; } continue; } /* everything but %c, %n and %[ skips whitespaces */ str = skip_spaces(str); if (!*str) break; /* %s */ if (c == 's') { if (width == -1) width = SHRT_MAX; if (!suppress) { char *p = (char *)va_arg(args, char*); /* now copy until next white space */ while (*str && !isspace(*str) && width--) *p++ = *str++; *p = '\0'; num++; } else { while (*str && !isspace(*str) && width--) str++; } continue; } /* at that point it's numeric or bust */ base = 10; is_signed = 0; negative = 0; sign = 0; switch (c) { case 'o': base = 8; break; case 'x': base = 16; break; case 'i': base = 0; case 'd': is_signed = 1; case 'u': break; default: goto Invalid; } /* optional sign - counts towards width limit */ switch (*str) { case '-': negative = 1; case '+': str++; width--; } rv = parse_number(str, width, base, &val); if (rv < 0) goto Conversion_failed; str += rv; Store_it: #define STORE(stype, utype) \ if (is_signed) { \ if ((val - negative) & (-(u64)(1 + (utype)~0ULL)/2)) \ goto Conversion_failed; \ if (negative) \ val = -val; \ if (!suppress) \ *va_arg(args, stype *) = (stype)val; \ } else { \ if (val & -(u64)(1 + (utype)~0ULL)) \ goto Conversion_failed; \ if (negative) \ val = -val; \ if (!suppress) \ *va_arg(args, utype *) = (utype)val; \ } \ if (!suppress) \ num++; switch (qualifier) { case 'H': STORE(signed char, unsigned char); break; case 'h': STORE(short, unsigned short); break; case 'l': STORE(long, unsigned long); case 'L': case 'j': STORE(long long, unsigned long long); case 'z': case 'Z': STORE(long, size_t); case 't': STORE(ptrdiff_t, unsigned long); break; default: STORE(int, unsigned); } } out: return num; Invalid: /* probably complain about invalid format string */ Conversion_failed: return num; with parse_number(str, width, base, p) being something along the lines of const char *s = str; u64 val = 0; if (!width || !isdigit(*s)) return -1; if (!base) { base = 10; if (*s == '0' && width > 1) { if (s[1] == 'x' || s[1] == 'X') { base = 16; s += 2; if (width == 2 || !isxdigit(*s)) return -1; width -= 2; } else { base = 8; } } } while (*s && width) { unsigned d = base; if (isdigit(*s)) d = *s - '0'; else if (isxdigit(*s)) d = _tolower(*s) - 'a' + 10; if (d >= base) break; /* * Check for overflow only if we are within range of * it in the max base we support (16) */ if (unlikely(val & (~0ull << 60))) { if (val > div_u64(ULLONG_MAX - d, base)) return -1; } val = val * base + d; s++; width--; } *p = val; return s - str; ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: bug in sscanf()? 2014-01-14 2:47 ` Al Viro @ 2014-01-14 2:52 ` Al Viro 0 siblings, 0 replies; 6+ messages in thread From: Al Viro @ 2014-01-14 2:52 UTC (permalink / raw) To: Linus Torvalds Cc: Allan, Bruce W, linux-kernel@vger.kernel.org, Jan Beulich, Alexey Dobriyan On Tue, Jan 14, 2014 at 02:47:26AM +0000, Al Viro wrote: > have now. Something like [completely untested] And untested it certainly is - at least three missing breaks in there: > case 'H': > STORE(signed char, unsigned char); > break; > case 'h': > STORE(short, unsigned short); > break; > case 'l': > STORE(long, unsigned long); break; > case 'L': > case 'j': > STORE(long long, unsigned long long); break; > case 'z': > case 'Z': > STORE(long, size_t); break; ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-01-14 2:52 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-01-09 18:57 bug in sscanf()? Allan, Bruce W 2014-01-13 23:30 ` Al Viro 2014-01-14 0:22 ` Linus Torvalds 2014-01-14 0:32 ` Allan, Bruce W 2014-01-14 2:47 ` Al Viro 2014-01-14 2:52 ` Al Viro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox