* Re: [PATCH v3] sscanf: implement basic character sets
2016-02-23 20:38 [PATCH v3] sscanf: implement basic character sets Jessica Yu
@ 2016-02-23 20:47 ` Kees Cook
2016-02-23 22:05 ` Andrew Morton
2016-02-23 22:47 ` [PATCH v3] " Rasmus Villemoes
2 siblings, 0 replies; 8+ messages in thread
From: Kees Cook @ 2016-02-23 20:47 UTC
To: Jessica Yu; +Cc: Andrew Morton, Rasmus Villemoes, Andy Shevchenko, LKML
On Tue, Feb 23, 2016 at 12:38 PM, Jessica Yu <jeyu@redhat.com> wrote:
> Implement basic character sets for the '%[]' conversion specifier.
>
> The '%[]' conversion specifier matches a nonempty sequence of characters
> from the specified set of accepted (or with '^', rejected) characters
> between the brackets. The substring matched is to be made up of characters
> in (or not in) the set. This implementation differs from its glibc
> counterpart in that it does not support character ranges (e.g., 'a-z' or
> '0-9'), the hyphen '-' is *not* a special character, and the brackets
> themselves cannot be matched.
>
> Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
-Kees
> ---
>
> This patch adds support for the '%[' conversion specifier for sscanf().
> This is useful in cases where we'd like to match substrings delimited by
> something other than spaces. The original motivation for this patch
> actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
> where we were trying to come up with a clean way to parse symbol names with
> substrings delimited by periods and commas.
>
> Patch based on linux-next-20160223.
>
> v3:
> - Fix memory leak in error path (kfree() before returning)
> - Remove redundant condition in while loop
> - Style fix (*op)() -> op()
>
> v2:
> - Use kstrndup() to copy the character set from fmt instead of using a
> statically allocated array
>
> lib/vsprintf.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 525c8e1..983358a 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
> num++;
> }
> continue;
> + case '[':
> + {
> + char *s = (char *)va_arg(args, char *);
> + char *set;
> + size_t (*op)(const char *str, const char *set);
> + size_t len = 0;
> + bool negate = (*(fmt) == '^');
> +
> + if (field_width == -1)
> + field_width = SHRT_MAX;
> +
> + op = negate ? &strcspn : &strspn;
> + if (negate)
> + fmt++;
> +
> + len = strcspn(fmt, "]");
> + /* invalid format; stop here */
> + if (!len)
> + return num;
> +
> + set = kstrndup(fmt, len, GFP_KERNEL);
> + if (!set)
> + return num;
> +
> + /* advance fmt past ']' */
> + fmt += len + 1;
> +
> + len = op(str, set);
> + /* no matches */
> + if (!len) {
> + kfree(set);
> + return num;
> + }
> +
> + while (len-- && field_width--)
> + *s++ = *str++;
> + *s = '\0';
> + kfree(set);
> + num++;
> + }
> + continue;
> case 'o':
> base = 8;
> break;
> --
> 2.4.3
>
--
Kees Cook
Chrome OS & Brillo Security
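The matching rules in the changelog above can be modeled in userspace to see how they differ from glibc. The sketch below uses a hypothetical helper, match_charset(). It is not the code in this patch. It takes everything between '[' and ']' literally (no ranges, so '-' is an ordinary character), honors a leading '^' for negation, requires a non-empty match, and stores at most `width` characters plus a terminating '\0'. As a choice of the sketch, it also rejects a set with no closing ']' rather than advancing past the end of the format string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of the proposed kernel '%[' rules.
 * fmt points just past the '['. Copies at most width characters plus a
 * terminating '\0' into out (which needs width + 1 bytes) and returns the
 * number of input characters matched. Returns 0 for a malformed set
 * (no closing ']' or nothing between the brackets) or for an empty match.
 */
size_t match_charset(const char *str, const char *fmt, char *out,
		     size_t width)
{
	const char *end;
	size_t setlen, n = 0;
	bool negate;

	assert(str && fmt && out);
	negate = (*fmt == '^');
	if (negate)
		fmt++;
	end = strchr(fmt, ']');
	if (!end || end == fmt)
		return 0;
	setlen = (size_t)(end - fmt);

	/* no ranges: every byte of the set, '-' included, stands for itself */
	while (n < width && str[n] != '\0') {
		bool in_set = memchr(fmt, str[n], setlen) != NULL;

		if (in_set == negate)	/* accepted iff in_set != negate */
			break;
		out[n] = str[n];
		n++;
	}
	out[n] = '\0';
	return n;
}
```

For example, match_charset("a-zb", "a-z]", buf, 15) stores "a-z" and returns 3, because 'a', '-' and 'z' are each members of the set and 'b' is not. glibc's sscanf() would instead read "a-z" as a range and stop at the '-'.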
* Re: [PATCH v3] sscanf: implement basic character sets
2016-02-23 20:38 [PATCH v3] sscanf: implement basic character sets Jessica Yu
2016-02-23 20:47 ` Kees Cook
@ 2016-02-23 22:05 ` Andrew Morton
2016-02-24 5:13 ` Jessica Yu
2016-02-23 22:47 ` [PATCH v3] " Rasmus Villemoes
2 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2016-02-23 22:05 UTC
To: Jessica Yu; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel
On Tue, 23 Feb 2016 15:38:22 -0500 Jessica Yu <jeyu@redhat.com> wrote:
> Implement basic character sets for the '%[]' conversion specifier.
>
> The '%[]' conversion specifier matches a nonempty sequence of characters
> from the specified set of accepted (or with '^', rejected) characters
> between the brackets. The substring matched is to be made up of characters
> in (or not in) the set. This implementation differs from its glibc
> counterpart in that it does not support character ranges (e.g., 'a-z' or
> '0-9'), the hyphen '-' is *not* a special character, and the brackets
> themselves cannot be matched.
>
> Signed-off-by: Jessica Yu <jeyu@redhat.com>
> ---
>
> This patch adds support for the '%[' conversion specifier for sscanf().
> This is useful in cases where we'd like to match substrings delimited by
> something other than spaces. The original motivation for this patch
> actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
> where we were trying to come up with a clean way to parse symbol names with
> substrings delimited by periods and commas.
It would be better to include the justification right here in the
changelog please. Not via some link-to-discussion and definitely not
below the ^--- marker! It's very important.

The deviation from the glibc behaviour is a bit of a worry,
particularly as it is done in a non-back-compat manner: code which
assumes "-" is non-magic might break if someone later adds range
support.

Presumably we can live with that - there won't be many callsites and
they can be grepped for. But please, let's get a description of all
these considerations into the code as a comment. Probably it would be
helpful to include a little usage example in that comment.
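To make the compatibility concern concrete: the C standard leaves the meaning of a '-' in the middle of a scanlist implementation-defined, and glibc treats it as a range, whereas the proposed kernel '%[' takes it literally. The sketch below compares the two in userspace. glibc_range_match() uses the real glibc sscanf(). literal_set_match() is a hypothetical stand-in for the proposed kernel semantics that simply calls strspn() on the same three characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * glibc: "%[a-z]" is the range 'a'..'z'.
 * out must hold at least 16 bytes. Returns the number of characters
 * glibc's sscanf() stored, or 0 if nothing matched.
 */
size_t glibc_range_match(const char *in, char *out)
{
	assert(in && out);
	if (sscanf(in, "%15[a-z]", out) != 1)
		return 0;
	return strlen(out);
}

/* Proposed kernel semantics: the set "a-z" is just {'a', '-', 'z'}. */
size_t literal_set_match(const char *in)
{
	assert(in);
	return strspn(in, "a-z");
}
```

On the input "b-a", for instance, glibc matches "b" (one character), while the literal interpretation matches nothing, since 'b' is not one of 'a', '-' or 'z'. A caller written against one behaviour would silently change meaning under the other, which is why such callsites would need to be found if range support were added later.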
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
> num++;
> }
> continue;
> + case '[':
> + {
> + char *s = (char *)va_arg(args, char *);
> + char *set;
> + size_t (*op)(const char *str, const char *set);
> + size_t len = 0;
> + bool negate = (*(fmt) == '^');
> +
> + if (field_width == -1)
> + field_width = SHRT_MAX;
> +
> + op = negate ? &strcspn : &strspn;
> + if (negate)
> + fmt++;
> +
> + len = strcspn(fmt, "]");
> + /* invalid format; stop here */
> + if (!len)
> + return num;
> +
> + set = kstrndup(fmt, len, GFP_KERNEL);
Embedding a GFP_KERNEL allocation into vsscanf is problematic - it
limits the situations in which this functionality can be used.

afaict the allocation is there merely so we can null-terminate the
string so we can use existing library functions (strcspn, strspn). Is
that compromise really worth it? We could pretty easily convert
strcspn() into

	strcnspn(const char *s, const char *reject, size_t len)

and convert strcspn() to call that (ifndef __HAVE_ARCH_STRCSPN)

In fact I think we could still use strspn() and strcspn() on `fmt'
directly? We just need to check for the return value exceeding `len'
and if so, treat that as a no-match?
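A minimal userspace sketch of the length-bounded helper described above. strcnspn() is only the name proposed here, not an existing kernel or libc function, and strnspn() is a hypothetical accept-set counterpart added for symmetry. Because only the first `len` bytes of the set are consulted, the set can point straight into `fmt` without being copied or NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: length of the initial segment of s that contains no byte
 * from reject[0..len-1]. reject need not be NUL-terminated.
 */
size_t strcnspn(const char *s, const char *reject, size_t len)
{
	const char *p;

	assert(s && reject);
	for (p = s; *p != '\0'; p++) {
		if (memchr(reject, *p, len))
			break;
	}
	return (size_t)(p - s);
}

/*
 * Hypothetical: length of the initial segment of s made up only of bytes
 * from accept[0..len-1]. accept need not be NUL-terminated.
 */
size_t strnspn(const char *s, const char *accept, size_t len)
{
	const char *p;

	assert(s && accept);
	for (p = s; *p != '\0'; p++) {
		if (!memchr(accept, *p, len))
			break;
	}
	return (size_t)(p - s);
}
```

In vsscanf terms, with fmt pointing at the '^' of "%[^.,]", the set is the two bytes after it, so strcnspn(str, fmt + 1, 2) would need no allocation; the existing strcspn(s, reject) is then equivalent to strcnspn(s, reject, strlen(reject)).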
> + if (!set)
> + return num;
> +
> + /* advance fmt past ']' */
> + fmt += len + 1;
> +
> + len = op(str, set);
> + /* no matches */
> + if (!len) {
> + kfree(set);
> + return num;
> + }
> +
> + while (len-- && field_width--)
> + *s++ = *str++;
> + *s = '\0';
> + kfree(set);
> + num++;
> + }
> + continue;
> case 'o':
> base = 8;
> break;
* Re: sscanf: implement basic character sets
2016-02-23 22:05 ` Andrew Morton
@ 2016-02-24 5:13 ` Jessica Yu
2016-02-24 5:28 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Jessica Yu @ 2016-02-24 5:13 UTC
To: Andrew Morton; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel
+++ Andrew Morton [23/02/16 14:05 -0800]:
>On Tue, 23 Feb 2016 15:38:22 -0500 Jessica Yu <jeyu@redhat.com> wrote:
>
>> Implement basic character sets for the '%[]' conversion specifier.
>>
>> The '%[]' conversion specifier matches a nonempty sequence of characters
>> from the specified set of accepted (or with '^', rejected) characters
>> between the brackets. The substring matched is to be made up of characters
>> in (or not in) the set. This implementation differs from its glibc
>> counterpart in that it does not support character ranges (e.g., 'a-z' or
>> '0-9'), the hyphen '-' is *not* a special character, and the brackets
>> themselves cannot be matched.
>>
>> Signed-off-by: Jessica Yu <jeyu@redhat.com>
>> ---
>>
>> This patch adds support for the '%[' conversion specifier for sscanf().
>> This is useful in cases where we'd like to match substrings delimited by
>> something other than spaces. The original motivation for this patch
>> actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
>> where we were trying to come up with a clean way to parse symbol names with
>> substrings delimited by periods and commas.
>
> It would be better to include the justification right here in the
> changelog please.
> Not via some link-to-discussion and definitely not
> below the ^--- marker! It's very important.
Thanks for the corrections, Andrew. I am, however, slightly confused: are
you suggesting that I should provide a much more thorough explanation
about the motivation here in the changelog (below the ^--- marker), or
would this be better suited for a (separate) cover letter?
>The deviation from the glibc behaviour is a bit of a worry,
>particularly as it is done in a non-back-compat manner: code which
>assumes "-" is non-magic might break if someone later adds range
>support.
>
>Presumably we can live with that - there won't be many callsites and
>they can be grepped for. But please, let's get a description of all
>these considerations into the code as a comment. Probably it would be
>helpful to include a little usage example in that comment.
Hm, that is a very good point. At the moment we can be sure there
aren't any users of sscanf() using the %[ conversion specifier, as it
doesn't exist yet :-) But yes, this behavior should be documented
clearly in a comment so future users will be aware.
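One possible shape for the comment requested above, with a usage example. The wording, the helper parse_obj_sym_pos(), the input string and the buffer sizes are all assumptions for illustration. The conversions contain only '^', '.' and ',' (no '-'), so they behave the same under glibc, and the example can be exercised in userspace.

```c
#include <assert.h>
#include <stdio.h>

/*
 * %[set] / %[^set]: match a non-empty run of characters that are in
 * (or, with '^', not in) set. Unlike glibc, ranges are not supported:
 * '-' is an ordinary character, and the brackets themselves cannot be
 * matched. Callers that rely on '-' being literal may break if range
 * support is added later, so they should stay easy to grep for.
 *
 * Example (hypothetical input): split "vmlinux.kallsyms,1" into an
 * object name, a symbol name and a position.
 */
int parse_obj_sym_pos(const char *in, char obj[16], char sym[32],
		      unsigned long *pos)
{
	assert(in && obj && sym && pos);
	/* each field width leaves room for the terminating '\0' */
	return sscanf(in, "%15[^.].%31[^,],%lu", obj, sym, pos);
}
```

parse_obj_sym_pos("vmlinux.kallsyms,1", ...) fills in "vmlinux", "kallsyms" and 1 and returns 3; an input that starts with '.' returns 0, because '%[' requires a non-empty match.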
>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>> num++;
>> }
>> continue;
>> + case '[':
>> + {
>> + char *s = (char *)va_arg(args, char *);
>> + char *set;
>> + size_t (*op)(const char *str, const char *set);
>> + size_t len = 0;
>> + bool negate = (*(fmt) == '^');
>> +
>> + if (field_width == -1)
>> + field_width = SHRT_MAX;
>> +
>> + op = negate ? &strcspn : &strspn;
>> + if (negate)
>> + fmt++;
>> +
>> + len = strcspn(fmt, "]");
>> + /* invalid format; stop here */
>> + if (!len)
>> + return num;
>> +
>> + set = kstrndup(fmt, len, GFP_KERNEL);
>
>Embedding a GFP_KERNEL allocation into vsscanf is problematic - it
>limits the situations in which this functionality can be used.
>
>afaict the allocation is there merely so we can null-terminate the
>string so we can use existing library functions (strcspn, strspn). Is
>that compromise really worth it? We could pretty easily convert
>strcspn() into
>
> strcnspn(const char *s, const char *reject, size_t len)
>
>and convert strcspn() to call that (ifndef __HAVE_ARCH_STRCSPN)
>
>In fact I think we could still use strspn() and strcspn() on `fmt'
>directly? We just need to check for the return value exceeding `len'
>and if so, treat that as a no-match?
>
Perhaps we can use Rasmus' bitmap solution, as it avoids the
allocation altogether and it doesn't need to use strspn()/strcspn().
>> + if (!set)
>> + return num;
>> +
>> + /* advance fmt past ']' */
>> + fmt += len + 1;
>> +
>> + len = op(str, set);
>> + /* no matches */
>> + if (!len) {
>> + kfree(set);
>> + return num;
>> + }
>> +
>> + while (len-- && field_width--)
>> + *s++ = *str++;
>> + *s = '\0';
>> + kfree(set);
>> + num++;
>> + }
>> + continue;
>> case 'o':
>> base = 8;
>> break;
>
* Re: sscanf: implement basic character sets
2016-02-24 5:13 ` Jessica Yu
@ 2016-02-24 5:28 ` Andrew Morton
0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2016-02-24 5:28 UTC
To: Jessica Yu; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel
On Wed, 24 Feb 2016 00:13:47 -0500 Jessica Yu <jeyu@redhat.com> wrote:
> >> This patch adds support for the '%[' conversion specifier for sscanf().
> >> This is useful in cases where we'd like to match substrings delimited by
> >> something other than spaces. The original motivation for this patch
> >> actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
> >> where we were trying to come up with a clean way to parse symbol names with
> >> substrings delimited by periods and commas.
> >
> > It would be better to include the justification right here in the
> > changelog please.
> > Not via some link-to-discussion and definitely not
> > below the ^--- marker! It's very important.
>
> Thanks for the corrections, Andrew. I am, however, slightly confused: are
> you suggesting that I should provide a much more thorough explanation
> about the motivation here in the changelog (below the ^--- marker), or
> would this be better suited for a (separate) cover letter?
Just in the plain old changelog is good - if it was in [0/n] I'd only
move it into the changelog anyway.

And 99.9% of the stuff people put below ^--- is useful so I always end
up moving that into the changelog as well...
* Re: [PATCH v3] sscanf: implement basic character sets
2016-02-23 20:38 [PATCH v3] sscanf: implement basic character sets Jessica Yu
2016-02-23 20:47 ` Kees Cook
2016-02-23 22:05 ` Andrew Morton
@ 2016-02-23 22:47 ` Rasmus Villemoes
2016-02-24 0:01 ` Rasmus Villemoes
2016-02-24 5:39 ` Jessica Yu
2 siblings, 2 replies; 8+ messages in thread
From: Rasmus Villemoes @ 2016-02-23 22:47 UTC
To: Jessica Yu; +Cc: Andrew Morton, Andy Shevchenko, Kees Cook, linux-kernel
On Tue, Feb 23 2016, Jessica Yu <jeyu@redhat.com> wrote:
> Implement basic character sets for the '%[]' conversion specifier.
>
>
> lib/vsprintf.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 525c8e1..983358a 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
> num++;
> }
> continue;
> + case '[':
> + {
> + char *s = (char *)va_arg(args, char *);
> + char *set;
> + size_t (*op)(const char *str, const char *set);
> + size_t len = 0;
> + bool negate = (*(fmt) == '^');
> +
> + if (field_width == -1)
> + field_width = SHRT_MAX;
> +
> + op = negate ? &strcspn : &strspn;
> + if (negate)
> + fmt++;
> +
> + len = strcspn(fmt, "]");
> + /* invalid format; stop here */
> + if (!len)
> + return num;
> +
> + set = kstrndup(fmt, len, GFP_KERNEL);
> + if (!set)
> + return num;
> +
> + /* advance fmt past ']' */
> + fmt += len + 1;
> +
> + len = op(str, set);
> + /* no matches */
> + if (!len) {
> + kfree(set);
> + return num;
> + }
> +
> + while (len-- && field_width--)
> + *s++ = *str++;
> + *s = '\0';
> + kfree(set);
> + num++;
> + }
> + continue;
> case 'o':
> base = 8;
> break;
(1) How do we know that doing a memory allocation would be ok, and then
with GFP_KERNEL? vsnprintf can be called from just about any context, so
I don't think that would fly there. Sooner or later someone is going to
be calling sscanf with a spinlock held, methinks.

(2) I think a field width should be mandatory (so %[ should simply be
regarded as malformed - it should be %*[ or %n[ for some explicit
decimal n). That will allow the compiler or other static analyzers to do
sanity checking, and we'll probably be saved from a few buffer
overflows down the line.

It's a bit sad that the C standard doesn't include the terminating '\0'
in the field width, so one would sometimes have to write
'(int)sizeof(buf)-1', but there's not much to do about that. On that
note, it seems that your field width handling is off-by-one.

To get rid of the allocation, why not use a small bitmap? Something like

{
	char *s = (char *)va_arg(args, char *);
	DECLARE_BITMAP(map, 256) = {0};
	bool negate = false;

	/* a field width is required, and must provide room for at least a '\0' */
	if (field_width <= 0)
		return num;

	if (*fmt == '^') {
		negate = true;
		++fmt;
	}
	for ( ; *fmt && *fmt != ']'; ++fmt)
		set_bit((u8)*fmt, map);
	if (!*fmt) // no ], so malformed input
		return num;
	++fmt;
	if (negate) {
		bitmap_complement(map, map, 256);
		clear_bit(0, map); // this avoids testing *str != '\0' below
	}

	if (!test_bit((u8)*str, map)) // match must be non-empty
		return num;
	while (test_bit((u8)*str, map) && --field_width) {
		*s++ = *str++;
	}
	*s = '\0';
	++num;
}
Rasmus
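The point about the field width excluding the terminating '\0' can be checked with glibc's sscanf(): a width of N stores up to N characters plus '\0', so the buffer needs N + 1 bytes. One common userspace idiom for keeping the width and the buffer size in sync is to build the width from a single constant with preprocessor stringification. NAME_LEN, STR(), XSTR() and read_name() below are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_LEN 7		/* hypothetical: max characters stored */
#define STR(x) #x
#define XSTR(x) STR(x)		/* expand NAME_LEN before stringifying */

/*
 * Copies at most NAME_LEN characters, up to the first ',', into name,
 * which must hold NAME_LEN + 1 bytes for the terminating '\0'.
 * Returns sscanf()'s result: 1 on a match, 0 on an empty match.
 */
int read_name(const char *in, char name[NAME_LEN + 1])
{
	assert(in && name);
	return sscanf(in, "%" XSTR(NAME_LEN) "[^,]", name);
}
```

Here the format expands to "%7[^,]", so read_name("abcdefghij,x", name) stores "abcdefg" in the 8-byte buffer.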
* Re: [PATCH v3] sscanf: implement basic character sets
2016-02-23 22:47 ` [PATCH v3] " Rasmus Villemoes
@ 2016-02-24 0:01 ` Rasmus Villemoes
2016-02-24 5:39 ` Jessica Yu
1 sibling, 0 replies; 8+ messages in thread
From: Rasmus Villemoes @ 2016-02-24 0:01 UTC
To: Jessica Yu; +Cc: Andrew Morton, Andy Shevchenko, Kees Cook, linux-kernel
On Tue, Feb 23 2016, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:
> On that note, it seems that your field width handling is off-by-one.
Sorry about that, it's me who's off-by-one.
Rasmus
> To get rid of the allocation, why not use a small bitmap? Something like
>
> {
> char *s = (char *)va_arg(args, char *);
> DECLARE_BITMAP(map, 256) = {0};
> bool negate = false;
>
> /* a field width is required, and must provide room for at least a '\0' */
> if (field_width <= 0)
> return num;
>
should be

	/* a field width is required */
	if (field_width < 0)

and
> while (test_bit((u8)*str, map) && --field_width) {
should be field_width--, exactly as in your code.
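The bitmap approach, with the two corrections from this follow-up applied (field_width < 0 is rejected, and the loop condition uses field_width--), can be exercised in userspace. DECLARE_BITMAP(), set_bit() and test_bit() are kernel-only, so this sketch substitutes a plain unsigned long array and two small static helpers. scan_set() and its pointer-returning interface are hypothetical, invented for the sketch rather than taken from vsscanf().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SET_BITS 256
#define SET_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define SET_WORDS ((SET_BITS + SET_WORD_BITS - 1) / SET_WORD_BITS)

static void map_set(unsigned long *map, unsigned char c)
{
	map[c / SET_WORD_BITS] |= 1UL << (c % SET_WORD_BITS);
}

static bool map_test(const unsigned long *map, unsigned char c)
{
	return map[c / SET_WORD_BITS] & (1UL << (c % SET_WORD_BITS));
}

/*
 * *fmtp points just past '['; field_width < 0 means "no width given".
 * On success, stores up to field_width chars plus '\0' in s, advances
 * *fmtp past ']' and returns the input position after the match.
 * Returns NULL on a missing width, a missing ']' or an empty match.
 */
const char *scan_set(const char *str, const char **fmtp, int field_width,
		     char *s)
{
	unsigned long map[SET_WORDS] = {0};
	const char *fmt;
	bool negate = false;
	size_t i;

	assert(str && fmtp && *fmtp && s);
	fmt = *fmtp;
	if (field_width < 0)			/* a field width is required */
		return NULL;
	if (*fmt == '^') {
		negate = true;
		++fmt;
	}
	for ( ; *fmt && *fmt != ']'; ++fmt)
		map_set(map, (unsigned char)*fmt);
	if (!*fmt)				/* no ']', so malformed format */
		return NULL;
	++fmt;
	if (negate) {
		for (i = 0; i < SET_WORDS; i++)
			map[i] = ~map[i];
		map[0] &= ~1UL;			/* never match '\0' */
	}
	if (!map_test(map, (unsigned char)*str))	/* match must be non-empty */
		return NULL;
	while (map_test(map, (unsigned char)*str) && field_width--)
		*s++ = *str++;
	*s = '\0';
	*fmtp = fmt;
	return str;
}
```

A 256-bit map costs 32 bytes of stack and needs no allocation, which sidesteps the GFP_KERNEL concern raised earlier in the thread. As in the corrected version, a width of N stores up to N characters plus the '\0'.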
* Re: sscanf: implement basic character sets
2016-02-23 22:47 ` [PATCH v3] " Rasmus Villemoes
2016-02-24 0:01 ` Rasmus Villemoes
@ 2016-02-24 5:39 ` Jessica Yu
1 sibling, 0 replies; 8+ messages in thread
From: Jessica Yu @ 2016-02-24 5:39 UTC
To: Rasmus Villemoes; +Cc: Andrew Morton, Andy Shevchenko, Kees Cook, linux-kernel
+++ Rasmus Villemoes [23/02/16 23:47 +0100]:
>On Tue, Feb 23 2016, Jessica Yu <jeyu@redhat.com> wrote:
>
>> Implement basic character sets for the '%[]' conversion specifier.
>>
>>
>> lib/vsprintf.c | 41 +++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 41 insertions(+)
>>
>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>> index 525c8e1..983358a 100644
>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>> num++;
>> }
>> continue;
>> + case '[':
>> + {
>> + char *s = (char *)va_arg(args, char *);
>> + char *set;
>> + size_t (*op)(const char *str, const char *set);
>> + size_t len = 0;
>> + bool negate = (*(fmt) == '^');
>> +
>> + if (field_width == -1)
>> + field_width = SHRT_MAX;
>> +
>> + op = negate ? &strcspn : &strspn;
>> + if (negate)
>> + fmt++;
>> +
>> + len = strcspn(fmt, "]");
>> + /* invalid format; stop here */
>> + if (!len)
>> + return num;
>> +
>> + set = kstrndup(fmt, len, GFP_KERNEL);
>> + if (!set)
>> + return num;
>> +
>> + /* advance fmt past ']' */
>> + fmt += len + 1;
>> +
>> + len = op(str, set);
>> + /* no matches */
>> + if (!len) {
>> + kfree(set);
>> + return num;
>> + }
>> +
>> + while (len-- && field_width--)
>> + *s++ = *str++;
>> + *s = '\0';
>> + kfree(set);
>> + num++;
>> + }
>> + continue;
>> case 'o':
>> base = 8;
>> break;
>
>(1) How do we know that doing a memory allocation would be ok, and then
>with GFP_KERNEL? vsnprintf can be called from just about any context, so
>I don't think that would fly there. Sooner or later someone is going to
>be calling sscanf with a spinlock held, methinks.
>
>(2) I think a field width should be mandatory (so %[ should simply be
>regarded as malformed - it should be %*[ or %n[ for some explicit
>decimal n). That will allow the compiler or other static analyzers to do
>sanity checking, and we'll probably be saved from a few buffer
>overflows down the line.
>
>It's a bit sad that the C standard doesn't include the terminating '\0'
>in the field width, so one would sometimes have to write
>'(int)sizeof(buf)-1', but there's not much to do about that. On that
>note, it seems that your field width handling is off-by-one.
>
>To get rid of the allocation, why not use a small bitmap? Something like
>
>{
> char *s = (char *)va_arg(args, char *);
> DECLARE_BITMAP(map, 256) = {0};
> bool negate = false;
>
> /* a field width is required, and must provide room for at least a '\0' */
> if (field_width <= 0)
> return num;
>
> if (*fmt == '^') {
> negate = true;
> ++fmt;
> }
> for ( ; *fmt && *fmt != ']'; ++fmt)
> set_bit((u8)*fmt, map);
> if (!*fmt) // no ], so malformed input
> return num;
> ++fmt;
> if (negate) {
> bitmap_complement(map, map, 256);
> clear_bit(0, map); // this avoids testing *str != '\0' below
> }
>
> if (!test_bit((u8)*str, map)) // match must be non-empty
> return num;
> while (test_bit((u8)*str, map) && --field_width) {
> *s++ = *str++;
> }
> *s = '\0';
> ++num;
>}
I quite like this idea, as it avoids allocations and doesn't need
strcspn/strspn. What do other people think?