[PATCH v3] sscanf: implement basic character sets

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3] sscanf: implement basic character sets
@ 2016-02-23 20:38 Jessica Yu
  2016-02-23 20:47 ` Kees Cook
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jessica Yu @ 2016-02-23 20:38 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel,
	Jessica Yu

Implement basic character sets for the '%[]' conversion specifier.

The '%[]' conversion specifier matches a nonempty sequence of characters
from the specified set of accepted (or with '^', rejected) characters
between the brackets. The substring matched is to be made up of characters
in (or not in) the set. This implementation differs from its glibc
counterpart in that it does not support character ranges (e.g., 'a-z' or
'0-9'), the hyphen '-' is *not* a special character, and the brackets
themselves cannot be matched.

Signed-off-by: Jessica Yu <jeyu@redhat.com>
---

This patch adds support for the '%[' conversion specifier for sscanf().
This is useful in cases where we'd like to match substrings delimited by
something other than spaces. The original motivation for this patch
actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
where we were trying to come up with a clean way to parse symbol names with
substrings delimited by periods and commas.

Patch based on linux-next-20160223.

v3:
 - Fix memory leak in error path (kfree() before returning)
 - Remove redundant condition in while loop
 - Style fix (*op)() -> op()

v2:
 - Use kstrndup() to copy the character set from fmt instead of using a
   statically allocated array

 lib/vsprintf.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 525c8e1..983358a 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
 			num++;
 		}
 		continue;
+		case '[':
+		{
+			char *s = (char *)va_arg(args, char *);
+			char *set;
+			size_t (*op)(const char *str, const char *set);
+			size_t len = 0;
+			bool negate = (*(fmt) == '^');
+
+			if (field_width == -1)
+				field_width = SHRT_MAX;
+
+			op = negate ? &strcspn : &strspn;
+			if (negate)
+				fmt++;
+
+			len = strcspn(fmt, "]");
+			/* invalid format; stop here */
+			if (!len)
+				return num;
+
+			set = kstrndup(fmt, len, GFP_KERNEL);
+			if (!set)
+				return num;
+
+			/* advance fmt past ']' */
+			fmt += len + 1;
+
+			len = op(str, set);
+			/* no matches */
+			if (!len) {
+				kfree(set);
+				return num;
+			}
+
+			while (len-- && field_width--)
+				*s++ = *str++;
+			*s = '\0';
+			kfree(set);
+			num++;
+		}
+		continue;
 		case 'o':
 			base = 8;
 			break;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] sscanf: implement basic character sets
  2016-02-23 20:38 [PATCH v3] sscanf: implement basic character sets Jessica Yu
@ 2016-02-23 20:47 ` Kees Cook
  2016-02-23 22:05 ` Andrew Morton
  2016-02-23 22:47 ` [PATCH v3] " Rasmus Villemoes
  2 siblings, 0 replies; 8+ messages in thread
From: Kees Cook @ 2016-02-23 20:47 UTC (permalink / raw)
  To: Jessica Yu; +Cc: Andrew Morton, Rasmus Villemoes, Andy Shevchenko, LKML

On Tue, Feb 23, 2016 at 12:38 PM, Jessica Yu <jeyu@redhat.com> wrote:
> Implement basic character sets for the '%[]' conversion specifier.
>
> The '%[]' conversion specifier matches a nonempty sequence of characters
> from the specified set of accepted (or with '^', rejected) characters
> between the brackets. The substring matched is to be made up of characters
> in (or not in) the set. This implementation differs from its glibc
> counterpart in that it does not support character ranges (e.g., 'a-z' or
> '0-9'), the hyphen '-' is *not* a special character, and the brackets
> themselves cannot be matched.
>
> Signed-off-by: Jessica Yu <jeyu@redhat.com>

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>
> This patch adds support for the '%[' conversion specifier for sscanf().
> This is useful in cases where we'd like to match substrings delimited by
> something other than spaces. The original motivation for this patch
> actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
> where we were trying to come up with a clean way to parse symbol names with
> substrings delimited by periods and commas.
>
> Patch based on linux-next-20160223.
>
> v3:
>  - Fix memory leak in error path (kfree() before returning)
>  - Remove redundant condition in while loop
>  - Style fix (*op)() -> op()
>
> v2:
>  - Use kstrndup() to copy the character set from fmt instead of using a
>    statically allocated array
>
>  lib/vsprintf.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 525c8e1..983358a 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>                         num++;
>                 }
>                 continue;
> +               case '[':
> +               {
> +                       char *s = (char *)va_arg(args, char *);
> +                       char *set;
> +                       size_t (*op)(const char *str, const char *set);
> +                       size_t len = 0;
> +                       bool negate = (*(fmt) == '^');
> +
> +                       if (field_width == -1)
> +                               field_width = SHRT_MAX;
> +
> +                       op = negate ? &strcspn : &strspn;
> +                       if (negate)
> +                               fmt++;
> +
> +                       len = strcspn(fmt, "]");
> +                       /* invalid format; stop here */
> +                       if (!len)
> +                               return num;
> +
> +                       set = kstrndup(fmt, len, GFP_KERNEL);
> +                       if (!set)
> +                               return num;
> +
> +                       /* advance fmt past ']' */
> +                       fmt += len + 1;
> +
> +                       len = op(str, set);
> +                       /* no matches */
> +                       if (!len) {
> +                               kfree(set);
> +                               return num;
> +                       }
> +
> +                       while (len-- && field_width--)
> +                               *s++ = *str++;
> +                       *s = '\0';
> +                       kfree(set);
> +                       num++;
> +               }
> +               continue;
>                 case 'o':
>                         base = 8;
>                         break;
> --
> 2.4.3
>



-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] sscanf: implement basic character sets
  2016-02-23 20:38 [PATCH v3] sscanf: implement basic character sets Jessica Yu
  2016-02-23 20:47 ` Kees Cook
@ 2016-02-23 22:05 ` Andrew Morton
  2016-02-24  5:13   ` Jessica Yu
  2016-02-23 22:47 ` [PATCH v3] " Rasmus Villemoes
  2 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2016-02-23 22:05 UTC (permalink / raw)
  To: Jessica Yu; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

On Tue, 23 Feb 2016 15:38:22 -0500 Jessica Yu <jeyu@redhat.com> wrote:

> Implement basic character sets for the '%[]' conversion specifier.
> 
> The '%[]' conversion specifier matches a nonempty sequence of characters
> from the specified set of accepted (or with '^', rejected) characters
> between the brackets. The substring matched is to be made up of characters
> in (or not in) the set. This implementation differs from its glibc
> counterpart in that it does not support character ranges (e.g., 'a-z' or
> '0-9'), the hyphen '-' is *not* a special character, and the brackets
> themselves cannot be matched.
> 
> Signed-off-by: Jessica Yu <jeyu@redhat.com>
> ---
> 
> This patch adds support for the '%[' conversion specifier for sscanf().
> This is useful in cases where we'd like to match substrings delimited by
> something other than spaces. The original motivation for this patch
> actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
> where we were trying to come up with a clean way to parse symbol names with
> substrings delimited by periods and commas.

It would be better to include the justification right here in the
changelog please.  Not via some link-to-discussion and definitely not
below the ^--- marker!  It's very important.

The deviation from the glibc behaviour is a bit of a worry,
particularly as it is done in a non-back-compat manner: code which
assumes "-" is non-magic might break if someone later adds range
support.

Presumably we can live with that - there won't be many callsites and
they can be grepped for.  But please, let's get a description of all
these considerations into the code as a comment.  Probably it would be
helpful to include a little usage example in that comment.

> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>  			num++;
>  		}
>  		continue;
> +		case '[':
> +		{
> +			char *s = (char *)va_arg(args, char *);
> +			char *set;
> +			size_t (*op)(const char *str, const char *set);
> +			size_t len = 0;
> +			bool negate = (*(fmt) == '^');
> +
> +			if (field_width == -1)
> +				field_width = SHRT_MAX;
> +
> +			op = negate ? &strcspn : &strspn;
> +			if (negate)
> +				fmt++;
> +
> +			len = strcspn(fmt, "]");
> +			/* invalid format; stop here */
> +			if (!len)
> +				return num;
> +
> +			set = kstrndup(fmt, len, GFP_KERNEL);

Embedding a GFP_KERNEL allocation into vsscanf is problematic - it
limits the situations in which this functionality can be used.

afaict the allocation is there merely so we can null-terminate the
string so we can use existing library functions (strcspn, strspn).  Is
that compromise really worth it?  We could pretty easily convert
strcspn() into

	strcnspn(const char *s, const char *reject, size_t len)

and convert strcspn() to call that (ifndef __HAVE_ARCH_STRCSPN)

In fact I think we could still use strspn() and strcspn() on `fmt'
directly?  We just need to check for the return value exceeding `len'
and if so, treat that as a no-match?


> +			if (!set)
> +				return num;
> +
> +			/* advance fmt past ']' */
> +			fmt += len + 1;
> +
> +			len = op(str, set);
> +			/* no matches */
> +			if (!len) {
> +				kfree(set);
> +				return num;
> +			}
> +
> +			while (len-- && field_width--)
> +				*s++ = *str++;
> +			*s = '\0';
> +			kfree(set);
> +			num++;
> +		}
> +		continue;
>  		case 'o':
>  			base = 8;
>  			break;

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] sscanf: implement basic character sets
  2016-02-23 20:38 [PATCH v3] sscanf: implement basic character sets Jessica Yu
  2016-02-23 20:47 ` Kees Cook
  2016-02-23 22:05 ` Andrew Morton
@ 2016-02-23 22:47 ` Rasmus Villemoes
  2016-02-24  0:01   ` Rasmus Villemoes
  2016-02-24  5:39   ` Jessica Yu
  2 siblings, 2 replies; 8+ messages in thread
From: Rasmus Villemoes @ 2016-02-23 22:47 UTC (permalink / raw)
  To: Jessica Yu; +Cc: Andrew Morton, Andy Shevchenko, Kees Cook, linux-kernel

On Tue, Feb 23 2016, Jessica Yu <jeyu@redhat.com> wrote:

> Implement basic character sets for the '%[]' conversion specifier.
>
>
>  lib/vsprintf.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 41 insertions(+)
>
> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
> index 525c8e1..983358a 100644
> --- a/lib/vsprintf.c
> +++ b/lib/vsprintf.c
> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>  			num++;
>  		}
>  		continue;
> +		case '[':
> +		{
> +			char *s = (char *)va_arg(args, char *);
> +			char *set;
> +			size_t (*op)(const char *str, const char *set);
> +			size_t len = 0;
> +			bool negate = (*(fmt) == '^');
> +
> +			if (field_width == -1)
> +				field_width = SHRT_MAX;
> +
> +			op = negate ? &strcspn : &strspn;
> +			if (negate)
> +				fmt++;
> +
> +			len = strcspn(fmt, "]");
> +			/* invalid format; stop here */
> +			if (!len)
> +				return num;
> +
> +			set = kstrndup(fmt, len, GFP_KERNEL);
> +			if (!set)
> +				return num;
> +
> +			/* advance fmt past ']' */
> +			fmt += len + 1;
> +
> +			len = op(str, set);
> +			/* no matches */
> +			if (!len) {
> +				kfree(set);
> +				return num;
> +			}
> +
> +			while (len-- && field_width--)
> +				*s++ = *str++;
> +			*s = '\0';
> +			kfree(set);
> +			num++;
> +		}
> +		continue;
>  		case 'o':
>  			base = 8;
>  			break;

(1) How do we know that doing a memory allocation would be ok, and then
with GFP_KERNEL? vsnprintf can be called from just about any context, so
I don't think that would fly there. Sooner or later someone is going to
be calling sscanf with a spinlock held, methinks.

(2) I think a field width should be mandatory (so %[ should simply be
regarded as malformed - it should be %*[ or %n[ for some explicit
decimal n). That will allow the compiler or other static analyzers to do
sanity checking, and we'll probably be saved from a few buffer
overflows down the line.

It's a bit sad that the C standard doesn't include the terminating '\0'
in the field width, so one would sometimes have to write
'(int)sizeof(buf)-1', but there's not much to do about that. On that
note, it seems that your field width handling is off-by-one.

To get rid of the allocation, why not use a small bitmap? Something like

{
  char *s = (char *)va_arg(args, char *);
  DECLARE_BITMAP(map, 256) = {0};
  bool negate = false;

  /* a field width is required, and must provide room for at least a '\0' */
  if (field_width <= 0)
    return num;

  if (*fmt == '^') {
    negate = true;
    ++fmt;
  }
  for ( ; *fmt && *fmt != ']'; ++fmt)
    set_bit((u8)*fmt, map);
  if (!*fmt) // no ], so malformed input
    return num;
  ++fmt;
  if (negate) {
    bitmap_complement(map, map, 256);
    clear_bit(0, map); // this avoids testing *str != '\0' below
  }

  if (!test_bit((u8)*str, map)) // match must be non-empty
    return num;
  while (test_bit((u8)*str, map) && --field_width) {
    *s++ = *str++;
  }
  *s = '\0';
  ++num;
}

Rasmus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v3] sscanf: implement basic character sets
  2016-02-23 22:47 ` [PATCH v3] " Rasmus Villemoes
@ 2016-02-24  0:01   ` Rasmus Villemoes
  2016-02-24  5:39   ` Jessica Yu
  1 sibling, 0 replies; 8+ messages in thread
From: Rasmus Villemoes @ 2016-02-24  0:01 UTC (permalink / raw)
  To: Jessica Yu; +Cc: Andrew Morton, Andy Shevchenko, Kees Cook, linux-kernel

On Tue, Feb 23 2016, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:

> On that note, it seems that your field width handling is off-by-one.

Sorry about that, it's me who's off-by-one.

Rasmus


> To get rid of the allocation, why not use a small bitmap? Something like
>
> {
>   char *s = (char *)va_arg(args, char *);
>   DECLARE_BITMAP(map, 256) = {0};
>   bool negate = false;
>
>   /* a field width is required, and must provide room for at least a '\0' */
>   if (field_width <= 0)
>     return num;
>

should be

/* a field width is required */
if (field_width < 0)

and

>   while (test_bit((u8)*str, map) && --field_width) {

should be field_width--, exactly as in your code.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sscanf: implement basic character sets
  2016-02-23 22:05 ` Andrew Morton
@ 2016-02-24  5:13   ` Jessica Yu
  2016-02-24  5:28     ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Jessica Yu @ 2016-02-24  5:13 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

+++ Andrew Morton [23/02/16 14:05 -0800]:
>On Tue, 23 Feb 2016 15:38:22 -0500 Jessica Yu <jeyu@redhat.com> wrote:
>
>> Implement basic character sets for the '%[]' conversion specifier.
>>
>> The '%[]' conversion specifier matches a nonempty sequence of characters
>> from the specified set of accepted (or with '^', rejected) characters
>> between the brackets. The substring matched is to be made up of characters
>> in (or not in) the set. This implementation differs from its glibc
>> counterpart in that it does not support character ranges (e.g., 'a-z' or
>> '0-9'), the hyphen '-' is *not* a special character, and the brackets
>> themselves cannot be matched.
>>
>> Signed-off-by: Jessica Yu <jeyu@redhat.com>
>> ---
>>
>> This patch adds support for the '%[' conversion specifier for sscanf().
>> This is useful in cases where we'd like to match substrings delimited by
>> something other than spaces. The original motivation for this patch
>> actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
>> where we were trying to come up with a clean way to parse symbol names with
>> substrings delimited by periods and commas.
>
> It would be better to include the justification right here in the
> changelog please. 
> Not via some link-to-discussion and definitely not
> below the ^--- marker!  It's very important.

Thanks for the corrections Andrew. I am however slightly confused, are
you suggesting that I should provide a much more thorough explanation
about the motivation here in the changelog (below the ^--- marker), or
would this be better suited for a (separate) cover letter?

>The deviation from the glibc behaviour is a bit of a worry,
>particularly as it is done in a non-back-compat manner: code which
>assumes "-" is non-magic might break if someone later adds range
>support.
>
>Presumably we can live with that - there won't be many callsites and
>they can be grepped for.  But please, let's get a description of all
>these considerations into the code as a comment.  Probably it would be
>helpful to include a little usage example in that comment.

Hm, that is a very good point. At the moment we can be sure there
aren't any users of sscanf() using the %[ conversion specifier, as it
doesn't exist yet :-) But yes, this behavior should be documented
clearly in a comment, so future users will be aware..

>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>>  			num++;
>>  		}
>>  		continue;
>> +		case '[':
>> +		{
>> +			char *s = (char *)va_arg(args, char *);
>> +			char *set;
>> +			size_t (*op)(const char *str, const char *set);
>> +			size_t len = 0;
>> +			bool negate = (*(fmt) == '^');
>> +
>> +			if (field_width == -1)
>> +				field_width = SHRT_MAX;
>> +
>> +			op = negate ? &strcspn : &strspn;
>> +			if (negate)
>> +				fmt++;
>> +
>> +			len = strcspn(fmt, "]");
>> +			/* invalid format; stop here */
>> +			if (!len)
>> +				return num;
>> +
>> +			set = kstrndup(fmt, len, GFP_KERNEL);
>
>Embedding a GFP_KERNEL allocation into vsscanf is problematic - it
>limits the situations in which this functionality can be used.
>
>afaict the allocation is there merely so we can null-terminate the
>string so we can use existing library functions (strcspn, strspn).  Is
>that compromise really worth it?  We could pretty easily convert
>strcspn() into
>
>	strcnspn(const char *s, const char *reject, size_t len)
>
>and convert strcspn() to call that (ifndef __HAVE_ARCH_STRCSPN)
>
>In fact I think we could still use strspn() and strcspn() on `fmt'
>directly?  We just need to check for the return value exceeding `len'
>and if so, treat that as a no-match?
>

Perhaps we can use Rasmus' bitmap solution, as it avoids the
allocation altogether and it doesn't need to use strspn()/strcspn().

>> +			if (!set)
>> +				return num;
>> +
>> +			/* advance fmt past ']' */
>> +			fmt += len + 1;
>> +
>> +			len = op(str, set);
>> +			/* no matches */
>> +			if (!len) {
>> +				kfree(set);
>> +				return num;
>> +			}
>> +
>> +			while (len-- && field_width--)
>> +				*s++ = *str++;
>> +			*s = '\0';
>> +			kfree(set);
>> +			num++;
>> +		}
>> +		continue;
>>  		case 'o':
>>  			base = 8;
>>  			break;
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sscanf: implement basic character sets
  2016-02-24  5:13   ` Jessica Yu
@ 2016-02-24  5:28     ` Andrew Morton
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2016-02-24  5:28 UTC (permalink / raw)
  To: Jessica Yu; +Cc: Rasmus Villemoes, Andy Shevchenko, Kees Cook, linux-kernel

On Wed, 24 Feb 2016 00:13:47 -0500 Jessica Yu <jeyu@redhat.com> wrote:

> >> This patch adds support for the '%[' conversion specifier for sscanf().
> >> This is useful in cases where we'd like to match substrings delimited by
> >> something other than spaces. The original motivation for this patch
> >> actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
> >> where we were trying to come up with a clean way to parse symbol names with
> >> substrings delimited by periods and commas.
> >
> > It would be better to include the justification right here in the
> > changelog please. 
> > Not via some link-to-discussion and definitely not
> > below the ^--- marker!  It's very important.
> 
> Thanks for the corrections Andrew. I am however slightly confused, are
> you suggesting that I should provide a much more thorough explanation
> about the motivation here in the changelog (below the ^--- marker), or
> would this be better suited for a (separate) cover letter?

Just in the plain old changelog is good - if it was in [0/n] I'd only
move it into the changelog anyway.

And 99.9% of the stuff people put below ^--- is useful so I always end
up moving that into the changelog as well...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: sscanf: implement basic character sets
  2016-02-23 22:47 ` [PATCH v3] " Rasmus Villemoes
  2016-02-24  0:01   ` Rasmus Villemoes
@ 2016-02-24  5:39   ` Jessica Yu
  1 sibling, 0 replies; 8+ messages in thread
From: Jessica Yu @ 2016-02-24  5:39 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Andrew Morton, Andy Shevchenko, Kees Cook, linux-kernel

+++ Rasmus Villemoes [23/02/16 23:47 +0100]:
>On Tue, Feb 23 2016, Jessica Yu <jeyu@redhat.com> wrote:
>
>> Implement basic character sets for the '%[]' conversion specifier.
>>
>>
>>  lib/vsprintf.c | 41 +++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 41 insertions(+)
>>
>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>> index 525c8e1..983358a 100644
>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>>  			num++;
>>  		}
>>  		continue;
>> +		case '[':
>> +		{
>> +			char *s = (char *)va_arg(args, char *);
>> +			char *set;
>> +			size_t (*op)(const char *str, const char *set);
>> +			size_t len = 0;
>> +			bool negate = (*(fmt) == '^');
>> +
>> +			if (field_width == -1)
>> +				field_width = SHRT_MAX;
>> +
>> +			op = negate ? &strcspn : &strspn;
>> +			if (negate)
>> +				fmt++;
>> +
>> +			len = strcspn(fmt, "]");
>> +			/* invalid format; stop here */
>> +			if (!len)
>> +				return num;
>> +
>> +			set = kstrndup(fmt, len, GFP_KERNEL);
>> +			if (!set)
>> +				return num;
>> +
>> +			/* advance fmt past ']' */
>> +			fmt += len + 1;
>> +
>> +			len = op(str, set);
>> +			/* no matches */
>> +			if (!len) {
>> +				kfree(set);
>> +				return num;
>> +			}
>> +
>> +			while (len-- && field_width--)
>> +				*s++ = *str++;
>> +			*s = '\0';
>> +			kfree(set);
>> +			num++;
>> +		}
>> +		continue;
>>  		case 'o':
>>  			base = 8;
>>  			break;
>
>(1) How do we know that doing a memory allocation would be ok, and then
>with GFP_KERNEL? vsnprintf can be called from just about any context, so
>I don't think that would fly there. Sooner or later someone is going to
>be calling sscanf with a spinlock held, methinks.
>
>(2) I think a field width should be mandatory (so %[ should simply be
>regarded as malformed - it should be %*[ or %n[ for some explicit
>decimal n). That will allow the compiler or other static analyzers to do
>sanity checking, and we'll probably be saved from a few buffer
>overflows down the line.
>
>It's a bit sad that the C standard doesn't include the terminating '\0'
>in the field width, so one would sometimes have to write
>'(int)sizeof(buf)-1', but there's not much to do about that. On that
>note, it seems that your field width handling is off-by-one.
>
>To get rid of the allocation, why not use a small bitmap? Something like
>
>{
>  char *s = (char *)va_arg(args, char *);
>  DECLARE_BITMAP(map, 256) = {0};
>  bool negate = false;
>
>  /* a field width is required, and must provide room for at least a '\0' */
>  if (field_width <= 0)
>    return num;
>
>  if (*fmt == '^') {
>    negate = true;
>    ++fmt;
>  }
>  for ( ; *fmt && *fmt != ']'; ++fmt)
>    set_bit((u8)*fmt, map);
>  if (!*fmt) // no ], so malformed input
>    return num;
>  ++fmt;
>  if (negate) {
>    bitmap_complement(map, map, 256);
>    clear_bit(0, map); // this avoids testing *str != '\0' below
>  }
>
>  if (!test_bit((u8)*str, map)) // match must be non-empty
>    return num;
>  while (test_bit((u8)*str, map) && --field_width) {
>    *s++ = *str++;
>  }
>  *s = '\0';
>  ++num;
>}

I quite like this idea, as it avoids allocations and doesn't need
strcspn/strspn. What do other people think?

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-02-24  5:39 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-23 20:38 [PATCH v3] sscanf: implement basic character sets Jessica Yu
2016-02-23 20:47 ` Kees Cook
2016-02-23 22:05 ` Andrew Morton
2016-02-24  5:13   ` Jessica Yu
2016-02-24  5:28     ` Andrew Morton
2016-02-23 22:47 ` [PATCH v3] " Rasmus Villemoes
2016-02-24  0:01   ` Rasmus Villemoes
2016-02-24  5:39   ` Jessica Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).