Re: [Qemu-devel] Re: [PATCH 09/11] json-lexer: limit the maximum size of a given token

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Luiz Capitulino <lcapitulino@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Anthony Liguori <aliguori@us.ibm.com>,
	qemu-devel@nongnu.org, Michael Roth <mdroth@us.ibm.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [Qemu-devel] Re: [PATCH 09/11] json-lexer: limit the maximum size of a given token
Date: Mon, 14 Mar 2011 17:33:11 -0300	[thread overview]
Message-ID: <20110314173311.23dfa5f7@doriath> (raw)
In-Reply-To: <4D7E7831.1060306@codemonkey.ws>

On Mon, 14 Mar 2011 15:18:57 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/14/2011 02:25 PM, Luiz Capitulino wrote:
> > On Fri, 11 Mar 2011 15:00:47 -0600
> > Anthony Liguori<aliguori@us.ibm.com>  wrote:
> >
> >> This is a security consideration.  We don't want a client to cause an arbitrary
> >> amount of memory to be allocated in QEMU.  For now, we use a limit of 64MB
> >> which should be large enough for any reasonably sized token.
> >>
> >> This is important for parsing JSON from untrusted sources.
> >>
> >> Signed-off-by: Anthony Liguori<aliguori@us.ibm.com>
> >>
> >> diff --git a/json-lexer.c b/json-lexer.c
> >> index 834d7af..3462c89 100644
> >> --- a/json-lexer.c
> >> +++ b/json-lexer.c
> >> @@ -18,6 +18,8 @@
> >>   #include "qemu-common.h"
> >>   #include "json-lexer.h"
> >>
> >> +#define MAX_TOKEN_SIZE (64ULL<<  20)
> >> +
> >>   /*
> >>    * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
> >>    * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*'
> >> @@ -312,6 +314,17 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
> >>           }
> >>           lexer->state = new_state;
> >>       } while (!char_consumed);
> >> +
> >> +    /* Do not let a single token grow to an arbitrarily large size,
> >> +     * this is a security consideration.
> >> +     */
> >> +    if (lexer->token->length>  MAX_TOKEN_SIZE) {
> >> +        lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y);
> >> +        QDECREF(lexer->token);
> >> +        lexer->token = qstring_new();
> >> +        lexer->state = IN_START;
> >> +    }
> > Entering an invalid token is an error, we should fail here.
> 
> It's not so clear to me.
> 
> I think of it like GCC.  GCC doesn't bail out on the first invalid 
> character the lexer encounters.  Instead, it records that an error has 
> occurred and tries its best to recover.
> 
> The result is that instead of getting an error message about the first 
> error in your code, you'll get a long listing of all the mistakes you 
> made (usually).

I'm not a compiler expert, but I think compilers do this because it would be
very problematic from a usability perspective to report one error at time:
by reporting most errors per run, the compiler gives the user to fix as much
as possible programming mistakes w/o having to re-run.

Our target are not humans, though.

> One thing that makes this more difficult in our case is that when you're 
> testing, we don't have a clear EOI to flush things out.  So a bad 
> sequence of inputs might make the message parser wait for an a character 
> that you're not necessarily inputting which makes the session appear 
> hung.  Usually, if you throw a couple extra brackets around, you'll get 
> back to valid input.

Not sure if this is easy to do, but shouldn't we fail if we don't get
the expected character?

> >   Which brings
> > two features:
> >
> >   1. A test code could trigger this condition and check for the specific
> >      error code
> >
> >   2. Developers will know when they hit the limit. Although I don't expect
> >      expect this to happen, there was talking about adding base64 support
> >      to transfer something (I can't remember what, but we never know how the
> >      protocol will evolve).
> >
> > Also, by testing this I found that the parser seems to get confused when
> > the limit is reached: it stops responding.
> 
> Actually, it does respond.  The lexer just takes an incredibly long time 
> to process a large token because qstring_append_ch is incredibly slow 
> :-)  If you drop the token size down to 64k instead of 64mb, it'll seem 
> a lot more reasonable.

Actually, I dropped it down to 20 :) This is supposed to work like your 64k
suggestion, right?

> 
> Regards,
> 
> Anthony Liguori
> 
> >> +
> >>       return 0;
> >>   }
> >>
> >
>

next prev parent reply	other threads:[~2011-03-14 20:33 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-11 21:00 [Qemu-devel] [00/11] QAPI Round 0 (JSON improvements) Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 01/11] Add hard build dependency on glib Anthony Liguori
2011-03-12  8:09   ` [Qemu-devel] " Paolo Bonzini
2011-03-12 14:52     ` Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 02/11] qerror: expose a function to format an error Anthony Liguori
2011-03-11 21:08   ` [Qemu-devel] " Anthony Liguori
2011-03-14 19:17     ` Luiz Capitulino
2011-03-14 19:27       ` Anthony Liguori
2011-03-14 19:37         ` Luiz Capitulino
2011-03-14 19:45           ` Anthony Liguori
2011-03-14 20:22             ` Luiz Capitulino
2011-03-14 20:41               ` Anthony Liguori
2011-03-14 20:48                 ` Luiz Capitulino
2011-03-14 21:03                   ` Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 03/11] add a generic Error object Anthony Liguori
2011-03-12 11:05   ` Blue Swirl
2011-03-12 14:51     ` Anthony Liguori
2011-03-14 19:18   ` [Qemu-devel] " Luiz Capitulino
2011-03-14 19:34     ` Anthony Liguori
2011-03-14 19:57       ` Luiz Capitulino
2011-03-11 21:00 ` [Qemu-devel] [PATCH 04/11] qerror: split out the reporting bits of QError Anthony Liguori
2011-03-14 19:18   ` [Qemu-devel] " Luiz Capitulino
2011-03-14 19:24     ` Anthony Liguori
2011-03-14 19:30       ` Luiz Capitulino
2011-03-14 20:30         ` Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 05/11] qerror: add new error message for invalid enum values Anthony Liguori
2011-03-14 19:19   ` [Qemu-devel] " Luiz Capitulino
2011-03-11 21:00 ` [Qemu-devel] [PATCH 06/11] qerror: add JSON parsing error message Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 07/11] json: propagate error from parser Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 08/11] json-lexer: reset the lexer state on an invalid token Anthony Liguori
2011-03-14 19:22   ` [Qemu-devel] " Luiz Capitulino
2011-03-14 19:43     ` Anthony Liguori
2011-03-14 20:12       ` Luiz Capitulino
2011-03-14 20:30         ` Anthony Liguori
2011-03-14 20:43           ` Luiz Capitulino
2011-03-11 21:00 ` [Qemu-devel] [PATCH 09/11] json-lexer: limit the maximum size of a given token Anthony Liguori
2011-03-14 19:25   ` [Qemu-devel] " Luiz Capitulino
2011-03-14 20:18     ` Anthony Liguori
2011-03-14 20:33       ` Luiz Capitulino [this message]
2011-03-11 21:00 ` [Qemu-devel] [PATCH 10/11] json-streamer: limit the maximum recursion depth and maximum token count Anthony Liguori
2011-03-11 23:16   ` Michael Roth
2011-03-12 15:03     ` Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 11/11] json-parser: detect premature EOI Anthony Liguori

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110314173311.23dfa5f7@doriath \
    --to=lcapitulino@redhat.com \
    --cc=aliguori@us.ibm.com \
    --cc=anthony@codemonkey.ws \
    --cc=armbru@redhat.com \
    --cc=mdroth@us.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.