Re: [Qemu-devel] Re: [PATCH 09/11] json-lexer: limit the maximum size of a given token

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Luiz Capitulino <lcapitulino@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Anthony Liguori <aliguori@us.ibm.com>,
	qemu-devel@nongnu.org, Michael Roth <mdroth@us.ibm.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [Qemu-devel] Re: [PATCH 09/11] json-lexer: limit the maximum size of a given token
Date: Mon, 14 Mar 2011 17:33:11 -0300	[thread overview]
Message-ID: <20110314173311.23dfa5f7@doriath> (raw)
In-Reply-To: <4D7E7831.1060306@codemonkey.ws>

On Mon, 14 Mar 2011 15:18:57 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 03/14/2011 02:25 PM, Luiz Capitulino wrote:
> > On Fri, 11 Mar 2011 15:00:47 -0600
> > Anthony Liguori<aliguori@us.ibm.com>  wrote:
> >
> >> This is a security consideration.  We don't want a client to cause an arbitrary
> >> amount of memory to be allocated in QEMU.  For now, we use a limit of 64MB
> >> which should be large enough for any reasonably sized token.
> >>
> >> This is important for parsing JSON from untrusted sources.
> >>
> >> Signed-off-by: Anthony Liguori<aliguori@us.ibm.com>
> >>
> >> diff --git a/json-lexer.c b/json-lexer.c
> >> index 834d7af..3462c89 100644
> >> --- a/json-lexer.c
> >> +++ b/json-lexer.c
> >> @@ -18,6 +18,8 @@
> >>   #include "qemu-common.h"
> >>   #include "json-lexer.h"
> >>
> >> +#define MAX_TOKEN_SIZE (64ULL<<  20)
> >> +
> >>   /*
> >>    * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
> >>    * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*'
> >> @@ -312,6 +314,17 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
> >>           }
> >>           lexer->state = new_state;
> >>       } while (!char_consumed);
> >> +
> >> +    /* Do not let a single token grow to an arbitrarily large size,
> >> +     * this is a security consideration.
> >> +     */
> >> +    if (lexer->token->length>  MAX_TOKEN_SIZE) {
> >> +        lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y);
> >> +        QDECREF(lexer->token);
> >> +        lexer->token = qstring_new();
> >> +        lexer->state = IN_START;
> >> +    }
> > Entering an invalid token is an error, we should fail here.
> 
> It's not so clear to me.
> 
> I think of it like GCC.  GCC doesn't bail out on the first invalid 
> character the lexer encounters.  Instead, it records that an error has 
> occurred and tries its best to recover.
> 
> The result is that instead of getting an error message about the first 
> error in your code, you'll get a long listing of all the mistakes you 
> made (usually).

I'm not a compiler expert, but I think compilers do this because it would be
very problematic from a usability perspective to report one error at time:
by reporting most errors per run, the compiler gives the user to fix as much
as possible programming mistakes w/o having to re-run.

Our target are not humans, though.

> One thing that makes this more difficult in our case is that when you're 
> testing, we don't have a clear EOI to flush things out.  So a bad 
> sequence of inputs might make the message parser wait for an a character 
> that you're not necessarily inputting which makes the session appear 
> hung.  Usually, if you throw a couple extra brackets around, you'll get 
> back to valid input.

Not sure if this is easy to do, but shouldn't we fail if we don't get
the expected character?

> >   Which brings
> > two features:
> >
> >   1. A test code could trigger this condition and check for the specific
> >      error code
> >
> >   2. Developers will know when they hit the limit. Although I don't expect
> >      expect this to happen, there was talking about adding base64 support
> >      to transfer something (I can't remember what, but we never know how the
> >      protocol will evolve).
> >
> > Also, by testing this I found that the parser seems to get confused when
> > the limit is reached: it stops responding.
> 
> Actually, it does respond.  The lexer just takes an incredibly long time 
> to process a large token because qstring_append_ch is incredibly slow 
> :-)  If you drop the token size down to 64k instead of 64mb, it'll seem 
> a lot more reasonable.

Actually, I dropped it down to 20 :) This is supposed to work like your 64k
suggestion, right?

> 
> Regards,
> 
> Anthony Liguori
> 
> >> +
> >>       return 0;
> >>   }
> >>
> >
>

next prev parent reply	other threads:[~2011-03-14 20:33 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-03-11 21:00 [Qemu-devel] [00/11] QAPI Round 0 (JSON improvements) Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 01/11] Add hard build dependency on glib Anthony Liguori
2011-03-12  8:09   ` [Qemu-devel] " Paolo Bonzini
2011-03-12 14:52     ` Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 02/11] qerror: expose a function to format an error Anthony Liguori
2011-03-11 21:08   ` [Qemu-devel] " Anthony Liguori
2011-03-14 19:17     ` Luiz Capitulino
2011-03-14 19:27       ` Anthony Liguori
2011-03-14 19:37         ` Luiz Capitulino
2011-03-14 19:45           ` Anthony Liguori
2011-03-14 20:22             ` Luiz Capitulino
2011-03-14 20:41               ` Anthony Liguori
2011-03-14 20:48                 ` Luiz Capitulino
2011-03-14 21:03                   ` Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 03/11] add a generic Error object Anthony Liguori
2011-03-12 11:05   ` Blue Swirl
2011-03-12 14:51     ` Anthony Liguori
2011-03-14 19:18   ` [Qemu-devel] " Luiz Capitulino
2011-03-14 19:34     ` Anthony Liguori
2011-03-14 19:57       ` Luiz Capitulino
2011-03-11 21:00 ` [Qemu-devel] [PATCH 04/11] qerror: split out the reporting bits of QError Anthony Liguori
2011-03-14 19:18   ` [Qemu-devel] " Luiz Capitulino
2011-03-14 19:24     ` Anthony Liguori
2011-03-14 19:30       ` Luiz Capitulino
2011-03-14 20:30         ` Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 05/11] qerror: add new error message for invalid enum values Anthony Liguori
2011-03-14 19:19   ` [Qemu-devel] " Luiz Capitulino
2011-03-11 21:00 ` [Qemu-devel] [PATCH 06/11] qerror: add JSON parsing error message Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 07/11] json: propagate error from parser Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 08/11] json-lexer: reset the lexer state on an invalid token Anthony Liguori
2011-03-14 19:22   ` [Qemu-devel] " Luiz Capitulino
2011-03-14 19:43     ` Anthony Liguori
2011-03-14 20:12       ` Luiz Capitulino
2011-03-14 20:30         ` Anthony Liguori
2011-03-14 20:43           ` Luiz Capitulino
2011-03-11 21:00 ` [Qemu-devel] [PATCH 09/11] json-lexer: limit the maximum size of a given token Anthony Liguori
2011-03-14 19:25   ` [Qemu-devel] " Luiz Capitulino
2011-03-14 20:18     ` Anthony Liguori
2011-03-14 20:33       ` Luiz Capitulino [this message]
2011-03-11 21:00 ` [Qemu-devel] [PATCH 10/11] json-streamer: limit the maximum recursion depth and maximum token count Anthony Liguori
2011-03-11 23:16   ` Michael Roth
2011-03-12 15:03     ` Anthony Liguori
2011-03-11 21:00 ` [Qemu-devel] [PATCH 11/11] json-parser: detect premature EOI Anthony Liguori

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110314173311.23dfa5f7@doriath \
    --to=lcapitulino@redhat.com \
    --cc=aliguori@us.ibm.com \
    --cc=anthony@codemonkey.ws \
    --cc=armbru@redhat.com \
    --cc=mdroth@us.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).