From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 14 Mar 2011 17:33:11 -0300
From: Luiz Capitulino
Subject: Re: [Qemu-devel] Re: [PATCH 09/11] json-lexer: limit the maximum size of a given token
Message-ID: <20110314173311.23dfa5f7@doriath>
In-Reply-To: <4D7E7831.1060306@codemonkey.ws>
References: <1299877249-13433-1-git-send-email-aliguori@us.ibm.com>
 <1299877249-13433-10-git-send-email-aliguori@us.ibm.com>
 <20110314162502.12a7deab@doriath>
 <4D7E7831.1060306@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Anthony Liguori
Cc: Paolo Bonzini, Anthony Liguori, qemu-devel@nongnu.org, Michael Roth,
 Markus Armbruster

On Mon, 14 Mar 2011 15:18:57 -0500
Anthony Liguori wrote:

> On 03/14/2011 02:25 PM, Luiz Capitulino wrote:
> > On Fri, 11 Mar 2011 15:00:47 -0600
> > Anthony Liguori wrote:
> >
> >> This is a security consideration. We don't want a client to cause an
> >> arbitrary amount of memory to be allocated in QEMU. For now, we use a
> >> limit of 64MB which should be large enough for any reasonably sized
> >> token.
> >>
> >> This is important for parsing JSON from untrusted sources.
> >>
> >> Signed-off-by: Anthony Liguori
> >>
> >> diff --git a/json-lexer.c b/json-lexer.c
> >> index 834d7af..3462c89 100644
> >> --- a/json-lexer.c
> >> +++ b/json-lexer.c
> >> @@ -18,6 +18,8 @@
> >>  #include "qemu-common.h"
> >>  #include "json-lexer.h"
> >>
> >> +#define MAX_TOKEN_SIZE (64ULL << 20)
> >> +
> >>  /*
> >>   * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
> >>   * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*'
> >> @@ -312,6 +314,17 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
> >>          }
> >>          lexer->state = new_state;
> >>      } while (!char_consumed);
> >> +
> >> +    /* Do not let a single token grow to an arbitrarily large size,
> >> +     * this is a security consideration.
> >> +     */
> >> +    if (lexer->token->length > MAX_TOKEN_SIZE) {
> >> +        lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y);
> >> +        QDECREF(lexer->token);
> >> +        lexer->token = qstring_new();
> >> +        lexer->state = IN_START;
> >> +    }
> > Entering an invalid token is an error, we should fail here.
>
> It's not so clear to me.
>
> I think of it like GCC. GCC doesn't bail out on the first invalid
> character the lexer encounters. Instead, it records that an error has
> occurred and tries its best to recover.
>
> The result is that instead of getting an error message about the first
> error in your code, you'll get a long listing of all the mistakes you
> made (usually).
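(To make the behaviour under discussion concrete, here is a toy,
self-contained sketch of the reset-and-continue logic the hunk above adds.
It is not QEMU's json-lexer code; the names ToyLexer, toy_feed_char and
TOY_MAX_TOKEN_SIZE are made up for illustration. Once the accumulated
token passes the cap, the partial token is handed to the emit callback and
the state is reset, so whatever bytes follow are lexed as a fresh token
rather than rejected.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cap, far smaller than the 64MB in the patch, so the reset
 * is easy to trigger by hand. */
#define TOY_MAX_TOKEN_SIZE 8

typedef struct ToyLexer {
    char *token;                                  /* bytes of the current token */
    size_t len;
    void (*emit)(const char *token, size_t len);  /* consumer callback */
} ToyLexer;

static void toy_reset(ToyLexer *lexer)
{
    free(lexer->token);
    lexer->token = calloc(1, TOY_MAX_TOKEN_SIZE + 2);
    lexer->len = 0;
}

/* Mirrors the shape of the hunk above: append the character, then, if the
 * token grew past the cap, emit whatever accumulated and start over. */
static void toy_feed_char(ToyLexer *lexer, char ch)
{
    lexer->token[lexer->len++] = ch;

    if (lexer->len > TOY_MAX_TOKEN_SIZE) {
        lexer->emit(lexer->token, lexer->len);    /* oversized partial token */
        toy_reset(lexer);                         /* back to the start state */
    }
}

static void print_token(const char *token, size_t len)
{
    printf("emitted %zu bytes: %.*s\n", len, (int)len, token);
}

int main(void)
{
    ToyLexer lexer = { .emit = print_token };
    const char *input = "\"averyveryverylongstring\"";

    toy_reset(&lexer);
    for (size_t i = 0; i < strlen(input); i++) {
        toy_feed_char(&lexer, input[i]);
    }
    /* The long string is split at the cap and handed on in pieces instead
     * of being rejected, which is the behaviour being debated here. */
    return 0;
}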
I'm not a compiler expert, but I think compilers do this because it would
be very problematic from a usability perspective to report one error at a
time: by reporting as many errors per run as possible, the compiler lets
the user fix most programming mistakes without having to re-run it. Our
target is not humans, though.

> One thing that makes this more difficult in our case is that when you're
> testing, we don't have a clear EOI to flush things out. So a bad
> sequence of inputs might make the message parser wait for a character
> that you're not necessarily inputting, which makes the session appear
> hung. Usually, if you throw a couple extra brackets around, you'll get
> back to valid input.

Not sure if this is easy to do, but shouldn't we fail if we don't get the
expected character?

> > Which brings two features:
> >
> > 1. A test code could trigger this condition and check for the specific
> >    error code
> >
> > 2. Developers will know when they hit the limit. Although I don't
> >    expect this to happen, there was talk about adding base64 support to
> >    transfer something (I can't remember what, but we never know how the
> >    protocol will evolve).
> >
> > Also, by testing this I found that the parser seems to get confused
> > when the limit is reached: it stops responding.
>
> Actually, it does respond. The lexer just takes an incredibly long time
> to process a large token because qstring_append_ch is incredibly slow
> :-) If you drop the token size down to 64k instead of 64mb, it'll seem
> a lot more reasonable.

Actually, I dropped it down to 20 :) This is supposed to work like your
64k suggestion, right?

>
> Regards,
>
> Anthony Liguori
>
> >> +
> >>      return 0;
> >>  }
> >>
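(For illustration, the kind of test point 1 above describes could look
roughly like the sketch below. It uses made-up names, fake_lexer_feed and
FAKE_ERR_TOKEN_TOO_LARGE, rather than the real lexer or the check-qjson
test suite, and it assumes the fail-fast behaviour argued for here instead
of what the patch currently does. The point is only that, given a specific
error code, a test can assert that the limit kicked in instead of inferring
it from a hung session.)

#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A tiny cap so the test is cheap, in the spirit of dropping the limit to
 * 20 while experimenting. */
#define FAKE_MAX_TOKEN_SIZE 20

enum {
    FAKE_OK = 0,
    FAKE_ERR_TOKEN_TOO_LARGE = -1   /* the specific error code a test checks */
};

/* Fail-fast variant of the size check: instead of emitting the partial
 * token and resetting, refuse the input outright. */
static int fake_lexer_feed(size_t *token_len, const char *buf, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (buf[i] == '\0') {           /* defensive: stop at a stray NUL */
            break;
        }
        (*token_len)++;                 /* stand-in for appending buf[i] */
        if (*token_len > FAKE_MAX_TOKEN_SIZE) {
            return FAKE_ERR_TOKEN_TOO_LARGE;
        }
    }
    return FAKE_OK;
}

int main(void)
{
    /* A single "token" longer than the limit: 30 bytes with the quotes. */
    const char *big = "\"0123456789012345678901234567\"";
    size_t token_len = 0;

    int ret = fake_lexer_feed(&token_len, big, strlen(big));
    assert(ret == FAKE_ERR_TOKEN_TOO_LARGE);
    printf("oversized token rejected with error %d\n", ret);
    return 0;
}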