From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=57274 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PzEEN-00048U-C4 for qemu-devel@nongnu.org; Mon, 14 Mar 2011 16:19:08 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PzEEH-00067b-9i for qemu-devel@nongnu.org; Mon, 14 Mar 2011 16:19:06 -0400 Received: from mail-yi0-f45.google.com ([209.85.218.45]:61214) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PzEEH-00067T-5l for qemu-devel@nongnu.org; Mon, 14 Mar 2011 16:19:01 -0400 Received: by yib19 with SMTP id 19so2691700yib.4 for ; Mon, 14 Mar 2011 13:19:00 -0700 (PDT) Message-ID: <4D7E7831.1060306@codemonkey.ws> Date: Mon, 14 Mar 2011 15:18:57 -0500 From: Anthony Liguori MIME-Version: 1.0 Subject: Re: [Qemu-devel] Re: [PATCH 09/11] json-lexer: limit the maximum size of a given token References: <1299877249-13433-1-git-send-email-aliguori@us.ibm.com> <1299877249-13433-10-git-send-email-aliguori@us.ibm.com> <20110314162502.12a7deab@doriath> In-Reply-To: <20110314162502.12a7deab@doriath> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Luiz Capitulino Cc: Paolo Bonzini , Anthony Liguori , qemu-devel@nongnu.org, Michael Roth , Markus Armbruster On 03/14/2011 02:25 PM, Luiz Capitulino wrote: > On Fri, 11 Mar 2011 15:00:47 -0600 > Anthony Liguori wrote: > >> This is a security consideration. We don't want a client to cause an arbitrary >> amount of memory to be allocated in QEMU. For now, we use a limit of 64MB >> which should be large enough for any reasonably sized token. >> >> This is important for parsing JSON from untrusted sources. >> >> Signed-off-by: Anthony Liguori >> >> diff --git a/json-lexer.c b/json-lexer.c >> index 834d7af..3462c89 100644 >> --- a/json-lexer.c >> +++ b/json-lexer.c >> @@ -18,6 +18,8 @@ >> #include "qemu-common.h" >> #include "json-lexer.h" >> >> +#define MAX_TOKEN_SIZE (64ULL<< 20) >> + >> /* >> * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\" >> * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*' >> @@ -312,6 +314,17 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch) >> } >> lexer->state = new_state; >> } while (!char_consumed); >> + >> + /* Do not let a single token grow to an arbitrarily large size, >> + * this is a security consideration. >> + */ >> + if (lexer->token->length> MAX_TOKEN_SIZE) { >> + lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y); >> + QDECREF(lexer->token); >> + lexer->token = qstring_new(); >> + lexer->state = IN_START; >> + } > Entering an invalid token is an error, we should fail here. It's not so clear to me. I think of it like GCC. GCC doesn't bail out on the first invalid character the lexer encounters. Instead, it records that an error has occurred and tries its best to recover. The result is that instead of getting an error message about the first error in your code, you'll get a long listing of all the mistakes you made (usually). One thing that makes this more difficult in our case is that when you're testing, we don't have a clear EOI to flush things out. So a bad sequence of inputs might make the message parser wait for an a character that you're not necessarily inputting which makes the session appear hung. Usually, if you throw a couple extra brackets around, you'll get back to valid input. > Which brings > two features: > > 1. A test code could trigger this condition and check for the specific > error code > > 2. Developers will know when they hit the limit. Although I don't expect > expect this to happen, there was talking about adding base64 support > to transfer something (I can't remember what, but we never know how the > protocol will evolve). > > Also, by testing this I found that the parser seems to get confused when > the limit is reached: it stops responding. Actually, it does respond. The lexer just takes an incredibly long time to process a large token because qstring_append_ch is incredibly slow :-) If you drop the token size down to 64k instead of 64mb, it'll seem a lot more reasonable. Regards, Anthony Liguori >> + >> return 0; >> } >> >