From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=57274 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PzEEN-00048U-C4
	for qemu-devel@nongnu.org; Mon, 14 Mar 2011 16:19:08 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <anthony@codemonkey.ws>) id 1PzEEH-00067b-9i
	for qemu-devel@nongnu.org; Mon, 14 Mar 2011 16:19:06 -0400
Received: from mail-yi0-f45.google.com ([209.85.218.45]:61214)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <anthony@codemonkey.ws>) id 1PzEEH-00067T-5l
	for qemu-devel@nongnu.org; Mon, 14 Mar 2011 16:19:01 -0400
Received: by yib19 with SMTP id 19so2691700yib.4
	for <qemu-devel@nongnu.org>; Mon, 14 Mar 2011 13:19:00 -0700 (PDT)
Message-ID: <4D7E7831.1060306@codemonkey.ws>
Date: Mon, 14 Mar 2011 15:18:57 -0500
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [PATCH 09/11] json-lexer: limit the maximum
	size of a given token
References: <1299877249-13433-1-git-send-email-aliguori@us.ibm.com>	<1299877249-13433-10-git-send-email-aliguori@us.ibm.com>
	<20110314162502.12a7deab@doriath>
In-Reply-To: <20110314162502.12a7deab@doriath>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, Anthony Liguori <aliguori@us.ibm.com>, qemu-devel@nongnu.org, Michael Roth <mdroth@us.ibm.com>, Markus Armbruster <armbru@redhat.com>

On 03/14/2011 02:25 PM, Luiz Capitulino wrote:
> On Fri, 11 Mar 2011 15:00:47 -0600
> Anthony Liguori<aliguori@us.ibm.com>  wrote:
>
>> This is a security consideration.  We don't want a client to cause an arbitrary
>> amount of memory to be allocated in QEMU.  For now, we use a limit of 64MB
>> which should be large enough for any reasonably sized token.
>>
>> This is important for parsing JSON from untrusted sources.
>>
>> Signed-off-by: Anthony Liguori<aliguori@us.ibm.com>
>>
>> diff --git a/json-lexer.c b/json-lexer.c
>> index 834d7af..3462c89 100644
>> --- a/json-lexer.c
>> +++ b/json-lexer.c
>> @@ -18,6 +18,8 @@
>>   #include "qemu-common.h"
>>   #include "json-lexer.h"
>>
>> +#define MAX_TOKEN_SIZE (64ULL << 20)
>> +
>>   /*
>>    * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\"
>>    * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*'
>> @@ -312,6 +314,17 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
>>           }
>>           lexer->state = new_state;
>>       } while (!char_consumed);
>> +
>> +    /* Do not let a single token grow to an arbitrarily large size,
>> +     * this is a security consideration.
>> +     */
>> +    if (lexer->token->length > MAX_TOKEN_SIZE) {
>> +        lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y);
>> +        QDECREF(lexer->token);
>> +        lexer->token = qstring_new();
>> +        lexer->state = IN_START;
>> +    }
> Entering an invalid token is an error, we should fail here.

It's not so clear to me.

I think of it like GCC.  GCC doesn't bail out on the first invalid 
character the lexer encounters.  Instead, it records that an error has 
occurred and tries its best to recover.

The result is that instead of getting an error message about the first 
error in your code, you'll get a long listing of all the mistakes you 
made (usually).
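
A minimal sketch of that record-and-recover idea, assuming a toy lexer 
rather than the real json-lexer.c.  ToyLexer and toy_lex() are 
hypothetical names used only for illustration; the point is that an 
invalid character is counted and skipped instead of aborting the scan.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy lexer state; this is not QEMU's JSONLexer. */
typedef struct {
    size_t tokens;      /* number of valid tokens seen */
    size_t errors;      /* number of invalid characters skipped */
    size_t first_error; /* offset of the first invalid character, if any */
} ToyLexer;

/* Scan the whole input, recording errors instead of stopping at the
 * first one.  Numbers and single-character punctuation are the only
 * tokens this sketch understands. */
static void toy_lex(ToyLexer *lx, const char *input)
{
    assert(lx != NULL && input != NULL);
    memset(lx, 0, sizeof(*lx));

    for (size_t i = 0; input[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)input[i];

        if (isspace(ch)) {
            continue;
        }
        if (isdigit(ch)) {
            /* Consume the remaining digits as part of the same token. */
            while (isdigit((unsigned char)input[i + 1])) {
                i++;
            }
            lx->tokens++;
        } else if (strchr("[]{},:", ch) != NULL) {
            lx->tokens++;
        } else {
            /* Record the error and keep going (the recovery step). */
            if (lx->errors == 0) {
                lx->first_error = i;
            }
            lx->errors++;
        }
    }
}
```

Feeding it "[1, 2, @, 3, #]" reports both bad characters in one pass 
rather than stopping at the '@'.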

One thing that makes this more difficult in our case is that when you're 
testing, we don't have a clear end-of-input (EOI) marker to flush things 
out.  So a bad sequence of inputs might make the message parser wait for 
a character that you're not necessarily inputting, which makes the 
session appear hung.  Usually, if you throw in a couple of extra 
brackets, you'll get back to valid input.

>   Which brings
> two features:
>
>   1. A test code could trigger this condition and check for the specific
>      error code
>
>   2. Developers will know when they hit the limit. Although I don't
>      expect this to happen, there was talk about adding base64 support
>      to transfer something (I can't remember what, but we never know how the
>      protocol will evolve).
>
> Also, by testing this I found that the parser seems to get confused when
> the limit is reached: it stops responding.

Actually, it does respond.  The lexer just takes a very long time to 
process a large token because qstring_append_ch is incredibly slow 
:-)  If you drop the token size limit from 64MB to 64KB, it'll seem 
a lot more reasonable.
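
The reason qstring_append_ch is slow is not stated above.  As a point of 
comparison only, here is a sketch of one common way to make 
per-character appends cheap: double the capacity whenever the buffer 
fills up, so that appending n characters costs amortized O(n).  CharBuf 
and its functions are hypothetical and are not QEMU's QString API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer type for illustration; not QEMU's QString. */
typedef struct {
    char *data;      /* NUL-terminated contents, or NULL when empty */
    size_t length;   /* number of characters, excluding the NUL */
    size_t capacity; /* allocated size of data in bytes */
} CharBuf;

/* Append one character, doubling the capacity when the buffer is full.
 * Returns 0 on success, -1 if the allocation fails. */
static int charbuf_append_ch(CharBuf *buf, char ch)
{
    assert(buf != NULL);

    /* Keep room for the new character plus the trailing NUL. */
    if (buf->length + 1 >= buf->capacity) {
        size_t new_cap = buf->capacity ? buf->capacity * 2 : 16;
        char *p = realloc(buf->data, new_cap);
        if (p == NULL) {
            return -1;
        }
        buf->data = p;
        buf->capacity = new_cap;
    }
    buf->data[buf->length++] = ch;
    buf->data[buf->length] = '\0';
    return 0;
}

/* Release the storage and reset the buffer to its empty state. */
static void charbuf_free(CharBuf *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->length = 0;
    buf->capacity = 0;
}
```

A CharBuf must be zero-initialized before the first append.  With this 
growth policy the number of reallocations is logarithmic in the token 
length, so a large token limit does not by itself make the appends slow.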

Regards,

Anthony Liguori

>> +
>>       return 0;
>>   }
>>
>