From: Anthony Liguori
To: Luiz Capitulino
Cc: Paolo Bonzini, aliguori@us.ibm.com, qemu-devel@nongnu.org
Date: Thu, 20 May 2010 11:55:00 -0500
Subject: [Qemu-devel] Re: [PATCH 2/6] json-lexer: Handle missing escapes
Message-ID: <4BF56964.8030603@codemonkey.ws>
In-Reply-To: <20100520132710.1e906771@redhat.com>
References: <1274303733-3700-1-git-send-email-lcapitulino@redhat.com>
 <1274303733-3700-3-git-send-email-lcapitulino@redhat.com>
 <4BF45BCF.5090300@codemonkey.ws> <20100520104433.1be3167c@redhat.com>
 <4BF55231.8020208@redhat.com> <4BF55A51.1080506@codemonkey.ws>
 <20100520132710.1e906771@redhat.com>

On 05/20/2010 11:27 AM, Luiz Capitulino wrote:
> On Thu, 20 May 2010 10:50:41 -0500
> Anthony Liguori wrote:
>
>
>> On 05/20/2010 10:16 AM, Paolo Bonzini wrote:
>>
>>> On 05/20/2010 03:44 PM, Luiz Capitulino wrote:
>>>
>>>> I think there's another issue in the handling of strings.
>>>>
>>>> The spec says that valid unescaped chars are in the following range:
>>>>
>>>> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
>>>>
>> That's a spec bug IMHO. Tab is %x09. Surely you can include tabs in
>> strings. Any parser that didn't accept that would be broken.
>>
> Honestly, I had the impression this should be encoded as: %x5C %x74, but
> if you're right, wouldn't this be true for other sequences as well?
>

I don't think most reasonable clients are going to quote tabs as '\t'.

Regards,

Anthony Liguori

>>>> But we do:
>>>>
>>>>     [IN_DQ_STRING] = {
>>>>         [1 ... 0xFF] = IN_DQ_STRING,
>>>>         ['\\'] = IN_DQ_STRING_ESCAPE,
>>>>         ['"'] = IN_DONE_STRING,
>>>>     },
>>>>
>>>> Shouldn't we cover 0x20 .. 0xFF instead?
>>>>
>>> If it's the lexer, isn't it just being liberal in what it accepts?
>>>
>> I believe the parser correctly rejects invalid UTF-8 sequences.
>>
> Will check.
>