qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Laszlo Ersek <lersek@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: blauwirbel@gmail.com, aliguori@us.ibm.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings
Date: Fri, 22 Mar 2013 15:52:09 +0100	[thread overview]
Message-ID: <514C7019.60800@redhat.com> (raw)
In-Reply-To: <87txo3tsa7.fsf@blackfin.pond.sub.org>

On 03/22/13 15:37, Markus Armbruster wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
> 
>> On 03/14/13 18:49, Markus Armbruster wrote:
>>> These are all broken, too.
>>
>> What are "these"? And how are they broken? And how does the patch fix them?
> 
> "These" refers to the subject: noncharacters other than U+FFFE, U+FFFF.
> 
> I agree that I should better explain how they're broken, and what the
> patch does to fix them.  Will fix on respin.
> 
>>>
>>> A few test cases use noncharacters U+FFFF and U+10FFFF.  Risks testing
>>> noncharacters some more instead of what they're supposed to test.  Use
>>> U+FFFD and U+10FFFD instead.
>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> ---
>>>  tests/check-qjson.c | 85
>>> +++++++++++++++++++++++++++++++++++++++++++++--------
>>>  1 file changed, 72 insertions(+), 13 deletions(-)
>>
>> I'm confused about the commit message. There are three paragraphs in it
>> (the title, the first paragraph, and the 2nd paragraph). This patch
>> modifies different tests:
>>
>>> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
>>> index 852124a..efec1b2 100644
>>> --- a/tests/check-qjson.c
>>> +++ b/tests/check-qjson.c
>>> @@ -158,7 +158,7 @@ static void utf8_string(void)
>>>       * consider using overlong encoding \xC0\x80 for U+0000 ("modified
>>>       * UTF-8").
>>>       *
>>> -     * Test cases are scraped from Markus Kuhn's UTF-8 decoder
>>> +     * Most test cases are scraped from Markus Kuhn's UTF-8 decoder
>>>       * capability and stress test at
>>>       * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
>>>       */
>>> @@ -256,11 +256,11 @@ static void utf8_string(void)
>>>              "\xDF\xBF",
>>>              "\"\\u07FF\"",
>>>          },
>>> -        /* 2.2.3  3 bytes U+FFFF */
>>> +        /* 2.2.3  3 bytes U+FFFD */
>>>          {
>>> -            "\"\xEF\xBF\xBF\"",
>>> -            "\xEF\xBF\xBF",
>>> -            "\"\\uFFFF\"",
>>> +            "\"\xEF\xBF\xBD\"",
>>> +            "\xEF\xBF\xBD",
>>> +            "\"\\uFFFD\"",
>>>          },
>>
>> This is under "2.2  Last possible sequence of a certain length". I guess
> 
> Which is in turn under "2  Boundary condition test cases".
> 
>> this is where you say "last possible sequence of a certain length,
>> encoding a character (= non-noncharacter)". OK, p#2.
> 
> Yes.
> 
> The test's purpose is testing the upper bound of 3-byte sequences is
> decoded correctly.
> 
> The upper bound is U+FFFF.  Since that's a noncharacter, the parser
> should reject it (or maybe replace), the formatter should replace it.
> Trouble is it could be misdecoded and then rejected / replaced.
> 
> Besides, U+FFFF already gets tested along with the other noncharacters
> under "5.3  Other illegal code positions".
> 
> Next in line is U+FFFE, also a noncharacter, also under 5.3.
> 
> Next in line is U+FFFD, which I picked.
> 
> But that gets tested under "2.3  Other boundary conditions"!  I guess I
> either drop it there, or make this one U+FFFC.
> 
> I think testing U+FFFC here makes sense, because U+FFFD could be
> misdecoded, then replaced by U+FFFD.
> 
> What do you think?

I think that we're extending Markus Kuhn's test suite, basically taking
random shots at where one specific parser's/formatter's weak spots might
be :)

That said, with intelligent fuzzing out of scope / capacity, U+FFFC
could be a good pick.

I also think I'm a quite a useless person to ask for thoughts in this
area :)

Thanks,
Laszlo

  reply	other threads:[~2013-03-22 15:41 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-14 17:49 [Qemu-devel] [PATCH 0/4] Fix JSON string formatter Markus Armbruster
2013-03-14 17:49 ` [Qemu-devel] [PATCH 1/4] unicode: New mod_utf8_codepoint() Markus Armbruster
2013-03-21 19:37   ` Laszlo Ersek
2013-03-22  9:23     ` Markus Armbruster
2013-03-22 11:46       ` Laszlo Ersek
2013-03-14 17:49 ` [Qemu-devel] [PATCH 2/4] check-qjson: Fix up a few bogus comments Markus Armbruster
2013-03-21 20:06   ` Laszlo Ersek
2013-03-22 13:27     ` Markus Armbruster
2013-03-14 17:49 ` [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings Markus Armbruster
2013-03-21 20:22   ` Laszlo Ersek
2013-03-22 14:37     ` Markus Armbruster
2013-03-22 14:52       ` Laszlo Ersek [this message]
2013-03-14 17:49 ` [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite Markus Armbruster
2013-03-21 20:44   ` Laszlo Ersek
2013-03-22 13:15   ` Laszlo Ersek
2013-03-22 14:51     ` Markus Armbruster
2013-03-17 19:55 ` [Qemu-devel] [PATCH 0/4] Fix JSON string formatter Blue Swirl
2013-03-18  9:58   ` Markus Armbruster
2013-03-23 14:44 ` Blue Swirl
2013-04-11 16:12   ` Markus Armbruster
  -- strict thread matches above, loose matches on Subject: below --
2013-04-11 16:07 Markus Armbruster
2013-04-11 16:07 ` [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=514C7019.60800@redhat.com \
    --to=lersek@redhat.com \
    --cc=aliguori@us.ibm.com \
    --cc=armbru@redhat.com \
    --cc=blauwirbel@gmail.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).