From: Laszlo Ersek <lersek@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: blauwirbel@gmail.com, aliguori@us.ibm.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings
Date: Fri, 22 Mar 2013 15:52:09 +0100
Message-ID: <514C7019.60800@redhat.com>
In-Reply-To: <87txo3tsa7.fsf@blackfin.pond.sub.org>

On 03/22/13 15:37, Markus Armbruster wrote:
> Laszlo Ersek <lersek@redhat.com> writes:
>
>> On 03/14/13 18:49, Markus Armbruster wrote:
>>> These are all broken, too.
>>
>> What are "these"? And how are they broken? And how does the patch fix them?
>
> "These" refers to the subject: noncharacters other than U+FFFE, U+FFFF.
>
> I agree that I should better explain how they're broken, and what the
> patch does to fix them. Will fix on respin.
>
>>>
>>> A few test cases use noncharacters U+FFFF and U+10FFFF. Risks testing
>>> noncharacters some more instead of what they're supposed to test. Use
>>> U+FFFD and U+10FFFD instead.
>>>
>>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>>> ---
>>> tests/check-qjson.c | 85 +++++++++++++++++++++++++++++++++++++++++++++--------
>>> 1 file changed, 72 insertions(+), 13 deletions(-)
>>
>> I'm confused about the commit message. There are three paragraphs in it
>> (the title, the first paragraph, and the 2nd paragraph). This patch
>> modifies different tests:
>>
>>> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
>>> index 852124a..efec1b2 100644
>>> --- a/tests/check-qjson.c
>>> +++ b/tests/check-qjson.c
>>> @@ -158,7 +158,7 @@ static void utf8_string(void)
>>> * consider using overlong encoding \xC0\x80 for U+0000 ("modified
>>> * UTF-8").
>>> *
>>> - * Test cases are scraped from Markus Kuhn's UTF-8 decoder
>>> + * Most test cases are scraped from Markus Kuhn's UTF-8 decoder
>>> * capability and stress test at
>>> * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
>>> */
>>> @@ -256,11 +256,11 @@ static void utf8_string(void)
>>> "\xDF\xBF",
>>> "\"\\u07FF\"",
>>> },
>>> - /* 2.2.3 3 bytes U+FFFF */
>>> + /* 2.2.3 3 bytes U+FFFD */
>>> {
>>> - "\"\xEF\xBF\xBF\"",
>>> - "\xEF\xBF\xBF",
>>> - "\"\\uFFFF\"",
>>> + "\"\xEF\xBF\xBD\"",
>>> + "\xEF\xBF\xBD",
>>> + "\"\\uFFFD\"",
>>> },
>>
>> This is under "2.2 Last possible sequence of a certain length". I guess
>
> Which is in turn under "2 Boundary condition test cases".
>
>> this is where you say "last possible sequence of a certain length,
>> encoding a character (= non-noncharacter)". OK, p#2.
>
> Yes.
>
> The test's purpose is to check that the upper bound of 3-byte sequences
> is decoded correctly.
>
> The upper bound is U+FFFF. Since that's a noncharacter, the parser
> should reject it (or maybe replace it), and the formatter should replace
> it. Trouble is, it could be misdecoded and then rejected / replaced
> anyway, so the test would pass without showing the decoding is right.
>
> Besides, U+FFFF already gets tested along with the other noncharacters
> under "5.3 Other illegal code positions".
>
> Next in line is U+FFFE, also a noncharacter, also under 5.3.
>
> Next in line is U+FFFD, which I picked.
>
> But that gets tested under "2.3 Other boundary conditions"! I guess I
> either drop it there, or make this one U+FFFC.
>
> I think testing U+FFFC here makes sense, because U+FFFD could be
> misdecoded, then replaced by U+FFFD.
>
> What do you think?

I think that we're extending Markus Kuhn's test suite, basically taking
random shots at where one specific parser's/formatter's weak spots might
be :)

That said, with intelligent fuzzing out of scope / capacity, U+FFFC
could be a good pick.

I also think I'm quite a useless person to ask for thoughts in this
area :)

Thanks,
Laszlo