Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings

qemu-devel.nongnu.org archive mirror

From: Laszlo Ersek <lersek@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: blauwirbel@gmail.com, aliguori@us.ibm.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings
Date: Thu, 21 Mar 2013 21:22:51 +0100
Message-ID: <514B6C1B.7020400@redhat.com>
In-Reply-To: <1363283360-26220-4-git-send-email-armbru@redhat.com>

On 03/14/13 18:49, Markus Armbruster wrote:
> These are all broken, too.

What are "these"? And how are they broken? And how does the patch fix them?

> 
> A few test cases use noncharacters U+FFFF and U+10FFFF.  Risks testing
> noncharacters some more instead of what they're supposed to test.  Use
> U+FFFD and U+10FFFD instead.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  tests/check-qjson.c | 85 +++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 72 insertions(+), 13 deletions(-)

I'm confused about the commit message. It has three paragraphs (p#0:
the title, p#1: the first paragraph, p#2: the second paragraph), and
this patch modifies tests that fall under different ones:

> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
> index 852124a..efec1b2 100644
> --- a/tests/check-qjson.c
> +++ b/tests/check-qjson.c
> @@ -158,7 +158,7 @@ static void utf8_string(void)
>       * consider using overlong encoding \xC0\x80 for U+0000 ("modified
>       * UTF-8").
>       *
> -     * Test cases are scraped from Markus Kuhn's UTF-8 decoder
> +     * Most test cases are scraped from Markus Kuhn's UTF-8 decoder
>       * capability and stress test at
>       * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
>       */
> @@ -256,11 +256,11 @@ static void utf8_string(void)
>              "\xDF\xBF",
>              "\"\\u07FF\"",
>          },
> -        /* 2.2.3  3 bytes U+FFFF */
> +        /* 2.2.3  3 bytes U+FFFD */
>          {
> -            "\"\xEF\xBF\xBF\"",
> -            "\xEF\xBF\xBF",
> -            "\"\\uFFFF\"",
> +            "\"\xEF\xBF\xBD\"",
> +            "\xEF\xBF\xBD",
> +            "\"\\uFFFD\"",
>          },

This is under "2.2  Last possible sequence of a certain length". I guess
this is where you say "last possible sequence of a certain length,
encoding a character (= non-noncharacter)". OK, p#2.


>          /* 2.2.4  4 bytes U+1FFFFF */
>          {
> @@ -303,10 +303,10 @@ static void utf8_string(void)
>              "\"\\uFFFD\"",
>          },
>          {
> -            /* U+10FFFF */
> -            "\"\xF4\x8F\xBF\xBF\"",
> -            "\xF4\x8F\xBF\xBF",
> -            "\"\\u43FF\\uFFFF\"", /* bug: want "\"\\uDBFF\\uDFFF\"" */
> +            /* U+10FFFD */
> +            "\"\xF4\x8F\xBF\xBD\"",
> +            "\xF4\x8F\xBF\xBD",
> +            "\"\\u43FF\\uFFFF\"", /* bug: want "\"\\uDBFF\\uDFFD\"" */
>          },
>          {
>              /* U+110000 */

Under "2.3  Other boundary conditions". No longer a noncharacter, but no
longer a boundary condition either, at least not the original one. Still
covered by the ...FFFD part of the commit message, p#2.
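The "want" comment in this hunk can be verified by hand: JSON represents
a code point above U+FFFF as a UTF-16 surrogate pair of \u escapes. A
minimal sketch (utf16_surrogates() is a hypothetical name, not QEMU
code) that reproduces the expected \uDBFF\uDFFD for U+10FFFD:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split a supplementary-plane code point (U+10000..U+10FFFF) into a
 * UTF-16 surrogate pair, as written in JSON "\uXXXX\uXXXX" escapes.
 */
static void utf16_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                            /* 20 bits remain */
    *hi = (uint16_t)(0xD800 | (cp >> 10));    /* top 10 bits */
    *lo = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* bottom 10 bits */
}
```

Only the low surrogate differs between U+10FFFF (\uDBFF\uDFFF) and
U+10FFFD (\uDBFF\uDFFD), which matches the one-character change to the
comment.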


> @@ -584,9 +584,9 @@ static void utf8_string(void)
>              "\"\\u07FF\"",
>          },
>          {
> -            /* \U+FFFF */
> -            "\"\xF0\x8F\xBF\xBF\"",
> -            "\xF0\x8F\xBF\xBF",   /* bug: not corrected */
> +            /* \U+FFFD */
> +            "\"\xF0\x8F\xBF\xBD\"",
> +            "\xF0\x8F\xBF\xBD",   /* bug: not corrected */
>              "\"\\u03FF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>          },
>          {

Under "4.2  Maximum overlong sequences". What does that even mean? "In
some sense maximum code points, all represented as overlong sequences"? p#2.
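For reference, section 4.2 of Markus Kuhn's test file lists, for each
sequence length, the highest value that is still overlong when encoded
with that many bytes. Four-byte sequences are only needed from U+10000
on, so the maximum four-byte overlong value is U+FFFF (\xF0\x8F\xBF\xBF);
the patch's \xF0\x8F\xBF\xBD is the four-byte overlong form of U+FFFD. A
sketch of that check (utf8_decode4() is hypothetical, not QEMU code; it
decodes without rejecting anything and only reports overlong-ness):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decode a four-byte UTF-8 sequence (11110xxx 10xxxxxx 10xxxxxx
 * 10xxxxxx) and report whether it is overlong, i.e. whether the value
 * would have fit in three bytes.
 */
static uint32_t utf8_decode4(const unsigned char *s, bool *overlong)
{
    uint32_t cp;

    assert((s[0] & 0xF8) == 0xF0);
    assert((s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80 &&
           (s[3] & 0xC0) == 0x80);
    cp = ((uint32_t)(s[0] & 0x07) << 18)
       | ((uint32_t)(s[1] & 0x3F) << 12)
       | ((uint32_t)(s[2] & 0x3F) << 6)
       |  (uint32_t)(s[3] & 0x3F);
    *overlong = cp < 0x10000;   /* four-byte minimum is U+10000 */
    return cp;
}
```

"\xF0\x8F\xBF\xBD" yields 0xFFFD flagged as overlong, while
"\xF4\x8F\xBF\xBD" yields 0x10FFFD and is not overlong.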

> @@ -731,6 +731,7 @@ static void utf8_string(void)
>              "\"\\uDBFF\\uDFFF\"", /* bug: want "\"\\uFFFF\\uFFFF\"" */
>          },
>          /* 5.3  Other illegal code positions */
> +        /* BMP noncharacters */
>          {
>              /* \U+FFFE */
>              "\"\xEF\xBF\xBE\"",
> @@ -741,7 +742,65 @@ static void utf8_string(void)
>              /* \U+FFFF */
>              "\"\xEF\xBF\xBF\"",
>              "\xEF\xBF\xBF",     /* bug: not corrected */
> -            "\"\\uFFFF\"",      /* bug: not corrected */
> +            "\"\\uFFFF\"",
> +        },
> +        {
> +            /* U+FDD0 */
> +            "\"\xEF\xB7\x90\"",
> +            "\xEF\xB7\x90",     /* bug: not corrected */
> +            "\"\\uFDD0\"",      /* bug: not corrected */
> +        },
> +        {
> +            /* U+FDEF */
> +            "\"\xEF\xB7\xAF\"",
> +            "\xEF\xB7\xAF",     /* bug: not corrected */
> +            "\"\\uFDEF\"",      /* bug: not corrected */
> +        },
> +        /* Plane 1 .. 16 noncharacters */
> +        {
> +            /* U+1FFFE U+1FFFF U+2FFFE U+2FFFF ... U+10FFFE U+10FFFF */
> +            "\"\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
> +            "\xF0\xAF\xBF\xBE\xF0\xAF\xBF\xBF"
> +            "\xF0\xBF\xBF\xBE\xF0\xBF\xBF\xBF"
> +            "\xF1\x8F\xBF\xBE\xF1\x8F\xBF\xBF"
> +            "\xF1\x9F\xBF\xBE\xF1\x9F\xBF\xBF"
> +            "\xF1\xAF\xBF\xBE\xF1\xAF\xBF\xBF"
> +            "\xF1\xBF\xBF\xBE\xF1\xBF\xBF\xBF"
> +            "\xF2\x8F\xBF\xBE\xF2\x8F\xBF\xBF"
> +            "\xF2\x9F\xBF\xBE\xF2\x9F\xBF\xBF"
> +            "\xF2\xAF\xBF\xBE\xF2\xAF\xBF\xBF"
> +            "\xF2\xBF\xBF\xBE\xF2\xBF\xBF\xBF"
> +            "\xF3\x8F\xBF\xBE\xF3\x8F\xBF\xBF"
> +            "\xF3\x9F\xBF\xBE\xF3\x9F\xBF\xBF"
> +            "\xF3\xAF\xBF\xBE\xF3\xAF\xBF\xBF"
> +            "\xF3\xBF\xBF\xBE\xF3\xBF\xBF\xBF"
> +            "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF\"",
> +            /* bug: not corrected */
> +            "\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
> +            "\xF0\xAF\xBF\xBE\xF0\xAF\xBF\xBF"
> +            "\xF0\xBF\xBF\xBE\xF0\xBF\xBF\xBF"
> +            "\xF1\x8F\xBF\xBE\xF1\x8F\xBF\xBF"
> +            "\xF1\x9F\xBF\xBE\xF1\x9F\xBF\xBF"
> +            "\xF1\xAF\xBF\xBE\xF1\xAF\xBF\xBF"
> +            "\xF1\xBF\xBF\xBE\xF1\xBF\xBF\xBF"
> +            "\xF2\x8F\xBF\xBE\xF2\x8F\xBF\xBF"
> +            "\xF2\x9F\xBF\xBE\xF2\x9F\xBF\xBF"
> +            "\xF2\xAF\xBF\xBE\xF2\xAF\xBF\xBF"
> +            "\xF2\xBF\xBF\xBE\xF2\xBF\xBF\xBF"
> +            "\xF3\x8F\xBF\xBE\xF3\x8F\xBF\xBF"
> +            "\xF3\x9F\xBF\xBE\xF3\x9F\xBF\xBF"
> +            "\xF3\xAF\xBF\xBE\xF3\xAF\xBF\xBF"
> +            "\xF3\xBF\xBF\xBE\xF3\xBF\xBF\xBF"
> +            "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF",
> +            /* bug: not corrected */
> +            "\"\\u07FF\\uFFFF\\u07FF\\uFFFF\\u0BFF\\uFFFF\\u0BFF\\uFFFF"
> +            "\\u0FFF\\uFFFF\\u0FFF\\uFFFF\\u13FF\\uFFFF\\u13FF\\uFFFF"
> +            "\\u17FF\\uFFFF\\u17FF\\uFFFF\\u1BFF\\uFFFF\\u1BFF\\uFFFF"
> +            "\\u1FFF\\uFFFF\\u1FFF\\uFFFF\\u23FF\\uFFFF\\u23FF\\uFFFF"
> +            "\\u27FF\\uFFFF\\u27FF\\uFFFF\\u2BFF\\uFFFF\\u2BFF\\uFFFF"
> +            "\\u2FFF\\uFFFF\\u2FFF\\uFFFF\\u33FF\\uFFFF\\u33FF\\uFFFF"
> +            "\\u37FF\\uFFFF\\u37FF\\uFFFF\\u3BFF\\uFFFF\\u3BFF\\uFFFF"
> +            "\\u3FFF\\uFFFF\\u3FFF\\uFFFF\\u43FF\\uFFFF\\u43FF\\uFFFF\"",
>          },
>          {}
>      };
> 

This is probably p#0 (the title).
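The "bug: not corrected" expectation for the plane 1..16 string follows
a regular pattern. The arithmetic below is a hypothetical
reconstruction, inferred only from the expected strings in the test and
not from the formatter's source: each leading \uXXXX equals the first
three bytes of the four-byte sequence decoded as if they formed a
three-byte sequence (keeping only the lead byte's low four bits). The
\uFFFF that follows each one is taken as given from the test and is not
modeled here. misdecode_as_3byte() is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reconstruction: treat the first three bytes of a
 * four-byte UTF-8 sequence as a three-byte sequence.  The fourth byte
 * is ignored.
 */
static uint16_t misdecode_as_3byte(const unsigned char *s)
{
    assert((s[0] & 0xF8) == 0xF0);            /* four-byte lead byte */
    return (uint16_t)(((s[0] & 0x0F) << 12)   /* three-byte lead mask */
                      | ((s[1] & 0x3F) << 6)
                      | (s[2] & 0x3F));
}
```

Plane 1 (\xF0\x9F\xBF...) gives 0x07FF and plane 16 (\xF4\x8F\xBF...)
gives 0x43FF, matching the first and last \uXXXX in the expected string;
the same arithmetic yields the \u43FF of the U+10FFFD case and the
\u03FF of the overlong U+FFFD case.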

Ah. Have you removed the noncharacters from the other tests, but made up
for them at the end with new noncharacter tests?
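To cross-check coverage and the new byte strings, here is a sketch
(hypothetical helper names, not QEMU code). Unicode defines 66
noncharacters: the 32 code points U+FDD0..U+FDEF, plus U+xxFFFE and
U+xxFFFF in each of the 17 planes. With this patch the test covers
U+FFFE, U+FFFF, the two ends of U+FDD0..U+FDEF, and both noncharacters
of planes 1 through 16; utf8_encode4() regenerates the four-byte
sequences used for the latter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Unicode noncharacters: U+FDD0..U+FDEF, plus the last two code points
 * of each of the 17 planes (U+xxFFFE and U+xxFFFF).
 */
static bool is_noncharacter(uint32_t cp)
{
    if (cp > 0x10FFFF) {
        return false;                   /* beyond the Unicode range */
    }
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

/* Encode a supplementary-plane code point as four UTF-8 bytes. */
static void utf8_encode4(uint32_t cp, unsigned char out[4])
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
}
```

Calling utf8_encode4((p << 16) | 0xFFFE) and utf8_encode4((p << 16) |
0xFFFF) for p = 1 .. 16 reproduces the 32 four-byte sequences in the
test string, in the same order.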

Reviewed-by: Laszlo Ersek <lersek@redhat.com>


Thread overview: 21+ messages
2013-03-14 17:49 [Qemu-devel] [PATCH 0/4] Fix JSON string formatter Markus Armbruster
2013-03-14 17:49 ` [Qemu-devel] [PATCH 1/4] unicode: New mod_utf8_codepoint() Markus Armbruster
2013-03-21 19:37   ` Laszlo Ersek
2013-03-22  9:23     ` Markus Armbruster
2013-03-22 11:46       ` Laszlo Ersek
2013-03-14 17:49 ` [Qemu-devel] [PATCH 2/4] check-qjson: Fix up a few bogus comments Markus Armbruster
2013-03-21 20:06   ` Laszlo Ersek
2013-03-22 13:27     ` Markus Armbruster
2013-03-14 17:49 ` [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings Markus Armbruster
2013-03-21 20:22   ` Laszlo Ersek [this message]
2013-03-22 14:37     ` Markus Armbruster
2013-03-22 14:52       ` Laszlo Ersek
2013-03-14 17:49 ` [Qemu-devel] [PATCH 4/4] qjson: to_json() case QTYPE_QSTRING is buggy, rewrite Markus Armbruster
2013-03-21 20:44   ` Laszlo Ersek
2013-03-22 13:15   ` Laszlo Ersek
2013-03-22 14:51     ` Markus Armbruster
2013-03-17 19:55 ` [Qemu-devel] [PATCH 0/4] Fix JSON string formatter Blue Swirl
2013-03-18  9:58   ` Markus Armbruster
2013-03-23 14:44 ` Blue Swirl
2013-04-11 16:12   ` Markus Armbruster
  -- strict thread matches above, loose matches on Subject: below --
2013-04-11 16:07 Markus Armbruster
2013-04-11 16:07 ` [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings Markus Armbruster
