Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings

From: Laszlo Ersek <lersek@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: blauwirbel@gmail.com, aliguori@us.ibm.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 3/4] check-qjson: Test noncharacters other than U+FFFE, U+FFFF in strings
Date: Thu, 21 Mar 2013 21:22:51 +0100
Message-ID: <514B6C1B.7020400@redhat.com>
In-Reply-To: <1363283360-26220-4-git-send-email-armbru@redhat.com>

On 03/14/13 18:49, Markus Armbruster wrote:
> These are all broken, too.

What are "these"? And how are they broken? And how does the patch fix them?

> 
> A few test cases use noncharacters U+FFFF and U+10FFFF.  Risks testing
> noncharacters some more instead of what they're supposed to test.  Use
> U+FFFD and U+10FFFD instead.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  tests/check-qjson.c | 85 +++++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 72 insertions(+), 13 deletions(-)

I'm confused about the commit message. It has three parts: the title
(p#0), the first paragraph (p#1), and the second paragraph (p#2). The
patch modifies several different tests, so let me map each hunk to one
of these parts:

> diff --git a/tests/check-qjson.c b/tests/check-qjson.c
> index 852124a..efec1b2 100644
> --- a/tests/check-qjson.c
> +++ b/tests/check-qjson.c
> @@ -158,7 +158,7 @@ static void utf8_string(void)
>       * consider using overlong encoding \xC0\x80 for U+0000 ("modified
>       * UTF-8").
>       *
> -     * Test cases are scraped from Markus Kuhn's UTF-8 decoder
> +     * Most test cases are scraped from Markus Kuhn's UTF-8 decoder
>       * capability and stress test at
>       * http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
>       */
> @@ -256,11 +256,11 @@ static void utf8_string(void)
>              "\xDF\xBF",
>              "\"\\u07FF\"",
>          },
> -        /* 2.2.3  3 bytes U+FFFF */
> +        /* 2.2.3  3 bytes U+FFFD */
>          {
> -            "\"\xEF\xBF\xBF\"",
> -            "\xEF\xBF\xBF",
> -            "\"\\uFFFF\"",
> +            "\"\xEF\xBF\xBD\"",
> +            "\xEF\xBF\xBD",
> +            "\"\\uFFFD\"",
>          },

This is under "2.2  Last possible sequence of a certain length". I guess
this is where you say "last possible sequence of a certain length,
encoding a character (= non-noncharacter)". OK, p#2.
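
To make the byte change in this hunk concrete, here is a minimal sketch of
plain three-byte UTF-8 encoding. The helper name utf8_encode3() is
hypothetical and is not part of check-qjson.c or of QEMU; it only
illustrates why U+FFFF becomes EF BF BF and U+FFFD becomes EF BF BD.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: encode a code point from
 * the three-byte range U+0800..U+FFFF as UTF-8.  It deliberately does
 * not reject surrogates or noncharacters.
 */
static void utf8_encode3(uint32_t cp, unsigned char out[3])
{
    assert(cp >= 0x800 && cp <= 0xFFFF);
    out[0] = (unsigned char)(0xE0 | (cp >> 12));          /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));  /* 10xxxxxx */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));         /* 10xxxxxx */
}
```

Only the last byte differs between the two code points, which matches the
\xBF to \xBD change in the hunk above.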


>          /* 2.2.4  4 bytes U+1FFFFF */
>          {
> @@ -303,10 +303,10 @@ static void utf8_string(void)
>              "\"\\uFFFD\"",
>          },
>          {
> -            /* U+10FFFF */
> -            "\"\xF4\x8F\xBF\xBF\"",
> -            "\xF4\x8F\xBF\xBF",
> -            "\"\\u43FF\\uFFFF\"", /* bug: want "\"\\uDBFF\\uDFFF\"" */
> +            /* U+10FFFD */
> +            "\"\xF4\x8F\xBF\xBD\"",
> +            "\xF4\x8F\xBF\xBD",
> +            "\"\\u43FF\\uFFFF\"", /* bug: want "\"\\uDBFF\\uDFFD\"" */
>          },
>          {
>              /* U+110000 */

Under "2.3  Other boundary conditions". Not a noncharacter any longer,
but also not a boundary condition. At least not the original one. Still
covered by the ...FFFD part of the commit message, p#2.
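
For reference, the "want" value in the comment follows from the standard
UTF-16 surrogate pair split. This is a minimal sketch with a hypothetical
helper name, to_surrogate_pair(); it is not the formatter code from
qjson.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: split a supplementary
 * plane code point (U+10000..U+10FFFF) into the UTF-16 surrogate pair
 * that a JSON \uXXXX\uXXXX escape would carry.
 */
static void to_surrogate_pair(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                          /* 20 significant bits left */
    *hi = (uint16_t)(0xD800 + (cp >> 10));  /* upper 10 bits */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* lower 10 bits */
}
```

For U+10FFFD this gives DBFF and DFFD, which agrees with the updated
comment in the hunk.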


> @@ -584,9 +584,9 @@ static void utf8_string(void)
>              "\"\\u07FF\"",
>          },
>          {
> -            /* \U+FFFF */
> -            "\"\xF0\x8F\xBF\xBF\"",
> -            "\xF0\x8F\xBF\xBF",   /* bug: not corrected */
> +            /* \U+FFFD */
> +            "\"\xF0\x8F\xBF\xBD\"",
> +            "\xF0\x8F\xBF\xBD",   /* bug: not corrected */
>              "\"\\u03FF\\uFFFF\"", /* bug: want "\"\\uFFFF\"" */
>          },
>          {

Under "4.2  Maximum overlong sequences". What does that even mean? "In
some sense maximum codepoints, all represented as overlong sequences"? P#2.
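
To spell out what "overlong" means for the sequence in this hunk, here is
a minimal sketch. The function names are hypothetical and the decoder is
intentionally naive (no validation of continuation bytes); it is not the
parser under test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: decode a four-byte sequence without any validity checks. */
static uint32_t decode4_naive(const unsigned char *s)
{
    assert((s[0] & 0xF8) == 0xF0);          /* lead byte is 11110xxx */
    return ((uint32_t)(s[0] & 0x07) << 18)
         | ((uint32_t)(s[1] & 0x3F) << 12)
         | ((uint32_t)(s[2] & 0x3F) << 6)
         |  (uint32_t)(s[3] & 0x3F);
}

/* Fewest UTF-8 bytes that can represent the code point. */
static int utf8_min_len(uint32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

/* A four-byte sequence is overlong if fewer bytes would have sufficed. */
static int is_overlong4(const unsigned char *s)
{
    return utf8_min_len(decode4_naive(s)) < 4;
}
```

With this, F0 8F BF BD decodes to U+FFFD, the largest value that fits in
three bytes once U+FFFE and U+FFFF are set aside, so the four-byte form is
overlong. F4 8F BF BD decodes to U+10FFFD and is not overlong.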

> @@ -731,6 +731,7 @@ static void utf8_string(void)
>              "\"\\uDBFF\\uDFFF\"", /* bug: want "\"\\uFFFF\\uFFFF\"" */
>          },
>          /* 5.3  Other illegal code positions */
> +        /* BMP noncharacters */
>          {
>              /* \U+FFFE */
>              "\"\xEF\xBF\xBE\"",
> @@ -741,7 +742,65 @@ static void utf8_string(void)
>              /* \U+FFFF */
>              "\"\xEF\xBF\xBF\"",
>              "\xEF\xBF\xBF",     /* bug: not corrected */
> -            "\"\\uFFFF\"",      /* bug: not corrected */
> +            "\"\\uFFFF\"",
> +        },
> +        {
> +            /* U+FDD0 */
> +            "\"\xEF\xB7\x90\"",
> +            "\xEF\xB7\x90",     /* bug: not corrected */
> +            "\"\\uFDD0\"",      /* bug: not corrected */
> +        },
> +        {
> +            /* U+FDEF */
> +            "\"\xEF\xB7\xAF\"",
> +            "\xEF\xB7\xAF",     /* bug: not corrected */
> +            "\"\\uFDEF\"",      /* bug: not corrected */
> +        },
> +        /* Plane 1 .. 16 noncharacters */
> +        {
> +            /* U+1FFFE U+1FFFF U+2FFFE U+2FFFF ... U+10FFFE U+10FFFF */
> +            "\"\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
> +            "\xF0\xAF\xBF\xBE\xF0\xAF\xBF\xBF"
> +            "\xF0\xBF\xBF\xBE\xF0\xBF\xBF\xBF"
> +            "\xF1\x8F\xBF\xBE\xF1\x8F\xBF\xBF"
> +            "\xF1\x9F\xBF\xBE\xF1\x9F\xBF\xBF"
> +            "\xF1\xAF\xBF\xBE\xF1\xAF\xBF\xBF"
> +            "\xF1\xBF\xBF\xBE\xF1\xBF\xBF\xBF"
> +            "\xF2\x8F\xBF\xBE\xF2\x8F\xBF\xBF"
> +            "\xF2\x9F\xBF\xBE\xF2\x9F\xBF\xBF"
> +            "\xF2\xAF\xBF\xBE\xF2\xAF\xBF\xBF"
> +            "\xF2\xBF\xBF\xBE\xF2\xBF\xBF\xBF"
> +            "\xF3\x8F\xBF\xBE\xF3\x8F\xBF\xBF"
> +            "\xF3\x9F\xBF\xBE\xF3\x9F\xBF\xBF"
> +            "\xF3\xAF\xBF\xBE\xF3\xAF\xBF\xBF"
> +            "\xF3\xBF\xBF\xBE\xF3\xBF\xBF\xBF"
> +            "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF\"",
> +            /* bug: not corrected */
> +            "\xF0\x9F\xBF\xBE\xF0\x9F\xBF\xBF"
> +            "\xF0\xAF\xBF\xBE\xF0\xAF\xBF\xBF"
> +            "\xF0\xBF\xBF\xBE\xF0\xBF\xBF\xBF"
> +            "\xF1\x8F\xBF\xBE\xF1\x8F\xBF\xBF"
> +            "\xF1\x9F\xBF\xBE\xF1\x9F\xBF\xBF"
> +            "\xF1\xAF\xBF\xBE\xF1\xAF\xBF\xBF"
> +            "\xF1\xBF\xBF\xBE\xF1\xBF\xBF\xBF"
> +            "\xF2\x8F\xBF\xBE\xF2\x8F\xBF\xBF"
> +            "\xF2\x9F\xBF\xBE\xF2\x9F\xBF\xBF"
> +            "\xF2\xAF\xBF\xBE\xF2\xAF\xBF\xBF"
> +            "\xF2\xBF\xBF\xBE\xF2\xBF\xBF\xBF"
> +            "\xF3\x8F\xBF\xBE\xF3\x8F\xBF\xBF"
> +            "\xF3\x9F\xBF\xBE\xF3\x9F\xBF\xBF"
> +            "\xF3\xAF\xBF\xBE\xF3\xAF\xBF\xBF"
> +            "\xF3\xBF\xBF\xBE\xF3\xBF\xBF\xBF"
> +            "\xF4\x8F\xBF\xBE\xF4\x8F\xBF\xBF",
> +            /* bug: not corrected */
> +            "\"\\u07FF\\uFFFF\\u07FF\\uFFFF\\u0BFF\\uFFFF\\u0BFF\\uFFFF"
> +            "\\u0FFF\\uFFFF\\u0FFF\\uFFFF\\u13FF\\uFFFF\\u13FF\\uFFFF"
> +            "\\u17FF\\uFFFF\\u17FF\\uFFFF\\u1BFF\\uFFFF\\u1BFF\\uFFFF"
> +            "\\u1FFF\\uFFFF\\u1FFF\\uFFFF\\u23FF\\uFFFF\\u23FF\\uFFFF"
> +            "\\u27FF\\uFFFF\\u27FF\\uFFFF\\u2BFF\\uFFFF\\u2BFF\\uFFFF"
> +            "\\u2FFF\\uFFFF\\u2FFF\\uFFFF\\u33FF\\uFFFF\\u33FF\\uFFFF"
> +            "\\u37FF\\uFFFF\\u37FF\\uFFFF\\u3BFF\\uFFFF\\u3BFF\\uFFFF"
> +            "\\u3FFF\\uFFFF\\u3FFF\\uFFFF\\u43FF\\uFFFF\\u43FF\\uFFFF\"",
>          },
>          {}
>      };
> 

This is probably p#0 (the title).

Ah. Have you removed the noncharacters from the other tests, but made up
for them at the end with new noncharacter tests?
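
For context, the set the new tests draw from can be written as a short
predicate. This is a sketch under the Unicode definition of noncharacters
(U+FDD0..U+FDEF plus the last two code points of every plane); the name
is_noncharacter() is hypothetical and does not exist in the tree.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate, for illustration only: true for the 66
 * Unicode noncharacters, false for everything else.
 */
static bool is_noncharacter(uint32_t cp)
{
    if (cp > 0x10FFFF) {
        return false;                       /* outside the code space */
    }
    return (cp >= 0xFDD0 && cp <= 0xFDEF)   /* 32 in the BMP block */
        || (cp & 0xFFFE) == 0xFFFE;         /* U+nFFFE, U+nFFFF, n = 0..16 */
}
```

Under this definition U+FFFD and U+10FFFD are ordinary characters, while
U+FDD0, U+FDEF and every U+nFFFE / U+nFFFF used in the added test cases
are noncharacters.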

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
