From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:40321)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1fnlka-00019s-PJ for qemu-devel@nongnu.org;
	Thu, 09 Aug 2018 10:17:19 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1fnlkX-0002L7-Ea for qemu-devel@nongnu.org;
	Thu, 09 Aug 2018 10:17:16 -0400
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:51280 helo=mx1.redhat.com)
	by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
	(Exim 4.71) (envelope-from ) id 1fnlkX-0002Kg-A3
	for qemu-devel@nongnu.org; Thu, 09 Aug 2018 10:17:13 -0400
References: <20180808120334.10970-1-armbru@redhat.com>
	<20180808120334.10970-12-armbru@redhat.com>
From: Eric Blake
Message-ID: <26ff5c67-abfa-bc5d-7c26-3f08ffbdc57b@redhat.com>
Date: Thu, 9 Aug 2018 09:17:11 -0500
MIME-Version: 1.0
In-Reply-To: <20180808120334.10970-12-armbru@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 11/56] check-qjson: Cover UTF-8 in single quoted strings
To: Markus Armbruster , qemu-devel@nongnu.org
Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com

On 08/08/2018 07:02 AM, Markus Armbruster wrote:
> utf8_string() tests only double quoted strings.  Cover single quoted
> strings, too: store the strings to test without quotes, then wrap them
> in either kind of quote.
>
> Signed-off-by: Markus Armbruster
> ---
>  tests/check-qjson.c | 427 ++++++++++++++++++++++----------------------
>  1 file changed, 214 insertions(+), 213 deletions(-)
>

Pre-existing, but:

>          /* 2.2.4  4 bytes U+1FFFFF */

Technically, Unicode ends at U+10FFFF (21 bits).  Anything beyond that
is not valid Unicode, even if it IS a valid interpretation of UTF-8
encoding.
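
To make that distinction concrete, here is a minimal sketch (not taken
from the patch or from QEMU's JSON lexer) that decodes a sequence using
the original, pre-RFC 3629 UTF-8 bit layout and then checks the result
against the Unicode limit.  The names legacy_utf8_decode() and
in_unicode_range() are hypothetical, and the decoder deliberately omits
overlong and surrogate checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Highest code point Unicode defines. */
#define UNICODE_MAX 0x10FFFF

/*
 * Hypothetical helper: decode one sequence with the legacy UTF-8 bit
 * layout, which permits lead bytes for sequences of up to 6 bytes.
 * Returns the decoded value, or -1 if the sequence is malformed or
 * truncated.  Overlong and surrogate checks are intentionally omitted.
 */
static int64_t legacy_utf8_decode(const unsigned char *buf, size_t len)
{
    size_t n, i;
    uint32_t cp;

    assert(buf != NULL);
    if (len == 0) {
        return -1;
    }

    if (buf[0] < 0x80) {
        n = 1; cp = buf[0];
    } else if ((buf[0] & 0xE0) == 0xC0) {
        n = 2; cp = buf[0] & 0x1F;
    } else if ((buf[0] & 0xF0) == 0xE0) {
        n = 3; cp = buf[0] & 0x0F;
    } else if ((buf[0] & 0xF8) == 0xF0) {
        n = 4; cp = buf[0] & 0x07;
    } else if ((buf[0] & 0xFC) == 0xF8) {
        n = 5; cp = buf[0] & 0x03;
    } else if ((buf[0] & 0xFE) == 0xFC) {
        n = 6; cp = buf[0] & 0x01;
    } else {
        return -1;              /* stray continuation byte, 0xFE or 0xFF */
    }

    if (len < n) {
        return -1;              /* truncated sequence */
    }
    for (i = 1; i < n; i++) {
        if ((buf[i] & 0xC0) != 0x80) {
            return -1;          /* not a continuation byte */
        }
        cp = (cp << 6) | (buf[i] & 0x3F);
    }
    return (int64_t)cp;
}

/* Hypothetical helper: true if cp lies within U+0000..U+10FFFF. */
static int in_unicode_range(int64_t cp)
{
    return cp >= 0 && cp <= UNICODE_MAX;
}
```

With this sketch, "\xF7\xBF\xBF\xBF" decodes to 0x1FFFFF and fails the
range check, while "\xF4\x8F\xBF\xBD" decodes to 0x10FFFD and passes.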
>          {
> -            "\"\xF7\xBF\xBF\xBF\"",
> +            "\xF7\xBF\xBF\xBF",
>              NULL,               /* bug: rejected */
> -            "\"\\uFFFD\"",
> +            "\\uFFFD",
>              "\xF7\xBF\xBF\xBF",
>          },
>          /* 2.2.5  5 bytes U+3FFFFFF */

Which makes this one also questionable,

>          {
> -            "\"\xFB\xBF\xBF\xBF\xBF\"",
> +            "\xFB\xBF\xBF\xBF\xBF",
>              NULL,               /* bug: rejected */
> -            "\"\\uFFFD\"",
> +            "\\uFFFD",
>              "\xFB\xBF\xBF\xBF\xBF",
>          },
>          /* 2.2.6  6 bytes U+7FFFFFFF */

and this one.

>          {
>              /* last one in last plane: U+10FFFD */
> -            "\"\xF4\x8F\xBF\xBD\"",
>              "\xF4\x8F\xBF\xBD",
> -            "\"\\uDBFF\\uDFFD\""
> +            "\xF4\x8F\xBF\xBD",
> +            "\\uDBFF\\uDFFD"
>          },
>          {
>              /* first one beyond Unicode range: U+110000 */

while these are reasonable.

The conversion of the initializer looks sane (well, mechanical).  Ergo:

Reviewed-by: Eric Blake

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org