From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [Qemu-devel] KVM call minutes for Feb 15
Date: Thu, 17 Feb 2011 15:25:21 +0200
Message-ID: <4D5D21C1.80009@redhat.com>
References: <20110215162629.GN21720@x200.localdomain> <4D5B0889.4030303@codemonkey.ws> <4D5BA5E9.90307@redhat.com> <4D5BD259.3080804@codemonkey.ws> <4D5CE9AB.2030503@redhat.com> <4D5D10C1.9010209@codemonkey.ws> <4D5D133F.4050801@redhat.com> <4D5D1E54.1070704@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Chris Wright , qemu-devel@nongnu.org, kvm@vger.kernel.org
To: Anthony Liguori
Return-path: 
Received: from mx1.redhat.com ([209.132.183.28]:39687 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751824Ab1BQNZa (ORCPT ); Thu, 17 Feb 2011 08:25:30 -0500
In-Reply-To: <4D5D1E54.1070704@codemonkey.ws>
Sender: kvm-owner@vger.kernel.org
List-ID: 

On 02/17/2011 03:10 PM, Anthony Liguori wrote:
> On 02/17/2011 06:23 AM, Avi Kivity wrote:
>> On 02/17/2011 02:12 PM, Anthony Liguori wrote:
>>>> (btw what happens in a non-UTF-8 locale? I guess we should just
>>>> reject unencodable strings).
>>>
>>> While QEMU is mostly ASCII internally, for the purposes of the JSON
>>> parser, we always encode and decode UTF-8. We reject invalid UTF-8
>>> sequences. But since JSON is string-encoded unicode, we can always
>>> decode a JSON string to valid UTF-8 as long as the string is well
>>> formed.
>>
>> That is wrong. If the user passes a Unicode filename it is expected
>> to be translated to the current locale encoding for the purpose of,
>> say, filename lookup.
>
> QEMU does not support anything but UTF-8.

Since when? AFAICT, JSON string conversion is the only place where there
is any dependency on UTF-8. Anything else should just work.

> That's pretty common with Unix software. I don't think any modern
> Unix platform actually uses UCS2 or UTF-16. It's either ascii or
> UTF-8.
Most (if not all) Linux distributions support UTF-8 as well as a zillion
other encodings (single-byte ASCII plus another charset, or multi-byte
charsets for languages with many characters).

> The only place it even matters is Windows and Windows has ASCII and
> UTF-16 versions of their APIs. So on Windows, non-ASCII characters
> won't be handled correctly (yet another one of the many issues with
> Windows support in QEMU). UTF-8 is self-recovering though so it
> degrades gracefully.

It matters on Linux with el_GR.iso88597, for example. If you take a JSON
string and blindly translate it to UTF-8, you'll get garbage when you
feed the result to system calls.

Practically everyone uses UTF-8 these days, so the impact is minimal,
but it is more correct (as well as simpler) to ask the system libraries
to encode using the current locale.

-- 
error compiling committee.c: too many arguments to function