From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=34649 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Pq3rO-0006ta-G0
	for qemu-devel@nongnu.org; Thu, 17 Feb 2011 08:25:32 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1Pq3rN-0000Ol-Ad
	for qemu-devel@nongnu.org; Thu, 17 Feb 2011 08:25:30 -0500
Received: from mx1.redhat.com ([209.132.183.28]:32837)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <avi@redhat.com>) id 1Pq3rN-0000Ne-40
	for qemu-devel@nongnu.org; Thu, 17 Feb 2011 08:25:29 -0500
Message-ID: <4D5D21C1.80009@redhat.com>
Date: Thu, 17 Feb 2011 15:25:21 +0200
From: Avi Kivity <avi@redhat.com>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] KVM call minutes for Feb 15
References: <20110215162629.GN21720@x200.localdomain>	<4D5B0889.4030303@codemonkey.ws>	<4D5BA5E9.90307@redhat.com>	<4D5BD259.3080804@codemonkey.ws>
	<4D5CE9AB.2030503@redhat.com>	<4D5D10C1.9010209@codemonkey.ws>
	<4D5D133F.4050801@redhat.com> <4D5D1E54.1070704@codemonkey.ws>
In-Reply-To: <4D5D1E54.1070704@codemonkey.ws>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Chris Wright <chrisw@redhat.com>, qemu-devel@nongnu.org, kvm@vger.kernel.org

On 02/17/2011 03:10 PM, Anthony Liguori wrote:
> On 02/17/2011 06:23 AM, Avi Kivity wrote:
>> On 02/17/2011 02:12 PM, Anthony Liguori wrote:
>>>> (btw what happens in a non-UTF-8 locale? I guess we should just 
>>>> reject unencodable strings).
>>>
>>>
>>> While QEMU is mostly ASCII internally, for the purposes of the JSON 
>>> parser, we always encode and decode UTF-8.  We reject invalid UTF-8 
>>> sequences.  But since JSON is string-encoded unicode, we can always 
>>> decode a JSON string to valid UTF-8 as long as the string is well 
>>> formed.
>>
>> That is wrong.  If the user passes a Unicode filename it is expected 
>> to be translated to the current locale encoding for the purpose of, 
>> say, filename lookup.
>
> QEMU does not support anything but UTF-8.

Since when?

AFAICT, JSON string conversion is the only place where there is any 
dependency on UTF-8.  Anything else should just work.

>
> That's pretty common with Unix software.  I don't think any modern 
> Unix platform actually uses UCS2 or UTF-16.  It's either ascii or UTF-8.

Most/all Linux distributions support UTF-8 as well as a zillion other 
encodings (single-byte ASCII + another charset, or multi-byte charsets 
for languages with many characters).

> The only place it even matters is Windows and Windows has ASCII and 
> UTF-16 versions of their APIs.  So on Windows, non-ASCII characters 
> won't be handled correctly (yet another one of the many issues with 
> Windows support in QEMU).  UTF-8 is self-recovering though so it 
> degrades gracefully.

It matters on Linux with el_GR.iso88597, for example.  If you feed a 
JSON string and translate it blindly to UTF-8, you'll get garbage when 
you feed it to system calls.

Practically everyone uses UTF-8 these days, so the impact is minimal, 
but it is more correct (as well as simpler) to ask the system libraries 
to encode using the current locale.
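
To make that concrete, here is a rough sketch of what I mean, not a 
patch.  It assumes POSIX iconv() and nl_langinfo(); the function names 
utf8_to_codeset() and utf8_to_locale() are made up for illustration and 
do not exist in QEMU.  The JSON parser's UTF-8 output is converted to 
the locale's codeset, and strings that cannot be represented are 
rejected rather than passed on as garbage.

```c
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert a NUL-terminated UTF-8 string to the encoding named by codeset.
 * Returns a malloc()ed NUL-terminated string, or NULL if the string is
 * not representable in that encoding (or on any other error).
 * Hypothetical helper, not an existing QEMU function.
 */
char *utf8_to_codeset(const char *utf8, const char *codeset)
{
    assert(utf8 != NULL && codeset != NULL);

    iconv_t cd = iconv_open(codeset, "UTF-8");
    if (cd == (iconv_t)-1) {
        return NULL;                    /* unknown codeset */
    }

    size_t in_left = strlen(utf8);
    size_t out_size = in_left * 4 + 1;  /* generous guess, plus the NUL */
    size_t out_left = out_size - 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)utf8;        /* iconv() does not modify the input */
    char *out_ptr = out;

    /*
     * A return of (size_t)-1 is an error (e.g. EILSEQ).  A positive return
     * counts non-reversible substitutions, which we also treat as
     * "unencodable".
     */
    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc == 0) {
        /* Emit any trailing shift sequence for stateful encodings. */
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);
    }
    iconv_close(cd);

    if (rc != 0) {
        free(out);
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}

/*
 * Same, but targeting the current locale's codeset.  The caller is
 * assumed to have called setlocale(LC_ALL, "") at startup.
 */
char *utf8_to_locale(const char *utf8)
{
    return utf8_to_codeset(utf8, nl_langinfo(CODESET));
}
```

With el_GR.iso88597, a filename made of Greek letters comes out as one 
byte per character, which is what the filesystem calls expect in that 
locale.  A string containing, say, CJK characters gets NULL, and the 
command can fail with a proper error.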

-- 
error compiling committee.c: too many arguments to function