From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4D5D1E54.1070704@codemonkey.ws>
Date: Thu, 17 Feb 2011 07:10:44 -0600
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] KVM call minutes for Feb 15
References: <20110215162629.GN21720@x200.localdomain>
 <4D5B0889.4030303@codemonkey.ws> <4D5BA5E9.90307@redhat.com>
 <4D5BD259.3080804@codemonkey.ws> <4D5CE9AB.2030503@redhat.com>
 <4D5D10C1.9010209@codemonkey.ws> <4D5D133F.4050801@redhat.com>
In-Reply-To: <4D5D133F.4050801@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Avi Kivity
Cc: Chris Wright, qemu-devel@nongnu.org, kvm@vger.kernel.org

On 02/17/2011 06:23 AM, Avi Kivity wrote:
> On 02/17/2011 02:12 PM, Anthony Liguori wrote:
>>> (btw what happens in a non-UTF-8 locale? I guess we should just
>>> reject unencodable strings).
>>
>>
>> While QEMU is mostly ASCII internally, for the purposes of the JSON
>> parser, we always encode and decode UTF-8. We reject invalid UTF-8
>> sequences. But since JSON is string-encoded unicode, we can always
>> decode a JSON string to valid UTF-8 as long as the string is well
>> formed.
>
> That is wrong. If the user passes a Unicode filename it is expected
> to be translated to the current locale encoding for the purpose of,
> say, filename lookup.

QEMU does not support anything but UTF-8.

That's pretty common with Unix software. I don't think any modern Unix
platform actually uses UCS2 or UTF-16. It's either ascii or UTF-8.

The only place it even matters is Windows and Windows has ASCII and
UTF-16 versions of their APIs. So on Windows, non-ASCII characters
won't be handled correctly (yet another one of the many issues with
Windows support in QEMU). UTF-8 is self-recovering though so it
degrades gracefully.

Regards,

Anthony Liguori
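
To make the points above concrete, here is a rough sketch in Python (not QEMU's actual C JSON parser, and not its API). The function names are made up for illustration. It shows three things: a well-formed JSON string decoding to valid UTF-8, an invalid byte sequence being rejected, and a decoder picking up again right after a bad byte, which is what "self-recovering" refers to.

```python
import json


def decode_json_string_to_utf8(json_text: str) -> bytes:
    """Parse a JSON string literal and return its value as UTF-8 bytes.

    Hypothetical helper. A well-formed string (including \\uXXXX escapes)
    always encodes; a lone surrogate escape is not well formed and makes
    encode() raise UnicodeEncodeError.
    """
    value = json.loads(json_text)
    if not isinstance(value, str):
        raise TypeError("expected a JSON string")
    return value.encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if raw is a valid UTF-8 byte sequence."""
    try:
        raw.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def decode_with_recovery(raw: bytes) -> str:
    """Decode leniently: each invalid sequence becomes U+FFFD and decoding
    resumes at the next byte that can start a character."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The escape \u00e9 becomes the two-byte UTF-8 sequence C3 A9.
    print(decode_json_string_to_utf8('"caf\\u00e9"'))
    # A Latin-1 encoded 'e-acute' (E9) is not valid UTF-8 on its own.
    print(is_valid_utf8(b"caf\xe9"))
    # The stray FF byte is replaced; the text after it decodes normally.
    print(decode_with_recovery(b"a\xffb\xc3\xa9"))
```

The strict check corresponds to the "reject invalid sequences" behaviour; the lenient decode is only there to show that one bad byte does not corrupt the characters that follow it.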