From: Anthony Liguori <anthony@codemonkey.ws>
To: Avi Kivity <avi@redhat.com>
Cc: Chris Wright <chrisw@redhat.com>,
qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: [Qemu-devel] KVM call minutes for Feb 15
Date: Thu, 17 Feb 2011 07:37:54 -0600
Message-ID: <4D5D24B2.30500@codemonkey.ws>
In-Reply-To: <4D5D21C1.80009@redhat.com>
On 02/17/2011 07:25 AM, Avi Kivity wrote:
> On 02/17/2011 03:10 PM, Anthony Liguori wrote:
>> On 02/17/2011 06:23 AM, Avi Kivity wrote:
>>> On 02/17/2011 02:12 PM, Anthony Liguori wrote:
>>>>> (btw what happens in a non-UTF-8 locale? I guess we should just
>>>>> reject unencodable strings).
>>>>
>>>>
>>>> While QEMU is mostly ASCII internally, for the purposes of the JSON
>>>> parser, we always encode and decode UTF-8. We reject invalid UTF-8
>>>> sequences. But since JSON is string-encoded unicode, we can always
>>>> decode a JSON string to valid UTF-8 as long as the string is well
>>>> formed.
>>>
>>> That is wrong. If the user passes a Unicode filename it is expected
>>> to be translated to the current locale encoding for the purpose of,
>>> say, filename lookup.
>>
>> QEMU does not support anything but UTF-8.
>
> Since when?
>
> AFAICT, JSON string conversion is the only place where there is any
> dependency on UTF-8. Anything else should just work.
>
>>
>> That's pretty common with Unix software. I don't think any modern
>> Unix platform actually uses UCS2 or UTF-16. It's either ascii or UTF-8.
>
> Most/all Linux distributions support UTF-8 as well as a zillion other
> encodings (single-byte ASCII + another charset, or multi-byte charsets
> for languages with many characters.

An application has to support an encoding explicitly; it is not
transparent. With UCS-2/UTF-16, strings are no longer 'const char *'s
but 'const wchar_t *'s, where wchar_t is a 16-bit type (e.g.
typedef unsigned short wchar_t;).

QEMU assumes in lots of places that strings are terminated by a single
NUL byte. Basically, any use of snprintf, printf, strcpy, strlen, etc.
pretty much ties you to ASCII/UTF-8, because a valid UCS-2 string can
contain a single NUL byte.

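To make the NUL-byte point concrete, here is a minimal sketch in C. It
is not QEMU code; the array names and helper functions are made up for
illustration, and the UTF-16 data is assumed to be little-endian. It
shows strlen() stopping after the first character of a UTF-16 string
while handling the UTF-8 string as expected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "AB" in UTF-16LE: every ASCII character is followed by a 0x00 byte,
 * and the terminator is two 0x00 bytes. */
const unsigned char utf16le_ab[] = { 0x41, 0x00, 0x42, 0x00, 0x00, 0x00 };

/* "AB" plus GREEK SMALL LETTER ALPHA (U+03B1) in UTF-8: the non-ASCII
 * character takes two bytes (0xCE 0xB1), and neither of them is NUL. */
const char utf8_ab_alpha[] = "AB\xce\xb1";

/* Length as byte-oriented code (strlen, strcpy, printf("%s")) sees it. */
size_t byte_string_length(const void *s)
{
    return strlen((const char *)s);
}

/* Hypothetical helper: count 16-bit code units up to the 16-bit terminator. */
size_t utf16le_units(const unsigned char *s)
{
    size_t n = 0;

    while (s[2 * n] != 0 || s[2 * n + 1] != 0) {
        n++;
    }
    return n;
}

/* Self-check of the claims made in the comments above. */
void check_examples(void)
{
    assert(utf16le_units(utf16le_ab) == 2);       /* two characters ...      */
    assert(byte_string_length(utf16le_ab) == 1);  /* ... but strlen sees one */
    assert(byte_string_length(utf8_ab_alpha) == 4); /* 2 ASCII + 2-byte alpha */
}
```

Calling check_examples() from a main() passes silently. Note that
strlen() on UTF-8 returns a byte count rather than a character count,
but it never truncates the string, which is the property the
byte-oriented code relies on.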
>> The only place it even matters is Windows and Windows has ASCII and
>> UTF-16 versions of their APIs. So on Windows, non-ASCII characters
>> won't be handled correctly (yet another one of the many issues with
>> Windows support in QEMU). UTF-8 is self-recovering though so it
>> degrades gracefully.
>
> It matters on Linux with el_GR.iso88597, for example.

The whole ISO 8859 series (8-bit encodings) has been officially
abandoned in favor of UCS and encodings that cover the full UCS
repertoire (UTF-8/UTF-16).

I see no strong reason to try to support deprecated encodings when
there are perfectly valid replacements like el_GR.utf8.

Regards,
Anthony Liguori