From: Anthony Liguori <anthony@codemonkey.ws>
To: Avi Kivity <avi@redhat.com>
Cc: Chris Wright <chrisw@redhat.com>,
qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: [Qemu-devel] KVM call minutes for Feb 15
Date: Thu, 17 Feb 2011 07:37:54 -0600
Message-ID: <4D5D24B2.30500@codemonkey.ws>
In-Reply-To: <4D5D21C1.80009@redhat.com>
On 02/17/2011 07:25 AM, Avi Kivity wrote:
> On 02/17/2011 03:10 PM, Anthony Liguori wrote:
>> On 02/17/2011 06:23 AM, Avi Kivity wrote:
>>> On 02/17/2011 02:12 PM, Anthony Liguori wrote:
>>>>> (btw what happens in a non-UTF-8 locale? I guess we should just
>>>>> reject unencodable strings).
>>>>
>>>>
>>>> While QEMU is mostly ASCII internally, for the purposes of the JSON
>>>> parser, we always encode and decode UTF-8. We reject invalid UTF-8
>>>> sequences. But since JSON is string-encoded unicode, we can always
>>>> decode a JSON string to valid UTF-8 as long as the string is well
>>>> formed.
>>>
>>> That is wrong. If the user passes a Unicode filename it is expected
>>> to be translated to the current locale encoding for the purpose of,
>>> say, filename lookup.
>>
>> QEMU does not support anything but UTF-8.
>
> Since when?
>
> AFAICT, JSON string conversion is the only place where there is any
> dependency on UTF-8. Anything else should just work.
>
>>
>> That's pretty common with Unix software. I don't think any modern
>> Unix platform actually uses UCS2 or UTF-16. It's either ascii or UTF-8.
>
> Most/all Linux distributions support UTF-8 as well as a zillion other
> encodings (single-byte ASCII + another charset, or multi-byte charsets
> for languages with many characters).
An application has to support an encoding explicitly; it is not
transparent. With UCS-2/UTF-16, strings are not 'const char *'s but
'const wchar_t *'s, where wchar_t is a 16-bit unsigned short on Windows.
QEMU assumes in many places that strings are NUL-terminated sequences
of single bytes. Basically, any use of snprintf, printf, strcpy, strlen,
etc. pretty much ties you to ASCII/UTF-8. A valid UCS-2 string can
contain NUL bytes (every ASCII character has a zero high byte), and all
of those functions stop at the first one.
>> The only place it even matters is Windows and Windows has ASCII and
>> UTF-16 versions of their APIs. So on Windows, non-ASCII characters
>> won't be handled correctly (yet another one of the many issues with
>> Windows support in QEMU). UTF-8 is self-recovering though so it
>> degrades gracefully.
>
> It matters on Linux with el_GR.iso88597, for example.
The whole ISO 8859 series of 8-bit encodings has been officially
abandoned in favor of UCS and the encodings that cover the full UCS
repertoire (UTF-8/UTF-16).
I see no strong reason to try to support deprecated encodings when
there are perfectly valid replacements like el_GR.utf8.
Regards,
Anthony Liguori