From: Avi Kivity <avi@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Chris Wright <chrisw@redhat.com>,
	qemu-devel@nongnu.org, kvm@vger.kernel.org
Subject: Re: [Qemu-devel] KVM call minutes for Feb 15
Date: Thu, 17 Feb 2011 16:06:41 +0200
Message-ID: <4D5D2B71.9090201@redhat.com>
In-Reply-To: <4D5D2496.8030900@codemonkey.ws>

On 02/17/2011 03:37 PM, Anthony Liguori wrote:
> On 02/17/2011 07:25 AM, Avi Kivity wrote:
>> On 02/17/2011 03:10 PM, Anthony Liguori wrote:
>>> On 02/17/2011 06:23 AM, Avi Kivity wrote:
>>>> On 02/17/2011 02:12 PM, Anthony Liguori wrote:
>>>>>> (btw what happens in a non-UTF-8 locale? I guess we should just 
>>>>>> reject unencodable strings).
>>>>>
>>>>>
>>>>> While QEMU is mostly ASCII internally, for the purposes of the 
>>>>> JSON parser, we always encode and decode UTF-8.  We reject invalid 
>>>>> UTF-8 sequences.  But since JSON is string-encoded unicode, we can 
>>>>> always decode a JSON string to valid UTF-8 as long as the string 
>>>>> is well formed.
>>>>
>>>> That is wrong.  If the user passes a Unicode filename it is 
>>>> expected to be translated to the current locale encoding for the 
>>>> purpose of, say, filename lookup.
>>>
>>> QEMU does not support anything but UTF-8.
>>
>> Since when?
>>
>> AFAICT, JSON string conversion is the only place where there is any 
>> dependency on UTF-8.  Anything else should just work.
>>
>>>
>>> That's pretty common with Unix software.  I don't think any modern 
>>> Unix platform actually uses UCS2 or UTF-16.  It's either ascii or 
>>> UTF-8.
>>
>> Most/all Linux distributions support UTF-8 as well as a zillion other 
>> encodings (single-byte ASCII + another charset, or multi-byte 
>> charsets for languages with many characters).
>
> Maybe there's some confusion here.  UTF-8 is an encoding, not a locale.
>
> The common encodings are ASCII, UTF-8, UCS2, UTF-16, and UTF-32.

ASCII is a character set and encoding.  The rest are encodings for 
Unicode.  There are lots of other encodings, say latin-1.

>
> An application has to explicitly support an encoding.  It is not 
> transparent.

It is fully transparent until you do wire conversions (like we do with 
QMP, which is explicitly UTF-8).

>   UCS2/UTF-16 means that strings are not 'const char *'s but 'const 
> wchar_t *' where typedef unsigned short wchar_t;.
>
> QEMU assumes, in lots of places that strings are single-byte NUL 
> terminated.  Basically, any use of snprintf, printf, strcpy, strlen, 
> etc. pretty much tie you to ASCII/UTF-8.  You can have a single NUL 
> byte as part of a valid UCS2 string.

We're tied to single- or multi-byte encodings, and can't do 
wchar_t.  But that's very different from ASCII/UTF-8 only.
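
To illustrate the distinction, here is a minimal sketch (in Python for 
brevity, not QEMU code).  The filename and the helper c_strlen() are 
hypothetical; c_strlen() only mimics how strlen() stops at the first NUL 
byte.  It encodes one Greek string three ways and shows that the 
byte-oriented encodings (ISO-8859-7, UTF-8) survive NUL-terminated string 
handling, while UTF-16 does not.

```python
# Hypothetical filename containing three Greek letters.
TEXT = "αβγ.img"


def c_strlen(data: bytes) -> int:
    """Mimic C strlen(): count bytes up to (not including) the first NUL."""
    return len(data.split(b"\x00", 1)[0])


def survives_c_strings(data: bytes) -> bool:
    """True if a NUL-terminated C string could hold the whole byte sequence."""
    return c_strlen(data) == len(data)


if __name__ == "__main__":
    for enc in ("iso8859-7", "utf-8", "utf-16-le"):
        raw = TEXT.encode(enc)
        print(enc, len(raw), c_strlen(raw), survives_c_strings(raw))
```

The single-byte and multi-byte encodings both pass through unchanged; only 
the UTF-16 form is truncated, because the ASCII '.' is encoded with a zero 
high byte.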

>
>>> The only place it even matters is Windows and Windows has ASCII and 
>>> UTF-16 versions of their APIs.  So on Windows, non-ASCII characters 
>>> won't be handled correctly (yet another one of the many issues with 
>>> Windows support in QEMU).  UTF-8 is self-recovering though so it 
>>> degrades gracefully.
>>
>> It matters on Linux with el_GR.iso88597, for example.
>
> The whole series of iso8859 (8-bit encodings) are officially abandoned 
> in favor of UCS and encodings that support the full UCS code page 
> (UTF-8/UTF-16).
>
> I see no strong reason to try and support deprecated encodings when 
> there are perfectly valid replacements like el_GR.utf8.

All it takes is a call to iconv(3).  I agree it's unlikely to happen in 
practice.
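
As a rough sketch of what that conversion would look like (again in Python 
rather than C, with Python's codecs standing in for iconv(3); the function 
name wire_to_locale() and the sample filename are assumptions made up for 
illustration): decode the UTF-8 string from the wire, re-encode it in the 
locale's encoding, and reject anything that cannot be represented.

```python
def wire_to_locale(wire: bytes, locale_encoding: str) -> bytes:
    """Convert a UTF-8 wire string to the given locale encoding.

    Raises UnicodeDecodeError if the input is not valid UTF-8, and
    UnicodeEncodeError if a character has no representation in the
    target encoding (the "reject unencodable strings" case).
    """
    text = wire.decode("utf-8")
    return text.encode(locale_encoding)


if __name__ == "__main__":
    wire = "αβγ.img".encode("utf-8")  # hypothetical filename from a JSON string
    print(wire_to_locale(wire, "iso8859-7"))
    try:
        wire_to_locale("日本.img".encode("utf-8"), "iso8859-7")
    except UnicodeEncodeError as err:
        print("rejected:", err.reason)
```

A real caller would take the target encoding from the environment (in 
Python, locale.getpreferredencoding(False); in C, nl_langinfo(CODESET)) 
instead of hard-coding it.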

-- 
error compiling committee.c: too many arguments to function

