[Qemu-devel] KVM call minutes for Feb 15

qemu-devel.nongnu.org archive mirror

* [Qemu-devel] KVM call minutes for Feb 15
@ 2011-02-15 16:26 Chris Wright
  2011-02-15 23:13 ` Anthony Liguori
  0 siblings, 1 reply; 17+ messages in thread
From: Chris Wright @ 2011-02-15 16:26 UTC
  To: kvm; +Cc: qemu-devel

QAPI and QMP
- Anthony adding a new wiki page to describe all of this
- specified in formal schema using JSON
  - includes documentation in javadoc-like syntax
  - can generate API (possibly protocol) docs
  - documenting each command and expected errors
- creates marshalling functions and C interfaces
- can generate C library
  - facilitates unit tests/regression tests
- new and old code both exist in Anthony's tree
  - allows unit tests to run on both to verify
  - will remove old and force a flag day on merging in for 0.15
- still need to convert human monitor commands
  - goal to convert all of human monitor to QMP
- events?
  - still not consumable from internal use
  - model signals and slots
    - similar to notifier lists, but can pass arbitrary data
    - client connects to signal via QMP
  - how to extend?
    - optional parameters (ABI bump)
      - no way to know if client is aware of and consuming the optional
        parameters
    - add new events
      - client required to register for new events when they know about
        them; server can generate different logic based on each client's
        capabilities
- first release may not include shared library (lack of libconf/autotool)
  - could 
- QMP session in default well-known location
  - allows iteration of all running QMP sessions
  - per-user directory to handle user-level isolation
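
The events item above describes signals and slots that, unlike notifier lists, can pass arbitrary data. A minimal sketch of that idea follows. Signal, SlotFunc, signal_connect, signal_emit and add_to_counter are hypothetical names, the fixed-size slot array only keeps the example short, and how a QMP client would connect to a signal is not shown.

```c
#include <stddef.h>

#define MAX_SLOTS 8

/* A slot receives its own context pointer plus the emitter's payload. */
typedef void SlotFunc(void *opaque, void *data);

typedef struct Signal {
    SlotFunc *func[MAX_SLOTS];
    void *opaque[MAX_SLOTS];
    size_t count;
} Signal;

/* Connects a slot; returns 0 on success or -1 if the signal is full. */
int signal_connect(Signal *sig, SlotFunc *func, void *opaque)
{
    if (sig->count == MAX_SLOTS) {
        return -1;
    }
    sig->func[sig->count] = func;
    sig->opaque[sig->count] = opaque;
    sig->count++;
    return 0;
}

/* Calls every connected slot, in connection order, with the same payload. */
void signal_emit(Signal *sig, void *data)
{
    for (size_t i = 0; i < sig->count; i++) {
        sig->func[i](sig->opaque[i], data);
    }
}

/* Example slot: adds an int payload to the int counter given as opaque. */
static void add_to_counter(void *opaque, void *data)
{
    *(int *)opaque += *(const int *)data;
}
```

The void * payload is what lets each signal carry event-specific data; a typed wrapper per event could be generated from the schema instead.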

qdev future
- have an object model, but can't do polymorphism (i.e. bus level)
- could use a more OOP style, use GObject, use C++... no great ideas
- no major qdev plans for 0.15
- would be useful to have the ability to do device level unit testing
  - cleaner device model, better encapsulation
  - this covers both the device-side interfaces and the interfaces back to QEMU
  - ability to do something like a virtual PCI bus to be a test harness
    to interact with a device
  - back to the GObject, OOP, C++ questions?
    - IDL-based code generation of VMState, in an effort to make
      migration more verifiable
    - VMState
      - need to focus on serialized guest-visible state
        - start with all state and remove obviously internal-only state
        - start with only guest-visible state (structure separation)
      - verifiable
- need a qdev tree maintainer?
- some disagreement on exactly how much 
- qdev autodoc patches? (posted and ack'd multiple times)

bad patches committed that are not on list
- please report specific incidents; this should not be happening

SeaBIOS update?
- without an update, we will have features that can't be used
- need a release
  - 0.15 will need good planning and dates and communication with Kevin

0.14-rc2 tagged; please review for any missing patches. 0.14.0 likely
to be tagged late today

revisit new -> old migration
- Amit offers virtio-serial patches and some legwork
- tabled discussion to list, possibly next week's call

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-15 16:26 [Qemu-devel] KVM call minutes for Feb 15 Chris Wright
@ 2011-02-15 23:13 ` Anthony Liguori
  2011-02-16 10:24   ` Avi Kivity
  2011-02-16 14:39   ` Amit Shah
  0 siblings, 2 replies; 17+ messages in thread
From: Anthony Liguori @ 2011-02-15 23:13 UTC
  To: Chris Wright; +Cc: qemu-devel, kvm

On 02/15/2011 10:26 AM, Chris Wright wrote:
> QAPI and QMP
> - Anthony adding a new wiki page to describe all of this
>    

http://wiki.qemu.org/Features/QAPI

> - specified in formal schema using JSON
>    - includes documentation in javadoc-like syntax
>    - can generate API (possibly protocol) docs
>    - documenting each command and expected errors
> - creates marshalling functions and C interfaces
> - can generate C library
>    - facilitates unit tests/regression tests
> - new and old code both exist in Anthony's tree
>    - allows unit tests to run on both to verify
>    - will remove old and force a flag day on merging in for 0.15
> - still need to convert human monitor commands
>    - goal to convert all of human monitor to QMP
> - events?
>    - still not consumable from internal use
>    - model signals and slots
>      - similar to notifier lists, but can pass arbitrary data
>      - client connects to signal via QMP
>    - how to extend?
>      - optional parameters (ABI bump)
>        - no way to know if client is aware of and consuming the optional
>          parameters
>      - add new events
>        - client required to register for new events when they know about
>          them; server can generate different logic based on each client's
>          capabilities
> - first release may not include shared library (lack of libconf/autotool)
>    - could
>    

Just to be clear, this is just not in my current priority list.  I'm 
much more focused on having full unit tests, documentation, and all HMP 
commands converted.  If there's time, I will try to do this.

> - QMP session in default well-known location
>    - allows iteration of all running QMP sessions
>    - per-user directory to handle user-level isolation
>
> qdev future
> - have an object model, but can't do polymorphism (i.e. bus level)
> - could use a more OOP style, use GObject, use C++... no great ideas
> - no major qdev plans for 0.15
>    

For me, if anyone wants to tackle this, I'd love to have a discussion.

> - would be useful to have the ability to do device level unit testing
>    - cleaner device model, better encapsulation
>    - this covers both the device-side interfaces and the interfaces back to QEMU
>    - ability to do something like a virtual PCI bus to be a test harness
>      to interact with a device
>    - back to the GObject, OOP, C++ questions?
>      - IDL-based code generation of VMState, in an effort to make
>        migration more verifiable
>      - VMState
>        - need to focus on serialized guest-visible state
>          - start with all state and remove obviously internal-only state
>          - start with only guest-visible state (structure separation)
>        - verifiable
> - need a qdev tree maintainer?
> - some disagreement on exactly how much
> - qdev autodoc patches? (posted and ack'd multiple times)
>
> bad patches committed that are not on list
> - please report specific incidents; this should not be happening
>
> SeaBIOS update?
> - without an update, we will have features that can't be used
> - need a release
>    - 0.15 will need good planning and dates and communication with Kevin
>
> 0.14-rc2 tagged; please review for any missing patches. 0.14.0 likely
> to be tagged late today
>
> revisit new -> old migration
> - Amit offers virtio-serial patches and some legwork
>    

So, to me, migration correctness trumps compatibility.  I don't think 
compatibility is useful if it means that a guest may fail during 
migration.  We have subsections as a way to support the cases where it's 
safe to migrate to an old version only if a feature is not being used or 
a corner case is not currently happening.  This is the best way to 
approach the problem.

If a subsection won't work, that means you want to migrate when you're 
completely sure that migrating will break a guest.  That doesn't seem 
reasonable at all to me.

I think in the last discussion on Amit's patches, I had suggested that 
subsections could be used to allow migration when there wasn't any 
queued data.  I think this is the best we can do while preserving 
correctness.
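
To illustrate the subsection idea in isolation, here is a minimal, self-contained sketch. It does not use QEMU's real VMState API; SerialPortState, the one-byte section tags, and the helper names are all hypothetical. It shows optional state being emitted only when a predicate says it is needed, so migrating to an older version succeeds when nothing is queued and fails cleanly, rather than corrupting the guest, when something is.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified device state; not QEMU's VMState API. */
typedef struct SerialPortState {
    unsigned int queued_bytes;   /* data buffered but not yet delivered */
} SerialPortState;

/* The optional subsection is emitted only when this returns true, i.e.
 * only when there is state an older QEMU would have no way to restore. */
static bool port_queue_needed(const SerialPortState *s)
{
    return s->queued_bytes > 0;
}

/* Writes the device's section into buf (at least 2 bytes) and returns
 * the number of bytes used. One-byte tags stand in for real sections:
 * 'M' is the main section, 'Q' the optional queued-data subsection. */
static size_t save_port(const SerialPortState *s, char *buf)
{
    size_t n = 0;
    buf[n++] = 'M';
    if (port_queue_needed(s)) {
        buf[n++] = 'Q';
    }
    return n;
}

/* An older destination understands only the main section. Any unknown
 * subsection makes the load fail, so migration is refused instead of
 * silently dropping guest-visible state. */
static bool old_destination_load(const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 'M') {
            return false;
        }
    }
    return true;
}
```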

Regards,

Anthony Liguori

> - tabled discussion to list, possibly next week's call
>
>    

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-15 23:13 ` Anthony Liguori
@ 2011-02-16 10:24   ` Avi Kivity
  2011-02-16 13:34     ` Anthony Liguori
  2011-02-16 14:39   ` Amit Shah
  1 sibling, 1 reply; 17+ messages in thread
From: Avi Kivity @ 2011-02-16 10:24 UTC
  To: Anthony Liguori; +Cc: Chris Wright, qemu-devel, kvm

On 02/16/2011 01:13 AM, Anthony Liguori wrote:
> On 02/15/2011 10:26 AM, Chris Wright wrote:
>> QAPI and QMP
>> - Anthony adding a new wiki page to describe all of this
>
> http://wiki.qemu.org/Features/QAPI
>

   [ 'change', {'device': 'str', 'target': 'str'}, {'arg': 'str'}, 'none' ]
     ->
   void qmp_change(const char *device, const char *target, bool has_arg,
                   const char *arg, Error **errp);
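
The schema line above maps the optional 'arg' to a has_arg flag plus a pointer. A minimal sketch of that calling convention follows; the Error struct is a stand-in rather than QEMU's real type, and marshal_change and seen_arg are hypothetical names used only for illustration, not the generated code from Anthony's tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for QEMU's Error type; illustration only. */
typedef struct Error {
    const char *msg;
} Error;

/* Records what the handler last saw for the optional argument. */
static const char *seen_arg;

void qmp_change(const char *device, const char *target,
                bool has_arg, const char *arg, Error **errp)
{
    (void)device;
    (void)target;
    (void)errp; /* this sketch has no failure paths */
    /* Only look at 'arg' when has_arg says the client actually sent it. */
    seen_arg = has_arg ? arg : NULL;
}

/* Hypothetical generated glue: the parser is assumed to hand over each
 * argument as a C string, or NULL when the key was absent. */
void marshal_change(const char *device, const char *target,
                    const char *arg_or_null, Error **errp)
{
    qmp_change(device, target, arg_or_null != NULL, arg_or_null, errp);
}
```

The separate has_ flag matters most for non-pointer types such as integers or booleans, where there is no NULL value available to signal absence.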

AFAICT a json-string allows embedded NULs ('\u0000').  These translate to 
UTF-8 as '\0', terminating your char *s.  Either we use some 
length/pointer structure, or the parser has to look for them and kill 
them, and we have to specify them as verboten.

BlockDeviceInfo *qmp_query_block_device_info(const char *device, Error **errp)
{
    BlockDeviceInfo *info;
    BlockDriverState *bs;
    Error *local_err = NULL;

    bs = bdrv_find(device, &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        return NULL;
    }

    info = qemu_mallocz(sizeof(*info));
    info->file = qemu_strdup(bs->filename);
    info->ro = bs->read_only;
    info->drv = qemu_strdup(bs->drv->format_name);
    info->encrypted = bs->encrypted;
    if (bs->backing_file[0]) {
        info->has_backing_file = true;
        info->backing_file = qemu_strdup(bs->backing_file);
    }

    return info;
}


So, info and all its pointer-typed members are required to be 
qemu_free() compatible, with just a single pointer pointing to an 
object, and generated code will qemu_free() everything?

Recommend translating '-' in identifiers to '_' so we can use '-' in the 
schema as a word separator.

-- 

error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-16 10:24   ` Avi Kivity
@ 2011-02-16 13:34     ` Anthony Liguori
  2011-02-17  9:26       ` Avi Kivity
  0 siblings, 1 reply; 17+ messages in thread
From: Anthony Liguori @ 2011-02-16 13:34 UTC
  To: Avi Kivity; +Cc: Chris Wright, qemu-devel, kvm

On 02/16/2011 04:24 AM, Avi Kivity wrote:
> On 02/16/2011 01:13 AM, Anthony Liguori wrote:
>> On 02/15/2011 10:26 AM, Chris Wright wrote:
>>> QAPI and QMP
>>> - Anthony adding a new wiki page to describe all of this
>>
>> http://wiki.qemu.org/Features/QAPI
>>
>
>   [ 'change', {'device': 'str', 'target': 'str'}, {'arg': 'str'}, 'none' ]
>     ->
>   void qmp_change(const char *device, const char *target, bool has_arg,
>                   const char *arg, Error **errp);
>
> AFAICT a json-string allows embedded NULs ('\u0000').  These translate 
> to UTF-8 as '\0', terminating your char *s.  Either we use some 
> length/pointer structure, or the parser has to look for them and kill 
> them, and we have to specify them as verboten.

I feel like it would be safer for us to not accept strings with embedded 
NULs.  There's no way we're going to consistently handle this correctly 
in QEMU since we expect NUL terminated strings.  They won't work for any 
of the standard C functions either.
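
One way to enforce that rule is to check each decoded string while its length is still known, before it is handed on as a char *. This is a minimal sketch assuming the parser keeps a (pointer, length) pair for each decoded string; json_string_is_c_safe is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the decoded string can safely be passed on as a
 * NUL-terminated char *, i.e. none of its len bytes is a NUL. */
static bool json_string_is_c_safe(const char *buf, size_t len)
{
    return memchr(buf, '\0', len) == NULL;
}
```

For the input "a\u0000b" the decoded buffer is { 'a', '\0', 'b' }: strlen() would report 1 although the real length is 3, and the check rejects it.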

>
> BlockDeviceInfo *qmp_query_block_device_info(const char *device, Error **errp)
> {
>     BlockDeviceInfo *info;
>     BlockDriverState *bs;
>     Error *local_err = NULL;
>
>     bs = bdrv_find(device, &local_err);
>     if (local_err) {
>         error_propagate(errp, local_err);
>         return NULL;
>     }
>
>     info = qemu_mallocz(sizeof(*info));
>     info->file = qemu_strdup(bs->filename);
>     info->ro = bs->read_only;
>     info->drv = qemu_strdup(bs->drv->format_name);
>     info->encrypted = bs->encrypted;
>     if (bs->backing_file[0]) {
>         info->has_backing_file = true;
>         info->backing_file = qemu_strdup(bs->backing_file);
>     }
>
>     return info;
> }
>
>
> So, info and all its pointer-typed members are required to be 
> qemu_free() compatible, with just a single pointer pointing to an 
> object, and generated code will qemu_free() everything?

Yes.
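
A minimal sketch of the ownership rule confirmed here: if every pointer member owns exactly one heap allocation, a destructor can free the members and then the struct itself. The struct layout and the free_BlockDeviceInfo name are hypothetical, plain free() stands in for qemu_free(), and it assumes absent optional members are left NULL (for example because the object was allocated zero-filled).

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout mirroring the example; not QEMU's generated code. */
typedef struct BlockDeviceInfo {
    char *file;
    bool ro;
    char *drv;
    bool encrypted;
    bool has_backing_file;
    char *backing_file;
} BlockDeviceInfo;

/* Frees every owned member, then the object. free(NULL) is a no-op, so
 * unset optional members need no special casing. */
void free_BlockDeviceInfo(BlockDeviceInfo *info)
{
    if (info == NULL) {
        return;
    }
    free(info->file);
    free(info->drv);
    free(info->backing_file);
    free(info);
}
```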

>
> Recommend translating '-' in identifiers to '_' so we can use '-' in 
> the schema as a word separator.

Already do that and we make extensive use of that in the schema.
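
As a sketch of the identifier mapping described above; schema_name_to_c is a hypothetical helper written for illustration, not the actual QAPI code generator, which is not shown in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Converts a schema name such as "query-block-device-info" into a C
 * identifier by mapping '-' to '_'. Returns 0 on success, or -1 if out
 * (outlen bytes) cannot hold the result plus its terminating NUL. */
int schema_name_to_c(const char *name, char *out, size_t outlen)
{
    size_t len = strlen(name);
    if (len + 1 > outlen) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        out[i] = (name[i] == '-') ? '_' : name[i];
    }
    out[len] = '\0';
    return 0;
}
```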

Regards,

Anthony Liguori

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-15 23:13 ` Anthony Liguori
  2011-02-16 10:24   ` Avi Kivity
@ 2011-02-16 14:39   ` Amit Shah
  2011-02-16 14:41     ` Anthony Liguori
  1 sibling, 1 reply; 17+ messages in thread
From: Amit Shah @ 2011-02-16 14:39 UTC
  To: Anthony Liguori; +Cc: Chris Wright, qemu-devel, kvm

On (Tue) 15 Feb 2011 [17:13:13], Anthony Liguori wrote:
> On 02/15/2011 10:26 AM, Chris Wright wrote:
> >
> >revisit new ->  old migration
> >- Amit offers virtio-serial patches and some legwork
> 
> So, to me, migration correctness trumps compatibility.  I don't
> think compatibility is useful if it means that a guest may fail
> during migration.  We have subsections as a way to support the cases
> where it's safe to migrate to an old version only if a feature is
> not being used or a corner case is not currently happening.  This is
> the best way to approach the problem.
> 
> If a subsection won't work, that means you want to migrate when
> you're completely sure that migrating will break a guest.  That
> doesn't seem reasonable at all to me.
> 
> I think in the last discussion on Amit's patches, I had suggested
> that subsections could be used to allow migration when there wasn't
> any queued data.  I think this is the best we can do while
> preserving correctness.

The only problem is that virtio hasn't been converted over to vmstate,
which is necessary for subsections.

		Amit

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-16 14:39   ` Amit Shah
@ 2011-02-16 14:41     ` Anthony Liguori
  2011-02-17 12:42       ` Amit Shah
  0 siblings, 1 reply; 17+ messages in thread
From: Anthony Liguori @ 2011-02-16 14:41 UTC
  To: Amit Shah; +Cc: Chris Wright, qemu-devel, kvm

On 02/16/2011 08:39 AM, Amit Shah wrote:
> On (Tue) 15 Feb 2011 [17:13:13], Anthony Liguori wrote:
>    
>> On 02/15/2011 10:26 AM, Chris Wright wrote:
>>      
>>> revisit new ->   old migration
>>> - Amit offers virtio-serial patches and some legwork
>>>        
>> So, to me, migration correctness trumps compatibility.  I don't
>> think compatibility is useful if it means that a guest may fail
>> during migration.  We have subsections as a way to support the cases
>> where it's safe to migrate to an old version only if a feature is
>> not being used or a corner case is not currently happening.  This is
>> the best way to approach the problem.
>>
>> If a subsection won't work, that means you want to migrate when
>> you're completely sure that migrating will break a guest.  That
>> doesn't seem reasonable at all to me.
>>
>> I think in the last discussion on Amit's patches, I had suggested
>> that subsections could be used to allow migration when there wasn't
>> any queued data.  I think this is the best we can do while
>> preserving correctness.
>>      
> The only problem is that virtio hasn't been converted over to vmstate,
> which is necessary for subsections.
>    

Then it needs to be converted.

Regards,

Anthony Liguori

> 		Amit
>
>    

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-16 13:34     ` Anthony Liguori
@ 2011-02-17  9:26       ` Avi Kivity
  2011-02-17 12:12         ` Anthony Liguori
  0 siblings, 1 reply; 17+ messages in thread
From: Avi Kivity @ 2011-02-17  9:26 UTC
  To: Anthony Liguori; +Cc: Chris Wright, qemu-devel, kvm

On 02/16/2011 03:34 PM, Anthony Liguori wrote:
> On 02/16/2011 04:24 AM, Avi Kivity wrote:
>> On 02/16/2011 01:13 AM, Anthony Liguori wrote:
>>> On 02/15/2011 10:26 AM, Chris Wright wrote:
>>>> QAPI and QMP
>>>> - Anthony adding a new wiki page to describe all of this
>>>
>>> http://wiki.qemu.org/Features/QAPI
>>>
>>
>>   [ 'change', {'device': 'str', 'target': 'str'}, {'arg': 'str'}, 'none' ]
>>     ->
>>   void qmp_change(const char *device, const char *target, bool has_arg,
>>                   const char *arg, Error **errp);
>>
>> AFAICT a json-string allows embedded NULs ('\u0000').  These translate 
>> to UTF-8 as '\0', terminating your char *s.  Either we use some 
>> length/pointer structure, or the parser has to look for them and kill 
>> them, and we have to specify them as verboten.
>
> I feel like it would be safer for us to not accept strings with 
> embedded NULs.  There's no way we're going to consistently handle this 
> correctly in QEMU since we expect NUL terminated strings.  They won't 
> work for any of the standard C functions either.

I agree.  Technically we're making a backwards incompatible change to 
the protocol specification, but I don't think there's any risk that 
somebody is sending in strings with NULs.

(btw what happens in a non-UTF-8 locale? I guess we should just reject 
unencodable strings).

-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-17  9:26       ` Avi Kivity
@ 2011-02-17 12:12         ` Anthony Liguori
  2011-02-17 12:23           ` Avi Kivity
  0 siblings, 1 reply; 17+ messages in thread
From: Anthony Liguori @ 2011-02-17 12:12 UTC
  To: Avi Kivity; +Cc: Chris Wright, qemu-devel, kvm

On 02/17/2011 03:26 AM, Avi Kivity wrote:
> On 02/16/2011 03:34 PM, Anthony Liguori wrote:
>> On 02/16/2011 04:24 AM, Avi Kivity wrote:
>>> On 02/16/2011 01:13 AM, Anthony Liguori wrote:
>>>> On 02/15/2011 10:26 AM, Chris Wright wrote:
>>>>> QAPI and QMP
>>>>> - Anthony adding a new wiki page to describe all of this
>>>>
>>>> http://wiki.qemu.org/Features/QAPI
>>>>
>>>
>>>   [ 'change', {'device': 'str', 'target': 'str'}, {'arg': 'str'}, 'none' ]
>>>     ->
>>>   void qmp_change(const char *device, const char *target, bool has_arg,
>>>                   const char *arg, Error **errp);
>>>
>>> AFAICT a json-string allows embedded NULs ('\u0000').  These 
>>> translate to UTF-8 as '\0', terminating your char *s.  Either we use 
>>> some length/pointer structure, or the parser has to look for them 
>>> and kill them, and we have to specify them as verboten.
>>
>> I feel like it would be safer for us to not accept strings with 
>> embedded NULs.  There's no way we're going to consistently handle 
>> this correctly in QEMU since we expect NUL terminated strings.  They 
>> won't work for any of the standard C functions either.
>
> I agree.  Technically we're making a backwards incompatible change to 
> the protocol specification, but I don't think there's any risk that 
> somebody is sending in strings with NULs.
>
> (btw what happens in a non-UTF-8 locale? I guess we should just reject 
> unencodable strings).

While QEMU is mostly ASCII internally, for the purposes of the JSON 
parser, we always encode and decode UTF-8.  We reject invalid UTF-8 
sequences.  But since JSON is string-encoded unicode, we can always 
decode a JSON string to valid UTF-8 as long as the string is well formed.
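
To make the decoding step concrete: a JSON \uXXXX escape names a code point, which the parser then writes out as UTF-8 bytes. The following is a generic, minimal sketch of that encoding step, not QEMU's JSON lexer code; utf8_encode is a hypothetical name. It rejects UTF-16 surrogate values, which in JSON only appear as halves of a \uD8xx\uDCxx pair and must be combined into one code point before encoding. Note that U+0000 encodes to a single zero byte, which is the embedded-NUL case raised earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-8 into out (at least 4 bytes).
 * Returns the number of bytes written, or 0 for values that are not
 * Unicode scalar values (surrogates or anything above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0; /* lone surrogate: not encodable on its own */
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```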

Regards,

Anthony Liguori

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-17 12:12         ` Anthony Liguori
@ 2011-02-17 12:23           ` Avi Kivity
  2011-02-17 13:10             ` Anthony Liguori
  0 siblings, 1 reply; 17+ messages in thread
From: Avi Kivity @ 2011-02-17 12:23 UTC
  To: Anthony Liguori; +Cc: Chris Wright, qemu-devel, kvm

On 02/17/2011 02:12 PM, Anthony Liguori wrote:
>> (btw what happens in a non-UTF-8 locale? I guess we should just 
>> reject unencodable strings).
>
>
> While QEMU is mostly ASCII internally, for the purposes of the JSON 
> parser, we always encode and decode UTF-8.  We reject invalid UTF-8 
> sequences.  But since JSON is string-encoded unicode, we can always 
> decode a JSON string to valid UTF-8 as long as the string is well formed.

That is wrong.  If the user passes a Unicode filename it is expected to 
be translated to the current locale encoding for the purpose of, say, 
filename lookup.

-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-16 14:41     ` Anthony Liguori
@ 2011-02-17 12:42       ` Amit Shah
  0 siblings, 0 replies; 17+ messages in thread
From: Amit Shah @ 2011-02-17 12:42 UTC
  To: Anthony Liguori; +Cc: Chris Wright, qemu-devel, kvm

On (Wed) 16 Feb 2011 [08:41:27], Anthony Liguori wrote:
> On 02/16/2011 08:39 AM, Amit Shah wrote:
> >On (Tue) 15 Feb 2011 [17:13:13], Anthony Liguori wrote:
> >>On 02/15/2011 10:26 AM, Chris Wright wrote:
> >>>revisit new ->   old migration
> >>>- Amit offers virtio-serial patches and some legwork
> >>So, to me, migration correctness trumps compatibility.  I don't
> >>think compatibility is useful if it means that a guest may fail
> >>during migration.  We have subsections as a way to support the cases
> >>where it's safe to migrate to an old version only if a feature is
> >>not being used or a corner case is not currently happening.  This is
> >>the best way to approach the problem.
> >>
> >>If a subsection won't work, that means you want to migrate when
> >>you're completely sure that migrating will break a guest.  That
> >>doesn't seem reasonable at all to me.
> >>
> >>I think in the last discussion on Amit's patches, I had suggested
> >>that subsections could be used to allow migration when there wasn't
> >>any queued data.  I think this is the best we can do while
> >>preserving correctness.
> >The only problem is that virtio hasn't been converted over to vmstate,
> >which is necessary for subsections.
> 
> Then it needs to be converted.

But that can't be done for 0.14.

		Amit

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-17 12:23           ` Avi Kivity
@ 2011-02-17 13:10             ` Anthony Liguori
  2011-02-17 13:25               ` Avi Kivity
  0 siblings, 1 reply; 17+ messages in thread
From: Anthony Liguori @ 2011-02-17 13:10 UTC
  To: Avi Kivity; +Cc: Chris Wright, qemu-devel, kvm

On 02/17/2011 06:23 AM, Avi Kivity wrote:
> On 02/17/2011 02:12 PM, Anthony Liguori wrote:
>>> (btw what happens in a non-UTF-8 locale? I guess we should just 
>>> reject unencodable strings).
>>
>>
>> While QEMU is mostly ASCII internally, for the purposes of the JSON 
>> parser, we always encode and decode UTF-8.  We reject invalid UTF-8 
>> sequences.  But since JSON is string-encoded unicode, we can always 
>> decode a JSON string to valid UTF-8 as long as the string is well 
>> formed.
>
> That is wrong.  If the user passes a Unicode filename it is expected 
> to be translated to the current locale encoding for the purpose of, 
> say, filename lookup.

QEMU does not support anything but UTF-8.

That's pretty common with Unix software.  I don't think any modern Unix 
platform actually uses UCS2 or UTF-16.  It's either ASCII or UTF-8.

The only place it even matters is Windows and Windows has ASCII and 
UTF-16 versions of their APIs.  So on Windows, non-ASCII characters 
won't be handled correctly (yet another one of the many issues with 
Windows support in QEMU).  UTF-8 is self-recovering though so it 
degrades gracefully.

Regards,

Anthony Liguori

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-17 13:10             ` Anthony Liguori
@ 2011-02-17 13:25               ` Avi Kivity
  2011-02-17 13:37                 ` Anthony Liguori
  2011-02-17 13:37                 ` Anthony Liguori
  0 siblings, 2 replies; 17+ messages in thread
From: Avi Kivity @ 2011-02-17 13:25 UTC
  To: Anthony Liguori; +Cc: Chris Wright, qemu-devel, kvm

On 02/17/2011 03:10 PM, Anthony Liguori wrote:
> On 02/17/2011 06:23 AM, Avi Kivity wrote:
>> On 02/17/2011 02:12 PM, Anthony Liguori wrote:
>>>> (btw what happens in a non-UTF-8 locale? I guess we should just 
>>>> reject unencodable strings).
>>>
>>>
>>> While QEMU is mostly ASCII internally, for the purposes of the JSON 
>>> parser, we always encode and decode UTF-8.  We reject invalid UTF-8 
>>> sequences.  But since JSON is string-encoded unicode, we can always 
>>> decode a JSON string to valid UTF-8 as long as the string is well 
>>> formed.
>>
>> That is wrong.  If the user passes a Unicode filename it is expected 
>> to be translated to the current locale encoding for the purpose of, 
>> say, filename lookup.
>
> QEMU does not support anything but UTF-8.

Since when?

AFAICT, JSON string conversion is the only place where there is any 
dependency on UTF-8.  Anything else should just work.

>
> That's pretty common with Unix software.  I don't think any modern 
> Unix platform actually uses UCS2 or UTF-16.  It's either ascii or UTF-8.

Most/all Linux distributions support UTF-8 as well as a zillion other 
encodings (single-byte ASCII + another charset, or multi-byte charsets 
for languages with many characters).

> The only place it even matters is Windows and Windows has ASCII and 
> UTF-16 versions of their APIs.  So on Windows, non-ASCII characters 
> won't be handled correctly (yet another one of the many issues with 
> Windows support in QEMU).  UTF-8 is self-recovering though so it 
> degrades gracefully.

It matters on Linux with el_GR.iso88597, for example.  If you feed a 
JSON string and translate it blindly to UTF-8, you'll get garbage when 
you feed it to system calls.

Practically everyone uses UTF-8 these days, so the impact is minimal, 
but it is more correct (as well as simpler) to ask the system libraries 
to encode using the current locale.
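
Letting the system libraries encode for the current locale could look roughly like the sketch below. It uses the POSIX iconv_open()/iconv() and nl_langinfo(CODESET) interfaces; utf8_to_charset and utf8_to_locale are hypothetical names, and the sketch assumes the caller has already run setlocale(LC_ALL, "") and that the input is valid UTF-8 coming from the JSON parser.

```c
#include <errno.h>
#include <iconv.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Converts a NUL-terminated UTF-8 string into the charset named by
 * tocode. Returns 0 on success, or -1 if the charset is unsupported, a
 * character has no representation in it (errno == EILSEQ), or out is
 * too small (errno == E2BIG). */
int utf8_to_charset(const char *in, const char *tocode,
                    char *out, size_t outlen)
{
    if (outlen == 0) {
        errno = E2BIG;
        return -1;
    }
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1) {
        return -1;
    }
    char *inp = (char *)in;        /* iconv() takes char **, not const */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outlen - 1;   /* keep one byte for the terminator */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1) {
        /* Flush shift state; matters for stateful charsets like ISO-2022-JP. */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    }
    int saved_errno = errno;
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = saved_errno;
        return -1;
    }
    *outp = '\0';
    return 0;
}

/* Converts to the charset of the current locale. */
int utf8_to_locale(const char *in, char *out, size_t outlen)
{
    return utf8_to_charset(in, nl_langinfo(CODESET), out, outlen);
}
```

Under an el_GR.ISO-8859-7 locale, for example, the UTF-8 sequence CE B1 for "α" would come out as the single byte E1, while a character with no ISO 8859-7 representation makes the call fail with EILSEQ instead of producing garbage.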

-- 
error compiling committee.c: too many arguments to function

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-17 13:25               ` Avi Kivity
@ 2011-02-17 13:37                 ` Anthony Liguori
  2011-02-17 13:59                   ` Peter Maydell
  2011-02-17 14:06                   ` Avi Kivity
  2011-02-17 13:37                 ` Anthony Liguori
  1 sibling, 2 replies; 17+ messages in thread
From: Anthony Liguori @ 2011-02-17 13:37 UTC
  To: Avi Kivity; +Cc: Chris Wright, qemu-devel, kvm

On 02/17/2011 07:25 AM, Avi Kivity wrote:
> On 02/17/2011 03:10 PM, Anthony Liguori wrote:
>> On 02/17/2011 06:23 AM, Avi Kivity wrote:
>>> On 02/17/2011 02:12 PM, Anthony Liguori wrote:
>>>>> (btw what happens in a non-UTF-8 locale? I guess we should just 
>>>>> reject unencodable strings).
>>>>
>>>>
>>>> While QEMU is mostly ASCII internally, for the purposes of the JSON 
>>>> parser, we always encode and decode UTF-8.  We reject invalid UTF-8 
>>>> sequences.  But since JSON is string-encoded unicode, we can always 
>>>> decode a JSON string to valid UTF-8 as long as the string is well 
>>>> formed.
>>>
>>> That is wrong.  If the user passes a Unicode filename it is expected 
>>> to be translated to the current locale encoding for the purpose of, 
>>> say, filename lookup.
>>
>> QEMU does not support anything but UTF-8.
>
> Since when?
>
> AFAICT, JSON string conversion is the only place where there is any 
> dependency on UTF-8.  Anything else should just work.
>
>>
>> That's pretty common with Unix software.  I don't think any modern 
>> Unix platform actually uses UCS2 or UTF-16.  It's either ascii or UTF-8.
>
> Most/all Linux distributions support UTF-8 as well as a zillion other 
> encodings (single-byte ASCII + another charset, or multi-byte charsets 
> for languages with many characters).

Maybe there's some confusion here.  UTF-8 is an encoding, not a locale.

The common encodings are ASCII, UTF-8, UCS2, UTF-16, and UTF-32.

An application has to explicitly support an encoding.  It is not 
transparent.  UCS2/UTF-16 means that strings are not 'const char *'s but 
'const wchar_t *' where typedef unsigned short wchar_t;.

QEMU assumes, in lots of places that strings are single-byte NUL 
terminated.  Basically, any use of snprintf, printf, strcpy, strlen, 
etc. pretty much tie you to ASCII/UTF-8.  You can have a single NUL byte 
as part of a valid UCS2 string.
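
The point about NUL bytes inside UCS2 strings can be checked directly. In this minimal sketch (byte_strlen and utf16le_units are hypothetical helpers), the two characters "AB" are encoded as UTF-16LE: each ASCII character carries a zero high byte, so strlen() stops after one byte, while counting 16-bit units finds both characters.

```c
#include <stddef.h>
#include <string.h>

/* "AB" encoded as UTF-16LE, followed by a 0x0000 terminator unit. */
static const unsigned char ab_utf16le[] = { 0x41, 0x00, 0x42, 0x00, 0x00, 0x00 };

/* Length as seen by byte-oriented code such as strlen(). */
size_t byte_strlen(const unsigned char *s)
{
    return strlen((const char *)s);
}

/* Length in 16-bit code units, stopping at a 0x0000 unit. */
size_t utf16le_units(const unsigned char *s)
{
    size_t n = 0;
    while (s[2 * n] != 0 || s[2 * n + 1] != 0) {
        n++;
    }
    return n;
}
```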

>> The only place it even matters is Windows and Windows has ASCII and 
>> UTF-16 versions of their APIs.  So on Windows, non-ASCII characters 
>> won't be handled correctly (yet another one of the many issues with 
>> Windows support in QEMU).  UTF-8 is self-recovering though so it 
>> degrades gracefully.
>
> It matters on Linux with el_GR.iso88597, for example.

The whole ISO 8859 series (8-bit encodings) has been officially abandoned 
in favor of UCS and encodings that support the full UCS repertoire 
(UTF-8/UTF-16).

I see no strong reason to try and support deprecated encodings when 
there are perfectly valid replacements like el_GR.utf8.

Regards,

Anthony Liguori

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-17 13:37                 ` Anthony Liguori
@ 2011-02-17 13:59                   ` Peter Maydell
  2011-02-17 14:01                     ` Anthony Liguori
  2011-02-17 14:06                   ` Avi Kivity
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Maydell @ 2011-02-17 13:59 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Chris Wright, Avi Kivity, kvm, qemu-devel

On 17 February 2011 13:37, Anthony Liguori <anthony@codemonkey.ws> wrote:
> An application has to explicitly support an encoding.  It is not
> transparent.  UCS2/UTF-16 means that strings are not 'const char *'s but
> 'const wchar_t *' where typedef unsigned short wchar_t;.
>
> QEMU assumes, in lots of places, that strings are single-byte NUL terminated.
> Basically, any use of snprintf, printf, strcpy, strlen, etc. pretty much
> ties you to ASCII/UTF-8.

Er, no, it limits you to those encodings where you can treat strings
as "bag of NUL-terminated bytes". Oddly enough just about all the
common legacy ones (iso-8859-*, iso-2022-jp, etc) fit in that category
because otherwise they'd break really badly. As it is, generally
things Just Work for programs which treat filenames as "an opaque
string".
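To illustrate the "opaque string" argument, here is a hedged sketch assuming Linux and a filesystem that stores names as raw bytes (ext4 or tmpfs, for example); filesystems that enforce UTF-8 names may reject it. It creates a file whose name is Greek text in ISO-8859-7 and checks that readdir() returns the identical bytes, without ever decoding them. `name_roundtrips` and the /tmp path are hypothetical choices for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* "αβγ.txt" in ISO-8859-7 (alpha, beta, gamma are 0xE1..0xE3). No byte is
 * zero, so the name is an ordinary NUL-terminated C string. */
static const char greek_iso88597_name[] = "\xe1\xe2\xe3.txt";

/* Create a file called `name` in a fresh temporary directory, then list
 * the directory and check that readdir() hands back exactly the same
 * bytes. The name is never decoded: it is an opaque byte string.
 * Returns 1 if the name round-trips, 0 if it does not, -1 on error. */
static int name_roundtrips(const char *name)
{
    char dir[] = "/tmp/opaque-name-XXXXXX";
    char path[512];
    int found = 0;

    assert(name != NULL && strchr(name, '/') == NULL);
    if (mkdtemp(dir) == NULL)
        return -1;
    if (snprintf(path, sizeof path, "%s/%s", dir, name) >= (int)sizeof path) {
        rmdir(dir);
        return -1;
    }

    FILE *f = fopen(path, "w");
    if (f == NULL) {
        rmdir(dir);
        return -1;
    }
    fclose(f);

    DIR *d = opendir(dir);
    if (d != NULL) {
        struct dirent *ent;
        while ((ent = readdir(d)) != NULL) {
            if (strcmp(ent->d_name, name) == 0)
                found = 1;
        }
        closedir(d);
    } else {
        found = -1;
    }

    unlink(path);
    rmdir(dir);
    return found;
}
```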

-- PMM

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-17 13:59                   ` Peter Maydell
@ 2011-02-17 14:01                     ` Anthony Liguori
  0 siblings, 0 replies; 17+ messages in thread
From: Anthony Liguori @ 2011-02-17 14:01 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Chris Wright, Avi Kivity, kvm, qemu-devel

On 02/17/2011 07:59 AM, Peter Maydell wrote:
> On 17 February 2011 13:37, Anthony Liguori <anthony@codemonkey.ws> wrote:
>    
>> An application has to explicitly support an encoding.  It is not
>> transparent.  UCS2/UTF-16 means that strings are not 'const char *'s but
>> 'const wchar_t *' where typedef unsigned short wchar_t;.
>>
>> QEMU assumes, in lots of places, that strings are single-byte NUL terminated.
>> Basically, any use of snprintf, printf, strcpy, strlen, etc. pretty much
>> ties you to ASCII/UTF-8.
>>      
> Er, no, it limits you to those encodings where you can treat strings
> as "bag of NUL-terminated bytes". Oddly enough just about all the
> common legacy ones (iso-8859-*, iso-2022-jp, etc) fit in that category
> because otherwise they'd break really badly.

I wasn't even considering those because I think the entire world has
moved to Unicode/UTF-*.

Those functions limit you to UTF-8 which was my original point.

Regards,

Anthony Liguori

>   As it is, generally
> things Just Work for programs which treat filenames as "an opaque
> string".
>
> -- PMM
>
>    

* Re: [Qemu-devel] KVM call minutes for Feb 15
  2011-02-17 13:37                 ` Anthony Liguori
  2011-02-17 13:59                   ` Peter Maydell
@ 2011-02-17 14:06                   ` Avi Kivity
  1 sibling, 0 replies; 17+ messages in thread
From: Avi Kivity @ 2011-02-17 14:06 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Chris Wright, qemu-devel, kvm

On 02/17/2011 03:37 PM, Anthony Liguori wrote:
> On 02/17/2011 07:25 AM, Avi Kivity wrote:
>> On 02/17/2011 03:10 PM, Anthony Liguori wrote:
>>> On 02/17/2011 06:23 AM, Avi Kivity wrote:
>>>> On 02/17/2011 02:12 PM, Anthony Liguori wrote:
>>>>>> (btw what happens in a non-UTF-8 locale? I guess we should just 
>>>>>> reject unencodable strings).
>>>>>
>>>>>
>>>>> While QEMU is mostly ASCII internally, for the purposes of the 
>>>>> JSON parser, we always encode and decode UTF-8.  We reject invalid 
>>>>> UTF-8 sequences.  But since JSON is string-encoded unicode, we can 
>>>>> always decode a JSON string to valid UTF-8 as long as the string 
>>>>> is well formed.
>>>>
>>>> That is wrong.  If the user passes a Unicode filename it is 
>>>> expected to be translated to the current locale encoding for the 
>>>> purpose of, say, filename lookup.
>>>
>>> QEMU does not support anything but UTF-8.
>>
>> Since when?
>>
>> AFAICT, JSON string conversion is the only place where there is any 
>> dependency on UTF-8.  Anything else should just work.
>>
>>>
>>> That's pretty common with Unix software.  I don't think any modern 
>>> Unix platform actually uses UCS2 or UTF-16.  It's either ascii or 
>>> UTF-8.
>>
>> Most/all Linux distributions support UTF-8 as well as a zillion other 
>> encodings (single-byte ASCII + another charset, or multi-byte 
>> charsets for languages with many characters).
>
> Maybe there's some confusion here.  UTF-8 is an encoding, not a locale.
>
> The common encodings are ASCII, UTF-8, UCS2, UTF-16, and UTF-32.

ASCII is a character set and encoding.  The rest are encodings for 
Unicode.  There are lots of other encodings, say latin-1.

>
> An application has to explicitly support an encoding.  It is not 
> transparent.

It is fully transparent until you do wire conversions (as we do with
QMP, which is explicitly UTF-8).
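As a hedged sketch of that wire boundary, the function below is the kind of check a UTF-8-only JSON parser applies to incoming strings (it is not QEMU's actual lexer; `utf8_is_valid` is a hypothetical name). It follows RFC 3629: stray continuation bytes, truncated or overlong sequences, UTF-16 surrogates and values above U+10FFFF are rejected. The Greek letters αβγ pass as UTF-8 (CE B1 CE B2 CE B3) but fail as their ISO-8859-7 bytes (E1 E2 E3), which is why a string in a non-UTF-8 locale has to be converted before it goes on the wire.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if buf[0..len) is well-formed UTF-8 per RFC 3629. */
static bool utf8_is_valid(const unsigned char *buf, size_t len)
{
    /* Smallest code point each sequence length may encode, indexed by the
     * number of continuation bytes; anything below it is overlong. */
    static const unsigned long min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;           /* continuation bytes that must follow */
        unsigned long cp;   /* decoded code point */

        if (c < 0x80) {
            i++;            /* plain ASCII */
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07;
        } else {
            return false;   /* stray continuation byte or 0xF8..0xFF */
        }

        if (len - i <= n)
            return false;   /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min_cp[n])
            return false;   /* overlong (non-shortest) form */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;   /* beyond Unicode, or a UTF-16 surrogate */
        i += n + 1;
    }
    return true;
}
```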

>   UCS2/UTF-16 means that strings are not 'const char *'s but 'const 
> wchar_t *' where typedef unsigned short wchar_t;.
>
> QEMU assumes, in lots of places, that strings are single-byte NUL
> terminated.  Basically, any use of snprintf, printf, strcpy, strlen,
> etc. pretty much ties you to ASCII/UTF-8.  You can have a single NUL
> byte as part of a valid UCS2 string.

We're tied to single- or multi-byte encodings, and can't do
wchar_t.  But that's very different from ASCII/UTF-8 only.

>
>>> The only place it even matters is Windows and Windows has ASCII and 
>>> UTF-16 versions of their APIs.  So on Windows, non-ASCII characters 
>>> won't be handled correctly (yet another one of the many issues with 
>>> Windows support in QEMU).  UTF-8 is self-recovering though so it 
>>> degrades gracefully.
>>
>> It matters on Linux with el_GR.iso88597, for example.
>
> The whole series of iso8859 (8-bit encodings) are officially abandoned 
> in favor of UCS and encodings that support the full UCS code page 
> (UTF-8/UTF-16).
>
> I see no strong reason to try and support deprecated encodings when 
> there are perfectly valid replacements like el_GR.utf8.

All it takes is a call to iconv(3).  I agree it's unlikely to happen in 
practice.
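A minimal sketch of that iconv(3) call: it converts a UTF-8 string taken from QMP into a legacy encoding such as ISO-8859-7 and fails on strings that cannot be encoded, as suggested earlier in the thread. `utf8_to_charset` is a hypothetical helper, not QEMU code. POSIX leaves the handling of valid but unrepresentable characters implementation-defined; glibc's iconv() reports them with EILSEQ, and the sketch relies on that to reject them.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string into the encoding named by
 * to_charset, e.g. "ISO-8859-7". On success the result in out is
 * NUL-terminated and 0 is returned. On failure -1 is returned and errno is
 * left as set by iconv_open() or iconv(): EINVAL for an unsupported
 * charset, EILSEQ for invalid UTF-8 or (on glibc) a character the target
 * cannot represent, E2BIG if out is too small. */
static int utf8_to_charset(const char *to_charset, const char *in,
                           char *out, size_t out_size)
{
    assert(in != NULL && out != NULL && out_size > 0);

    iconv_t cd = iconv_open(to_charset, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;        /* POSIX iconv() takes a non-const char ** */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = out_size - 1; /* keep one byte for the terminator */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1) {
        /* Flush any shift state (needed for stateful targets such as
         * ISO-2022-JP; a no-op for ISO-8859-7). */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    }
    int saved_errno = errno;
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = saved_errno;
        return -1;
    }
    *outp = '\0';
    return 0;
}
```

In practice the target name would come from the locale, e.g. nl_langinfo(CODESET) after setlocale(LC_ALL, ""), rather than being hard-coded.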

-- 
error compiling committee.c: too many arguments to function

end of thread

Thread overview: 17+ messages
2011-02-15 16:26 [Qemu-devel] KVM call minutes for Feb 15 Chris Wright
2011-02-15 23:13 ` Anthony Liguori
2011-02-16 10:24   ` Avi Kivity
2011-02-16 13:34     ` Anthony Liguori
2011-02-17  9:26       ` Avi Kivity
2011-02-17 12:12         ` Anthony Liguori
2011-02-17 12:23           ` Avi Kivity
2011-02-17 13:10             ` Anthony Liguori
2011-02-17 13:25               ` Avi Kivity
2011-02-17 13:37                 ` Anthony Liguori
2011-02-17 13:59                   ` Peter Maydell
2011-02-17 14:01                     ` Anthony Liguori
2011-02-17 14:06                   ` Avi Kivity
2011-02-17 13:37                 ` Anthony Liguori
2011-02-16 14:39   ` Amit Shah
2011-02-16 14:41     ` Anthony Liguori
2011-02-17 12:42       ` Amit Shah
