From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:40890)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <anthony@codemonkey.ws>) id 1RAioK-0005QQ-FK
	for qemu-devel@nongnu.org; Mon, 03 Oct 2011 09:44:01 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <anthony@codemonkey.ws>) id 1RAioJ-00039n-1y
	for qemu-devel@nongnu.org; Mon, 03 Oct 2011 09:44:00 -0400
Received: from mail-qy0-f180.google.com ([209.85.216.180]:62558)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <anthony@codemonkey.ws>) id 1RAioI-00039Z-VA
	for qemu-devel@nongnu.org; Mon, 03 Oct 2011 09:43:59 -0400
Received: by qyc1 with SMTP id 1so4088608qyc.4
	for <qemu-devel@nongnu.org>; Mon, 03 Oct 2011 06:43:57 -0700 (PDT)
Message-ID: <4E89BC1A.30208@codemonkey.ws>
Date: Mon, 03 Oct 2011 08:43:54 -0500
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
References: <1316443309-23843-1-git-send-email-mdroth@linux.vnet.ibm.com>	<20111003064653.GA15380@redhat.com>
	<4E89AFB4.8000103@us.ibm.com> <20111003132445.GB18920@redhat.com>
In-Reply-To: <20111003132445.GB18920@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC] New Migration Protocol using Visitor
	Interface
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: aliguori@linux.vnet.ibm.com, Anthony Liguori <aliguori@us.ibm.com>, Michael Roth <mdroth@linux.vnet.ibm.com>, qemu-devel@nongnu.org

On 10/03/2011 08:24 AM, Michael S. Tsirkin wrote:
> On Mon, Oct 03, 2011 at 07:51:00AM -0500, Anthony Liguori wrote:
>>> Here are some suggestions:
>>>
>>> - Let's make the protocol be BER directly.
>>>    As a first step, use a single octet string for
>>>    the whole of data. Next, start splitting this up.
>>
>> This can't be done without breaking the old style migration
>> protocol.  I don't have a problem with that but I do have a problem
>> with repeatedly breaking migration protocol.
>
> As long as this is within a release cycle, is this a real problem?

I think if we try to fit it within a release we'll either end up with a 2 year 
long release or a half-broken conversion.

I'd rather buy ourselves time by supporting both formats.  That way we can 
remove the old format when we're satisfied with the ASN.1 encoding.

>> The Visitor interface is very little abstraction over a native BER
>> interface. It lets us introduce BER incrementally while still
>> supporting the old protocol.
>>
>>>
>>> - Don't try to use arrays at all. They don't make sense
>>>    in either jason or asn. Instead, use sets for unordered
>>>    data and sequences for ordered data.
>>
>> sequences map to visit_start_array() pretty well. You would do something like:
>>
>> visit_start_array(v, "entries", errp);
>> for (int i = 0; i<  s->size; i++) {
>>      visit_type_int(v, NULL,&s->entry[i], errp);
>> }
>> visit_end_array(v, errp);
>
> Sequences can encode structures not just arrays.
> How would you encode this for example:
>
> SEQUENCE OF { VQN: INTEGER, SEQUENCE { OPTIONAL VECTOR: INTEGER}  }

visit_start_array(v, "vqs", errp);
for (i = 0; i < s->n_vqs; i++) {
     // Array elements never have a name, hence NULL name
     visit_start_struct(v, "VirtQueue", NULL, errp);
     visit_type_int(v, &s->vq[i].num, "vqn", errp);

     // Given this sub-struct an arbitrary name.  It could also be anonymous.
     visit_start_struct(v, "MsixInfo", "msix_info", errp);
     if (s->vq[i].msix_enabled) {
         visit_type_int(v, &s->vq[i].vector, "vector", errp);
     }
     visit_end_struct(v, errp);

     visit_end_struct(v, errp);
}
visit_end_array(v, errp);

This would also generate JSON of:

'vqs': [ { 'vqn': 2, 'msix_info': { 'vector': 3 } } ]

> This is a natural description of a virtio device.
>
> Yes we can unwrap this to a structure of arrays but
> this is tweaking protocol to match implementation.

There's no need to from a Visitor perspective.  It doesn't make it any easier.

>
>>> - Most data should be unordered: we don't normally care
>>>    in which order we get device attributes.
>>>    So we would have set of sequences, and
>>>    each sequence is attribute name (ascii string) + attribute data
>>
>> I'd actually be inclined to make everything ordered by default and
>> not encode explicit field names.  I'm afraid that encoding field
>> names on the wire is going to easily increase the amount of device
>> state by an order of magnitude.
>
> Assuming we worry about space used by name tags, let's use
> content-specific tags.
>
>>> - This does not discuss how we will handle cross version
>>>    migration in a lot of detail. Presumably you expect capability magic to
>>>    work for this. However this does not work for migration to file.
>>>    A self describing format makes a much simpler solution possible:
>>>    separating attributes which destination must parse from those
>>>    it can skip safely. If destination sees an attribute
>>>    that was marked as must parse and does not recognize it,
>>>    migration fails. If not, just skip the attribute.
>>
>> Compatibility and the wire format are more or less orthogonal.  A
>> self describing format helps eliminate a class of bugs but getting
>> compatibility right requires quite a bit more.
>>
>> Regards,
>>
>> Anthony Liguori
>
> So I think we do care about compatibility.
> I think this is by far the most interesting part of the problem.
> And making that robust IMO pretty much requires not relying
> on field order whenever we are not absolutely sure
> no other order makes sense.

There are multiple things to consider with compatibility:

1) Creating compatible device models.  This is a qdev problem and can't be 
solved in the protocol.

2) Ensuring we are sending all the data we need to.  I think we solve this 
problem by autogenerating Visitors from the C definitions of the device structures.

3) Having the flexibility to morph what's sent over the wire into something that 
we matches our internal state.  I don't think the wire format solves this.  I 
think what solves this the ability to read the protocol into a data structure 
such that we can manipulate the data structure in memory.

Having the ability to ignore some fields is not enough.  We need to also be able 
to split a single field into multiple fields, and event split a single device 
into multiple devices.  If we're dealing with a tree data structure in memory, 
we have limitless ability to fudge things.

Visitors buy us this.  Right now, we're talking about:

void virtio_marshal(QEMUFile *f, VirtioDevice *vdev);

If you just do BER encoding, that doesn't help us.  We still can only read from 
the wire to a VirtioDevice.   If we do:

void virtio_marshal(Visitor *v, VirtioDevice *vdev);

We can still do BER encoding.  We can also do QEMUFile encoding in the current 
format.

But we also can Visit from a QObject.  That means we can write something that 
reads in BER off the wire and generates QObjects corresponding to whatever is 
sent.  We can then manipulate those QObjects in memory as much as we'd like 
before handing a QObject Visitor to virtio_marshal.

Separating parsing off the wire from handing something to the device models and 
allowing transformation in between is the key to addressing compatibility in a 
robust fashion.

Regards,

Anthony Liguori

> Content-specific tags as suggested above let us do unordered.
> We lose the ability to easily examine
> input by external tools but without field names
> they won't be able to parse it so it's not all that valuable anyway.
>
>