* [Qemu-devel] optional feature (was Re: The State of the SaveVM format)
@ 2009-09-16 10:46 Michael S. Tsirkin
2009-09-16 11:04 ` [Qemu-devel] Re: optional feature Juan Quintela
0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 10:46 UTC (permalink / raw)
To: Juan Quintela, gleb; +Cc: qemu-devel
On Wed, Sep 09, 2009 at 10:47:27AM +0200, Juan Quintela wrote:
> How do we deal with optional features?
Here's an idea that Gleb suggested in a private
conversation: make optional features into
separate, non-user-visible devices.
Thus we would have vmstate for virtio and additionally, if msix is
enabled, vmstate for msix. This solves the problem of the number of
devices becoming exponential with the number of features: we have device
per feature.
I understand that RTC does something like this.
Happy?
--
MST
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 10:46 [Qemu-devel] optional feature (was Re: The State of the SaveVM format) Michael S. Tsirkin
@ 2009-09-16 11:04 ` Juan Quintela
2009-09-16 11:18 ` Gleb Natapov
2009-09-16 11:41 ` Michael S. Tsirkin
0 siblings, 2 replies; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 11:04 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, gleb
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 09, 2009 at 10:47:27AM +0200, Juan Quintela wrote:
>> How do we deal with optional features?
>
> Here's an idea that Gleb suggested in a private
> conversation: make optional features into
> separate, non-user-visible devices.
>
> Thus we would have vmstate for virtio and additionally, if msix is
> enabled, vmstate for msix. This solves the problem of the number of
> devices becoming exponential with the number of features: we have device
> per feature.
>
> I understand that RTC does something like this.
And it is wrong :) I sent a patch to fix it properly, but we have the
problem of backward compatibility with kvm.
Forget msix for virtio, virtio has the problem already with pci.
virtio_save()
{
if (vdev->binding->save_config)
vdev->binding->save_config(vdev->binding_opaque, f);
qemu_put_8s(f, &vdev->status);
.... some other normal fields ...
for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
if (vdev->vq[i].vring.num == 0)
break;
Not a problem, we can precalculate i on pre_save()
qemu_put_be32(f, vdev->vq[i].vring.num);
qemu_put_be64(f, vdev->vq[i].pa);
qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
This is sending a partial array of struct (the "i" 1st entries)
No problem here.
if (vdev->binding->save_queue)
vdev->binding->save_queue(vdev->binding_opaque, i, f);
Again, what to do with this one.
}
}
Looking at what does virtio_pci_save_queue()
static void virtio_pci_save_queue(void * opaque, int n, QEMUFile *f)
{
VirtIOPCIProxy *proxy = opaque;
if (msix_present(&proxy->pci_dev))
qemu_put_be16(f, virtio_queue_vector(proxy->vdev, n));
}
i.e. and now, an optional field.
And no, I don't have either a clean design that will be backward
compatible and clean. Clean design is easy:
virtio
virtio-pci (it does the equivalent of save_config() and then call
virtio_save)
virtio-pci-msix (it calls virtio-pci and then send a partial array of
queues. (the save queue thing)
Before you ask, partial arrays are sent: <num_elems> + array
where num_elems == 0 is valid.
But this is the "good" design if we started _now_, that is not the case,
and I am trying to get something clean and bacward compatible.
Later, Juan.
PD. Optional fields are going to have to be in, arm cpus really need
them if we want to maintain backward compatibility.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 11:04 ` [Qemu-devel] Re: optional feature Juan Quintela
@ 2009-09-16 11:18 ` Gleb Natapov
2009-09-16 11:48 ` Juan Quintela
2009-09-16 11:41 ` Michael S. Tsirkin
1 sibling, 1 reply; 33+ messages in thread
From: Gleb Natapov @ 2009-09-16 11:18 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, Michael S. Tsirkin
On Wed, Sep 16, 2009 at 01:04:19PM +0200, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Sep 09, 2009 at 10:47:27AM +0200, Juan Quintela wrote:
> >> How do we deal with optional features?
> >
> > Here's an idea that Gleb suggested in a private
> > conversation: make optional features into
> > separate, non-user-visible devices.
> >
> > Thus we would have vmstate for virtio and additionally, if msix is
> > enabled, vmstate for msix. This solves the problem of the number of
> > devices becoming exponential with the number of features: we have device
> > per feature.
> >
> > I understand that RTC does something like this.
>
> And it is wrong :) I sent a patch to fix it properly, but we have the
> problem of backward compatibility with kvm.
>
> Forget msix for virtio, virtio has the problem already with pci.
What is wrong about it?
>
> virtio_save()
> {
>
> if (vdev->binding->save_config)
> vdev->binding->save_config(vdev->binding_opaque, f);
>
> qemu_put_8s(f, &vdev->status);
>
> .... some other normal fields ...
>
> for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
> if (vdev->vq[i].vring.num == 0)
> break;
>
> Not a problem, we can precalculate i on pre_save()
>
>
> qemu_put_be32(f, vdev->vq[i].vring.num);
> qemu_put_be64(f, vdev->vq[i].pa);
> qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
>
> This is sending a partial array of struct (the "i" 1st entries)
> No problem here.
>
> if (vdev->binding->save_queue)
> vdev->binding->save_queue(vdev->binding_opaque, i, f);
>
> Again, what to do with this one.
>
> }
>
> }
>
> Looking at what does virtio_pci_save_queue()
>
> static void virtio_pci_save_queue(void * opaque, int n, QEMUFile *f)
> {
> VirtIOPCIProxy *proxy = opaque;
> if (msix_present(&proxy->pci_dev))
> qemu_put_be16(f, virtio_queue_vector(proxy->vdev, n));
> }
>
> i.e. and now, an optional field.
>
> And no, I don't have either a clean design that will be backward
> compatible and clean. Clean design is easy:
>
> virtio
> virtio-pci (it does the equivalent of save_config() and then call
> virtio_save)
> virtio-pci-msix (it calls virtio-pci and then send a partial array of
> queues. (the save queue thing)
>
> Before you ask, partial arrays are sent: <num_elems> + array
> where num_elems == 0 is valid.
>
> But this is the "good" design if we started _now_, that is not the case,
> and I am trying to get something clean and bacward compatible.
>
> Later, Juan.
>
> PD. Optional fields are going to have to be in, arm cpus really need
> them if we want to maintain backward compatibility.
--
Gleb.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 11:04 ` [Qemu-devel] Re: optional feature Juan Quintela
2009-09-16 11:18 ` Gleb Natapov
@ 2009-09-16 11:41 ` Michael S. Tsirkin
2009-09-16 12:13 ` Juan Quintela
1 sibling, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 11:41 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, gleb
On Wed, Sep 16, 2009 at 01:04:19PM +0200, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Sep 09, 2009 at 10:47:27AM +0200, Juan Quintela wrote:
> >> How do we deal with optional features?
> >
> > Here's an idea that Gleb suggested in a private
> > conversation: make optional features into
> > separate, non-user-visible devices.
> >
> > Thus we would have vmstate for virtio and additionally, if msix is
> > enabled, vmstate for msix. This solves the problem of the number of
> > devices becoming exponential with the number of features: we have device
> > per feature.
> >
> > I understand that RTC does something like this.
>
> And it is wrong :) I sent a patch to fix it properly, but we have the
> problem of backward compatibility with kvm.
>
> Forget msix for virtio, virtio has the problem already with pci.
>
> virtio_save()
> {
>
> if (vdev->binding->save_config)
> vdev->binding->save_config(vdev->binding_opaque, f);
>
> qemu_put_8s(f, &vdev->status);
>
> .... some other normal fields ...
>
> for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
> if (vdev->vq[i].vring.num == 0)
> break;
>
> Not a problem, we can precalculate i on pre_save()
>
>
> qemu_put_be32(f, vdev->vq[i].vring.num);
> qemu_put_be64(f, vdev->vq[i].pa);
> qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
>
> This is sending a partial array of struct (the "i" 1st entries)
> No problem here.
>
> if (vdev->binding->save_queue)
> vdev->binding->save_queue(vdev->binding_opaque, i, f);
>
> Again, what to do with this one.
>
> }
>
> }
>
> Looking at what does virtio_pci_save_queue()
>
> static void virtio_pci_save_queue(void * opaque, int n, QEMUFile *f)
> {
> VirtIOPCIProxy *proxy = opaque;
> if (msix_present(&proxy->pci_dev))
> qemu_put_be16(f, virtio_queue_vector(proxy->vdev, n));
> }
>
> i.e. and now, an optional field.
>
> And no, I don't have either a clean design that will be backward
> compatible and clean.
The reason seems to be that you already decided what is clean
(below) and it is not backwards compatible :)
> Clean design is easy:
>
> virtio
> virtio-pci (it does the equivalent of save_config() and then call
> virtio_save)
> virtio-pci-msix (it calls virtio-pci and then send a partial array of
> queues. (the save queue thing)
>
> Before you ask, partial arrays are sent: <num_elems> + array
> where num_elems == 0 is valid.
So writing num_elems == 0 in the image is wrong imo, we should have a
way to skip the array altogether. This will let as add any number of
arrays down the road, and have them treated as zero size if an old image
is loaded.
> But this is the "good" design if we started _now_, that is not the case,
> and I am trying to get something clean and bacward compatible.
>
> Later, Juan.
>
> PD. Optional fields are going to have to be in, arm cpus really need
> them if we want to maintain backward compatibility.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 11:18 ` Gleb Natapov
@ 2009-09-16 11:48 ` Juan Quintela
2009-09-16 11:52 ` Michael S. Tsirkin
2009-09-16 11:57 ` Gleb Natapov
0 siblings, 2 replies; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 11:48 UTC (permalink / raw)
To: Gleb Natapov; +Cc: qemu-devel, Michael S. Tsirkin
Gleb Natapov <gleb@redhat.com> wrote:
> On Wed, Sep 16, 2009 at 01:04:19PM +0200, Juan Quintela wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> > On Wed, Sep 09, 2009 at 10:47:27AM +0200, Juan Quintela wrote:
>> >> How do we deal with optional features?
>> >
>> > Here's an idea that Gleb suggested in a private
>> > conversation: make optional features into
>> > separate, non-user-visible devices.
>> >
>> > Thus we would have vmstate for virtio and additionally, if msix is
>> > enabled, vmstate for msix. This solves the problem of the number of
>> > devices becoming exponential with the number of features: we have device
>> > per feature.
>> >
>> > I understand that RTC does something like this.
>>
>> And it is wrong :) I sent a patch to fix it properly, but we have the
>> problem of backward compatibility with kvm.
>>
>> Forget msix for virtio, virtio has the problem already with pci.
> What is wrong about it?
See below, we are changing the state to one table, and tables don't have
neither if's or whiles (we have a limited for that just walks arrays).
I don't really know how to handle virtio in a sane way. The _saner_ way
that I thought was to split it into three devices as I sketched below,
but I still don't understood virtio creation to see how to "fix" it.
It is next on queue after cpu (BIGGG), and ide (was waiting for Gerd
patches to be commited. After that, I will thought again on how to
handle virtio.
>>
>> virtio_save()
>> {
>>
>> if (vdev->binding->save_config)
>> vdev->binding->save_config(vdev->binding_opaque, f);
>>
>> qemu_put_8s(f, &vdev->status);
>>
>> .... some other normal fields ...
>>
>> for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
>> if (vdev->vq[i].vring.num == 0)
>> break;
>>
>> Not a problem, we can precalculate i on pre_save()
>>
>>
>> qemu_put_be32(f, vdev->vq[i].vring.num);
>> qemu_put_be64(f, vdev->vq[i].pa);
>> qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
>>
>> This is sending a partial array of struct (the "i" 1st entries)
>> No problem here.
>>
>> if (vdev->binding->save_queue)
>> vdev->binding->save_queue(vdev->binding_opaque, i, f);
>>
>> Again, what to do with this one.
>>
>> }
>>
>> }
>>
>> Looking at what does virtio_pci_save_queue()
>>
>> static void virtio_pci_save_queue(void * opaque, int n, QEMUFile *f)
>> {
>> VirtIOPCIProxy *proxy = opaque;
>> if (msix_present(&proxy->pci_dev))
>> qemu_put_be16(f, virtio_queue_vector(proxy->vdev, n));
>> }
>>
>> i.e. and now, an optional field.
>>
>> And no, I don't have either a clean design that will be backward
>> compatible and clean. Clean design is easy:
>>
>> virtio
>> virtio-pci (it does the equivalent of save_config() and then call
>> virtio_save)
>> virtio-pci-msix (it calls virtio-pci and then send a partial array of
>> queues. (the save queue thing)
>>
>> Before you ask, partial arrays are sent: <num_elems> + array
>> where num_elems == 0 is valid.
>>
>> But this is the "good" design if we started _now_, that is not the case,
>> and I am trying to get something clean and bacward compatible.
>>
>> Later, Juan.
>>
>> PD. Optional fields are going to have to be in, arm cpus really need
>> them if we want to maintain backward compatibility.
>
> --
> Gleb.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 11:48 ` Juan Quintela
@ 2009-09-16 11:52 ` Michael S. Tsirkin
2009-09-16 12:14 ` Juan Quintela
2009-09-16 11:57 ` Gleb Natapov
1 sibling, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 11:52 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, Gleb Natapov
On Wed, Sep 16, 2009 at 01:48:35PM +0200, Juan Quintela wrote:
> Gleb Natapov <gleb@redhat.com> wrote:
> > On Wed, Sep 16, 2009 at 01:04:19PM +0200, Juan Quintela wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> > On Wed, Sep 09, 2009 at 10:47:27AM +0200, Juan Quintela wrote:
> >> >> How do we deal with optional features?
> >> >
> >> > Here's an idea that Gleb suggested in a private
> >> > conversation: make optional features into
> >> > separate, non-user-visible devices.
> >> >
> >> > Thus we would have vmstate for virtio and additionally, if msix is
> >> > enabled, vmstate for msix. This solves the problem of the number of
> >> > devices becoming exponential with the number of features: we have device
> >> > per feature.
> >> >
> >> > I understand that RTC does something like this.
> >>
> >> And it is wrong :) I sent a patch to fix it properly, but we have the
> >> problem of backward compatibility with kvm.
> >>
> >> Forget msix for virtio, virtio has the problem already with pci.
> > What is wrong about it?
>
> See below, we are changing the state to one table, and tables don't have
> neither if's or whiles (we have a limited for that just walks arrays).
Let's just bite the bullet and add support for if's? It's not like it's
hard to invent 'struct vmstate_condition' or some such.
> I don't really know how to handle virtio in a sane way. The _saner_ way
> that I thought was to split it into three devices as I sketched below,
> but I still don't understood virtio creation to see how to "fix" it.
> It is next on queue after cpu (BIGGG), and ide (was waiting for Gerd
> patches to be commited. After that, I will thought again on how to
> handle virtio.
>
>
> >>
> >> virtio_save()
> >> {
> >>
> >> if (vdev->binding->save_config)
> >> vdev->binding->save_config(vdev->binding_opaque, f);
> >>
> >> qemu_put_8s(f, &vdev->status);
> >>
> >> .... some other normal fields ...
> >>
> >> for (i = 0; i < VIRTIO_PCI_QUEUE_MAX; i++) {
> >> if (vdev->vq[i].vring.num == 0)
> >> break;
> >>
> >> Not a problem, we can precalculate i on pre_save()
> >>
> >>
> >> qemu_put_be32(f, vdev->vq[i].vring.num);
> >> qemu_put_be64(f, vdev->vq[i].pa);
> >> qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
> >>
> >> This is sending a partial array of struct (the "i" 1st entries)
> >> No problem here.
> >>
> >> if (vdev->binding->save_queue)
> >> vdev->binding->save_queue(vdev->binding_opaque, i, f);
> >>
> >> Again, what to do with this one.
> >>
> >> }
> >>
> >> }
> >>
> >> Looking at what does virtio_pci_save_queue()
> >>
> >> static void virtio_pci_save_queue(void * opaque, int n, QEMUFile *f)
> >> {
> >> VirtIOPCIProxy *proxy = opaque;
> >> if (msix_present(&proxy->pci_dev))
> >> qemu_put_be16(f, virtio_queue_vector(proxy->vdev, n));
> >> }
> >>
> >> i.e. and now, an optional field.
> >>
> >> And no, I don't have either a clean design that will be backward
> >> compatible and clean. Clean design is easy:
> >>
> >> virtio
> >> virtio-pci (it does the equivalent of save_config() and then call
> >> virtio_save)
> >> virtio-pci-msix (it calls virtio-pci and then send a partial array of
> >> queues. (the save queue thing)
> >>
> >> Before you ask, partial arrays are sent: <num_elems> + array
> >> where num_elems == 0 is valid.
> >>
> >> But this is the "good" design if we started _now_, that is not the case,
> >> and I am trying to get something clean and bacward compatible.
> >>
> >> Later, Juan.
> >>
> >> PD. Optional fields are going to have to be in, arm cpus really need
> >> them if we want to maintain backward compatibility.
> >
> > --
> > Gleb.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 11:48 ` Juan Quintela
2009-09-16 11:52 ` Michael S. Tsirkin
@ 2009-09-16 11:57 ` Gleb Natapov
2009-09-16 12:23 ` Juan Quintela
2009-09-16 13:51 ` Anthony Liguori
1 sibling, 2 replies; 33+ messages in thread
From: Gleb Natapov @ 2009-09-16 11:57 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, Michael S. Tsirkin
On Wed, Sep 16, 2009 at 01:48:35PM +0200, Juan Quintela wrote:
> Gleb Natapov <gleb@redhat.com> wrote:
> > On Wed, Sep 16, 2009 at 01:04:19PM +0200, Juan Quintela wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> > On Wed, Sep 09, 2009 at 10:47:27AM +0200, Juan Quintela wrote:
> >> >> How do we deal with optional features?
> >> >
> >> > Here's an idea that Gleb suggested in a private
> >> > conversation: make optional features into
> >> > separate, non-user-visible devices.
> >> >
> >> > Thus we would have vmstate for virtio and additionally, if msix is
> >> > enabled, vmstate for msix. This solves the problem of the number of
> >> > devices becoming exponential with the number of features: we have device
> >> > per feature.
> >> >
> >> > I understand that RTC does something like this.
> >>
> >> And it is wrong :) I sent a patch to fix it properly, but we have the
> >> problem of backward compatibility with kvm.
> >>
> >> Forget msix for virtio, virtio has the problem already with pci.
> > What is wrong about it?
>
> See below, we are changing the state to one table, and tables don't have
> neither if's or whiles (we have a limited for that just walks arrays).
>
I don't know virtio enough to understand all those things below. I am
asking what is wrong about how RTC did it? You don't need if's or
whiles. You have general RTC sate in one table and things that needed by
rtc-td-hack in another table. From vmstate point of view those are not
connected. Serialization/deserialization should support matching of
incomming binary blob to deserialize function. When entire incomming
stream is consumed check has to be made that there is no uninitialized
table (deserialize callback that was not called) and if there is -
abort.
--
Gleb.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 11:41 ` Michael S. Tsirkin
@ 2009-09-16 12:13 ` Juan Quintela
2009-09-16 12:29 ` Michael S. Tsirkin
0 siblings, 1 reply; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 12:13 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, gleb
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 16, 2009 at 01:04:19PM +0200, Juan Quintela wrote:
>> And no, I don't have either a clean design that will be backward
>> compatible and clean.
>
> The reason seems to be that you already decided what is clean
Somebody has to do the difficult things :)
> (below) and it is not backwards compatible :)
>
>> Clean design is easy:
>>
>> virtio
>> virtio-pci (it does the equivalent of save_config() and then call
>> virtio_save)
>> virtio-pci-msix (it calls virtio-pci and then send a partial array of
>> queues. (the save queue thing)
>>
>> Before you ask, partial arrays are sent: <num_elems> + array
>> where num_elems == 0 is valid.
>
> So writing num_elems == 0 in the image is wrong imo, we should have a
> way to skip the array altogether. This will let as add any number of
> arrays down the road, and have them treated as zero size if an old image
> is loaded.
No, this means that elements can appear form nowhere. That is wrong.
I have no problem adding new elements, you just up the version field,
and here you go. The problem here is _implicit_ information that is
_nowhere_ to be seen.
What you are asking for is (basically):
- I am more clever that VMState, and I will always do the right thing,
please let me alone.
What I am not doubting that is true, just that VMState is here to help
the rest of us that normally get the things wrong.
VMState rules are simple:
- Everything is explicit
- You can add a field -> you inc the version number
- Everything is explicit
- You never remove a field (you can send a dummy value, and ignore its
- Everything is explicit
value on reception), you just never remove it.
- Everything is explicit
- You can load old versions of the state (that is your option)
- Everything is explicit
- The type of the field that you send and the type of what you send
have to be the same.
- Everything is explicit
- You can send simple types (int32_t, and similars)
- Everything is explicit
- You can send structs (basically another description)
- Everything is explicit
- You can send arrays
- Everything is explicit
- You can send partial arrays
- Everything is explicit
- You can send pointers, but it is better if you don't use them
(not for VMState, just to get things right)
- Everything is explicit
And that is it. There are still no helpers for sending num_elems +
array elems.
What does VMState gives you:
- It will make the load/save function for you
- It will make sure that the types that you are saving/loading and the
types of the variables are right
- If you send full arrays, it will make sure that the length that you sent
is the same that the one defined
- If you use pointers, you still have typechecking, but you lost size
checking.
What VMState would give you:
being able to manipulate save images, this is the principal reason why
everything has to be explicit. You can't save/load anything that is
not described in the VMState. Obviously, anything that depends on the
device state that is _not_ saved, can't be manipulated.
This is the reason why 0 length arrays that you don't know where they
came from, can't be used. And no, seeing that you can look in the pci
config space, command line options, etc will not work. If you want a
new field, you told vmstate how to calculate/use it.
VMState was not created to have something more powerful that we had (we
already had a stream of bytes), it was created to have something easier
to use, and something that you could do things with it (like changing
how things are saved/loaded without having to change all the devices
code).
Obviously that is at odds with: my device knows how to magically know
that here, it is going to came a new field. And magically here means in
a way that you are not telling vmstate how to handle it. And telling
VMState how to handle it is: here is this new field that I want you to
save, and now that we have another field, it is a new version.
The design plan is that you have a device on version 5, you receive an
image for that device in version 5, and you are _not_ able to load it.
You here means VMState, not the device.
Later, Juan.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 11:52 ` Michael S. Tsirkin
@ 2009-09-16 12:14 ` Juan Quintela
2009-09-16 12:18 ` Michael S. Tsirkin
0 siblings, 1 reply; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 12:14 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, Gleb Natapov
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 16, 2009 at 01:48:35PM +0200, Juan Quintela wrote:
>> Gleb Natapov <gleb@redhat.com> wrote:
>> > On Wed, Sep 16, 2009 at 01:04:19PM +0200, Juan Quintela wrote:
>> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> > On Wed, Sep 09, 2009 at 10:47:27AM +0200, Juan Quintela wrote:
>> >> >> How do we deal with optional features?
>> >> >
>> >> > Here's an idea that Gleb suggested in a private
>> >> > conversation: make optional features into
>> >> > separate, non-user-visible devices.
>> >> >
>> >> > Thus we would have vmstate for virtio and additionally, if msix is
>> >> > enabled, vmstate for msix. This solves the problem of the number of
>> >> > devices becoming exponential with the number of features: we have device
>> >> > per feature.
>> >> >
>> >> > I understand that RTC does something like this.
>> >>
>> >> And it is wrong :) I sent a patch to fix it properly, but we have the
>> >> problem of backward compatibility with kvm.
>> >>
>> >> Forget msix for virtio, virtio has the problem already with pci.
>> > What is wrong about it?
>>
>> See below, we are changing the state to one table, and tables don't have
>> neither if's or whiles (we have a limited for that just walks arrays).
>
> Let's just bite the bullet and add support for if's? It's not like it's
> hard to invent 'struct vmstate_condition' or some such.
I have to do it. The problem is not adding an optional field, is adding
it conditionally on _what_, and that _what_ should also be ideally on vmstate.
Later, Juan.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 12:14 ` Juan Quintela
@ 2009-09-16 12:18 ` Michael S. Tsirkin
2009-09-16 12:26 ` Juan Quintela
0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 12:18 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, Gleb Natapov
On Wed, Sep 16, 2009 at 02:14:32PM +0200, Juan Quintela wrote:
> >> See below, we are changing the state to one table, and tables don't have
> >> neither if's or whiles (we have a limited for that just walks arrays).
> >
> > Let's just bite the bullet and add support for if's? It's not like it's
> > hard to invent 'struct vmstate_condition' or some such.
>
> I have to do it. The problem is not adding an optional field, is adding
> it conditionally on _what_, and that _what_ should also be ideally on vmstate.
>
> Later, Juan.
pci config is on vmstate already, I don't see a problem here.
--
MST
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 11:57 ` Gleb Natapov
@ 2009-09-16 12:23 ` Juan Quintela
2009-09-16 12:35 ` Gleb Natapov
2009-09-16 13:51 ` Anthony Liguori
1 sibling, 1 reply; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 12:23 UTC (permalink / raw)
To: Gleb Natapov; +Cc: qemu-devel, Michael S. Tsirkin
Gleb Natapov <gleb@redhat.com> wrote:
> On Wed, Sep 16, 2009 at 01:48:35PM +0200, Juan Quintela wrote:
>> Gleb Natapov <gleb@redhat.com> wrote:
>> > On Wed, Sep 16, 2009 at 01:04:19PM +0200, Juan Quintela wrote:
>> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >> > On Wed, Sep 09, 2009 at 10:47:27AM +0200, Juan Quintela wrote:
>> >> >> How do we deal with optional features?
>> >> >
>> >> > Here's an idea that Gleb suggested in a private
>> >> > conversation: make optional features into
>> >> > separate, non-user-visible devices.
>> >> >
>> >> > Thus we would have vmstate for virtio and additionally, if msix is
>> >> > enabled, vmstate for msix. This solves the problem of the number of
>> >> > devices becoming exponential with the number of features: we have device
>> >> > per feature.
>> >> >
>> >> > I understand that RTC does something like this.
>> >>
>> >> And it is wrong :) I sent a patch to fix it properly, but we have the
>> >> problem of backward compatibility with kvm.
>> >>
>> >> Forget msix for virtio, virtio has the problem already with pci.
>> > What is wrong about it?
>>
>> See below, we are changing the state to one table, and tables don't have
>> neither if's or whiles (we have a limited for that just walks arrays).
>>
> I don't know virtio enough to understand all those things below. I am
> asking what is wrong about how RTC did it? You don't need if's or
> whiles.
Sorry, I missunderstood your question.
> You have general RTC sate in one table and things that needed by
> rtc-td-hack in another table.
You have a rtc-td table that _needs_ rtc-td unconditionally, and that
makes no sense at all, then another way of doing it is:
up rtc version +1
add the two fields that we need (together with rtc-td-hack value)
and now we:
- can still load old rtc state
- can save rtc state with/without rtc-td-hack value
if rtc-td-hack is not enabled, it is just not used
if we ever get to the point that we decide that rtc-td-hack should
always be enabled, everything is working already.
> From vmstate point of view those are not connected.
This is not VMState related. It is that you need another two fields to
get a new feature of rtc. Are we agreeing that everything is easier if
you added the fields to rtc instead fo creating a new device for this
two values? That was my point about the correct way of handling this to
values. And yes, "correct" here don't have into account that kvm was a
fork of qemu. There are "historic" reasons why it made sense to create
a new device for rtc-td-hack, but that reasons don't mean that this is
the more correct way of doing it.
> Serialization/deserialization should support matching of
> incomming binary blob to deserialize function. When entire incomming
> stream is consumed check has to be made that there is no uninitialized
> table (deserialize callback that was not called) and if there is -
> abort.
As I told you, this one was not VMState related. From a technical point
of view, adding the new device was not a problem. What I was discussing
was if this was the better way of handling this problem. rtc-td-hack is
an imaginary device to save two flags that rtc don't save for you. If
you just remove the "hack" for the name, the ifdefs for the code, and
leave the coalesced field names, it indeed looks like a nice new feature
of the rtc? A feature that makes sense to have for everybody?
Later, Juan.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 12:18 ` Michael S. Tsirkin
@ 2009-09-16 12:26 ` Juan Quintela
2009-09-16 12:37 ` Michael S. Tsirkin
0 siblings, 1 reply; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 12:26 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, Gleb Natapov
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 16, 2009 at 02:14:32PM +0200, Juan Quintela wrote:
>> >> See below, we are changing the state to one table, and tables don't have
>> >> neither if's or whiles (we have a limited for that just walks arrays).
>> >
>> > Let's just bite the bullet and add support for if's? It's not like it's
>> > hard to invent 'struct vmstate_condition' or some such.
>>
>> I have to do it. The problem is not adding an optional field, is adding
>> it conditionally on _what_, and that _what_ should also be ideally on vmstate.
>>
>> Later, Juan.
>
> pci config is on vmstate already, I don't see a problem here.
vmstate don't understand pci config. I want to manipulate images with
the informantion that you have given vmstate. Sending another byte
meaning:
msix_enabled
and now depending on that value another field is ok with me. You can
calculate msix_enable at pre_save time whenever way that you see fit.
What vmstate needs is a only to now if msix_enable is 0 or 1, how you
calculate it, VMState don't care.
Later, Juan.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 12:13 ` Juan Quintela
@ 2009-09-16 12:29 ` Michael S. Tsirkin
2009-09-16 13:31 ` Juan Quintela
0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 12:29 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, gleb
On Wed, Sep 16, 2009 at 02:13:06PM +0200, Juan Quintela wrote:
> VMState rules are simple:
> - Everything is explicit
By the way, pci currently has cmask,
which performs checking on load, making sure
that load does not modify a constant field in config space,
which can't change as a result of guest actions.
If it does - migration fails.
This is IMO much better and more robust than
simply hoping that there are no bugs or that
developers remember to increment a version
number each time they change some field.
I think it's pretty important to keep this
feature, and maybe add something similar
to other devices.
How will VMState support this?
--
MST
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 12:23 ` Juan Quintela
@ 2009-09-16 12:35 ` Gleb Natapov
2009-09-16 12:40 ` Michael S. Tsirkin
2009-09-16 13:22 ` Juan Quintela
0 siblings, 2 replies; 33+ messages in thread
From: Gleb Natapov @ 2009-09-16 12:35 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, Michael S. Tsirkin
On Wed, Sep 16, 2009 at 02:23:33PM +0200, Juan Quintela wrote:
> Gleb Natapov <gleb@redhat.com> wrote:
> > On Wed, Sep 16, 2009 at 01:48:35PM +0200, Juan Quintela wrote:
> >> Gleb Natapov <gleb@redhat.com> wrote:
> >> > On Wed, Sep 16, 2009 at 01:04:19PM +0200, Juan Quintela wrote:
> >> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >> > On Wed, Sep 09, 2009 at 10:47:27AM +0200, Juan Quintela wrote:
> >> >> >> How do we deal with optional features?
> >> >> >
> >> >> > Here's an idea that Gleb suggested in a private
> >> >> > conversation: make optional features into
> >> >> > separate, non-user-visible devices.
> >> >> >
> >> >> > Thus we would have vmstate for virtio and additionally, if msix is
> >> >> > enabled, vmstate for msix. This solves the problem of the number of
> >> >> > devices becoming exponential with the number of features: we have device
> >> >> > per feature.
> >> >> >
> >> >> > I understand that RTC does something like this.
> >> >>
> >> >> And it is wrong :) I sent a patch to fix it properly, but we have the
> >> >> problem of backward compatibility with kvm.
> >> >>
> >> >> Forget msix for virtio, virtio has the problem already with pci.
> >> > What is wrong about it?
> >>
> >> See below, we are changing the state to one table, and tables don't have
> >> neither if's or whiles (we have a limited for that just walks arrays).
> >>
> > I don't know virtio enough to understand all those things below. I am
> > asking what is wrong about how RTC did it? You don't need if's or
> > whiles.
>
> Sorry, I missunderstood your question.
>
> > You have general RTC sate in one table and things that needed by
> > rtc-td-hack in another table.
>
> You have a rtc-td table that _needs_ rtc-td unconditionally, and that
> makes no sense at all, then another way of doing it is:
>
I don't understand why this makes no sense. It does for me. You have to
run "qemu -incoming" with same parameters as original one, so you have
to run it with rtc-td-hack if migration source did it. Otherwise
migration should fail. And if source runs without rtc-td-hack rtc-td
table is not registered and is not migrated.
> up rtc version +1
> add the two fields that we need (together with rtc-td-hack value)
And why this is better? You can't migrate old VM to new qemu even if you
don't use rtc-td-hack on new one.
>
> and now we:
> - can still load old rtc state
how? It has another version.
> - can save rtc state with/without rtc-td-hack value
> if rtc-td-hack is not enabled, it is just not used
Nothing that you can't do with two table approach as far as I can see.
>
> if we ever get to the point that we decide that rtc-td-hack should
> always be enabled, everything is working already.
>
> > From vmstate point of view those are not connected.
>
> This is not VMState related. It is that you need another two fields to
> get a new feature of rtc. Are we agreeing that everything is easier if
> you added the fields to rtc instead fo creating a new device for this
> two values? That was my point about the correct way of handling this to
You don't create another device. You create another "migration container"
> values. And yes, "correct" here don't have into account that kvm was a
> fork of qemu. There are "historic" reasons why it made sense to create
> a new device for rtc-td-hack, but that reasons don't mean that this is
> the more correct way of doing it.
>
>
It has nothing to do with fork of qemu. It was nice way to add
functionality + migration support for it without breaking plane RTC
migration.
> > Serialization/deserialization should support matching of
> > incomming binary blob to deserialize function. When entire incomming
> > stream is consumed check has to be made that there is no uninitialized
> > table (deserialize callback that was not called) and if there is -
> > abort.
>
> As I told you, this one was not VMState related. From a technical point
> of view, adding the new device was not a problem. What I was discussing
> was if this was the better way of handling this problem. rtc-td-hack is
> an imaginary device to save two flags that rtc don't save for you. If
> you just remove the "hack" for the name, the ifdefs for the code, and
> leave the coalesced field names, it indeed looks like a nice new feature
> of the rtc? A feature that makes sense to have for everybody?
>
So? I don't get your point. We are talking about how we can add features
to devices and don't break migration, no? One way is to have "migration containers"
for each isolated feature (I don't like to call them "imaginary devices").
--
Gleb.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 12:26 ` Juan Quintela
@ 2009-09-16 12:37 ` Michael S. Tsirkin
2009-09-16 13:01 ` Juan Quintela
0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 12:37 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, Gleb Natapov
On Wed, Sep 16, 2009 at 02:26:58PM +0200, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Sep 16, 2009 at 02:14:32PM +0200, Juan Quintela wrote:
> >> >> See below, we are changing the state to one table, and tables don't have
> >> >> neither if's or whiles (we have a limited for that just walks arrays).
> >> >
> >> > Let's just bite the bullet and add support for if's? It's not like it's
> >> > hard to invent 'struct vmstate_condition' or some such.
> >>
> >> I have to do it. The problem is not adding an optional field, is adding
> >> it conditionally on _what_, and that _what_ should also be ideally on vmstate.
> >>
> >> Later, Juan.
> >
> > pci config is on vmstate already, I don't see a problem here.
>
> vmstate don't understand pci config.
How can it save it then? What's more, how can it load it sanely? E.g.
when loading we must make sure that device id etc match.
> I want to manipulate images with
> the informantion that you have given vmstate. Sending another byte
> meaning:
>
> msix_enabled
>
> and now depending on that value another field is ok with me.
1. this bit is there in config already.
sending it twice is redundant and so, wrong:
we will just have to add even more code
to check that these values match
2. it's also not backward compatible, is it?
> You can
> calculate msix_enable at pre_save time whenever way that you see fit.
Not everything has to fit in a global variable, we can have a table
per-device. Then each time msix_enable bit is changed, I can tell
vmstate about it.
> What vmstate needs is a only to now if msix_enable is 0 or 1, how you
> calculate it, VMState don't care.
Add a condition including offset into pci config and bitmask?
> Later, Juan.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 12:35 ` Gleb Natapov
@ 2009-09-16 12:40 ` Michael S. Tsirkin
2009-09-16 13:22 ` Juan Quintela
1 sibling, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 12:40 UTC (permalink / raw)
To: Gleb Natapov; +Cc: qemu-devel, Juan Quintela
On Wed, Sep 16, 2009 at 03:35:35PM +0300, Gleb Natapov wrote:
> So? I don't get your point. We are talking about how we can add features
> to devices and don't break migration, no? One way is to have "migration containers"
> for each isolated feature (I don't like to call them "imaginary devices").
Yay, containers is a good term, especially in virtualization context :)
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 12:37 ` Michael S. Tsirkin
@ 2009-09-16 13:01 ` Juan Quintela
2009-09-16 13:03 ` Michael S. Tsirkin
0 siblings, 1 reply; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 13:01 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, Gleb Natapov
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 16, 2009 at 02:26:58PM +0200, Juan Quintela wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> > On Wed, Sep 16, 2009 at 02:14:32PM +0200, Juan Quintela wrote:
>> >> >> See below, we are changing the state to one table, and tables don't have
>> >> >> neither if's or whiles (we have a limited for that just walks arrays).
>> >> >
>> >> > Let's just bite the bullet and add support for if's? It's not like it's
>> >> > hard to invent 'struct vmstate_condition' or some such.
>> >>
>> >> I have to do it. The problem is not adding an optional field, is adding
>> >> it conditionally on _what_, and that _what_ should also be ideally on vmstate.
>> >>
>> >> Later, Juan.
>> >
>> > pci config is on vmstate already, I don't see a problem here.
>>
>> vmstate don't understand pci config.
>
> How can it save it then? What's more, how can it load it sanely? E.g.
> when loading we must make sure that device id etc match.
We don't test it at all. We just load an ne2000 image into an ne2000
device, and things like that. VMState has zero knowledge of pci. It
just saves some fields.
>> I want to manipulate images with
>> the informantion that you have given vmstate. Sending another byte
>> meaning:
>>
>> msix_enabled
>>
>> and now depending on that value another field is ok with me.
>
> 1. this bit is there in config already.
> sending it twice is redundant and so, wrong:
> we will just have to add even more code
> to check that these values match
You need to write the code anyways. There is no way that VMState will
learn PCI (or whatever else), it does'nt matter if you do:
pre_save()
....
s->msix_enabled_vmstate = foo(s);
....
or we add a new feature to VMState that allows you to call a function
(foo() in this case) to know if we send/don't send vmstate. You need to
write foo anyways.
> 2. it's also not backward compatible, is it?
Optional fields were not allowed, really before. Format was abused and
then we got problems that we have to live with them. If you mean that
in the past devices used optional features, that is right.
>
>> You can
>> calculate msix_enable at pre_save time whenever way that you see fit.
>
> Not everything has to fit in a global variable, we can have a table
> per-device. Then each time msix_enable bit is changed, I can tell
> vmstate about it.
No. You only need to tell before saving. And there is nice function
(pre_save()) that would be called before doing a save. You can put
there any code to fill that variable. It is ok to create
msix_enabled_vmstate variable, that is only used for vmstate, no
problems with that.
>
>> What vmstate needs is a only to now if msix_enable is 0 or 1, how you
>> calculate it, VMState don't care.
>
> Add a condition including offset into pci config and bitmask?
No, defining a variable is easy for anybody. if I teach PCI to VMState,
then the damn things at i2c will want me to teach it i2c, and then
SunBus, and then ISA, and then ..... VMState understands all qemu.
There is no problem at all doing in pre_save()
msix_enabled = foo()
And you can share that foo function for all msix devices if you want it.
VMState don't have to learn msix, and efect is the same.
Later, Juan.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 13:01 ` Juan Quintela
@ 2009-09-16 13:03 ` Michael S. Tsirkin
2009-09-16 13:34 ` Juan Quintela
0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 13:03 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, Gleb Natapov
On Wed, Sep 16, 2009 at 03:01:22PM +0200, Juan Quintela wrote:
> >> > pci config is on vmstate already, I don't see a problem here.
> >>
> >> vmstate don't understand pci config.
> >
> > How can it save it then? What's more, how can it load it sanely? E.g.
> > when loading we must make sure that device id etc match.
>
> We don't test it at all.
Someone has to. Existing code has it. Removing this sanity check and
changing e.g. device id by load will be a bad thing.
--
MST
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 12:35 ` Gleb Natapov
2009-09-16 12:40 ` Michael S. Tsirkin
@ 2009-09-16 13:22 ` Juan Quintela
2009-09-16 14:08 ` Anthony Liguori
1 sibling, 1 reply; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 13:22 UTC (permalink / raw)
To: Gleb Natapov; +Cc: qemu-devel, Michael S. Tsirkin
Gleb Natapov <gleb@redhat.com> wrote:
Hi
>>
>> You have a rtc-td table that _needs_ rtc-td unconditionally, and that
>> makes no sense at all, then another way of doing it is:
>>
> I don't understand why this makes no sense. It does for me. You have to
> run "qemu -incoming" with same parameters as original one, so you have
> to run it with rtc-td-hack if migration source did it. Otherwise
> migration should fail.
Yes.
> And if source runs without rtc-td-hack rtc-td
> table is not registered and is not migrated.
I am not telling that it will not work.
>> up rtc version +1
>> add the two fields that we need (together with rtc-td-hack value)
> And why this is better? You can't migrate old VM to new qemu even if you
> don't use rtc-td-hack on new one.
I think the difference between us is:
- is rtc-td-hack a hack that should only be used for some users
- it is a valid rtc feature that should be available to everybody
- it is independent, or it needs an rtc to have any value.
See for instance ide, pci-ide has two devices, and it includes its state
inside the "ide" bigger device. it is not that we had isa-ide working
before, and now that we have pci, we add a pci-ide with the missing
fields.
Form me it happens that:
- rtc-td-hack is a valid feature, and I think should be available to
everybody
- for me it is not independent (at all) of the rtc, is just another
feature that should be saved with it.
But that is my opinion :)
>> and now we:
>> - can still load old rtc state
> how? It has another version.
You can have it compatible.
old_version = 1
new_verion = 1
if loading from old_version -> only old fields are read
we put on post_load() save values for missing fields
saving -> all fields are saved
loading for new_version : we read all fields
No problems at all with compatibility.
See my mail:
From: Juan Quintela <quintela@redhat.com>
Subject: [RFC PATCH v2 1/4] mc145818rtc: fix saving of rtc-td
hack properly upgrading the version number
To: qemu-devel@nongnu.org
Date: Thu, 10 Sep 2009 01:12:54 +0200
Where I try to do it properly.
I only tried to change the savevm part. If this patch was agreed, I
would have cleaned the hack part to "feel" more integrated.
>> - can save rtc state with/without rtc-td-hack value
>> if rtc-td-hack is not enabled, it is just not used
> Nothing that you can't do with two table approach as far as I can see.
As I told on other email. This is not an VMState problem. This is in
the: it works, and has no "technical" problems. My problem here is with
the design of the feature.
>> if we ever get to the point that we decide that rtc-td-hack should
>> always be enabled, everything is working already.
>>
>> > From vmstate point of view those are not connected.
>>
>> This is not VMState related. It is that you need another two fields to
>> get a new feature of rtc. Are we agreeing that everything is easier if
>> you added the fields to rtc instead fo creating a new device for this
>> two values? That was my point about the correct way of handling this to
> You don't create another device. You create another "migration container"
>
>> values. And yes, "correct" here don't have into account that kvm was a
>> fork of qemu. There are "historic" reasons why it made sense to create
>> a new device for rtc-td-hack, but that reasons don't mean that this is
>> the more correct way of doing it.
>>
>>
> It has nothing to do with fork of qemu. It was nice way to add
> functionality + migration support for it without breaking plane RTC
> migration.
See my patch, migration without rtc-td-hack is maintained. Just we have
an rtc++ that is able to coalesce interrupts.
>> > Serialization/deserialization should support matching of
>> > incomming binary blob to deserialize function. When entire incomming
>> > stream is consumed check has to be made that there is no uninitialized
>> > table (deserialize callback that was not called) and if there is -
>> > abort.
>>
>> As I told you, this one was not VMState related. From a technical point
>> of view, adding the new device was not a problem. What I was discussing
>> was if this was the better way of handling this problem. rtc-td-hack is
>> an imaginary device to save two flags that rtc don't save for you. If
>> you just remove the "hack" for the name, the ifdefs for the code, and
>> leave the coalesced field names, it indeed looks like a nice new feature
>> of the rtc? A feature that makes sense to have for everybody?
>>
> So? I don't get your point. We are talking about how we can add features
> to devices and don't break migration, no?
we are talking several things :)
a- how to maintain backwards compatibility when we do changes.
Agreement was that we have to be able to load old state. Going from
newer qemu to old qemu is not supported. THis one is preserved with
the two approaches (yes, yours also allows to go to old qemu).
b- There have been changes at some points that _shouldn't_ have
happened. Maintain backward compatibility with that changes
is quite "difficult". We are trying here to maintaing bacward
compatibility, but telling people: please, no more changes like this
one. rtc-td-hack here was not here either.
c- Some changes can be implemented in more that one way. Some ways have
some advantages, and other has other compromises. Here we are.
In rtc case, I am discussing c) Current code:
- creates one imaginary device for the new needed fields. Allows
backwards compatiblity (as if only uses rtc, noting has changed).
- my patch: only one device either exist. rtc-td code looks more
integrated with the rest of the rtc code. We have one less device
that care about (where did you put the rtc-td device on qdev)? How do
you explain it on user documentation, ....
Obviously, I preffer my patch, but that don't means that it is the only
way. It is just a different compromise. One that I consider that is
better. But at this point, I think that everybody agrees that you don't
think that my compromise is better.
Later, Juan.
> One way is to have "migration containers"
> for each isolated feature (I don't like to call them "imaginary devices").
>
> --
> Gleb.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 12:29 ` Michael S. Tsirkin
@ 2009-09-16 13:31 ` Juan Quintela
2009-09-16 14:07 ` Michael S. Tsirkin
0 siblings, 1 reply; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 13:31 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, gleb
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 16, 2009 at 02:13:06PM +0200, Juan Quintela wrote:
>> VMState rules are simple:
>> - Everything is explicit
>
> By the way, pci currently has cmask,
> which performs checking on load, making sure
> that load does not modify a constant field in config space,
> which can't change as a result of guest actions.
> If it does - migration fails.
>
> This is IMO much better and more robust than
> simply hoping that there are no bugs or that
> developers remember to increment a version
> number each time they change some field.
This one is going to be fixed. Some kind of checksum that assures you
that you haven't added/removed any field of a vmstatedescription. It is
not difficult to add, but no code/whatever is there.
The only minimal check that it does today is that you put:
VMSTATE_INT32_ARRAY_V(irq_state, PCIDevice, 4, 2),
4 is the length and 2 is the version.
VMstate checks that PCIDevice has an irq_state field of type int32_t
with lenght 4. I could have calculated the 4, but it is there just in
case someone changes the size of the array in PCIDevice, it gets a
compilation error. There is not more infrastructure yet to check for
changes on the state. It should come once devices are ported to
VMState.
> I think it's pretty important to keep this
> feature, and maybe add something similar
> to other devices.
>
> How will VMState support this?
This is the 1st request that I have. This is what the code does today
(it is the same that was before):
static int get_pci_config_device(QEMUFile *f, void *pv, size_t size)
{
PCIDevice *s = container_of(pv, PCIDevice, config);
uint8_t config[size];
int i;
qemu_get_buffer(f, config, size);
for (i = 0; i < size; ++i)
if ((config[i] ^ s->config[i]) & s->cmask[i] & ~s->wmask[i])
return -EINVAL;
memcpy(s->config, config, size);
pci_update_mappings(s);
return 0;
}
/* just put buffer */
static void put_pci_config_device(QEMUFile *f, void *pv, size_t size)
{
const uint8_t *v = pv;
qemu_put_buffer(f, v, size);
}
i.e. at save time, we save everything that we want to save.
At load time, we only copy some things. I don't understand what cmask
and wmask means, but I guess you understand this part better than me.
If we need to add more checks on load, we can just hack on that function
whatever you want to check/change/...
VMstate don't really care (as it shouldn't)
Later, Juan.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 13:03 ` Michael S. Tsirkin
@ 2009-09-16 13:34 ` Juan Quintela
2009-09-16 14:02 ` Michael S. Tsirkin
0 siblings, 1 reply; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 13:34 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, Gleb Natapov
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 16, 2009 at 03:01:22PM +0200, Juan Quintela wrote:
>> >> > pci config is on vmstate already, I don't see a problem here.
>> >>
>> >> vmstate don't understand pci config.
>> >
>> > How can it save it then? What's more, how can it load it sanely? E.g.
>> > when loading we must make sure that device id etc match.
>>
>> We don't test it at all.
>
> Someone has to. Existing code has it. Removing this sanity check and
> changing e.g. device id by load will be a bad thing.
Is there anything else that the code that I posted in the other email?
I didn't removed any check. I haven't removed any check that existed
before when I ported VMState. I only removed some support for very old
versions that didn't work at all (ram device comes to mind). If you see
that I removed any other check, let me know and I will put it back.
Later, Juan.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Qemu-devel] Re: optional feature
2009-09-16 11:57 ` Gleb Natapov
2009-09-16 12:23 ` Juan Quintela
@ 2009-09-16 13:51 ` Anthony Liguori
1 sibling, 0 replies; 33+ messages in thread
From: Anthony Liguori @ 2009-09-16 13:51 UTC (permalink / raw)
To: Gleb Natapov; +Cc: Michael S. Tsirkin, qemu-devel, Juan Quintela
Gleb Natapov wrote:
> I don't know virtio enough to understand all those things below. I am
> asking what is wrong about how RTC did it? You don't need if's or
> whiles. You have general RTC sate in one table and things that needed by
> rtc-td-hack in another table.
The only awkward bit is that we want to connect the savevm state to a
qdev device. Ideally, each qdev device would have a single table of state.
> From vmstate point of view those are not
> connected. Serialization/deserialization should support matching of
> incomming binary blob to deserialize function.
With vmstate, we're moving away from the notion of blobs and
serialization functions. Instead, for a given qdev device, vmstate
describes it's state in a table.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 13:34 ` Juan Quintela
@ 2009-09-16 14:02 ` Michael S. Tsirkin
0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 14:02 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, Gleb Natapov
On Wed, Sep 16, 2009 at 03:34:21PM +0200, Juan Quintela wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Sep 16, 2009 at 03:01:22PM +0200, Juan Quintela wrote:
> >> >> > pci config is on vmstate already, I don't see a problem here.
> >> >>
> >> >> vmstate don't understand pci config.
> >> >
> >> > How can it save it then? What's more, how can it load it sanely? E.g.
> >> > when loading we must make sure that device id etc match.
> >>
> >> We don't test it at all.
> >
> > Someone has to. Existing code has it. Removing this sanity check and
> > changing e.g. device id by load will be a bad thing.
>
> Is there anything else that the code that I posted in the other email?
> I didn't removed any check.
Sorry, I thought you were advocation replacing get_pci_config_device.
If you are happy with it as it is, no problem.
--
MST
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 13:31 ` Juan Quintela
@ 2009-09-16 14:07 ` Michael S. Tsirkin
0 siblings, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 14:07 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, gleb
On Wed, Sep 16, 2009 at 03:31:31PM +0200, Juan Quintela wrote:
> > I think it's pretty important to keep this
> > feature, and maybe add something similar
> > to other devices.
> >
> > How will VMState support this?
>
> This is the 1st request that I have. This is what the code does today
> (it is the same that was before):
Yes. I thought you want to replace that code with a table.
If not, no problem.
--
MST
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Qemu-devel] Re: optional feature
2009-09-16 13:22 ` Juan Quintela
@ 2009-09-16 14:08 ` Anthony Liguori
2009-09-16 14:12 ` Michael S. Tsirkin
0 siblings, 1 reply; 33+ messages in thread
From: Anthony Liguori @ 2009-09-16 14:08 UTC (permalink / raw)
To: Juan Quintela; +Cc: qemu-devel, Gleb Natapov, Michael S. Tsirkin
Juan Quintela wrote:
>>> up rtc version +1
>>> add the two fields that we need (together with rtc-td-hack value)
>>>
>> And why this is better? You can't migrate old VM to new qemu even if you
>> don't use rtc-td-hack on new one.
>>
>
> I think the difference between us is:
> - is rtc-td-hack a hack that should only be used for some users
> - it is a valid rtc feature that should be available to everybody
> - it is independent, or it needs an rtc to have any value.
>
We need a single table that contains the full state for the device.
Many devices will have knobs. There are two likely types of knobs:
1) something that indicates how an array of state is going to be
2) a boolean that indicates whether a portion of state is valid
rtc-td falls into the second category. It makes sense to me that the
table state would contain a boolean to indicate whether a given set of
state was valid. You may need a grouping mechanism for this. It
probably makes sense to do this as separate tables. For instance,
.fields = (VMStateField []) {
VMSTATE_BOOL(td_hack, RTCState, (VMStateField[]){
VMSTATE_INT32(irq_coalesced, RTCState),
VMSTATE_INT32(period, RTCState),
VMSTATE_END_OF_LIST()}),
}
If we can't maintain backwards compatibility using this approach (we
definitely can't for rtc-td) then we'll just have to live with that.
I also think arrays should be expressed like this FWIW. Today we have
explicit typed arrays. I would rather see:
.fields = (VMStateField []) {
VMSTATE_ARRAY(nirq, PCIBus, (VMStateField[]) {
VMSTATE_INT32(irq_count[0], PCIBus),
VMSTATE_END_OF_LIST()}),
}
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Qemu-devel] Re: optional feature
2009-09-16 14:08 ` Anthony Liguori
@ 2009-09-16 14:12 ` Michael S. Tsirkin
2009-09-16 14:21 ` Anthony Liguori
0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 14:12 UTC (permalink / raw)
To: Anthony Liguori; +Cc: qemu-devel, Gleb Natapov, Juan Quintela
On Wed, Sep 16, 2009 at 09:08:59AM -0500, Anthony Liguori wrote:
> Juan Quintela wrote:
>>>> up rtc version +1
>>>> add the two fields that we need (together with rtc-td-hack value)
>>>>
>>> And why this is better? You can't migrate old VM to new qemu even if you
>>> don't use rtc-td-hack on new one.
>>>
>>
>> I think the difference between us is:
>> - is rtc-td-hack a hack that should only be used for some users
>> - it is a valid rtc feature that should be available to everybody
>> - it is independent, or it needs an rtc to have any value.
>>
>
> We need a single table that contains the full state for the device.
>
> Many devices will have knobs. There are two likely types of knobs:
>
> 1) something that indicates how an array of state is going to be
> 2) a boolean that indicates whether a portion of state is valid
>
> rtc-td falls into the second category. It makes sense to me that the
> table state would contain a boolean to indicate whether a given set of
> state was valid. You may need a grouping mechanism for this. It
> probably makes sense to do this as separate tables. For instance,
>
> .fields = (VMStateField []) {
> VMSTATE_BOOL(td_hack, RTCState, (VMStateField[]){
> VMSTATE_INT32(irq_coalesced, RTCState),
> VMSTATE_INT32(period, RTCState),
> VMSTATE_END_OF_LIST()}),
> }
>
> If we can't maintain backwards compatibility using this approach (we
> definitely can't for rtc-td) then we'll just have to live with that.
We have to? Why do we? And not only won't we have backwards
compatibility now, we will also break it and have to break it each time
we add a knob.
If instead we would only save/load the part of state if
the knob is set, we won't have a problem.
> I also think arrays should be expressed like this FWIW. Today we have
> explicit typed arrays. I would rather see:
>
> .fields = (VMStateField []) {
> VMSTATE_ARRAY(nirq, PCIBus, (VMStateField[]) {
> VMSTATE_INT32(irq_count[0], PCIBus),
> VMSTATE_END_OF_LIST()}),
> }
Same problem here.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Qemu-devel] Re: optional feature
2009-09-16 14:12 ` Michael S. Tsirkin
@ 2009-09-16 14:21 ` Anthony Liguori
2009-09-16 14:34 ` Michael S. Tsirkin
0 siblings, 1 reply; 33+ messages in thread
From: Anthony Liguori @ 2009-09-16 14:21 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, Gleb Natapov, Juan Quintela
Michael S. Tsirkin wrote:
> On Wed, Sep 16, 2009 at 09:08:59AM -0500, Anthony Liguori wrote:
>
>> Juan Quintela wrote:
>>
>>>>> up rtc version +1
>>>>> add the two fields that we need (together with rtc-td-hack value)
>>>>>
>>>>>
>>>> And why this is better? You can't migrate old VM to new qemu even if you
>>>> don't use rtc-td-hack on new one.
>>>>
>>>>
>>> I think the difference between us is:
>>> - is rtc-td-hack a hack that should only be used for some users
>>> - it is a valid rtc feature that should be available to everybody
>>> - it is independent, or it needs an rtc to have any value.
>>>
>>>
>> We need a single table that contains the full state for the device.
>>
>> Many devices will have knobs. There are two likely types of knobs:
>>
>> 1) something that indicates how an array of state is going to be
>> 2) a boolean that indicates whether a portion of state is valid
>>
>> rtc-td falls into the second category. It makes sense to me that the
>> table state would contain a boolean to indicate whether a given set of
>> state was valid. You may need a grouping mechanism for this. It
>> probably makes sense to do this as separate tables. For instance,
>>
>> .fields = (VMStateField []) {
>> VMSTATE_BOOL(td_hack, RTCState, (VMStateField[]){
>> VMSTATE_INT32(irq_coalesced, RTCState),
>> VMSTATE_INT32(period, RTCState),
>> VMSTATE_END_OF_LIST()}),
>> }
>>
>> If we can't maintain backwards compatibility using this approach (we
>> definitely can't for rtc-td) then we'll just have to live with that.
>>
>
> We have to? Why do we?
We could have an open loading function to load old versions of this
device. It's ugly, but there's really no other way.
> And not only won't we have backwards
> compatibility now, we will also break it and have to break it each time
> we add a knob.
>
No, we bump the version number and add more fields to the state.
If we need to make crazy changes (like make a previously non-optional
state, optional) then we can introduce two tables if we have to.
> If instead we would only save/load the part of state if
> the knob is set, we won't have a problem.
>
The rtc device happens to support an optional feature by splitting the
optional bits into a separate section. Not every device does this
though so if you want to convert other devices to this style, you'll
break their backwards compatibility.
The mechanisms are functionally the same. One is no more expressive
than the other. It's the difference of vN introduces these optional
features vs expliciting introducing new sections. What I don't like
about the later is that these all need to be tied together in some sort
of cohesive way.
>> I also think arrays should be expressed like this FWIW. Today we have
>> explicit typed arrays. I would rather see:
>>
>> .fields = (VMStateField []) {
>> VMSTATE_ARRAY(nirq, PCIBus, (VMStateField[]) {
>> VMSTATE_INT32(irq_count[0], PCIBus),
>> VMSTATE_END_OF_LIST()}),
>> }
>>
>
> Same problem here.
>
I don't see what the problem is.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Qemu-devel] Re: optional feature
2009-09-16 14:21 ` Anthony Liguori
@ 2009-09-16 14:34 ` Michael S. Tsirkin
2009-09-16 14:53 ` Juan Quintela
2009-09-16 15:45 ` Anthony Liguori
0 siblings, 2 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 14:34 UTC (permalink / raw)
To: Anthony Liguori; +Cc: qemu-devel, Gleb Natapov, Juan Quintela
On Wed, Sep 16, 2009 at 09:21:14AM -0500, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>> On Wed, Sep 16, 2009 at 09:08:59AM -0500, Anthony Liguori wrote:
>>
>>> Juan Quintela wrote:
>>>
>>>>>> up rtc version +1
>>>>>> add the two fields that we need (together with rtc-td-hack value)
>>>>>>
>>>>> And why this is better? You can't migrate old VM to new qemu even if you
>>>>> don't use rtc-td-hack on new one.
>>>>>
>>>> I think the difference between us is:
>>>> - is rtc-td-hack a hack that should only be used for some users
>>>> - it is a valid rtc feature that should be available to everybody
>>>> - it is independent, or it needs an rtc to have any value.
>>>>
>>> We need a single table that contains the full state for the device.
>>>
>>> Many devices will have knobs. There are two likely types of knobs:
>>>
>>> 1) something that indicates how an array of state is going to be
>>> 2) a boolean that indicates whether a portion of state is valid
>>>
>>> rtc-td falls into the second category. It makes sense to me that the
>>> table state would contain a boolean to indicate whether a given set
>>> of state was valid. You may need a grouping mechanism for this.
>>> It probably makes sense to do this as separate tables. For
>>> instance,
>>>
>>> .fields = (VMStateField []) {
>>> VMSTATE_BOOL(td_hack, RTCState, (VMStateField[]){
>>> VMSTATE_INT32(irq_coalesced, RTCState),
>>> VMSTATE_INT32(period, RTCState),
>>> VMSTATE_END_OF_LIST()}),
>>> }
>>>
>>> If we can't maintain backwards compatibility using this approach (we
>>> definitely can't for rtc-td) then we'll just have to live with that.
>>>
>>
>> We have to? Why do we?
>
> We could have an open loading function to load old versions of this
> device. It's ugly, but there's really no other way.
>
>> And not only won't we have backwards
>> compatibility now, we will also break it and have to break it each time
>> we add a knob.
>>
>
> No, we bump the version number and add more fields to the state.
>
> If we need to make crazy changes (like make a previously non-optional
> state, optional) then we can introduce two tables if we have to.
>
>> If instead we would only save/load the part of state if
>> the knob is set, we won't have a problem.
>>
>
> The rtc device happens to support an optional feature by splitting the
> optional bits into a separate section. Not every device does this
> though so if you want to convert other devices to this style, you'll
> break their backwards compatibility.
>
> The mechanisms are functionally the same. One is no more expressive
> than the other.
Yes, separate devices variant is more expressive.
It is more modular. With optional features A B C, versioning can not
support saving only A and C but not B. This is bad for example for
backporting features between versions: if C happened to be introduced
after B, backporting C will force backporting B.
But you can support it if you put each one in a migration container.
> It's the difference of vN introduces these optional
> features vs expliciting introducing new sections. What I don't like
> about the later is that these all need to be tied together in some sort
> of cohesive way.
I don't understand what this means, sorry.
>>> I also think arrays should be expressed like this FWIW. Today we
>>> have explicit typed arrays. I would rather see:
>>>
>>> .fields = (VMStateField []) {
>>> VMSTATE_ARRAY(nirq, PCIBus, (VMStateField[]) {
>>> VMSTATE_INT32(irq_count[0], PCIBus),
>>> VMSTATE_END_OF_LIST()}),
>>> }
>>>
>>
>> Same problem here.
>>
>
> I don't see what the problem is.
if you are not saving irq state, it's better
to skip the array that create a 0 size array.
The former works for non-array fields, the later does not,
and you have to invent a separate valid bit.
> Regards,
>
> Anthony Liguori
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 14:34 ` Michael S. Tsirkin
@ 2009-09-16 14:53 ` Juan Quintela
2009-09-16 15:11 ` Michael S. Tsirkin
2009-09-16 15:45 ` Anthony Liguori
1 sibling, 1 reply; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 14:53 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Gleb Natapov, qemu-devel
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 16, 2009 at 09:21:14AM -0500, Anthony Liguori wrote:
>> Michael S. Tsirkin wrote:
>>> On Wed, Sep 16, 2009 at 09:08:59AM -0500, Anthony Liguori wrote:
>>>
>>>> Juan Quintela wrote:
>>>>
>>>>>>> up rtc version +1
>>>>>>> add the two fields that we need (together with rtc-td-hack value)
>>>>>>>
>>>>>> And why this is better? You can't migrate old VM to new qemu even if you
>>>>>> don't use rtc-td-hack on new one.
>>>>>>
>>>>> I think the difference between us is:
>>>>> - is rtc-td-hack a hack that should only be used for some users
>>>>> - it is a valid rtc feature that should be available to everybody
>>>>> - it is independent, or it needs an rtc to have any value.
>>>>>
>>>> We need a single table that contains the full state for the device.
>>>>
>>>> Many devices will have knobs. There are two likely types of knobs:
>>>>
>>>> 1) something that indicates how an array of state is going to be
>>>> 2) a boolean that indicates whether a portion of state is valid
>>>>
>>>> rtc-td falls into the second category. It makes sense to me that the
>>>> table state would contain a boolean to indicate whether a given set
>>>> of state was valid. You may need a grouping mechanism for this.
>>>> It probably makes sense to do this as separate tables. For
>>>> instance,
>>>>
>>>> .fields = (VMStateField []) {
>>>> VMSTATE_BOOL(td_hack, RTCState, (VMStateField[]){
>>>> VMSTATE_INT32(irq_coalesced, RTCState),
>>>> VMSTATE_INT32(period, RTCState),
>>>> VMSTATE_END_OF_LIST()}),
>>>> }
>>>>
>>>> If we can't maintain backwards compatibility using this approach (we
>>>> definitely can't for rtc-td) then we'll just have to live with that.
>>>>
>>>
>>> We have to? Why do we?
>>
>> We could have an open loading function to load old versions of this
>> device. It's ugly, but there's really no other way.
>>
>>> And not only won't we have backwards
>>> compatibility now, we will also break it and have to break it each time
>>> we add a knob.
>>>
>>
>> No, we bump the version number and add more fields to the state.
>>
>> If we need to make crazy changes (like make a previously non-optional
>> state, optional) then we can introduce two tables if we have to.
>>
>>> If instead we would only save/load the part of state if
>>> the knob is set, we won't have a problem.
>>>
>>
>> The rtc device happens to support an optional feature by splitting the
>> optional bits into a separate section. Not every device does this
>> though so if you want to convert other devices to this style, you'll
>> break their backwards compatibility.
>>
>> The mechanisms are functionally the same. One is no more expressive
>> than the other.
>
> Yes, separate devices variant is more expressive.
> It is more modular. With optional features A B C, versioning can not
> support saving only A and C but not B. This is bad for example for
> backporting features between versions: if C happened to be introduced
> after B, backporting C will force backporting B.
No problem again.
You backport, and you add to the state both B and C. You just don't
care about B values. I leave you a name for them:
reserved0
reserved1
reserved2
And you are done.
And what is worse, what happens when you have to upgrade B and C with
new fields? Now you have:
A, B, B', C, C'
And what options are valid? How you differentiate between B and B', C
and C', a version number? We are back at stage 1?
And how many features do you support? Exponential again.
With linear version numbers you are going to have:
A
A+B
A+B+C
A+B'+C
A+B'+C'
(you can switch the 2 last ones depending the order in which changes
happen). And that is it, no exponential cases again. we added 4
features and we have 4 new versions. It looks very linear to me.
And we can still load all the previous versions.
> But you can support it if you put each one in a migration container.
My opinion: We don't even want to think about this.
> if you are not saving irq state, it's better
> to skip the array that create a 0 size array.
Why?
> The former works for non-array fields, the later does not,
> and you have to invent a separate valid bit.
VMStateDescription, think of it as a contract. Would you like that the
other part would be able to insert whole paragraphs when he wants?
Without ever telling you that it changed (i.e. same version).
I don't think so. I am sure I would preffer that it will told me
clearly that contract changed (new version), and the changes are this,
this and this.
Later, Juan.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 14:53 ` Juan Quintela
@ 2009-09-16 15:11 ` Michael S. Tsirkin
2009-09-16 15:25 ` Juan Quintela
0 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2009-09-16 15:11 UTC (permalink / raw)
To: Juan Quintela; +Cc: Gleb Natapov, qemu-devel
On Wed, Sep 16, 2009 at 04:53:47PM +0200, Juan Quintela wrote:
> > It is more modular. With optional features A B C, versioning can not
> > support saving only A and C but not B. This is bad for example for
> > backporting features between versions: if C happened to be introduced
> > after B, backporting C will force backporting B.
>
> No problem again.
> You backport, and you add to the state both B and C. You just don't
> care about B values. I leave you a name for them:
>
> reserved0
> reserved1
> reserved2
>
> And you are done.
Not really. When you save, B has an invalid value.
Load it in upstream qemu, boom.
> And what is worse, what happens when you have to upgrade B and C with
> new fields? Now you have:
>
> A, B, B', C, C'
>
> And what options are valid? How you differentiate between B and B', C
> and C', a version number?
Each is a new feature.
If B' relaces B, then do not store B, store only B'.
> We are back at stage 1?
>
> And how many features do you support? Exponential again.
Nothing exponential. Test with each feature on and off, you are done.
> With linear version numbers you are going to have:
>
> A
> A+B
> A+B+C
> A+B'+C
> A+B'+C'
This is exponential in the number of combinations you need to code up.
> (you can switch the 2 last ones depending the order in which changes
> happen). And that is it, no exponential cases again. we added 4
> features and we have 4 new versions. It looks very linear to me.
> And we can still load all the previous versions.
>
> > But you can support it if you put each one in a migration container.
>
> My opinion: We don't even want to think about this.
>
>
> > if you are not saving irq state, it's better
> > to skip the array that create a 0 size array.
>
> Why?
Because of what I said below: it does not work for non-arrays.
> > The former works for non-array fields, the later does not,
> > and you have to invent a separate valid bit.
>
> VMStateDescription, think of it as a contract. Would you like that the
> other part would be able to insert whole paragraphs when he wants?
> Without ever telling you that it changed (i.e. same version).
Yes, it has to tell me, each option should be tagged.
I see a new paragraph, I do not recognize it, I abort.
> I don't think so. I am sure I would preffer that it will told me
> clearly that contract changed (new version), and the changes are this,
> this and this.
It does not though. The connection between versions and
sets of features is scattered over the code.
Instead, the format should be self-documenting.
> Later, Juan.
^ permalink raw reply [flat|nested] 33+ messages in thread
* [Qemu-devel] Re: optional feature
2009-09-16 15:11 ` Michael S. Tsirkin
@ 2009-09-16 15:25 ` Juan Quintela
0 siblings, 0 replies; 33+ messages in thread
From: Juan Quintela @ 2009-09-16 15:25 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Gleb Natapov, qemu-devel
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 16, 2009 at 04:53:47PM +0200, Juan Quintela wrote:
>> > It is more modular. With optional features A B C, versioning can not
>> > support saving only A and C but not B. This is bad for example for
>> > backporting features between versions: if C happened to be introduced
>> > after B, backporting C will force backporting B.
>>
>> No problem again.
>> You backport, and you add to the state both B and C. You just don't
>> care about B values. I leave you a name for them:
>>
>> reserved0
>> reserved1
>> reserved2
>>
>> And you are done.
>
> Not really. When you save, B has an invalid value.
> Load it in upstream qemu, boom.
You are donig the backport or only C, you have to be able to put
sensible values here.
>> And what is worse, what happens when you have to upgrade B and C with
>> new fields? Now you have:
>>
>> A, B, B', C, C'
>>
>> And what options are valid? How you differentiate between B and B', C
>> and C', a version number?
>
> Each is a new feature.
> If B' relaces B, then do not store B, store only B'.
Backward compatibility was the whole game, do you remember? If I can
remove backward compatiblity, at least half of the problems dissapear.
>> We are back at stage 1?
>>
>> And how many features do you support? Exponential again.
>
> Nothing exponential. Test with each feature on and off, you are done.
1 features 2 test
2 features 4 tests
4 features 16 tests
it looks very exponential to me. You need to test all combinations.
You are the one wanting to be able to use all combinations.
>> With linear version numbers you are going to have:
>>
>> A
>> A+B
>> A+B+C
>> A+B'+C
>> A+B'+C'
>
> This is exponential in the number of combinations you need to code up.
No, you only code the ones that I showed. You are done.
>> (you can switch the 2 last ones depending the order in which changes
>> happen). And that is it, no exponential cases again. we added 4
>> features and we have 4 new versions. It looks very linear to me.
>> And we can still load all the previous versions.
>>
>> > But you can support it if you put each one in a migration container.
>>
>> My opinion: We don't even want to think about this.
>>
>>
>> > if you are not saving irq state, it's better
>> > to skip the array that create a 0 size array.
>>
>> Why?
>
> Because of what I said below: it does not work for non-arrays.
For non arrays it is easier, you just put the value there. No problem
at all. You are saving a 512MB (at minimum) image, believe me, 20 bytes
more/less are not going to matter.
>> > The former works for non-array fields, the later does not,
>> > and you have to invent a separate valid bit.
>>
>> VMStateDescription, think of it as a contract. Would you like that the
>> other part would be able to insert whole paragraphs when he wants?
>> Without ever telling you that it changed (i.e. same version).
>
> Yes, it has to tell me
How, where?
>, each option should be tagged.
how?
> I see a new paragraph, I do not recognize it, I abort.
>> I don't think so. I am sure I would preffer that it will told me
>> clearly that contract changed (new version), and the changes are this,
>> this and this.
>
> It does not though. The connection between versions and
> sets of features is scattered over the code.
> Instead, the format should be self-documenting.
Wait, I thought you were the one wanting backward compatibility. Now,
we want a different format. And how are you going to send that to an
old version? Format is what it is. I can't understand your switch
here, sorry.
1st: you want to send blobs, and you make sure that you understand what
you send, vmstate is just a transport.
2nd: you want to be able to pick what features you want/don't want, set
the save version number, and expect that the other side is able to
understand. You explicitely wanted to be able to send for a newer
qemu version in a device with more features to an old qemu version
with an device with less features.
3rd: Now you want a different format, where backward compatibility
obviously dissapear.
Sorry, I can't follow you.
Later, Juan.
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Qemu-devel] Re: optional feature
2009-09-16 14:34 ` Michael S. Tsirkin
2009-09-16 14:53 ` Juan Quintela
@ 2009-09-16 15:45 ` Anthony Liguori
2009-09-16 15:58 ` Anthony Liguori
1 sibling, 1 reply; 33+ messages in thread
From: Anthony Liguori @ 2009-09-16 15:45 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, Gleb Natapov, Juan Quintela
Michael S. Tsirkin wrote:
>>> If instead we would only save/load the part of state if
>>> the knob is set, we won't have a problem.
>>>
>>>
>> The rtc device happens to support an optional feature by splitting the
>> optional bits into a separate section. Not every device does this
>> though so if you want to convert other devices to this style, you'll
>> break their backwards compatibility.
>>
>> The mechanisms are functionally the same. One is no more expressive
>> than the other.
>>
>
> Yes, separate devices variant is more expressive.
>
Not when you consider my proposed syntax:
.fields = (VMStateField []) {
VMSTATE_BOOL(td_hack, RTCState, (VMStateField[]){
VMSTATE_INT32(irq_coalesced, RTCState),
VMSTATE_INT32(period, RTCState),
VMSTATE_END_OF_LIST()}),
}
You could clearly encode this on the wire as a separate section. You
could autogenerate the name as "rtc-td_hack". It won't be backwards
compatible for save but that's okay. We can hack things together to
make it backwards compatible on load.
I don't like this as a wire format though. The point though is that if
we get the VMState syntax right, then we can make whatever changes down
the road we need.
> It is more modular. With optional features A B C, versioning can not
> support saving only A and C but not B. This is bad for example for
> backporting features between versions: if C happened to be introduced
> after B, backporting C will force backporting B.
>
The real argument is against linear versioning. The whole "optional
feature" thing is almost orthogonal.
If we want to support downstream versioning, then I think we should
attack that problem properly instead of shoe horning it via "optional
features". This would involve introducing a v4 of the savevm protocol
that allowed for a minor versions of device state. QEMU would always
set the minor version to 0. If downstream decides to introduce changes,
they can bump the minor version for a device. We can also add a minor
version to the savevm protocol itself along with a vendor id. This lets
a vendor identify itself and allows the vendor to change the savevm
protocol. We can use reverse fqdn for vendor id to avoid id
distribution issues.
This solves a number of problems.
If qemu-kvm decides to change a device's state, they can increment a
minor version of that device and set the vendor id to one allocated for
qemu-kvm. Since qemu-kvm has other downstreams, it should also bump the
savevm protocol minor version to introduce an additional vendor
id/device minor number. This way if Fedora/RHEL decides to backport a
feature into qemu-kvm, they can set their own vendor id and also bump
the secondary minor version. This gives downstreams nice, non-linear
versioning that they can use and extend indefinitely.
It also provides a way to detect whether it's possible to migrate a VM
from one vendor's qemu to another. For instance, even though RHEL may
have backported a feature, if that feature isn't used for a particular
VM, it's still migratable to an upstream version. We determine this is
an entirely programmatic way. It cannot migrate when that feature is
enabled (even to a newer qemu that has the feature) but to me, that's a
not a negative thing.
> if you are not saving irq state, it's better
> to skip the array that create a 0 size array.
> The former works for non-array fields, the later does not,
> and you have to invent a separate valid bit.
>
I think that's a minor wire detail that has nothing to do with the table
format.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [Qemu-devel] Re: optional feature
2009-09-16 15:45 ` Anthony Liguori
@ 2009-09-16 15:58 ` Anthony Liguori
0 siblings, 0 replies; 33+ messages in thread
From: Anthony Liguori @ 2009-09-16 15:58 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: qemu-devel, Gleb Natapov, Juan Quintela
Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>>>> If instead we would only save/load the part of state if
>>>> the knob is set, we won't have a problem.
>>>>
>>> The rtc device happens to support an optional feature by splitting
>>> the optional bits into a separate section. Not every device does
>>> this though so if you want to convert other devices to this style,
>>> you'll break their backwards compatibility.
>>>
>>> The mechanisms are functionally the same. One is no more
>>> expressive than the other.
>>>
>>
>> Yes, separate devices variant is more expressive.
>>
>
> Not when you consider my proposed syntax:
>
> .fields = (VMStateField []) {
> VMSTATE_BOOL(td_hack, RTCState, (VMStateField[]){
> VMSTATE_INT32(irq_coalesced, RTCState),
> VMSTATE_INT32(period, RTCState),
> VMSTATE_END_OF_LIST()}),
> }
>
> You could clearly encode this on the wire as a separate section. You
> could autogenerate the name as "rtc-td_hack". It won't be backwards
> compatible for save but that's okay. We can hack things together to
> make it backwards compatible on load.
>
> I don't like this as a wire format though. The point though is that
> if we get the VMState syntax right, then we can make whatever changes
> down the road we need.
>
>> It is more modular. With optional features A B C, versioning can not
>> support saving only A and C but not B. This is bad for example for
>> backporting features between versions: if C happened to be introduced
>> after B, backporting C will force backporting B.
>>
>
> The real argument is against linear versioning. The whole "optional
> feature" thing is almost orthogonal.
>
> If we want to support downstream versioning, then I think we should
> attack that problem properly instead of shoe horning it via "optional
> features". This would involve introducing a v4 of the savevm protocol
> that allowed for a minor versions of device state. QEMU would always
> set the minor version to 0. If downstream decides to introduce
> changes, they can bump the minor version for a device. We can also
> add a minor version to the savevm protocol itself along with a vendor id.
Scratch the savevm protocol minor number. We need to properly support
multiple vendor ids and minor numbers. Otherwise, if you have multiple
down streams, you could never migrate from the 2-level downstream to
upstream.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2009-09-16 15:58 UTC | newest]
Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-09-16 10:46 [Qemu-devel] optional feature (was Re: The State of the SaveVM format) Michael S. Tsirkin
2009-09-16 11:04 ` [Qemu-devel] Re: optional feature Juan Quintela
2009-09-16 11:18 ` Gleb Natapov
2009-09-16 11:48 ` Juan Quintela
2009-09-16 11:52 ` Michael S. Tsirkin
2009-09-16 12:14 ` Juan Quintela
2009-09-16 12:18 ` Michael S. Tsirkin
2009-09-16 12:26 ` Juan Quintela
2009-09-16 12:37 ` Michael S. Tsirkin
2009-09-16 13:01 ` Juan Quintela
2009-09-16 13:03 ` Michael S. Tsirkin
2009-09-16 13:34 ` Juan Quintela
2009-09-16 14:02 ` Michael S. Tsirkin
2009-09-16 11:57 ` Gleb Natapov
2009-09-16 12:23 ` Juan Quintela
2009-09-16 12:35 ` Gleb Natapov
2009-09-16 12:40 ` Michael S. Tsirkin
2009-09-16 13:22 ` Juan Quintela
2009-09-16 14:08 ` Anthony Liguori
2009-09-16 14:12 ` Michael S. Tsirkin
2009-09-16 14:21 ` Anthony Liguori
2009-09-16 14:34 ` Michael S. Tsirkin
2009-09-16 14:53 ` Juan Quintela
2009-09-16 15:11 ` Michael S. Tsirkin
2009-09-16 15:25 ` Juan Quintela
2009-09-16 15:45 ` Anthony Liguori
2009-09-16 15:58 ` Anthony Liguori
2009-09-16 13:51 ` Anthony Liguori
2009-09-16 11:41 ` Michael S. Tsirkin
2009-09-16 12:13 ` Juan Quintela
2009-09-16 12:29 ` Michael S. Tsirkin
2009-09-16 13:31 ` Juan Quintela
2009-09-16 14:07 ` Michael S. Tsirkin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).