[Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format

* [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
@ 2011-06-30 15:46 Paolo Bonzini
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 1/4] add support for machine models to specify their " Paolo Bonzini
                   ` (4 more replies)
  0 siblings, 5 replies; 47+ messages in thread
From: Paolo Bonzini @ 2011-06-30 15:46 UTC (permalink / raw)
  To: qemu-devel

With the current migration format, VMS_STRUCTs with subsections
are ambiguous.  The protocol cannot tell whether a 0x5 byte
(QEMU_VM_SUBSECTION) after the VMS_STRUCT starts a subsection or is
part of the parent data stream.  In the past QEMU assumed it always
started a subsection; after commit eb60260 (savevm: fix corruption in
vmstate_subsection_load(), 2011-02-03) the choice depends on whether
the VMS_STRUCT has subsections defined.

Unfortunately, this means that if a destination has no subsections
defined for the struct, it will happily read subsection data into
its own fields.  And if you are "lucky" enough to stumble on a
zero byte at the right time, it will be interpreted as QEMU_VM_EOF
and migration will be interrupted with half-loaded state.

There is no way out of this except defining an incompatible
migration protocol.  Not-so-long-term we should really try to define
one that is not a joke, but the bug is serious so we need a solution
for 0.15.  A sentinel at the end of embedded structs does remove the
ambiguity.
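
To make the ambiguity concrete, here is a minimal, self-contained sketch.
It is not QEMU code: ToyStream, toy_peek, v3_sees_subsection and
v4_after_struct are hypothetical names invented for illustration, and
only the two marker values are taken from patch 3.  The old-format helper
models a loader whose struct has subsections defined, so it trusts the
peeked byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marker values as defined in patch 3. */
#define QEMU_VM_SUBSECTION      0x05
#define QEMU_VM_SUBSECTIONS_END 0x06

/* Hypothetical toy cursor over a byte buffer; this is not QEMUFile. */
typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t pos;
} ToyStream;

/* Return the next byte without consuming it, or -1 at end of buffer. */
int toy_peek(const ToyStream *s)
{
    assert(s != NULL);
    return s->pos < s->len ? s->buf[s->pos] : -1;
}

/*
 * Old format: once the embedded struct's fields are read, the loader can
 * only peek.  Returns 1 if it would start parsing a subsection, even when
 * the byte is really the parent's next field.
 */
int v3_sees_subsection(const ToyStream *s)
{
    return toy_peek(s) == QEMU_VM_SUBSECTION;
}

/*
 * New format: every embedded struct is closed by a sentinel.
 * Returns 1 if a subsection follows, 0 if the sentinel was consumed and
 * the parent's fields come next, -1 if the stream is malformed.
 */
int v4_after_struct(ToyStream *s)
{
    int b = toy_peek(s);

    if (b == QEMU_VM_SUBSECTION) {
        return 1;
    }
    if (b == QEMU_VM_SUBSECTIONS_END) {
        s->pos++;
        return 0;
    }
    return -1;
}
```

Take an embedded struct with a single field 0xAA and no subsection,
followed by a parent field whose value happens to be 0x05.  On the old
stream {0xAA, 0x05}, v3_sees_subsection() reports a subsection that was
never sent.  On the new stream {0xAA, 0x06, 0x05}, v4_after_struct()
consumes the sentinel and leaves the 0x05 to be read as parent data.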

Of course, this can be restricted to new machine models, and this
is what the patch series does.  (And note that only patch 3 is specific
to the short-term solution, everything else is entirely generic).

Untested beyond compilation.

Paolo Bonzini (4):
  add support for machine models to specify their migration format
  add pc-0.14 machine
  savevm: define new unambiguous migration format
  Revert "savevm: fix corruption in vmstate_subsection_load()."

 cpu-common.h  |    3 ---
 qemu-common.h |    2 ++
 hw/boards.h   |    1 +
 hw/pc_piix.c  |   16 +++++++++++++++-
 savevm.c      |   44 +++++++++++++++++++++++++-------------------
 5 files changed, 43 insertions(+), 23 deletions(-)

-- 
1.7.5.2

* [Qemu-devel] [RFC PATCH 1/4] add support for machine models to specify their migration format
  2011-06-30 15:46 [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format Paolo Bonzini
@ 2011-06-30 15:46 ` Paolo Bonzini
  2011-06-30 18:11   ` Michael S. Tsirkin
  2011-07-29 13:08   ` Anthony Liguori
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 2/4] add pc-0.14 machine Paolo Bonzini
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 47+ messages in thread
From: Paolo Bonzini @ 2011-06-30 15:46 UTC (permalink / raw)
  To: qemu-devel

We need to provide a new migration format, and not break migration
in old machine models.  So add a migration_format field to QEMUMachine.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 cpu-common.h  |    3 ---
 hw/boards.h   |    1 +
 qemu-common.h |    3 +++
 savevm.c      |    7 +++++--
 4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index b027e43..8c06dbb 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -26,9 +26,6 @@ enum device_endian {
     DEVICE_LITTLE_ENDIAN,
 };
 
-/* address in the RAM (different from a physical address) */
-typedef unsigned long ram_addr_t;
-
 /* memory API */
 
 typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, uint32_t value);
diff --git a/hw/boards.h b/hw/boards.h
index 716fd7b..560dbaf 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -19,6 +19,7 @@ typedef struct QEMUMachine {
     QEMUMachineInitFunc *init;
     int use_scsi;
     int max_cpus;
+    unsigned migration_format;
     unsigned int no_serial:1,
         no_parallel:1,
         use_virtcon:1,
diff --git a/qemu-common.h b/qemu-common.h
index 109498d..550fe2c 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -119,6 +119,9 @@ static inline char *realpath(const char *path, char *resolved_path)
 #define PRIo64 "I64o"
 #endif
 
+/* address in the RAM (different from a physical address) */
+typedef unsigned long ram_addr_t;
+
 /* FIXME: Remove NEED_CPU_H.  */
 #ifndef NEED_CPU_H
 
diff --git a/savevm.c b/savevm.c
index 8139bc7..74e6e99 100644
--- a/savevm.c
+++ b/savevm.c
@@ -72,6 +72,7 @@
 #include "qemu-common.h"
 #include "hw/hw.h"
 #include "hw/qdev.h"
+#include "hw/boards.h"
 #include "net.h"
 #include "monitor.h"
 #include "sysemu.h"
@@ -1474,7 +1475,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
     }
     
     qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+    qemu_put_be32(f, current_machine->migration_format ?: QEMU_VM_FILE_VERSION);
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         int len;
@@ -1747,8 +1748,10 @@ int qemu_loadvm_state(QEMUFile *f)
         fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
         return -ENOTSUP;
     }
-    if (v != QEMU_VM_FILE_VERSION)
+    if (v != (current_machine->migration_format ?: QEMU_VM_FILE_VERSION)) {
+        fprintf(stderr, "Mismatching SaveVM format v%d\n", v);
         return -ENOTSUP;
+    }
 
     while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
         uint32_t instance_id, version_id, section_id;
-- 
1.7.5.2

* Re: [Qemu-devel] [RFC PATCH 1/4] add support for machine models to specify their migration format
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 1/4] add support for machine models to specify their " Paolo Bonzini
@ 2011-06-30 18:11   ` Michael S. Tsirkin
  2011-07-01  6:10     ` Paolo Bonzini
  2011-07-29 13:08   ` Anthony Liguori
  1 sibling, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2011-06-30 18:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Thu, Jun 30, 2011 at 05:46:14PM +0200, Paolo Bonzini wrote:
> We need to provide a new migration format, and not break migration
> in old machine models.  So add a migration_format field to QEMUMachine.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Shouldn't the machine describe guest behaviour?
It seems that a flag on the migrate/save command to control the format
would be better - this way the same machine can
migrate to different formats.

> ---
>  cpu-common.h  |    3 ---
>  hw/boards.h   |    1 +
>  qemu-common.h |    3 +++
>  savevm.c      |    7 +++++--
>  4 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/cpu-common.h b/cpu-common.h
> index b027e43..8c06dbb 100644
> --- a/cpu-common.h
> +++ b/cpu-common.h
> @@ -26,9 +26,6 @@ enum device_endian {
>      DEVICE_LITTLE_ENDIAN,
>  };
>  
> -/* address in the RAM (different from a physical address) */
> -typedef unsigned long ram_addr_t;
> -
>  /* memory API */
>  
>  typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, uint32_t value);
> diff --git a/hw/boards.h b/hw/boards.h
> index 716fd7b..560dbaf 100644
> --- a/hw/boards.h
> +++ b/hw/boards.h
> @@ -19,6 +19,7 @@ typedef struct QEMUMachine {
>      QEMUMachineInitFunc *init;
>      int use_scsi;
>      int max_cpus;
> +    unsigned migration_format;
>      unsigned int no_serial:1,
>          no_parallel:1,
>          use_virtcon:1,
> diff --git a/qemu-common.h b/qemu-common.h
> index 109498d..550fe2c 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -119,6 +119,9 @@ static inline char *realpath(const char *path, char *resolved_path)
>  #define PRIo64 "I64o"
>  #endif
>  
> +/* address in the RAM (different from a physical address) */
> +typedef unsigned long ram_addr_t;
> +
>  /* FIXME: Remove NEED_CPU_H.  */
>  #ifndef NEED_CPU_H
>  
> diff --git a/savevm.c b/savevm.c
> index 8139bc7..74e6e99 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -72,6 +72,7 @@
>  #include "qemu-common.h"
>  #include "hw/hw.h"
>  #include "hw/qdev.h"
> +#include "hw/boards.h"
>  #include "net.h"
>  #include "monitor.h"
>  #include "sysemu.h"
> @@ -1474,7 +1475,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
>      }
>      
>      qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
> -    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
> +    qemu_put_be32(f, current_machine->migration_format ?: QEMU_VM_FILE_VERSION);
>  
>      QTAILQ_FOREACH(se, &savevm_handlers, entry) {
>          int len;
> @@ -1747,8 +1748,10 @@ int qemu_loadvm_state(QEMUFile *f)
>          fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
>          return -ENOTSUP;
>      }
> -    if (v != QEMU_VM_FILE_VERSION)
> +    if (v != (current_machine->migration_format ?: QEMU_VM_FILE_VERSION)) {
> +        fprintf(stderr, "Mismatching SaveVM format v%d\n", v);
>          return -ENOTSUP;
> +    }
>  
>      while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>          uint32_t instance_id, version_id, section_id;
> -- 
> 1.7.5.2
> 
> 

* Re: [Qemu-devel] [RFC PATCH 1/4] add support for machine models to specify their migration format
  2011-06-30 18:11   ` Michael S. Tsirkin
@ 2011-07-01  6:10     ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2011-07-01  6:10 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-devel

On 06/30/2011 08:11 PM, Michael S. Tsirkin wrote:
> >  Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>
> Should not machine describe guest behaviour?
> It seems that a flag to the migrate/save command to control format
> would be better - this way the same machine can
> migrate to different formats.

I thought about it, but there is no savevm state variable (only
migration state, and that is unused for snapshotting).  So it seemed too
much work for an RFC series I wanted to push out fast.

Besides, you need to store a default somewhere, and the default must be
the old format for versioned machine models; perhaps not for upstream
QEMU, where new->old migration is a wild bet anyway, but for downstream
it must be.  So I think you are suggesting a superset of this patch.
I'm not even sure the flag is necessary, though: this is a correctness
fix, so it should be on for everyone.

Paolo

* Re: [Qemu-devel] [RFC PATCH 1/4] add support for machine models to specify their migration format
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 1/4] add support for machine models to specify their " Paolo Bonzini
  2011-06-30 18:11   ` Michael S. Tsirkin
@ 2011-07-29 13:08   ` Anthony Liguori
  2011-07-29 14:35     ` Paolo Bonzini
  1 sibling, 1 reply; 47+ messages in thread
From: Anthony Liguori @ 2011-07-29 13:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On 06/30/2011 10:46 AM, Paolo Bonzini wrote:
> We need to provide a new migration format, and not break migration
> in old machine models.  So add a migration_format field to QEMUMachine.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
>   cpu-common.h  |    3 ---
>   hw/boards.h   |    1 +
>   qemu-common.h |    3 +++
>   savevm.c      |    7 +++++--
>   4 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/cpu-common.h b/cpu-common.h
> index b027e43..8c06dbb 100644
> --- a/cpu-common.h
> +++ b/cpu-common.h
> @@ -26,9 +26,6 @@ enum device_endian {
>       DEVICE_LITTLE_ENDIAN,
>   };
>
> -/* address in the RAM (different from a physical address) */
> -typedef unsigned long ram_addr_t;
> -
>   /* memory API */
>
>   typedef void CPUWriteMemoryFunc(void *opaque, target_phys_addr_t addr, uint32_t value);
> diff --git a/hw/boards.h b/hw/boards.h
> index 716fd7b..560dbaf 100644
> --- a/hw/boards.h
> +++ b/hw/boards.h
> @@ -19,6 +19,7 @@ typedef struct QEMUMachine {
>       QEMUMachineInitFunc *init;
>       int use_scsi;
>       int max_cpus;
> +    unsigned migration_format;
>       unsigned int no_serial:1,
>           no_parallel:1,
>           use_virtcon:1,
> diff --git a/qemu-common.h b/qemu-common.h
> index 109498d..550fe2c 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -119,6 +119,9 @@ static inline char *realpath(const char *path, char *resolved_path)
>   #define PRIo64 "I64o"
>   #endif
>
> +/* address in the RAM (different from a physical address) */
> +typedef unsigned long ram_addr_t;
> +
>   /* FIXME: Remove NEED_CPU_H.  */
>   #ifndef NEED_CPU_H
>
> diff --git a/savevm.c b/savevm.c
> index 8139bc7..74e6e99 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -72,6 +72,7 @@
>   #include "qemu-common.h"
>   #include "hw/hw.h"
>   #include "hw/qdev.h"
> +#include "hw/boards.h"
>   #include "net.h"
>   #include "monitor.h"
>   #include "sysemu.h"
> @@ -1474,7 +1475,7 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
>       }
>
>       qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
> -    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
> +    qemu_put_be32(f, current_machine->migration_format ?: QEMU_VM_FILE_VERSION);

Please avoid this gcc extension as it's relatively obscure.  But in
addition, why would you use 0 as the new format instead of
QEMU_VM_FILE_VERSION + 1?

Regards,

Anthony Liguori

>       QTAILQ_FOREACH(se,&savevm_handlers, entry) {
>           int len;
> @@ -1747,8 +1748,10 @@ int qemu_loadvm_state(QEMUFile *f)
>           fprintf(stderr, "SaveVM v2 format is obsolete and don't work anymore\n");
>           return -ENOTSUP;
>       }
> -    if (v != QEMU_VM_FILE_VERSION)
> +    if (v != (current_machine->migration_format ?: QEMU_VM_FILE_VERSION)) {
> +        fprintf(stderr, "Mismatching SaveVM format v%d\n", v);
>           return -ENOTSUP;
> +    }
>
>       while ((section_type = qemu_get_byte(f)) != QEMU_VM_EOF) {
>           uint32_t instance_id, version_id, section_id;

* Re: [Qemu-devel] [RFC PATCH 1/4] add support for machine models to specify their migration format
  2011-07-29 13:08   ` Anthony Liguori
@ 2011-07-29 14:35     ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2011-07-29 14:35 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On 07/29/2011 03:08 PM, Anthony Liguori wrote:
>
> Please avoid this gcc extension as it's relatively obscure.  But in
> addition, why would you use 0 as the new format instead of
> QEMU_VM_FILE_VERSION + 1?

0 is the default.  If a machine doesn't specify a format, it gets the 
newest one automatically.
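
The default rule can also be spelled without the GNU "?:" shorthand.
The following is only a sketch under stated assumptions: ToyMachine and
effective_migration_format are hypothetical names that do not appear in
the series, and QEMU_VM_FILE_VERSION is given the value it has after
patch 3.

```c
#include <assert.h>
#include <stddef.h>

/* Value of QEMU_VM_FILE_VERSION after patch 3. */
#define QEMU_VM_FILE_VERSION 0x00000004

/* Hypothetical stand-in for the relevant part of QEMUMachine. */
typedef struct {
    const char *name;
    unsigned migration_format;   /* 0 means "not specified" */
} ToyMachine;

/*
 * Portable equivalent of
 *     m->migration_format ?: QEMU_VM_FILE_VERSION
 * A machine that leaves the field at 0 gets the newest format.
 */
unsigned effective_migration_format(const ToyMachine *m)
{
    assert(m != NULL);
    return m->migration_format ? m->migration_format
                               : QEMU_VM_FILE_VERSION;
}
```

A machine declared without the field (as pc-0.15 is in this series)
resolves to 4, while one that sets .migration_format = 3 keeps 3, which
is why 0 rather than QEMU_VM_FILE_VERSION + 1 marks the default.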

Paolo

* [Qemu-devel] [RFC PATCH 2/4] add pc-0.14 machine
  2011-06-30 15:46 [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format Paolo Bonzini
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 1/4] add support for machine models to specify their " Paolo Bonzini
@ 2011-06-30 15:46 ` Paolo Bonzini
  2011-08-05 19:26   ` Bruce Rogers
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 3/4] savevm: define new unambiguous migration format Paolo Bonzini
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2011-06-30 15:46 UTC (permalink / raw)
  To: qemu-devel

The new pc-0.15 machine will have a different migration format, so
define the compatibility one right now.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pc_piix.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index c5c16b4..18cc942 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -258,7 +258,7 @@ static void pc_xen_hvm_init(ram_addr_t ram_size,
 #endif
 
 static QEMUMachine pc_machine = {
-    .name = "pc-0.14",
+    .name = "pc-0.15",
     .alias = "pc",
     .desc = "Standard PC",
     .init = pc_init_pci,
@@ -266,6 +266,13 @@ static QEMUMachine pc_machine = {
     .is_default = 1,
 };
 
+static QEMUMachine pc_machine_v0_14 = {
+    .name = "pc-0.14",
+    .desc = "Standard PC",
+    .init = pc_init_pci,
+    .max_cpus = 255,
+};
+
 static QEMUMachine pc_machine_v0_13 = {
     .name = "pc-0.13",
     .desc = "Standard PC",
@@ -482,6 +489,7 @@ static QEMUMachine xenfv_machine = {
 static void pc_machine_init(void)
 {
     qemu_register_machine(&pc_machine);
+    qemu_register_machine(&pc_machine_v0_14);
     qemu_register_machine(&pc_machine_v0_13);
     qemu_register_machine(&pc_machine_v0_12);
     qemu_register_machine(&pc_machine_v0_11);
-- 
1.7.5.2

* Re: [Qemu-devel] [RFC PATCH 2/4] add pc-0.14 machine
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 2/4] add pc-0.14 machine Paolo Bonzini
@ 2011-08-05 19:26   ` Bruce Rogers
  2011-08-05 19:41     ` Anthony Liguori
  0 siblings, 1 reply; 47+ messages in thread
From: Bruce Rogers @ 2011-08-05 19:26 UTC (permalink / raw)
  To: qemu-devel, Paolo Bonzini

So, do we not need this change, then, to go along with the 0.15 release?

Bruce

 >>> On 6/30/2011 at 09:46 AM, Paolo Bonzini <pbonzini@redhat.com> wrote: 
> The new pc-0.15 machine will have a different migration format, so
> define the compatibility one right now.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  hw/pc_piix.c |   10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/pc_piix.c b/hw/pc_piix.c
> index c5c16b4..18cc942 100644
> --- a/hw/pc_piix.c
> +++ b/hw/pc_piix.c
> @@ -258,7 +258,7 @@ static void pc_xen_hvm_init(ram_addr_t ram_size,
>  #endif
>  
>  static QEMUMachine pc_machine = {
> -    .name = "pc-0.14",
> +    .name = "pc-0.15",
>      .alias = "pc",
>      .desc = "Standard PC",
>      .init = pc_init_pci,
> @@ -266,6 +266,13 @@ static QEMUMachine pc_machine = {
>      .is_default = 1,
>  };
>  
> +static QEMUMachine pc_machine_v0_14 = {
> +    .name = "pc-0.14",
> +    .desc = "Standard PC",
> +    .init = pc_init_pci,
> +    .max_cpus = 255,
> +};
> +
>  static QEMUMachine pc_machine_v0_13 = {
>      .name = "pc-0.13",
>      .desc = "Standard PC",
> @@ -482,6 +489,7 @@ static QEMUMachine xenfv_machine = {
>  static void pc_machine_init(void)
>  {
>      qemu_register_machine(&pc_machine);
> +    qemu_register_machine(&pc_machine_v0_14);
>      qemu_register_machine(&pc_machine_v0_13);
>      qemu_register_machine(&pc_machine_v0_12);
>      qemu_register_machine(&pc_machine_v0_11);

* Re: [Qemu-devel] [RFC PATCH 2/4] add pc-0.14 machine
  2011-08-05 19:26   ` Bruce Rogers
@ 2011-08-05 19:41     ` Anthony Liguori
  0 siblings, 0 replies; 47+ messages in thread
From: Anthony Liguori @ 2011-08-05 19:41 UTC (permalink / raw)
  To: Bruce Rogers; +Cc: Paolo Bonzini, qemu-devel

On 08/05/2011 02:26 PM, Bruce Rogers wrote:
> So, do we not need this change, then, to go along with the 0.15 release?

Strictly speaking, the 0.15 machine type is identical to the 0.14 
machine type.  It wouldn't hurt to have a 0.15 machine alias but I don't 
think it's a strict requirement.

Regards,

Anthony Liguori

> Bruce
>
>   >>>  On 6/30/2011 at 09:46 AM, Paolo Bonzini<pbonzini@redhat.com>  wrote:
>> The new pc-0.15 machine will have a different migration format, so
>> define the compatibility one right now.
>>
>> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
>> ---
>>   hw/pc_piix.c |   10 +++++++++-
>>   1 files changed, 9 insertions(+), 1 deletions(-)
>>
>> diff --git a/hw/pc_piix.c b/hw/pc_piix.c
>> index c5c16b4..18cc942 100644
>> --- a/hw/pc_piix.c
>> +++ b/hw/pc_piix.c
>> @@ -258,7 +258,7 @@ static void pc_xen_hvm_init(ram_addr_t ram_size,
>>   #endif
>>
>>   static QEMUMachine pc_machine = {
>> -    .name = "pc-0.14",
>> +    .name = "pc-0.15",
>>       .alias = "pc",
>>       .desc = "Standard PC",
>>       .init = pc_init_pci,
>> @@ -266,6 +266,13 @@ static QEMUMachine pc_machine = {
>>       .is_default = 1,
>>   };
>>
>> +static QEMUMachine pc_machine_v0_14 = {
>> +    .name = "pc-0.14",
>> +    .desc = "Standard PC",
>> +    .init = pc_init_pci,
>> +    .max_cpus = 255,
>> +};
>> +
>>   static QEMUMachine pc_machine_v0_13 = {
>>       .name = "pc-0.13",
>>       .desc = "Standard PC",
>> @@ -482,6 +489,7 @@ static QEMUMachine xenfv_machine = {
>>   static void pc_machine_init(void)
>>   {
>>       qemu_register_machine(&pc_machine);
>> +    qemu_register_machine(&pc_machine_v0_14);
>>       qemu_register_machine(&pc_machine_v0_13);
>>       qemu_register_machine(&pc_machine_v0_12);
>>       qemu_register_machine(&pc_machine_v0_11);

* [Qemu-devel] [RFC PATCH 3/4] savevm: define new unambiguous migration format
  2011-06-30 15:46 [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format Paolo Bonzini
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 1/4] add support for machine models to specify their " Paolo Bonzini
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 2/4] add pc-0.14 machine Paolo Bonzini
@ 2011-06-30 15:46 ` Paolo Bonzini
  2011-07-29 13:12   ` Anthony Liguori
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 4/4] Partially revert "savevm: fix corruption in vmstate_subsection_load()." Paolo Bonzini
  2011-07-25 21:10 ` [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format Paolo Bonzini
  4 siblings, 1 reply; 47+ messages in thread
From: Paolo Bonzini @ 2011-06-30 15:46 UTC (permalink / raw)
  To: qemu-devel

With the current migration format, VMS_STRUCTs with subsections
are ambiguous.  The protocol cannot tell whether a 0x5 byte after
the VMS_STRUCT is a subsection or part of the parent data stream.
In the past QEMU assumed it was always a part of a subsection; after
commit eb60260 (savevm: fix corruption in vmstate_subsection_load().,
2011-02-03) the choice depends on whether the VMS_STRUCT has subsections
defined.

Unfortunately, this means that if a destination has no subsections
defined for the struct, it will happily read subsection data into
its own fields.  And if you are "lucky" enough to stumble on a
zero byte at the right time, it will be interpreted as QEMU_VM_EOF
and migration will be interrupted.

There is no way out of this except defining an incompatible
migration protocol with a sentinel at the end of embedded structs.
Of course, this is restricted to new machine models.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 hw/pc_piix.c |    6 ++++++
 savevm.c     |   27 +++++++++++++++++++--------
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 18cc942..d8d629c 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -271,6 +271,7 @@ static QEMUMachine pc_machine_v0_14 = {
     .desc = "Standard PC",
     .init = pc_init_pci,
     .max_cpus = 255,
+    .migration_format = 3,
 };
 
 static QEMUMachine pc_machine_v0_13 = {
@@ -278,6 +279,7 @@ static QEMUMachine pc_machine_v0_13 = {
     .desc = "Standard PC",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+    .migration_format = 3,
     .compat_props = (GlobalProperty[]) {
         {
             .driver   = "virtio-9p-pci",
@@ -317,6 +319,7 @@ static QEMUMachine pc_machine_v0_12 = {
     .desc = "Standard PC",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+    .migration_format = 3,
     .compat_props = (GlobalProperty[]) {
         {
             .driver   = "virtio-serial-pci",
@@ -360,6 +363,7 @@ static QEMUMachine pc_machine_v0_11 = {
     .desc = "Standard PC, qemu 0.11",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+    .migration_format = 3,
     .compat_props = (GlobalProperty[]) {
         {
             .driver   = "virtio-blk-pci",
@@ -411,6 +415,7 @@ static QEMUMachine pc_machine_v0_10 = {
     .desc = "Standard PC, qemu 0.10",
     .init = pc_init_pci_no_kvmclock,
     .max_cpus = 255,
+    .migration_format = 3,
     .compat_props = (GlobalProperty[]) {
         {
             .driver   = "virtio-blk-pci",
@@ -474,6 +479,7 @@ static QEMUMachine isapc_machine = {
     .desc = "ISA-only PC",
     .init = pc_init_isa,
     .max_cpus = 1,
+    .migration_format = 3,
 };
 
 #ifdef CONFIG_XEN
diff --git a/savevm.c b/savevm.c
index 74e6e99..654770a 100644
--- a/savevm.c
+++ b/savevm.c
@@ -158,6 +158,14 @@ void qemu_announce_self(void)
 
 #define IO_BUF_SIZE 32768
 
+#define QEMU_VM_EOF                  0x00
+#define QEMU_VM_SECTION_START        0x01
+#define QEMU_VM_SECTION_PART         0x02
+#define QEMU_VM_SECTION_END          0x03
+#define QEMU_VM_SECTION_FULL         0x04
+#define QEMU_VM_SUBSECTION           0x05
+#define QEMU_VM_SUBSECTIONS_END      0x06
+
 struct QEMUFile {
     QEMUFilePutBufferFunc *put_buffer;
     QEMUFileGetBufferFunc *get_buffer;
@@ -1348,6 +1356,12 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
                 }
                 if (field->flags & VMS_STRUCT) {
                     ret = vmstate_load_state(f, field->vmsd, addr, field->vmsd->version_id);
+                    if (!current_machine->migration_format ||
+                        current_machine->migration_format >= 4) {
+                        if (qemu_get_byte(f) != QEMU_VM_SUBSECTIONS_END) {
+                            return -EINVAL;
+                        }
+                    }
                 } else {
                     ret = field->info->get(f, addr, size);
 
@@ -1410,6 +1424,10 @@ void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
                 }
                 if (field->flags & VMS_STRUCT) {
                     vmstate_save_state(f, field->vmsd, addr);
+                    if (!current_machine->migration_format ||
+                        current_machine->migration_format >= 4) {
+                        qemu_put_byte(f, QEMU_VM_SUBSECTIONS_END);
+                    }
                 } else {
                     field->info->put(f, addr, size);
                 }
@@ -1439,14 +1457,7 @@ static void vmstate_save(QEMUFile *f, SaveStateEntry *se)
 
 #define QEMU_VM_FILE_MAGIC           0x5145564d
 #define QEMU_VM_FILE_VERSION_COMPAT  0x00000002
-#define QEMU_VM_FILE_VERSION         0x00000003
-
-#define QEMU_VM_EOF                  0x00
-#define QEMU_VM_SECTION_START        0x01
-#define QEMU_VM_SECTION_PART         0x02
-#define QEMU_VM_SECTION_END          0x03
-#define QEMU_VM_SECTION_FULL         0x04
-#define QEMU_VM_SUBSECTION           0x05
+#define QEMU_VM_FILE_VERSION         0x00000004
 
 bool qemu_savevm_state_blocked(Monitor *mon)
 {
-- 
1.7.5.2

* Re: [Qemu-devel] [RFC PATCH 3/4] savevm: define new unambiguous migration format
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 3/4] savevm: define new unambiguous migration format Paolo Bonzini
@ 2011-07-29 13:12   ` Anthony Liguori
  2011-07-29 14:35     ` Paolo Bonzini
  0 siblings, 1 reply; 47+ messages in thread
From: Anthony Liguori @ 2011-07-29 13:12 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On 06/30/2011 10:46 AM, Paolo Bonzini wrote:
> With the current migration format, VMS_STRUCTs with subsections
> are ambiguous.  The protocol cannot tell whether a 0x5 byte after
> the VMS_STRUCT is a subsection or part of the parent data stream.
> In the past QEMU assumed it was always a part of a subsection; after
> commit eb60260 (savevm: fix corruption in vmstate_subsection_load().,
> 2011-02-03) the choice depends on whether the VMS_STRUCT has subsections
> defined.
>
> Unfortunately, this means that if a destination has no subsections
> defined for the struct, it will happily read subsection data into
> its own fields.  And if you are "lucky" enough to stumble on a
> zero byte at the right time, it will be interpreted as QEMU_VM_EOF
> and migration will be interrupted.
>
> There is no way out of this except defining an incompatible
> migration protocol with a sentinel at the end of embedded structs.
> Of course, this is restricted to new machine models.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
>   hw/pc_piix.c |    6 ++++++
>   savevm.c     |   27 +++++++++++++++++++--------
>   2 files changed, 25 insertions(+), 8 deletions(-)
>
> diff --git a/hw/pc_piix.c b/hw/pc_piix.c
> index 18cc942..d8d629c 100644
> --- a/hw/pc_piix.c
> +++ b/hw/pc_piix.c
> @@ -271,6 +271,7 @@ static QEMUMachine pc_machine_v0_14 = {
>       .desc = "Standard PC",
>       .init = pc_init_pci,
>       .max_cpus = 255,
> +    .migration_format = 3,
>   };

Please introduce a macro so this code is readable.

We have other machines that support migration in other archs too.

Regards,

Anthony Liguori

* Re: [Qemu-devel] [RFC PATCH 3/4] savevm: define new unambiguous migration format
  2011-07-29 13:12   ` Anthony Liguori
@ 2011-07-29 14:35     ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2011-07-29 14:35 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

On 07/29/2011 03:12 PM, Anthony Liguori wrote:
> Please introduce a macro so this code is readable.

Ok.

> We have other machines that support migration in other archs too.

Those machine types are not versioned, so they will automatically switch 
to the newest version.
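
One possible shape for such a macro, purely as a sketch: the TOY_* names
and toy_format_has_sentinel are hypothetical and are not what a respin
would necessarily use.  The predicate mirrors the condition that patch 3
open-codes in vmstate_load_state() and vmstate_save_state().

```c
#include <assert.h>

/* Hypothetical names; the thread does not settle on any. */
#define TOY_MIGRATION_FORMAT_DEFAULT 0u  /* unset: newest format    */
#define TOY_MIGRATION_FORMAT_V3      3u  /* legacy, no sentinel     */
#define TOY_MIGRATION_FORMAT_V4      4u  /* sentinel after structs  */

/*
 * Same test as in patch 3:
 *     !migration_format || migration_format >= 4
 * Returns 1 if embedded structs are followed by QEMU_VM_SUBSECTIONS_END.
 */
int toy_format_has_sentinel(unsigned migration_format)
{
    /* Assumption: v2 and older never reach this point. */
    assert(migration_format == TOY_MIGRATION_FORMAT_DEFAULT ||
           migration_format >= TOY_MIGRATION_FORMAT_V3);

    return migration_format == TOY_MIGRATION_FORMAT_DEFAULT ||
           migration_format >= TOY_MIGRATION_FORMAT_V4;
}
```

An unversioned machine type leaves the field unset, so it falls into the
TOY_MIGRATION_FORMAT_DEFAULT case and gets the sentinel; the versioned pc
machines would say .migration_format = TOY_MIGRATION_FORMAT_V3 instead of
a bare 3.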

Paolo

* [Qemu-devel] [RFC PATCH 4/4] Partially revert "savevm: fix corruption in vmstate_subsection_load()."
  2011-06-30 15:46 [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format Paolo Bonzini
                   ` (2 preceding siblings ...)
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 3/4] savevm: define new unambiguous migration format Paolo Bonzini
@ 2011-06-30 15:46 ` Paolo Bonzini
  2011-07-25 21:10 ` [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format Paolo Bonzini
  4 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2011-06-30 15:46 UTC (permalink / raw)
  To: qemu-devel

This reverts the additional check in commit eb60260d (but not the
assertions).

The new format does not require the check, and with the old format
it traded one kind of bogus failure for a different kind of silent
failure.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 savevm.c |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/savevm.c b/savevm.c
index 654770a..6c726ec 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1679,10 +1679,6 @@ static int vmstate_subsection_load(QEMUFile *f, const VMStateDescription *vmsd,
 {
     const VMStateSubsection *sub = vmsd->subsections;
 
-    if (!sub || !sub->needed) {
-        return 0;
-    }
-
     while (qemu_peek_byte(f) == QEMU_VM_SUBSECTION) {
         char idstr[256];
         int ret;
-- 
1.7.5.2

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-06-30 15:46 [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format Paolo Bonzini
                   ` (3 preceding siblings ...)
  2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 4/4] Partially revert "savevm: fix corruption in vmstate_subsection_load()." Paolo Bonzini
@ 2011-07-25 21:10 ` Paolo Bonzini
  2011-07-25 23:23   ` Anthony Liguori
  2011-07-29 13:14   ` Anthony Liguori
  4 siblings, 2 replies; 47+ messages in thread
From: Paolo Bonzini @ 2011-07-25 21:10 UTC (permalink / raw)
  To: qemu-devel, anthony, mst, quintela

On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini <pbonzini@redhat.com> wrote:
> With the current migration format, VMS_STRUCTs with subsections
> are ambiguous.  The protocol cannot tell whether a 0x5 byte after
> the VMS_STRUCT is a subsection or part of the parent data stream.
> In the past QEMU assumed it was always a part of a subsection; after
> commit eb60260 (savevm: fix corruption in vmstate_subsection_load(),
> 2011-02-03) the choice depends on whether the VMS_STRUCT has subsections
> defined.
>
> Unfortunately, this means that if a destination has no subsections
> defined for the struct, it will happily read subsection data into
> its own fields.  And if you are "lucky" enough to stumble on a
> zero byte at the right time, it will be interpreted as QEMU_VM_EOF
> and migration will be interrupted with half-loaded state.
>
> There is no way out of this except defining an incompatible
> migration protocol.  Not-so-long-term we should really try to define
> one that is not a joke, but the bug is serious so we need a solution
> for 0.15.  A sentinel at the end of embedded structs does remove the
> ambiguity.
>
> Of course, this can be restricted to new machine models, and this
> is what the patch series does.  (And note that only patch 3 is specific
> to the short-term solution, everything else is entirely generic).
>
> Untested beyond compilation.

I have now tested this series (exactly as sent) both by examining
manually the differences between the two formats on the same guest
state, and by a mix of saves/restores (new on new, 0.14 on new
pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL).  It
always does what is expected.

Michael Tsirkin objected that the format should be passed as a
parameter in the migrate command.  I kind of agree, however since this
is a real bug you would need to bump the default for new machine
types, and this default would still go in the QEMUMachine struct like
I am doing.  So I consider the two settings to be orthogonal.  Also,
the alternative requires changes to the whole management stack and if
the default is not changed it imposes a broken format unless you
update the management tools.  Clearly much less bang for the buck.

I think this is ready to go into 0.15.  The bug happens when migrating
to 0.14 a pc-0.14 machine created with QEMU 0.15 and which has a
floppy.  The media changed subsection is almost always included, and
this causes problems when migrating to 0.14 which didn't have any
subsection for the floppy device.  While QEMU support for migration to
old version admittedly depends on luck, this isn't true of certain
downstreams :) which would like to have an unambiguous migration
format.
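
To illustrate the ambiguity, here is a minimal sketch in C.  It is not
code from the series: Stream, peek_byte() and HYP_STRUCT_END are
hypothetical stand-ins, and the sentinel value is invented.  Only the
0x05 and 0x00 marker values come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QEMU_VM_EOF        0x00
#define QEMU_VM_SUBSECTION 0x05
/* Hypothetical sentinel value, chosen only for this sketch. */
#define HYP_STRUCT_END     0xFE

/* Hypothetical stand-in for QEMUFile: a byte buffer with a read cursor. */
typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t pos;
} Stream;

/* Look at the next byte without consuming it; -1 at end of buffer. */
static int peek_byte(const Stream *s)
{
    return s->pos < s->len ? s->buf[s->pos] : -1;
}

/*
 * Old format: after an embedded struct's fields, a 0x05 byte is taken
 * as the start of a subsection, but only if the loading side declares
 * subsections for that struct.  The same byte can be parent data.
 */
static int old_format_sees_subsection(const Stream *s, int has_subsections)
{
    return has_subsections && peek_byte(s) == QEMU_VM_SUBSECTION;
}

/*
 * Sentinel variant: the fields are followed by zero or more subsections
 * and then one end marker, so the byte after the fields is never parent
 * data.  Returns 1 for a subsection, 0 for end of struct, -1 otherwise.
 */
static int new_format_next(const Stream *s)
{
    int b = peek_byte(s);

    if (b == QEMU_VM_SUBSECTION) {
        return 1;
    }
    if (b == HYP_STRUCT_END) {
        return 0;
    }
    return -1;
}
```

With the old rule the answer depends on what the destination declares,
not on what the source wrote; with a sentinel the stream decides.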

Paolo


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-25 21:10 ` [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format Paolo Bonzini
@ 2011-07-25 23:23   ` Anthony Liguori
  2011-07-26  9:42     ` Daniel P. Berrange
                       ` (2 more replies)
  2011-07-29 13:14   ` Anthony Liguori
  1 sibling, 3 replies; 47+ messages in thread
From: Anthony Liguori @ 2011-07-25 23:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Ryan Harper, Stefan Hajnoczi, Juan Quintela, mst, qemu-devel

On 07/25/2011 04:10 PM, Paolo Bonzini wrote:
> On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini<pbonzini@redhat.com>  wrote:
>> With the current migration format, VMS_STRUCTs with subsections
>> are ambiguous.  The protocol cannot tell whether a 0x5 byte after
>> the VMS_STRUCT is a subsection or part of the parent data stream.
>> In the past QEMU assumed it was always a part of a subsection; after
>> commit eb60260 (savevm: fix corruption in vmstate_subsection_load(),
>> 2011-02-03) the choice depends on whether the VMS_STRUCT has subsections
>> defined.
>>
>> Unfortunately, this means that if a destination has no subsections
>> defined for the struct, it will happily read subsection data into
>> its own fields.  And if you are "lucky" enough to stumble on a
>> zero byte at the right time, it will be interpreted as QEMU_VM_EOF
>> and migration will be interrupted with half-loaded state.
>>
>> There is no way out of this except defining an incompatible
>> migration protocol.  Not-so-long-term we should really try to define
>> one that is not a joke, but the bug is serious so we need a solution
>> for 0.15.  A sentinel at the end of embedded structs does remove the
>> ambiguity.
>>
>> Of course, this can be restricted to new machine models, and this
>> is what the patch series does.  (And note that only patch 3 is specific
>> to the short-term solution, everything else is entirely generic).
>>
>> Untested beyond compilation.
>
> I have now tested this series (exactly as sent) both by examining
> manually the differences between the two formats on the same guest
> state, and by a mix of saves/restores (new on new, 0.14 on new
> pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL).  It
> always does what is expected.
>
> Michael Tsirkin objected that the format should be passed as a
> parameter in the migrate command.  I kind of agree, however since this
> is a real bug you would need to bump the default for new machine
> types, and this default would still go in the QEMUMachine struct like
> I am doing.  So I consider the two settings to be orthogonal.  Also,
> the alternative requires changes to the whole management stack and if
> the default is not changed it imposes a broken format unless you
> update the management tools.  Clearly much less bang for the buck.
>
> I think this is ready to go into 0.15.

I'll take a look for 0.15.

> The bug happens when migrating
> to 0.14 a pc-0.14 machine created with QEMU 0.15 and which has a
> floppy.  The media changed subsection is almost always included, and
> this causes problems when migrating to 0.14 which didn't have any
> subsection for the floppy device.  While QEMU support for migration to
> old version admittedly depends on luck, this isn't true of certain
> downstreams :) which would like to have an unambiguous migration
> format.

So this got me thinking about where we're at with migration and where we 
need to go.

I actually think there might be a reasonable path forward if we attack 
the problem differently than we have so far.

== Today ==

Today we only support generating the latest serialization of devices. 
To increase the probability of the latest version working on older 
versions of QEMU, we strategically omit fields that we know can safely 
be omitted with older versions (subsections).  More than likely, 
migrating new to old won't work.

Migrating old to new is more likely to work.  We version each section in 
order to be able to identify when we're dealing with old.

But all of this logic lives in one of two forms.  Either as a 
savevm/loadvm callback that takes a QEMUFile and writes byte 
serialization to the stream in an open way (usually big endian) or 
encoded declaratively in a VMState section.

== What we need ==

We need to decompose migration into three different problems: 1) 
serializing device state 2) transforming the device model in order to 
satisfy forwards and backwards compatibility 3) encoding the serialized 
device model on the wire.

We also need a way to future proof ourselves.

== What we can do ==

1) Add migration capabilities to future proof ourselves.  I think the 
simplest way this would work is to have a 'query-migration-capabilities' 
command that returned a bitmask of supported migration features.  I 
think we also introduce a 'set-migration-capabilities' command that can 
mask some of the supported features.

A management tool would query-migration features on the source and 
destination, take the intersection of the two masks, and set that mask 
on both the source and destination.

Lack of support for these commands indicates a mask of zero which is the 
protocol we offer today.
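
A rough sketch of the mask intersection in C.  The capability names are
hypothetical (no such bits exist today), and a side that lacks the
commands is simply treated as reporting zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits, invented for this sketch. */
enum {
    MIG_CAP_SUBSECTION_SENTINEL = 1u << 0,
    MIG_CAP_SECTION_SIZES       = 1u << 1,
};

/*
 * What a management tool would compute after querying both sides:
 * only features known to source and destination stay enabled.
 * A result of zero means "use the protocol we offer today".
 */
static uint32_t negotiate_caps(uint32_t src_caps, uint32_t dst_caps)
{
    return src_caps & dst_caps;
}
```

The tool would then set the returned mask on both the source and the
destination before starting the migration.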

2) Switch to a visitor model to serialize device state.  This involves 
converting any occurrence of:

qemu_put_be32(f, port->guest_connected);

To:

visit_type_u32(v, "guest_connected", &port->guest_connected, &local_err);

It's 100% mechanical and makes absolutely no logic change.  It works 
equally well with legacy and VMstate migration handlers.

3) Add a Visitor class that operates on QEMUFile.

At this stage, we can migrate to data structures.  That means we can 
migrate to QEMUFile, QObjects, or JSON.  We could change the protocol at 
this stage to something that was still binary but had section sizes and 
things of that nature.
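
Steps 2 and 3 could look roughly like the following C sketch.  It is
only an illustration under simplifying assumptions: the Visitor layout,
BufOutputVisitor, Port and port_visit() are hypothetical, errors are a
plain int rather than a real error object, and a byte buffer stands in
for QEMUFile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal visitor: one callback per primitive type. */
typedef struct Visitor Visitor;
struct Visitor {
    void (*type_u32)(Visitor *v, const char *name, uint32_t *obj, int *err);
};

/* Dispatch to the concrete visitor unless an earlier visit failed. */
static void visit_type_u32(Visitor *v, const char *name, uint32_t *obj,
                           int *err)
{
    if (!*err) {
        v->type_u32(v, name, obj, err);
    }
}

/* Output visitor that ignores names and emits big-endian bytes. */
typedef struct {
    Visitor base;       /* must stay the first member */
    uint8_t buf[64];
    size_t len;
} BufOutputVisitor;

static void buf_put_u32(Visitor *v, const char *name, uint32_t *obj, int *err)
{
    BufOutputVisitor *bv = (BufOutputVisitor *)v;

    (void)name;         /* a JSON visitor would use the name instead */
    if (bv->len + 4 > sizeof(bv->buf)) {
        *err = 1;
        return;
    }
    bv->buf[bv->len++] = (uint8_t)(*obj >> 24);
    bv->buf[bv->len++] = (uint8_t)(*obj >> 16);
    bv->buf[bv->len++] = (uint8_t)(*obj >> 8);
    bv->buf[bv->len++] = (uint8_t)(*obj);
}

static void buf_output_visitor_init(BufOutputVisitor *bv)
{
    bv->base.type_u32 = buf_put_u32;
    bv->len = 0;
}

/* Hypothetical device state with the single field from the example. */
typedef struct {
    uint32_t guest_connected;
} Port;

/* The device only describes its fields; the visitor picks the encoding. */
static void port_visit(Visitor *v, Port *port, int *err)
{
    visit_type_u32(v, "guest_connected", &port->guest_connected, err);
}
```

The device code never sees the encoding, so a second visitor that
builds an in-memory tree could be swapped in without touching
port_visit().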

But we shouldn't stop here.

4) Compatibility logic should be extracted from the savevm functions and 
VMstate functions into separate functions that take a data structure. 
Basically, we want to have something roughly equivalent to:

QObject *e1000_migration_compatibility(QObject *src, int src_version, 
int dst_version);

We can have lots of helpers that reuse the VMstate declarative stuff to 
do this but this should be registered independent of the main 
serialization handler.

This moves us to a model where we always generate the latest 
serialization format, and then have specific ways to convert to older 
mechanisms.  It allows us to do very big backwards compatibility steps 
like convert the state of one device into two separate devices (because 
we're just dealing with in-memory data structures).

It's this step that lets us truly support compatibility with migration.
The good news is, it doesn't have to be all or nothing.  Since we 
always already generate the latest serialization format, the existing 
code only deals with migrating older versions to the latest which is 
something that isn't all that important.

So if we did this in 1.0, we could have a single function that converted 
the 1.0 device model to 1.1 and vice versa, and we'd be fine.  We 
wouldn't have to touch 200 devices to do this.

5) Once we're here, we can implement the next 5-year format.  That could 
be ASN.1 and be bidirectional or whatever makes the most sense.  We 
could support 50 formats if we wanted to.  As long as the transport is 
distinct from the serialization and compat routines, it really doesn't 
matter.

Regards,

Anthony Liguori
>
> Paolo


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-25 23:23   ` Anthony Liguori
@ 2011-07-26  9:42     ` Daniel P. Berrange
  2011-07-26  9:48     ` Stefan Hajnoczi
  2011-07-26 12:07     ` Juan Quintela
  2 siblings, 0 replies; 47+ messages in thread
From: Daniel P. Berrange @ 2011-07-26  9:42 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Ryan Harper, Stefan Hajnoczi, quintela, mst, qemu-devel,
	Paolo Bonzini

On Mon, Jul 25, 2011 at 06:23:17PM -0500, Anthony Liguori wrote:
> We also need a way to future proof ourselves.
> 
> == What we can do ==
> 
> 1) Add migration capabilities to future proof ourselves.  I think
> the simplest way this would work is to have a
> 'query-migration-capabilities' command that returned a bitmask of
> supported migration features.  I think we also introduce a
> 'set-migration-capabilities' command that can mask some of the
> supported features.
> 
> A management tool would query-migration features on the source and
> destination, take the intersection of the two masks, and set that
> mask on both the source and destination.
> 
> Lack of support for these commands indicates a mask of zero which is
> the protocol we offer today.

This sounds like a very good idea to me.

> 5) .... We could support 50 formats if we wanted to.  As long as the
> transport is distinct from the serialization and compat routines, it
> really doesn't matter.

Let's not get too carried away :-) Even just dealing with the different
ways libvirt can invoke & manage the migration process gives me ~100
test scenarios to run through for each release of libvirt. The fewer
QEMU testing combinations we need to worry about the better, because
it quickly explodes with migration as you throw different versions
into the mix.


Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-25 23:23   ` Anthony Liguori
  2011-07-26  9:42     ` Daniel P. Berrange
@ 2011-07-26  9:48     ` Stefan Hajnoczi
  2011-07-26 12:51       ` Stefan Hajnoczi
  2011-07-26 12:07     ` Juan Quintela
  2 siblings, 1 reply; 47+ messages in thread
From: Stefan Hajnoczi @ 2011-07-26  9:48 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Ryan Harper, quintela, mst, qemu-devel, Paolo Bonzini

On Mon, Jul 25, 2011 at 06:23:17PM -0500, Anthony Liguori wrote:
> On 07/25/2011 04:10 PM, Paolo Bonzini wrote:
> >On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini<pbonzini@redhat.com>  wrote:
> >>With the current migration format, VMS_STRUCTs with subsections
> >>are ambiguous.  The protocol cannot tell whether a 0x5 byte after
> >>the VMS_STRUCT is a subsection or part of the parent data stream.
> >>In the past QEMU assumed it was always a part of a subsection; after
> >>commit eb60260 (savevm: fix corruption in vmstate_subsection_load(),
> >>2011-02-03) the choice depends on whether the VMS_STRUCT has subsections
> >>defined.
> >>
> >>Unfortunately, this means that if a destination has no subsections
> >>defined for the struct, it will happily read subsection data into
> >>its own fields.  And if you are "lucky" enough to stumble on a
> >>zero byte at the right time, it will be interpreted as QEMU_VM_EOF
> >>and migration will be interrupted with half-loaded state.
> >>
> >>There is no way out of this except defining an incompatible
> >>migration protocol.  Not-so-long-term we should really try to define
> >>one that is not a joke, but the bug is serious so we need a solution
> >>for 0.15.  A sentinel at the end of embedded structs does remove the
> >>ambiguity.
> >>
> >>Of course, this can be restricted to new machine models, and this
> >>is what the patch series does.  (And note that only patch 3 is specific
> >>to the short-term solution, everything else is entirely generic).
> >>
> >>Untested beyond compilation.
> >
> >I have now tested this series (exactly as sent) both by examining
> >manually the differences between the two formats on the same guest
> >state, and by a mix of saves/restores (new on new, 0.14 on new
> >pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL).  It
> >always does what is expected.
> >
> >Michael Tsirkin objected that the format should be passed as a
> >parameter in the migrate command.  I kind of agree, however since this
> >is a real bug you would need to bump the default for new machine
> >types, and this default would still go in the QEMUMachine struct like
> >I am doing.  So I consider the two settings to be orthogonal.  Also,
> >the alternative requires changes to the whole management stack and if
> >the default is not changed it imposes a broken format unless you
> >update the management tools.  Clearly much less bang for the buck.
> >
> >I think this is ready to go into 0.15.
> 
> I'll take a look for 0.15.
> 
> >The bug happens when migrating
> >to 0.14 a pc-0.14 machine created with QEMU 0.15 and which has a
> >floppy.  The media changed subsection is almost always included, and
> >this causes problems when migrating to 0.14 which didn't have any
> >subsection for the floppy device.  While QEMU support for migration to
> >old version admittedly depends on luck, this isn't true of certain
> >downstreams :) which would like to have an unambiguous migration
> >format.
> 
> So this got me thinking about where we're at with migration and
> where we need to go.
> 
> I actually think there might be a reasonable path forward if we
> attack the problem differently than we have so far.
> 
> == Today ==
> 
> Today we only support generating the latest serialization of
> devices. To increase the probability of the latest version working
> on older versions of QEMU, we strategically omit fields that we know
> can safely be omitted with older versions (subsections).  More than
> likely, migrating new to old won't work.
> 
> Migrating old to new is more likely to work.  We version each
> section in order to be able to identify when we're dealing with old.
> 
> But all of this logic lives in one of two forms.  Either as a
> savevm/loadvm callback that takes a QEMUFile and writes byte
> serialization to the stream in an open way (usually big endian) or
> encoded declaratively in a VMState section.
> 
> == What we need ==
> 
> We need to decompose migration into three different problems: 1)
> serializing device state 2) transforming the device model in order
> to satisfy forwards and backwards compatibility 3) encoding the
> serialized device model on the wire.
> 
> We also need a way to future proof ourselves.
> 
> == What we can do ==
> 
> 1) Add migration capabilities to future proof ourselves.  I think
> the simplest way this would work is to have a
> 'query-migration-capabilities' command that returned a bitmask of
> supported migration features.  I think we also introduce a
> 'set-migration-capabilities' command that can mask some of the
> supported features.
> 
> A management tool would query-migration features on the source and
> destination, take the intersection of the two masks, and set that
> mask on both the source and destination.
> 
> Lack of support for these commands indicates a mask of zero which is
> the protocol we offer today.

When the management tool drives negotiation it is possible to do nice
error reporting (each capability bit has a meaning and detailed
incompatibility errors can be generated).

However, doing so imposes extra work on management tools - they need to
understand and drive negotiation.  If QEMU adds a new capability we
might even need to update management tools!

As a management tool author I would prefer the source and destination to
work it out amongst themselves so that I just issue the 'migrate'
command.  Negotiation can be done without the management tool's
involvement: fail migration if the initial negotiation phase fails.

> 3) Add a Visitor class that operates on QEMUFile.
> 
> At this state, we can migrate to data structures.  That means we can
> migrate to QEMUFile, QObjects, or JSON.  We could change the
> protocol at this stage to something that was still binary but had
> section sizes and things of that nature.

Just want to see if you agree that the visitors build a representation
(QObject, ASN.1, native binary) but they are not the migration protocol.
The migration protocol will embed representations generated by visitors,
but the protocol has its own messages and really controls the migration
process.

> 4) Compatibility logic should be extracted from the savevm functions
> and VMstate functions into separate functions that take a data
> structure. Basically, we want to have something roughly equivalent
> to:
> 
> QObject *e1000_migration_compatibility(QObject *src, int
> src_version, int dst_version);
> 
> We can have lots of helpers that reuse the VMstate declarative stuff
> to do this but this should be registered independent of the main
> serialization handler.

What you are describing is a compiler :).  Collect inputs, build
internal representation, perform transformations, emit outputs.

This is how I imagine the steps on the source:
1. Serialize device state into QObject representation
2. Transform QObject down to compatible version QObject.
3. Serialize QObject into binary representation.
4. Send binary to destination host.

Transforming to an older version is done on the source host.  For save
state files this doesn't work nicely since a new QEMU will produce a
file that an old QEMU doesn't understand (we don't support that today
either).
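
Steps 2 and 3 above can be sketched in C as follows.  This is an
illustration only: DevRepr is a hypothetical stand-in for a QObject,
the device and its "extra" field are invented, and only the downgrade
direction is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory representation standing in for a QObject. */
typedef struct {
    int version;
    uint32_t ctrl;
    uint32_t extra;     /* only exists from version 2 on */
} DevRepr;

/*
 * Step 2: transform down to the version the destination understands.
 * Returns 0 on success, -1 if state would be lost or the conversion
 * is not supported by this sketch.
 */
static int dev_downgrade(DevRepr *r, int dst_version)
{
    if (r->version == 2 && dst_version == 1) {
        if (r->extra != 0) {
            return -1;  /* version 1 cannot carry a non-default value */
        }
        r->version = 1;
    }
    return r->version == dst_version ? 0 : -1;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Step 3: encode the representation for the wire.
 * out must have room for 8 bytes; returns the number of bytes written.
 */
static size_t dev_encode(const DevRepr *r, uint8_t *out)
{
    size_t n = 0;

    put_be32(out + n, r->ctrl);
    n += 4;
    if (r->version >= 2) {
        put_be32(out + n, r->extra);
        n += 4;
    }
    return n;
}
```

The transform works purely on the in-memory representation, so it can
refuse the migration before a single byte reaches the destination.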

Stefan


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-26  9:48     ` Stefan Hajnoczi
@ 2011-07-26 12:51       ` Stefan Hajnoczi
  2011-07-26 13:00         ` Anthony Liguori
  0 siblings, 1 reply; 47+ messages in thread
From: Stefan Hajnoczi @ 2011-07-26 12:51 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Ryan Harper, mst, quintela, qemu-devel, Paolo Bonzini

On Tue, Jul 26, 2011 at 10:48 AM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> On Mon, Jul 25, 2011 at 06:23:17PM -0500, Anthony Liguori wrote:
>> On 07/25/2011 04:10 PM, Paolo Bonzini wrote:
>> >On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini<pbonzini@redhat.com>  wrote:
>> >>With the current migration format, VMS_STRUCTs with subsections
>> >>are ambiguous.  The protocol cannot tell whether a 0x5 byte after
>> >>the VMS_STRUCT is a subsection or part of the parent data stream.
>> >>In the past QEMU assumed it was always a part of a subsection; after
>> >>commit eb60260 (savevm: fix corruption in vmstate_subsection_load(),
>> >>2011-02-03) the choice depends on whether the VMS_STRUCT has subsections
>> >>defined.
>> >>
>> >>Unfortunately, this means that if a destination has no subsections
>> >>defined for the struct, it will happily read subsection data into
>> >>its own fields.  And if you are "lucky" enough to stumble on a
>> >>zero byte at the right time, it will be interpreted as QEMU_VM_EOF
>> >>and migration will be interrupted with half-loaded state.
>> >>
>> >>There is no way out of this except defining an incompatible
>> >>migration protocol.  Not-so-long-term we should really try to define
>> >>one that is not a joke, but the bug is serious so we need a solution
>> >>for 0.15.  A sentinel at the end of embedded structs does remove the
>> >>ambiguity.
>> >>
>> >>Of course, this can be restricted to new machine models, and this
>> >>is what the patch series does.  (And note that only patch 3 is specific
>> >>to the short-term solution, everything else is entirely generic).
>> >>
>> >>Untested beyond compilation.
>> >
>> >I have now tested this series (exactly as sent) both by examining
>> >manually the differences between the two formats on the same guest
>> >state, and by a mix of saves/restores (new on new, 0.14 on new
>> >pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL).  It
>> >always does what is expected.
>> >
>> >Michael Tsirkin objected that the format should be passed as a
>> >parameter in the migrate command.  I kind of agree, however since this
>> >is a real bug you would need to bump the default for new machine
>> >types, and this default would still go in the QEMUMachine struct like
>> >I am doing.  So I consider the two settings to be orthogonal.  Also,
>> >the alternative requires changes to the whole management stack and if
>> >the default is not changed it imposes a broken format unless you
>> >update the management tools.  Clearly much less bang for the buck.
>> >
>> >I think this is ready to go into 0.15.
>>
>> I'll take a look for 0.15.
>>
>> >The bug happens when migrating
>> >to 0.14 a pc-0.14 machine created with QEMU 0.15 and which has a
>> >floppy.  The media changed subsection is almost always included, and
>> >this causes problems when migrating to 0.14 which didn't have any
>> >subsection for the floppy device.  While QEMU support for migration to
>> >old version admittedly depends on luck, this isn't true of certain
>> >downstreams :) which would like to have an unambiguous migration
>> >format.
>>
>> So this got me thinking about where we're at with migration and
>> where we need to go.
>>
>> I actually think there might be a reasonable path forward if we
>> attack the problem differently than we have so far.
>>
>> == Today ==
>>
>> Today we only support generating the latest serialization of
>> devices. To increase the probability of the latest version working
>> on older versions of QEMU, we strategically omit fields that we know
>> can safely be omitted with older versions (subsections).  More than
>> likely, migrating new to old won't work.
>>
>> Migrating old to new is more likely to work.  We version each
>> section in order to be able to identify when we're dealing with old.
>>
>> But all of this logic lives in one of two forms.  Either as a
>> savevm/loadvm callback that takes a QEMUFile and writes byte
>> serialization to the stream in an open way (usually big endian) or
>> encoded declaratively in a VMState section.
>>
>> == What we need ==
>>
>> We need to decompose migration into three different problems: 1)
>> serializing device state 2) transforming the device model in order
>> to satisfy forwards and backwards compatibility 3) encoding the
>> serialized device model on the wire.
>>
>> We also need a way to future proof ourselves.
>>
>> == What we can do ==
>>
>> 1) Add migration capabilities to future proof ourselves.  I think
>> the simplest way this would work is to have a
>> 'query-migration-capabilities' command that returned a bitmask of
>> supported migration features.  I think we also introduce a
>> 'set-migration-capabilities' command that can mask some of the
>> supported features.
>>
>> A management tool would query-migration features on the source and
>> destination, take the intersection of the two masks, and set that
>> mask on both the source and destination.
>>
>> Lack of support for these commands indicates a mask of zero which is
>> the protocol we offer today.
>
> When the management tool drives negotiation it is possible to do nice
> error reporting (each capability bit has a meaning and detailed
> incompatibility errors can be generated).
>
> However, doing so imposes extra work on management tools - they need to
> understand and drive negotiation.  If QEMU adds a new capability we
> might even need to update management tools!
>
> As a management tool author I would prefer the source and destination to
> work it out amongst themselves so that I just issue the 'migrate'
> command.  Negotiation can be done without the management tool's
> involvement: fail migration if the initial negotiation phase fails.

An advantage I didn't think of was that management tools handling
negotiation makes negotiation out-of-band and the migration protocol
doesn't need to be changed.

It seems like the migration protocol needs an overhaul sooner or later
anyway, so perhaps it's not worth making the negotiation external.

Stefan


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-26 12:51       ` Stefan Hajnoczi
@ 2011-07-26 13:00         ` Anthony Liguori
  0 siblings, 0 replies; 47+ messages in thread
From: Anthony Liguori @ 2011-07-26 13:00 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Ryan Harper, Stefan Hajnoczi, quintela, mst, qemu-devel,
	Paolo Bonzini

On 07/26/2011 07:51 AM, Stefan Hajnoczi wrote:
> On Tue, Jul 26, 2011 at 10:48 AM, Stefan Hajnoczi
> <stefanha@linux.vnet.ibm.com>  wrote:
>> On Mon, Jul 25, 2011 at 06:23:17PM -0500, Anthony Liguori wrote:
>> However, doing so imposes extra work on management tools - they need to
>> understand and drive negotiation.  If QEMU adds a new capability we
>> might even need to update management tools!
>>
>> As a management tool author I would prefer the source and destination to
>> work it out amongst themselves so that I just issue the 'migrate'
>> command.  Negotiation can be done without the management tool's
>> involvement: fail migration if the initial negotiation phase fails.
>
> An advantage I didn't think of was that management tools handling
> negotiation makes negotiation out-of-band and the migration protocol
> doesn't need to be changed.

Not quite that; the point is that you can detect when the migration
protocol changes.  For instance, this feature would allow the following
behavior:

1) src doesn't know the new protocol, dst still supports the old 
protocol and the new protocol, migration uses old protocol.

2) src knows the new protocol, dst doesn't know the new protocol, old 
protocol is used.

3) src knows the new protocol, dst knows the new protocol, new protocol 
is used

4) src doesn't know the new protocol, dst chooses to only support the 
new protocol, migration fails gracefully.

Even if we only ever introduce a single feature, having the mechanism 
means that we can gracefully fail with a new format and have a 
transition period.
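
The four cases reduce to picking the best protocol both sides support.
A minimal sketch in C, with hypothetical names (Proto, SUPPORTS_OLD,
SUPPORTS_NEW and choose_protocol() are invented for illustration):

```c
#include <assert.h>

typedef enum { PROTO_NONE, PROTO_OLD, PROTO_NEW } Proto;

/* Hypothetical support flags reported by each side. */
#define SUPPORTS_OLD 1u
#define SUPPORTS_NEW 2u

/*
 * Prefer the new protocol when both sides know it, fall back to the
 * old one, and report PROTO_NONE so migration can fail gracefully
 * when the two sides have nothing in common.
 */
static Proto choose_protocol(unsigned src, unsigned dst)
{
    unsigned common = src & dst;

    if (common & SUPPORTS_NEW) {
        return PROTO_NEW;
    }
    if (common & SUPPORTS_OLD) {
        return PROTO_OLD;
    }
    return PROTO_NONE;
}
```

Case 4 is the one that matters for a transition period: the failure is
detected before any state is sent.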

Regards,

Anthony Liguori

>
> It seems like the migration protocol needs an overhaul sooner or later
> anyway, so perhaps it's not worth making the negotiation external.
>
> Stefan
>


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-25 23:23   ` Anthony Liguori
  2011-07-26  9:42     ` Daniel P. Berrange
  2011-07-26  9:48     ` Stefan Hajnoczi
@ 2011-07-26 12:07     ` Juan Quintela
  2011-07-26 12:37       ` Anthony Liguori
  2 siblings, 1 reply; 47+ messages in thread
From: Juan Quintela @ 2011-07-26 12:07 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Ryan Harper, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, mst

Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 07/25/2011 04:10 PM, Paolo Bonzini wrote:

> == Today ==
>
> Today we only support generating the latest serialization of
> devices. To increase the probability of the latest version working on
> older versions of QEMU, we strategically omit fields that we know can
> safely be omitted with older versions (subsections).  More than
> likely, migrating new to old won't work.
>
> Migrating old to new is more likely to work.  We version each section
> in order to be able to identify when we're dealing with old.
>
> But all of this logic lives in one of two forms.  Either as a
> savevm/loadvm callback that takes a QEMUFile and writes byte
> serialization to the stream in an open way (usually big endian) or
> encoded declaratively in a VMState section.

We have a very "poor" way to try to load a device without some features,
but support is very bad.

> == What we need ==
>
> We need to decompose migration into three different problems: 1)
> serializing device state 2) transforming the device model in order to
> satisfy forwards and backwards compatibility 3) encoding the
> serialized device model on the wire.

I would change this to:
- We need to be able to "enable/disable" features of a device.
  A.K.A. make -M pc-0.14 work with devices with the same features
  as 0.14.  Notice that this is _independent_ of migration.

- Be able to describe the different features/versions.  This is not the
  difficult part, it can be subsections, optional fields, whatever.
  The difficult part is _knowing_ which fields need to be in
  each version.  That again depends on the device, not migration.

- Be able to do forward/backward compatibility (and without
  communication between both sides this is basically impossible).

- Send things on the wire (really this is the easy part, we can play
  with it touching only migration functions).

> We also need a way to future proof ourselves.

We have been very bad at this.  Automatic checking is the only way that
I can think of.

> == What we can do ==
>
> 1) Add migration capabilities to future proof ourselves.  I think the
> simplest way this would work is to have a
> 'query-migration-capabilities' command that returned a bitmask of
> supported migration features.  I think we also introduce a
> 'set-migration-capabilities' command that can mask some of the
> supported features.

We have two things here.  Device level & protocol level.

Device level: very late to set anything.
Protocol level: we can set things here, but notice that only a few
things can be set here.

> A management tool would query-migration features on the source and
> destination, take the intersection of the two masks, and set that mask
> on both the source and destination.
>
> Lack of support for these commands indicates a mask of zero which is
> the protocol we offer today.
>
> 2) Switch to a visitor model to serialize device state.  This involves
> converting any occurrence of:
>
> qemu_put_be32(f, port->guest_connected);
>
> To:
>
> visit_type_u32(v, "guest_connected", &port->guest_connected, &local_err);

VMSTATE_INT32(guest_connected, FooState)

can be made to do this at any point.

> It's 100% mechanical and makes absolutely no logic change.  It works
> equally well with legacy and VMstate migration handlers.
>
> 3) Add a Visitor class that operates on QEMUFile.
>
> At this state, we can migrate to data structures.  That means we can
> migrate to QEMUFile, QObjects, or JSON.  We could change the protocol
> at this stage to something that was still binary but had section sizes
> and things of that nature.

That was the whole point of vmstate.

> But we shouldn't stop here.
>
> 4) Compatibility logic should be extracted from the savevm functions
> and VMstate functions into separate functions that take a data
> structure. Basically, we want to have something roughly equivalent to:
>
> QObject *e1000_migration_compatibility(QObject *src, int src_version,
> int dst_version);
>
> We can have lots of helpers that reuse the VMstate declarative stuff
> to do this but this should be registered independent of the main
> serialization handler.
>
> This moves us to a model where we always generate the latest
> serialization format, and then have specific ways to convert to older
> mechanisms.  It allows us to do very big backwards compatibility steps
> like convert the state of one device into two separate devices
> (because we're just dealing with in-memory data structures).

Paint me sceptical about this.  I don't think this is going to work
because those functions will rot very fast.

> It's this step that lets us truly support compatibility with
> migration. The good news is, it doesn't have to be all or nothing.
> Since we always already generate the latest serialization format, the
> existing code only deals with migrating older versions to the latest
> which is something that isn't all that important.
>
> So if we did this in 1.0, we could have a single function that
> converted the 1.0 device model to 1.1 and vice versa, and we'd be
> fine.  We wouldn't have to touch 200 devices to do this.

I still think this is wrong.  We are launching a device with feature
"foo", and at migration time, we want to migrate without feature
"foo".  This is not going to work in the general case.  But launching
the device _without_ feature "foo" will always work.

Notice the things that "can" be optional:
- features that are not used.  We update the device to have more
  features, but the OS driver only uses the features of the old version.
  With a subsection test, we can fix this one.

- values that are only needed sometimes.  The PIO subsection comes to
  mind, it is only needed when we are in the middle of a PIO operation.

- values that rarely change from their defaults.  This is the mmio
  address problem with rtl8139.  If we plug/unplug the card, we will get
  a different address, so we need to change it.

- values that depend on other features (change default size of memory,
  add new variables, etc).  This is by its very nature not compatible,
  and we can't migrate.

What am I complaining about here?  This "compatibility" support supposes
that migration works as:

device with some features -> migration -> device with other features

and it works.  This means that "migration" does magic, and this is never
going to work.

Until now, this kind of worked because we only supported migration from
old -> new, or the same version.  Migration from old -> new can never
have new features.  But from new -> old to work, we need a way to
disable the new features.   That is completely independent of migration.


> 5) Once we're here, we can implement the next 5-year format.  That
> could be ASN.1 and be bidirectional or whatever makes the most sense.
> We could support 50 formats if we wanted to.  As long as the transport
> is distinct from the serialization and compat routines, it really
> doesn't matter.

This means finishing the VMState support; once there, the only thing that
needs to change is to "copy" the savevm, and change the "visitors" to
whatever else we need/want.

Later, Juan.


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-26 12:07     ` Juan Quintela
@ 2011-07-26 12:37       ` Anthony Liguori
  2011-07-26 20:13         ` Juan Quintela
  2011-07-29 14:03         ` Kevin Wolf
  0 siblings, 2 replies; 47+ messages in thread
From: Anthony Liguori @ 2011-07-26 12:37 UTC (permalink / raw)
  To: quintela; +Cc: Ryan Harper, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, mst

On 07/26/2011 07:07 AM, Juan Quintela wrote:
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>> == What we need ==
>>
>> We need to decompose migration into three different problems: 1)
>> serializing device state 2) transforming the device model in order to
>> satisfy forwards and backwards compatibility 3) encoding the
>> serialized device model on the wire.
>
> I will change this to:
> - We need to be able to "enable/disable" features of a device.
>    A.K.A. make -M pc-0.14 work with devices with the same features
>    as 0.14.  Notice that this is _independent_ of migration.

In theory, we already have this with qdev flags.

> - Be able to describe the different features/versions.  This is not the
>    difficult part, it can be subsections, optional fields, whatever.
>    The difficult part is _knowing_ what fields need to be in
>    each version.  That again depends on the device, not migration.
>
> - Be able to do forward/backward compatibility (and without
>    communication between both sides this is basically impossible).

Hrm, I'm not sure I agree with these conclusions.

Management tools should do their best job to create two compatible 
device models.

Given two compatible device models, there *may* be differences in the
structure of the device models since we evolve things over time.  We may
rename a field, change the type, etc.  To support this, we can use
filters on both the sending and receiving end to do our best to
massage the device model into something compatible.

But creating two compatible device models is not the job of the
migration protocol.  It's the job of management tools.

> - Send things on the wire (really this is the easy part, we can play
>    with it touching only migration functions.).
>
>> We also need a way to future proof ourselves.
>
> We have been very bad at this.  Automatic checking is the only way that
> I can think of.

I don't know what you mean by automatic checking.

>> == What we can do ==
>>
>> 1) Add migration capabilities to future proof ourselves.  I think the
>> simplest way this would work is to have a
>> 'query-migration-capabilities' command that returned a bitmask of
>> supported migration features.  I think we also introduce a
>> 'set-migration-capabilities' command that can mask some of the
>> supported features.
>
> We have two things here.  Device level & protocol level.
>
> Device level: very late to set anything.
> Protocol level: we can set things here, but notice that only a few
> things can be set here.

Once we have a protocol level feature bit, we can add device level 
feature bits as a new feature.
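
A minimal sketch of the negotiation proposed in this thread (query both
sides, intersect, set the result on both), assuming made-up capability
bits since none are defined here.  query_caps() and negotiate_caps() are
hypothetical helper names, not QEMU functions or monitor commands:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits; the thread does not define any. */
enum {
    MIG_CAP_SECTION_SIZES  = 1u << 0,
    MIG_CAP_SUBSECTION_END = 1u << 1,
    MIG_CAP_DEVICE_BITS    = 1u << 2,
    MIG_CAP_ALL            = (1u << 3) - 1,
};

/*
 * Mask reported by one side.  A QEMU that lacks the query command is
 * treated as reporting zero, i.e. today's protocol.
 */
static uint32_t query_caps(int has_command, uint32_t supported)
{
    assert((supported & ~(uint32_t)MIG_CAP_ALL) == 0);
    return has_command ? supported : 0;
}

/* The mask a management tool would then set on source and destination. */
static uint32_t negotiate_caps(uint32_t src_mask, uint32_t dst_mask)
{
    return src_mask & dst_mask;
}
```

With this shape an old peer forces the whole mask to zero, so both sides
fall back to the existing format without any special casing.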

>> A management tool would query-migration features on the source and
>> destination, take the intersection of the two masks, and set that mask
>> on both the source and destination.
>>
>> Lack of support for these commands indicates a mask of zero which is
>> the protocol we offer today.
>>
>> 2) Switch to a visitor model to serialize device state.  This involves
>> converting any occurrence of:
>>
>> qemu_put_be32(f, port->guest_connected);
>>
>> To:
>>
>> visit_type_u32(v, "guest_connected", &port->guest_connected, &local_err);
>
> VMSTATE_INT32(guest_connected, FooState)
>
> can be made to do this at any point.
>
>> It's 100% mechanical and makes absolutely no logic change.  It works
>> equally well with legacy and VMstate migration handlers.
>>
>> 3) Add a Visitor class that operates on QEMUFile.
>>
>> At this state, we can migrate to data structures.  That means we can
>> migrate to QEMUFile, QObjects, or JSON.  We could change the protocol
>> at this stage to something that was still binary but had section sizes
>> and things of that nature.
>
> That was the whole point of vmstate.

The problem with vmstate is that it's an all or nothing thing and the 
conversion isn't programmatic.  Since visiting and qemu_put match 1-1, 
we can do the conversion all-at-once with some sed magic.
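
To make the one-to-one claim concrete, here is a rough sketch of a
visitor with a single backend that emits the same big-endian bytes a
put-be32 call would.  Everything here (Visitor, BinaryOutputVisitor,
Port, port_save) is a stand-in written for illustration, and the error
argument from the snippet quoted above is left out for brevity; this is
not the QEMU visitor API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical visitor interface: one callback per primitive type. */
typedef struct Visitor Visitor;
struct Visitor {
    void (*type_u32)(Visitor *v, const char *name, uint32_t *value);
};

/* Backend that writes big-endian bytes into a buffer. */
typedef struct {
    Visitor base;      /* must stay first so a Visitor* can be cast back */
    uint8_t buf[64];
    size_t len;
} BinaryOutputVisitor;

static void binary_type_u32(Visitor *v, const char *name, uint32_t *value)
{
    BinaryOutputVisitor *b = (BinaryOutputVisitor *)v;

    (void)name;        /* this backend does not put field names on the wire */
    assert(b->len + 4 <= sizeof(b->buf));
    b->buf[b->len++] = (uint8_t)(*value >> 24);
    b->buf[b->len++] = (uint8_t)(*value >> 16);
    b->buf[b->len++] = (uint8_t)(*value >> 8);
    b->buf[b->len++] = (uint8_t)(*value);
}

static void binary_output_visitor_init(BinaryOutputVisitor *b)
{
    b->base.type_u32 = binary_type_u32;
    b->len = 0;
}

typedef struct {
    uint32_t guest_connected;
} Port;

/* Was: qemu_put_be32(f, port->guest_connected); */
static void port_save(Visitor *v, Port *port)
{
    v->type_u32(v, "guest_connected", &port->guest_connected);
}
```

The save routine keeps its imperative shape; only the sink changes, so
a second backend (JSON, an in-memory tree) could reuse port_save() as is.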

>> So if we did this in 1.0, we could have a single function that
>> converted the 1.0 device model to 1.1 and vice versa, and we'd be
>> fine.  We wouldn't have to touch 200 devices to do this.
>
> I still think this is wrong.  We are launching a device with feature
> "foo", and at migration time, we want to migrate without feature
> "foo".  This is not going to work in the general case.  But launching
> the device _without_ feature "foo" will always work.

Don't confuse migration with creating compatible device models.  We're 
never going to support migrating from a system with an e1000 to a system 
with virtio :-)

> Notice the things that "can" be optional:
> - features that are not used.  We update the device to have more
>    features, but the OS driver only uses the features of the old version.
>    With a subsection test, we can fix this one.
>
> - values that are only needed sometimes.  The PIO subsection comes to
>    mind, it is only needed when we are in the middle of a PIO operation.
>
> - values that rarely change from their defaults.  This is the mmio
>    address problem with rtl8139.  If we plug/unplug the card, we will
>    get a different address, so we need to change it.
>
> - values that depend on other features (change default size of memory,
>    add new variables, etc).  This is by its very nature not compatible,
>    and we can't migrate.
>
> What am I complaining about here?  This "compatibility" support supposes
> that migration works as:
>
> device with some features ->  migration ->  device with other features
>
> and it works.  This means that "migration" does magic, and this is never
> going to work.
>
> Until now, this kind of worked because we only supported migration from
> old ->  new, or the same version.  Migration from old ->  new can never
> have new features.  But from new ->  old to work, we need a way to
> disable the new features.   That is completely independent of migration.

At startup time, not dynamically.  And we have this, that's what -M pc-X 
is about.

>
>> 5) Once we're here, we can implement the next 5-year format.  That
>> could be ASN.1 and be bidirectional or whatever makes the most sense.
>> We could support 50 formats if we wanted to.  As long as the transport
>> is distinct from the serialization and compat routines, it really
>> doesn't matter.
>
> This means finishing the VMState support; once there, the only thing that
> needs to change is to "copy" the savevm, and change the "visitors" to
> whatever else we need/want.

There's no need to "finish" VMState to convert to visitors.  It's just 
sed -e 's:qemu_put_be32:visit_type_int32:g'

Regards,

Anthony Liguori

>
> Later, Juan.


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-26 12:37       ` Anthony Liguori
@ 2011-07-26 20:13         ` Juan Quintela
  2011-07-26 21:46           ` Anthony Liguori
  2011-07-29 14:03         ` Kevin Wolf
  1 sibling, 1 reply; 47+ messages in thread
From: Juan Quintela @ 2011-07-26 20:13 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Ryan Harper, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, mst

Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 07/26/2011 07:07 AM, Juan Quintela wrote:

>> I will change this to:
>> - We need to be able to "enable/disable" features of a device.
>>    A.K.A. make -M pc-0.14 work with devices with the same features
>>    as 0.14.  Notice that this is _independent_ of migration.
>
> In theory, we already have this with qdev flags.

In theory.  We are not there at all :-(  But anyway, that is not
_migration_, it is qdev.


>> - Be able to describe the different features/versions.  This is not the
>>    difficult part, it can be subsections, optional fields, whatever.
>>    The difficult part is _knowing_ what fields need to be in
>>    each version.  That again depends on the device, not migration.
>>
>> - Be able to do forward/backward compatibility (and without
>>    communication between both sides this is basically impossible).
>
> Hrm, I'm not sure I agree with these conclusions.
>
> Management tools should do their best job to create two compatible
> device models.

How?  The only part that can have enough information is the "new" part
(either source or destination).  And we are being very careful about not
allowing any communication/setting of what is on the other side.

> Given two compatible device models, there *may* be differences in the
> structure of the device models since we evolve things over time.  We
> may rename a field, change the type, etc.  To support this, we can use
> filters on both the sending and receiving end to do our best to
> massage the device model into something compatible.
>
> But creating two compatible device models is not the job of
> the migration protocol.  It's the job of management tools.

Agreed here.

>> - Send things on the wire (really this is the easy part, we can play
>>    with it touching only migration functions.).
>>
>>> We also need a way to future proof ourselves.
>>
>> We have been very bad at this.  Automatic checking is the only way that
>> I can think of.
>
> I don't know what you mean by automatic checking.

We should have unit tests to check that (at least) the obvious migrations work.

>> We have two things here.  Device level & protocol level.
>>
>> Device level: very late to set anything.
>> Protocol level: we can set things here, but notice that only a few
>> things can be set here.
>
> Once we have a protocol level feature bit, we can add device level
> feature bits as a new feature.

This doesn't help: migration time is very late to configure a device.  We
need to configure it at creation time.  It makes no sense to try to
migrate device foo with 4 BARs and at migration time try to "push" it
into only 2 BARs.  Having it created with 2 BARs in the 1st place is the
only sane solution.

>>> It's 100% mechanical and makes absolutely no logic change.  It works
>>> equally well with legacy and VMstate migration handlers.
>>>
>>> 3) Add a Visitor class that operates on QEMUFile.
>>>
>>> At this state, we can migrate to data structures.  That means we can
>>> migrate to QEMUFile, QObjects, or JSON.  We could change the protocol
>>> at this stage to something that was still binary but had section sizes
>>> and things of that nature.
>>
>> That was the whole point of vmstate.
>
> The problem with vmstate is that it's an all or nothing thing and the
> conversion isn't programmatic.

This is the whole point.  We are being declarative, and we create a
mechanism for visiting all nodes.  What we do in each node is not
VMState business.  VMState only defines the nodes, and which ones belong
to each version.
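
A rough sketch of that split, under the assumption of a simplified
field table: the table only says which nodes exist and from which
version, and a generic walker hands each node to a caller-supplied
function.  DemoState, FieldDesc and walk_fields() are hypothetical and
far simpler than the real VMStateDescription; the version numbers are
illustrative and not the actual virtio-serial layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t nr_ports;         /* present since version 1 (illustrative) */
    uint32_t guest_connected;  /* added in version 3 (illustrative) */
} DemoState;

/* Declarative part: which nodes exist, and since which version. */
typedef struct {
    const char *name;
    size_t offset;
    int min_version;
} FieldDesc;

static const FieldDesc demo_fields[] = {
    { "nr_ports",        offsetof(DemoState, nr_ports),        1 },
    { "guest_connected", offsetof(DemoState, guest_connected), 3 },
};

/* What happens at each node is the caller's business, not the table's. */
typedef void (*NodeFn)(const char *name, uint32_t *value, void *opaque);

/* Visit every field that belongs to 'version'; returns the node count. */
static int walk_fields(DemoState *s, int version, NodeFn fn, void *opaque)
{
    int visited = 0;

    assert(fn != NULL);
    for (size_t i = 0; i < sizeof(demo_fields) / sizeof(demo_fields[0]); i++) {
        const FieldDesc *f = &demo_fields[i];

        if (f->min_version > version) {
            continue;          /* field does not exist in this version */
        }
        fn(f->name, (uint32_t *)((char *)s + f->offset), opaque);
        visited++;
    }
    return visited;
}

/* Example node function: add up the visited values. */
static void sum_node(const char *name, uint32_t *value, void *opaque)
{
    (void)name;
    *(uint32_t *)opaque += *value;
}
```

Note that the walker can skip a field for an older version, but that
says nothing about whether the device can actually run without it.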

> Since visiting and qemu_put match 1-1,
> we can do the conversion all-at-once with some sed magic.
>
>>> So if we did this in 1.0, we could have a single function that
>>> converted the 1.0 device model to 1.1 and vice versa, and we'd be
>>> fine.  We wouldn't have to touch 200 devices to do this.
>>
>> I still think this is wrong.  We are launching a device with feature
>> "foo", and at migration time, we want to migrate without feature
>> "foo".  This is not going to work in the general case.  But launching
>> the device _without_ feature "foo" will always work.
>
> Don't confuse migration with creating compatible device models.  We're
> never going to support migrating from a system with an e1000 to a
> system with virtio :-)

I am not confusing it.

From virtio_serial_bus.c, you can see that what we send in version 3 is
much, much more information than in version 2.  Migrating from v3 to v2
is impossible; we need to start the device with the features that it had
in v2.  Notice that this is not related to vmstate, it is related to
_how_ the device works and how its features are implemented.


>> Notice the things that "can" be optional:
>> - features that are not used.  We update the device to have more
>>    features, but the OS driver only uses the features of the old version.
>>    With a subsection test, we can fix this one.
>>
>> - values that are only needed sometimes.  The PIO subsection comes to
>>    mind, it is only needed when we are in the middle of a PIO operation.
>>
>> - values that rarely change from their defaults.  This is the mmio
>>    address problem with rtl8139.  If we plug/unplug the card, we will
>>    get a different address, so we need to change it.
>>
>> - values that depend on other features (change default size of memory,
>>    add new variables, etc).  This is by its very nature not compatible,
>>    and we can't migrate.
>>
>> What am I complaining about here?  This "compatibility" support supposes
>> that migration works as:
>>
>> device with some features ->  migration ->  device with other features
>>
>> and it works.  This means that "migration" does magic, and this is never
>> going to work.
>>
>> Until now, this kind of worked because we only supported migration from
>> old ->  new, or the same version.  Migration from old ->  new can never
>> have new features.  But from new ->  old to work, we need a way to
>> disable the new features.   That is completely independent of migration.
>
> At startup time, not dynamically.  And we have this, that's what -M
> pc-X is about.

It doesn't work.  Almost nothing has enough state to describe it.  See my
example: how do we start virtio-serial-bus with something that is
compatible with v2 of the protocol?  The device doesn't know how to do that.

>>> 5) Once we're here, we can implement the next 5-year format.  That
>>> could be ASN.1 and be bidirectional or whatever makes the most sense.
>>> We could support 50 formats if we wanted to.  As long as the transport
>>> is distinct from the serialization and compat routines, it really
>>> doesn't matter.
>>
>> This means finishing the VMState support; once there, the only thing that
>> needs to change is to "copy" the savevm, and change the "visitors" to
>> whatever else we need/want.
>
> There's no need to "finish" VMState to convert to visitors.  It's just
> sed -e 's:qemu_put_be32:visit_type_int32:g'

I don't agree.  That is just cosmetic and we are not fixing any problem.

Later, Juan.


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-26 20:13         ` Juan Quintela
@ 2011-07-26 21:46           ` Anthony Liguori
  2011-07-26 22:22             ` Peter Maydell
  0 siblings, 1 reply; 47+ messages in thread
From: Anthony Liguori @ 2011-07-26 21:46 UTC (permalink / raw)
  To: quintela; +Cc: Ryan Harper, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, mst

On 07/26/2011 03:13 PM, Juan Quintela wrote:
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>> On 07/26/2011 07:07 AM, Juan Quintela wrote:
>
>>> - Be able to describe the different features/versions.  This is not the
>>>     difficult part, it can be subsections, optional fields, whatever.
>>>     The difficult part is _knowing_ what fields need to be in
>>>     each version.  That again depends on the device, not migration.
>>>
>>> - Be able to do forward/backward compatibility (and without
>>>     communication between both sides this is basically impossible).
>>
>> Hrm, I'm not sure I agree with these conclusions.
>>
>> Management tools should do their best job to create two compatible
>> device models.
>
> How?  The only part that can have enough information is the "new" part
> (either source or destination).  And we are being very careful about not
> allowing any communication/setting of what is on the other side.

I'll explain below.

>>> - Send things on the wire (really this is the easy part, we can play
>>>     with it touching only migration functions.).
>>>
>>>> We also need a way to future proof ourselves.
>>>
>>> We have been very bad at this.  Automatic checking is the only way that
>>> I can think of.
>>
>> I don't know what you mean by automatic checking.
>
> We should have unit tests to check that (at least) the obvious migrations work.

Oh, 100% agree.  In fact, I've posted patches :)  But I wasn't happy 
with the level of completeness of those tests and want to write better 
tests which is part of my motivation in visiting this topic.

>>> We have two things here.  Device level & protocol level.
>>>
>>> Device level: very late to set anything.
>>> Protocol level: we can set things here, but notice that only a few
>>> things can be set here.
>>
>> Once we have a protocol level feature bit, we can add device level
>> feature bits as a new feature.
>
> This doesn't help: migration time is very late to configure a device.  We
> need to configure it at creation time.  It makes no sense to try to
> migrate device foo with 4 BARs and at migration time try to "push" it
> into only 2 BARs.  Having it created with 2 BARs in the 1st place is the
> only sane solution.

I misunderstood what you were suggesting.  For guest visible device 
features, they must be configured at creation time.  I'm in full agreement.

>>>> It's 100% mechanical and makes absolutely no logic change.  It works
>>>> equally well with legacy and VMstate migration handlers.
>>>>
>>>> 3) Add a Visitor class that operates on QEMUFile.
>>>>
>>>> At this state, we can migrate to data structures.  That means we can
>>>> migrate to QEMUFile, QObjects, or JSON.  We could change the protocol
>>>> at this stage to something that was still binary but had section sizes
>>>> and things of that nature.
>>>
>>> That was the whole point of vmstate.
>>
>> The problem with vmstate is that it's an all or nothing thing and the
>> conversion isn't programmatic.
>
> This is the whole point.  We are being declarative, and we create a
> mechanism for visiting all nodes.  What we do in each node is not
> VMState business.  VMState only defines the nodes, and which ones belong
> to each version.

Right.  Thinking more after the call, I think this may be a better way 
to explain what I'm proposing.

With VMState, we provide a declarative description of each device's
state.  Because it's declarative, some things end up being tough to
describe, like variable sized arrays and complex data structures.  You've
worked through a lot of these, but this is fundamentally what makes this
approach difficult to complete.

At the end of VMState conversion, we have a declaration of how to read 
the current state of the device tree.  We can write a function that 
takes all of the VMState descriptions and builds something from those 
descriptions.

But right now, what we actually have is a routine that takes a VMState 
data description, and then calls a marshalling function.  In essence, 
the data description gets interpreted to an imperative serialization 
mechanism.

I'm suggesting that instead of trying to eliminate the imperativeness
(which will be hard since we have a lot of hooks in various places), we
should embrace the imperativeness.  Instead of marshalling to a
QEMUFile, we marshal to a Visitor, Visitor being an abstraction that can
marshal to arbitrary formats/objects.

So we never actually walk the VMState tables to do anything.  We just
convert the unconverted, purely imperative routines to marshal to a
Visitor instead of a QEMUFile.

What this gives us is a way to achieve the same level of abstraction
that VMState would give us for almost no work at all.  That
fundamentally lets us take the next step in "fixing" migration.

>>> device with some features ->   migration ->   device with other features
>>>
>>> and it works.  This means that "migration" does magic, and this is never
>>> going to work.
>>>
>>> Until now, this kind of worked because we only supported migration from
>>> old ->   new, or the same version.  Migration from old ->   new can never
>>> have new features.  But from new ->   old to work, we need a way to
>>> disable the new features.   That is completely independent of migration.
>>
>> At startup time, not dynamically.  And we have this, that's what -M
>> pc-X is about.
>
> It doesn't work.

Here's how I think we can fix this.

We have two concepts today, the machine and devices.  Not all devices 
can be created by an end user as some are implied by the machine (this 
is qdev.no_user).  Since not everything is directly created by the user, 
there is no easy way to basically do a dump of the device model, then 
feed that back into QEMU for recreation.

We do compatibility by using global properties for the different 
machines but this is a tough proposition to get right as the granularity 
is pretty poor.  I can't change a property of a particular device 
created by the machine without changing it universally.

With an improved qdev (which I think is QOM, but for now, just ignore 
that), we would be able to do the following:

1) create a device, that creates other devices as children of itself 
*without* those children being under a bus hierarchy.

2) eliminate the notion of machines altogether, and instead replace
machines with a chipset, soc device, or whatever is the logical device
that basically equates to what the machine logic does today.

The pc machine code is basically the i440fx.  You could take everything 
that it does, call it an i440fx object, and make "machine" properties 
properties of the i440fx.  That makes what we think of as machine 
creation identical to device creation.

3) eliminate anonymous devices, implicit bus assignment, and all of the 
other features of qdev that prevent the device model from being 
described in a stable fashion.

'-device-add e1000,id=foo' is ambiguous as is '-net nic,model=virtio 
-net nic,model=virtio'.

The rules about how we find the bus and location of e1000 in the device
model today are arbitrary and difficult to introspect.  The result is
that what you use to create a device model becomes wildly different from
what you would use to recreate a device model.  There's really no way to
programmatically discover this today either.  qdev doesn't return the
property's value at construction time but rather the current value.
That's not necessarily the value you want to use to recreate the device.

That's not to say that we shouldn't have friendly interfaces that do
automatic PCI bus assignment.  But that has to live a level
higher up than where it lives today in order to create stable device trees.

The rules in QOM are meant to solve these problems.  They basically are:

a) All devices must have a unique name at the time of creation making 
stable device addressing guaranteed.

b) All relationships between devices are expressed as connections 
between plugs and sockets.  There are no exceptions here.  The 
implication is that you never need to use code to recreate a device 
model, you can always dump the device model and recreate it via QMP 
commands.

c) All device properties are settable after creation time.  This might
not seem like a big deal, but in order to support composition, the act
of instantiating a device such as the PIIX which creates more devices
like a UART requires that you set the UART's construction properties
after creation of the PIIX.  Without having an explicit "realize" state
where construction properties have been set, this problem is incredibly
difficult to solve.
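
Rule (c) can be sketched as a two-phase lifecycle: creating the parent
creates the child with defaults, the child's construction properties
stay settable until an explicit realize step, and after that they are
frozen.  The Piix and Uart types and the baud property below are
invented for illustration and do not reflect qdev or QOM code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t baud;     /* a construction property (illustrative) */
    bool realized;
} Uart;

typedef struct {
    Uart uart;         /* child created as part of the parent */
    bool realized;
} Piix;

/* Creation: children exist with default properties, nothing is realized. */
static void piix_init(Piix *p)
{
    p->uart.baud = 9600;
    p->uart.realized = false;
    p->realized = false;
}

/* Construction properties may only change before realize. */
static bool uart_set_baud(Uart *u, uint32_t baud)
{
    if (u->realized) {
        return false;
    }
    u->baud = baud;
    return true;
}

/* Realize freezes the construction properties of parent and children. */
static void piix_realize(Piix *p)
{
    assert(!p->realized);
    p->uart.realized = true;
    p->realized = true;
}
```

The window between piix_init() and piix_realize() is what lets a caller
reach into the composed child without the parent forwarding every
property.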

qdev cannot satisfy these requirements as it sits today.  Maybe there's 
a way to incrementally evolve qdev into QOM, I haven't really thought it 
through yet.

But the end goal is pretty clear.  We should be able to do (qemu) 
dump_device_model > foo.cfg in the source and then (qemu) 
import_device_model foo.cfg in the destination right before the final 
stage of migration.  And this should be part of the migration protocol.

This would make migration with hotplug work, along with scores of other 
things.  This is another reason why a unified model makes sense, just as 
you want to dump the device tree, you want to be able to enumerate the 
backends to make sure that identically named backends exist on the 
destination.  Doing that in a single operation is a lot easier than 
doing it 10 different ways.

Regards,

Anthony Liguori


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-26 21:46           ` Anthony Liguori
@ 2011-07-26 22:22             ` Peter Maydell
  2011-07-26 23:08               ` Anthony Liguori
  0 siblings, 1 reply; 47+ messages in thread
From: Peter Maydell @ 2011-07-26 22:22 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Ryan Harper, Stefan Hajnoczi, mst, quintela, qemu-devel,
	Paolo Bonzini

On 26 July 2011 22:46, Anthony Liguori <anthony@codemonkey.ws> wrote:

[This is a bit random-sniping at minor points because I'm still thinking
about the big-picture bits]

> So we never actually walk the VMState tables to do anything.  We just
> convert the unconverted, purely imperative routines to marshal to a
> Visitor instead of a QEMUFile.
>
> What this gives us is a way to achieve the same level of abstraction that
> VMState would give us for almost no work at all.  That fundamentally lets
> us take the next step in "fixing" migration.

IME the problem with migration is not devices which implement old-style
imperative save/load routines but all the devices which (silently) implement
no kind of save/load support at all...

> With an improved qdev (which I think is QOM, but for now, just ignore that),
> we would be able to do the following:
>
> 1) create a device, that creates other devices as children of itself
> *without* those children being under a bus hierarchy.

This is really important, yes. In fact in some ways the logical
partitioning of a system doesn't inherently follow any kind of bus.
So a Beagle board is a top-level thing which contains (among other
things) an OMAP3 SOC, and some external-to-the-SOC devices like a
NAND chip. The OMAP3 contains a CPU and a pile of devices including
the GPMC which is the memory controller for the NAND chip. So the
logical structure of the system is:

 beagleboard (the "machine" in qemu terms)
  - omap3
     - cortex-a8 (cpu)
     - omap_gpmc
     - omap_uart
     - etc
  - NAND flash
  - etc

even though the bus topology is more like:
 cortex-a8
  - omap_uart
  - other system-bus devices
  - omap_gpmc
     - NAND flash
     - other devices hanging off the GPMC

(and the interrupt topology is different again, ditto the clock tree).

When you're trying to put together a machine then I think the logical
structure is more useful than the memory bus or interrupt tree.
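
One way to picture the two views side by side, as a sketch only: each
node carries a composition parent (what it is part of) and, separately,
an optional bus parent (what it is addressed through).  The Node type
and node_path() are hypothetical; the device names are the ones from
the Beagle example above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Node {
    const char *name;
    const struct Node *parent;  /* composition: "is part of" */
    const struct Node *bus;     /* bus topology: "hangs off", may be NULL */
} Node;

/* Build the composition path of a node, e.g. "beagleboard/omap3". */
static void node_path(const Node *n, char *out, size_t size)
{
    if (n->parent) {
        node_path(n->parent, out, size);
        strncat(out, "/", size - strlen(out) - 1);
    } else {
        assert(size > 0);
        out[0] = '\0';
    }
    strncat(out, n->name, size - strlen(out) - 1);
}
```

Here the NAND chip would have the board as its composition parent and
the GPMC as its bus parent, so a stable name can follow the logical
structure while the bus link is just another relationship.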

> 2) eliminate the notion of machines altogether, and instead replace machines
> with a chipset, soc device, or whatever is the logical device that basically
> equates to what the machine logic does today.

This doesn't make any sense, though. A machine isn't a chipset or a SOC.
It's a set of devices (including a CPU) wired up and configured in a
particular way. A Beagle and an Overo are definitely different machines
(which appear differently to guests in some ways) even though they share
the same OMAP3 SOC.

> The pc machine code is basically the i440fx.  You could take everything that
> it does, call it an i440fx object, and make "machine" properties properties
> of the i440fx.  That makes what we think of as machine creation identical to
> device creation.

I don't really know enough about PC hardware but I can't help thinking
that doing this is basically putting things into the qemu "i440fx" object
which aren't actually in the h/w i440fx. (Is the CPU really part of the
chipset, just for example? RAM?)

A random other point I'll throw in: along with composition ("this device
is really the result of wiring up and configuring these other devices
like this"), you also want to be able to have a device 'hide' and/or
make read-only the properties of its subdevices, eg where My-SOC-USB
implements USB by composing usb-ohci and usb-ehci but hardwires various
things the generic OHCI/EHCI leave configurable. Also the machine
model will want to hide things for similar reasons.

-- PMM


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-26 22:22             ` Peter Maydell
@ 2011-07-26 23:08               ` Anthony Liguori
  0 siblings, 0 replies; 47+ messages in thread
From: Anthony Liguori @ 2011-07-26 23:08 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Ryan Harper, Stefan Hajnoczi, quintela, mst, qemu-devel,
	Paolo Bonzini

On 07/26/2011 05:22 PM, Peter Maydell wrote:
> On 26 July 2011 22:46, Anthony Liguori<anthony@codemonkey.ws>  wrote:
>
> [This is a bit random-sniping at minor points because I'm still thinking
> about the big-picture bits]
>
>> So we never actually walk the VMState tables to do anything.  The
>> unconverted purely imperative routines we just convert to use marshal to a
>> Visitor instead of QEMUFile.
>>
>> What this gives us is a way to achieve the same level of abstraction that
>> VMState would give us for almost no work at all.  That fundamentally let's
>> us take the next step in "fixing" migration.
>
> IME the problem with migration is not devices which implement old-style
> imperative save/load routines but all the devices which (silently) implement
> no kind of save/load support at all...

From a modelling PoV, I think the way to handle this is to make
save/restore a pure virtual interface in the Device base class.  A
device that doesn't implement it won't be instantiable.

That's the approach I took for QOM.  But Gerd already did something for 
qdev here.
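
A minimal sketch of the "pure virtual" idea in plain C, assuming
hypothetical names (DeviceClass, device_new, uart_class, legacy_class);
it is not the actual qdev or QOM code, only an illustration of refusing
to instantiate a device whose class has no save/load:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; not the actual qdev or QOM API. */
typedef struct Device Device;

typedef struct DeviceClass {
    const char *name;
    /* "Pure virtual" slots: a concrete class must fill in both. */
    void (*save)(Device *dev, void *opaque);
    void (*load)(Device *dev, void *opaque);
} DeviceClass;

struct Device {
    const DeviceClass *klass;
};

/* Returns NULL when the class leaves save or load unimplemented. */
Device *device_new(const DeviceClass *klass)
{
    Device *dev;

    if (klass->save == NULL || klass->load == NULL) {
        return NULL;
    }
    dev = calloc(1, sizeof(*dev));
    if (dev != NULL) {
        dev->klass = klass;
    }
    return dev;
}

/* Placeholder handlers; a real device would marshal its state here. */
static void uart_save(Device *dev, void *opaque) { (void)dev; (void)opaque; }
static void uart_load(Device *dev, void *opaque) { (void)dev; (void)opaque; }

/* One class with migration support, and one that silently has none. */
const DeviceClass uart_class = { "uart", uart_save, uart_load };
const DeviceClass legacy_class = { "legacy", NULL, NULL };
```

With this shape, device_new(&legacy_class) fails up front instead of
producing a device that quietly cannot be migrated.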

>> With an improved qdev (which I think is QOM, but for now, just ignore that),
>> we would be able to do the following:
>>
>> 1) create a device, that creates other devices as children of itself
>> *without* those children being under a bus hierarchy.
>
> This is really important, yes. In fact in some ways the logical
> partitioning of a system doesn't inherently follow any kind of bus.
> So a Beagle board is a top-level thing which contains (among other
> things) an OMAP3 SOC, and some external-to-the-SOC devices like a
> NAND chip. The OMAP3 contains a CPU and a pile of devices including
> the GPMC which is the memory controller for the NAND chip. So the
> logical structure of the system is:
>
>   beagleboard (the "machine" in qemu terms)
>    - omap3
>       - cortex-a8 (cpu)
>       - omap_gpmc
>       - omap_uart
>       - etc
>    - NAND flash
>    - etc
>
> even though the bus topology is more like:
>   cortex-a8
>    - omap_uart
>    - other system-bus devices
>    - omap_gpmc
>       - NAND flash
>       - other devices hanging off the GPMC
>
> (and the interrupt topology is different again, ditto the clock tree).
>
> When you're trying to put together a machine then I think the logical
> structure is more useful than the memory bus or interrupt tree.

Exactly.  In my mind, beagleboard should be a device that creates all of 
these parts via composition.

And if the logic is truly trivial, such that everything can be expressed
without any special code, then beagleboard can just be a config file.

>> 2) eliminate the notion of machines altogether, and instead replace machines
>> with a chipset, soc device, or whatever is the logic device that basically
>> equates to what the machine logic does today.
>
> This doesn't make any sense, though. A machine isn't a chipset or a SOC.
> It's a set of devices (including a CPU) wired up and configured in a
> particular way. A Beagle and an Overo are definitely different machines
> (which appear differently to guests in some ways) even though they share
> the same OMAP3 SOC.

I'm basically suggesting that we model motherboards as devices.  They're 
boards with some wiring logic, a bunch of sockets for peripherals 
(sometimes including sockets for CPUs and memory, but not always), and 
all the necessary devices to be useful.

If everything can be constructed trivially without code, then it should be
possible to do this.  But I don't think we'll ever be able to totally 
eliminate code in constructing these devices.

>> The pc machine code is basically the i440fx.  You could take everything that
>> it does, call it an i440fx object, and make "machine" properties properties
>> of the i440fx.  That makes what we think of as machine creation identical to
>> device creation.
>
> I don't really know enough about PC hardware but I can't help thinking
> that doing this is basically putting things into the qemu "i440fx" object
> which aren't actually in the h/w i440fx. (Is the CPU really part of the
> chipset, just for example? RAM?)

No, but there is an interface to the CPU which essentially becomes a 
socket.  Likewise, there are sockets for RAM DIMMs.  So in my mind,
creation would look like this:

(qemu) plug_create host-cpu id=cpu0
(qemu) plug_create ram id=ram0 size=4G
(qemu) plug_create i440fx id=chipset ram=ram0 cpu=cpu0

>
> A random other point I'll throw in: along with composition ("this device
> is really the result of wiring up and configuring these other devices
> like this", you also want to be able to have a device 'hide' and/or
> make read-only the properties of its subdevices, eg where My-SOC-USB
> implements USB by composing usb-ohci and usb-ehci but hardwires various
> things the generic OHCI/EHCI leave configurable. Also the machine
> model will want to hide things for similar reasons.

In the model I proposed, properties can be locked at run time.
Normally, they're locked by default at realize but a device could
certainly initialize its child devices and then lock all of their
properties.  It's as simple as:

struct MySocUSB
{
    Device parent;

    UsbOhci ohci;
    UsbEhci ehci;
};

void my_soc_usb_initfn(...)
{
     // Identify children devices
     plug_add_property_plug(PLUG(obj), "ohci", &obj->ohci, TYPE_OHCI);
     plug_add_property_plug(PLUG(obj), "ehci", &obj->ehci, TYPE_EHCI);

     // Configure children devices
     ohci_set_foo(&obj->ohci, ...);
     ehci_set_bar(&obj->ehci, ...);

     // Prevent child device properties from being modified
     plug_lock_all_properties(PLUG(&obj->ohci));
     plug_lock_all_properties(PLUG(&obj->ehci));
}

This is all there today infrastructure wise.
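
What locking does to a later property write can be sketched in plain C.
The names below (Plug as a bare struct, plug_add_int_property,
plug_set_int, plug_lock_all) are hypothetical stand-ins that only mimic
the plug_* calls above; this is an illustration under those assumptions,
not the QOM code itself:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical miniature of lockable properties; names are illustrative. */
#define MAX_PROPS 4

typedef struct Property {
    const char *name;
    int value;
    bool locked;
} Property;

typedef struct Plug {
    Property props[MAX_PROPS];
    int nprops;
} Plug;

/* Returns 0 on success, -1 if the property table is full. */
int plug_add_int_property(Plug *plug, const char *name, int value)
{
    if (plug->nprops == MAX_PROPS) {
        return -1;
    }
    plug->props[plug->nprops].name = name;
    plug->props[plug->nprops].value = value;
    plug->props[plug->nprops].locked = false;
    plug->nprops++;
    return 0;
}

/* Returns 0 on success, -1 if the property is missing or locked. */
int plug_set_int(Plug *plug, const char *name, int value)
{
    for (int i = 0; i < plug->nprops; i++) {
        if (strcmp(plug->props[i].name, name) == 0) {
            if (plug->props[i].locked) {
                return -1;
            }
            plug->props[i].value = value;
            return 0;
        }
    }
    return -1;
}

/* After this call every plug_set_int() on this plug fails. */
void plug_lock_all(Plug *plug)
{
    for (int i = 0; i < plug->nprops; i++) {
        plug->props[i].locked = true;
    }
}
```

The parent configures the child first and locks afterwards, so the
hardwired values stay visible but can no longer be changed from outside.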

Regards,

Anthony Liguori

>
> -- PMM
>


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-26 12:37       ` Anthony Liguori
  2011-07-26 20:13         ` Juan Quintela
@ 2011-07-29 14:03         ` Kevin Wolf
  2011-07-29 14:28           ` Anthony Liguori
  1 sibling, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2011-07-29 14:03 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Ryan Harper, Stefan Hajnoczi, mst, quintela, qemu-devel,
	Paolo Bonzini

Am 26.07.2011 14:37, schrieb Anthony Liguori:
> On 07/26/2011 07:07 AM, Juan Quintela wrote:
>> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>>> == What we need ==
>>>
>>> We need to decompose migration into three different problems: 1)
>>> serializing device state 2) transforming the device model in order to
>>> satisfy forwards and backwards compatibility 3) encoding the
>>> serialized device model on the wire.
>>
>> I will change this to:
>> - We need to be able to "enable/disable" features of a device.
>>    A.K.A. make -M pc-0.14 work with devices with the same features
>>    than 0.14.  Notice that this is _independent_ of migration.
> 
> In theory, we already have this with qdev flags.
> 
>> - Be able to describe that different features/versions.  This is not the
>>    difficult part, it can be subsections, optional fields, whatever.
>>    What is the difficult part is _knowing_ what fields needs to be on
>>    each version.  That again depends of the device, not migration.
>>
>> - Be able to to do forward/bacward compatibility (and without
>>    comunication both sides is basically impossible).
> 
> Hrm, I'm not sure I agree with these conclusions.
> 
> Management tools should do their best job to create two compatible 
> device models.
> 
> Given two compatible device models, there *may* be differences in the 
> structure of the device models since we evolve things over time.  We may 
> rename a field, change the type, etc.  To support this, we can use 
> filters both on the destination and receive end to do our best to 
> massage the device model into something compatible.
> 
> But creating two creating compatible device models is not the job of the 
> migration protocol.  It's the job of management tools.

I'm not sure if I agree with this.

Let's forget about management tools for a moment, and just think of a
qemu instance with a given set of command line options describing its
devices. Then you start another instance with different options and
-incoming and start a migration. The result will be something, but
definitely not a successfully migrated VM (even though it might look
like one at first).

This is why I believe that the information about which devices to use
actually belongs in the migration data. There's no way to make use of
it with different options.

>>> 5) Once we're here, we can implement the next 5-year format.  That
>>> could be ASN.1 and be bidirectional or whatever makes the most sense.
>>> We could support 50 formats if we wanted to.  As long as the transport
>>> is distinct from the serialization and compat routines, it really
>>> doesn't matter.
>>
>> This means finishing the VMState support, once there, only thing needs
>> to change is "copy" the savevm, and change the "visitors" to whatever
>> else that we need/want.
> 
> There's no need to "finish" VMState to convert to visitors.  It's just 
> sed -e 's:qemu_put_be32:visit_type_int32:g'

Actually I think the real question is whether we want to have VMState or
not. If we do (and I think it's a good thing), then yes, we need to
finish it. If not, then we should revert the parts that are already
there. We shouldn't end up in an inconsistent state where half of qemu
is converted and we don't feel a need to do anything about the other half.

Kevin


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-29 14:03         ` Kevin Wolf
@ 2011-07-29 14:28           ` Anthony Liguori
  2011-07-29 15:18             ` Kevin Wolf
  0 siblings, 1 reply; 47+ messages in thread
From: Anthony Liguori @ 2011-07-29 14:28 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Ryan Harper, Stefan Hajnoczi, quintela, mst, qemu-devel,
	Paolo Bonzini

On 07/29/2011 09:03 AM, Kevin Wolf wrote:
> Am 26.07.2011 14:37, schrieb Anthony Liguori:
>> Hrm, I'm not sure I agree with these conclusions.
>>
>> Management tools should do their best job to create two compatible
>> device models.
>>
>> Given two compatible device models, there *may* be differences in the
>> structure of the device models since we evolve things over time.  We may
>> rename a field, change the type, etc.  To support this, we can use
>> filters both on the destination and receive end to do our best to
>> massage the device model into something compatible.
>>
>> But creating two creating compatible device models is not the job of the
>> migration protocol.  It's the job of management tools.
>
> I'm not sure if I agree with this.
>
> Let's forget about management tools for a moment, and just think of a
> qemu instance with a given set of command line option describing its
> devices. Then you start another instance with different options and
> -incoming and start a migration. The result will be something, but
> definitely not a successfully migrated VM (even though it might look
> like one at first).
>
> This is why I believe that the information about which devices to use
> actually belongs into the migration data. There's no way to make use of
> it with different options.

I agree with you actually.

Right now, it's the management tools' job.  The complexity is daunting.
Recreating the same object model, particularly after hotplug, is 
difficult and in many cases impossible.

I think the primary problem is that there are so many ways to create 
things.  -M pc creates a bunch of stuff that there's no way to create 
individually.  The stuff it creates can sort of be manipulated by using 
-global but not on a per device basis.  Much of it isn't even addressable.

Creating backends is a totally different mechanism and each backend has 
different mechanisms to enumerate and create.

The result is that introspecting what's there and recreating it is 
insanely complex today.

That's the motivation behind QOM.  plug_list lists *everything*.  All 
objects, whether they are created as part of the PIIX3 or whether it's a 
backing file, can be directly addressed and manipulated.

If you look at qsh, there's an import command.  The export command is 
trivial and I don't remember if I've already added it.  But the point is 
that you should be able to 'qsh export' everything and then 'qsh import' 
everything to create the exact same device model in another QEMU instance.

And yeah, this should end up becoming part of the migration protocol.

>>>> 5) Once we're here, we can implement the next 5-year format.  That
>>>> could be ASN.1 and be bidirectional or whatever makes the most sense.
>>>> We could support 50 formats if we wanted to.  As long as the transport
>>>> is distinct from the serialization and compat routines, it really
>>>> doesn't matter.
>>>
>>> This means finishing the VMState support, once there, only thing needs
>>> to change is "copy" the savevm, and change the "visitors" to whatever
>>> else that we need/want.
>>
>> There's no need to "finish" VMState to convert to visitors.  It's just
>> sed -e 's:qemu_put_be32:visit_type_int32:g'
>
> Actually I think the real question is whether we want to have VMState or
> not.

VMState doesn't give me what I want by itself.

I want to be able to marshal the device tree to an in-memory 
representation that can be manipulated.  One approach to doing that is 
first completing VMState, and then writing something that can walk the 
VMState descriptions.  The VMState descriptions are fairly complicated 
but it's doable.

Another approach, which I'm arguing is much simpler, is to keep the
imperative nature of our current serialization and use visitors.

There may be other advantages of a declarative description of VMState 
that would justify completing the conversions.  But I don't think we 
need it to start improving the migration protocol.
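
A rough sketch of what "imperative save routine plus visitors" could
look like, assuming a hypothetical single-method Visitor interface (this
is not QEMU's actual Visitor API, and WireVisitor, TreeVisitor and
TimerState are made-up names).  The device routine is written once; one
visitor emits big-endian bytes, the other collects named values that a
compat filter could then rewrite:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical visitor interface; not QEMU's actual Visitor API. */
typedef struct Visitor Visitor;
struct Visitor {
    void (*type_int32)(Visitor *v, const char *name, int32_t *value);
};

/* Output visitor 1: big-endian wire encoding of each 32-bit field. */
typedef struct WireVisitor {
    Visitor base;           /* must stay first so the cast below is valid */
    uint8_t buf[64];
    size_t len;
} WireVisitor;

static void wire_int32(Visitor *v, const char *name, int32_t *value)
{
    WireVisitor *w = (WireVisitor *)v;
    uint32_t u = (uint32_t)*value;

    (void)name;             /* field names are not part of this encoding */
    if (w->len + 4 > sizeof(w->buf)) {
        return;             /* sketch only: drop the field on overflow */
    }
    w->buf[w->len++] = (uint8_t)(u >> 24);
    w->buf[w->len++] = (uint8_t)(u >> 16);
    w->buf[w->len++] = (uint8_t)(u >> 8);
    w->buf[w->len++] = (uint8_t)u;
}

/* Output visitor 2: in-memory name/value pairs that can be manipulated. */
typedef struct TreeVisitor {
    Visitor base;
    const char *names[8];
    int32_t values[8];
    int count;
} TreeVisitor;

static void tree_int32(Visitor *v, const char *name, int32_t *value)
{
    TreeVisitor *t = (TreeVisitor *)v;

    if (t->count == 8) {
        return;             /* sketch only: ignore fields past capacity */
    }
    t->names[t->count] = name;
    t->values[t->count] = *value;
    t->count++;
}

void wire_visitor_init(WireVisitor *w)
{
    w->base.type_int32 = wire_int32;
    w->len = 0;
}

void tree_visitor_init(TreeVisitor *t)
{
    t->base.type_int32 = tree_int32;
    t->count = 0;
}

/* Imperative device save routine, written once against the interface. */
typedef struct TimerState {
    int32_t count;
    int32_t reload;
} TimerState;

void timer_save(TimerState *s, Visitor *v)
{
    v->type_int32(v, "count", &s->count);
    v->type_int32(v, "reload", &s->reload);
}
```

Calling timer_save() with &wire.base produces the byte stream, while the
same call with &tree.base yields the intermediate representation.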

Regards,

Anthony Liguori


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-29 14:28           ` Anthony Liguori
@ 2011-07-29 15:18             ` Kevin Wolf
  2011-07-29 22:28               ` Anthony Liguori
  0 siblings, 1 reply; 47+ messages in thread
From: Kevin Wolf @ 2011-07-29 15:18 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Ryan Harper, Stefan Hajnoczi, quintela, mst, qemu-devel,
	Paolo Bonzini

Am 29.07.2011 16:28, schrieb Anthony Liguori:
> On 07/29/2011 09:03 AM, Kevin Wolf wrote:
>> Am 26.07.2011 14:37, schrieb Anthony Liguori:
>>> Hrm, I'm not sure I agree with these conclusions.
>>>
>>> Management tools should do their best job to create two compatible
>>> device models.
>>>
>>> Given two compatible device models, there *may* be differences in the
>>> structure of the device models since we evolve things over time.  We may
>>> rename a field, change the type, etc.  To support this, we can use
>>> filters both on the destination and receive end to do our best to
>>> massage the device model into something compatible.
>>>
>>> But creating two creating compatible device models is not the job of the
>>> migration protocol.  It's the job of management tools.
>>
>> I'm not sure if I agree with this.
>>
>> Let's forget about management tools for a moment, and just think of a
>> qemu instance with a given set of command line option describing its
>> devices. Then you start another instance with different options and
>> -incoming and start a migration. The result will be something, but
>> definitely not a successfully migrated VM (even though it might look
>> like one at first).
>>
>> This is why I believe that the information about which devices to use
>> actually belongs into the migration data. There's no way to make use of
>> it with different options.
> 
> I agree with you actually.
> 
> Right now, it's the management tools job.  The complexity is daunting. 
> Recreating the same object model, particularly after hotplug, is 
> difficult and in many cases impossible.
> 
> I think the primary problem is that there are so many ways to create 
> things.  -M pc creates a bunch of stuff that there's no way to create 
> individually.  The stuff it creates can sort of be manipulated by using 
> -global but not on a per device basis.  Much of it isn't even addressable.
> 
> Creating backends is a totally different mechanism and each backend has 
> different mechanisms to enumerate and create.

And backends are actually something totally different: They are the part
that you can't migrate automatically, but that you must create on the
destination host like we're doing it today. The paths to images etc.
could be completely different from the source host.

The one change for backends is that if we migrate a device in such a way
that it can say "I need the block backend with the ID 'foo'", then we
can at least make sure that the backend actually exists and is usable.

> The result is that introspecting what's there and recreating it is 
> insanely complex today.
> 
> That's the motivation behind QOM.  plug_list lists *everything*.  All 
> objects, whether they are created as part of the PIIX3 or whether it's a 
> backing file, can be directly addressed and manipulated.
> 
> If you look at qsh, there's an import command.  The export command is 
> trivial and I don't remember if I've already added it.  But the point is 
> that you should be able to 'qsh export' everything and then 'qsh import' 
> everything to create the exact same device model in another QEMU instance.
> 
> And yeah, this should end up becoming part of the migration protocol.

If all you're saying is that we can't get it tomorrow, that's fine for
me. Good to know that we agree on the goal anyway.

>>>>> 5) Once we're here, we can implement the next 5-year format.  That
>>>>> could be ASN.1 and be bidirectional or whatever makes the most sense.
>>>>> We could support 50 formats if we wanted to.  As long as the transport
>>>>> is distinct from the serialization and compat routines, it really
>>>>> doesn't matter.
>>>>
>>>> This means finishing the VMState support, once there, only thing needs
>>>> to change is "copy" the savevm, and change the "visitors" to whatever
>>>> else that we need/want.
>>>
>>> There's no need to "finish" VMState to convert to visitors.  It's just
>>> sed -e 's:qemu_put_be32:visit_type_int32:g'
>>
>> Actually I think the real question is whether we want to have VMState or
>> not.
> 
> VMState doesn't give me what I want by itself.
> 
> I want to be able to marshal the device tree to an in-memory 
> representation that can be manipulated.  One approach to doing that is 
> first completing VMState, and then writing something that can walk the 
> VMState descriptions.  The VMState descriptions are fairly complicated 
> but it's doable.
> 
> Another approach, which I'm arguing is much simpler, the imperative 
> nature of our current serialization and use visitors.
> 
> There may be other advantages of a declarative description of VMState 
> that would justify completing the conversions.  But I don't think we 
> need it to start improving the migration protocol.

Yeah, I somehow read it as "there's no reason to continue with
converting to VMState", which isn't what you were saying.

Kevin


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-29 15:18             ` Kevin Wolf
@ 2011-07-29 22:28               ` Anthony Liguori
  2011-07-31 10:48                 ` Dor Laor
  0 siblings, 1 reply; 47+ messages in thread
From: Anthony Liguori @ 2011-07-29 22:28 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Ryan Harper, Stefan Hajnoczi, mst, quintela, qemu-devel,
	Paolo Bonzini

On 07/29/2011 10:18 AM, Kevin Wolf wrote:
> Am 29.07.2011 16:28, schrieb Anthony Liguori:
>
> The one change for backends is that if we migrate a device in way so
> that it can say "I need the block backend with the ID 'foo'", then we
> can at least make sure that the backend actually exists and is usable.

Yup.  So with QOM, this could work in a couple ways.

You could dump the full graph including the backends, and then recreate 
it but not realize any objects.  This would give you a chance to make 
changes to things like the block device filenames.  It could be as 
simple as just changing the filename of a device, or deleting a complex 
block device chain (from backing files) and replacing it with something 
totally different.

I think the common case is that the backends are much the same so I 
think an interface centered around recreating the backends verbatim but 
allowing tweaks would probably be the friendliest.

We could also require that the backends are created before we migrate 
the device model.  In QOM, while you would be allowed to create a 
virtio-blk device, when you tried to set the drive property to 'foo', 
you'd get an error unless the 'foo' backend existed and was of the 
appropriate type.
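
The check described here could look roughly like the following, assuming
a hypothetical flat backend registry (BackendRegistry, backend_find and
virtio_blk_set_drive are illustrative names, not existing QEMU
functions):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend registry; illustrative names only. */
typedef enum BackendType { BACKEND_BLOCK, BACKEND_CHAR } BackendType;

typedef struct Backend {
    const char *id;
    BackendType type;
} Backend;

typedef struct BackendRegistry {
    const Backend *items;
    size_t count;
} BackendRegistry;

typedef struct VirtioBlk {
    const Backend *drive;   /* NULL until the property is set */
} VirtioBlk;

/* Looks a backend up by ID; returns NULL if it was never created. */
const Backend *backend_find(const BackendRegistry *reg, const char *id)
{
    for (size_t i = 0; i < reg->count; i++) {
        if (strcmp(reg->items[i].id, id) == 0) {
            return &reg->items[i];
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 if the backend is missing or has the wrong type. */
int virtio_blk_set_drive(VirtioBlk *dev, const BackendRegistry *reg,
                         const char *id)
{
    const Backend *backend = backend_find(reg, id);

    if (backend == NULL || backend->type != BACKEND_BLOCK) {
        return -1;
    }
    dev->drive = backend;
    return 0;
}
```

The device object can exist before its backend does; only the property
assignment is refused until a matching backend has been created.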

Since it's pretty easy to enumerate the required backends, it's really 
not so bad for the management tools to do this work.  My only concern is 
that this all has to happen in the migration downtime window in order 
for hotplug to work robustly.

>
>> The result is that introspecting what's there and recreating it is
>> insanely complex today.
>>
>> That's the motivation behind QOM.  plug_list lists *everything*.  All
>> objects, whether they are created as part of the PIIX3 or whether it's a
>> backing file, can be directly addressed and manipulated.
>>
>> If you look at qsh, there's an import command.  The export command is
>> trivial and I don't remember if I've already added it.  But the point is
>> that you should be able to 'qsh export' everything and then 'qsh import'
>> everything to create the exact same device model in another QEMU instance.
>>
>> And yeah, this should end up becoming part of the migration protocol.
>
> If all you're saying is that we can't get it tomorrow, that's fine for
> me. Good to know that we agree on the goal anyway.

Yup :-)


>>>>>> 5) Once we're here, we can implement the next 5-year format.  That
>>>>>> could be ASN.1 and be bidirectional or whatever makes the most sense.
>>>>>> We could support 50 formats if we wanted to.  As long as the transport
>>>>>> is distinct from the serialization and compat routines, it really
>>>>>> doesn't matter.
>>>>>
>>>>> This means finishing the VMState support, once there, only thing needs
>>>>> to change is "copy" the savevm, and change the "visitors" to whatever
>>>>> else that we need/want.
>>>>
>>>> There's no need to "finish" VMState to convert to visitors.  It's just
>>>> sed -e 's:qemu_put_be32:visit_type_int32:g'
>>>
>>> Actually I think the real question is whether we want to have VMState or
>>> not.
>>
>> VMState doesn't give me what I want by itself.
>>
>> I want to be able to marshal the device tree to an in-memory
>> representation that can be manipulated.  One approach to doing that is
>> first completing VMState, and then writing something that can walk the
>> VMState descriptions.  The VMState descriptions are fairly complicated
>> but it's doable.
>>
>> Another approach, which I'm arguing is much simpler, the imperative
>> nature of our current serialization and use visitors.
>>
>> There may be other advantages of a declarative description of VMState
>> that would justify completing the conversions.  But I don't think we
>> need it to start improving the migration protocol.
>
> Yeah, I somehow read it as "there's no reason to continue with
> converting to VMState", which isn't what you were saying.

No, not at all.  Just that converting everything to VMState isn't a 
prerequisite for building a more robust migration protocol.

Regards,

Anthony Liguori

> Kevin
>


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-29 22:28               ` Anthony Liguori
@ 2011-07-31 10:48                 ` Dor Laor
  2011-07-31 11:37                   ` Peter Maydell
  2011-07-31 20:43                   ` Anthony Liguori
  0 siblings, 2 replies; 47+ messages in thread
From: Dor Laor @ 2011-07-31 10:48 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Ryan Harper, Stefan Hajnoczi, quintela, mst,
	qemu-devel, Paolo Bonzini

On 07/30/2011 01:28 AM, Anthony Liguori wrote:
> No, not at all.  Just that converting everything to VMState isn't a
> prerequisite for building a more robust migration protocol.

The main thing is to prioritize the problems we're facing.
  - Live migration protocol:
    - VMState conversion is not complete
    - Live migration is not flexible enough (even with subsections)
    - Simplify destination cmdline for machine creation
  - Qdev
    - conversion is not complete
    - Machine + devices description are complex and have hidden glue
  - Qapi
    - Needs merging
  - QOM
    - Only the beginning

So overall there are many parallel projects, probably more than the 
above. The RightThink(tm) would be to pick the ones that we can converge 
on and not try to handle all in parallel. There are problems we can live 
with. Engineering wise it might not be a beauty but they can wait (for 
instance dark magic to create the machines). There are some that prevent 
adding new features or make the code hard to support w/o them.

Cheers,
Dor

ps: how hard is it to finish the vmstate conversion? Can't we just assume
unconverted code is not functional and just remove all of it?

>
> Regards,
>
> Anthony Liguori


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 10:48                 ` Dor Laor
@ 2011-07-31 11:37                   ` Peter Maydell
  2011-07-31 11:45                     ` Dor Laor
  2011-07-31 20:43                   ` Anthony Liguori
  1 sibling, 1 reply; 47+ messages in thread
From: Peter Maydell @ 2011-07-31 11:37 UTC (permalink / raw)
  To: dlaor
  Cc: Kevin Wolf, Ryan Harper, Stefan Hajnoczi, mst, quintela,
	qemu-devel, Paolo Bonzini

On 31 July 2011 11:48, Dor Laor <dlaor@redhat.com> wrote:
> ps: how hard is to finish the vmstate conversion? Can't we just assume
> not converted code is not functional and just remove all of it?

No, definitely not. I think most people using non-x86 architectures
don't use the vmsave/vmload/migration features at all, but would
be annoyed if the perfectly functional device models they were
using got deleted...

-- PMM


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 11:37                   ` Peter Maydell
@ 2011-07-31 11:45                     ` Dor Laor
  2011-07-31 18:46                       ` Christoph Hellwig
  0 siblings, 1 reply; 47+ messages in thread
From: Dor Laor @ 2011-07-31 11:45 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Kevin Wolf, Ryan Harper, Stefan Hajnoczi, mst, quintela,
	qemu-devel, Paolo Bonzini

On 07/31/2011 02:37 PM, Peter Maydell wrote:
> On 31 July 2011 11:48, Dor Laor<dlaor@redhat.com>  wrote:
>> ps: how hard is to finish the vmstate conversion? Can't we just assume
>> not converted code is not functional and just remove all of it?
>
> No, definitely not. I think most people using non-x86 architectures
> don't use the vmsave/vmload/migration features at all, but would
> be annoyed if the perfectly functional device models they were
> using got deleted...

I didn't mean to erase the entire device, just the code for save/load 
which as you say, might not be used at all.

>
> -- PMM


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 11:45                     ` Dor Laor
@ 2011-07-31 18:46                       ` Christoph Hellwig
  2011-07-31 20:43                         ` Dor Laor
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2011-07-31 18:46 UTC (permalink / raw)
  To: Dor Laor
  Cc: Kevin Wolf, Peter Maydell, Stefan Hajnoczi, mst, Ryan Harper,
	quintela, qemu-devel, Paolo Bonzini

On Sun, Jul 31, 2011 at 02:45:07PM +0300, Dor Laor wrote:
>> No, definitely not. I think most people using non-x86 architectures
>> don't use the vmsave/vmload/migration features at all, but would
>> be annoyed if the perfectly functional device models they were
>> using got deleted...
>
> I didn't mean to erase the entire device, just the code for save/load which 
> as you say, might not be used at all.

Like the one in virtio?


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 18:46                       ` Christoph Hellwig
@ 2011-07-31 20:43                         ` Dor Laor
  2011-07-31 20:55                           ` Anthony Liguori
  2011-07-31 23:10                           ` Christoph Hellwig
  0 siblings, 2 replies; 47+ messages in thread
From: Dor Laor @ 2011-07-31 20:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kevin Wolf, Peter Maydell, Stefan Hajnoczi, quintela, Ryan Harper,
	mst, qemu-devel, Paolo Bonzini

On 07/31/2011 09:46 PM, Christoph Hellwig wrote:
> On Sun, Jul 31, 2011 at 02:45:07PM +0300, Dor Laor wrote:
>>> No, definitely not. I think most people using non-x86 architectures
>>> don't use the vmsave/vmload/migration features at all, but would
>>> be annoyed if the perfectly functional device models they were
>>> using got deleted...
>>
>> I didn't mean to erase the entire device, just the code for save/load which
>> as you say, might not be used at all.
>
> Like the one in virtio?

/me caught off guard. I wonder why it wasn't converted to VMSTATE
before?  virtio is one of the key devices, it's not just some random
forgotten one that might not care about migration.

It's worth using this discussion to work out whether vmstate is
significant enough.
From my brief browsing it looks like vmstate helps to reduce some plain
errors with double save/load coding, eases the field encoding and handles
subsections (which imho is the most important).

It's true that we need to introduce capabilities to the live migration 
protocol and some other goodies but we might be able to do that with
the existing method of gradual enhancement for VMSTATE to whatever form 
it may be.


* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 20:43                         ` Dor Laor
@ 2011-07-31 20:55                           ` Anthony Liguori
  2011-07-31 23:10                           ` Christoph Hellwig
  1 sibling, 0 replies; 47+ messages in thread
From: Anthony Liguori @ 2011-07-31 20:55 UTC (permalink / raw)
  To: dlaor
  Cc: Kevin Wolf, Peter Maydell, Stefan Hajnoczi, quintela, Ryan Harper,
	mst, qemu-devel, Paolo Bonzini, Christoph Hellwig

On 07/31/2011 03:43 PM, Dor Laor wrote:
> On 07/31/2011 09:46 PM, Christoph Hellwig wrote:
>> On Sun, Jul 31, 2011 at 02:45:07PM +0300, Dor Laor wrote:
>>>> No, definitely not. I think most people using non-x86 architectures
>>>> don't use the vmsave/vmload/migration features at all, but would
>>>> be annoyed if the perfectly functional device models they were
>>>> using got deleted...
>>>
>>> I didn't mean to erase the entire device, just the code for save/load
>>> which
>>> as you say, might not be used at all.
>>
>> Like the one in virtio?
>
> /me caught off guard. I wonder why it wasn't converted to VMSTATE
> before? virtio is one of the key devices, it's not just random forgotten
> one that might not care about migration.
>
> It's worth to utilize this discussion to realize whether vmstate is
> significant enough.

VMState does two things.  It provides a common code path for save/load.
This is wonderful and it absolutely has prevented numerous bugs from
happening.  Undeniably, it's made migration better and more robust
because of that.

It also provides a declarative description of the serialization state. 
The declarative language has gotten complex and it's still not quite 
covering everything we do (there's a lot of one-off marshalling handlers 
to handle corner cases).
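
A minimal sketch of the declarative idea, not QEMU's actual VMState
macros: one field table drives both the save and the load path, so the
two cannot drift apart.  DemoDevice, DemoField, DEMO_FIELD, demo_save
and demo_load are hypothetical names made up for this illustration, and
the byte stream is left in host byte order to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device state, for illustration only. */
typedef struct DemoDevice {
    uint8_t status;
    uint8_t isr;
    uint32_t features;
} DemoDevice;

/* One table entry describes one field: name, offset and size in bytes. */
typedef struct DemoField {
    const char *name;
    size_t offset;
    size_t size;
} DemoField;

#define DEMO_FIELD(type, member) \
    { #member, offsetof(type, member), sizeof(((type *)0)->member) }

/* The declarative description: the only place the layout is written down. */
static const DemoField demo_fields[] = {
    DEMO_FIELD(DemoDevice, status),
    DEMO_FIELD(DemoDevice, isr),
    DEMO_FIELD(DemoDevice, features),
};

#define DEMO_NUM_FIELDS (sizeof(demo_fields) / sizeof(demo_fields[0]))

/* Save: walk the table and copy each field into buf (host byte order).
 * Returns the number of bytes written. */
static size_t demo_save(const DemoDevice *dev, uint8_t *buf)
{
    size_t pos = 0;
    for (size_t i = 0; i < DEMO_NUM_FIELDS; i++) {
        memcpy(buf + pos, (const uint8_t *)dev + demo_fields[i].offset,
               demo_fields[i].size);
        pos += demo_fields[i].size;
    }
    return pos;
}

/* Load: the same walk in the opposite direction.
 * Returns the number of bytes consumed. */
static size_t demo_load(DemoDevice *dev, const uint8_t *buf)
{
    size_t pos = 0;
    for (size_t i = 0; i < DEMO_NUM_FIELDS; i++) {
        memcpy((uint8_t *)dev + demo_fields[i].offset, buf + pos,
               demo_fields[i].size);
        pos += demo_fields[i].size;
    }
    return pos;
}
```

Adding a field means adding one table entry; there is no second
function to forget.  The one-off marshalling handlers mentioned above
are exactly the cases such a flat table cannot express.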

I think we've basically gotten as much as we can with the declarative 
approach.  I think we have to take the next logical step which is to use 
the declarative descriptions (or imperative marshallers) to generate a 
richer internal representation that we can manipulate in a high level 
fashion.

We could keep trying to make everything declarative but that in and of 
itself does not get us to the next step with improving migration.  And 
it shouldn't gate us either.

>  From my brief browsing it looks like vmstate helps to reduce some plain
> errors with double save/load coding, ease the field encoding and handles
> subsections (which imho is the most important).

I think we need to really step back and look at the larger picture.

What do we really need to "fix" migration?  I've given this a ton of
thought, and I think there are really two classes of problems:

1) Creating the same guest visible device model in two, potentially 
different, versions of QEMU.

2) Given an identical guest visible device model, coping with variations 
in the internal implementation and state serialization.

Subsections and versions are solutions to (2), but limited to the scope 
of an individual device (and really, individual fields).  We completely 
punt (1) to management tools.

You need a comprehensive object model to solve (1).  I'm convinced of 
that.  To solve (2), we need to be able to separate compatibility from 
internal implementation.

To me, this means migrating to an internal data structure and then 
manipulating that data structure before/after transferring it over the wire.

>
> It's true that we need to introduce capabilities to the live migration
> protocol and some other goodies but we might be able to do that with
> the existing method of gradual enhancement for VMSTATE to whatever form
> it may be.

I've already written this up:

http://wiki.qemu.org/Features/Migration/Next

Regards,

Anthony Liguori

>

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 20:43                         ` Dor Laor
  2011-07-31 20:55                           ` Anthony Liguori
@ 2011-07-31 23:10                           ` Christoph Hellwig
  2011-08-01  0:15                             ` Anthony Liguori
  1 sibling, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2011-07-31 23:10 UTC (permalink / raw)
  To: Dor Laor
  Cc: Kevin Wolf, Peter Maydell, Stefan Hajnoczi, mst, Ryan Harper,
	quintela, qemu-devel, Paolo Bonzini, Christoph Hellwig

On Sun, Jul 31, 2011 at 11:43:08PM +0300, Dor Laor wrote:
> /me caught off guard. I wonder why it wasn't converted to VMSTATE before?  
> virtio is one of the key devices, it's not just random forgotten one that 
> might not care about migration.

It just shows the extent of incomplete transitions in qemu.  Given how
much burden incomplete transitions put on software projects we should
try to minimize them in qemu.  That is, if people add a new API we need
to have a clear roadmap for when it's going to be finished, and more
importantly what the consequences of not finishing it are, instead of
leaving it half done.  I think the way the Linux kernel handles API
transitions is something qemu could borrow from.  For most of them it's
simply expected to do a simple conversion of all users of an API to the
new equivalent, maybe in a simplistic and dumb way, but at least a
transition.  Combined with a deprecation schedule for unused drivers that
seems to do wonders, although of course even the Linux kernel is slacking
in some areas.

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 23:10                           ` Christoph Hellwig
@ 2011-08-01  0:15                             ` Anthony Liguori
  2011-08-01  7:54                               ` Christoph Hellwig
  0 siblings, 1 reply; 47+ messages in thread
From: Anthony Liguori @ 2011-08-01  0:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kevin Wolf, Peter Maydell, Stefan Hajnoczi, mst, Ryan Harper,
	Dor Laor, qemu-devel, quintela, Paolo Bonzini

On 07/31/2011 06:10 PM, Christoph Hellwig wrote:
> On Sun, Jul 31, 2011 at 11:43:08PM +0300, Dor Laor wrote:
>> /me caught off guard. I wonder why it wasn't converted to VMSTATE before?
>> virtio is one of the key devices, it's not just random forgotten one that
>> might not care about migration.
>
> It just shows the extent of incomplete transitions in qemu.  Given how
> much burden incomplete transitions have on software projects we should
> try to minimize them in qemu.  That is if people add a new API we need
> to have a clear roadmap when it's going to be finished, and more importantly
> what the consequence of not finishing it are instead of leaving it half
> done.  I think the way the Linux kernel handles API transitions is something
> qemu could borrow from. For most of them it's simply expected to do a simple
> conversion of all users of an API to the new equivalent, maybe it in
> simplistic and dumb way, but at least a transition.  Combined with a
> deprectation schedule for unused drivers that seems to do wonders, although
> of course even the Linux kernel is slacking in some areas.

One of the things I think the kernel is good at, is making relatively 
large changes outside of the tree and then merging it in a way that 
makes sense when it makes sense.

I think we've set the bar too low historically for introducing new 
interfaces.  I think Avi's new memory API is a good example of how we 
should approach these things--do the vast majority of the thankless work 
up front before initial merge.

Besides making sure we don't have incomplete interfaces, it also helps 
validate the interface before committing to it.

Regards,

Anthony Liguori

>
>

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-08-01  0:15                             ` Anthony Liguori
@ 2011-08-01  7:54                               ` Christoph Hellwig
  2011-08-01 13:53                                 ` Anthony Liguori
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2011-08-01  7:54 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Peter Maydell, Stefan Hajnoczi, mst, Ryan Harper,
	Dor Laor, qemu-devel, quintela, Paolo Bonzini, Christoph Hellwig

On Sun, Jul 31, 2011 at 07:15:21PM -0500, Anthony Liguori wrote:
> I think we've set the bar too low historically for introducing new 
> interfaces.  I think Avi's new memory API is a good example of how we 
> should approach these things--do the vast majority of the thankless work up 
> front before initial merge.

Yes, that seems to work a bit better.

So how will we sort out and finalize the vmstate bits, QMP, and making
sure we have one sort of error reporting?

For vmstate I'd agree with Dor in principle and just drop support for old-style
load/save functions after converting everything that matters to vmstate.

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-08-01  7:54                               ` Christoph Hellwig
@ 2011-08-01 13:53                                 ` Anthony Liguori
  2011-08-04 14:59                                   ` Luiz Capitulino
  0 siblings, 1 reply; 47+ messages in thread
From: Anthony Liguori @ 2011-08-01 13:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Kevin Wolf, Peter Maydell, Stefan Hajnoczi, mst, Ryan Harper,
	Dor Laor, qemu-devel, quintela, Paolo Bonzini

On 08/01/2011 02:54 AM, Christoph Hellwig wrote:
> On Sun, Jul 31, 2011 at 07:15:21PM -0500, Anthony Liguori wrote:
>> I think we've set the bar too low historically for introducing new
>> interfaces.  I think Avi's new memory API is a good example of how we
>> should approach these things--do the vast majority of the thankless work up
>> front before initial merge.
>
> Yes, that seems to work a bit better.
>
> So how will we sort out and finalized the vmstate bits,

http://wiki.qemu.org/Features/Migration/Next

Is what I think we need to do next for migration.  In terms of VMState,
I think we can leave it in the state it's in for now.  If
there is a desire to keep converting devices, that would be fine.

Because I think the next thing to do in terms of changing device 
serialization is to make serialization a proper virtual method of the 
base object class.  I think devices that use composition should also 
serialize their children as part of their serialization.
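
A rough sketch of what "serialization as a virtual method" with
composition could look like in plain C.  Nothing here is an existing
QEMU interface: DemoObject, its serialize pointer and the text output
format are all assumptions made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_CHILDREN 4

typedef struct DemoObject DemoObject;

/* Hypothetical base object with a serialize "virtual method". */
struct DemoObject {
    const char *name;
    int value;                      /* stand-in for real device state */
    /* Writes this object's state as text into out (at most len bytes,
     * including the terminating NUL); returns the snprintf-style count. */
    int (*serialize)(const DemoObject *obj, char *out, size_t len);
    const DemoObject *children[DEMO_MAX_CHILDREN];
    int num_children;
};

/* Leaf device: writes only its own state. */
static int demo_leaf_serialize(const DemoObject *obj, char *out, size_t len)
{
    return snprintf(out, len, "%s=%d;", obj->name, obj->value);
}

/* Composite device: writes its own state, then asks each child to
 * serialize itself through the child's own method. */
static int demo_composite_serialize(const DemoObject *obj, char *out,
                                    size_t len)
{
    int pos = snprintf(out, len, "%s=%d{", obj->name, obj->value);

    for (int i = 0; i < obj->num_children; i++) {
        const DemoObject *child = obj->children[i];
        if (pos < 0 || (size_t)pos >= len) {
            return pos;             /* error or buffer full: stop early */
        }
        pos += child->serialize(child, out + pos, len - (size_t)pos);
    }
    if (pos >= 0 && (size_t)pos < len) {
        pos += snprintf(out + pos, len - (size_t)pos, "}");
    }
    return pos;
}
```

With a "board" object that owns a "timer" and a "uart", calling
board.serialize(&board, buf, sizeof(buf)) walks the whole subtree, and
the parent never needs to know how its children lay out their state.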

I think that falls under the banner of updating the object model.

> QMP, and making
> sure we have one sort of error reporting?

I've updated the QMP merge plan on the wiki:

http://wiki.qemu.org/Features/QAPI#Merge_Plan

We've merged phase one, and phase two shouldn't be that hard to merge as 
the code is already written.  It's just a matter of rebasing and 
incorporating in an incremental fashion.

Phase two eliminates qerror_report() in favor of passing Error **s. 
It's very invasive which is why we decided to merge in two phases.
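
For readers who have not seen the pattern, a small sketch of error
propagation through an Error ** style out-parameter.  DemoError and the
demo_* functions are hypothetical stand-ins written for this example;
QEMU's real Error type and its helpers are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error object, made up for this sketch. */
typedef struct DemoError {
    char msg[64];
} DemoError;

/* Store an error in *errp if the caller asked for one.  Passing a NULL
 * errp means "I do not care about the details". */
static void demo_error_set(DemoError **errp, const char *msg)
{
    DemoError *err;

    if (errp == NULL) {
        return;
    }
    err = malloc(sizeof(*err));
    if (err == NULL) {
        abort();
    }
    snprintf(err->msg, sizeof(err->msg), "%s", msg);
    *errp = err;
}

static void demo_error_free(DemoError *err)
{
    free(err);
}

/* Parse a percentage in [0, 100].  On failure, returns -1 and describes
 * the problem through errp instead of reporting it at the failure site. */
static int demo_parse_percent(const char *s, DemoError **errp)
{
    char *end;
    long v = strtol(s, &end, 10);

    if (end == s || *end != '\0') {
        demo_error_set(errp, "not a number");
        return -1;
    }
    if (v < 0 || v > 100) {
        demo_error_set(errp, "out of range");
        return -1;
    }
    return (int)v;
}
```

The caller decides what to do with the error (print it, turn it into a
QMP reply, or ignore it), which is why every function on the call path
has to grow an extra parameter and the conversion touches so much code.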

Regards,

Anthony Liguori

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-08-01 13:53                                 ` Anthony Liguori
@ 2011-08-04 14:59                                   ` Luiz Capitulino
  0 siblings, 0 replies; 47+ messages in thread
From: Luiz Capitulino @ 2011-08-04 14:59 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Peter Maydell, Stefan Hajnoczi, mst, Ryan Harper,
	Dor Laor, qemu-devel, quintela, Paolo Bonzini, Christoph Hellwig

On Mon, 01 Aug 2011 08:53:28 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> On 08/01/2011 02:54 AM, Christoph Hellwig wrote:
> > On Sun, Jul 31, 2011 at 07:15:21PM -0500, Anthony Liguori wrote:
> >> I think we've set the bar too low historically for introducing new
> >> interfaces.  I think Avi's new memory API is a good example of how we
> >> should approach these things--do the vast majority of the thankless work up
> >> front before initial merge.
> >
> > Yes, that seems to work a bit better.
> >
> > So how will we sort out and finalized the vmstate bits,
> 
> http://wiki.qemu.org/Features/Migration/Next
> 
> Is what I think we need to do next for migration.  In terms of VMState, 
> I think we should can leave it in the current state its in for now.  If 
> there is a desire to keep converting devices, that would be fine.
> 
> Because I think the next thing to do in terms of changing device 
> serialization is to make serialization a proper virtual method of the 
> base object class.  I think devices that use composition should also 
> serialize their children as part of their serialization.
> 
> I think that falls under the banner of updating the object model.
> 
> > QMP, and making
> > sure we have one sort of error reporting?
> 
> I've updated the QMP merge plan on the wiki:
> 
> http://wiki.qemu.org/Features/QAPI#Merge_Plan

Something that delays a full QMP conversion is designing the new interfaces
(sometimes internal ones too).

I feel that we're striving for perfection. While it's obvious that we need
good interfaces, we have tons of commands and properly designing each of
them will take ages.

> We've merged phase one, and phase two shouldn't be that hard to merge as 
> the code is already written.  It's just a matter of rebasing and 
> incorporating in an incremental fashion.
> 
> Phase two eliminates qerror_report() in favor of passing Error **s. 
> It's very invasive which is why we decided to merge in two phases.
> 
> Regards,
> 
> Anthony Liguori
> 

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 10:48                 ` Dor Laor
  2011-07-31 11:37                   ` Peter Maydell
@ 2011-07-31 20:43                   ` Anthony Liguori
  2011-07-31 20:57                     ` Dor Laor
  1 sibling, 1 reply; 47+ messages in thread
From: Anthony Liguori @ 2011-07-31 20:43 UTC (permalink / raw)
  To: dlaor
  Cc: Kevin Wolf, Ryan Harper, Stefan Hajnoczi, mst, quintela,
	qemu-devel, Paolo Bonzini

On 07/31/2011 05:48 AM, Dor Laor wrote:
> On 07/30/2011 01:28 AM, Anthony Liguori wrote:
>> No, not at all. Just that converting everything to VMState isn't a
>> prerequisite for building a more robust migration protocol.
>
> The main thing is to priorities the problems we're facing with.
> - Live migration protocol:
> - VMState conversion is not complete

But this is not a problem because it doesn't gate anything.  That's my 
point.

> - Live migration is not flexible enough (even with subsections)

To make it more flexible, we need to be able to marshal to an internal 
data structure that we can transform in more flexible ways.

> - Simplify destination cmdline for machine creation

This needs qdev fixing.

> - Qdev
> - conversion is not complete
> - Machine + devices description are complex and have hidden glue

This is a hard problem.

> - Qapi
> - Needs merging

We merged the first part (which includes the new QMP server).  The work 
is done for converting the actual QMP commands.

> - QOB
> - Only the beginning
>
> So overall there are many parallel projects, probably more than the
> above. The RightThink(tm) would be to pick the ones that we can converge
> on and not try to handle all in parallel. There are problems we can live
> with. Engineering wise it might not be a beauty but they can wait (for
> instance dark magic to create the machines). There are some that prevent
> adding new features or make the code hard to support w/o them.
>
> Cheers,
> Dor
>
> ps: how hard is to finish the vmstate conversion? Can't we just assume
> not converted code is not functional and just remove all of it?

No.  VMState is a solution looking for a problem.  Many important device 
models are still not converted and ultimately, it doesn't solve the 
problem we're really trying to solve.

Regards,

Anthony Liguori

>
>>
>> Regards,
>>
>> Anthony Liguori
>
>
>

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 20:43                   ` Anthony Liguori
@ 2011-07-31 20:57                     ` Dor Laor
  2011-07-31 21:03                       ` Anthony Liguori
  0 siblings, 1 reply; 47+ messages in thread
From: Dor Laor @ 2011-07-31 20:57 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Ryan Harper, Stefan Hajnoczi, quintela, mst,
	qemu-devel, Paolo Bonzini

On 07/31/2011 11:43 PM, Anthony Liguori wrote:
> On 07/31/2011 05:48 AM, Dor Laor wrote:
>> On 07/30/2011 01:28 AM, Anthony Liguori wrote:
>>> No, not at all. Just that converting everything to VMState isn't a
>>> prerequisite for building a more robust migration protocol.
>>
>> The main thing is to priorities the problems we're facing with.
>> - Live migration protocol:
>> - VMState conversion is not complete
>
> But this is not a problem because it doesn't gate anything. That's my
> point.

VMState might be an exception but in general we have too much
unfinished business going on.

>
>> - Live migration is not flexible enough (even with subsections)
>
> To make it more flexible, we need to be able to marshal to an internal
> data structure that we can transform in more flexible ways.
>
>> - Simplify destination cmdline for machine creation
>
> This needs qdev fixing.
>
>> - Qdev
>> - conversion is not complete
>> - Machine + devices description are complex and have hidden glue
>
> This is a hard problem.
>
>> - Qapi
>> - Needs merging
>
> We merged the first part (which includes the new QMP server). The work
> is done for converting the actual QMP commands.
>
>> - QOB
>> - Only the beginning
>>
>> So overall there are many parallel projects, probably more than the
>> above. The RightThink(tm) would be to pick the ones that we can converge
>> on and not try to handle all in parallel. There are problems we can live
>> with. Engineering wise it might not be a beauty but they can wait (for
>> instance dark magic to create the machines). There are some that prevent
>> adding new features or make the code hard to support w/o them.
>>
>> Cheers,
>> Dor
>>
>> ps: how hard is to finish the vmstate conversion? Can't we just assume
>> not converted code is not functional and just remove all of it?
>
> No. VMState is a solution looking for a problem. Many important device

The initial target solved some rare bugs that tend not to bite us with
virtio. On the way, it got enhanced with subsections, which was a major
improvement.

> models are still not converted and ultimately, it doesn't solve the
> problem we're really trying to solve.

From the start I supported Michael Tsirkin's idea for an ASN.1 protocol.
The question is how visitors and the ability to translate from one
representation to another will help us. I do see value in it but I don't
think it is that important. If we have one real device serialization
method that is flexible enough we can stick with it w/o translation. If
we define qdev serialization into vmstate/asn.1/json/other and add some
capability negotiation and various other goodies it should be enough.

btw: separating the live migration protocol from the machine state is 
even more important if we take a gradual approach.

>
> Regards,
>
> Anthony Liguori
>
>>
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>
>>
>>
>
>

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 20:57                     ` Dor Laor
@ 2011-07-31 21:03                       ` Anthony Liguori
  2011-07-31 21:25                         ` Dor Laor
  0 siblings, 1 reply; 47+ messages in thread
From: Anthony Liguori @ 2011-07-31 21:03 UTC (permalink / raw)
  To: dlaor
  Cc: Kevin Wolf, Ryan Harper, Stefan Hajnoczi, mst, quintela,
	qemu-devel, Paolo Bonzini

On 07/31/2011 03:57 PM, Dor Laor wrote:
> On 07/31/2011 11:43 PM, Anthony Liguori wrote:
>>> ps: how hard is to finish the vmstate conversion? Can't we just assume
>>> not converted code is not functional and just remove all of it?
>>
>> No. VMState is a solution looking for a problem. Many important device
>
> The initial target solved some rare bugs, that tend not to bite us with
> virtio. On the way, it got enhanced with subsections that was a major
> improvement.

I should have qualified my statement.  VMState did solve many real
problems.  I meant that at this point in time, we've gotten pretty much
what we can get out of it.

>
>> models are still not converted and ultimately, it doesn't solve the
>> problem we're really trying to solve.
>
>  From the start I supported Michael Tisrkin's idea for ASN.1 protocol.
> The question is how visitors and ability to translate from one
> representation to another will help us.

Because with Visitors you can do:

Devices -> internal QObject representation -> ASN.1 -> wire -> ASN.1 -> 
internal QObject representation -> Device.

While it's in an internal representation, we can make large changes like 
translating entire device state structures to new formats, splitting one 
device into two, etc.

It's sort of the ultimate mechanism to make compatibility changes.  If 
you just go Devices -> ASN.1, you miss out on that.

BTW, another really useful thing that Visitor would enable is the 
ability to read an individual device to a QObject and implement the 
equivalent of 'show devicename' which dumps the state of arbitrary 
devices via QMP.  This could be very useful for debugging.

> I do see value in it but I don't
> think it is that important. If we have one real device serialization
> method that is flexible enough we can stick with it w/o translation. If
> we define qdev serialization into vmstate/asn.1/json/other and add some
> capability negotiation and various other goodies it should be enough.
>
> btw: separating the live migration protocol from the machine state is
> even more important if we take a gradual approach.

Yeah, I think the critical technical requirement to achieve this is that 
the devices need to generate their own serialization format, and then 
another layer translates that to the "live migration protocol" format.

Regards,

Anthony Liguori

>
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>>
>>>>
>>>> Regards,
>>>>
>>>> Anthony Liguori
>>>
>>>
>>>
>>
>>
>
>

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 21:03                       ` Anthony Liguori
@ 2011-07-31 21:25                         ` Dor Laor
  2011-07-31 21:49                           ` Anthony Liguori
  0 siblings, 1 reply; 47+ messages in thread
From: Dor Laor @ 2011-07-31 21:25 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Kevin Wolf, Ryan Harper, Stefan Hajnoczi, mst, quintela,
	qemu-devel, Paolo Bonzini

On 08/01/2011 12:03 AM, Anthony Liguori wrote:
> On 07/31/2011 03:57 PM, Dor Laor wrote:
>> On 07/31/2011 11:43 PM, Anthony Liguori wrote:
>>>> ps: how hard is to finish the vmstate conversion? Can't we just assume
>>>> not converted code is not functional and just remove all of it?
>>>
>>> No. VMState is a solution looking for a problem. Many important device
>>
>> The initial target solved some rare bugs, that tend not to bite us with
>> virtio. On the way, it got enhanced with subsections that was a major
>> improvement.
>
> I should have qualified my statement. VMState did solve many real
> problems. I meant that at this point in time, we've gotten pretty much
> what we can get out it.
>
>>
>>> models are still not converted and ultimately, it doesn't solve the
>>> problem we're really trying to solve.
>>
>> From the start I supported Michael Tisrkin's idea for ASN.1 protocol.
>> The question is how visitors and ability to translate from one
>> representation to another will help us.
>
> Because with Visitors you can do:
>
> Devices -> internal QObject representation -> ASN.1 -> wire -> ASN.1 ->
> internal QObject representation -> Device.

I admit that QObject sounds more appealing than VMState, we can convert 
all into it. I'm not sure what's the difference between visitor and the 
load/save functions, potentially with enhanced parameters like name 
which can be part of QObject anyway.

>
> While it's in an internal representation, we can make large changes like
> translating entire device state structures to new formats, splitting one
> device into two, etc.
>
> It's sort of the ultimate mechanism to make compatibility changes. If
> you just go Devices -> ASN.1, you miss out on that.

What's important in ASN.1 is not the data representation itself but the 
ability to have a flexible protocol. We can have it with VMState and 
QObject as well. I do admit that QObject+ASN.1 will ease the way to make 
it right so you convinced me :).

I still don't see how using ASN.1 will easily join/split several
devices into few and do some other magic. Not that it is not possible but
it is way too hard.

The main 'real' problems you're trying to solve are migration from one
release to the other while most of our problems were forgotten fields
here and there (floppy/ide/rtl/kvmclock/etc). I doubt that live
migration of the same release worked on upstream for the random git
head. Verifying save(i) == load(i)+save(i+1) is simple but no one is
executing it. Looks like we might be ready to go with your suggestion,
I'm just worried that there are too many other non-migration open
issues. If the above work doesn't get completed we're better off with the
current machine type + VMState + subsections. If it all gets
completed, we're better off with your suggestion.

>
> BTW, another really useful thing that Visitor would enable is the
> ability to read an individual device to a QObject and implement the
> equivalent of 'show devicename' which dumps the state of arbitrary
> devices via QMP. This could be very useful for debugging.
>
>> I do see value in it but I don't
>> think it is that important. If we have one real device serialization
>> method that is flexible enough we can stick with it w/o translation. If
>> we define qdev serialization into vmstate/asn.1/json/other and add some
>> capability negotiation and various other goodies it should be enough.
>>
>> btw: separating the live migration protocol from the machine state is
>> even more important if we take a gradual approach.
>
> Yeah, I think the critical technical requirement to achieve this is that
> the devices need to generate their own serialization format, and then
> another layer translates that to the "live migration protocol" format.
>
> Regards,
>
> Anthony Liguori
>
>>
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Anthony Liguori
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-31 21:25                         ` Dor Laor
@ 2011-07-31 21:49                           ` Anthony Liguori
  0 siblings, 0 replies; 47+ messages in thread
From: Anthony Liguori @ 2011-07-31 21:49 UTC (permalink / raw)
  To: dlaor
  Cc: Kevin Wolf, Ryan Harper, Stefan Hajnoczi, quintela, mst,
	qemu-devel, Paolo Bonzini

On 07/31/2011 04:25 PM, Dor Laor wrote:
> On 08/01/2011 12:03 AM, Anthony Liguori wrote:
>> On 07/31/2011 03:57 PM, Dor Laor wrote:
>>> On 07/31/2011 11:43 PM, Anthony Liguori wrote:
>>>>> ps: how hard is to finish the vmstate conversion? Can't we just assume
>>>>> not converted code is not functional and just remove all of it?
>>>>
>>>> No. VMState is a solution looking for a problem. Many important device
>>>
>>> The initial target solved some rare bugs, that tend not to bite us with
>>> virtio. On the way, it got enhanced with subsections that was a major
>>> improvement.
>>
>> I should have qualified my statement. VMState did solve many real
>> problems. I meant that at this point in time, we've gotten pretty much
>> what we can get out it.
>>
>>>
>>>> models are still not converted and ultimately, it doesn't solve the
>>>> problem we're really trying to solve.
>>>
>>> From the start I supported Michael Tisrkin's idea for ASN.1 protocol.
>>> The question is how visitors and ability to translate from one
>>> representation to another will help us.
>>
>> Because with Visitors you can do:
>>
>> Devices -> internal QObject representation -> ASN.1 -> wire -> ASN.1 ->
>> internal QObject representation -> Device.
>
> I admit that QObject sounds more appealing than VMState, we can convert
> all into it. I'm not sure what's the difference between visitor and the
> load/save functions, potentially with enhanced parameters like name
> which can be part of QObject anyway.

VMStateInfo contains

struct VMStateInfo {
     const char *name;
     int (*get)(QEMUFile *f, void *pv, size_t size);
     void (*put)(QEMUFile *f, void *pv, size_t size);
};

It needs to change to:

struct VMStateInfo {
     const char *name;
     void (*visit)(Visitor *v, const char *name, void *pv, size_t size,
                   Error **errp);
};

For each VMStateInfo, like vmstate_info_bool, we go from:


static int get_bool(QEMUFile *f, void *pv, size_t size)
{
     bool *v = pv;
     *v = qemu_get_byte(f);
     return 0;
}

static void put_bool(QEMUFile *f, void *pv, size_t size)
{
     bool *v = pv;
     qemu_put_byte(f, *v);
}

To:

static void visit_bool(Visitor *v, const char *name, void *pv,
                        size_t size, Error **errp)
{
     bool *b = pv;   /* must not reuse the name of the Visitor parameter v */
     visit_type_bool(v, name, b, errp);
}
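
To make the get/put -> visit collapse concrete, here is a
self-contained sketch in which the direction lives in the visitor
rather than in the device code.  DemoVisitor, DemoState and the
demo_visit_* functions are hypothetical; the real Visitor type has its
own callbacks and error handling, none of which is modelled here (there
is also no bounds checking on the buffer).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal visitor: a byte buffer plus a direction flag. */
typedef struct DemoVisitor {
    bool is_output;     /* true: state -> buffer, false: buffer -> state */
    uint8_t *buf;
    size_t pos;
} DemoVisitor;

/* The only function that knows the direction.  This byte-stream visitor
 * ignores name; a structured visitor could use it as a dictionary key. */
static void demo_visit_u8(DemoVisitor *v, const char *name, uint8_t *val)
{
    (void)name;
    if (v->is_output) {
        v->buf[v->pos++] = *val;
    } else {
        *val = v->buf[v->pos++];
    }
}

/* A bool travels as one byte, like get_bool()/put_bool() above. */
static void demo_visit_bool(DemoVisitor *v, const char *name, bool *val)
{
    uint8_t byte = *val;

    demo_visit_u8(v, name, &byte);
    *val = (byte != 0);
}

/* Hypothetical device state. */
typedef struct DemoState {
    uint8_t status;
    uint8_t isr;
    bool enabled;
} DemoState;

/* One function describes the layout for both save and load. */
static void demo_visit_state(DemoVisitor *v, DemoState *s)
{
    demo_visit_u8(v, "status", &s->status);
    demo_visit_u8(v, "isr", &s->isr);
    demo_visit_bool(v, "enabled", &s->enabled);
}
```

Running demo_visit_state() with is_output set writes the three bytes;
running the same function with is_output cleared reads them back.
Swapping in a visitor that builds a tree instead of a byte stream would
not require touching demo_visit_state() at all.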

For non-converted devices, like virtio, we change:

int virtio_load(VirtIODevice *vdev, QEMUFile *f)
{
     int num, i, ret;
     uint32_t features;
     uint32_t supported_features =
         vdev->binding->get_features(vdev->binding_opaque);

     if (vdev->binding->load_config) {
         ret = vdev->binding->load_config(vdev->binding_opaque, f);
         if (ret)
             return ret;
     }

     qemu_get_8s(f, &vdev->status);
     qemu_get_8s(f, &vdev->isr);
     ...

To:

void visit_type_virtio(Visitor *v, VirtIODevice *vdev,
                        const char *name, Error **errp)
{
     int num, i, ret;
     uint32_t features;
     uint32_t supported_features =
         vdev->binding->get_features(vdev->binding_opaque);

     if (vdev->binding->load_config) {
         ret = vdev->binding->load_config(vdev->binding_opaque, f);
         if (ret)
             return ret;
     }

     visit_start_struct(v, "VirtIODevice", name, errp);
     visit_type_u8(v, "status", &vdev->status);
     visit_type_u8(v, "isr", &vdev->isr);
     ...

You'll notice it's almost entirely mechanical.  It can probably be done
with a few seds and an afternoon's worth of grunt work.

I'm resisting the urge to do this myself because it's a good intro task 
and we've got a number of folks looking for those.

>>
>> While it's in an internal representation, we can make large changes like
>> translating entire device state structures to new formats, splitting one
>> device into two, etc.
>>
>> It's sort of the ultimate mechanism to make compatibility changes. If
>> you just go Devices -> ASN.1, you miss out on that.
>
> What's important in ASN.1 is not the data representation itself but the
> ability to have a flexible protocol. We can have it with VMState and
> QObject as well. I do admit that QObject+ASN.1 will ease the way to make
> it right so you convinced me :).
>
> I still don't see have using ASN.1 will easily join/split several
> devices into few and some other magics. Not that it is not possible but
> it is way too hard.

ASN.1 doesn't do it but having an object representation that we can 
manipulate will.  Think of it like a compiler optimization phase, you 
write a visitor that can identify a node, and transform it into a 
different set of nodes.
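
As a toy version of such a pass, assuming a deliberately simple
internal representation (a flat list of named integers rather than a
QObject tree): the pass below finds a 16-bit "ctrl" node and replaces
it with "ctrl_lo" and "ctrl_hi" nodes.  Every name here is hypothetical
and chosen only for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_NODES 8

/* Hypothetical internal representation: a flat list of named integers. */
typedef struct DemoNode {
    char name[24];
    int value;
} DemoNode;

typedef struct DemoTree {
    DemoNode nodes[DEMO_MAX_NODES];
    int count;
} DemoTree;

/* Append a node; returns 0 on success, -1 if the tree is full. */
static int demo_tree_add(DemoTree *t, const char *name, int value)
{
    DemoNode *n;

    if (t->count >= DEMO_MAX_NODES) {
        return -1;
    }
    n = &t->nodes[t->count++];
    snprintf(n->name, sizeof(n->name), "%s", name);
    n->value = value;
    return 0;
}

/* Look a node up by name; returns NULL if it is not present. */
static const DemoNode *demo_tree_find(const DemoTree *t, const char *name)
{
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->nodes[i].name, name) == 0) {
            return &t->nodes[i];
        }
    }
    return NULL;
}

/* Compatibility pass: every "ctrl" node is replaced by two nodes holding
 * its low and high byte; all other nodes are copied through unchanged.
 * Returns 0 on success, -1 if out runs out of room. */
static int demo_split_ctrl(const DemoTree *in, DemoTree *out)
{
    out->count = 0;
    for (int i = 0; i < in->count; i++) {
        const DemoNode *n = &in->nodes[i];
        int rc;

        if (strcmp(n->name, "ctrl") == 0) {
            rc = demo_tree_add(out, "ctrl_lo", n->value & 0xff);
            if (rc == 0) {
                rc = demo_tree_add(out, "ctrl_hi", (n->value >> 8) & 0xff);
            }
        } else {
            rc = demo_tree_add(out, n->name, n->value);
        }
        if (rc != 0) {
            return -1;
        }
    }
    return 0;
}
```

The point is that the pass runs on the in-memory representation, after
the source device has been marshalled and before anything is encoded
for the wire, so neither the device code nor the wire format has to
know that the split happened.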

>
> The main 'real' problems you're trying to solve are migration from one
> release to the other while most of our problems were forgotten fields
> here and there (floppy/ide/rtl/kvmclock/etc). I doubt that live
> migration of the same release worked on upstream for the random git
> head. Verifying save(i) == load(i)+save(i+1) is simple, but no one is
> executing it.

Because it's not easily automated.  I know it's preaching to the choir,
but we need better unit tests.  We're getting there though: we now have
a handful of tests in the tree, with hopefully more growing now that
we're embracing glib.
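
As a rough illustration of what such a round-trip unit test could look
like, here is a sketch for a made-up device.  ToyDev, toy_save, toy_load
and toy_roundtrip_ok are hypothetical names and the 4-byte layout is an
assumption; none of this is the VMState API.  The check saves the state,
loads it into a fresh device, saves again, and requires both the byte
streams and the structs to match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device state; field names are illustrative only. */
typedef struct ToyDev {
    uint8_t status;
    uint8_t isr;
    uint16_t queue_sel;
} ToyDev;

/* Serialize in a fixed big-endian layout; returns bytes written. */
size_t toy_save(const ToyDev *d, uint8_t *buf)
{
    buf[0] = d->status;
    buf[1] = d->isr;
    buf[2] = (uint8_t)(d->queue_sel >> 8);
    buf[3] = (uint8_t)(d->queue_sel & 0xff);
    return 4;
}

/* Inverse of toy_save; returns bytes consumed. */
size_t toy_load(ToyDev *d, const uint8_t *buf)
{
    d->status = buf[0];
    d->isr = buf[1];
    d->queue_sel = (uint16_t)((buf[2] << 8) | buf[3]);
    return 4;
}

/* Round-trip check: save, load into a zeroed device, save again.
 * Returns 1 if the two streams are identical and every field survived. */
int toy_roundtrip_ok(const ToyDev *d)
{
    uint8_t first[4], second[4];
    ToyDev copy;
    size_t n1, n2;

    memset(&copy, 0, sizeof(copy));
    n1 = toy_save(d, first);
    toy_load(&copy, first);
    n2 = toy_save(&copy, second);

    return n1 == n2 &&
           memcmp(first, second, n1) == 0 &&
           copy.status == d->status &&
           copy.isr == d->isr &&
           copy.queue_sel == d->queue_sel;
}
```

Comparing the streams alone would miss a field forgotten in both save and
load; comparing the structs as well is what catches the "forgotten field"
class of bugs in this sketch.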

> Looks like we might be ready to go with your suggestion,
> I'm just worried that there are too many other non-migration open
> issues. If the above work doesn't get completed, we're better off with the
> current machine type + VMState + subsections. If it is all
> completed, we're better off with your suggestion.

I think the trick is having a long term vision, but also to divide it
into shorter term, valuable incremental changes.  Even without the full
object model stuff, just introducing visitors will mean that we
pretty much instantly get QMP introspection for all device model objects.

Since the change is pretty small and straightforward, I think QMP
device model introspection justifies it on its own.  Being able to do
essentially a live migration dump (minus memory) in a structured format 
as part of sosreport would have made my life a lot easier numerous times :-)
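
To make the introspection point concrete, here is a small sketch of a
visitor that dumps a device in a structured text form.  The interface is
only loosely modelled on the calls in the snippet earlier in this message;
ToyVisitor, DumpVisitor, VisDev and the function names are hypothetical,
and this is not QEMU's actual Visitor definition.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical visitor interface (not QEMU's real one). */
typedef struct ToyVisitor ToyVisitor;
struct ToyVisitor {
    void (*start_struct)(ToyVisitor *v, const char *kind);
    void (*type_u8)(ToyVisitor *v, const char *name, uint8_t *val);
    void (*end_struct)(ToyVisitor *v);
};

/* One concrete visitor: renders fields as JSON-like text. */
typedef struct DumpVisitor {
    ToyVisitor base;   /* must be first so the casts below are valid */
    char out[128];
    size_t len;
    int nfields;
} DumpVisitor;

static void dump_append(DumpVisitor *d, const char *s)
{
    size_t n = strlen(s);

    if (d->len + n < sizeof(d->out)) {   /* silently drop on overflow */
        memcpy(d->out + d->len, s, n + 1);
        d->len += n;
    }
}

static void dump_start_struct(ToyVisitor *v, const char *kind)
{
    DumpVisitor *d = (DumpVisitor *)v;

    (void)kind;                          /* type name unused in this dump */
    d->nfields = 0;
    dump_append(d, "{");
}

static void dump_type_u8(ToyVisitor *v, const char *name, uint8_t *val)
{
    DumpVisitor *d = (DumpVisitor *)v;
    char tmp[64];

    snprintf(tmp, sizeof(tmp), "%s\"%s\": %u",
             d->nfields ? ", " : "", name, (unsigned)*val);
    dump_append(d, tmp);
    d->nfields++;
}

static void dump_end_struct(ToyVisitor *v)
{
    dump_append((DumpVisitor *)v, "}");
}

void dump_visitor_init(DumpVisitor *d)
{
    memset(d, 0, sizeof(*d));
    d->base.start_struct = dump_start_struct;
    d->base.type_u8 = dump_type_u8;
    d->base.end_struct = dump_end_struct;
}

/* Hypothetical device.  Its code is written once against the interface;
 * the same walk could feed a migration encoder or a QMP reply. */
typedef struct VisDev {
    uint8_t status;
    uint8_t isr;
} VisDev;

void vis_dev_visit(ToyVisitor *v, VisDev *dev)
{
    v->start_struct(v, "VisDev");
    v->type_u8(v, "status", &dev->status);
    v->type_u8(v, "isr", &dev->isr);
    v->end_struct(v);
}
```

With status 7 and isr 1, the dump visitor's buffer ends up holding
{"status": 7, "isr": 1}.  Swapping in a different visitor changes the
output format without touching vis_dev_visit.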

Regards,

Anthony Liguori

>
>>
>> BTW, another really useful thing that Visitor would enable is the
>> ability to read an individual device to a QObject and implement the
>> equivalent of 'show devicename' which dumps the state of arbitrary
>> devices via QMP. This could be very useful for debugging.
>>
>>> I do see value in it but I don't
>>> think it is that important. If we have one real device serialization
>>> method that is flexible enough we can stick with it w/o translation. If
>>> we define qdev serialization into vmstate/asn.1/json/other and add some
>>> capability negotiation and various other goodies it should be enough.
>>>
>>> btw: separating the live migration protocol from the machine state is
>>> even more important if we take a gradual approach.
>>
>> Yeah, I think the critical technical requirement to achieve this is that
>> the devices need to generate their own serialization format, and then
>> another layer translates that to the "live migration protocol" format.
>>
>> Regards,
>>
>> Anthony Liguori
>>

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-25 21:10 ` [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format Paolo Bonzini
  2011-07-25 23:23   ` Anthony Liguori
@ 2011-07-29 13:14   ` Anthony Liguori
  2011-07-29 14:49     ` Paolo Bonzini
  1 sibling, 1 reply; 47+ messages in thread
From: Anthony Liguori @ 2011-07-29 13:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: quintela, qemu-devel, mst

On 07/25/2011 04:10 PM, Paolo Bonzini wrote:
> On Thu, Jun 30, 2011 at 17:46, Paolo Bonzini<pbonzini@redhat.com>  wrote:
> I have now tested this series (exactly as sent) both by examining
> manually the differences between the two formats on the same guest
> state, and by a mix of saves/restores (new on new, 0.14 on new
> pc-0.14, new pc-0.14 on 0.14; also the same combinations on RHEL).  It
> always does what is expected.
>
> Michael Tsirkin objected that the format should be passed as a
> parameter in the migrate command.  I kind of agree, however since this
> is a real bug you would need to bump the default for new machine
> types, and this default would still go in the QEMUMachine struct like
> I am doing.  So I consider the two settings to be orthogonal.  Also,
> the alternative requires changes to the whole management stack and if
> the default is not changed it imposes a broken format unless you
> update the management tools.  Clearly much less bang for the buck.
>
> I think this is ready to go into 0.15.  The bug happens when migrating
> to 0.14 a pc-0.14 machine created with QEMU 0.15 and which has a
> floppy.  The media changed subsection is almost always included, and
> this causes problems when migrating to 0.14 which didn't have any
> subsection for the floppy device.  While QEMU support for migration to
> old versions admittedly depends on luck, this isn't true of certain
> downstreams :) which would like to have an unambiguous migration
> format.

I really hate the idea of changing the migration format moments before 
the release.

Since subsections are optional, can't we take the offending subsections, 
remove them, bump the section version numbers and make the fields required?

That "fixes" this issue temporarily without changing the format and we 
can change the format for 1.0.
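
A sketch of what "bump the version and make the field required" means for
a reader, using a made-up floppy-like device.  ToyFdc, its one-byte
version header and the default chosen for old streams are all assumptions
for illustration, not the real fdc VMState description.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical floppy-like state; "media_changed" plays the role of a
 * field that used to travel in an optional subsection. */
typedef struct ToyFdc {
    uint8_t track;
    uint8_t media_changed;
} ToyFdc;

enum { TOY_FDC_VERSION = 2 };

/* Save always emits the current version:
 * [version][track][media_changed].  Returns bytes written. */
size_t toy_fdc_save(const ToyFdc *s, uint8_t *buf)
{
    buf[0] = TOY_FDC_VERSION;
    buf[1] = s->track;
    buf[2] = s->media_changed;
    return 3;
}

/* Load returns bytes consumed, or 0 if the stream is truncated or comes
 * from a version this reader does not know.  The version byte alone
 * decides how many bytes belong to the device, so nothing is guessed. */
size_t toy_fdc_load(ToyFdc *s, const uint8_t *buf, size_t len)
{
    uint8_t version;

    if (len < 2) {
        return 0;
    }
    version = buf[0];
    if (version < 1 || version > TOY_FDC_VERSION) {
        return 0;                /* unknown layout: refuse, don't misparse */
    }
    s->track = buf[1];
    if (version >= 2) {
        if (len < 3) {
            return 0;
        }
        s->media_changed = buf[2];   /* required from version 2 onwards */
        return 3;
    }
    s->media_changed = 1;        /* assumed conservative default for v1 */
    return 2;
}
```

The trade-off visible even in this toy: a reader that only knows version 1
rejects a version 2 stream outright, so this approach gives up migration
to the older release for every device whose version is bumped.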

Regards,

Anthony Liguori

>
> Paolo
>
>

* Re: [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format
  2011-07-29 13:14   ` Anthony Liguori
@ 2011-07-29 14:49     ` Paolo Bonzini
  0 siblings, 0 replies; 47+ messages in thread
From: Paolo Bonzini @ 2011-07-29 14:49 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: quintela, qemu-devel, mst

On 07/29/2011 03:14 PM, Anthony Liguori wrote:
> I really hate the idea of changing the migration format moments before
> the release.

So do I, but that's life.

> Since subsections are optional, can't we take the offending subsections,
> remove them, bump the section version numbers and make the fields required?

The bug happens when you migrate from 0.15 to 0.15, and 0.14 didn't have 
any subsection for that device.  This happens pretty much in all cases 
that were added to 0.15.  It quickly makes a bigger patch than this one, 
and actually one that's harder to review.  At least with this one things 
can only go _royally_ wrong, and any serious automated test would catch it.

Paolo

end of thread, other threads:[~2011-08-05 19:41 UTC | newest]

Thread overview: 47+ messages
2011-06-30 15:46 [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format Paolo Bonzini
2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 1/4] add support for machine models to specify their " Paolo Bonzini
2011-06-30 18:11   ` Michael S. Tsirkin
2011-07-01  6:10     ` Paolo Bonzini
2011-07-29 13:08   ` Anthony Liguori
2011-07-29 14:35     ` Paolo Bonzini
2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 2/4] add pc-0.14 machine Paolo Bonzini
2011-08-05 19:26   ` Bruce Rogers
2011-08-05 19:41     ` Anthony Liguori
2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 3/4] savevm: define new unambiguous migration format Paolo Bonzini
2011-07-29 13:12   ` Anthony Liguori
2011-07-29 14:35     ` Paolo Bonzini
2011-06-30 15:46 ` [Qemu-devel] [RFC PATCH 4/4] Partially revert "savevm: fix corruption in vmstate_subsection_load()." Paolo Bonzini
2011-07-25 21:10 ` [Qemu-devel] [RFC PATCH 0/4] Fix subsection ambiguity in the migration format Paolo Bonzini
2011-07-25 23:23   ` Anthony Liguori
2011-07-26  9:42     ` Daniel P. Berrange
2011-07-26  9:48     ` Stefan Hajnoczi
2011-07-26 12:51       ` Stefan Hajnoczi
2011-07-26 13:00         ` Anthony Liguori
2011-07-26 12:07     ` Juan Quintela
2011-07-26 12:37       ` Anthony Liguori
2011-07-26 20:13         ` Juan Quintela
2011-07-26 21:46           ` Anthony Liguori
2011-07-26 22:22             ` Peter Maydell
2011-07-26 23:08               ` Anthony Liguori
2011-07-29 14:03         ` Kevin Wolf
2011-07-29 14:28           ` Anthony Liguori
2011-07-29 15:18             ` Kevin Wolf
2011-07-29 22:28               ` Anthony Liguori
2011-07-31 10:48                 ` Dor Laor
2011-07-31 11:37                   ` Peter Maydell
2011-07-31 11:45                     ` Dor Laor
2011-07-31 18:46                       ` Christoph Hellwig
2011-07-31 20:43                         ` Dor Laor
2011-07-31 20:55                           ` Anthony Liguori
2011-07-31 23:10                           ` Christoph Hellwig
2011-08-01  0:15                             ` Anthony Liguori
2011-08-01  7:54                               ` Christoph Hellwig
2011-08-01 13:53                                 ` Anthony Liguori
2011-08-04 14:59                                   ` Luiz Capitulino
2011-07-31 20:43                   ` Anthony Liguori
2011-07-31 20:57                     ` Dor Laor
2011-07-31 21:03                       ` Anthony Liguori
2011-07-31 21:25                         ` Dor Laor
2011-07-31 21:49                           ` Anthony Liguori
2011-07-29 13:14   ` Anthony Liguori
2011-07-29 14:49     ` Paolo Bonzini
