From: Juan Quintela <quintela@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Michael S . Tsirkin" <mst@redhat.com>,
	"Leonardo Bras" <leobras@redhat.com>,
	"Jiri Denemark" <jdenemar@redhat.com>,
	"Avihai Horon" <avihaih@nvidia.com>,
	"Fiona Ebner" <f.ebner@proxmox.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH v3 3/3] migration/doc: We broke backwards compatibility
Date: Tue, 17 Oct 2023 16:18:04 +0200
Message-ID: <87sf69z61v.fsf@secure.mitica>
In-Reply-To: <ZGQZ6A+hQx0+6vBo@x1n> (Peter Xu's message of "Tue, 16 May 2023 20:03:52 -0400")

Peter Xu <peterx@redhat.com> wrote:
> On Mon, May 15, 2023 at 10:32:01AM +0200, Juan Quintela wrote:
>> When we detect that we have broken backwards compatibility in a
>> released version, we can't do anything for that version.  But once we
>> fix that bug on the next released version, we can "mitigate" that
>> problem when migrating to new versions to give a way out of that
>> machine until it does a hard reboot.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>>  docs/devel/migration.rst | 194 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 194 insertions(+)
>> 
>> diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
>> index 95e797ee60..97b6f48474 100644
>> --- a/docs/devel/migration.rst
>> +++ b/docs/devel/migration.rst
>> @@ -451,6 +451,200 @@ binary in both sides of the migration.  If we use different QEMU
>>  versions process, then we need to have into account all other
>>  differences and the examples become even more complicated.
>>  
>> +How to mitigate when we have a backward compatibility error
>> +-----------------------------------------------------------
>> +
>> +We broke migration for old machine types continously during
>
> continuously

done.

>> +development.  But as soon as we find that there is a problem, we fix
>> +it.  The problem is what happens when we detect after we have done a
>> +release that something has gone wrong.
>> +
>> +Let's see how it worked with one example.
>> +
>> +After the release of qemu-8.0 we found a problem when doing migration
>> +of the machine type pc-7.2.
>> +
>> +- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2
>> +
>> +  This migration works
>> +
>> +- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
>> +
>> +  This migration works
>> +
>> +- $ qemu-8.0 -M pc-7.2  ->  qemu-7.2 -M pc-7.2
>> +
>> +  This migration fails
>> +
>> +- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
>> +
>> +  This migration fails
>> +
>> +So clearly something fails when migrating between qemu-7.2 and
>> +qemu-8.0 with machine type pc-7.2.  The error messages and git bisect
>> +pointed to this commit.
>> +
>> +In qemu-8.0 we got this commit: ::
>> +
>> +    commit 9a6ef182c03eaa138bae553f0fbb5a123bef9a53
>> +    Author: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> +    Date:   Thu Mar 2 13:37:03 2023 +0000
>> +
>> +        hw/pci/aer: Add missing routing for AER errors
>
> Worst timing ever for him.. :(

haha

> The lesson is never break migration when the maintainer has any intention
> to add some docs explaining backward compatibility.

Nah.  Just don't ever break migration and you are safe O:-)

>> +Notice that we enable te feature for new machine types.
>
> the

Done.

>> +                      PCI_ERR_UNC_SEVERITY_DEFAULT);
>> +
>> +I.e. If the property bit is enabled, we configure it as we did for
>> +qemu-8.0.  If the property bit is not set, we configure it as it was in 7.2.
>> +
>> +And now, everything that is missing is disable the feature for old
>
> disabling

Done.

>> +Can we do better?
>> +
>> +Yeap.  If we know that we are going to do this migration:
>
> IIUC the other thing one should do is always keep their QEMU binaries
> uptodate.  E.g., anyone seriously using 8.0 released QEMU should always
> consider to do timely upgrade to e.g. 8.0.1 so this issue can also be
> avoided (by dropping 8.0 directly).

I think that advice is valid, but it is independent of migration.

>> +
>> +- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2
>> +
>> +We can launche the appropiate devices with
>
> launch
>
>> +
>> +--device...,x-pci-e-err-unc-mask=on
>> +
>> +And now we can receive a migration from 8.0.  And from now on, we can
>> +do that migration to new machine types if we remember to enable that
>> +property for pc-7.2.  Notice that we need to remember, it is not
>> +enough to know that the source of the migration is qemu-8.0.  Think of this example:
>
> (wrap)
>
>> +
>> +$ qemu-8.0 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 -> qemu-8.2 -M pc-7.2
>> +
>> +In the second migration, the source is not qemu-8.0, but we still have
>> +that "problem" and have that property enabled.  Notice that we need to
>> +continue having this mark/property until we have this machine
>> +rebooted.  But it is not a normal reboot (that doesn't reload qemu); we
>> +need the machine to poweroff/poweron on a fixed qemu.  And from now
>> +on we can use the proper real machine.
>> +
>
> The 8.0.1 breaking migration to 8.0 is a very important point to mention
> indeed.

I hope it was clear from the example.
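
To spell the chained example out, here is a minimal sketch as a toy
Python model.  It is not QEMU code: the names Guest, default_mask,
cold_boot and migrate are made up for illustration, only pc-7.2 is
treated as an "old" machine type, and the only thing modelled is
whether source and destination agree on x-pci-e-err-unc-mask.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption for this toy model: pc-7.2 is the only "old" machine type.
OLD_MACHINE_TYPES = {"pc-7.2"}
# Releases in this model that have no x-pci-e-err-unc-mask property at all.
NO_PROPERTY = {"7.2", "8.0"}


@dataclass(frozen=True)
class Guest:
    machine_type: str
    qemu: str          # QEMU release currently running the guest
    unc_mask_on: bool  # effective state of the AER feature in the guest


def default_mask(qemu: str, machine_type: str) -> bool:
    """Feature state a given release picks when nothing is overridden."""
    if qemu == "7.2":
        return False   # the feature did not exist yet
    if qemu == "8.0":
        return True    # enabled for every machine type: the bug
    # Fixed releases: on for new machine types, off for old ones.
    return machine_type not in OLD_MACHINE_TYPES


def cold_boot(qemu: str, machine_type: str) -> Guest:
    """Power the guest on from scratch (poweroff/poweron, not a guest reboot)."""
    return Guest(machine_type, qemu, default_mask(qemu, machine_type))


def migrate(guest: Guest, dest_qemu: str, unc_mask: Optional[bool] = None) -> Guest:
    """Migrate to dest_qemu; unc_mask models passing the property by hand."""
    if unc_mask is not None and dest_qemu in NO_PROPERTY:
        raise ValueError(f"qemu-{dest_qemu} has no such property")
    if unc_mask is None:
        dest_mask = default_mask(dest_qemu, guest.machine_type)
    else:
        dest_mask = unc_mask
    if dest_mask != guest.unc_mask_on:
        raise RuntimeError(
            f"migration fails: qemu-{guest.qemu} -> qemu-{dest_qemu}"
        )
    # The guest carries its device state along, whatever the source release was.
    return Guest(guest.machine_type, dest_qemu, dest_mask)


if __name__ == "__main__":
    g = cold_boot("8.0", "pc-7.2")
    g = migrate(g, "8.0.1", unc_mask=True)  # property needed here
    g = migrate(g, "8.2", unc_mask=True)    # and still needed here
    g = cold_boot("8.2", "pc-7.2")          # power cycle on a fixed qemu
    g = migrate(g, "7.2")                   # plain pc-7.2 again
```

In this model the second hop fails without the property even though
its source is 8.0.1, because what matters is the state the guest
carries, not the release it last ran on.  Only cold_boot() on a fixed
release clears it.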

> Acked-by: Peter Xu <peterx@redhat.com>

Thanks, Juan.


