From: Juan Quintela <quintela@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Michael S . Tsirkin" <mst@redhat.com>,
	"Leonardo Bras" <leobras@redhat.com>,
	"Jiri Denemark" <jdenemar@redhat.com>,
	"Avihai Horon" <avihaih@nvidia.com>,
	"Fiona Ebner" <f.ebner@proxmox.com>,
	"Daniel P . Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH v3 3/3] migration/doc: We broke backwards compatibility
Date: Tue, 17 Oct 2023 16:18:04 +0200
Message-ID: <87sf69z61v.fsf@secure.mitica>
In-Reply-To: <ZGQZ6A+hQx0+6vBo@x1n> (Peter Xu's message of "Tue, 16 May 2023 20:03:52 -0400")

Peter Xu <peterx@redhat.com> wrote:
> On Mon, May 15, 2023 at 10:32:01AM +0200, Juan Quintela wrote:
>> When we detect that we have broken backwards compatibility in a
>> released version, we can't do anything for that version.  But once we
>> fix that bug in the next released version, we can "mitigate" the
>> problem when migrating to newer versions, giving the affected machines
>> a way out until they do a hard reboot.
>> 
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>>  docs/devel/migration.rst | 194 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 194 insertions(+)
>> 
>> diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
>> index 95e797ee60..97b6f48474 100644
>> --- a/docs/devel/migration.rst
>> +++ b/docs/devel/migration.rst
>> @@ -451,6 +451,200 @@ binary in both sides of the migration.  If we use different QEMU
>>  versions process, then we need to have into account all other
>>  differences and the examples become even more complicated.
>>  
>> +How to mitigate when we have a backward compatibility error
>> +-----------------------------------------------------------
>> +
>> +We broke migration for old machine types continously during
>
> continuously

done.

>> +development.  But as soon as we find that there is a problem, we fix
>> +it.  The problem is what to do when we detect, after a release, that
>> +something has gone wrong.
>> +
>> +Let's see how it worked with an example.
>> +
>> +After the release of qemu-8.0 we found a problem when doing migration
>> +of the machine type pc-7.2.
>> +
>> +- $ qemu-7.2 -M pc-7.2  ->  qemu-7.2 -M pc-7.2
>> +
>> +  This migration works
>> +
>> +- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
>> +
>> +  This migration works
>> +
>> +- $ qemu-8.0 -M pc-7.2  ->  qemu-7.2 -M pc-7.2
>> +
>> +  This migration fails
>> +
>> +- $ qemu-7.2 -M pc-7.2  ->  qemu-8.0 -M pc-7.2
>> +
>> +  This migration fails
>> +
>> +So clearly something fails when migrating between qemu-7.2 and
>> +qemu-8.0 with machine type pc-7.2.  The error messages and git bisect
>> +pointed to this commit.
>> +
>> +In qemu-8.0 we got this commit: ::
>> +
>> +    commit 9a6ef182c03eaa138bae553f0fbb5a123bef9a53
>> +    Author: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> +    Date:   Thu Mar 2 13:37:03 2023 +0000
>> +
>> +        hw/pci/aer: Add missing routing for AER errors
>
> Worst timing ever for him.. :(

haha

> The lesson is never break migration when the maintainer has any intention
> to add some docs explaining backward compatibility.

Nah.  Just don't ever break migration and you are safe O:-)

>> +Notice that we enable te feature for new machine types.
>
> the

Done.

>> +                      PCI_ERR_UNC_SEVERITY_DEFAULT);
>> +
>> +I.e., if the property bit is enabled, we configure it as we did for
>> +qemu-8.0.  If the property bit is not set, we configure it as it was in 7.2.
>> +
>> +And now, everything that is missing is disable the feature for old
>
> disabling

Done.

>> +Can we do better?
>> +
>> +Yeap.  If we know that we are going to do this migration:
>
> IIUC the other thing one should do is always keep their QEMU binaries
> up to date.  E.g., anyone seriously using the 8.0 QEMU release should
> consider upgrading promptly to e.g. 8.0.1 so this issue can also be
> avoided (by dropping 8.0 directly).

I think that advice is valid, but independent of migration.

>> +
>> +- $ qemu-8.0 -M pc-7.2  ->  qemu-8.0.1 -M pc-7.2
>> +
>> +We can launche the appropriate devices with
>
> launch
>
>> +
>> +--device...,x-pci-e-err-unc-mask=on
>> +
>> +And now we can receive a migration from 8.0.  And from now on, we can
>> +do that migration to new machine types if we remember to enable that
>> +property for pc-7.2.  Notice that we need to remember this; it is not
>> +enough to know that the source of the migration is qemu-8.0.  Think of this example:
>
> (wrap)
>
>> +
>> +$ qemu-8.0 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2 -> qemu-8.2 -M pc-7.2
>> +
>> +In the second migration, the source is not qemu-8.0, but we still have
>> +that "problem" and have that property enabled.  Notice that we need to
>> +keep this mark/property until the machine is rebooted.  And it cannot
>> +be a normal reboot (which doesn't reload qemu); we need the machine
>> +to power off and power on again on a fixed qemu.  From then on we can
>> +use the proper real machine.
>> +
>
> The 8.0.1 breaking migration to 8.0 is a very important point to mention
> indeed.

I hope it was clear from the example.

> Acked-by: Peter Xu <peterx@redhat.com>

Thanks, Juan.




Thread overview: 20+ messages
2023-05-15  8:31 [PATCH v3 0/3] Migration documentation Juan Quintela
2023-05-15  8:31 ` [PATCH v3 1/3] migration: Add documentation for backwards compatiblity Juan Quintela
2023-05-16 23:39   ` Peter Xu
2023-05-18  1:47     ` Xiaoyao Li
2023-10-17 13:59       ` Juan Quintela
2023-10-23 11:09     ` Juan Quintela
2023-05-15  8:32 ` [PATCH v3 2/3] migration/docs: How to migrate when hosts have different features Juan Quintela
2023-05-16 23:51   ` Peter Xu
2023-10-17 14:05     ` Juan Quintela
2023-05-17 10:23   ` Michael S. Tsirkin
2023-10-17 14:11     ` Juan Quintela
2023-05-15  8:32 ` [PATCH v3 3/3] migration/doc: We broke backwards compatibility Juan Quintela
2023-05-17  0:03   ` Peter Xu
2023-10-17 14:18     ` Juan Quintela [this message]
2023-05-17  7:09   ` Fiona Ebner
2023-10-23 11:09     ` Juan Quintela
2023-05-17 10:20   ` Michael S. Tsirkin
2023-05-17 11:43     ` Juan Quintela
2023-05-17 11:47       ` Michael S. Tsirkin
2023-05-31 13:23       ` Michael S. Tsirkin
