All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Liran Alon <liran.alon@oracle.com>
Cc: kvm list <kvm@vger.kernel.org>,
	qemu-devel@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: QEMU/KVM migration backwards compatibility broken?
Date: Thu, 6 Jun 2019 09:42:22 +0100	[thread overview]
Message-ID: <20190606084222.GA2788@work-vm> (raw)
In-Reply-To: <38B8F53B-F993-45C3-9A82-796A0D4A55EC@oracle.com>

* Liran Alon (liran.alon@oracle.com) wrote:
> Hi,
> 
> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
> 
> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
> is not going to be restored properly on destination.
> 
> I’m puzzled about what will happen in the following scenario:
> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
> 
> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
> However, it seems from current QEMU code that this will actually succeed in many cases.
> 
> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
> Therefore, migration will succeed even though it should have failed…
> 
> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
> and otherwise fail migration.
> 
> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?

I don't know the x86 specific side that much; but from my migration side
the answer should mostly be through machine types - indeed for smi-count
there's a property 'x-migrate-smi-count' which is off for machine types
pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
kernel you should stick to the old machine types.

There's nothing guarding running the new machine type on old-kernels;
and arguably we should have a check at startup that complains if
your kernel is missing something the machine type uses.
However, that would mean that people running with -M pc   would fail
on old kernels.

A post-load is also a valid check; but one question is whether,
for a particular register, the pain is worth it - it depends on the
symptom that the missing state causes.  If it's minor then you might
conclude it's not worth a failed migration;  if it's a hung or
corrupt guest then yes it is.   Certainly a warning printed is worth
it.

Dave

> Thanks,
> -Liran
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

WARNING: multiple messages have this Message-ID (diff)
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Liran Alon <liran.alon@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel@nongnu.org, kvm list <kvm@vger.kernel.org>
Subject: Re: [Qemu-devel] QEMU/KVM migration backwards compatibility broken?
Date: Thu, 6 Jun 2019 09:42:22 +0100	[thread overview]
Message-ID: <20190606084222.GA2788@work-vm> (raw)
In-Reply-To: <38B8F53B-F993-45C3-9A82-796A0D4A55EC@oracle.com>

* Liran Alon (liran.alon@oracle.com) wrote:
> Hi,
> 
> Looking at QEMU source code, I am puzzled regarding how migration backwards compatibility is preserved regarding X86CPU.
> 
> As I understand it, fields that are based on KVM capabilities and guest runtime usage are defined in VMState subsections in order to not send them if not necessary.
> This is done such that in case they are not needed and we migrate to an old QEMU which don’t support loading this state, migration will still succeed
> (As .needed() method will return false and therefore this state won’t be sent as part of migration stream).
> Furthermore, in case .needed() returns true and old QEMU don’t support loading this state, migration fails. As it should because we are aware that guest state
> is not going to be restored properly on destination.
> 
> I’m puzzled about what will happen in the following scenario:
> 1) Source is running new QEMU with new KVM that supports save of some VMState subsection.
> 2) Destination is running new QEMU that supports load this state but with old kernel that doesn’t know how to load this state.
> 
> I would have expected in this case that if source .needed() returns true, then migration will fail because of lack of support in destination kernel.
> However, it seems from current QEMU code that this will actually succeed in many cases.
> 
> For example, if msr_smi_count is sent as part of migration stream (See vmstate_msr_smi_count) and destination have has_msr_smi_count==false,
> then destination will succeed loading migration stream but kvm_put_msrs() will actually ignore env->msr_smi_count and will successfully load guest state.
> Therefore, migration will succeed even though it should have failed…
> 
> It seems to me that QEMU should have for every such VMState subsection, a .post_load() method that verifies that relevant capability is supported by kernel
> and otherwise fail migration.
> 
> What do you think? Should I really create a patch to modify all these CPUX86 VMState subsections to behave like this?

I don't know the x86 specific side that much; but from my migration side
the answer should mostly be through machine types - indeed for smi-count
there's a property 'x-migrate-smi-count' which is off for machine types
pre 2.11 (see hw/i386/pc.c pc_compat_2_11) - so if you've got an old
kernel you should stick to the old machine types.

There's nothing guarding running the new machine type on old-kernels;
and arguably we should have a check at startup that complains if
your kernel is missing something the machine type uses.
However, that would mean that people running with -M pc   would fail
on old kernels.

A post-load is also a valid check; but one question is whether,
for a particular register, the pain is worth it - it depends on the
symptom that the missing state causes.  If it's minor then you might
conclude it's not worth a failed migration;  if it's a hung or
corrupt guest then yes it is.   Certainly a warning printed is worth
it.

Dave

> Thanks,
> -Liran
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


  reply	other threads:[~2019-06-06  8:42 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-06  0:09 QEMU/KVM migration backwards compatibility broken? Liran Alon
2019-06-06  0:09 ` [Qemu-devel] " Liran Alon
2019-06-06  8:42 ` Dr. David Alan Gilbert [this message]
2019-06-06  8:42   ` Dr. David Alan Gilbert
2019-06-06  9:11   ` Liran Alon
2019-06-06  9:11     ` [Qemu-devel] " Liran Alon
2019-06-06  9:23     ` Dr. David Alan Gilbert
2019-06-06  9:23       ` [Qemu-devel] " Dr. David Alan Gilbert
2019-06-06 10:09       ` Liran Alon
2019-06-06 10:09         ` [Qemu-devel] " Liran Alon
2019-06-06 10:39         ` Dr. David Alan Gilbert
2019-06-06 10:39           ` [Qemu-devel] " Dr. David Alan Gilbert
2019-06-06 10:57           ` Liran Alon
2019-06-06 10:57             ` [Qemu-devel] " Liran Alon
2019-06-06 11:07             ` Dr. David Alan Gilbert
2019-06-06 11:07               ` [Qemu-devel] " Dr. David Alan Gilbert
2019-06-06 11:29               ` Liran Alon
2019-06-06 11:29                 ` [Qemu-devel] " Liran Alon
2019-06-06 13:31                 ` Dr. David Alan Gilbert
2019-06-06 13:31                   ` [Qemu-devel] " Dr. David Alan Gilbert
2019-06-06 15:16                   ` Liran Alon
2019-06-06 15:16                     ` [Qemu-devel] " Liran Alon
2019-06-10  9:44                     ` Dr. David Alan Gilbert
2019-06-10  9:44                       ` [Qemu-devel] " Dr. David Alan Gilbert
2019-06-06 13:13         ` Roman Kagan
2019-06-06 13:13           ` Roman Kagan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190606084222.GA2788@work-vm \
    --to=dgilbert@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=liran.alon@oracle.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.