Re: [Qemu-devel] About live migration rollback

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Gonglei (Arei)" <arei.gonglei@huawei.com>
Cc: Juan Quintela <quintela@redhat.com>,
	"pbonzini@redhat.com peterx@redhat.com" <peterx@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Liujinsong (Paul)" <liu.jinsong@huawei.com>
Subject: Re: [Qemu-devel] About live migration rollback
Date: Thu, 3 Jan 2019 09:26:23 +0000
Message-ID: <20190103092622.GA2316@work-vm>
In-Reply-To: <33183CC9F5247A488A2544077AF19020DB1EBE93@dggeml531-mbs.china.huawei.com>

* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Hi,
> 
> > 
> > * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > > Hi Dave,
> > >
> > > We discussed some live migration fallback scenarios at this year's KVM Forum,
> > > and now I can provide another scenario; perhaps upstream should consider
> > > rolling back in this situation.
> > >
> > > Environment information:
> > >
> > > host A: cpu E5620(model WestmereEP without flag xsave)
> > > host B: cpu E5-2643(model SandyBridgeEP with flag xsave)
> > >
> > > The steps to reproduce are:
> > > 1. Start a Windows 2008 VM with -cpu host (which means host-passthrough).
> > 
> > Well we don't guarantee migration across -cpu host - does this problem
> > go away if both qemu's are started with matching CPU flags
> > (corresponding to the Westmere) ?
> > 
> Sorry, we didn't test other CPU model scenarios, since we need to ensure
> that live migration is supported from older-generation CPUs to
> newer-generation CPUs. :(
> 
> 
> > > 2. Migrate the VM to host B while cr4.OSXSAVE=0.
> > > 3. The VM runs on host B for a while, so that cr4.OSXSAVE changes to 1.
> > > 4. Then migrate the VM back to host A. The migration succeeds, but the VM
> > > is paused and QEMU prints the following log:
> > >
> > > KVM: entry failed, hardware error 0x80000021
> > >
> > > If you're running a guest on an Intel machine without unrestricted mode
> > > support, the failure can be most likely due to the guest entering an invalid
> > > state for Intel VT. For example, the guest maybe running in big real mode
> > > which is not supported on less recent Intel processors.
> > >
> > > EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
> > > ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
> > > EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > > ES =0000 00000000 0000ffff 00009300
> > > CS =f000 ffff0000 0000ffff 00009b00
> > > SS =0000 00000000 0000ffff 00009300
> > > DS =0000 00000000 0000ffff 00009300
> > > FS =0000 00000000 0000ffff 00009300
> > > GS =0000 00000000 0000ffff 00009300
> > > LDT=0000 00000000 0000ffff 00008200
> > > TR =0000 00000000 0000ffff 00008b00
> > > GDT=     00000000 0000ffff
> > > IDT=     00000000 0000ffff
> > > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> > > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> > > DR6=00000000ffff0ff0 DR7=0000000000000400
> > > EFER=0000000000000000
> > > Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >
> > > The problem happens when kvm_put_sregs (called by kvm_arch_put_registers
> > > in QEMU) returns error -22.
> > >
> > > This is because kvm_arch_vcpu_ioctl_set_sregs (in the KVM module) finds
> > > that the guest CPUID lacks X86_FEATURE_XSAVE while cr4.OSXSAVE=1.
> > > We should cancel the migration if kvm_arch_put_registers returns an error.
> > 
> > Do you have a backtrace of when the kvm_arch_put_registers is called
> > when it fails?
> 
> The main backtrace is below:
> 
>  qemu_loadvm_state
>      cpu_synchronize_all_post_init                    --> w/o return value
>          cpu_synchronize_post_init                    --> w/o return value
>              kvm_cpu_synchronize_post_init            --> w/o return value
>                  run_on_cpu                           --> w/o return value
>                      do_kvm_cpu_synchronize_post_init --> w/o return value
>                          kvm_arch_put_registers       --> w/ return value
> 
> The root cause is that some functions don't have return values, so the
> migration thread can't detect those failures. Paolo?

OK, so yes it would be great to get return values and get them up to
qemu_loadvm_state;  I guess the tricky one is getting the return value
through 'run_on_cpu'.
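
One way this could look, as a rough standalone sketch rather than a patch:
since a run_on_cpu-style callback returns void, the result can travel
through the opaque data pointer and then be passed up by each caller.
Every name below (CPUModel, run_on_cpu_model, put_registers_model, the
*_model and cpu_sync_* helpers) is hypothetical and only mirrors the shape
of the call chain in the backtrace; none of it is QEMU's real API.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for a vCPU; not QEMU's CPUState. */
typedef struct CPUModel {
    int index;
    int put_registers_result;   /* value the fake register write returns */
} CPUModel;

/* Callback type that, like the real helper's, returns nothing. */
typedef void (*work_fn)(CPUModel *cpu, void *opaque);

/* Models a run_on_cpu-style helper: it runs fn and has no return value. */
static void run_on_cpu_model(CPUModel *cpu, work_fn fn, void *opaque)
{
    fn(cpu, opaque);
}

/* Models the innermost call, the only one that reports an error today. */
static int put_registers_model(CPUModel *cpu)
{
    return cpu->put_registers_result;
}

/* The callback stores the result through the opaque pointer. */
static void do_sync_post_init(CPUModel *cpu, void *opaque)
{
    int *ret = opaque;
    *ret = put_registers_model(cpu);
}

/* The caller owns the result slot and returns it after the callback ran. */
static int cpu_sync_post_init(CPUModel *cpu)
{
    int ret = 0;
    run_on_cpu_model(cpu, do_sync_post_init, &ret);
    return ret;
}

/* Stop at the first vCPU that fails and report its error. */
static int cpu_sync_all_post_init(CPUModel *cpus, int n)
{
    for (int i = 0; i < n; i++) {
        int ret = cpu_sync_post_init(&cpus[i]);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Models the load path: a negative return means "fail the migration". */
static int loadvm_state_model(CPUModel *cpus, int n)
{
    int ret = cpu_sync_all_post_init(cpus, n);
    if (ret < 0) {
        fprintf(stderr, "post-init register sync failed: %d\n", ret);
    }
    return ret;
}
```

With two modelled vCPUs where the second one's register write returns
-EINVAL, loadvm_state_model() returns -EINVAL instead of silently
succeeding, which is the point at which the incoming side could fail.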

> > If it's called during the loading of the device state then we should be
> > able to detect it and fail the migration; however if it's only failing
> > after the CPU is restarted after the migration then it's a bit too late.
> > 
> Actually the CPUs haven't started in this scenario.

OK, then yes, it's worth trying to fail the migrate.
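
For reference, the consistency rule behind the -22 (-EINVAL) can be
modelled in a few lines. This is only an illustrative sketch of the
condition described above (CR4.OSXSAVE set while the guest CPUID has no
XSAVE), not the kernel's code; check_cr4_against_cpuid and the two macro
names are made up for this example. The bit positions are the
architectural ones: CR4 bit 18 and CPUID.01H:ECX bit 26.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_OSXSAVE        (1u << 18)  /* CR4.OSXSAVE */
#define CPUID_1_ECX_XSAVE  (1u << 26)  /* CPUID.01H:ECX.XSAVE */

/*
 * Hypothetical helper: returns 0 if the incoming CR4 value is consistent
 * with the guest-visible CPUID, or -EINVAL if the guest has enabled
 * OSXSAVE on a CPU model that does not advertise XSAVE.
 */
static int check_cr4_against_cpuid(uint32_t cr4, uint32_t cpuid_1_ecx)
{
    bool guest_has_xsave = (cpuid_1_ecx & CPUID_1_ECX_XSAVE) != 0;

    if ((cr4 & CR4_OSXSAVE) && !guest_has_xsave) {
        return -EINVAL;
    }
    return 0;
}
```

In the reported case the state coming back from host B has OSXSAVE set,
while host A's CPUID has no XSAVE, so a check of this shape rejects it
before any vCPU runs.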

Dave

> Thanks,
> -Gonglei
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Thread overview: 4+ messages
2018-12-19  2:05 [Qemu-devel] About live migration rollback Gonglei (Arei)
2019-01-02 11:29 ` Dr. David Alan Gilbert
2019-01-03  1:30   ` Gonglei (Arei)
2019-01-03  9:26     ` Dr. David Alan Gilbert [this message]
