From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Gonglei (Arei)" <arei.gonglei@huawei.com>
Cc: Juan Quintela <quintela@redhat.com>,
"pbonzini@redhat.com peterx@redhat.com" <peterx@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"Liujinsong (Paul)" <liu.jinsong@huawei.com>
Subject: Re: [Qemu-devel] About live migration rollback
Date: Thu, 3 Jan 2019 09:26:23 +0000
Message-ID: <20190103092622.GA2316@work-vm>
In-Reply-To: <33183CC9F5247A488A2544077AF19020DB1EBE93@dggeml531-mbs.china.huawei.com>

* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Hi,
>
> >
> > * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > > Hi Dave,
> > >
> > > We discussed some live migration fallback scenarios in this year's KVM forum,
> > > and now I can provide another scenario, perhaps the upstream should consider
> > > rolling back for this situation.
> > >
> > > Environments information:
> > >
> > > host A: cpu E5620(model WestmereEP without flag xsave)
> > > host B: cpu E5-2643(model SandyBridgeEP with flag xsave)
> > >
> > > The steps to reproduce are:
> > > 1. Start a Windows 2008 vm with -cpu host (which means host-passthrough).
> >
> > Well we don't guarantee migration across -cpu host - does this problem
> > go away if both qemu's are started with matching CPU flags
> > (corresponding to the Westmere) ?
> >
> Sorry, we didn't test other CPU model scenarios, since we need to ensure
> that live migration works from lower-generation CPUs to higher-generation
> CPUs. :(
>
>
> > > 2. Migrate the vm to host B when cr4.OSXSAVE=0.
> > > 3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
> > > 4. Then migrate the vm to host A; the migration completes successfully,
> > > but the vm is paused and qemu prints the following log:
> > >
> > > KVM: entry failed, hardware error 0x80000021
> > >
> > > If you're running a guest on an Intel machine without unrestricted mode
> > > support, the failure can be most likely due to the guest entering an invalid
> > > state for Intel VT. For example, the guest maybe running in big real mode
> > > which is not supported on less recent Intel processors.
> > >
> > > EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
> > > ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
> > > EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > > ES =0000 00000000 0000ffff 00009300
> > > CS =f000 ffff0000 0000ffff 00009b00
> > > SS =0000 00000000 0000ffff 00009300
> > > DS =0000 00000000 0000ffff 00009300
> > > FS =0000 00000000 0000ffff 00009300
> > > GS =0000 00000000 0000ffff 00009300
> > > LDT=0000 00000000 0000ffff 00008200
> > > TR =0000 00000000 0000ffff 00008b00
> > > GDT= 00000000 0000ffff
> > > IDT= 00000000 0000ffff
> > > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> > > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> > > DR6=00000000ffff0ff0 DR7=0000000000000400
> > > EFER=0000000000000000
> > > Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >
> > > The problem happens when kvm_put_sregs (called by
> > > kvm_arch_put_registers in qemu) returns error -22.
> > >
> > > This is because kvm_arch_vcpu_ioctl_set_sregs (in the kvm module) found
> > > that the guest CPUID lacks X86_FEATURE_XSAVE while cr4.OSXSAVE=1.
> > > We should cancel the migration if kvm_arch_put_registers returns an error.
> >
> > Do you have a backtrace of when the kvm_arch_put_registers is called
> > when it fails?
>
> The main backtrace is below:
>
> qemu_loadvm_state
>   cpu_synchronize_all_post_init           --> w/o return value
>     cpu_synchronize_post_init             --> w/o return value
>       kvm_cpu_synchronize_post_init       --> w/o return value
>         run_on_cpu                        --> w/o return value
>           do_kvm_cpu_synchronize_post_init --> w/o return value
>             kvm_arch_put_registers        --> w/ return value
>
> The root cause is that some functions don't have return values, so the
> migration thread can't detect those failures. Paolo?

OK, so yes it would be great to get return values and get them up to
qemu_loadvm_state; I guess the tricky one is getting the return value
through 'run_on_cpu'.

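One possible shape for that, as a rough sketch only: since the work function
handed to run_on_cpu can't return anything itself, the caller could pass a
pointer to a result slot and read it back afterwards. Everything below is a
hypothetical stand-in (the CPUState struct, the *_sketch names and the
put_regs_result field are made up for illustration); it is not the real QEMU
code and not a tested patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's CPUState; put_regs_result fakes what
 * kvm_arch_put_registers would return for this vCPU. */
typedef struct CPUState {
    int put_regs_result;
} CPUState;

/* Work function type: like the real thing, it returns nothing. */
typedef void (*cpu_work_fn)(CPUState *cpu, void *opaque);

/* Stand-in for run_on_cpu(): runs fn synchronously and returns nothing. */
static void run_on_cpu_sketch(CPUState *cpu, cpu_work_fn fn, void *opaque)
{
    fn(cpu, opaque);
}

/* Stand-in for kvm_arch_put_registers(): 0 on success, -errno on failure. */
static int put_registers_sketch(CPUState *cpu)
{
    return cpu->put_regs_result;
}

/* The worker stores the result through the opaque pointer. */
static void do_post_init_sketch(CPUState *cpu, void *opaque)
{
    int *ret = opaque;

    assert(ret != NULL);
    *ret = put_registers_sketch(cpu);
}

/* Per-CPU wrapper that now has a return value. */
static int cpu_synchronize_post_init_sketch(CPUState *cpu)
{
    int ret = 0;

    run_on_cpu_sketch(cpu, do_post_init_sketch, &ret);
    return ret;
}

/* All-CPU wrapper: stop at the first failure and hand it to the caller,
 * which would be the load path in the real code. */
static int cpu_synchronize_all_post_init_sketch(CPUState *cpus, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ret = cpu_synchronize_post_init_sketch(&cpus[i]);

        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

With something along those lines, the loadvm path could check the result of
the all-CPU wrapper and fail the incoming migration instead of carrying on.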
> > If it's called during the loading of the device state then we should be
> > able to detect it and fail the migration; however if it's only failing
> > after the CPU is restarted after the migration then it's a bit too late.
> >
> Actually the CPUs haven't started in this scenario.

OK, then yes, it's worth trying to fail the migration.
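
For reference, the condition that makes the sregs load fail here boils down
to something like the sketch below: CR4.OSXSAVE (bit 18) is set in the
incoming state, but the guest's CPUID leaf 1 ECX doesn't advertise XSAVE
(bit 26). The function and macro names are made up for illustration and are
not the kernel's; only the bit positions and the -EINVAL (-22) result are
taken as given.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_OSXSAVE_BIT    (1u << 18)  /* CR4.OSXSAVE */
#define CPUID_1_ECX_XSAVE  (1u << 26)  /* CPUID.01H:ECX.XSAVE */

/* Hypothetical helper: returns 0 if the CR4 value is consistent with the
 * guest's CPUID, or -EINVAL if OSXSAVE is set without XSAVE support. */
static int check_cr4_against_cpuid(uint64_t cr4, uint32_t cpuid_1_ecx)
{
    bool guest_has_xsave = (cpuid_1_ecx & CPUID_1_ECX_XSAVE) != 0;

    if ((cr4 & CR4_OSXSAVE_BIT) && !guest_has_xsave) {
        return -EINVAL;
    }
    return 0;
}
```

In the reported case the state coming back from host B has OSXSAVE set while
the Westmere destination exposes no XSAVE, so a check of this kind rejects it.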

Dave

> Thanks,
> -Gonglei
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK