* [Qemu-devel] About live migration rollback
@ 2018-12-19 2:05 Gonglei (Arei)
2019-01-02 11:29 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 4+ messages in thread
From: Gonglei (Arei) @ 2018-12-19 2:05 UTC
To: dgilbert@redhat.com, Juan Quintela, peterx@redhat.com
Cc: qemu-devel@nongnu.org, Liujinsong (Paul)
Hi Dave,
We discussed some live migration fallback scenarios at this year's KVM Forum,
and I can now provide another scenario. Perhaps upstream should consider rolling
back in this situation.
Environment information:
host A: CPU E5620 (model WestmereEP, without the xsave flag)
host B: CPU E5-2643 (model SandyBridgeEP, with the xsave flag)
Steps to reproduce:
1. Start a Windows 2008 VM with -cpu host (which means host-passthrough).
2. Migrate the VM to host B while cr4.OSXSAVE=0.
3. The VM runs on host B for a while, so that cr4.OSXSAVE changes to 1.
4. Then migrate the VM to host A. The migration reports success, but the VM is paused, and QEMU prints the following log:
KVM: entry failed, hardware error 0x80000021
If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.
EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 00009300
CS =f000 ffff0000 0000ffff 00009b00
SS =0000 00000000 0000ffff 00009300
DS =0000 00000000 0000ffff 00009300
FS =0000 00000000 0000ffff 00009300
GS =0000 00000000 0000ffff 00009300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
The problem occurs when kvm_put_sregs (called by kvm_arch_put_registers in QEMU) returns error -22 (-EINVAL).
This is because kvm_arch_vcpu_ioctl_set_sregs (in the KVM module) finds that
guest_cpuid_has reports no X86_FEATURE_XSAVE while cr4.OSXSAVE=1.
We should cancel the migration if kvm_arch_put_registers returns an error.
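The condition described above can be modelled in a few lines of C. This is only a sketch of the check, not the kernel's code: `struct guest_cpuid_model`, `model_guest_has_xsave` and `model_set_cr4_check` are hypothetical names invented for illustration. The bit positions are the architectural ones (CR4.OSXSAVE is bit 18, CPUID.01H:ECX.XSAVE is bit 26).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions (Intel SDM). */
#define CR4_OSXSAVE        (UINT64_C(1) << 18)  /* CR4.OSXSAVE */
#define CPUID_1_ECX_XSAVE  (UINT32_C(1) << 26)  /* CPUID.01H:ECX.XSAVE */

/* Hypothetical stand-in for the guest-visible CPUID leaf 1 ECX value. */
struct guest_cpuid_model {
    uint32_t leaf1_ecx;
};

static bool model_guest_has_xsave(const struct guest_cpuid_model *c)
{
    return (c->leaf1_ecx & CPUID_1_ECX_XSAVE) != 0;
}

/*
 * Hypothetical model of the validity check: an incoming CR4 with OSXSAVE
 * set is rejected when the guest CPUID does not advertise XSAVE.
 * Returns 0 if the value is accepted, -EINVAL otherwise.
 */
int model_set_cr4_check(const struct guest_cpuid_model *c, uint64_t cr4)
{
    if ((cr4 & CR4_OSXSAVE) && !model_guest_has_xsave(c)) {
        return -EINVAL;
    }
    return 0;
}
```

With a CPUID model that lacks XSAVE (as on host A), a CR4 value carrying OSXSAVE (as set on host B) is rejected with -EINVAL, which matches the -22 seen here.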
Thanks,
-Gonglei
* Re: [Qemu-devel] About live migration rollback
2018-12-19 2:05 [Qemu-devel] About live migration rollback Gonglei (Arei)
@ 2019-01-02 11:29 ` Dr. David Alan Gilbert
2019-01-03 1:30 ` Gonglei (Arei)
0 siblings, 1 reply; 4+ messages in thread
From: Dr. David Alan Gilbert @ 2019-01-02 11:29 UTC
To: Gonglei (Arei)
Cc: Juan Quintela, pbonzini@redhat.com, peterx@redhat.com,
qemu-devel@nongnu.org, Liujinsong (Paul)
* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Hi Dave,
>
> We discussed some live migration fallback scenarios in this year's KVM forum,
> and now I can provide another scenario, perhaps the upstream should consider rolling
> back for this situation.
>
> Environments information:
>
> host A: cpu E5620(model WestmereEP without flag xsave)
> host B: cpu E5-2643(model SandyBridgeEP with flag xsave)
>
> The reproduce steps is :
> 1. Start a windows 2008 vm with -cpu host(which means host-passthrough).
Well, we don't guarantee migration across -cpu host. Does this problem
go away if both QEMUs are started with matching CPU flags
(corresponding to the Westmere)?
> 2. Migrate the vm to host B when cr4.OSXSAVE=0.
> 3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
> 4. Then migrate the vm to host A successfully, but vm was paused, and qemu printed log as followed:
>
> KVM: entry failed, hardware error 0x80000021
>
> If you're running a guest on an Intel machine without unrestricted mode
> support, the failure can be most likely due to the guest entering an invalid
> state for Intel VT. For example, the guest maybe running in big real mode
> which is not supported on less recent Intel processors.
>
> EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
> ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
> EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0000 00000000 0000ffff 00009300
> CS =f000 ffff0000 0000ffff 00009b00
> SS =0000 00000000 0000ffff 00009300
> DS =0000 00000000 0000ffff 00009300
> FS =0000 00000000 0000ffff 00009300
> GS =0000 00000000 0000ffff 00009300
> LDT=0000 00000000 0000ffff 00008200
> TR =0000 00000000 0000ffff 00008b00
> GDT= 00000000 0000ffff
> IDT= 00000000 0000ffff
> CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
> DR6=00000000ffff0ff0 DR7=0000000000000400
> EFER=0000000000000000
> Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Problem happened when kvm_put_sregs returns err -22(called by kvm_arch_put_registers(qemu)).
>
> Because kvm_arch_vcpu_ioctl_set_sregs(kvm module) checked that
> guest_cpuid_has no X86_FEATURE_XSAVE but cr4.OSXSAVE=1.
> We should cancel migration if kvm_arch_put_registers returns error.
Do you have a backtrace showing where kvm_arch_put_registers is called
when it fails?
If it's called during the loading of the device state then we should be
able to detect it and fail the migration; however if it's only failing
after the CPU is restarted after the migration then it's a bit too late.
Dave
> Thanks,
> -Gonglei
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] About live migration rollback
2019-01-02 11:29 ` Dr. David Alan Gilbert
@ 2019-01-03 1:30 ` Gonglei (Arei)
2019-01-03 9:26 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 4+ messages in thread
From: Gonglei (Arei) @ 2019-01-03 1:30 UTC
To: Dr. David Alan Gilbert
Cc: Juan Quintela, pbonzini@redhat.com, peterx@redhat.com,
qemu-devel@nongnu.org, Liujinsong (Paul)
Hi,
>
> * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > Hi Dave,
> >
> > We discussed some live migration fallback scenarios in this year's KVM forum,
> > and now I can provide another scenario, perhaps the upstream should
> consider rolling
> > back for this situation.
> >
> > Environments information:
> >
> > host A: cpu E5620(model WestmereEP without flag xsave)
> > host B: cpu E5-2643(model SandyBridgeEP with flag xsave)
> >
> > The reproduce steps is :
> > 1. Start a windows 2008 vm with -cpu host(which means host-passthrough).
>
> Well we don't guarantee migration across -cpu host - does this problem
> go away if both qemu's are started with matching CPU flags
> (corresponding to the Westmere) ?
>
Sorry, we didn't test other CPU model scenarios, since we need to ensure
that live migration works from lower-generation CPUs to higher-generation
CPUs. :(
> > 2. Migrate the vm to host B when cr4.OSXSAVE=0.
> > 3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
> > 4. Then migrate the vm to host A successfully, but vm was paused, and qemu
> printed log as followed:
> >
> > KVM: entry failed, hardware error 0x80000021
> >
> > If you're running a guest on an Intel machine without unrestricted mode
> > support, the failure can be most likely due to the guest entering an invalid
> > state for Intel VT. For example, the guest maybe running in big real mode
> > which is not supported on less recent Intel processors.
> >
> > EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
> > ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
> > EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =0000 00000000 0000ffff 00009300
> > CS =f000 ffff0000 0000ffff 00009b00
> > SS =0000 00000000 0000ffff 00009300
> > DS =0000 00000000 0000ffff 00009300
> > FS =0000 00000000 0000ffff 00009300
> > GS =0000 00000000 0000ffff 00009300
> > LDT=0000 00000000 0000ffff 00008200
> > TR =0000 00000000 0000ffff 00008b00
> > GDT= 00000000 0000ffff
> > IDT= 00000000 0000ffff
> > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> DR3=0000000000000000
> > DR6=00000000ffff0ff0 DR7=0000000000000400
> > EFER=0000000000000000
> > Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00
> >
> > Problem happened when kvm_put_sregs returns err -22(called by
> kvm_arch_put_registers(qemu)).
> >
> > Because kvm_arch_vcpu_ioctl_set_sregs(kvm module) checked that
> > guest_cpuid_has no X86_FEATURE_XSAVE but cr4.OSXSAVE=1.
> > We should cancel migration if kvm_arch_put_registers returns error.
>
> Do you have a backtrace of when the kvm_arch_put_registers is called
> when it fails?
The main backtrace is below:
qemu_loadvm_state
  cpu_synchronize_all_post_init                --> w/o return value
    cpu_synchronize_post_init                  --> w/o return value
      kvm_cpu_synchronize_post_init            --> w/o return value
        run_on_cpu                             --> w/o return value
          do_kvm_cpu_synchronize_post_init     --> w/o return value
            kvm_arch_put_registers             --> w/ return value
The root cause is that some functions in this chain don't return a value, so
the migration thread can't detect these failures. Paolo?
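To illustrate the point, here is a minimal sketch of how a void wrapper loses the error and how an int-returning chain would carry it up. Everything in it is a hypothetical model (`struct vcpu_model` and the `model_*` functions are invented for this example); it is not QEMU code and leaves out the run_on_cpu hop.

```c
#include <errno.h>

/* Hypothetical stand-in for a per-vCPU object; not QEMU's CPUState. */
struct vcpu_model {
    int put_registers_result;  /* what the lowest layer will return */
};

/* Lowest layer: reports failure through its return value. */
static int model_put_registers(struct vcpu_model *cpu)
{
    return cpu->put_registers_result;
}

/* Current shape: a void wrapper, so the error stops here. */
void model_post_init_void(struct vcpu_model *cpu)
{
    (void)model_put_registers(cpu);  /* result is discarded */
}

/* Possible shape: each wrapper hands the error to its caller. */
int model_post_init(struct vcpu_model *cpu)
{
    return model_put_registers(cpu);
}

/* Top level: stop at the first failing vCPU so the load can be failed. */
int model_post_init_all(struct vcpu_model *cpus, int ncpus)
{
    for (int i = 0; i < ncpus; i++) {
        int ret = model_post_init(&cpus[i]);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

In this model, a caller of `model_post_init_all` sees -EINVAL as soon as one vCPU fails, whereas a caller of `model_post_init_void` has nothing to check.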
> If it's called during the loading of the device state then we should be
> able to detect it and fail the migration; however if it's only failing
> after the CPU is restarted after the migration then it's a bit too late.
>
Actually the CPUs haven't started in this scenario.
Thanks,
-Gonglei
* Re: [Qemu-devel] About live migration rollback
2019-01-03 1:30 ` Gonglei (Arei)
@ 2019-01-03 9:26 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 4+ messages in thread
From: Dr. David Alan Gilbert @ 2019-01-03 9:26 UTC
To: Gonglei (Arei)
Cc: Juan Quintela, pbonzini@redhat.com, peterx@redhat.com,
qemu-devel@nongnu.org, Liujinsong (Paul)
* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Hi,
>
> >
> > * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > > Hi Dave,
> > >
> > > We discussed some live migration fallback scenarios in this year's KVM forum,
> > > and now I can provide another scenario, perhaps the upstream should
> > consider rolling
> > > back for this situation.
> > >
> > > Environments information:
> > >
> > > host A: cpu E5620(model WestmereEP without flag xsave)
> > > host B: cpu E5-2643(model SandyBridgeEP with flag xsave)
> > >
> > > The reproduce steps is :
> > > 1. Start a windows 2008 vm with -cpu host(which means host-passthrough).
> >
> > Well we don't guarantee migration across -cpu host - does this problem
> > go away if both qemu's are started with matching CPU flags
> > (corresponding to the Westmere) ?
> >
> Sorry, we didn't test other cpu model scenarios since we should assure
> that the live migration support from lower generation CPUs to higher
> generation CPUs. :(
>
>
> > > 2. Migrate the vm to host B when cr4.OSXSAVE=0.
> > > 3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
> > > 4. Then migrate the vm to host A successfully, but vm was paused, and qemu
> > printed log as followed:
> > >
> > > KVM: entry failed, hardware error 0x80000021
> > >
> > > If you're running a guest on an Intel machine without unrestricted mode
> > > support, the failure can be most likely due to the guest entering an invalid
> > > state for Intel VT. For example, the guest maybe running in big real mode
> > > which is not supported on less recent Intel processors.
> > >
> > > EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
> > > ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
> > > EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > > ES =0000 00000000 0000ffff 00009300
> > > CS =f000 ffff0000 0000ffff 00009b00
> > > SS =0000 00000000 0000ffff 00009300
> > > DS =0000 00000000 0000ffff 00009300
> > > FS =0000 00000000 0000ffff 00009300
> > > GS =0000 00000000 0000ffff 00009300
> > > LDT=0000 00000000 0000ffff 00008200
> > > TR =0000 00000000 0000ffff 00008b00
> > > GDT= 00000000 0000ffff
> > > IDT= 00000000 0000ffff
> > > CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
> > > DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
> > DR3=0000000000000000
> > > DR6=00000000ffff0ff0 DR7=0000000000000400
> > > EFER=0000000000000000
> > > Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00
> > >
> > > Problem happened when kvm_put_sregs returns err -22(called by
> > kvm_arch_put_registers(qemu)).
> > >
> > > Because kvm_arch_vcpu_ioctl_set_sregs(kvm module) checked that
> > > guest_cpuid_has no X86_FEATURE_XSAVE but cr4.OSXSAVE=1.
> > > We should cancel migration if kvm_arch_put_registers returns error.
> >
> > Do you have a backtrace of when the kvm_arch_put_registers is called
> > when it fails?
>
> The main backtrace is below:
>
> qemu_loadvm_state
> cpu_synchronize_all_post_init --> w/o return value
> cpu_synchronize_post_init --> w/o return value
> kvm_cpu_synchronize_post_init --> w/o return value
> run_on_cpu ---> w/o return value
> do_kvm_cpu_synchronize_post_init --> w/o return value
> kvm_arch_put_registers --> w/ return value
>
> Root cause is some functions don't have return values, the migration thread
> can't detect those failures. Paolo?
OK, so yes it would be great to get return values and get them up to
qemu_loadvm_state; I guess the tricky one is getting the return value
through 'run_on_cpu'.
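One way the return value might be carried through a callback that itself returns void is an out-parameter passed via the opaque data argument. The sketch below is a mock under stated assumptions: `struct vcpu_stub`, `model_run_on_cpu` and the other `model_*` names are hypothetical, and the dispatcher simply runs the work item synchronously rather than handing it to a vCPU thread.

```c
#include <errno.h>

/* Hypothetical stand-in for a per-vCPU object; not QEMU's CPUState. */
struct vcpu_stub {
    int put_registers_result;  /* simulated result of the register load */
};

/* Work item signature with no return value, as in the backtrace. */
typedef void (*model_work_fn)(struct vcpu_stub *cpu, void *opaque);

/*
 * Mock dispatcher: runs the work item and returns only once it is done.
 * A real dispatcher would queue it to the vCPU thread and wait.
 */
void model_run_on_cpu(struct vcpu_stub *cpu, model_work_fn fn, void *opaque)
{
    fn(cpu, opaque);
}

/* Work item: stores the result where the caller asked for it. */
static void model_do_post_init(struct vcpu_stub *cpu, void *opaque)
{
    int *ret = opaque;
    *ret = cpu->put_registers_result;  /* stands in for the register load */
}

/* Wrapper: turns the void dispatch back into an int-returning call. */
int model_cpu_synchronize_post_init(struct vcpu_stub *cpu)
{
    int ret = -EIO;  /* placeholder, overwritten by the work item */
    model_run_on_cpu(cpu, model_do_post_init, &ret);
    return ret;
}
```

Pointing the work item at a variable on the caller's stack is only safe because the dispatcher in this model does not return until the work item has finished; an asynchronous dispatch would need storage with a longer lifetime.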
> > If it's called during the loading of the device state then we should be
> > able to detect it and fail the migration; however if it's only failing
> > after the CPU is restarted after the migration then it's a bit too late.
> >
> Actually the CPUs haven't started in this scenario.
OK, then yes, it's worth trying to fail the migration.
Dave
> Thanks,
> -Gonglei
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK