xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] xen: fix reboot with a PVH Dom0
@ 2014-06-04 16:47 Roger Pau Monne
  2014-06-04 16:54 ` Andrew Cooper
  2014-06-05  7:30 ` Jan Beulich
  0 siblings, 2 replies; 3+ messages in thread
From: Roger Pau Monne @ 2014-06-04 16:47 UTC (permalink / raw)
  To: xen-devel; +Cc: Keir Fraser, Jan Beulich, Roger Pau Monne

On stop_this_cpu we clear the TS flag from CR0 in order to clear the
FPU, and Xen leaves the TS bit unset after that. This seems to cause
problems when running a PVH Dom0, since some VMX routines expect
the TS flag to be set, and can lead to the following ASSERT in Xen:

(XEN) Domain 0 shutdown: rebooting machine.
(XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000080050033   rbx: ffff8300dfb1c000   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: ffff82d0802d7fc0   rdi: ffff8300dfb1c000
(XEN) rbp: ffff82d0802d7a88   rsp: ffff82d0802d7a78   r8:  0000000000000000
(XEN) r9:  ffff82cffffff000   r10: 0000000b06dca869   r11: 0000003d7d708160
(XEN) r12: ffff8300dfb1c000   r13: 0000000000000000   r14: ffff82d0802d0000
(XEN) r15: ffff82d0802d7f18   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 000000019ed8d000   cr2: 0000000800dcb040
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff82d0802d7a78:
(XEN)    ffff8300dfdf2000 ffff8300dfdf2000 ffff82d0802d7ad8 ffff82d08015d129
(XEN)    fffffe003d272a38 ffff83019a3f9140 0000000000000000 0000000000000001
(XEN)    0000000000000086 000000000019a400 0000000000000000 0001000000010030
(XEN)    ffff82d0802d7af8 ffff82d080160acf ffff83019ec18530 ffff8300dfdf2000
(XEN)    ffff82d0802d7b08 ffff82d080160af9 ffff82d0802d7b58 ffff82d080161728
(XEN)    ffff82d0802d7b48 ffff82d08013ffbf 0000000000000002 ffff83019ec18530
(XEN)    0000000000000000 0000000000000012 0000000000000000 0001000000010030
(XEN)    ffff82d0802d7b68 ffff82d08014e721 ffff82d0802d7bc8 ffff82d08014cda2
(XEN)    ffff8300dfb1c000 0000000000000092 ffff83019ec18604 ffff83019ec185f8
(XEN)    ffff82d0802d0000 0000000000000001 0000000000000000 ffff82d08016560e
(XEN)    ffff82d080272860 0000000000000020 ffff82d0802d7bd8 ffff82d0801448a8
(XEN)    ffff82d0802d7be8 ffff82d080165625 ffff82d0802d7c18 ffff82d080166143
(XEN)    0000000000000000 0000000000000001 0000000000000000 ffff82d080272860
(XEN)    ffff82d0802d7c48 ffff82d080166aa8 ffff8300dfb1c060 0000000000010000
(XEN)    0000000000000001 ffff82d080272860 ffff82d0802d7c78 ffff82d080166bae
(XEN)    000000000000000a ffff82d080276fe0 00000000000000fb 0000000000000061
(XEN)    ffff82d0802d7c98 ffff82d080166f63 ffff82d0802d7c98 ffff82d0801821ff
(XEN)    ffff82d0802d7cb8 ffff82d08018228b 0000000000000000 ffff82d0802d7dd8
(XEN)    ffff82d0802d7cf8 ffff82d080181aa7 ffff82d0802d7d08 0000000000000206
(XEN)    0000000000000000 ffff82d0802d7dd8 00000000000000fb 0000000000000008
(XEN) Xen call trace:
(XEN)    [<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
(XEN)    [<ffff82d08015d129>] __context_switch+0x127/0x462
(XEN)    [<ffff82d080160acf>] __sync_local_execstate+0x6a/0x8b
(XEN)    [<ffff82d080160af9>] sync_local_execstate+0x9/0xb
(XEN)    [<ffff82d080161728>] map_domain_page+0x88/0x4de
(XEN)    [<ffff82d08014e721>] map_vtd_domain_page+0xd/0xf
(XEN)    [<ffff82d08014cda2>] io_apic_read_remap_rte+0x158/0x29f
(XEN)    [<ffff82d0801448a8>] iommu_read_apic_from_ire+0x27/0x29
(XEN)    [<ffff82d080165625>] io_apic_read+0x17/0x65
(XEN)    [<ffff82d080166143>] __ioapic_read_entry+0x38/0x61
(XEN)    [<ffff82d080166aa8>] clear_IO_APIC_pin+0x1a/0xf3
(XEN)    [<ffff82d080166bae>] clear_IO_APIC+0x2d/0x60
(XEN)    [<ffff82d080166f63>] disable_IO_APIC+0xd/0x81
(XEN)    [<ffff82d08018228b>] smp_send_stop+0x58/0x68
(XEN)    [<ffff82d080181aa7>] machine_restart+0x80/0x20a
(XEN)    [<ffff82d080181c3c>] __machine_restart+0xb/0xf
(XEN)    [<ffff82d080128fb9>] smp_call_function_interrupt+0x99/0xc0
(XEN)    [<ffff82d080182330>] call_function_interrupt+0x33/0x43
(XEN)    [<ffff82d08016bd89>] do_IRQ+0x9e/0x63a
(XEN)    [<ffff82d08016406f>] common_interrupt+0x5f/0x70
(XEN)    [<ffff82d0801a8600>] mwait_idle+0x29c/0x2f7
(XEN)    [<ffff82d08015cf67>] idle_loop+0x58/0x76
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

The commit that introduced the clts and the FPU clearing (53a82f)
notes that this is needed to workaround some broken BIOSes, but
there's no mention about which specific BIOSes have this issue, so I'm
uncertain if the following fix is appropiate, or if those broken
BIOSes were only found on i386 hardware which Xen does no longer
support.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Mukesh Rathor <mukesh.rathor@oracle.com>
Cc: Keir Fraser <keir@xen.org>
---
 xen/arch/x86/smp.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 0433f30..0226d04 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -295,6 +295,7 @@ void __stop_this_cpu(void)
      */
     clts();
     asm volatile ( "fninit" );
+    stts();
 
     cpumask_clear_cpu(smp_processor_id(), &cpu_online_map);
 }
-- 
1.7.7.5 (Apple Git-26)


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] xen: fix reboot with a PVH Dom0
  2014-06-04 16:47 [PATCH] xen: fix reboot with a PVH Dom0 Roger Pau Monne
@ 2014-06-04 16:54 ` Andrew Cooper
  2014-06-05  7:30 ` Jan Beulich
  1 sibling, 0 replies; 3+ messages in thread
From: Andrew Cooper @ 2014-06-04 16:54 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Keir Fraser, Jan Beulich

On 04/06/14 17:47, Roger Pau Monne wrote:
> On stop_this_cpu we clear the TS flag from CR0 in order to clear the
> FPU, and Xen leaves the TS bit unset after that. This seems to cause
> problems when running a PVH Dom0, since some VMX routines expect
> the TS flag to be set, and can lead to the following ASSERT in Xen:
>
> (XEN) Domain 0 shutdown: rebooting machine.
> (XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
> (XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
> (XEN) rax: 0000000080050033   rbx: ffff8300dfb1c000   rcx: 0000000000000000
> (XEN) rdx: 0000000000000000   rsi: ffff82d0802d7fc0   rdi: ffff8300dfb1c000
> (XEN) rbp: ffff82d0802d7a88   rsp: ffff82d0802d7a78   r8:  0000000000000000
> (XEN) r9:  ffff82cffffff000   r10: 0000000b06dca869   r11: 0000003d7d708160
> (XEN) r12: ffff8300dfb1c000   r13: 0000000000000000   r14: ffff82d0802d0000
> (XEN) r15: ffff82d0802d7f18   cr0: 0000000080050033   cr4: 00000000000026f0
> (XEN) cr3: 000000019ed8d000   cr2: 0000000800dcb040
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen stack trace from rsp=ffff82d0802d7a78:
> (XEN)    ffff8300dfdf2000 ffff8300dfdf2000 ffff82d0802d7ad8 ffff82d08015d129
> (XEN)    fffffe003d272a38 ffff83019a3f9140 0000000000000000 0000000000000001
> (XEN)    0000000000000086 000000000019a400 0000000000000000 0001000000010030
> (XEN)    ffff82d0802d7af8 ffff82d080160acf ffff83019ec18530 ffff8300dfdf2000
> (XEN)    ffff82d0802d7b08 ffff82d080160af9 ffff82d0802d7b58 ffff82d080161728
> (XEN)    ffff82d0802d7b48 ffff82d08013ffbf 0000000000000002 ffff83019ec18530
> (XEN)    0000000000000000 0000000000000012 0000000000000000 0001000000010030
> (XEN)    ffff82d0802d7b68 ffff82d08014e721 ffff82d0802d7bc8 ffff82d08014cda2
> (XEN)    ffff8300dfb1c000 0000000000000092 ffff83019ec18604 ffff83019ec185f8
> (XEN)    ffff82d0802d0000 0000000000000001 0000000000000000 ffff82d08016560e
> (XEN)    ffff82d080272860 0000000000000020 ffff82d0802d7bd8 ffff82d0801448a8
> (XEN)    ffff82d0802d7be8 ffff82d080165625 ffff82d0802d7c18 ffff82d080166143
> (XEN)    0000000000000000 0000000000000001 0000000000000000 ffff82d080272860
> (XEN)    ffff82d0802d7c48 ffff82d080166aa8 ffff8300dfb1c060 0000000000010000
> (XEN)    0000000000000001 ffff82d080272860 ffff82d0802d7c78 ffff82d080166bae
> (XEN)    000000000000000a ffff82d080276fe0 00000000000000fb 0000000000000061
> (XEN)    ffff82d0802d7c98 ffff82d080166f63 ffff82d0802d7c98 ffff82d0801821ff
> (XEN)    ffff82d0802d7cb8 ffff82d08018228b 0000000000000000 ffff82d0802d7dd8
> (XEN)    ffff82d0802d7cf8 ffff82d080181aa7 ffff82d0802d7d08 0000000000000206
> (XEN)    0000000000000000 ffff82d0802d7dd8 00000000000000fb 0000000000000008
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
> (XEN)    [<ffff82d08015d129>] __context_switch+0x127/0x462
> (XEN)    [<ffff82d080160acf>] __sync_local_execstate+0x6a/0x8b
> (XEN)    [<ffff82d080160af9>] sync_local_execstate+0x9/0xb
> (XEN)    [<ffff82d080161728>] map_domain_page+0x88/0x4de
> (XEN)    [<ffff82d08014e721>] map_vtd_domain_page+0xd/0xf
> (XEN)    [<ffff82d08014cda2>] io_apic_read_remap_rte+0x158/0x29f
> (XEN)    [<ffff82d0801448a8>] iommu_read_apic_from_ire+0x27/0x29
> (XEN)    [<ffff82d080165625>] io_apic_read+0x17/0x65
> (XEN)    [<ffff82d080166143>] __ioapic_read_entry+0x38/0x61
> (XEN)    [<ffff82d080166aa8>] clear_IO_APIC_pin+0x1a/0xf3
> (XEN)    [<ffff82d080166bae>] clear_IO_APIC+0x2d/0x60
> (XEN)    [<ffff82d080166f63>] disable_IO_APIC+0xd/0x81
> (XEN)    [<ffff82d08018228b>] smp_send_stop+0x58/0x68
> (XEN)    [<ffff82d080181aa7>] machine_restart+0x80/0x20a
> (XEN)    [<ffff82d080181c3c>] __machine_restart+0xb/0xf
> (XEN)    [<ffff82d080128fb9>] smp_call_function_interrupt+0x99/0xc0
> (XEN)    [<ffff82d080182330>] call_function_interrupt+0x33/0x43
> (XEN)    [<ffff82d08016bd89>] do_IRQ+0x9e/0x63a
> (XEN)    [<ffff82d08016406f>] common_interrupt+0x5f/0x70
> (XEN)    [<ffff82d0801a8600>] mwait_idle+0x29c/0x2f7
> (XEN)    [<ffff82d08015cf67>] idle_loop+0x58/0x76
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
>
> The commit that introduced the clts and the FPU clearing (53a82f)
> notes that this is needed to workaround some broken BIOSes, but
> there's no mention about which specific BIOSes have this issue, so I'm
> uncertain if the following fix is appropiate, or if those broken
> BIOSes were only found on i386 hardware which Xen does no longer
> support.

Ouch.  This late during shutdown, we should not be anywhere near
__context_switch().

In this specific case, it appears to be a side effect of the VT-d code
using map_domain_page() rather than map_domain_page_global() for its
control pages.

I suspect that if you hadn’t faulted from this, you would have faulted
from something else fairly shortly afterwards.

~Andrew

>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Mukesh Rathor <mukesh.rathor@oracle.com>
> Cc: Keir Fraser <keir@xen.org>
> ---
>  xen/arch/x86/smp.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> index 0433f30..0226d04 100644
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -295,6 +295,7 @@ void __stop_this_cpu(void)
>       */
>      clts();
>      asm volatile ( "fninit" );
> +    stts();
>  
>      cpumask_clear_cpu(smp_processor_id(), &cpu_online_map);
>  }


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] xen: fix reboot with a PVH Dom0
  2014-06-04 16:47 [PATCH] xen: fix reboot with a PVH Dom0 Roger Pau Monne
  2014-06-04 16:54 ` Andrew Cooper
@ 2014-06-05  7:30 ` Jan Beulich
  1 sibling, 0 replies; 3+ messages in thread
From: Jan Beulich @ 2014-06-05  7:30 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: xen-devel, Keir Fraser

>>> On 04.06.14 at 18:47, <roger.pau@citrix.com> wrote:
> The commit that introduced the clts and the FPU clearing (53a82f)
> notes that this is needed to workaround some broken BIOSes, but
> there's no mention about which specific BIOSes have this issue, so I'm
> uncertain if the following fix is appropiate, or if those broken
> BIOSes were only found on i386 hardware which Xen does no longer
> support.

We should be conservative here and hence don't leave the FPU in
a bad state. Which includes not leaving CR0.TS set, i.e. the fix you
suggest isn't the right one. Instead, how about simply making
__stop_this_cpu() call sync_local_execstate() before that FPU
handling (perhaps before anything) it does? Of course, a comment
should explain why this is needed, to avoid people wondering later
and suggesting its removal.

Otoh, it does call hvm_cpu_down(), so VMX code should be able to
know that it has no business doing anything anymore. I.e. as
Andrew already suggested, subsequent operations (__vmwrite()
for example) are bogus too considering that __vmxoff() already
happened. So perhaps the better thing would be to add an
unlikely(!this_cpu(vmxon)) check to vmx_ctxt_switch_from(). And
the same ought to be done to SVM, which shouldn't do any SVM
stuff after having cleared EFER.SVME (the whole change here
really is kind of PVH independent anyway, as the same problem
might surface if you shut down without first shutting down all
your guests).

Jan

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-06-05  7:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-04 16:47 [PATCH] xen: fix reboot with a PVH Dom0 Roger Pau Monne
2014-06-04 16:54 ` Andrew Cooper
2014-06-05  7:30 ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).