xen-devel.lists.xenproject.org archive mirror
* Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
@ 2013-11-04 19:54 Lars Kurth
  2013-11-04 20:00 ` Andrew Cooper
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Lars Kurth @ 2013-11-04 19:54 UTC (permalink / raw)
  To: xen-devel@lists.xen.org


See
http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-1.html
---
I have a 32-core system running Xen 4.3.1 with 30 Windows XP VMs.
Dom0 is CentOS 6.3 based, with Linux kernel 3.10.16.
In my configuration all of the Windows HVM guests are running, having been
restored from an xl save image.
VMs are destroyed and restored on demand. After some time Xen
experiences a fatal page fault while restoring one of the Windows HVM
guests. This does not happen very often, perhaps once in a 16 to 48 hour
period.
The stack trace from Xen follows. Thanks in advance for any help.

(XEN) ----[ Xen-4.3.1 x86_64 debug=n Tainted: C ]----
(XEN) CPU: 52
(XEN) RIP: e008:[] domain_page_map_to_mfn+0x86/0xc0
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 000ffffffffff000 rbx: ffff8300bb163760 rcx: 0000000000000000
(XEN) rdx: ffff810000000000 rsi: 0000000000000000 rdi: 0000000000000000
(XEN) rbp: ffff8300bb163000 rsp: ffff8310333e7cd8 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff8310333e7f18 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000426f0
(XEN) cr3: 000000211bee5000 cr2: ffff810000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff8310333e7cd8:
(XEN) 0000000000000001 ffff82c4c01de869 ffff82c4c0182c70 ffff8300bb163000
(XEN) 0000000000000014 ffff8310333e7f18 0000000000000000 ffff82c4c01d7548
(XEN) ffff8300bb163490 ffff8300bb163000 ffff82c4c01c65b8 ffff8310333e7e60
(XEN) ffff82c4c01badef ffff8300bb163000 0000000000000003 ffff833144d8e000
(XEN) ffff82c4c01b4885 ffff8300bb163000 ffff8300bb163000 ffff8300bdff1000
(XEN) 0000000000000001 ffff82c4c02f2880 ffff82c4c02f2880 ffff82c4c0308440
(XEN) ffff82c4c01d0ea8 ffff8300bb163000 ffff82c4c015ad6c ffff82c4c02f2880
(XEN) ffff82c4c02cf800 00000000ffffffff ffff8310333f5060 ffff82c4c02f2880
(XEN) 0000000000000282 0010000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 ffff82c4c02f2880 ffff8300bdff1000 ffff8300bb163000
(XEN) 000031a10f2b16ca 0000000000000001 ffff82c4c02f2880 ffff82c4c0308440
(XEN) ffff82c4c0124444 0000000000000034 ffff8310333f5060 0000000001c9c380
(XEN) 00000000c0155965 ffff82c4c01c6146 0000000001c9c380 ffffffffffffff00
(XEN) ffff82c4c0128fa8 ffff8300bb163000 ffff8327d50e9000 ffff82c4c01bc490
(XEN) 0000000000000000 ffff82c4c01dd254 0000000080549ae0 ffff82c4c01cfc3c
(XEN) ffff8300bb163000 ffff82c4c01d6128 ffff82c4c0125db9 ffff82c4c0125db9
(XEN) ffff8310333e0000 ffff8300bb163000 000000000012ffc0 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffff82c4c01deaa3
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 000000000012ffc0 000000007ffdf000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN) [] domain_page_map_to_mfn+0x86/0xc0
(XEN) [] nvmx_handle_vmlaunch+0x49/0x160
(XEN) [] __update_vcpu_system_time+0x240/0x310
(XEN) [] vmx_vmexit_handler+0xb58/0x18c0
(XEN) [] pt_restore_timer+0xa8/0xc0
(XEN) [] hvm_io_assist+0xef/0x120
(XEN) [] hvm_do_resume+0x195/0x1c0
(XEN) [] vmx_do_resume+0x148/0x210
(XEN) [] context_switch+0x1bc/0xfc0
(XEN) [] schedule+0x254/0x5f0
(XEN) [] pt_update_irq+0x256/0x2b0
(XEN) [] timer_softirq_action+0x168/0x210
(XEN) [] hvm_vcpu_has_pending_irq+0x50/0xb0
(XEN) [] nvmx_switch_guest+0x54/0x1560
(XEN) [] vmx_intr_assist+0x6c/0x490
(XEN) [] vmx_vmenter_helper+0x88/0x160
(XEN) [] __do_softirq+0x69/0xa0
(XEN) [] __do_softirq+0x69/0xa0
(XEN) [] vmx_asm_do_vmentry+0/0xed
(XEN)
(XEN) Pagetable walk from ffff810000000000:
(XEN) L4[0x102] = 000000211bee5063 ffffffffffffffff
(XEN) L3[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 52:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff810000000000
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
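
For reference, the indices in the "Pagetable walk" lines above are just
9-bit slices of the faulting linear address; a minimal stand-alone C
sketch of the decode, using the address from the log:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Faulting linear address from the panic message above. */
    uint64_t va = 0xffff810000000000ULL;

    /* Each 4-level paging index is 9 bits wide, starting at bit 12. */
    printf("L4[0x%03x] L3[0x%03x] L2[0x%03x] L1[0x%03x]\n",
           (unsigned)((va >> 39) & 0x1ff),
           (unsigned)((va >> 30) & 0x1ff),
           (unsigned)((va >> 21) & 0x1ff),
           (unsigned)((va >> 12) & 0x1ff));
    return 0; /* prints L4[0x102] L3[0x000] L2[0x000] L1[0x000] */
}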


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-04 19:54 Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.) Lars Kurth
@ 2013-11-04 20:00 ` Andrew Cooper
  2013-11-05  9:53 ` Ian Campbell
  2013-11-05 10:04 ` Jan Beulich
  2 siblings, 0 replies; 32+ messages in thread
From: Andrew Cooper @ 2013-11-04 20:00 UTC (permalink / raw)
  To: Lars Kurth; +Cc: xen-devel@lists.xen.org


On 04/11/13 19:54, Lars Kurth wrote:
> See http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-1.html
> ---
> I have a 32-core system running Xen 4.3.1 with 30 Windows XP VMs.
> Dom0 is CentOS 6.3 based, with Linux kernel 3.10.16.
> In my configuration all of the Windows HVM guests are running, having
> been restored from an xl save image.
> VMs are destroyed and restored on demand. After some time Xen
> experiences a fatal page fault while restoring one of the Windows HVM
> guests. This does not happen very often, perhaps once in a 16 to 48
> hour period.
> The stack trace from Xen follows. Thanks in advance for any help.

Which version of Xen were these images saved on?

Are you expecting to be using nested-virt? (It is still very definitely
experimental)

~Andrew


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-04 19:54 Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.) Lars Kurth
  2013-11-04 20:00 ` Andrew Cooper
@ 2013-11-05  9:53 ` Ian Campbell
  2013-11-05 10:04 ` Jan Beulich
  2 siblings, 0 replies; 32+ messages in thread
From: Ian Campbell @ 2013-11-05  9:53 UTC (permalink / raw)
  To: Lars Kurth; +Cc: xen-devel@lists.xen.org

On Mon, 2013-11-04 at 19:54 +0000, Lars Kurth wrote:
> See
> http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-1.html

TBH I think for this kind of thing (i.e. a bug, not a user question) the
most appropriate thing to do would be to redirect the reporter to
xen-devel directly (with a reminder that they do not need to subscribe
to post). This is going to take some back and forth to get to the bottom
of, and having you sit in the middle is just silly.

Ian.


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-04 19:54 Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.) Lars Kurth
  2013-11-04 20:00 ` Andrew Cooper
  2013-11-05  9:53 ` Ian Campbell
@ 2013-11-05 10:04 ` Jan Beulich
  2013-11-05 15:46   ` Lars Kurth
  2 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2013-11-05 10:04 UTC (permalink / raw)
  To: Lars Kurth; +Cc: xen-devel

>>> On 04.11.13 at 20:54, Lars Kurth <lars.kurth.xen@gmail.com> wrote:
> See
> http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-1.html
> ---
> I have a 32-core system running Xen 4.3.1 with 30 Windows XP VMs.
> Dom0 is CentOS 6.3 based, with Linux kernel 3.10.16.
> In my configuration all of the Windows HVM guests are running, having
> been restored from an xl save image.
> VMs are destroyed and restored on demand. After some time Xen
> experiences a fatal page fault while restoring one of the Windows HVM
> guests. This does not happen very often, perhaps once in a 16 to 48
> hour period.
> The stack trace from Xen follows. Thanks in advance for any help.
> 
> (XEN) ----[ Xen-4.3.1 x86_64 debug=n Tainted: C ]----
> (XEN) CPU: 52
> (XEN) RIP: e008:[] domain_page_map_to_mfn+0x86/0xc0

Zapping addresses (here and below in the stack trace) is never
helpful when someone asks for help with a crash. Also, in order
to not just guess, the matching xen-syms or xen.efi should be
made available or pointed to.

> [register dump and censored call trace snipped]
> (XEN)
> (XEN) Pagetable walk from ffff810000000000:
> (XEN) L4[0x102] = 000000211bee5063 ffffffffffffffff
> (XEN) L3[0x000] = 0000000000000000 ffffffffffffffff

This makes me suspect that domain_page_map_to_mfn() gets a
NULL pointer passed here. As said above, this is only guesswork
at this point, and as Ian already pointed out, directing the
reporter to xen-devel would seem to be the right thing to do
here anyway.
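
To illustrate the suspicion: below is an abridged sketch of 4.3's
domain_page_map_to_mfn() (based on xen/arch/x86/domain_page.c, but
simplified from memory, so treat the details as approximate). With
debug=n the ASSERT() is compiled out, so a NULL pointer falls through to
the linear page table lookup, and __linear_l1_table sits in PML4 slot
0x102, i.e. (if I recall the 4.3 layout correctly) at ffff810000000000,
which is exactly the faulting address. The walk output fits as well:
L4[0x102] carries the CR3 value (the recursive slot), and the entry
underneath it is empty, hence the fault.

/* Abridged sketch of Xen 4.3's domain_page_map_to_mfn(); not verbatim. */
unsigned long domain_page_map_to_mfn(const void *ptr)
{
    unsigned long va = (unsigned long)ptr;
    const l1_pgentry_t *pl1e;

    if ( va >= DIRECTMAP_VIRT_START )
        return virt_to_mfn(ptr);

    /* Compiled out in a debug=n build, so ptr == NULL is not caught. */
    ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);

    /*
     * __linear_l1_table is the recursive mapping of the current page
     * tables at 0xffff810000000000 (PML4 slot 0x102).  With va == 0
     * this dereferences exactly that address, and faults because no
     * mapping exists beneath the recursive slot for index 0.
     */
    pl1e = &__linear_l1_table[l1_linear_offset(va)];

    return l1e_get_pfn(*pl1e);
}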

Jan


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-05 10:04 ` Jan Beulich
@ 2013-11-05 15:46   ` Lars Kurth
  2013-11-05 21:55     ` Jeff_Zimmerman
       [not found]     ` <5E2B3362-4D93-4FEF-987A-E477B0DCEE51@mcafee.com>
  0 siblings, 2 replies; 32+ messages in thread
From: Lars Kurth @ 2013-11-05 15:46 UTC (permalink / raw)
  To: Jan Beulich, Lars Kurth, jeff_zimmerman; +Cc: xen-devel


Jan, Andrew, Ian,

Pulling in Jeff, who raised the question. Snippets from the various
replies are included below. Jeff, please look through these (in
particular Jan's answer) and answer any further questions on this
thread.

On 05/11/2013 09:53, Ian Campbell wrote:
 > TBH I think for this kind of thing (i.e. a bug, not a user question) the
 > most appropriate thing to do would be to redirect the reporter to
 > xen-devel directly (with a reminder that they do not need to subscribe
 > to post).
Agreed. Another option is for me to start the thread myself and pull the
original reporter into it, if it is a bug. I was not sure this was a real
bug at first, but it seems it is.

On 04/11/2013 20:00, Andrew Cooper wrote:
 > Which version of Xen were these images saved on?
[Jeff] We were careful to regenerate all the images after upgrading to
4.3.1. We also saw the same problem on 4.3.0.

 > Are you expecting to be using nested-virt? (It is still very definitely
 > experimental)
[Jeff] Not using nested-virt.

On 05/11/2013 10:04, Jan Beulich wrote:
>>>> On 04.11.13 at 20:54, Lars Kurth <lars.kurth.xen@gmail.com> wrote:
>> [original report and crash log snipped]
> Zapping addresses (here and below in the stack trace) is never
> helpful when someone asks for help with a crash. Also, in order
> to not just guess, the matching xen-syms or xen.efi should be
> made available or pointed to.
>
>> [pagetable walk snipped]
> This makes me suspect that domain_page_map_to_mfn() gets a
> NULL pointer passed here. As said above, this is only guesswork
> at this point, and as Ian already pointed out, directing the
> reporter to xen-devel would seem to be the right thing to do
> here anyway.
>
> Jan



* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-05 15:46   ` Lars Kurth
@ 2013-11-05 21:55     ` Jeff_Zimmerman
       [not found]     ` <5E2B3362-4D93-4FEF-987A-E477B0DCEE51@mcafee.com>
  1 sibling, 0 replies; 32+ messages in thread
From: Jeff_Zimmerman @ 2013-11-05 21:55 UTC (permalink / raw)
  To: lars.kurth; +Cc: lars.kurth.xen, xen-devel, JBeulich


Lars,
I understand the mailing list limits attachment size to 512K. Where can I post the xen binary and symbols file?
Jeff


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
       [not found] <CE9EAEF6.59305%asit.k.mallick@intel.com>
@ 2013-11-05 22:46 ` Jeff_Zimmerman
  2013-11-05 23:17   ` Mallick, Asit K
  2013-11-06  0:23   ` Andrew Cooper
  0 siblings, 2 replies; 32+ messages in thread
From: Jeff_Zimmerman @ 2013-11-05 22:46 UTC (permalink / raw)
  To: asit.k.mallick; +Cc: xen-devel


Asit,
I've attached two files: the first is the output of dmesg | grep microcode, the second is the first processor entry from /proc/cpuinfo.
Jeff

On Nov 5, 2013, at 2:29 PM, "Mallick, Asit K" <asit.k.mallick@intel.com> wrote:

> Jeff,
> Could you check if you have the latest microcode updates installed on this system? Or, could you send me the microcode rev and I can check.
>
> Thanks,
> Asit

[-- Attachment #2: microcode.txt --]

microcode: CPU0 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU1 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU2 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU3 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU4 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU5 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU6 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU7 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU8 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU9 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU10 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU11 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU12 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU13 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU14 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU15 sig=0x206d7, pf=0x40, revision=0x710
microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

[-- Attachment #3: cpuinfo.txt --]

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz
stepping        : 7
microcode       : 0x710
cpu MHz         : 2394.318
cache size      : 20480 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc pni pclmulqdq est ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm
bogomips        : 4788.63
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-05 22:46 ` Jeff_Zimmerman
@ 2013-11-05 23:17   ` Mallick, Asit K
  2013-11-06  0:23   ` Andrew Cooper
  1 sibling, 0 replies; 32+ messages in thread
From: Mallick, Asit K @ 2013-11-05 23:17 UTC (permalink / raw)
  To: Jeff_Zimmerman@McAfee.com; +Cc: xen-devel@lists.xenproject.org

It is running with the latest microcode revision 0x710.
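
For reference, sig=0x206d7 decodes to family 6, model 45, stepping 7,
i.e. Sandy Bridge-EP, which matches the E5-4640 in the cpuinfo
attachment. A minimal stand-alone sketch of the decode (nothing
Xen-specific):

/* Decode an x86 CPUID signature such as sig=0x206d7 from the log. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t sig = 0x206d7;
    uint32_t stepping = sig & 0xf;
    uint32_t model = (sig >> 4) & 0xf;
    uint32_t family = (sig >> 8) & 0xf;

    /* Families 6 and 15 extend the model with bits 19:16. */
    if ( family == 6 || family == 15 )
        model |= ((sig >> 16) & 0xf) << 4;

    printf("family %u, model %u, stepping %u\n", family, model, stepping);
    return 0; /* prints: family 6, model 45, stepping 7 */
}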

Thanks,
Asit


From: "Jeff_Zimmerman@McAfee.com<mailto:Jeff_Zimmerman@McAfee.com>" <Jeff_Zimmerman@McAfee.com<mailto:Jeff_Zimmerman@McAfee.com>>
Date: Tuesday, November 5, 2013 3:46 PM
To: "Mallick, Asit K" <asit.k.mallick@intel.com<mailto:asit.k.mallick@intel.com>>
Cc: "xen-devel@lists.xenproject.org<mailto:xen-devel@lists.xenproject.org>" <xen-devel@lists.xenproject.org<mailto:xen-devel@lists.xenproject.org>>
Subject: Re: [Xen-devel] Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)

Asit,
I've attached two files, one is from dmesg | grep microcode, second is first process from /proc/cpuinfo
Jeff

On Nov 5, 2013, at 2:29 PM, "Mallick, Asit K" <asit.k.mallick@intel.com<mailto:asit.k.mallick@intel.com>>
 wrote:

> Jeff,
> Could you check if you you have latest microcode updates installed on this system? Or, could you send me the microcode rev and I can check.
>
> Thanks,
> Asit
>
>
> From: "Jeff_Zimmerman@McAfee.com<mailto:Jeff_Zimmerman@McAfee.com><mailto:Jeff_Zimmerman@McAfee.com>" <Jeff_Zimmerman@McAfee.com<mailto:Jeff_Zimmerman@McAfee.com><mailto:Jeff_Zimmerman@McAfee.com>>
> Date: Tuesday, November 5, 2013 2:55 PM
> To: "lars.kurth@xen.org<mailto:lars.kurth@xen.org><mailto:lars.kurth@xen.org>" <lars.kurth@xen.org<mailto:lars.kurth@xen.org><mailto:lars.kurth@xen.org>>
> Cc: "lars.kurth.xen@gmail.com<mailto:lars.kurth.xen@gmail.com><mailto:lars.kurth.xen@gmail.com>" <lars.kurth.xen@gmail.com<mailto:lars.kurth.xen@gmail.com><mailto:lars.kurth.xen@gmail.com>>, "xen-devel@lists.xenproject.org<mailto:xen-devel@lists.xenproject.org><mailto:xen-devel@lists.xenproject.org>" <xen-devel@lists.xenproject.org<mailto:xen-devel@lists.xenproject.org><mailto:xen-devel@lists.xenproject.org>>, "JBeulich@suse.com<mailto:JBeulich@suse.com><mailto:JBeulich@suse.com>" <JBeulich@suse.com<mailto:JBeulich@suse.com><mailto:JBeulich@suse.com>>
> Subject: Re: [Xen-devel] Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
>
> Lars,
> I understand the mailing list limits attachment size to 512K. Where can I post the xen binary an symbols file?
> Jeff
>
> On Nov 5, 2013, at 7:46 AM, Lars Kurth <lars.kurth@xen.org<mailto:lars.kurth@xen.org><mailto:lars.kurth@xen.org>> wrote:
>
> Jan, Andrew, Ian,
>
> pulling in Jeff who raised the question. Snippets from misc replies attached. Jeff, please look through these (in particular Jan's answer) and answer any further questions on this thread.
>
> On 05/11/2013 09:53, Ian Campbell wrote:
>> TBH I think for this kind of thing (i.e. a bug not a user question) the most appropriate thing to
>> do would be to redirect them to xen-devel themselves (with a reminder that they do not need
>> to subscribe to post).
> Agreed. Another option is for me to start the thread and pull in the raiser of the thread into it, if it is a bug. Was not sure this was a real bug at first, but it seems it is.
>
> On 04/11/2013 20:00, Andrew Cooper wrote:
>> Which version of Xen were these images saved on?
> [Jeff] We were careful to regenerate all the images after upgrading the 4.3.1. Also saw the same problem on 4.3.0.
>
>> Are you expecting to be using nested-virt? (It is still very definitely experimental)
> [Jeff] Not using nested-virt.
>
> On 05/11/2013 10:04, Jan Beulich wrote:
>
> On 04.11.13 at 20:54, Lars Kurth <lars.kurth.xen@gmail.com<mailto:lars.kurth.xen@gmail.com>><mailto:lars.kurth.xen@gmail.com> wrote:
>
>
> See
> http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-
> 1.html
> ---
> I have a 32 core system running XEN 4.3.1 with 30 Windows XP VM's.
> DOM0 is Centos 6.3 based with linux kernel 3.10.16.
> In my configuration all of the windows HVMs are running having been
> restored from xl save.
> VM's are destroyed or restored in an on-demand fashion. After some time XEN
> will experience a fatal page fault while restoring one of the windows HVM
> subjects. This does not happen very often, perhaps once in a 16 to 48 hour
> period.
> The stack trace from xen follows. Thanks in advance for any help.
>
> (XEN) ----[ Xen-4.3.1 x86_64 debug=n Tainted: C ]----
> (XEN) CPU: 52
> (XEN) RIP: e008:[] domain_page_map_to_mfn+0x86/0xc0
>
>
> Zapping addresses (here and below in the stack trace) is never
> helpful when someone asks for help with a crash. Also, in order
> to not just guess, the matching xen-syms or xen.efi should be
> made available or pointed to.
>
>
>
> (XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
> (XEN) rax: 000ffffffffff000 rbx: ffff8300bb163760 rcx: 0000000000000000
> (XEN) rdx: ffff810000000000 rsi: 0000000000000000 rdi: 0000000000000000
> (XEN) rbp: ffff8300bb163000 rsp: ffff8310333e7cd8 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: ffff8310333e7f18 r13: 0000000000000000 r14: 0000000000000000
> (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000426f0
> (XEN) cr3: 000000211bee5000 cr2: ffff810000000000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) Xen stack trace from rsp=ffff8310333e7cd8:
> (XEN) 0000000000000001 ffff82c4c01de869 ffff82c4c0182c70 ffff8300bb163000
> (XEN) 0000000000000014 ffff8310333e7f18 0000000000000000 ffff82c4c01d7548
> (XEN) ffff8300bb163490 ffff8300bb163000 ffff82c4c01c65b8 ffff8310333e7e60
> (XEN) ffff82c4c01badef ffff8300bb163000 0000000000000003 ffff833144d8e000
> (XEN) ffff82c4c01b4885 ffff8300bb163000 ffff8300bb163000 ffff8300bdff1000
> (XEN) 0000000000000001 ffff82c4c02f2880 ffff82c4c02f2880 ffff82c4c0308440
> (XEN) ffff82c4c01d0ea8 ffff8300bb163000 ffff82c4c015ad6c ffff82c4c02f2880
> (XEN) ffff82c4c02cf800 00000000ffffffff ffff8310333f5060 ffff82c4c02f2880
> (XEN) 0000000000000282 0010000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 ffff82c4c02f2880 ffff8300bdff1000 ffff8300bb163000
> (XEN) 000031a10f2b16ca 0000000000000001 ffff82c4c02f2880 ffff82c4c0308440
> (XEN) ffff82c4c0124444 0000000000000034 ffff8310333f5060 0000000001c9c380
> (XEN) 00000000c0155965 ffff82c4c01c6146 0000000001c9c380 ffffffffffffff00
> (XEN) ffff82c4c0128fa8 ffff8300bb163000 ffff8327d50e9000 ffff82c4c01bc490
> (XEN) 0000000000000000 ffff82c4c01dd254 0000000080549ae0 ffff82c4c01cfc3c
> (XEN) ffff8300bb163000 ffff82c4c01d6128 ffff82c4c0125db9 ffff82c4c0125db9
> (XEN) ffff8310333e0000 ffff8300bb163000 000000000012ffc0 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 ffff82c4c01deaa3
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 000000000012ffc0 000000007ffdf000 0000000000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN) [] domain_page_map_to_mfn+0x86/0xc0
> (XEN) [] nvmx_handle_vmlaunch+0x49/0x160
> (XEN) [] __update_vcpu_system_time+0x240/0x310
> (XEN) [] vmx_vmexit_handler+0xb58/0x18c0
> (XEN) [] pt_restore_timer+0xa8/0xc0
> (XEN) [] hvm_io_assist+0xef/0x120
> (XEN) [] hvm_do_resume+0x195/0x1c0
> (XEN) [] vmx_do_resume+0x148/0x210
> (XEN) [] context_switch+0x1bc/0xfc0
> (XEN) [] schedule+0x254/0x5f0
> (XEN) [] pt_update_irq+0x256/0x2b0
> (XEN) [] timer_softirq_action+0x168/0x210
> (XEN) [] hvm_vcpu_has_pending_irq+0x50/0xb0
> (XEN) [] nvmx_switch_guest+0x54/0x1560
> (XEN) [] vmx_intr_assist+0x6c/0x490
> (XEN) [] vmx_vmenter_helper+0x88/0x160
> (XEN) [] __do_softirq+0x69/0xa0
> (XEN) [] __do_softirq+0x69/0xa0
> (XEN) [] vmx_asm_do_vmentry+0/0xed
> (XEN)
> (XEN) Pagetable walk from ffff810000000000:
> (XEN) L4[0x102] = 000000211bee5063 ffffffffffffffff
> (XEN) L3[0x000] = 0000000000000000 ffffffffffffffff
>
>
> This makes me suspect that domain_page_map_to_mfn() gets a
> NULL pointer passed here. As said above, this is only guesswork
> at this point, and as Ian already pointed out, directing the
> reporter to xen-devel would seem to be the right thing to do
> here anyway.
>
> Jan
>
>
>


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-05 22:46 ` Jeff_Zimmerman
  2013-11-05 23:17   ` Mallick, Asit K
@ 2013-11-06  0:23   ` Andrew Cooper
  2013-11-06 10:05     ` Ian Campbell
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2013-11-06  0:23 UTC (permalink / raw)
  To: Jeff_Zimmerman, asit.k.mallick; +Cc: xen-devel


On 05/11/2013 22:46, Jeff_Zimmerman@McAfee.com wrote:
> Asit,
> I've attached two files: the first is the output of dmesg | grep
> microcode, the second is the first processor entry from /proc/cpuinfo.
> Jeff
>
> [remainder of quoted thread, including the censored crash log, snipped]

As Jan said, the above censoring is almost completely defeating the
purpose of trying to help you.

However, while you are not expecting to be using nested-virt, the stack
trace shows that you clearly are, so something is up.
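
If it helps to cross-check from inside a guest (on a Linux guest, say),
CPUID leaf 1 ECX bit 5 is the VMX feature flag; whether it shows up in
the guest depends on the nested-virt setting. A minimal probe, just a
sketch using GCC's <cpuid.h> wrapper:

/* Guest-side sketch: is VMX advertised to this VM via CPUID? */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid(1, &eax, &ebx, &ecx, &edx) )
        return 1;

    printf("VMX advertised to guest: %s\n",
           (ecx & (1u << 5)) ? "yes" : "no");
    return 0;
}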

Which toolstack are you using for VMs? What is the configuration for
the affected VM?

~Andrew


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-06  0:23   ` Andrew Cooper
@ 2013-11-06 10:05     ` Ian Campbell
  0 siblings, 0 replies; 32+ messages in thread
From: Ian Campbell @ 2013-11-06 10:05 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: asit.k.mallick, xen-devel, Jeff_Zimmerman

On Wed, 2013-11-06 at 00:23 +0000, Andrew Cooper wrote:

> Which toolstack are you using for VMs? What is the configuration for
> the affected VM?

And what exact Windows OS? It's not entirely out of the question that a
modern one might try to use VMX for various things if it saw it. And
doesn't McAfee have a Windows product which does things along those
lines? :-)

Ian.


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
       [not found]     ` <5E2B3362-4D93-4FEF-987A-E477B0DCEE51@mcafee.com>
@ 2013-11-06 14:09       ` Jan Beulich
  2013-11-06 16:05         ` Jeff_Zimmerman
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2013-11-06 14:09 UTC (permalink / raw)
  To: Jeff_Zimmerman; +Cc: lars.kurth.xen, xen-devel, lars.kurth

[-- Attachment #1: Type: text/plain, Size: 651 bytes --]

>>> On 05.11.13 at 22:36, <Jeff_Zimmerman@McAfee.com> wrote:
> Attaching the xen binary and symbols file.
> Hopefully they will come through.

Please give the attached patch a try - afaict it should eliminate
the host crash, but I'm pretty certain you'll then see the guest
misbehave. Depending on what other load you place on the
system as a whole, you're either overloading it (i.e. we're
running out of mapping space in the hypervisor) or there's a
mapping leak that - so far at least - I can't spot.

In any event I'd suggest you try running a debug build of the
hypervisor, so that any problems can be spotted earlier.
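
For reference, switching to a debug hypervisor should be roughly a
matter of (assuming a 4.3-style tree; adjust the invocation to your
build setup):

    # rebuild the hypervisor with debug checking enabled
    make -C xen clean
    make -C xen debug=y
    # then install the resulting xen.gz into /boot and reboot into it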

Jan


[-- Attachment #2: nVMX-map-errors.patch --]
[-- Type: text/plain, Size: 4034 bytes --]

--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -747,7 +747,7 @@ static void __clear_current_vvmcs(struct
         __vmpclear(virt_to_maddr(nvcpu->nv_n2vmcx));
 }
 
-static void __map_msr_bitmap(struct vcpu *v)
+static bool_t __must_check _map_msr_bitmap(struct vcpu *v)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     unsigned long gpa;
@@ -756,9 +756,11 @@ static void __map_msr_bitmap(struct vcpu
         hvm_unmap_guest_frame(nvmx->msrbitmap, 1);
     gpa = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx, MSR_BITMAP);
     nvmx->msrbitmap = hvm_map_guest_frame_ro(gpa >> PAGE_SHIFT, 1);
+
+    return nvmx->msrbitmap != NULL;
 }
 
-static void __map_io_bitmap(struct vcpu *v, u64 vmcs_reg)
+static bool_t __must_check _map_io_bitmap(struct vcpu *v, u64 vmcs_reg)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     unsigned long gpa;
@@ -769,12 +771,14 @@ static void __map_io_bitmap(struct vcpu 
         hvm_unmap_guest_frame(nvmx->iobitmap[index], 1);
     gpa = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx, vmcs_reg);
     nvmx->iobitmap[index] = hvm_map_guest_frame_ro(gpa >> PAGE_SHIFT, 1);
+
+    return nvmx->iobitmap[index] != NULL;
 }
 
-static inline void map_io_bitmap_all(struct vcpu *v)
+static inline bool_t __must_check map_io_bitmap_all(struct vcpu *v)
 {
-   __map_io_bitmap (v, IO_BITMAP_A);
-   __map_io_bitmap (v, IO_BITMAP_B);
+   return _map_io_bitmap(v, IO_BITMAP_A) &&
+          _map_io_bitmap(v, IO_BITMAP_B);
 }
 
 static void nvmx_purge_vvmcs(struct vcpu *v)
@@ -1608,9 +1612,15 @@ int nvmx_handle_vmptrld(struct cpu_user_
     if ( nvcpu->nv_vvmcxaddr == VMCX_EADDR )
     {
         nvcpu->nv_vvmcx = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT, 1);
-        nvcpu->nv_vvmcxaddr = gpa;
-        map_io_bitmap_all (v);
-        __map_msr_bitmap(v);
+        if ( nvcpu->nv_vvmcx )
+            nvcpu->nv_vvmcxaddr = gpa;
+        if ( !nvcpu->nv_vvmcx ||
+             !map_io_bitmap_all(v) ||
+             !_map_msr_bitmap(v) )
+        {
+            vmreturn(regs, VMFAIL_VALID);
+            goto out;
+        }
     }
 
     if ( cpu_has_vmx_vmcs_shadowing )
@@ -1676,9 +1686,8 @@ int nvmx_handle_vmclear(struct cpu_user_
     {
         /* Even if this VMCS isn't the current one, we must clear it. */
         vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT, 0);
-        if ( vvmcs ) 
-            clear_vvmcs_launched(&nvmx->launched_list,
-                domain_page_map_to_mfn(vvmcs));
+        clear_vvmcs_launched(&nvmx->launched_list,
+                             domain_page_map_to_mfn(vvmcs));
         hvm_unmap_guest_frame(vvmcs, 0);
     }
 
@@ -1722,6 +1731,7 @@ int nvmx_handle_vmwrite(struct cpu_user_
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     unsigned long operand; 
     u64 vmcs_encoding;
+    bool_t okay = 1;
 
     if ( decode_vmx_inst(regs, &decode, &operand, 0)
              != X86EMUL_OKAY )
@@ -1730,16 +1740,21 @@ int nvmx_handle_vmwrite(struct cpu_user_
     vmcs_encoding = reg_read(regs, decode.reg2);
     __set_vvmcs(nvcpu->nv_vvmcx, vmcs_encoding, operand);
 
-    if ( vmcs_encoding == IO_BITMAP_A || vmcs_encoding == IO_BITMAP_A_HIGH )
-        __map_io_bitmap (v, IO_BITMAP_A);
-    else if ( vmcs_encoding == IO_BITMAP_B || 
-              vmcs_encoding == IO_BITMAP_B_HIGH )
-        __map_io_bitmap (v, IO_BITMAP_B);
+    switch ( vmcs_encoding )
+    {
+    case IO_BITMAP_A: case IO_BITMAP_A_HIGH:
+        okay = _map_io_bitmap(v, IO_BITMAP_A);
+        break;
+    case IO_BITMAP_B: case IO_BITMAP_B_HIGH:
+        okay = _map_io_bitmap(v, IO_BITMAP_B);
+        break;
+    case MSR_BITMAP: case MSR_BITMAP_HIGH:
+        okay = _map_msr_bitmap(v);
+        break;
+    }
 
-    if ( vmcs_encoding == MSR_BITMAP || vmcs_encoding == MSR_BITMAP_HIGH )
-        __map_msr_bitmap(v);
+    vmreturn(regs, okay ? VMSUCCEED : VMFAIL_VALID);
 
-    vmreturn(regs, VMSUCCEED);
     return X86EMUL_OKAY;
 }
 


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-06 14:09       ` Jan Beulich
@ 2013-11-06 16:05         ` Jeff_Zimmerman
  2013-11-06 16:16           ` Jan Beulich
  2013-11-06 16:18           ` Ian Campbell
  0 siblings, 2 replies; 32+ messages in thread
From: Jeff_Zimmerman @ 2013-11-06 16:05 UTC (permalink / raw)
  To: JBeulich; +Cc: lars.kurth.xen, xen-devel, lars.kurth

Jan,

I will give your patch a try.
I have to recant my previous statement regarding not using nested-virt.
It seems some of the code being executed in the VM contains vmx instructions,
and running such code in an HVM subject makes it nested-virt by definition.

This raises a question: if this functionality is undesired, can we just disable nested virt by adding
nestedhvm=false to the configuration file?  Should the cpuid and cpuid_check settings be changed as well?

Thanks,
Jeff

On Nov 6, 2013, at 6:09 AM, Jan Beulich <JBeulich@suse.com>
wrote:

>>>> On 05.11.13 at 22:36, <Jeff_Zimmerman@McAfee.com> wrote:
>> Attaching the xen binary and symbols file.
>> Hopefully they will come through.
> 
> Please give the attached patch a try - afaict it should eliminate
> the host crash, but I'm pretty certain you'll then see the guest
> misbehave. Depending on what other load you place on the
> system as a whole, you're either overloading it (i.e. we're
> running out of mapping space in the hypervisor) or there's a
> mapping leak that - so far at least - I can't spot.
> 
> In any event I'd suggest you try running a debug build of the
> hypervisor, so that eventual problems can be spotted earlier.
> 
> Jan
> 
> <nVMX-map-errors.patch>

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-06 16:05         ` Jeff_Zimmerman
@ 2013-11-06 16:16           ` Jan Beulich
  2013-11-06 16:18           ` Ian Campbell
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2013-11-06 16:16 UTC (permalink / raw)
  To: Jeff_Zimmerman; +Cc: lars.kurth.xen, xen-devel, lars.kurth

>>> On 06.11.13 at 17:05, <Jeff_Zimmerman@McAfee.com> wrote:
> This raises a question: if this functionality is undesired, can we just
> disable nested virt by adding
> nestedhvm=false to the configuration file?

Sure. And as that's supposedly the default, just deleting the line
should be fine too.
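
For illustration, the relevant fragment of the guest config would then
be no more than (hypothetical example; only the nestedhvm line matters
here):

    builder   = "hvm"
    nestedhvm = 0      # or drop the line entirely - off is the default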

> Should the cpuid and cpuid_check settings be changed as well?

I don't think so, unless you manually override it to look like VMX
was available.

That said - it would still be nice if you could help us figure out the
bug's origin (and I assume you realize that it would be even more
helpful for us if you did all this on 4.4-unstable).

Jan

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-06 16:05         ` Jeff_Zimmerman
  2013-11-06 16:16           ` Jan Beulich
@ 2013-11-06 16:18           ` Ian Campbell
  2013-11-06 16:48             ` Jeff_Zimmerman
  1 sibling, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-11-06 16:18 UTC (permalink / raw)
  To: Jeff_Zimmerman; +Cc: lars.kurth.xen, xen-devel, lars.kurth, JBeulich

On Wed, 2013-11-06 at 16:05 +0000, Jeff_Zimmerman@McAfee.com wrote:
> Jan,
> 
> I will give your patch a try.
> I have to recant my previous statement regarding not using nested-virt.
> It seems some of the code being executed in the VM contains vmx instructions,
> and running such code in an HVM subject makes it nested-virt by definition.
> 
> This raises a question: if this functionality is undesired, can we just disable nested virt by adding
> nestedhvm=false to the configuration file?  Should the cpuid and
> cpuid_check settings be changed as well?

I'm reasonably certain that nestedhvm=false will clear the relevant
flags in the guest-visible cpuid. I'd say it's a bug if this doesn't
happen.

nestedhvm should be disabled by default; did you explicitly enable it?
Removing the line altogether ought to disable it too. Please let us know
if not.

Ian.

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-06 16:18           ` Ian Campbell
@ 2013-11-06 16:48             ` Jeff_Zimmerman
  2013-11-06 16:54               ` Andrew Cooper
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff_Zimmerman @ 2013-11-06 16:48 UTC (permalink / raw)
  To: Ian.Campbell; +Cc: lars.kurth.xen, xen-devel, lars.kurth, JBeulich


On Nov 6, 2013, at 8:18 AM, Ian Campbell <Ian.Campbell@citrix.com>
 wrote:

> On Wed, 2013-11-06 at 16:05 +0000, Jeff_Zimmerman@McAfee.com wrote:
>> Jan,
>> 
>> I will give your patch a try.
>> I have to recant my previous statement regarding not using nested-virt.
>> It seems some of the code being executed in the VM contains vmx instructions,
>> and running such code in an HVM subject makes it nested-virt by definition.
>> 
>> This raises a question: if this functionality is undesired, can we just disable nested virt by adding
>> nestedhvm=false to the configuration file?  Should the cpuid and
>> cpuid_check settings be changed as well?
> 
> I'm reasonably certain that nestedhvm=false will clear the relevant
> flags in the guest-visible cpuid. I'd say it's a bug if this doesn't
> happen.
> 
> nestedhvm should be disabled by default; did you explicitly enable it?
> Removing the line altogether ought to disable it too. Please let us know
> if not.
> 
> Ian.
> 
I did not enable nestedhvm, and when I run xl list -l the output shows nestedhvm=<default>.
I was not sure what the default was supposed to be. I will try setting it and re-run our test.
Jeff

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-06 16:48             ` Jeff_Zimmerman
@ 2013-11-06 16:54               ` Andrew Cooper
  2013-11-06 17:06                 ` Ian Campbell
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2013-11-06 16:54 UTC (permalink / raw)
  To: Jeff_Zimmerman
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell, JBeulich

On 06/11/13 16:48, Jeff_Zimmerman@McAfee.com wrote:
> On Nov 6, 2013, at 8:18 AM, Ian Campbell <Ian.Campbell@citrix.com>
>  wrote:
>
>> On Wed, 2013-11-06 at 16:05 +0000, Jeff_Zimmerman@McAfee.com wrote:
>>> Jan,
>>>
>>> I will give your patch a try.
>>> I have to recant my previous statement regarding not using nested-virt.
>>> It seems some of the code being executed in the VM contains vmx instructions,
>>> and running such code in an HVM subject makes it nested-virt by definition.
>>>
>>> This raises a question: if this functionality is undesired, can we just disable nested virt by adding
>>> nestedhvm=false to the configuration file?  Should the cpuid and
>>> cpuid_check settings be changed as well?
>> I'm reasonably certain that nestedhvm=false will clear the relevant
>> flags in the guest-visible cpuid. I'd say it's a bug if this doesn't
>> happen.
>>
>> nestedhvm should be disabled by default; did you explicitly enable it?
>> Removing the line altogether ought to disable it too. Please let us know
>> if not.
>>
>> Ian.
>>
> I did not enable nestedhvm, and when I run xl list -l the output shows nestedhvm=<default>.
> I was not sure what the default was supposed to be. I will try setting it and re-run our test.
> Jeff

nested-virt is strictly experimental, and still has known bugs (and
clearly some unknown ones).

I looked over the xl code and thought that nestedhvm should default to
false, but I would prefer someone more familiar with libxl and the IDL to
confirm what the default should be.

~Andrew


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-06 16:54               ` Andrew Cooper
@ 2013-11-06 17:06                 ` Ian Campbell
  2013-11-06 17:07                   ` Andrew Cooper
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-11-06 17:06 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Jeff_Zimmerman, JBeulich

On Wed, 2013-11-06 at 16:54 +0000, Andrew Cooper wrote:
> I looked over the xl code and thought that nestedhvm should default to
> false, but I would prefer someone more familiar with libxl and the IDL to
> confirm what the default should be.

libxl thinks the default is false and will set HVM_PARAM_NESTEDHVM to 0
in that case. Is there some way to query the hypervisor for what it
thinks the setting is?

Ian.

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-06 17:06                 ` Ian Campbell
@ 2013-11-06 17:07                   ` Andrew Cooper
  2013-11-07  9:10                     ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2013-11-06 17:07 UTC (permalink / raw)
  To: Ian Campbell
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Jeff_Zimmerman, JBeulich

On 06/11/13 17:06, Ian Campbell wrote:
> On Wed, 2013-11-06 at 16:54 +0000, Andrew Cooper wrote:
>> I looked over the xl code and thought that nestedhvm should default to
>> false, but I would prefer someone more familiar with libxl and the IDL to
>> confirm what the default should be.
> libxl thinks the default is false and will set HVM_PARAM_NESTEDHVM to 0
> in that case. Is there some way to query the hypervisor for what it
> thinks the setting is?
>
> Ian.
>
>

A get hvmparam hypercall will retrieve the value, but it is initialised
to 0 and only ever set by a set hvmparam hypercall.
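
For example, something along these lines should read it back from dom0
(a minimal sketch against the 4.3-era libxc interface; error handling
trimmed):

#include <stdio.h>
#include <stdlib.h>
#include <xenctrl.h>
#include <xen/hvm/params.h>

/* Sketch: print HVM_PARAM_NESTEDHVM for the domid given in argv[1]. */
int main(int argc, char **argv)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    unsigned long val = 0;

    if ( !xch || argc < 2 )
        return 1;
    if ( xc_get_hvm_param(xch, atoi(argv[1]), HVM_PARAM_NESTEDHVM, &val) )
        perror("xc_get_hvm_param");
    else
        printf("HVM_PARAM_NESTEDHVM = %lu\n", val);
    xc_interface_close(xch);
    return 0;
}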

~Andrew

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-06 17:07                   ` Andrew Cooper
@ 2013-11-07  9:10                     ` Jan Beulich
  2013-11-07  9:30                       ` Ian Campbell
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2013-11-07  9:10 UTC (permalink / raw)
  To: Andrew Cooper, Ian Campbell
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Jeff_Zimmerman

>>> On 06.11.13 at 18:07, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 06/11/13 17:06, Ian Campbell wrote:
>> On Wed, 2013-11-06 at 16:54 +0000, Andrew Cooper wrote:
>>> I looked over the xl code and thought that nestedhvm should default to
>>> false, but I would prefer someone more familiar with libxl and the IDL to
>>> confirm what the default should be.
>> libxl thinks the default is false and will set HVM_PARAM_NESTEDHVM to 0
>> in that case. Is there some way to query the hypervisor for what it
>> thinks the setting is?
> 
> A get hvmparam hypercall will retrieve the value, but it is initialised
> to 0 and only ever set by a set hvmparam hypercall.

Which makes me start suspecting that the guest might be deriving
its information on VMX being available from something other than
CPUID. Of course we ought to confirm that we don't unintentionally
return the VMX flag set (and that the config file doesn't override it
in this way - I think we shouldn't be suppressing user overrides
here, but I didn't go check whether we do).

Jan

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07  9:10                     ` Jan Beulich
@ 2013-11-07  9:30                       ` Ian Campbell
  2013-11-07 15:41                         ` Jeff_Zimmerman
  0 siblings, 1 reply; 32+ messages in thread
From: Ian Campbell @ 2013-11-07  9:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: lars.kurth.xen, Andrew Cooper, lars.kurth, Jeff_Zimmerman,
	xen-devel

On Thu, 2013-11-07 at 09:10 +0000, Jan Beulich wrote:
> >>> On 06.11.13 at 18:07, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > On 06/11/13 17:06, Ian Campbell wrote:
> >> On Wed, 2013-11-06 at 16:54 +0000, Andrew Cooper wrote:
> >>> I looked over the xl code and thought that nestedhvm should default to
> >>> false, but I would prefer someone more familiar with libxl and the IDL to
> >>> confirm what the default should be.
> >> libxl thinks the default is false and will set HVM_PARAM_NESTEDHVM to 0
> >> in that case. Is there some way to query the hypervisor for what it
> >> thinks the setting is?
> > 
> > A get hvmparam hypercall will retrieve the value, but it is initialised
> > to 0 and only ever set by a set hvmparam hypercall.
> 
> Which makes me start suspecting that the guest might be deriving
> its information on VMX being available from something other than
> CPUID. Of course we ought to confirm that we don't unintentionally
> return the VMX flag set (and that the config file doesn't override it
> in this way - I think we shouldn't be suppressing user overrides
> here, but I didn't go check whether we do).

I was also wondering about the behaviour of using vmx instructions in a
guest despite vmx not being visible in cpuid...

Ian.

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07  9:30                       ` Ian Campbell
@ 2013-11-07 15:41                         ` Jeff_Zimmerman
  2013-11-07 15:54                           ` Andrew Cooper
  2013-11-07 15:57                           ` Jan Beulich
  0 siblings, 2 replies; 32+ messages in thread
From: Jeff_Zimmerman @ 2013-11-07 15:41 UTC (permalink / raw)
  To: Ian.Campbell
  Cc: lars.kurth.xen, andrew.cooper3, lars.kurth, JBeulich, xen-devel


On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com>
 wrote:

> On Thu, 2013-11-07 at 09:10 +0000, Jan Beulich wrote:
>>>>> On 06.11.13 at 18:07, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 06/11/13 17:06, Ian Campbell wrote:
>>>> On Wed, 2013-11-06 at 16:54 +0000, Andrew Cooper wrote:
>>>>> I looked over the xl code and thought that nestedhvm should default to
>>>>> false, but I would prefer someone more familiar with libxl and the IDL to
>>>>> confirm what the default should be.
>>>> libxl thinks the default is false and will set HVM_PARAM_NESTEDHVM to 0
>>>> in that case. Is there some way to query the hypervisor for what it
>>>> thinks the setting is?
>>> 
>>> A get hvmparam hypercall will retrieve the value, but it is initialised
>>> to 0 and only ever set by a set hvmparam hypercall.
>> 
>> Which makes me start suspecting that the guest might be deriving
>> its information on VMX being available from something other than
>> CPUID. Of course we ought to confirm that we don't unintentionally
>> return the VMX flag set (and that the config file doesn't override it
>> in this way - I think we shouldn't be suppressing user overrides
>> here, but I didn't go check whether we do).
> 
> I was also wondering about the behaviour of using vmx instructions in a
> guest despite vmx not being visible in cpuid...
> 
> Ian.
> 
> 
We have found that in our situation this is exactly the case. To verify, we
wrote some test code that makes vmx calls without checking cpuid. On bare
hardware the program executes as expected. In a VM on Xen it causes the
hypervisor to panic.
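
(For reference, the guard the test deliberately omits is just a CPUID
check - a minimal GCC-flavoured sketch follows, not our actual test
code:)

#include <cpuid.h>
#include <stdio.h>

/* Return 1 if CPUID.1:ECX advertises VMX (bit 5) - the bit the
 * hypervisor is supposed to hide when nested virt is disabled. */
static int vmx_advertised(void)
{
    unsigned int eax, ebx, ecx, edx;

    if ( !__get_cpuid(1, &eax, &ebx, &ecx, &edx) )
        return 0;
    return (ecx >> 5) & 1;
}

int main(void)
{
    printf("VMX %sadvertised via CPUID\n", vmx_advertised() ? "" : "not ");
    return 0;
}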

From a security standpoint this is very very bad. It might be a good idea to provide either
a run-time or build-time option to disable nestedhvm. Just turning off the vmx bit is not enough
as malicious or badly written code can cause a system crash.

For us it looks like we can disable these instructions and avoid the crash.

Jeff.

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 15:41                         ` Jeff_Zimmerman
@ 2013-11-07 15:54                           ` Andrew Cooper
  2013-11-07 16:00                             ` Jan Beulich
  2013-11-07 15:57                           ` Jan Beulich
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2013-11-07 15:54 UTC (permalink / raw)
  To: Jeff_Zimmerman
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell, JBeulich

On 07/11/13 15:41, Jeff_Zimmerman@McAfee.com wrote:
> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com>
>  wrote:
>
>> On Thu, 2013-11-07 at 09:10 +0000, Jan Beulich wrote:
>>>>>> On 06.11.13 at 18:07, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> On 06/11/13 17:06, Ian Campbell wrote:
>>>>> On Wed, 2013-11-06 at 16:54 +0000, Andrew Cooper wrote:
>>>>>> I looked over the xl code and thought that nestedhvm should default to
>>>>>> false, but I would prefer someone more familiar with libxl and the IDL to
>>>>>> confirm what the default should be.
>>>>> libxl thinks the default is false and will set HVM_PARAM_NESTEDHVM to 0
>>>>> in that case. Is there some way to query the hypervisor for what it
>>>>> thinks the setting is?
>>>> A get hvmparam hypercall will retrieve the value, but it is initialised
>>>> to 0 and only ever set by a set hvmparam hypercall.
>>> Which makes me start suspecting that the guest might be deriving
>>> its information on VMX being available from something other than
>>> CPUID. Of course we ought to confirm that we don't unintentionally
>>> return the VMX flag set (and that the config file doesn't override it
>>> in this way - I think we shouldn't be suppressing user overrides
>>> here, but I didn't go check whether we do).
>> I was also wondering about the behaviour of using vmx instructions in a
>> guest despite vmx not being visible in cpuid...
>>
>> Ian.
>>
>>
> We have found in our situation this is exactly the case. To verify we wrote some
> test code that makes vmx calls without checking cpuid. On bare hardware the program
> executes as expected. In a VM on Xen it causes the hypervisor to panic.
>
> From a security standpoint this is very very bad. It might be a good idea to provide either
> a run-time or build-time option to disable nestedhvm. Just turning off the vmx bit is not enough
> as malicious or badly written code can cause a system crash.
>
> For us it looks like we can disable these instructions and avoid the crash.
>
> Jeff.

Hmm - that is very concerning.

And there does look to be a bug.

Can you try the following patch and see whether it helps?

diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index c9afb56..7b1a349 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -359,7 +359,7 @@ static inline int hvm_event_pending(struct vcpu *v)
 /* These bits in CR4 cannot be set by the guest. */
 #define HVM_CR4_GUEST_RESERVED_BITS(_v)                 \
     (~((unsigned long)                                  \
-       (X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD |       \
+       (X86_CR4_PVI | X86_CR4_TSD |                     \
         X86_CR4_DE  | X86_CR4_PSE | X86_CR4_PAE |       \
         X86_CR4_MCE | X86_CR4_PGE | X86_CR4_PCE |       \
         X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT |           \

~Andrew

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 15:41                         ` Jeff_Zimmerman
  2013-11-07 15:54                           ` Andrew Cooper
@ 2013-11-07 15:57                           ` Jan Beulich
  2013-11-07 16:02                             ` Jeff_Zimmerman
  1 sibling, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2013-11-07 15:57 UTC (permalink / raw)
  To: Ian.Campbell, Jeff_Zimmerman
  Cc: lars.kurth.xen, andrew.cooper3, lars.kurth, xen-devel

>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com>  wrote:
>> I was also wondering about the behaviour of using vmx instructions in a
>> guest despite vmx not being visible in cpuid...
>> 
> We have found in our situation this is exactly the case. To verify we wrote 
> some
> test code that makes vmx calls without checking cpuid. On bare hardware the 
> program
> executes as expected. In a VM on Xen it causes the hypervisor to panic.

You trying it doesn't yet imply that Windows also does so.

Also, you say "program" - are you using these from user mode code?

> From a security standpoint this is very very bad. It might be a good idea to 
> provide either
> a run-time or build-time option to disable nestedhvm. Just turning off the vmx 
> bit is not enough
> as malicious or badly written code can cause a system crash.

Yes, we will absolutely need to do that.
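
A command line option gating this would presumably suffice, e.g.
(hypothetical sketch only - the option name and wiring are invented
here, nothing of the sort exists yet):

/* Global kill switch for nested virtualization, off by default;
 * nestedhvm_enabled() would then additionally check this flag. */
static bool_t __read_mostly opt_nested_virt;
boolean_param("nested-virt", opt_nested_virt);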

Jan

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 15:54                           ` Andrew Cooper
@ 2013-11-07 16:00                             ` Jan Beulich
  2013-11-07 16:06                               ` Andrew Cooper
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2013-11-07 16:00 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell,
	Jeff_Zimmerman

>>> On 07.11.13 at 16:54, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Can you try the following patch and see whether it helps?
> 
> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
> index c9afb56..7b1a349 100644
> --- a/xen/include/asm-x86/hvm/hvm.h
> +++ b/xen/include/asm-x86/hvm/hvm.h
> @@ -359,7 +359,7 @@ static inline int hvm_event_pending(struct vcpu *v)
>  /* These bits in CR4 cannot be set by the guest. */
>  #define HVM_CR4_GUEST_RESERVED_BITS(_v)                 \
>      (~((unsigned long)                                  \
> -       (X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD |       \
> +       (X86_CR4_PVI | X86_CR4_TSD |                     \

Are you mixing up VME and VMXE perhaps?

Jan

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 15:57                           ` Jan Beulich
@ 2013-11-07 16:02                             ` Jeff_Zimmerman
  2013-11-07 16:53                               ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff_Zimmerman @ 2013-11-07 16:02 UTC (permalink / raw)
  To: JBeulich
  Cc: lars.kurth.xen, andrew.cooper3, lars.kurth, Ian.Campbell,
	xen-devel


On Nov 7, 2013, at 7:57 AM, Jan Beulich <JBeulich@suse.com>
 wrote:

>>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
>> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com>  wrote:
>>> I was also wondering about the behaviour of using vmx instructions in a
>>> guest despite vmx not being visible in cpuid...
>>> 
>> We have found in our situation this is exactly the case. To verify we wrote 
>> some
>> test code that makes vmx calls without checking cpuid. On bare hardware the 
>> program
>> executes as expected. In a VM on Xen it causes the hypervisor to panic.
> 
> You trying it doesn't yet imply that Windows also does so.
> 
> Also, you say "program" - are you using these from user mode code?

Yes, from Windows, run as a privileged user. Windows XP SP3 can cause the crash.
It seems Windows 7 has better security; we cannot crash the system from a Win7 guest.
> 
>> From a security standpoint this is very very bad. It might be a good idea to 
>> provide either
>> a run-time or build-time option to disable nestedhvm. Just turning off the vmx 
>> bit is not enough
>> as malicious or badly written code can cause a system crash.
> 
> Yes, we will absolutely need to do that.
> 
> Jan
> 

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 16:00                             ` Jan Beulich
@ 2013-11-07 16:06                               ` Andrew Cooper
  2013-11-07 16:12                                 ` Jeff_Zimmerman
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2013-11-07 16:06 UTC (permalink / raw)
  To: Jan Beulich
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell,
	Jeff_Zimmerman

On 07/11/13 16:00, Jan Beulich wrote:
>>>> On 07.11.13 at 16:54, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> Can you try the following patch and see whether it helps?
>>
>> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
>> index c9afb56..7b1a349 100644
>> --- a/xen/include/asm-x86/hvm/hvm.h
>> +++ b/xen/include/asm-x86/hvm/hvm.h
>> @@ -359,7 +359,7 @@ static inline int hvm_event_pending(struct vcpu *v)
>>  /* These bits in CR4 cannot be set by the guest. */
>>  #define HVM_CR4_GUEST_RESERVED_BITS(_v)                 \
>>      (~((unsigned long)                                  \
>> -       (X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD |       \
>> +       (X86_CR4_PVI | X86_CR4_TSD |                     \
> Are you mixing up VME and VMXE perhaps?
>
> Jan
>

I am indeed.  Apologies for the noise, but I am still quite concerned.

I shall attempt to repro this on a XenRT machine.

Jeff: What system is this on (so I can pick a similar server to try with)?

~Andrew

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 16:06                               ` Andrew Cooper
@ 2013-11-07 16:12                                 ` Jeff_Zimmerman
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff_Zimmerman @ 2013-11-07 16:12 UTC (permalink / raw)
  To: andrew.cooper3
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell, JBeulich


On Nov 7, 2013, at 8:06 AM, Andrew Cooper <andrew.cooper3@citrix.com>
 wrote:

> On 07/11/13 16:00, Jan Beulich wrote:
>>>>> On 07.11.13 at 16:54, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> Can you try the following patch and see whether it helps?
>>> 
>>> diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
>>> index c9afb56..7b1a349 100644
>>> --- a/xen/include/asm-x86/hvm/hvm.h
>>> +++ b/xen/include/asm-x86/hvm/hvm.h
>>> @@ -359,7 +359,7 @@ static inline int hvm_event_pending(struct vcpu *v)
>>> /* These bits in CR4 cannot be set by the guest. */
>>> #define HVM_CR4_GUEST_RESERVED_BITS(_v)                 \
>>>     (~((unsigned long)                                  \
>>> -       (X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD |       \
>>> +       (X86_CR4_PVI | X86_CR4_TSD |                     \
>> Are you mixing up VME and VMXE perhaps?
>> 
>> Jan
>> 
> 
> I am indeed.  Apologies for the noise, but I am still quite concerned
> 
> I shall attempt to repro this on a XenRT machine
> 
> Jeff: What system is this on (so I can pick a similar server to try with)?

It is an Intel S4600LH board.
> 
> ~Andrew

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 16:02                             ` Jeff_Zimmerman
@ 2013-11-07 16:53                               ` Jan Beulich
  2013-11-07 17:02                                 ` Andrew Cooper
                                                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Jan Beulich @ 2013-11-07 16:53 UTC (permalink / raw)
  To: Jeff_Zimmerman
  Cc: lars.kurth.xen, andrew.cooper3, lars.kurth, Ian.Campbell,
	xen-devel

[-- Attachment #1: Type: text/plain, Size: 1092 bytes --]

>>> On 07.11.13 at 17:02, <Jeff_Zimmerman@McAfee.com> wrote:

> On Nov 7, 2013, at 7:57 AM, Jan Beulich <JBeulich@suse.com>
>  wrote:
> 
>>>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
>>> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com>  wrote:
>>>> I was also wondering about the behaviour of using vmx instructions in a
>>>> guest despite vmx not being visible in cpuid...
>>>> 
>>> We have found in our situation this is exactly the case. To verify we wrote 
>>> some
>>> test code that makes vmx calls without checking cpuid. On bare hardware the 
>>> program
>>> executes as expected. In a VM on Xen it causes the hypervisor to panic.
>> 
>> You trying it doesn't yet imply that Windows also does so.
>> 
>> Also, you say "program" - are you using these from user mode code?
> 
> Yes, from Windows, run as a privileged user. Windows XP SP3 can cause the 
> crash.
> It seems Windows 7 has better security; we cannot crash the system from a 
> Win7 guest.

Which is sort of odd. Anyway - care to try the attached patch?

Jan


[-- Attachment #2: xsa75.patch --]
[-- Type: text/plain, Size: 1667 bytes --]

nested VMX: VMLAUNCH/VMRESUME emulation must check permission first thing

Otherwise uninitialized data may be used, leading to crashes.

This is XSA-75.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1509,15 +1509,10 @@ static void clear_vvmcs_launched(struct 
     }
 }
 
-int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs)
+static int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
-    int rc;
-
-    rc = vmx_inst_check_privilege(regs, 0);
-    if ( rc != X86EMUL_OKAY )
-        return rc;
 
     /* check VMCS is valid and IO BITMAP is set */
     if ( (nvcpu->nv_vvmcxaddr != VMCX_EADDR) &&
@@ -1536,6 +1531,10 @@ int nvmx_handle_vmresume(struct cpu_user
     struct vcpu *v = current;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+    int rc = vmx_inst_check_privilege(regs, 0);
+
+    if ( rc != X86EMUL_OKAY )
+        return rc;
 
     if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
     {
@@ -1555,10 +1554,13 @@ int nvmx_handle_vmresume(struct cpu_user
 int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
 {
     bool_t launched;
-    int rc;
     struct vcpu *v = current;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+    int rc = vmx_inst_check_privilege(regs, 0);
+
+    if ( rc != X86EMUL_OKAY )
+        return rc;
 
     if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
     {


* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 16:53                               ` Jan Beulich
@ 2013-11-07 17:02                                 ` Andrew Cooper
  2013-11-08  7:50                                   ` Jan Beulich
  2013-11-07 18:13                                 ` Andrew Cooper
  2013-11-07 18:33                                 ` Jeff_Zimmerman
  2 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2013-11-07 17:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Jeff_Zimmerman,
	Ian.Campbell

On 07/11/13 16:53, Jan Beulich wrote:
>>>> On 07.11.13 at 17:02, <Jeff_Zimmerman@McAfee.com> wrote:
>> On Nov 7, 2013, at 7:57 AM, Jan Beulich <JBeulich@suse.com>
>>  wrote:
>>
>>>>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
>>>> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com>  wrote:
>>>>> I was also wondering about the behaviour of using vmx instructions in a
>>>>> guest despite vmx not being visible in cpuid...
>>>>>
>>>> We have found in our situation this is exactly the case. To verify we wrote 
>>>> some
>>>> test code that makes vmx calls without checking cpuid. On bare hardware the 
>>>> program
>>>> executes as expected. In a VM on Xen it causes the hypervisor to panic.
>>> You trying it doesn't yet imply that Windows also does so.
>>>
>>> Also, you say "program" - are you using these from user mode code?
>> Yes, from Windows, run as a privileged user. Windows XP SP3 can cause the 
>> crash.
>> It seems Windows 7 has better security; we cannot crash the system from a 
>> Win7 guest.
> Which is sort of odd. Anyway - care to try the attached patch?
>
> Jan
>

While the patch does look plausible, there is clearly still an issue:
an HVM guest with nested-virt disabled can even use the VMX
instructions, rather than getting flat-out #UD exceptions.
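
i.e. something to this effect at the top of the nvmx_handle_*() entry
points (an illustrative sketch only, not a tested patch):

/* Refuse the VMX instructions outright for guests without nested HVM
 * enabled, instead of wandering into the emulation paths. */
if ( !nestedhvm_enabled(current->domain) )
{
    hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
    return X86EMUL_EXCEPTION;
}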

~Andrew

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 16:53                               ` Jan Beulich
  2013-11-07 17:02                                 ` Andrew Cooper
@ 2013-11-07 18:13                                 ` Andrew Cooper
  2013-11-07 18:33                                 ` Jeff_Zimmerman
  2 siblings, 0 replies; 32+ messages in thread
From: Andrew Cooper @ 2013-11-07 18:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Jeff_Zimmerman,
	Ian.Campbell

On 07/11/13 16:53, Jan Beulich wrote:
>>>> On 07.11.13 at 17:02, <Jeff_Zimmerman@McAfee.com> wrote:
>> On Nov 7, 2013, at 7:57 AM, Jan Beulich <JBeulich@suse.com>
>>  wrote:
>>
>>>>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
>>>> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com>  wrote:
>>>>> I was also wondering about the behaviour of using vmx instructions in a
>>>>> guest despite vmx not being visible in cpuid...
>>>>>
>>>> We have found in our situation this is exactly the case. To verify we wrote 
>>>> some
>>>> test code that makes vmx calls without checking cpuid. On bare hardware the 
>>>> program
>>>> executes as expected. In a VM on Xen it causes the hypervisor to panic.
>>> You trying it doesn't yet imply that Windows also does so.
>>>
>>> Also, you say "program" - are you using these from user mode code?
>> Yes, from Windows, run as a privileged user. Windows XP SP3 can cause the 
>> crash.
>> It seems Windows 7 has better security; we cannot crash the system from a 
>> Win7 guest.
> Which is sort of odd. Anyway - care to try the attached patch?
>
> Jan
>

I have managed to reproduce the issue, and the patch appears to fix things.

I have to admit to being very surprised that the VMX hardware doesn't
check CR4.VMXE before causing a vmexit.

Reviewed-and-tested-by: Andrew Cooper <andrew.cooper3@citrix.com>

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 16:53                               ` Jan Beulich
  2013-11-07 17:02                                 ` Andrew Cooper
  2013-11-07 18:13                                 ` Andrew Cooper
@ 2013-11-07 18:33                                 ` Jeff_Zimmerman
  2 siblings, 0 replies; 32+ messages in thread
From: Jeff_Zimmerman @ 2013-11-07 18:33 UTC (permalink / raw)
  To: JBeulich
  Cc: lars.kurth.xen, andrew.cooper3, lars.kurth, Ian.Campbell,
	xen-devel


On Nov 7, 2013, at 8:53 AM, Jan Beulich <JBeulich@suse.com>
 wrote:

>>>> On 07.11.13 at 17:02, <Jeff_Zimmerman@McAfee.com> wrote:
> 
>> On Nov 7, 2013, at 7:57 AM, Jan Beulich <JBeulich@suse.com>
>> wrote:
>> 
>>>>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
>>>> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com>  wrote:
>>>>> I was also wondering about the behaviour of using vmx instructions in a
>>>>> guest despite vmx not being visible in cpuid...
>>>>> 
>>>> We have found in our situation this is exactly the case. To verify we wrote 
>>>> some
>>>> test code that makes vmx calls without checking cpuid. On bare hardware the 
>>>> program
>>>> executes as expected. In a VM on Xen it causes the hypervisor to panic.
>>> 
>>> You trying it doesn't yet imply that Windows also does so.
>>> 
>>> Also, you say "program" - are you using these from user mode code?
>> 
>> Yes, from Windows, run as a privileged user. Windows XP SP3 can cause the 
>> crash.
>> It seems Windows 7 has better security; we cannot crash the system from a 
>> Win7 guest.
> 
> Which is sort of odd. Anyway - care to try the attached patch?
> 
> Jan
> 
> <xsa75.patch>

Just tried your patch. It seems to mitigate the problem.
Thanks!  -jeff

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 17:02                                 ` Andrew Cooper
@ 2013-11-08  7:50                                   ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2013-11-08  7:50 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell,
	Jeff_Zimmerman

>>> On 07.11.13 at 18:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> While the patch does look plausible, there is clearly still an issue:
> an HVM guest with nested-virt disabled can even use the VMX
> instructions, rather than getting flat-out #UD exceptions.

The real CR4.VMXE is (of course) set, and basing a decision on the
read shadow would clearly be wrong from an architectural pov (as
then this would no longer be just a read shadow).

And this isn't the problem here anyway - one problem is that the
privilege-level check is done _after_ the VMX non-root mode one.
I guess they do it that way in order to allow the VMM maximum
flexibility.

Jan


Thread overview: 32+ messages
2013-11-04 19:54 Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.) Lars Kurth
2013-11-04 20:00 ` Andrew Cooper
2013-11-05  9:53 ` Ian Campbell
2013-11-05 10:04 ` Jan Beulich
2013-11-05 15:46   ` Lars Kurth
2013-11-05 21:55     ` Jeff_Zimmerman
     [not found]     ` <5E2B3362-4D93-4FEF-987A-E477B0DCEE51@mcafee.com>
2013-11-06 14:09       ` Jan Beulich
2013-11-06 16:05         ` Jeff_Zimmerman
2013-11-06 16:16           ` Jan Beulich
2013-11-06 16:18           ` Ian Campbell
2013-11-06 16:48             ` Jeff_Zimmerman
2013-11-06 16:54               ` Andrew Cooper
2013-11-06 17:06                 ` Ian Campbell
2013-11-06 17:07                   ` Andrew Cooper
2013-11-07  9:10                     ` Jan Beulich
2013-11-07  9:30                       ` Ian Campbell
2013-11-07 15:41                         ` Jeff_Zimmerman
2013-11-07 15:54                           ` Andrew Cooper
2013-11-07 16:00                             ` Jan Beulich
2013-11-07 16:06                               ` Andrew Cooper
2013-11-07 16:12                                 ` Jeff_Zimmerman
2013-11-07 15:57                           ` Jan Beulich
2013-11-07 16:02                             ` Jeff_Zimmerman
2013-11-07 16:53                               ` Jan Beulich
2013-11-07 17:02                                 ` Andrew Cooper
2013-11-08  7:50                                   ` Jan Beulich
2013-11-07 18:13                                 ` Andrew Cooper
2013-11-07 18:33                                 ` Jeff_Zimmerman
     [not found] <CE9EAEF6.59305%asit.k.mallick@intel.com>
2013-11-05 22:46 ` Jeff_Zimmerman
2013-11-05 23:17   ` Mallick, Asit K
2013-11-06  0:23   ` Andrew Cooper
2013-11-06 10:05     ` Ian Campbell
