* Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Lars Kurth @ 2013-11-04 19:54 UTC (permalink / raw)
To: xen-devel@lists.xen.org
See
http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-1.html
---
I have a 32-core system running Xen 4.3.1 with 30 Windows XP VMs.
Dom0 is CentOS 6.3 based, with Linux kernel 3.10.16.
In my configuration all of the Windows HVM guests are running, having been
restored from xl save images.
VMs are destroyed or restored on demand. After some time Xen
experiences a fatal page fault while restoring one of the Windows HVM
guests. This does not happen very often, perhaps once in a 16 to 48 hour
period.
The stack trace from Xen follows. Thanks in advance for any help.
(XEN) ----[ Xen-4.3.1 x86_64 debug=n Tainted: C ]----
(XEN) CPU: 52
(XEN) RIP: e008:[] domain_page_map_to_mfn+0x86/0xc0
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 000ffffffffff000 rbx: ffff8300bb163760 rcx: 0000000000000000
(XEN) rdx: ffff810000000000 rsi: 0000000000000000 rdi: 0000000000000000
(XEN) rbp: ffff8300bb163000 rsp: ffff8310333e7cd8 r8: 0000000000000000
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
(XEN) r12: ffff8310333e7f18 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000426f0
(XEN) cr3: 000000211bee5000 cr2: ffff810000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff8310333e7cd8:
(XEN) 0000000000000001 ffff82c4c01de869 ffff82c4c0182c70 ffff8300bb163000
(XEN) 0000000000000014 ffff8310333e7f18 0000000000000000 ffff82c4c01d7548
(XEN) ffff8300bb163490 ffff8300bb163000 ffff82c4c01c65b8 ffff8310333e7e60
(XEN) ffff82c4c01badef ffff8300bb163000 0000000000000003 ffff833144d8e000
(XEN) ffff82c4c01b4885 ffff8300bb163000 ffff8300bb163000 ffff8300bdff1000
(XEN) 0000000000000001 ffff82c4c02f2880 ffff82c4c02f2880 ffff82c4c0308440
(XEN) ffff82c4c01d0ea8 ffff8300bb163000 ffff82c4c015ad6c ffff82c4c02f2880
(XEN) ffff82c4c02cf800 00000000ffffffff ffff8310333f5060 ffff82c4c02f2880
(XEN) 0000000000000282 0010000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 ffff82c4c02f2880 ffff8300bdff1000 ffff8300bb163000
(XEN) 000031a10f2b16ca 0000000000000001 ffff82c4c02f2880 ffff82c4c0308440
(XEN) ffff82c4c0124444 0000000000000034 ffff8310333f5060 0000000001c9c380
(XEN) 00000000c0155965 ffff82c4c01c6146 0000000001c9c380 ffffffffffffff00
(XEN) ffff82c4c0128fa8 ffff8300bb163000 ffff8327d50e9000 ffff82c4c01bc490
(XEN) 0000000000000000 ffff82c4c01dd254 0000000080549ae0 ffff82c4c01cfc3c
(XEN) ffff8300bb163000 ffff82c4c01d6128 ffff82c4c0125db9 ffff82c4c0125db9
(XEN) ffff8310333e0000 ffff8300bb163000 000000000012ffc0 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffff82c4c01deaa3
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 000000000012ffc0 000000007ffdf000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN) [] domain_page_map_to_mfn+0x86/0xc0
(XEN) [] nvmx_handle_vmlaunch+0x49/0x160
(XEN) [] __update_vcpu_system_time+0x240/0x310
(XEN) [] vmx_vmexit_handler+0xb58/0x18c0
(XEN) [] pt_restore_timer+0xa8/0xc0
(XEN) [] hvm_io_assist+0xef/0x120
(XEN) [] hvm_do_resume+0x195/0x1c0
(XEN) [] vmx_do_resume+0x148/0x210
(XEN) [] context_switch+0x1bc/0xfc0
(XEN) [] schedule+0x254/0x5f0
(XEN) [] pt_update_irq+0x256/0x2b0
(XEN) [] timer_softirq_action+0x168/0x210
(XEN) [] hvm_vcpu_has_pending_irq+0x50/0xb0
(XEN) [] nvmx_switch_guest+0x54/0x1560
(XEN) [] vmx_intr_assist+0x6c/0x490
(XEN) [] vmx_vmenter_helper+0x88/0x160
(XEN) [] __do_softirq+0x69/0xa0
(XEN) [] __do_softirq+0x69/0xa0
(XEN) [] vmx_asm_do_vmentry+0/0xed
(XEN)
(XEN) Pagetable walk from ffff810000000000:
(XEN) L4[0x102] = 000000211bee5063 ffffffffffffffff
(XEN) L3[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 52:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff810000000000
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Andrew Cooper @ 2013-11-04 20:00 UTC (permalink / raw)
To: Lars Kurth; +Cc: xen-devel@lists.xen.org

On 04/11/13 19:54, Lars Kurth wrote:
> See http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-1.html
> ---
> I have a 32-core system running Xen 4.3.1 with 30 Windows XP VMs.
> Dom0 is CentOS 6.3 based, with Linux kernel 3.10.16.
> [... rest of the report and the full crash dump quoted above ...]

Which version of Xen were these images saved on?

Are you expecting to be using nested-virt? (It is still very definitely experimental)

~Andrew

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Ian Campbell @ 2013-11-05  9:53 UTC (permalink / raw)
To: Lars Kurth; +Cc: xen-devel@lists.xen.org

On Mon, 2013-11-04 at 19:54 +0000, Lars Kurth wrote:
> See
> http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-1.html

TBH I think for this kind of thing (i.e. a bug, not a user question) the
most appropriate thing to do would be to redirect them to xen-devel
themselves (with a reminder that they do not need to subscribe to post).
This is going to take some back and forth to get to the bottom of, and
having you sit in the middle is just silly.

Ian.

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Jan Beulich @ 2013-11-05 10:04 UTC (permalink / raw)
To: Lars Kurth; +Cc: xen-devel

>>> On 04.11.13 at 20:54, Lars Kurth <lars.kurth.xen@gmail.com> wrote:
> I have a 32-core system running Xen 4.3.1 with 30 Windows XP VMs.
> [... report trimmed; quoted in full above ...]
> (XEN) ----[ Xen-4.3.1 x86_64 debug=n Tainted: C ]----
> (XEN) CPU: 52
> (XEN) RIP: e008:[] domain_page_map_to_mfn+0x86/0xc0

Zapping addresses (here and below in the stack trace) is never
helpful when someone asks for help with a crash. Also, in order
to not just guess, the matching xen-syms or xen.efi should be
made available or pointed to.

> [... register state, stack dump and call trace elided; see above ...]
> (XEN) Pagetable walk from ffff810000000000:
> (XEN) L4[0x102] = 000000211bee5063 ffffffffffffffff
> (XEN) L3[0x000] = 0000000000000000 ffffffffffffffff

This makes me suspect that domain_page_map_to_mfn() gets a
NULL pointer passed here. As said above, this is only guesswork
at this point, and as Ian already pointed out, directing the
reporter to xen-devel would seem to be the right thing to do
here anyway.

Jan
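
A minimal sketch of the hazard Jan suspects, built from the two functions
named in this thread (illustrative fragment only, not the literal 4.3.1
call site; the surrounding error handling is assumed):

    /* hvm_map_guest_frame_rw() can return NULL (bad GFN, or hypervisor
     * mapping space exhausted).  Passing that NULL on to
     * domain_page_map_to_mfn() makes the lookup walk page tables for a
     * bogus linear address - in hypervisor context a fatal page fault. */
    void *vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT, 0);

    if ( vvmcs == NULL )                  /* the check that must come first */
        return X86EMUL_UNHANDLEABLE;

    mfn = domain_page_map_to_mfn(vvmcs);  /* safe only for a live mapping */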

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Lars Kurth @ 2013-11-05 15:46 UTC (permalink / raw)
To: Jan Beulich, Lars Kurth, jeff_zimmerman; +Cc: xen-devel

Jan, Andrew, Ian,

pulling in Jeff, who raised the question. Snippets from the various
replies are below. Jeff, please look through these (in particular Jan's
answer) and answer any further questions on this thread.

On 05/11/2013 09:53, Ian Campbell wrote:
> TBH I think for this kind of thing (i.e. a bug, not a user question) the
> most appropriate thing to do would be to redirect them to xen-devel
> themselves (with a reminder that they do not need to subscribe to post).

Agreed. Another option is for me to start the thread and pull the raiser
of the question into it, if it is a bug. I was not sure this was a real
bug at first, but it seems it is.

On 04/11/2013 20:00, Andrew Cooper wrote:
> Which version of Xen were these images saved on?

[Jeff] We were careful to regenerate all the images after upgrading to
4.3.1. We also saw the same problem on 4.3.0.

> Are you expecting to be using nested-virt? (It is still very definitely experimental)

[Jeff] Not using nested-virt.

On 05/11/2013 10:04, Jan Beulich wrote:
> Zapping addresses (here and below in the stack trace) is never
> helpful when someone asks for help with a crash. Also, in order
> to not just guess, the matching xen-syms or xen.efi should be
> made available or pointed to.
> [... quoted crash dump elided; see the original report above ...]
> This makes me suspect that domain_page_map_to_mfn() gets a
> NULL pointer passed here. As said above, this is only guesswork
> at this point, and as Ian already pointed out, directing the
> reporter to xen-devel would seem to be the right thing to do
> here anyway.

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Jeff_Zimmerman @ 2013-11-05 21:55 UTC (permalink / raw)
To: lars.kurth; +Cc: lars.kurth.xen, xen-devel, JBeulich

Lars,

I understand the mailing list limits attachment size to 512K. Where can
I post the xen binary and symbols file?

Jeff

On Nov 5, 2013, at 7:46 AM, Lars Kurth <lars.kurth@xen.org> wrote:
> Jan, Andrew, Ian,
> pulling in Jeff, who raised the question.
> [... rest of the previous message quoted in full ...]
[parent not found: <5E2B3362-4D93-4FEF-987A-E477B0DCEE51@mcafee.com>]
* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Jan Beulich @ 2013-11-06 14:09 UTC (permalink / raw)
To: Jeff_Zimmerman; +Cc: lars.kurth.xen, xen-devel, lars.kurth

>>> On 05.11.13 at 22:36, <Jeff_Zimmerman@McAfee.com> wrote:
> Attaching the xen binary and symbols file.
> Hopefully they will come through.

Please give the attached patch a try - afaict it should eliminate
the host crash, but I'm pretty certain you'll then see the guest
misbehave. Depending on what other load you place on the
system as a whole, you're either overloading it (i.e. we're
running out of mapping space in the hypervisor) or there's a
mapping leak that - so far at least - I can't spot.

In any event I'd suggest you try running a debug build of the
hypervisor, so that eventual problems can be spotted earlier.

Jan

[-- Attachment #2: nVMX-map-errors.patch --]

--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -747,7 +747,7 @@ static void __clear_current_vvmcs(struct
         __vmpclear(virt_to_maddr(nvcpu->nv_n2vmcx));
 }
 
-static void __map_msr_bitmap(struct vcpu *v)
+static bool_t __must_check _map_msr_bitmap(struct vcpu *v)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     unsigned long gpa;
@@ -756,9 +756,11 @@ static void __map_msr_bitmap(struct vcpu
         hvm_unmap_guest_frame(nvmx->msrbitmap, 1);
     gpa = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx, MSR_BITMAP);
     nvmx->msrbitmap = hvm_map_guest_frame_ro(gpa >> PAGE_SHIFT, 1);
+
+    return nvmx->msrbitmap != NULL;
 }
 
-static void __map_io_bitmap(struct vcpu *v, u64 vmcs_reg)
+static bool_t __must_check _map_io_bitmap(struct vcpu *v, u64 vmcs_reg)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     unsigned long gpa;
@@ -769,12 +771,14 @@ static void __map_io_bitmap(struct vcpu
         hvm_unmap_guest_frame(nvmx->iobitmap[index], 1);
     gpa = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx, vmcs_reg);
     nvmx->iobitmap[index] = hvm_map_guest_frame_ro(gpa >> PAGE_SHIFT, 1);
+
+    return nvmx->iobitmap[index] != NULL;
 }
 
-static inline void map_io_bitmap_all(struct vcpu *v)
+static inline bool_t __must_check map_io_bitmap_all(struct vcpu *v)
 {
-    __map_io_bitmap (v, IO_BITMAP_A);
-    __map_io_bitmap (v, IO_BITMAP_B);
+    return _map_io_bitmap(v, IO_BITMAP_A) &&
+           _map_io_bitmap(v, IO_BITMAP_B);
 }
 
 static void nvmx_purge_vvmcs(struct vcpu *v)
@@ -1608,9 +1612,15 @@ int nvmx_handle_vmptrld(struct cpu_user_
     if ( nvcpu->nv_vvmcxaddr == VMCX_EADDR )
     {
         nvcpu->nv_vvmcx = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT, 1);
-        nvcpu->nv_vvmcxaddr = gpa;
-        map_io_bitmap_all (v);
-        __map_msr_bitmap(v);
+        if ( nvcpu->nv_vvmcx )
+            nvcpu->nv_vvmcxaddr = gpa;
+        if ( !nvcpu->nv_vvmcx ||
+             !map_io_bitmap_all(v) ||
+             !_map_msr_bitmap(v) )
+        {
+            vmreturn(regs, VMFAIL_VALID);
+            goto out;
+        }
     }
 
     if ( cpu_has_vmx_vmcs_shadowing )
@@ -1676,9 +1686,8 @@ int nvmx_handle_vmclear(struct cpu_user_
     {
         /* Even if this VMCS isn't the current one, we must clear it. */
         vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT, 0);
-        if ( vvmcs )
-            clear_vvmcs_launched(&nvmx->launched_list,
-                                 domain_page_map_to_mfn(vvmcs));
+        clear_vvmcs_launched(&nvmx->launched_list,
+                             domain_page_map_to_mfn(vvmcs));
         hvm_unmap_guest_frame(vvmcs, 0);
     }
 
@@ -1722,6 +1731,7 @@ int nvmx_handle_vmwrite(struct cpu_user_
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     unsigned long operand;
     u64 vmcs_encoding;
+    bool_t okay = 1;
 
     if ( decode_vmx_inst(regs, &decode, &operand, 0)
              != X86EMUL_OKAY )
@@ -1730,16 +1740,21 @@ int nvmx_handle_vmwrite(struct cpu_user_
     vmcs_encoding = reg_read(regs, decode.reg2);
     __set_vvmcs(nvcpu->nv_vvmcx, vmcs_encoding, operand);
 
-    if ( vmcs_encoding == IO_BITMAP_A || vmcs_encoding == IO_BITMAP_A_HIGH )
-        __map_io_bitmap (v, IO_BITMAP_A);
-    else if ( vmcs_encoding == IO_BITMAP_B ||
-              vmcs_encoding == IO_BITMAP_B_HIGH )
-        __map_io_bitmap (v, IO_BITMAP_B);
+    switch ( vmcs_encoding )
+    {
+    case IO_BITMAP_A: case IO_BITMAP_A_HIGH:
+        okay = _map_io_bitmap(v, IO_BITMAP_A);
+        break;
+    case IO_BITMAP_B: case IO_BITMAP_B_HIGH:
+        okay = _map_io_bitmap(v, IO_BITMAP_B);
+        break;
+    case MSR_BITMAP: case MSR_BITMAP_HIGH:
+        okay = _map_msr_bitmap(v);
+        break;
+    }
 
-    if ( vmcs_encoding == MSR_BITMAP || vmcs_encoding == MSR_BITMAP_HIGH )
-        __map_msr_bitmap(v);
+    vmreturn(regs, okay ? VMSUCCEED : VMFAIL_VALID);
 
-    vmreturn(regs, VMSUCCEED);
     return X86EMUL_OKAY;
 }

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Jeff_Zimmerman @ 2013-11-06 16:05 UTC (permalink / raw)
To: JBeulich; +Cc: lars.kurth.xen, xen-devel, lars.kurth

Jan,

I will give your patch a try.

I have to recant my previous statement regarding not using nested-virt.
It seems some of the code being executed on the VM contains VMX
instructions, and by virtue of running this code in an HVM guest we are
using nested-virt after all.

This raises a question: if this functionality is undesired, can we just
disable nested virt by adding nestedhvm=false to the configuration file?
Should the cpuid and cpuid_check settings be changed as well?

Thanks,
Jeff

On Nov 6, 2013, at 6:09 AM, Jan Beulich <JBeulich@suse.com> wrote:
> Please give the attached patch a try - afaict it should eliminate
> the host crash, but I'm pretty certain you'll then see the guest
> misbehave.
> [... rest quoted above ...]

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Jan Beulich @ 2013-11-06 16:16 UTC (permalink / raw)
To: Jeff_Zimmerman; +Cc: lars.kurth.xen, xen-devel, lars.kurth

>>> On 06.11.13 at 17:05, <Jeff_Zimmerman@McAfee.com> wrote:
> This raises a question: if this functionality is undesired, can we just
> disable nested virt by adding nestedhvm=false to the configuration file?

Sure. And as that's supposedly the default, just deleting the line should
be fine too.

> Should the cpuid and cpuid_check settings be changed as well?

I don't think so, unless you manually override it to look like VMX was
available.

That said - it would still be nice if you could help us figure out the
bug's origin (and I assume you realize that it would be even more helpful
for us if you did all this on 4.4-unstable).

Jan
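
For concreteness, the configuration being discussed looks like this
(illustrative xl guest config fragment; only the nestedhvm line matters
here, the builder line is assumed context):

    # Fragment of an xl guest config: leave nested HVM disabled.
    builder   = "hvm"
    nestedhvm = 0      # equivalent to deleting the line, as 0 is the default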

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Ian Campbell @ 2013-11-06 16:18 UTC (permalink / raw)
To: Jeff_Zimmerman; +Cc: lars.kurth.xen, xen-devel, lars.kurth, JBeulich

On Wed, 2013-11-06 at 16:05 +0000, Jeff_Zimmerman@McAfee.com wrote:
> This raises a question: if this functionality is undesired, can we just
> disable nested virt by adding nestedhvm=false to the configuration file?
> Should the cpuid and cpuid_check settings be changed as well?

I'm reasonably certain that nestedhvm=false will clear the relevant
flags in the guest-visible cpuid. I'd say it was a bug if this doesn't
happen.

nestedhvm should be disabled by default; did you explicitly enable it?
Removing the line altogether ought to disable it too. Please let us know
if not.

Ian.

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Jeff_Zimmerman @ 2013-11-06 16:48 UTC (permalink / raw)
To: Ian.Campbell; +Cc: lars.kurth.xen, xen-devel, lars.kurth, JBeulich

On Nov 6, 2013, at 8:18 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> nestedhvm should be disabled by default; did you explicitly enable it?
> Removing the line altogether ought to disable it too. Please let us know
> if not.

I did not enable nestedhvm, and when I run xl list -l the output shows
nestedhvm=<default>. I was not sure what the default was supposed to be.
I will try setting it and re-run our test.

Jeff

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Andrew Cooper @ 2013-11-06 16:54 UTC (permalink / raw)
To: Jeff_Zimmerman; +Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell, JBeulich

On 06/11/13 16:48, Jeff_Zimmerman@McAfee.com wrote:
> I did not enable nestedhvm, and when I run xl list -l the output shows
> nestedhvm=<default>. I was not sure what the default was supposed to be.
> I will try setting it and re-run our test.

nested-virt is strictly experimental, and still has known bugs (and
clearly some unknown ones).

I looked over the xl code and thought that nestedhvm should default to
false, but I would prefer someone more familiar with libxl and the idl to
confirm what the default should be.

~Andrew

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Ian Campbell @ 2013-11-06 17:06 UTC (permalink / raw)
To: Andrew Cooper; +Cc: lars.kurth.xen, xen-devel, lars.kurth, Jeff_Zimmerman, JBeulich

On Wed, 2013-11-06 at 16:54 +0000, Andrew Cooper wrote:
> I looked over the xl code and thought that nestedhvm should default to
> false, but I would prefer someone more familiar with libxl and the idl to
> confirm what the default should be.

libxl thinks the default is false and will set HVM_PARAM_NESTEDHVM to 0
in that case. Is there some way to query the hypervisor for what it
thinks the setting is?

Ian.

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Andrew Cooper @ 2013-11-06 17:07 UTC (permalink / raw)
To: Ian Campbell; +Cc: lars.kurth.xen, xen-devel, lars.kurth, Jeff_Zimmerman, JBeulich

On 06/11/13 17:06, Ian Campbell wrote:
> libxl thinks the default is false and will set HVM_PARAM_NESTEDHVM to 0
> in that case. Is there some way to query the hypervisor for what it
> thinks the setting is?

A get hvmparam hypercall will retrieve the value, but it is initialised
to 0 and only ever set by a set hvmparam hypercall.

~Andrew
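
A small dom0 tool answering Ian's question could look like this (a sketch
against the 4.3-era libxc API; the program name and argument handling are
made up, and error handling is minimal):

    #include <stdio.h>
    #include <stdlib.h>
    #include <xenctrl.h>

    /* Read back what the hypervisor currently holds in
     * HVM_PARAM_NESTEDHVM for the domain given as argv[1]. */
    int main(int argc, char *argv[])
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        unsigned long val = 0;

        if ( !xch || argc < 2 )
            return 1;
        xc_get_hvm_param(xch, atoi(argv[1]), HVM_PARAM_NESTEDHVM, &val);
        printf("HVM_PARAM_NESTEDHVM = %lu\n", val);
        xc_interface_close(xch);
        return 0;
    }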

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Jan Beulich @ 2013-11-07  9:10 UTC (permalink / raw)
To: Andrew Cooper, Ian Campbell; +Cc: lars.kurth.xen, xen-devel, lars.kurth, Jeff_Zimmerman

>>> On 06.11.13 at 18:07, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> A get hvmparam hypercall will retrieve the value, but it is initialised
> to 0 and only ever set by a set hvmparam hypercall.

Which makes me start suspecting that the guest might be deriving
its information on VMX being available from something other than
CPUID. Of course we ought to confirm that we don't unintentionally
return the VMX flag set (and that the config file doesn't override it
in this way - I think we shouldn't be suppressing user overrides
here, but I didn't go check whether we do).

Jan
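
For reference, the check a well-behaved guest would perform before
touching VMX is a single CPUID query (minimal sketch; CPUID leaf 1,
ECX bit 5 is the architected VMX feature flag on Intel):

    #include <cpuid.h>

    /* Nonzero iff CPUID advertises VMX - the flag Xen is expected to
     * clear for guests when nested HVM is off. */
    static int vmx_advertised(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if ( !__get_cpuid(1, &eax, &ebx, &ecx, &edx) )
            return 0;
        return (ecx >> 5) & 1;    /* CPUID.01H:ECX.VMX[bit 5] */
    }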

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Ian Campbell @ 2013-11-07  9:30 UTC (permalink / raw)
To: Jan Beulich; +Cc: lars.kurth.xen, Andrew Cooper, lars.kurth, Jeff_Zimmerman, xen-devel

On Thu, 2013-11-07 at 09:10 +0000, Jan Beulich wrote:
> Which makes me start suspecting that the guest might be deriving
> its information on VMX being available from something other than
> CPUID. Of course we ought to confirm that we don't unintentionally
> return the VMX flag set (and that the config file doesn't override it
> in this way - I think we shouldn't be suppressing user overrides
> here, but I didn't go check whether we do).

I was also wondering about the behaviour of using vmx instructions in a
guest despite vmx not being visible in cpuid...

Ian.

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Jeff_Zimmerman @ 2013-11-07 15:41 UTC (permalink / raw)
To: Ian.Campbell; +Cc: lars.kurth.xen, andrew.cooper3, lars.kurth, JBeulich, xen-devel

On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> I was also wondering about the behaviour of using vmx instructions in a
> guest despite vmx not being visible in cpuid...

We have found in our situation this is exactly the case. To verify, we
wrote some test code that makes VMX calls without checking cpuid. On bare
hardware the program executes as expected. In a VM on Xen it causes the
hypervisor to panic.

From a security standpoint this is very, very bad. It might be a good
idea to provide either a run-time or build-time option to disable
nestedhvm. Just turning off the vmx bit is not enough, as malicious or
badly written code can cause a system crash.

For us it looks like we can disable these instructions and avoid the
crash.

Jeff.
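
The test Jeff describes needs to be no more elaborate than this
(hypothetical reconstruction of such a probe, not McAfee's actual code;
clearly not something to run on a production host):

    /* Executes VMLAUNCH without consulting CPUID. Architecturally this
     * should raise a fault (#UD when VMX is absent or the CPU is not in
     * VMX operation) and kill only this process; on the affected Xen
     * builds it instead brought down the whole hypervisor. */
    int main(void)
    {
        __asm__ volatile ("vmlaunch");
        return 0;
    }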

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Andrew Cooper @ 2013-11-07 15:54 UTC (permalink / raw)
To: Jeff_Zimmerman; +Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell, JBeulich

On 07/11/13 15:41, Jeff_Zimmerman@McAfee.com wrote:
> We have found in our situation this is exactly the case. To verify, we
> wrote some test code that makes VMX calls without checking cpuid. On bare
> hardware the program executes as expected. In a VM on Xen it causes the
> hypervisor to panic.
>
> From a security standpoint this is very, very bad.
> [... rest quoted above ...]

Hmm - that is very concerning. And there does look to be a bug.

Can you try the following patch and see whether it helps?

diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index c9afb56..7b1a349 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -359,7 +359,7 @@ static inline int hvm_event_pending(struct vcpu *v)
 /* These bits in CR4 cannot be set by the guest. */
 #define HVM_CR4_GUEST_RESERVED_BITS(_v) \
     (~((unsigned long) \
-       (X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | \
+       (X86_CR4_PVI | X86_CR4_TSD | \
         X86_CR4_DE | X86_CR4_PSE | X86_CR4_PAE | \
         X86_CR4_MCE | X86_CR4_PGE | X86_CR4_PCE | \
         X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT | \

~Andrew

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Jan Beulich @ 2013-11-07 16:00 UTC (permalink / raw)
To: Andrew Cooper; +Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell, Jeff_Zimmerman

>>> On 07.11.13 at 16:54, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Can you try the following patch and see whether it helps?
>
> -       (X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | \
> +       (X86_CR4_PVI | X86_CR4_TSD | \

Are you mixing up VME and VMXE perhaps?

Jan

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Andrew Cooper @ 2013-11-07 16:06 UTC (permalink / raw)
To: Jan Beulich; +Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell, Jeff_Zimmerman

On 07/11/13 16:00, Jan Beulich wrote:
> Are you mixing up VME and VMXE perhaps?

I am indeed. Apologies for the noise, but I am still quite concerned.
I shall attempt to repro this on a XenRT machine.

Jeff: What system is this on (so I can pick a similar server to try
with)?

~Andrew

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Jeff_Zimmerman @ 2013-11-07 16:12 UTC (permalink / raw)
To: andrew.cooper3; +Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell, JBeulich

On Nov 7, 2013, at 8:06 AM, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Jeff: What system is this on (so I can pick a similar server to try
> with)?

It is an Intel S4600LH board.

* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
From: Jan Beulich @ 2013-11-07 15:57 UTC (permalink / raw)
To: Ian.Campbell, Jeff_Zimmerman; +Cc: lars.kurth.xen, andrew.cooper3, lars.kurth, xen-devel

>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
> We have found in our situation this is exactly the case. To verify, we
> wrote some test code that makes VMX calls without checking cpuid. On bare
> hardware the program executes as expected. In a VM on Xen it causes the
> hypervisor to panic.

You trying it doesn't yet imply that Windows also does so.

Also, you say "program" - are you using these from user mode code?

> From a security standpoint this is very, very bad. It might be a good
> idea to provide either a run-time or build-time option to disable
> nestedhvm. Just turning off the vmx bit is not enough, as malicious or
> badly written code can cause a system crash.

Yes, we will absolutely need to do that.

Jan
* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 15:57               ` Jan Beulich
@ 2013-11-07 16:02                 ` Jeff_Zimmerman
  2013-11-07 16:53                   ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff_Zimmerman @ 2013-11-07 16:02 UTC (permalink / raw)
  To: JBeulich
  Cc: lars.kurth.xen, andrew.cooper3, lars.kurth, Ian.Campbell, xen-devel

On Nov 7, 2013, at 7:57 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
>> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>> I was also wondering about the behaviour of using vmx instructions in a
>>> guest despite vmx not being visible in cpuid...
>>>
>> We have found in our situation this is exactly the case. To verify, we
>> wrote some test code that makes vmx calls without checking cpuid. On bare
>> hardware the program executes as expected. In a VM on Xen it causes the
>> hypervisor to panic.
>
> You trying it doesn't yet imply that Windows also does so.
>
> Also, you say "program" - are you using these from user mode code?

Yes, from Windows, run as a privileged user. Windows XP SP3 can cause the
crash. It seems Windows 7 has better security; we cannot crash the system
from a Win7 guest.

>> From a security standpoint this is very, very bad. It might be a good idea
>> to provide either a run-time or build-time option to disable nestedhvm.
>> Just turning off the vmx bit is not enough, as malicious or badly written
>> code can cause a system crash.
>
> Yes, we will absolutely need to do that.
>
> Jan
>

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 16:02                 ` Jeff_Zimmerman
@ 2013-11-07 16:53                   ` Jan Beulich
  2013-11-07 17:02                     ` Andrew Cooper
                                      ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Jan Beulich @ 2013-11-07 16:53 UTC (permalink / raw)
  To: Jeff_Zimmerman
  Cc: lars.kurth.xen, andrew.cooper3, lars.kurth, Ian.Campbell, xen-devel

[-- Attachment #1: Type: text/plain, Size: 1092 bytes --]

>>> On 07.11.13 at 17:02, <Jeff_Zimmerman@McAfee.com> wrote:
> On Nov 7, 2013, at 7:57 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
>>> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>>> I was also wondering about the behaviour of using vmx instructions in a
>>>> guest despite vmx not being visible in cpuid...
>>>>
>>> We have found in our situation this is exactly the case. To verify, we
>>> wrote some test code that makes vmx calls without checking cpuid. On bare
>>> hardware the program executes as expected. In a VM on Xen it causes the
>>> hypervisor to panic.
>>
>> You trying it doesn't yet imply that Windows also does so.
>>
>> Also, you say "program" - are you using these from user mode code?
>
> Yes, from Windows, run as a privileged user. Windows XP SP3 can cause the
> crash. It seems Windows 7 has better security; we cannot crash the system
> from a Win7 guest.

Which is sort of odd. Anyway - care to try the attached patch?

Jan

[-- Attachment #2: xsa75.patch --]
[-- Type: text/plain, Size: 1667 bytes --]

nested VMX: VMLAUNCH/VMRESUME emulation must check permission first thing

Otherwise uninitialized data may be used, leading to crashes.

This is XSA-75.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1509,15 +1509,10 @@ static void clear_vvmcs_launched(struct
     }
 }
 
-int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs)
+static int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
-    int rc;
-
-    rc = vmx_inst_check_privilege(regs, 0);
-    if ( rc != X86EMUL_OKAY )
-        return rc;
 
     /* check VMCS is valid and IO BITMAP is set */
     if ( (nvcpu->nv_vvmcxaddr != VMCX_EADDR) &&
@@ -1536,6 +1531,10 @@ int nvmx_handle_vmresume(struct cpu_user
     struct vcpu *v = current;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+    int rc = vmx_inst_check_privilege(regs, 0);
+
+    if ( rc != X86EMUL_OKAY )
+        return rc;
 
     if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
     {
@@ -1555,10 +1554,13 @@ int nvmx_handle_vmresume(struct cpu_user
 int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
 {
     bool_t launched;
-    int rc;
     struct vcpu *v = current;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+    int rc = vmx_inst_check_privilege(regs, 0);
+
+    if ( rc != X86EMUL_OKAY )
+        return rc;
 
     if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
     {

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 16:53                   ` Jan Beulich
@ 2013-11-07 17:02                     ` Andrew Cooper
  2013-11-08  7:50                       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2013-11-07 17:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Jeff_Zimmerman, Ian.Campbell

On 07/11/13 16:53, Jan Beulich wrote:
>>>> On 07.11.13 at 17:02, <Jeff_Zimmerman@McAfee.com> wrote:
>> On Nov 7, 2013, at 7:57 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
>>>> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>>>> I was also wondering about the behaviour of using vmx instructions in a
>>>>> guest despite vmx not being visible in cpuid...
>>>>>
>>>> We have found in our situation this is exactly the case. To verify, we
>>>> wrote some test code that makes vmx calls without checking cpuid. On bare
>>>> hardware the program executes as expected. In a VM on Xen it causes the
>>>> hypervisor to panic.
>>> You trying it doesn't yet imply that Windows also does so.
>>>
>>> Also, you say "program" - are you using these from user mode code?
>> Yes, from Windows, run as a privileged user. Windows XP SP3 can cause the
>> crash. It seems Windows 7 has better security; we cannot crash the system
>> from a Win7 guest.
> Which is sort of odd. Anyway - care to try the attached patch?
>
> Jan
>

While the patch does look plausible, there is still clearly an issue
that an HVM guest with nested_virt disabled can even use the VMX
instructions, rather than getting flat-out #UD exceptions.

~Andrew

^ permalink raw reply [flat|nested] 32+ messages in thread
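In outline, the behaviour Andrew is asking for is an intercept that rejects VMX instructions up front whenever nested virt is not configured for the domain. A self-contained C sketch of that control flow (every identifier below is a hypothetical stand-in for illustration, not one of Xen's actual symbols):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real domain/vcpu structures. */
struct domain { bool nested_virt; };
struct vcpu   { struct domain *domain; };

enum { EMUL_OKAY, EMUL_EXCEPTION };

static void inject_ud(struct vcpu *v)
{
    (void)v;
    puts("inject #UD into guest");   /* stand-in for the real injector */
}

static int handle_vmx_instruction(struct vcpu *v)
{
    /* Refuse before touching any nested-VMX state: a guest that was never
     * configured for nested virt sees #UD, exactly as on a CPU without VMX. */
    if ( !v->domain->nested_virt )
    {
        inject_ud(v);
        return EMUL_EXCEPTION;
    }
    /* ... nested-VMX emulation would continue here ... */
    return EMUL_OKAY;
}

int main(void)
{
    struct domain d = { .nested_virt = false };
    struct vcpu v = { .domain = &d };
    return handle_vmx_instruction(&v) == EMUL_EXCEPTION ? 0 : 1;
}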
* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 17:02                     ` Andrew Cooper
@ 2013-11-08  7:50                       ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2013-11-08 7:50 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Ian.Campbell, Jeff_Zimmerman

>>> On 07.11.13 at 18:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> While the patch does look plausible, there is still clearly an issue
> that an HVM guest with nested_virt disabled can even use the VMX
> instructions, rather than getting flat-out #UD exceptions.

The real CR4.VMXE is (of course) set, and basing a decision on the
read shadow would clearly be wrong from an architectural pov (as
then this would no longer be just a read shadow). And this isn't the
problem here anyway - one problem is that the privilege level check
is done _after_ the VMX non-root mode one. I guess they do it that
way in order to allow the VMM maximum flexibility.

Jan

^ permalink raw reply [flat|nested] 32+ messages in thread
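The ordering Jan describes is also what the patch earlier in the thread enforces on the emulation side: run the privilege check before any nested-VMCS state is consulted, so an unprivileged or never-VMXON'd guest is rejected while that state is still uninitialized. A self-contained sketch of that "validate before use" flow (all names are hypothetical stand-ins, not Xen's real functions):

#include <stdbool.h>
#include <stdio.h>

enum { OKAY, EXCEPTION };

/* Stand-ins for the real CPL check and for nested-VMCS validity. */
static bool guest_is_ring0;
static bool vvmcs_mapped;

static int check_privilege(void)
{
    return guest_is_ring0 ? OKAY : EXCEPTION;
}

static int handle_vmlaunch(void)
{
    /* Privilege first: rejecting here means the nested-VMCS pointer below
     * is never dereferenced while uninitialized -- the crash this thread
     * started with. */
    int rc = check_privilege();

    if ( rc != OKAY )
        return rc;

    if ( !vvmcs_mapped )
        return EXCEPTION;   /* an orderly VMfail, not a wild pointer */

    /* ... launch emulation would continue here ... */
    return OKAY;
}

int main(void)
{
    guest_is_ring0 = false;
    vvmcs_mapped = false;
    printf("rc=%d (rejected before any state was touched)\n", handle_vmlaunch());
    return 0;
}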
* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 16:53                   ` Jan Beulich
  2013-11-07 17:02                     ` Andrew Cooper
@ 2013-11-07 18:13                     ` Andrew Cooper
  2013-11-07 18:33                     ` Jeff_Zimmerman
  2 siblings, 0 replies; 32+ messages in thread
From: Andrew Cooper @ 2013-11-07 18:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: lars.kurth.xen, xen-devel, lars.kurth, Jeff_Zimmerman, Ian.Campbell

On 07/11/13 16:53, Jan Beulich wrote:
>>>> On 07.11.13 at 17:02, <Jeff_Zimmerman@McAfee.com> wrote:
>> On Nov 7, 2013, at 7:57 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
>>>> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>>>> I was also wondering about the behaviour of using vmx instructions in a
>>>>> guest despite vmx not being visible in cpuid...
>>>>>
>>>> We have found in our situation this is exactly the case. To verify, we
>>>> wrote some test code that makes vmx calls without checking cpuid. On bare
>>>> hardware the program executes as expected. In a VM on Xen it causes the
>>>> hypervisor to panic.
>>> You trying it doesn't yet imply that Windows also does so.
>>>
>>> Also, you say "program" - are you using these from user mode code?
>> Yes, from Windows, run as a privileged user. Windows XP SP3 can cause the
>> crash. It seems Windows 7 has better security; we cannot crash the system
>> from a Win7 guest.
> Which is sort of odd. Anyway - care to try the attached patch?
>
> Jan
>

I have managed to reproduce the issue, and the patch appears to fix things.

I have to admit to being very surprised that the VMX hardware doesn't
check CR4.VMXE before causing a vmexit.

Reviewed-and-tested-by: Andrew Cooper <andrew.cooper3@citrix.com>

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-07 16:53                   ` Jan Beulich
  2013-11-07 17:02                     ` Andrew Cooper
  2013-11-07 18:13                     ` Andrew Cooper
@ 2013-11-07 18:33                     ` Jeff_Zimmerman
  2 siblings, 0 replies; 32+ messages in thread
From: Jeff_Zimmerman @ 2013-11-07 18:33 UTC (permalink / raw)
  To: JBeulich
  Cc: lars.kurth.xen, andrew.cooper3, lars.kurth, Ian.Campbell, xen-devel

On Nov 7, 2013, at 8:53 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 07.11.13 at 17:02, <Jeff_Zimmerman@McAfee.com> wrote:
>> On Nov 7, 2013, at 7:57 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 07.11.13 at 16:41, <Jeff_Zimmerman@McAfee.com> wrote:
>>>> On Nov 7, 2013, at 1:30 AM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>>>>> I was also wondering about the behaviour of using vmx instructions in a
>>>>> guest despite vmx not being visible in cpuid...
>>>>>
>>>> We have found in our situation this is exactly the case. To verify, we
>>>> wrote some test code that makes vmx calls without checking cpuid. On bare
>>>> hardware the program executes as expected. In a VM on Xen it causes the
>>>> hypervisor to panic.
>>>
>>> You trying it doesn't yet imply that Windows also does so.
>>>
>>> Also, you say "program" - are you using these from user mode code?
>>
>> Yes, from Windows, run as a privileged user. Windows XP SP3 can cause the
>> crash. It seems Windows 7 has better security; we cannot crash the system
>> from a Win7 guest.
>
> Which is sort of odd. Anyway - care to try the attached patch?
>
> Jan
>
> <xsa75.patch>

Just tried your patch. It seems to mitigate the problem.

Thanks!
-jeff

^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <CE9EAEF6.59305%asit.k.mallick@intel.com>]
* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
       [not found] <CE9EAEF6.59305%asit.k.mallick@intel.com>
@ 2013-11-05 22:46 ` Jeff_Zimmerman
  2013-11-05 23:17   ` Mallick, Asit K
  2013-11-06  0:23   ` Andrew Cooper
  0 siblings, 2 replies; 32+ messages in thread
From: Jeff_Zimmerman @ 2013-11-05 22:46 UTC (permalink / raw)
  To: asit.k.mallick; +Cc: xen-devel

[-- Attachment #1.1: Type: text/plain, Size: 7051 bytes --]

Asit,
I've attached two files: one is from dmesg | grep microcode, the second is
the first processor from /proc/cpuinfo.
Jeff

On Nov 5, 2013, at 2:29 PM, "Mallick, Asit K" <asit.k.mallick@intel.com> wrote:

> Jeff,
> Could you check if you have the latest microcode updates installed on this
> system? Or, could you send me the microcode rev and I can check.
>
> Thanks,
> Asit
>
>
> From: "Jeff_Zimmerman@McAfee.com" <Jeff_Zimmerman@McAfee.com>
> Date: Tuesday, November 5, 2013 2:55 PM
> To: "lars.kurth@xen.org" <lars.kurth@xen.org>
> Cc: "lars.kurth.xen@gmail.com" <lars.kurth.xen@gmail.com>,
> "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
> "JBeulich@suse.com" <JBeulich@suse.com>
> Subject: Re: [Xen-devel] Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
>
> Lars,
> I understand the mailing list limits attachment size to 512K. Where can I
> post the xen binary and symbols file?
> Jeff
>
> On Nov 5, 2013, at 7:46 AM, Lars Kurth <lars.kurth@xen.org> wrote:
>
> Jan, Andrew, Ian,
>
> pulling in Jeff who raised the question. Snippets from misc replies
> attached. Jeff, please look through these (in particular Jan's answer)
> and answer any further questions on this thread.
>
> On 05/11/2013 09:53, Ian Campbell wrote:
>> TBH I think for this kind of thing (i.e. a bug not a user question) the
>> most appropriate thing to do would be to redirect them to xen-devel
>> themselves (with a reminder that they do not need to subscribe to post).
> Agreed. Another option is for me to start the thread and pull in the
> raiser of the thread into it, if it is a bug. Was not sure this was a
> real bug at first, but it seems it is.
>
> On 04/11/2013 20:00, Andrew Cooper wrote:
>> Which version of Xen were these images saved on?
> [Jeff] We were careful to regenerate all the images after upgrading to
> 4.3.1. Also saw the same problem on 4.3.0.
>
>> Are you expecting to be using nested-virt? (It is still very definitely
>> experimental)
> [Jeff] Not using nested-virt.
>
> On 05/11/2013 10:04, Jan Beulich wrote:
> On 04.11.13 at 20:54, Lars Kurth <lars.kurth.xen@gmail.com> wrote:
>
> See
> http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-1.html
> ---
> I have a 32 core system running XEN 4.3.1 with 30 Windows XP VM's.
> DOM0 is Centos 6.3 based with linux kernel 3.10.16.
> In my configuration all of the windows HVMs are running having been
> restored from xl save.
> VM's are destroyed or restored in an on-demand fashion. After some time XEN
> will experience a fatal page fault while restoring one of the windows HVM
> subjects. This does not happen very often, perhaps once in a 16 to 48 hour
> period.
> The stack trace from xen follows. Thanks in advance for any help.
>
> (XEN) ----[ Xen-4.3.1 x86_64 debug=n Tainted: C ]----
> (XEN) CPU: 52
> (XEN) RIP: e008:[] domain_page_map_to_mfn+0x86/0xc0
>
> Zapping addresses (here and below in the stack trace) is never
> helpful when someone asks for help with a crash. Also, in order
> to not just guess, the matching xen-syms or xen.efi should be
> made available or pointed to.
>
> (XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
> [...]
> (XEN) Xen call trace:
> (XEN) [] domain_page_map_to_mfn+0x86/0xc0
> (XEN) [] nvmx_handle_vmlaunch+0x49/0x160
> (XEN) [] __update_vcpu_system_time+0x240/0x310
> (XEN) [] vmx_vmexit_handler+0xb58/0x18c0
> (XEN) [] pt_restore_timer+0xa8/0xc0
> (XEN) [] hvm_io_assist+0xef/0x120
> (XEN) [] hvm_do_resume+0x195/0x1c0
> (XEN) [] vmx_do_resume+0x148/0x210
> (XEN) [] context_switch+0x1bc/0xfc0
> (XEN) [] schedule+0x254/0x5f0
> (XEN) [] pt_update_irq+0x256/0x2b0
> (XEN) [] timer_softirq_action+0x168/0x210
> (XEN) [] hvm_vcpu_has_pending_irq+0x50/0xb0
> (XEN) [] nvmx_switch_guest+0x54/0x1560
> (XEN) [] vmx_intr_assist+0x6c/0x490
> (XEN) [] vmx_vmenter_helper+0x88/0x160
> (XEN) [] __do_softirq+0x69/0xa0
> (XEN) [] __do_softirq+0x69/0xa0
> (XEN) [] vmx_asm_do_vmentry+0/0xed
> (XEN)
> (XEN) Pagetable walk from ffff810000000000:
> (XEN) L4[0x102] = 000000211bee5063 ffffffffffffffff
> (XEN) L3[0x000] = 0000000000000000 ffffffffffffffff
> This makes me suspect that domain_page_map_to_mfn() gets a
> NULL pointer passed here. As said above, this is only guesswork
> at this point, and as Ian already pointed out, directing the
> reporter to xen-devel would seem to be the right thing to do
> here anyway.
>
> Jan

[-- Attachment #1.2: Type: text/html, Size: 8799 bytes --]

[-- Attachment #2: microcode.txt --]
[-- Type: text/plain, Size: 956 bytes --]

microcode: CPU0 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU1 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU2 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU3 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU4 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU5 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU6 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU7 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU8 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU9 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU10 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU11 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU12 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU13 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU14 sig=0x206d7, pf=0x40, revision=0x710
microcode: CPU15 sig=0x206d7, pf=0x40, revision=0x710
microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba

[-- Attachment #3: cpuinfo.txt --]
[-- Type: text/plain, Size: 904 bytes --]

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Intel(R) Xeon(R) CPU E5-4640 0 @ 2.40GHz
stepping        : 7
microcode       : 0x710
cpu MHz         : 2394.318
cache size      : 20480 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc pni pclmulqdq est ssse3 cx16 sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm
bogomips        : 4788.63
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-05 22:46 ` Jeff_Zimmerman
@ 2013-11-05 23:17   ` Mallick, Asit K
  2013-11-06  0:23   ` Andrew Cooper
  0 siblings, 0 replies; 32+ messages in thread
From: Mallick, Asit K @ 2013-11-05 23:17 UTC (permalink / raw)
  To: Jeff_Zimmerman@McAfee.com; +Cc: xen-devel@lists.xenproject.org

It is running with the latest microcode revision 0x710.

Thanks,
Asit

From: "Jeff_Zimmerman@McAfee.com" <Jeff_Zimmerman@McAfee.com>
Date: Tuesday, November 5, 2013 3:46 PM
To: "Mallick, Asit K" <asit.k.mallick@intel.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [Xen-devel] Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)

Asit,
I've attached two files: one is from dmesg | grep microcode, the second is
the first processor from /proc/cpuinfo.
Jeff

[...]

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-05 22:46 ` Jeff_Zimmerman
  2013-11-05 23:17   ` Mallick, Asit K
@ 2013-11-06  0:23   ` Andrew Cooper
  2013-11-06 10:05     ` Ian Campbell
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2013-11-06 0:23 UTC (permalink / raw)
  To: Jeff_Zimmerman, asit.k.mallick; +Cc: xen-devel

[-- Attachment #1.1: Type: text/plain, Size: 7667 bytes --]

On 05/11/2013 22:46, Jeff_Zimmerman@McAfee.com wrote:
> Asit,
> I've attached two files: one is from dmesg | grep microcode, the second is
> the first processor from /proc/cpuinfo.
> Jeff
>
> [...]
>
>> (XEN) ----[ Xen-4.3.1 x86_64 debug=n Tainted: C ]----
>> (XEN) CPU: 52
>> (XEN) RIP: e008:[] domain_page_map_to_mfn+0x86/0xc0
>>
>> [...]

As Jan said, the above censoring is almost completely defeating the
purpose of trying to help you.

However, while you are not expecting to be using nested-virt, you
clearly appear to be from the stack trace, so something is clearly up.

Which toolstack are you using for VMs? What is the configuration for
the affected VM?

~Andrew

[-- Attachment #1.2: Type: text/html, Size: 14333 bytes --]

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
  2013-11-06  0:23   ` Andrew Cooper
@ 2013-11-06 10:05     ` Ian Campbell
  0 siblings, 0 replies; 32+ messages in thread
From: Ian Campbell @ 2013-11-06 10:05 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: asit.k.mallick, xen-devel, Jeff_Zimmerman

On Wed, 2013-11-06 at 00:23 +0000, Andrew Cooper wrote:
> Which toolstack are you using for VMs? What is the configuration for
> the affected VM?

And what exact Windows OS? It's not entirely out of the question that a
modern one might try and use VMX for various things if it saw it. And
doesn't McAfee have a Windows product which does things along those
lines? :-)

Ian.

^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads: [~2013-11-08  7:50 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-04 19:54 Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.) Lars Kurth
2013-11-04 20:00 ` Andrew Cooper
2013-11-05 9:53 ` Ian Campbell
2013-11-05 10:04 ` Jan Beulich
2013-11-05 15:46 ` Lars Kurth
2013-11-05 21:55 ` Jeff_Zimmerman
[not found] ` <5E2B3362-4D93-4FEF-987A-E477B0DCEE51@mcafee.com>
2013-11-06 14:09 ` Jan Beulich
2013-11-06 16:05 ` Jeff_Zimmerman
2013-11-06 16:16 ` Jan Beulich
2013-11-06 16:18 ` Ian Campbell
2013-11-06 16:48 ` Jeff_Zimmerman
2013-11-06 16:54 ` Andrew Cooper
2013-11-06 17:06 ` Ian Campbell
2013-11-06 17:07 ` Andrew Cooper
2013-11-07 9:10 ` Jan Beulich
2013-11-07 9:30 ` Ian Campbell
2013-11-07 15:41 ` Jeff_Zimmerman
2013-11-07 15:54 ` Andrew Cooper
2013-11-07 16:00 ` Jan Beulich
2013-11-07 16:06 ` Andrew Cooper
2013-11-07 16:12 ` Jeff_Zimmerman
2013-11-07 15:57 ` Jan Beulich
2013-11-07 16:02 ` Jeff_Zimmerman
2013-11-07 16:53 ` Jan Beulich
2013-11-07 17:02 ` Andrew Cooper
2013-11-08 7:50 ` Jan Beulich
2013-11-07 18:13 ` Andrew Cooper
2013-11-07 18:33 ` Jeff_Zimmerman
[not found] <CE9EAEF6.59305%asit.k.mallick@intel.com>
2013-11-05 22:46 ` Jeff_Zimmerman
2013-11-05 23:17 ` Mallick, Asit K
2013-11-06 0:23 ` Andrew Cooper
2013-11-06 10:05 ` Ian Campbell
This is a public inbox; see mirroring instructions for how to clone and mirror all data and code used for this inbox, as well as URLs for NNTP newsgroup(s).