From: Andrew Cooper
Subject: Re: Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
Date: Wed, 6 Nov 2013 00:23:25 +0000
Message-ID: <52798BFD.3010608@citrix.com>
To: Jeff_Zimmerman@McAfee.com, asit.k.mallick@intel.com
Cc: xen-devel@lists.xenproject.org
List-Id: xen-devel@lists.xenproject.org

On 05/11/2013 22:46, Jeff_Zimmerman@McAfee.com wrote:
Asit,
I've attached two files: one is the output of dmesg | grep microcode, the second is the first processor entry from /proc/cpuinfo
Jeff

On Nov 5, 2013, at 2:29 PM, "Mallick, Asit K" <asit.k.mallick@intel.com> wrote:

> Jeff,
> Could you check if you have the latest microcode updates installed on this system? Or, could you send me the microcode rev and I can check.
>
> Thanks,
> Asit
>
>
> From: "Jeff_Zimmerman@McAfee.com<mailto:Jeff_Zimmerman@McAfee.com>" <Jeff_Zimmerman@McAfee.com<mailto:Jeff_Zimmerman@McAfee.com>>
> Date: Tuesday, November 5, 2013 2:55 PM
> To: "lars.kurth@xen.org<mailto:lars.kurth@xen.org>" <lars.kurth@xen.org<mailto:lars.kurth@xen.org>>
> Cc: "lars.kurth.xen@gmail.com<mailto:lars.kurth.xen@gmail.com>" <lars.kurth.xen@gmail.com<mailto:lars.kurth.xen@gmail.com>>, "xen-devel@lists.xenproject.org<mailto:xen-devel@lists.xenproject.org>" <xen-devel@lists.xenproject.org<mailto:xen-devel@lists.xenproject.org>>, "JBeulich@suse.com<mailto:JBeulich@suse.com>" <JBeulich@suse.com<mailto:JBeulich@suse.com>>
> Subject: Re: [Xen-devel] Intermittent fatal page fault with XEN 4.3.1 (Centos 6.3 DOM0 with linux kernel 3.10.16.)
>
> Lars,
> I understand the mailing list limits attachment size to 512K. Where can I post the xen binary and symbols file?
> Jeff
>
> On Nov 5, 2013, at 7:46 AM, Lars Kurth <lars.kurth@xen.org> wrote:
>
> Jan, Andrew, Ian,
>
> Pulling in Jeff, who raised the question. Snippets from misc replies are attached. Jeff, please look through these (in particular Jan's answer) and answer any further questions on this thread.
>
> On 05/11/2013 09:53, Ian Campbell wrote:
>> TBH I think for this kind of thing (i.e. a bug not a user question) the most appropriate thing to
>> do would be to redirect them to xen-devel themselves (with a reminder that they do not need
>> to subscribe to post).
> Agreed. Another option, if it is a bug, is for me to start the thread and pull the original reporter into it. I was not sure this was a real bug at first, but it seems it is.
>
> On 04/11/2013 20:00, Andrew Cooper wrote:
>> Which version of Xen were these images saved on?
> [Jeff] We were careful to regenerate all the images after upgrading to 4.3.1. We also saw the same problem on 4.3.0.
>
>> Are you expecting to be using nested-virt? (It is still very definitely experimental)
> [Jeff] Not using nested-virt.
>
> On 05/11/2013 10:04, Jan Beulich wrote:
>
> On 04.11.13 at 20:54, Lars Kurth <lars.kurth.xen@gmail.com> wrote:
>
>
> See
> http://xenproject.org/help/questions-and-answers/hypervisor-fatal-page-fault-xen-4-3-1.html
> ---
> I have a 32 core system running XEN 4.3.1 with 30 Windows XP VMs.
> DOM0 is Centos 6.3 based with linux kernel 3.10.16.
> In my configuration all of the Windows HVMs running have been
> restored from xl save.
> VMs are destroyed or restored in an on-demand fashion. After some time XEN
> will experience a fatal page fault while restoring one of the Windows HVM
> guests. This does not happen very often, perhaps once in a 16 to 48 hour
> period.
> The stack trace from xen follows. Thanks in advance for any help.
>
> (XEN) ----[ Xen-4.3.1 x86_64 debug=n Tainted: C ]----
> (XEN) CPU: 52
> (XEN) RIP: e008:[] domain_page_map_to_mfn+0x86/0xc0
>
>
> Zapping addresses (here and below in the stack trace) is never
> helpful when someone asks for help with a crash. Also, in order
> to not just guess, the matching xen-syms or xen.efi should be
> made available or pointed to.
>
>
>
> (XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
> (XEN) rax: 000ffffffffff000 rbx: ffff8300bb163760 rcx: 0000000000000000
> (XEN) rdx: ffff810000000000 rsi: 0000000000000000 rdi: 0000000000000000
> (XEN) rbp: ffff8300bb163000 rsp: ffff8310333e7cd8 r8: 0000000000000000
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
> (XEN) r12: ffff8310333e7f18 r13: 0000000000000000 r14: 0000000000000000
> (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000000426f0
> (XEN) cr3: 000000211bee5000 cr2: ffff810000000000
> (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008
> (XEN) Xen stack trace from rsp=ffff8310333e7cd8:
> (XEN) 0000000000000001 ffff82c4c01de869 ffff82c4c0182c70 ffff8300bb163000
> (XEN) 0000000000000014 ffff8310333e7f18 0000000000000000 ffff82c4c01d7548
> (XEN) ffff8300bb163490 ffff8300bb163000 ffff82c4c01c65b8 ffff8310333e7e60
> (XEN) ffff82c4c01badef ffff8300bb163000 0000000000000003 ffff833144d8e000
> (XEN) ffff82c4c01b4885 ffff8300bb163000 ffff8300bb163000 ffff8300bdff1000
> (XEN) 0000000000000001 ffff82c4c02f2880 ffff82c4c02f2880 ffff82c4c0308440
> (XEN) ffff82c4c01d0ea8 ffff8300bb163000 ffff82c4c015ad6c ffff82c4c02f2880
> (XEN) ffff82c4c02cf800 00000000ffffffff ffff8310333f5060 ffff82c4c02f2880
> (XEN) 0000000000000282 0010000000000000 0000000000000000 0000000000000000
> (XEN) 0000000000000000 ffff82c4c02f2880 ffff8300bdff1000 ffff8300bb163000
> (XEN) 000031a10f2b16ca 0000000000000001 ffff82c4c02f2880 ffff82c4c0308440
> (XEN) ffff82c4c0124444 0000000000000034 ffff8310333f5060 0000000001c9c380
> (XEN) 00000000c0155965 ffff82c4c01c6146 0000000001c9c380 ffffffffffffff00
> (XEN) ffff82c4c0128fa8 ffff8300bb163000 ffff8327d50e9000 ffff82c4c01bc490
> (XEN) 0000000000000000 ffff82c4c01dd254 0000000080549ae0 ffff82c4c01cfc3c
> (XEN) ffff8300bb163000 ffff82c4c01d6128 ffff82c4c0125db9 ffff82c4c0125db9
> (XEN) ffff8310333e0000 ffff8300bb163000 000000000012ffc0 0000000000000000
> (XEN) 0000000000000000 0000000000000000 0000000000000000 ffff82c4c01deaa3
> (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN) 000000000012ffc0 000000007ffdf000 0000000000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN) [] domain_page_map_to_mfn+0x86/0xc0
> (XEN) [] nvmx_handle_vmlaunch+0x49/0x160
> (XEN) [] __update_vcpu_system_time+0x240/0x310
> (XEN) [] vmx_vmexit_handler+0xb58/0x18c0
> (XEN) [] pt_restore_timer+0xa8/0xc0
> (XEN) [] hvm_io_assist+0xef/0x120
> (XEN) [] hvm_do_resume+0x195/0x1c0
> (XEN) [] vmx_do_resume+0x148/0x210
> (XEN) [] context_switch+0x1bc/0xfc0
> (XEN) [] schedule+0x254/0x5f0
> (XEN) [] pt_update_irq+0x256/0x2b0
> (XEN) [] timer_softirq_action+0x168/0x210
> (XEN) [] hvm_vcpu_has_pending_irq+0x50/0xb0
> (XEN) [] nvmx_switch_guest+0x54/0x1560
> (XEN) [] vmx_intr_assist+0x6c/0x490
> (XEN) [] vmx_vmenter_helper+0x88/0x160
> (XEN) [] __do_softirq+0x69/0xa0
> (XEN) [] __do_softirq+0x69/0xa0
> (XEN) [] vmx_asm_do_vmentry+0/0xed
> (XEN)
> (XEN) Pagetable walk from ffff810000000000:
> (XEN) L4[0x102] = 000000211bee5063 ffffffffffffffff
> (XEN) L3[0x000] = 0000000000000000 ffffffffffffffff
>
>
> This makes me suspect that domain_page_map_to_mfn() gets a
> NULL pointer passed here. As said above, this is only guesswork
> at this point, and as Ian already pointed out, directing the
> reporter to xen-devel would seem to be the right thing to do
> here anyway.
>
> Jan
>
>
>


As Jan said, the above censoring is almost completely defeating the purpose of trying to help you.

However, while you are not expecting to be using nested-virt, the stack trace shows that you clearly are, so something is up.
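
If the nested-virt code is being reached with no virtual VMCS ever
having been loaded, the shape of the failure Jan suspects would be
roughly the following. This is a standalone model, not Xen code: the
struct, field and function names are illustrative, and the guard shown
is just the kind of check that would catch a NULL pointer before it
reaches the mapping helper.

/* Standalone model of the suspected failure: a vmlaunch emulation path
 * passes a never-initialised (NULL) virtual-VMCS pointer to a helper
 * that turns a mapped virtual address back into a frame number.  All
 * names here are illustrative, not Xen's. */
#include <stdio.h>
#include <stdint.h>

struct nested_vcpu {
    void *vvmcx;   /* mapped virtual VMCS; NULL until the guest loads one */
};

static uint64_t map_to_mfn(const void *va)
{
    /* Xen's real helper walks pagetable slots derived from va; with
     * va == NULL that walk touches an unmapped slot and faults. */
    return (uint64_t)(uintptr_t)va >> 12;
}

static int handle_vmlaunch(struct nested_vcpu *nv)
{
    if (nv->vvmcx == NULL) {
        /* The guard this model suggests is missing on the crashing
         * path: fail the emulated VMLAUNCH instead of dereferencing
         * a bogus pointer. */
        fprintf(stderr, "VMLAUNCH with no virtual VMCS loaded: VMfail\n");
        return -1;
    }
    printf("vvmcx mfn: %#llx\n", (unsigned long long)map_to_mfn(nv->vvmcx));
    return 0;
}

int main(void)
{
    struct nested_vcpu nv = { .vvmcx = NULL };  /* guest never loaded a VMCS */
    return handle_vmlaunch(&nv) ? 1 : 0;
}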

Which toolstack are you using for the VMs? What is the configuration of the affected VM?
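
Separately, for anyone following the log: the faulting address in cr2
decodes to exactly the walk Xen printed, and the empty L3 entry is
where the walk dies. A minimal standalone sketch of the 4-level index
arithmetic (illustration only, not Xen code):

/* Decode an x86-64 virtual address into its 4-level pagetable indexes,
 * mirroring the "Pagetable walk" lines in the crash log above. */
#include <stdio.h>
#include <stdint.h>

static unsigned int pt_index(uint64_t va, int level)
{
    /* L1 covers address bits 12-20, L2 21-29, L3 30-38, L4 39-47. */
    return (unsigned int)((va >> (12 + 9 * (level - 1))) & 0x1ff);
}

int main(void)
{
    const uint64_t cr2 = 0xffff810000000000ULL;  /* fault address from the log */

    for (int level = 4; level >= 1; --level)
        printf("L%d[0x%03x]\n", level, pt_index(cr2, level));
    /* Prints L4[0x102] then L3[0x000]: the L4 entry exists, the L3
     * entry is empty, matching the two lines Xen managed to print. */
    return 0;
}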

~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel