* Xen crash: map_domain_page() on an NMI path
From: Andrew Cooper @ 2013-12-18 19:37 UTC
To: Jan Beulich; +Cc: Keir Fraser, Tim Deegan, Xen-devel List

Hello,

This is a stack trace caught by automated testing. The server BMC has
indicated that it has genuinely injected an IOCK NMI (which is believed
to be caused by a system erratum we are aware of and trying to work around).

However, the interesting point is the nested crash. This is a failed
assertion while attempting to execute the kexec crash path. Xen is
4.3.1 based, and built with debug, so the stack trace below is generated
with frame pointers, and is correct.

(XEN) Xen call trace:
(XEN) [<ffff82c4c01634ac>] __context_switch+0xb0/0x41e
(XEN) [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
(XEN) [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
(XEN) [<ffff82c4c0166789>] map_domain_page+0x98/0x5c4
(XEN) [<ffff82c4c0153820>] map_vtd_domain_page+0xd/0x1d
(XEN) [<ffff82c4c015139f>] queue_invalidate_context+0x94/0x141
(XEN) [<ffff82c4c0151891>] flush_context_qi+0x55/0x66
(XEN) [<ffff82c4c014d1ed>] iommu_flush_all+0x68/0x12f
(XEN) [<ffff82c4c014f770>] vtd_crash_shutdown+0x15/0x64
(XEN) [<ffff82c4c0149eec>] iommu_crash_shutdown+0x3f/0x4f
(XEN) [<ffff82c4c01a8790>] machine_crash_shutdown+0x273/0x2eb
(XEN) [<ffff82c4c0114af2>] kexec_crash+0x4c/0x70
(XEN) [<ffff82c4c01442f2>] panic+0x12c/0x15b
(XEN) [<ffff82c4c0190815>] fatal_trap+0xb8/0xc6
(XEN) [<ffff82c4c0190f1c>] do_nmi+0xf9/0x180
(XEN) [<ffff82c4c02366fc>] handle_ist_exception+0x92/0xf6
(XEN) [<ffff82c4c0167558>] write_cr3+0x6a/0x83
(XEN) [<ffff82c4c0176b08>] write_ptbase+0x10/0x12
(XEN) [<ffff82c4c016374b>] __context_switch+0x34f/0x41e
(XEN) [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
(XEN) [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
(XEN) [<ffff82c4c012df35>] do_tasklet_work+0x9d/0xeb
(XEN) [<ffff82c4c012e152>] tasklet_softirq_action+0x44/0x92
(XEN) [<ffff82c4c012b4bc>] __do_softirq+0x9f/0xb0
(XEN) [<ffff82c4c012b4e0>] do_softirq+0x13/0x15
(XEN) [<ffff82c4c01628bc>] idle_loop+0x66/0x6c
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'cpumask_empty(n->vcpu_dirty_cpumask)' failed at domain.c:1321
(XEN) ****************************************
(XEN)

Here, we have managed to re-enter the __context_switch() path because of
an NMI interrupting it. The sync_local_execstate() in map_domain_page()
is by way of mapcache_current_vcpu().

I am struggling to work out how best to fix this. Would it be best for
the crash path to unconditionally change to the idle_pagetables and use
mapcache_override_current(NULL)?

~Andrew
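
The re-entrancy described above can be illustrated with a toy model. This is a minimal sketch, not Xen's code: every `toy_*` name is hypothetical, and the ordering it assumes (the incoming vCPU is marked dirty before the switch completes and before the record of the loaded vCPU is updated) is an assumption that is consistent with the trace rather than a quote of domain.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vCPU: 'dirty' models "some CPU still holds my state". */
struct toy_vcpu {
    bool dirty;
};

/* Toy stand-in for per-CPU scheduler state. */
struct toy_cpu {
    struct toy_vcpu *loaded;   /* vCPU whose state is really on the CPU */
    struct toy_vcpu *current;  /* vCPU the scheduler considers current  */
};

/*
 * Hypothetical model of a lazy state switch. Returns false at the point
 * where the real code would fail its assertion. If 'nmi_midway' is set,
 * a crash path runs in the middle of the switch and tries to switch
 * again, which is what the call trace suggests happened.
 */
static bool toy_context_switch(struct toy_cpu *cpu, bool nmi_midway)
{
    assert(cpu != NULL && cpu->loaded != NULL && cpu->current != NULL);

    struct toy_vcpu *p = cpu->loaded;
    struct toy_vcpu *n = cpu->current;

    if (n->dirty)
        return false;          /* models the failed cpumask_empty() check */

    n->dirty = true;           /* n is now marked as being loaded */

    if (nmi_midway && !toy_context_switch(cpu, false))
        return false;          /* nested switch saw the half-done state */

    p->dirty = false;
    cpu->loaded = n;           /* only now is the switch complete */
    return true;
}

/* Models the lazy sync: switch only if the loaded state is stale. */
static bool toy_sync_local_execstate(struct toy_cpu *cpu, bool nmi_midway)
{
    assert(cpu != NULL);
    if (cpu->loaded == cpu->current)
        return true;
    return toy_context_switch(cpu, nmi_midway);
}
```

In this model an uninterrupted sync succeeds, while a sync that is interrupted after the dirty flag is set but before `loaded` is updated makes the nested attempt fail, because the nested call still sees stale state that it believes needs switching.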
* Re: Xen crash: map_domain_page() on an NMI path
From: Tim Deegan @ 2013-12-19 11:00 UTC
To: Andrew Cooper; +Cc: Keir Fraser, Jan Beulich, Xen-devel List

At 19:37 +0000 on 18 Dec (1387391848), Andrew Cooper wrote:
> Hello,
>
> This is a stack trace caught by automated testing. The server BMC has
> indicated that it has genuinely injected an IOCK NMI (which is believed
> to be caused by a system erratum we are aware of and trying to work around)
>
> However, the interesting point is the nested crash. This is a failed
> assertion while attempting to execute the kexec crash path. Xen is
> 4.3.1 based, and built with debug, so the stack trace below is generated
> with frame pointers, and is correct.
>
> ...
>
> Here, we have managed to re-enter the __context_switch() path because of
> an NMI interrupting it. The sync_local_execstate() in map_domain_page()
> is by way of mapcache_current_vcpu().
>
> I am struggling to work out how best to fix this. Would it be best for
> the crash path to unconditionally change to the idle_pagetables and use
> mapcache_override_current(NULL)?

I think it would be best for the iommu_crash_shutdown() path to be
made crash-safe -- after all, that code takes spinlocks too.
Presumably we can do something a bit ruder in crash code, like just
turn the IOMMUs off entirely?

Or are there other map_domain_page() ops on the crash path? Does
kexec need it?

Tim.
* Re: Xen crash: map_domain_page() on an NMI path
From: Andrew Cooper @ 2013-12-19 12:11 UTC
To: Tim Deegan; +Cc: Keir Fraser, Jan Beulich, Xen-devel List

On 19/12/13 11:00, Tim Deegan wrote:
> At 19:37 +0000 on 18 Dec (1387391848), Andrew Cooper wrote:
>> ...
>>
>> Here, we have managed to re-enter the __context_switch() path because of
>> an NMI interrupting it. The sync_local_execstate() in map_domain_page()
>> is by way of mapcache_current_vcpu().
>>
>> I am struggling to work out how best to fix this. Would it be best for
>> the crash path to unconditionally change to the idle_pagetables and use
>> mapcache_override_current(NULL)?
> I think it would be best for the iommu_crash_shutdown() path to be
> made crash-safe -- after all, that code takes spinlocks too.
> Presumably we can do something a bit ruder in crash code, like just
> turn the IOMMUs off entirely?
>
> Or are there other map_domain_page() ops on the crash path? Does
> kexec need it?
>
> Tim.

I don't believe we can safely just disable the IOMMU without tearing it
down in a sensible fashion.

Having said that, we certainly should try and make the crash path as
"crash safe" as possible. I don't think it is reasonable to prevent the
use of map_domain_page() on codepaths in the crash path (as being too
invasive), but the mapcache_override_current(NULL) is an override which
prevents any playing with the pagetables, with the caveat that
mfn_to_virt(mfn) needs to work for all mfn's in the current set of
pagetables.

~Andrew
* Re: Xen crash: map_domain_page() on an NMI path
From: Jan Beulich @ 2013-12-19 14:55 UTC
To: andrew.cooper3; +Cc: tim, keir, xen-devel

>>> Andrew Cooper <andrew.cooper3@citrix.com> 12/18/13 8:37 PM >>>
>However, the interesting point is the nested crash. This is a failed
>assertion while attempting to execute the kexec crash path. Xen is
>4.3.1 based, and built with debug, so the stack trace below is generated
>with frame pointers, and is correct.
>...
>Here, we have managed to re-enter the __context_switch() path because of
>an NMI interrupting it. The sync_local_execstate() in map_domain_page()
>is by way of mapcache_current_vcpu().
>
>I am struggling to work out how best to fix this. Would it be best for
>the crash path to unconditionally change to the idle_pagetables and use
>mapcache_override_current(NULL)?

It is wrong to call map_domain_page() at all in NMI context, so fixing the
environment for it to get invoked is not the right solution in any case (or
else you'd have to fix more than this, like forcibly making the spin lock
available that the code will want to acquire subsequently).

As with anything else on the NMI path, and as you probably know better
than me - great care is needed in every piece of code that may get
invoked here. I don't think we want to make map_domain_page() usable
in NMI context; instead I would think map_vtd_domain_page() might
better learn of possibly getting called this way, and use fixmaps in that
case (assuming we can determine a reasonably low maximum number of
pages that may need to be mapped this way at any one time - ideally that
would turn out to be just one).

Jan
* Re: Xen crash: map_domain_page() on an NMI path
From: Andrew Cooper @ 2013-12-19 16:19 UTC
To: Jan Beulich; +Cc: keir, tim, xen-devel

On 19/12/13 14:55, Jan Beulich wrote:
>>>> Andrew Cooper <andrew.cooper3@citrix.com> 12/18/13 8:37 PM >>>
>> However, the interesting point is the nested crash. This is a failed
>> assertion while attempting to execute the kexec crash path. Xen is
>> 4.3.1 based, and built with debug, so the stack trace below is generated
>> with frame pointers, and is correct.
>> ...
>> Here, we have managed to re-enter the __context_switch() path because of
>> an NMI interrupting it. The sync_local_execstate() in map_domain_page()
>> is by way of mapcache_current_vcpu().
>>
>> I am struggling to work out how best to fix this. Would it be best for
>> the crash path to unconditionally change to the idle_pagetables and use
>> mapcache_override_current(NULL)?
> It is wrong to call map_domain_page() at all in NMI context, so fixing the
> environment for it to get invoked is not the right solution in any case (or
> else you'd have to fix more than this, like forcibly making the spin lock
> available that the code will want to acquire subsequently).
>
> As with anything else on the NMI path, and as you probably know better
> than me - great care is needed in every piece of code that may get
> invoked here. I don't think we want to make map_domain_page() usable
> in NMI context; instead I would think map_vtd_domain_page() might
> better learn of possibly getting called this way, and use fixmaps in that
> case (assuming we can determine a reasonably low maximum number of
> pages that may need to be mapped this way at any one time - ideally that
> would turn out to be just one).
>
> Jan

I would certainly agree that anything complicated in NMI context
(map_domain_page() included) must be completely avoided. However, once
we have started crashing, I would argue we have moved to a new context
(a 'crashing' context perhaps?), which must be able to function
correctly, no matter which context it originated from (be it regular,
interrupt, NMI or MCE contexts). The advantage of the crashing context
is that we can safely trash anything we need to, in order to execute
purgatory with sensible state.

This particular use of map_vtd_domain_page() is for the qinval control
region, which looks to be up to 8 contiguous frames, per IOMMU. There
are another 8 frames per IOMMU for the intremap control region (and
other control regions as well). I don't think it is practical to put
all of this in the fixmap.

However, for hardware pieces like this which are set up once at the
start of day, and have the hardware pointed at a chosen region, would it
be acceptable to allocate their frames low enough to be covered by the
direct map area (protected by BUG()s?) and set up their base virtual
addresses knowing that there will always be a valid mapping from any Xen
pagetables? This seems better than constantly playing around with the
mappings.

For the 'crashing' context, I was thinking of extending enum
system_state to include "panic" and "crash_single" states, where
crash_single implies "safe to not actually lock a spinlock". Integrating
this without an adverse effect on spinlock performance might be a little
tricky however.

~Andrew
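
The "crash_single" idea can be sketched as follows. This is a minimal, hypothetical model using C11 atomics rather than Xen's spinlock implementation; the enum values, `toy_state` and the `toy_spin_*` helpers are invented for illustration and only show the control flow of skipping a lock once a single CPU is left running.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical extension of a system-state enum, as proposed in the mail. */
enum toy_system_state {
    TOY_STATE_ACTIVE,
    TOY_STATE_PANIC,         /* crash started, other CPUs may still run    */
    TOY_STATE_CRASH_SINGLE   /* only the crashing CPU is still executing   */
};

static enum toy_system_state toy_state = TOY_STATE_ACTIVE;

struct toy_spinlock {
    atomic_flag held;
};

/* Returns true if the lock was really taken, false if it was skipped. */
static bool toy_spin_lock(struct toy_spinlock *lock)
{
    assert(lock != NULL);
    if (toy_state == TOY_STATE_CRASH_SINGLE)
        return false;        /* the holder, if any, will never run again */
    while (atomic_flag_test_and_set_explicit(&lock->held,
                                             memory_order_acquire))
        ;                    /* spin until the flag is released */
    return true;
}

/* Releases the lock only if toy_spin_lock() actually took it. */
static void toy_spin_unlock(struct toy_spinlock *lock, bool taken)
{
    assert(lock != NULL);
    if (taken)
        atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

The state check sits on every acquisition, which is where the performance concern mentioned above would come from. The sketch also says nothing about whether the data behind a skipped lock is in a consistent state.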
* Re: Xen crash: map_domain_page() on an NMI path
From: Jan Beulich @ 2013-12-20 8:43 UTC
To: Andrew Cooper; +Cc: tim, keir, xen-devel

>>> On 19.12.13 at 17:19, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> However, for hardware pieces like this which are set up once at the
> start of day, and have the hardware pointed at a chosen region, would it
> be acceptable to allocate their frames low enough to be covered by the
> direct map area (protected by BUG()s?) and set up their base virtual
> addresses knowing that there will always be a valid mapping from any Xen
> pagetables? This seems better than constantly playing around with the
> mappings.

That would still require further special casing in map_domain_page().

In the case here, and with 32-bit no longer a concern, a virtual
mapping should rather be obtained at boot time once and for all
using vmap().

> For the 'crashing' context, I was thinking of extending enum
> system_state to include "panic" and "crash_single" states, where
> crash_single implies "safe to not actually lock a spinlock" Integrating
> this without an adverse effect on spinlock performance might be a little
> tricky however.

It's not just performance - depending on what a particular lock
protects, just ignoring the need to take the lock may end up
using inconsistent state.

Jan
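
The "map once at boot" suggestion can be sketched like this. It is a hypothetical model only: `toy_map_region()` merely stands in for a boot-time mapping call such as vmap() and does not reflect its real signature, and the frame count is taken from the "up to 8 contiguous frames" figure earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE     4096u
#define TOY_QINVAL_FRAMES 8u    /* "up to 8 contiguous frames" per the thread */

/* Stand-in for the frames backing one control region. */
static uint8_t toy_frames[TOY_QINVAL_FRAMES * TOY_PAGE_SIZE];

/* Counts how often the (hypothetical) mapping primitive is invoked. */
static unsigned int toy_map_calls;

/* Stand-in for a boot-time mapping call; NOT the real vmap() interface. */
static void *toy_map_region(void)
{
    toy_map_calls++;
    return toy_frames;
}

struct toy_iommu {
    volatile uint8_t *qinval_va;  /* mapped once, kept until shutdown */
};

/* Boot path: free to allocate, take locks and edit page tables. */
static void toy_iommu_init(struct toy_iommu *iommu)
{
    assert(iommu != NULL);
    iommu->qinval_va = toy_map_region();
    assert(iommu->qinval_va != NULL);
}

/* Crash path: only dereferences the cached pointer, never maps. */
static void toy_crash_queue_descriptor(struct toy_iommu *iommu,
                                       size_t offset, uint8_t value)
{
    assert(iommu != NULL && offset < sizeof(toy_frames));
    iommu->qinval_va[offset] = value;
}
```

The point of the design is that the crash-time function has no dependency on the mapping machinery at all, so it cannot re-enter it from NMI context.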
* Re: Xen crash: map_domain_page() on an NMI path
From: Kai Huang @ 2013-12-27 7:29 UTC
To: Jan Beulich; +Cc: Andrew Cooper, keir, tim, xen-devel

On Fri, Dec 20, 2013 at 4:43 PM, Jan Beulich <JBeulich@suse.com> wrote:
> >>> On 19.12.13 at 17:19, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> > However, for hardware pieces like this which are set up once at the
> > start of day, and have the hardware pointed at a chosen region, would it
> > be acceptable to allocate their frames low enough to be covered by the
> > direct map area (protected by BUG()s?) and set up their base virtual
> > addresses knowing that there will always be a valid mapping from any Xen
> > pagetables? This seems better than constantly playing around with the
> > mappings.
>
> That would still require further special casing in map_domain_page().
>
> In the case here, and with 32-bit no longer a concern, a virtual
> mapping should rather be obtained at boot time once and for all
> using vmap().
>

A question about map_domain_page. If I understand correctly, currently
map_domain_page will still do page table setup with virtual address in
mapcache area. Why can't we just map all physical memory to XEN's virtual
address slot, and do mfn_to_virt to get the virtual address?

-Kai

> > For the 'crashing' context, I was thinking of extending enum
> > system_state to include "panic" and "crash_single" states, where
> > crash_single implies "safe to not actually lock a spinlock" Integrating
> > this without an adverse effect on spinlock performance might be a little
> > tricky however.
>
> It's not just performance - depending what a particular lock
> protects, just ignoring the need to take the lock may end up
> using inconsistent state.
>
> Jan
* Re: Xen crash: map_domain_page() on an NMI path
From: Andrew Cooper @ 2013-12-27 11:38 UTC
To: Kai Huang, Jan Beulich; +Cc: keir, tim, xen-devel

On 27/12/2013 07:29, Kai Huang wrote:
> On Fri, Dec 20, 2013 at 4:43 PM, Jan Beulich <JBeulich@suse.com> wrote:
>
>     >>> On 19.12.13 at 17:19, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>     > However, for hardware pieces like this which are set up once at the
>     > start of day, and have the hardware pointed at a chosen region, would it
>     > be acceptable to allocate their frames low enough to be covered by the
>     > direct map area (protected by BUG()s?) and set up their base virtual
>     > addresses knowing that there will always be a valid mapping from any Xen
>     > pagetables? This seems better than constantly playing around with the
>     > mappings.
>
>     That would still require further special casing in map_domain_page().
>
>     In the case here, and with 32-bit no longer a concern, a virtual
>     mapping should rather be obtained at boot time once and for all
>     using vmap().
>
> A question about map_domain_page. If I understand correctly, currently
> map_domain_page will still do page table setup with virtual address in
> mapcache area. Why can't we just map all physical memory to XEN's
> virtual address slot, and do mfn_to_virt to get the virtual address?
>
> -Kai

Xen hands most of the upper canonical half to 64bit PV guest kernels.
The first 5TiB of RAM is unconditionally available via mfn_to_virt, but
Xen supports up to 16TiB of RAM. Therefore, a server with more than 5TiB
of RAM, or with RAM hotplug regions above the 5TiB boundary, requires
domain mappings in order to access that memory.

~Andrew
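
The 5TiB and 16TiB figures above translate into a simple frame-number check. This is a minimal sketch under stated assumptions: 4KiB pages, a direct map that covers frames starting from MFN 0, and hypothetical `TOY_*` and `toy_*` names that are not taken from the Xen source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT      12                    /* 4 KiB pages (assumed)      */
#define TOY_DIRECTMAP_BYTES (UINT64_C(5) << 40)   /* 5 TiB, figure from thread  */
#define TOY_MAX_RAM_BYTES   (UINT64_C(16) << 40)  /* 16 TiB, figure from thread */

/* Number of page frames reachable through the always-present mapping. */
static uint64_t toy_directmap_frames(void)
{
    return TOY_DIRECTMAP_BYTES >> TOY_PAGE_SHIFT;
}

/* True if a frame lies beyond the direct map and needs a domain mapping. */
static bool toy_mfn_needs_domain_mapping(uint64_t mfn)
{
    assert(mfn < (TOY_MAX_RAM_BYTES >> TOY_PAGE_SHIFT));
    return mfn >= toy_directmap_frames();
}
```

With these assumptions the direct map covers 5 * 2^28 = 1342177280 frames, so any frame number at or above that value would have to go through a temporary mapping instead of a plain address calculation.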