* Xen crash: map_domain_page() on an NMI path
@ 2013-12-18 19:37 Andrew Cooper
  2013-12-19 11:00 ` Tim Deegan
  2013-12-19 14:55 ` Jan Beulich
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Cooper @ 2013-12-18 19:37 UTC
  To: Jan Beulich; +Cc: Keir Fraser, Tim Deegan, Xen-devel List

Hello,

This is a stack trace caught by automated testing.  The server BMC has
indicated that it genuinely injected an IOCK NMI, which we believe is
caused by a system erratum we are aware of and are trying to work around.

However, the interesting point is the nested crash.  This is a failed
assertion while attempting to execute the kexec crash path.  Xen is
4.3.1-based and is a debug build, so the stack trace below was generated
using frame pointers and is accurate.

(XEN) Xen call trace:
(XEN)    [<ffff82c4c01634ac>] __context_switch+0xb0/0x41e
(XEN)    [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
(XEN)    [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
(XEN)    [<ffff82c4c0166789>] map_domain_page+0x98/0x5c4
(XEN)    [<ffff82c4c0153820>] map_vtd_domain_page+0xd/0x1d
(XEN)    [<ffff82c4c015139f>] queue_invalidate_context+0x94/0x141
(XEN)    [<ffff82c4c0151891>] flush_context_qi+0x55/0x66
(XEN)    [<ffff82c4c014d1ed>] iommu_flush_all+0x68/0x12f
(XEN)    [<ffff82c4c014f770>] vtd_crash_shutdown+0x15/0x64
(XEN)    [<ffff82c4c0149eec>] iommu_crash_shutdown+0x3f/0x4f
(XEN)    [<ffff82c4c01a8790>] machine_crash_shutdown+0x273/0x2eb
(XEN)    [<ffff82c4c0114af2>] kexec_crash+0x4c/0x70
(XEN)    [<ffff82c4c01442f2>] panic+0x12c/0x15b
(XEN)    [<ffff82c4c0190815>] fatal_trap+0xb8/0xc6
(XEN)    [<ffff82c4c0190f1c>] do_nmi+0xf9/0x180
(XEN)    [<ffff82c4c02366fc>] handle_ist_exception+0x92/0xf6
(XEN)    [<ffff82c4c0167558>] write_cr3+0x6a/0x83
(XEN)    [<ffff82c4c0176b08>] write_ptbase+0x10/0x12
(XEN)    [<ffff82c4c016374b>] __context_switch+0x34f/0x41e
(XEN)    [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
(XEN)    [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
(XEN)    [<ffff82c4c012df35>] do_tasklet_work+0x9d/0xeb
(XEN)    [<ffff82c4c012e152>] tasklet_softirq_action+0x44/0x92
(XEN)    [<ffff82c4c012b4bc>] __do_softirq+0x9f/0xb0
(XEN)    [<ffff82c4c012b4e0>] do_softirq+0x13/0x15
(XEN)    [<ffff82c4c01628bc>] idle_loop+0x66/0x6c
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'cpumask_empty(n->vcpu_dirty_cpumask)' failed at domain.c:1321
(XEN) ****************************************
(XEN)

Here, we have managed to re-enter the __context_switch() path because an
NMI interrupted it partway through (at write_cr3() in the trace above).
map_domain_page() reaches sync_local_execstate() by way of
mapcache_current_vcpu().
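
As a rough illustration of the re-entry, here is a toy model in C. It is
not Xen code: every toy_* name is hypothetical. It also assumes, without
checking the Xen source, that the switch marks the incoming vcpu dirty
before the interruption point and only records the new state owner at the
very end. Under those assumptions, a second lazy sync issued from the NMI
handler sees "switch still pending" and trips the entry check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are not Xen's real types or functions. */
struct toy_vcpu {
    unsigned dirty_cpumask;     /* bit 0 set: state is live on CPU 0 */
};

struct toy_cpu {
    struct toy_vcpu *curr_vcpu; /* vcpu whose state is lazily held on the CPU */
    struct toy_vcpu *current;   /* vcpu the scheduler believes is running */
    int failed_asserts;         /* counts violated entry checks, no abort */
};

typedef void (*toy_nmi_hook)(struct toy_cpu *);

/* Non-reentrant switch from the lazily held vcpu (p) to current (n). */
static void toy_context_switch(struct toy_cpu *cpu, toy_nmi_hook nmi)
{
    struct toy_vcpu *p = cpu->curr_vcpu;
    struct toy_vcpu *n = cpu->current;

    /* Models ASSERT(cpumask_empty(n->vcpu_dirty_cpumask)). */
    if (n->dirty_cpumask != 0)
        cpu->failed_asserts++;

    n->dirty_cpumask |= 1u;     /* n is now dirty on this CPU */

    if (nmi != NULL)            /* the NMI lands here, mid-switch */
        nmi(cpu);

    p->dirty_cpumask &= ~1u;    /* p's state is no longer held here */
    cpu->curr_vcpu = n;         /* only now is the switch recorded as done */
}

/* Lazy sync: complete a pending switch, if there is one. */
static void toy_sync_local_execstate(struct toy_cpu *cpu, toy_nmi_hook nmi)
{
    if (cpu->curr_vcpu != cpu->current)
        toy_context_switch(cpu, nmi);
}

/* Models the crash path, which ends up requesting a lazy sync again. */
static void toy_nmi_crash_path(struct toy_cpu *cpu)
{
    toy_sync_local_execstate(cpu, NULL);
}

/* Run one idle-loop sync, optionally with an NMI arriving mid-switch. */
int toy_run(bool with_nmi)
{
    struct toy_vcpu guest = { 1u };  /* guest state lazily held on CPU 0 */
    struct toy_vcpu idle  = { 0u };
    struct toy_cpu cpu = { &guest, &idle, 0 };

    toy_sync_local_execstate(&cpu, with_nmi ? toy_nmi_crash_path : NULL);
    return cpu.failed_asserts;
}
```

In this model toy_run(false) reports no violated checks, while
toy_run(true) reports one, raised by the nested call made from the NMI hook.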

I am struggling to work out how best to fix this.  Would it be best for
the crash path to unconditionally change to the idle_pagetables and use
mapcache_override_current(NULL)?
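
The shape of that idea can be sketched with another toy model in C. Again
this is not Xen code and all toy_* names are hypothetical; in particular
the crash_override flag only stands in for whatever effect
mapcache_override_current() would have, which is not verified here. The
point is just that a crash path which first claims a known page-table
state never needs to enter the non-reentrant lazy sync.

```c
#include <stdbool.h>

/* Toy per-CPU state; hypothetical, not Xen's real bookkeeping. */
struct toy_crash_cpu {
    bool lazy_switch_pending;   /* a lazy switch is outstanding or half done */
    bool on_idle_pagetables;    /* CPU is running on the idle page tables */
    bool crash_override;        /* hypothetical: crash path bypasses lazy sync */
    int  sync_calls;            /* times the non-reentrant sync was entered */
};

/* Stand-in for the lazy sync that is unsafe to re-enter from an NMI. */
static void toy_crash_sync(struct toy_crash_cpu *s)
{
    s->sync_calls++;
    s->lazy_switch_pending = false;
    s->on_idle_pagetables = true;
}

/* Stand-in for a mapping helper that normally syncs lazy state first. */
static void toy_map_page(struct toy_crash_cpu *s)
{
    if (s->crash_override)
        return;                 /* crash path: skip the lazy sync entirely */
    if (s->lazy_switch_pending)
        toy_crash_sync(s);
}

/* Models "unconditionally change to the idle page tables" plus override. */
static void toy_crash_prepare(struct toy_crash_cpu *s)
{
    s->on_idle_pagetables = true;
    s->crash_override = true;
}

/* Returns how often the lazy sync ran while mapping on the crash path. */
int toy_crash_path_sync_calls(bool prepare)
{
    struct toy_crash_cpu s = { true, false, false, 0 };

    if (prepare)
        toy_crash_prepare(&s);
    toy_map_page(&s);
    return s.sync_calls;
}
```

Without the preparation step the mapping helper enters the sync once; with
it, the sync is never entered.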

~Andrew

