From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Tim Deegan <tim@xen.org>
Cc: Keir Fraser <keir@xen.org>, Jan Beulich <JBeulich@suse.com>,
Xen-devel List <xen-devel@lists.xen.org>
Subject: Re: Xen crash: map_domain_page() on an NMI path
Date: Thu, 19 Dec 2013 12:11:24 +0000
Message-ID: <52B2E26C.2000402@citrix.com>
In-Reply-To: <20131219110036.GB20571@deinos.phlegethon.org>
On 19/12/13 11:00, Tim Deegan wrote:
> At 19:37 +0000 on 18 Dec (1387391848), Andrew Cooper wrote:
>> Hello,
>>
>> This is a stack trace caught by automated testing. The server BMC has
>> indicated that it has genuinely injected an IOCK NMI (which is believed
>> to be caused by a system erratum we are aware of and trying to work around)
>>
>> However, the interesting point is the nested crash. This is a failed
>> assertion while attempting to execute the kexec crash path. Xen is
>> 4.3.1 based, and built with debug, so the stack trace below is generated
>> with frame pointers, and is correct.
>>
>> (XEN) Xen call trace:
>> (XEN) [<ffff82c4c01634ac>] __context_switch+0xb0/0x41e
>> (XEN) [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
>> (XEN) [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
>> (XEN) [<ffff82c4c0166789>] map_domain_page+0x98/0x5c4
>> (XEN) [<ffff82c4c0153820>] map_vtd_domain_page+0xd/0x1d
>> (XEN) [<ffff82c4c015139f>] queue_invalidate_context+0x94/0x141
>> (XEN) [<ffff82c4c0151891>] flush_context_qi+0x55/0x66
>> (XEN) [<ffff82c4c014d1ed>] iommu_flush_all+0x68/0x12f
>> (XEN) [<ffff82c4c014f770>] vtd_crash_shutdown+0x15/0x64
>> (XEN) [<ffff82c4c0149eec>] iommu_crash_shutdown+0x3f/0x4f
>> (XEN) [<ffff82c4c01a8790>] machine_crash_shutdown+0x273/0x2eb
>> (XEN) [<ffff82c4c0114af2>] kexec_crash+0x4c/0x70
>> (XEN) [<ffff82c4c01442f2>] panic+0x12c/0x15b
>> (XEN) [<ffff82c4c0190815>] fatal_trap+0xb8/0xc6
>> (XEN) [<ffff82c4c0190f1c>] do_nmi+0xf9/0x180
>> (XEN) [<ffff82c4c02366fc>] handle_ist_exception+0x92/0xf6
>> (XEN) [<ffff82c4c0167558>] write_cr3+0x6a/0x83
>> (XEN) [<ffff82c4c0176b08>] write_ptbase+0x10/0x12
>> (XEN) [<ffff82c4c016374b>] __context_switch+0x34f/0x41e
>> (XEN) [<ffff82c4c016388d>] __sync_local_execstate+0x73/0x83
>> (XEN) [<ffff82c4c01638a6>] sync_local_execstate+0x9/0xb
>> (XEN) [<ffff82c4c012df35>] do_tasklet_work+0x9d/0xeb
>> (XEN) [<ffff82c4c012e152>] tasklet_softirq_action+0x44/0x92
>> (XEN) [<ffff82c4c012b4bc>] __do_softirq+0x9f/0xb0
>> (XEN) [<ffff82c4c012b4e0>] do_softirq+0x13/0x15
>> (XEN) [<ffff82c4c01628bc>] idle_loop+0x66/0x6c
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion 'cpumask_empty(n->vcpu_dirty_cpumask)' failed at
>> domain.c:1321
>> (XEN) ****************************************
>> (XEN)
>>
>> Here, we have managed to re-enter the __context_switch() path because of
>> an NMI interrupting it. The sync_local_execstate() in map_domain_page()
>> is by way of mapcache_current_vcpu().
>>
>> I am struggling to work out how best to fix this. Would it be best for
>> the crash path to unconditionally change to the idle_pagetables and use
>> mapcache_override_current(NULL)?
> I think it would be best for the iommu_crash_shutdown() path to be
> made crash-safe -- after all, that code takes spinlocks too.
> Presumably we can do something a bit ruder in crash code, like just
> turn the IOMMUs off entirely?
>
> Or are there other map_domain_page() ops on the crash path? Does
> kexec need it?
>
> Tim.
I don't believe we can safely just disable the IOMMU without tearing it
down in a sensible fashion.
Having said that, we certainly should try to make the crash path as
"crash safe" as possible.
I don't think it is reasonable to forbid map_domain_page() on codepaths
reachable from the crash path (that would be too invasive). However,
mapcache_override_current(NULL) is an override which prevents any
playing with the pagetables, with the caveat that mfn_to_virt(mfn) needs
to work for every MFN in the current set of pagetables.
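To illustrate the shape of the idea, here is a rough standalone toy
model. It is not Xen code: every toy_* name is made up for this sketch,
the override is reduced to a plain boolean, and the "direct map" is a
static array. It assumes the override behaves as described above, i.e.
that once it is set, the mapping helper no longer goes near
sync_local_execstate() and simply hands back a direct-map address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; none of these names exist in Xen. */
#define TOY_NR_FRAMES  4
#define TOY_FRAME_SIZE 16

/* Stands in for a direct map in which mfn_to_virt() always works. */
static uint8_t toy_directmap[TOY_NR_FRAMES][TOY_FRAME_SIZE];

/* True while the interrupted code is midway through a context switch. */
static bool toy_in_context_switch;
/* True once the crash path has asked for the override. */
static bool toy_override_active;
/* Counts unsafe re-entries into the (modelled) context switch path. */
static unsigned int toy_reentry_count;

/* Models sync_local_execstate(): unsafe if a switch is already in flight. */
static void toy_sync_local_execstate(void)
{
    if (toy_in_context_switch)
        toy_reentry_count++;
}

/* Models setting or clearing the mapcache override. */
static void toy_mapcache_override(bool on)
{
    toy_override_active = on;
}

/* Models map_domain_page(): only syncs state when no override is set. */
static void *toy_map_domain_page(unsigned long mfn)
{
    if (mfn >= TOY_NR_FRAMES)
        return NULL;               /* caveat: frame must be in the direct map */
    if (!toy_override_active)
        toy_sync_local_execstate();
    return toy_directmap[mfn];
}

/* Models a crash-path user such as the IOMMU shutdown code. */
static void toy_crash_shutdown(void)
{
    uint8_t *p;

    toy_mapcache_override(true);   /* set once, before any mapping */
    p = toy_map_domain_page(1);
    if (p != NULL)
        p[0] = 0xff;               /* pretend to queue an invalidation */
}
```

With toy_in_context_switch set, a plain toy_map_domain_page() call bumps
toy_reentry_count, whereas going through toy_crash_shutdown() leaves it
untouched. The NULL return for an out-of-range frame is the toy
equivalent of the mfn_to_virt() caveat.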
~Andrew