From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Jan Beulich <JBeulich@suse.com>, Tim Deegan <tim@xen.org>,
Keir Fraser <keir@xen.org>
Cc: Xen-devel List <xen-devel@lists.xen.org>
Subject: HPET stack overflow, and general problems with do_IRQ()
Date: Thu, 15 Aug 2013 21:21:52 +0100
Message-ID: <520D3860.5010109@citrix.com>
Hello,
I have finally managed to get a full stack dump from affected hardware.
The logs can be found here (including hypervisor with debugging symbols):
http://xenbits.xen.org/people/andrewcoop/hpet-overflow-full-stackdump.tar.gz
The interesting log file is xen.pcpu0.stack.log.
By my count (grepping for e008 as CS), there are 8 exception frames
on the Xen stack (all on stack page 6).
However, because of the early ack() at the LAPIC and the re-enabling of
interrupts, the vectors (in order of interrupts arriving) are:
c1, 99, b1, b9, a9, a1, 91, 89
These 8 interrupts take a little more than half of the available stack,
while the bottom half of the stack appears to be a vmentry which failed
because of a HAP pagefault.
One "solution" to the problem would be to extend the Xen
PRIMARY_STACK_SIZE to 3 pages rather than 2, but that is hardly a good
thing to do.
I think that the fundamental problem is the early ack and re-enabling of
interrupts. We have servers where 150 VMs using PCI passthrough are
starting to run out of available entries in the IDTs. While unlikely,
it would be possible to encounter a situation with 40 nested interrupts,
at which point there is a real danger of trashing the compat sysenter
trampoline, located at the base of stack page 3, and only a few more
would be needed before MCEs and NMIs end up walking over the main stack.
While I hate to suggest this, the only sensible solution without edge
cases is to never enable interrupts in do_IRQ(). I suppose a slightly
less extreme solution could be to promote the TPR to 0xe0 and re-enable
interrupts, so high-priority processing can still occur?
Thoughts/comments?
Unfortunately, I am out of the office now until Monday 26th, with
limited internet access during that time (although I will still have
internet access tomorrow morning). I will check emails when I can, but I
don't expect to be able to contribute to the above discussion in a
timely manner during this time.
~Andrew