From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Michael A Fetterman <Michael.Fetterman@cl.cam.ac.uk>,
    Tim Deegan <Tim.Deegan@xensource.com>,
    Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
Cc: Chris Wright <chrisw@sous-sol.org>,
    Xen-devel <xen-devel@lists.xensource.com>
Subject: shadow2 corrupting PV guest state
Date: Fri, 13 Oct 2006 16:27:42 -0700
Message-ID: <453020EE.4080603@goop.org>
I've been fighting random crashes in the paravirt tree for a while.
After a fair amount of head-banging, it looks to me like the shadow2
code is trashing the guest stack (and maybe register state) at random
points.
If I boot a kernel with CONFIG_DEBUG_PAGEALLOC enabled (which
dramatically increases the rate of pagetable modifications), it rarely
makes it through early boot without some random crash. The crashes are
often at the same place, but they move around; however they tend to be
near places where the pagetable is touched. It may also interact with
timer events; certainly masking events seems to help a bit.
I tend to see this a lot more when running under qemu, but I've also
seen strange things happen on real hardware.
If I roll Xen back to pre-shadow2 (change fda70200da01), all these
mysterious crashes disappear.
Looking into it a bit more deeply, the kind of crash I'm seeing is
along the lines of:

    mov  (%ebx), %eax      # works; %ebx is a valid pointer
    call xen_enable_irq
    mov  %eax, (%ebx)      # crashes; %ebx will equal 0, 1, or something bad

where xen_enable_irq will have pushed %ebx, set the flag state, polled
for pending events and popped %ebx. My suspicion is that something
about re-enabling interrupts is causing the on-stack version of %ebx to
get trashed, rather than the actual %ebx register state (in general the
corrupted register is the one near or at the top of the stack).
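
As a rough illustration only (not the actual paravirt code), the behaviour described for xen_enable_irq can be modelled in user-space C: unmask events, then poll for anything that became pending while they were masked. The struct, its fields, and the helper names below are hypothetical stand-ins; the real routine also pushes and pops %ebx around this logic, and that saved copy is the one suspected of being overwritten.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-vcpu event state shared with the
 * hypervisor. This is a user-space model, not the real Xen layout. */
struct vcpu_model {
    volatile uint8_t upcall_pending;  /* nonzero: an event is waiting */
    volatile uint8_t upcall_mask;     /* nonzero: event delivery is masked */
};

static int upcalls_delivered;  /* counts simulated event callbacks */

/* Stand-in for forcing delivery of a pending event. In a real guest this
 * would enter the hypervisor; here it only clears the flag and counts. */
static void force_upcall_model(struct vcpu_model *v)
{
    v->upcall_pending = 0;
    upcalls_delivered++;
}

/* Model of the enable path: set the flag state, then poll for events. */
static void enable_irq_model(struct vcpu_model *v)
{
    v->upcall_mask = 0;        /* unmask: events may be delivered again */
    if (v->upcall_pending)     /* something arrived while we were masked */
        force_upcall_model(v);
}
```

The point of the model is the window it exposes: the event callback runs between the unmask and the return to the caller, which is when the caller's saved registers are sitting on the stack.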
Sometimes the corruption shows up as %eip off in the weeds (either at
NULL-ish addresses, or executing the stack).
I'm speculating that the sequence is:
  1. change pagetable; this creates a deferred pagetable update
  2. enable events
  3. handle pending timer interrupt, which also does a deferred
     pagetable update
  4. resume running with corrupted stack
But I don't really know enough about how shadow2 works to know if that's
really plausible. Maybe a vcpu/guest context switch is a part of the
sequence.
I wonder if the stack corruption is caused by a mismatch of exception
frame formats between exception entry and iret?
All a bit handwavy, but I haven't really managed to make much headway.
I spent some time assuming it was a bug on my side, but the fact that
all these symptoms go away with pre-shadow2 Xen makes me point the
finger over the wall.
Or perhaps it really is just a qemu bug, but I can't imagine that
shadow2 exercises qemu's emulated CPU exception stuff in a way which
normal Xen doesn't... I think it's more likely that there's some race
which is much more easily triggered by qemu's slow speed.
Any thoughts or ideas?
J