From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Xen-devel <xen-devel@lists.xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
Keir Fraser <keir@xen.org>,
David Vrabel <david.vrabel@citrix.com>,
Jan Beulich <JBeulich@suse.com>, Tim Deegan <tim@xen.org>
Subject: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.
Date: Fri, 15 Nov 2013 20:32:46 +0000 [thread overview]
Message-ID: <1384547567-17059-2-git-send-email-andrew.cooper3@citrix.com> (raw)
In-Reply-To: <1384547567-17059-1-git-send-email-andrew.cooper3@citrix.com>
In some cases, such as suffering a queued-invalidation timeout while
performing an iommu_crash_shutdown(), Xen can end up reentering the crash
path. Previously, this would result in a deadlock in one_cpu_only(), as the
test_and_set_bit() would fail.
The crash path is not reentrant, and even if it could be made to be so, it is
almost certain that we would fall over the same reentry condition again.
The new code can distinguish a reentry case from multiple cpus racing down the
crash path. In the case that a reentry is detected, return back out to the
nested panic() call, which will maybe_reboot() on our behalf. This requires a
bit of return plumbing back up to kexec_crash().
While fixing this deadlock, also fix up an minor niggle seen recently from a
XenServer crash report. The report was from a Bank 8 MCE, which had managed
to crash on all cpus at once. The result was a lot of stack traces with cpus
in kexec_common_shutdown(), which was infact the inlined version of
one_cpu_only(). The kexec crash path is not a hotpath, so we can easily
afford to prevent inlining for the sake of clarity in the stack traces.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: David Vrabel <david.vrabel@citrix.com>
---
xen/common/kexec.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 44 insertions(+), 7 deletions(-)
diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 17f3ed7..481b0c2 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -233,11 +233,39 @@ void __init set_kexec_crash_area_size(u64 system_ram)
}
}
-static void one_cpu_only(void)
+/*
+ * Only allow one cpu to continue on the crash path, forcing others to spin.
+ * Racing on the crash path from here will end in misery. If we reenter,
+ * something has very gone wrong and retrying will (almost certainly) be
+ * futile. Return up to our nested panic() to try and reboot.
+ *
+ * This is noinline to make it obvious in stack traces which cpus have lost
+ * the race (as opposed to being somewhere in kexec_common_shutdown())
+ */
+static int noinline one_cpu_only(void)
{
- /* Only allow the first cpu to continue - force other cpus to spin */
- if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
- for ( ; ; ) ;
+ static unsigned int crashing_cpu = -1;
+ unsigned int cpu = smp_processor_id();
+
+ if ( cmpxchg(&crashing_cpu, -1, cpu) != -1 )
+ {
+ /* Not the first entry into one_cpu_only(). */
+ if ( crashing_cpu == cpu )
+ {
+ printk("Reentered the crash path. Something is very broken\n");
+ return -EBUSY;
+ }
+
+ /*
+ * Another cpu has beaten us to this point. Wait here patiently for
+ * it to kill us.
+ */
+ for ( ; ; )
+ halt();
+ }
+
+ set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
+ return 0;
}
/* Save the registers in the per-cpu crash note buffer. */
@@ -288,13 +316,20 @@ crash_xen_info_t *kexec_crash_save_info(void)
return out;
}
-static void kexec_common_shutdown(void)
+static int kexec_common_shutdown(void)
{
+ int ret;
+
+ ret = one_cpu_only();
+ if ( ret )
+ return ret;
+
watchdog_disable();
console_start_sync();
spin_debug_disable();
- one_cpu_only();
acpi_dmar_reinstate();
+
+ return 0;
}
void kexec_crash(void)
@@ -309,7 +344,9 @@ void kexec_crash(void)
kexecing = TRUE;
- kexec_common_shutdown();
+ if ( kexec_common_shutdown() != 0 )
+ return;
+
kexec_crash_save_cpu();
machine_crash_shutdown();
machine_kexec(kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);
--
1.7.10.4
next prev parent reply other threads:[~2013-11-15 20:32 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-11-15 20:32 [PATCH 0/2] Kexec crash path fixes Andrew Cooper
2013-11-15 20:32 ` Andrew Cooper [this message]
2013-11-22 14:55 ` [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path Andrew Cooper
2013-11-25 13:28 ` Jan Beulich
2013-11-25 13:30 ` Andrew Cooper
2013-11-25 13:39 ` Jan Beulich
2013-11-25 15:38 ` Andrew Cooper
2013-11-27 10:27 ` David Vrabel
2013-11-15 20:32 ` [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu Andrew Cooper
2013-11-15 21:01 ` David Vrabel
2013-11-15 21:09 ` Andrew Cooper
2013-11-18 9:26 ` Jan Beulich
2013-11-18 10:33 ` Andrew Cooper
2013-11-18 10:35 ` Andrew Cooper
2013-11-18 11:04 ` Jan Beulich
2013-11-18 11:09 ` Andrew Cooper
2013-11-19 10:53 ` Ian Campbell
2013-11-20 15:08 ` [Patch v2 " Andrew Cooper
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1384547567-17059-2-git-send-email-andrew.cooper3@citrix.com \
--to=andrew.cooper3@citrix.com \
--cc=JBeulich@suse.com \
--cc=david.vrabel@citrix.com \
--cc=keir@xen.org \
--cc=tim@xen.org \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).