From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.
Date: Mon, 25 Nov 2013 13:30:19 +0000
Message-ID: <529350EB.5010804@citrix.com>
References: <1384547567-17059-1-git-send-email-andrew.cooper3@citrix.com>
 <1384547567-17059-2-git-send-email-andrew.cooper3@citrix.com>
 <52935E7A0200007800106A69@nat28.tlf.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Received: from mail6.bemta3.messagelabs.com ([195.245.230.39])
 by lists.xen.org with esmtp (Exim 4.72) id 1VkwF5-0001rB-UZ
 for xen-devel@lists.xenproject.org; Mon, 25 Nov 2013 13:30:24 +0000
In-Reply-To: <52935E7A0200007800106A69@nat28.tlf.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: xen-devel, Keir Fraser, David Vrabel, Tim Deegan
List-Id: xen-devel@lists.xenproject.org

On 25/11/13 13:28, Jan Beulich wrote:
>>>> On 15.11.13 at 21:32, Andrew Cooper wrote:
>> In some cases, such as suffering a queued-invalidation timeout while
>> performing an iommu_crash_shutdown(), Xen can end up reentering the crash
>> path. Previously, this would result in a deadlock in one_cpu_only(), as the
>> test_and_set_bit() would fail.
>>
>> The crash path is not reentrant, and even if it could be made to be so, it is
>> almost certain that we would fall over the same reentry condition again.
>>
>> The new code can distinguish a reentry case from multiple cpus racing down the
>> crash path. In the case that a reentry is detected, return back out to the
>> nested panic() call, which will maybe_reboot() on our behalf. This requires a
>> bit of return plumbing back up to kexec_crash().
>>
>> While fixing this deadlock, also fix up a minor niggle seen recently from a
>> XenServer crash report. The report was from a Bank 8 MCE, which had managed
>> to crash on all cpus at once. The result was a lot of stack traces with cpus
>> in kexec_common_shutdown(), which was in fact the inlined version of
>> one_cpu_only(). The kexec crash path is not a hotpath, so we can easily
>> afford to prevent inlining for the sake of clarity in the stack traces.
>>
>> Signed-off-by: Andrew Cooper
>> CC: Keir Fraser
>> CC: Jan Beulich
>> CC: Tim Deegan
>> CC: David Vrabel
>> ---
>> xen/common/kexec.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-------
>> 1 file changed, 44 insertions(+), 7 deletions(-)
> David, you being the maintainer of this code now, I don't think I've
> seen a response from you on this patch, despite - iirc - Andrew
> having pinged you on it already too.
>
> Jan
>

David is out of the office for a week on vacation at the moment. I did
get code-review from him before submitting it upstream, but I guess that
doesn't count for much as a formal ack.

~Andrew
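
For readers following the thread, the distinction drawn in the quoted
commit message (one cpu reentering the crash path versus several cpus
racing into it) can be sketched in user-space C as below. This is only
an illustration under stated assumptions, not the code from the patch:
the names crash_path_enter(), crashing_cpu, NO_CPU and the enum values
are hypothetical, C11 atomics stand in for whatever primitive Xen uses,
and the noinline attribute is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stdatomic.h>

/* All names here are hypothetical; this is not the code from the patch. */
enum crash_entry {
    CRASH_ENTRY_FIRST,     /* this cpu claimed the crash path */
    CRASH_ENTRY_REENTERED, /* this cpu already owns it: a nested crash */
    CRASH_ENTRY_LOST_RACE  /* a different cpu got there first */
};

#define NO_CPU (-1)

/* Records which cpu owns the crash path, or NO_CPU if nobody does yet. */
static atomic_int crashing_cpu = NO_CPU;

/*
 * Kept out of line (GCC/Clang attribute, an assumption of this sketch) so
 * that the function shows up under its own name in stack traces.
 */
__attribute__((noinline))
enum crash_entry crash_path_enter(int cpu)
{
    int expected = NO_CPU;

    assert(cpu >= 0); /* NO_CPU is reserved as the "unowned" marker */

    /* Atomically claim ownership if no cpu has done so yet. */
    if (atomic_compare_exchange_strong(&crashing_cpu, &expected, cpu))
        return CRASH_ENTRY_FIRST;

    /* The exchange failed, so 'expected' now holds the current owner. */
    return expected == cpu ? CRASH_ENTRY_REENTERED : CRASH_ENTRY_LOST_RACE;
}
```

Recording the owning cpu, rather than a single "in progress" bit, is
what lets the two cases be told apart. Going by the commit message, a
caller would presumably return to the nested panic() on reentry; what
a cpu that lost the race should do is not stated there and is left
open in this sketch.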