From: Chris Metcalf <cmetcalf@tilera.com>
To: holzheu@linux.vnet.ibm.com
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
David Howells <dhowells@redhat.com>,
Chen Liqin <liqin.chen@sunplusct.com>,
Paul Mackerras <paulus@samba.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Guan Xuetao <gxt@mprc.pku.edu.cn>,
Lennox Wu <lennox.wu@gmail.com>,
Hans-Christian Egtvedt <egtvedt@samfundet.no>,
Jonas Bonn <jonas@southpole.se>,
Jesper Nilsson <jesper.nilsson@axis.com>,
Russell King <linux@arm.linux.org.uk>,
Yoshinori Sato <ysato@users.sourceforge.jp>,
"David S. Miller" <davem@davemloft.net>,
Richard Weinberger <richard@nod.at>, Helge Deller <deller@gmx.de>,
"James E.J. Bottomley" <jejb@parisc-linux.org>,
Ingo Molnar <mingo@redhat.com>,
Geert Uytterhoeven <geert@linux-m68k.org>,
linux-arch@vger.kernel.org, Matt Turner <mattst88@gmail.com>,
Vivek Goyal <vgoyal@redhat.com>,
Haavard Skinnemoen <hskinnemoen@gmail.com>,
Don Zickus <dzickus@redhat.com>,
Fenghua Yu <fenghua.yu@intel.com>,
Mike Frysinger <vapier@gentoo.org>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Jeff Dike <jdike@addtoit.com>, Mikael Starvik <starvik@axis.com>,
Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
Thomas Gleixner <tglx@linutronix.de>,
Richard Henderson <rth@twiddle.net>,
Chris Zankel <chris@zankel.net>, Michal Simek <monstr@monstr.eu>,
Tony Luck <tony.luck@intel.com>,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
Ralf Baechle <ralf@linux-mips.org>,
Richard Kuo <rkuo@codeaurora.org>,
Kyle McMartin <kyle@mcmartin.ca>,
Paul Mundt <lethal@linux-sh.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Koichi Yasutake <yasutake.koichi@jp.panasonic.com>,
Hirokazu Takata <takata@linux-m32r.org>
Subject: Re: [PATCH] kdump: Fix crash_kexec - smp_send_stop race in panic
Date: Fri, 11 Nov 2011 12:02:46 -0500
Message-ID: <4EBD5536.7010806@tilera.com>
In-Reply-To: <1321014494.2745.7.camel@br98xy6r>

On 11/11/2011 7:28 AM, Michael Holzheu wrote:
> Hello Chris,
>
> On Thu, 2011-11-10 at 10:11 -0500, Chris Metcalf wrote:
>> On 11/10/2011 9:22 AM, Michael Holzheu wrote:
> [snip]
>
>> If a cleaner API seems useful (either for power reasons or restartability
>> or whatever), I suppose a standard global function name could be specified
>> that's the thing you execute when you get an smp_send_stop IPI (in tile's
>> case it's "smp_stop_cpu_interrupt()") and the panic() code could instead
>> just do an atomic_inc_return() of a global panic counter, and if it wasn't
>> the first panicking cpu, call directly into the smp_stop handler routine to
>> quiesce itself. Then the panicking cpu could finish whatever it needs to
>> do and then halt, reboot, etc., all the cpus.
> Thanks for the info. So introducing a "weak" function that can stop the
> CPU it is running on could solve the problem. Every architecture can
> override the function with something appropriate. E.g. "tile" can use
> the lower-power "nap" instruction there.
>
> What about the following patch.
Seems reasonable to me.
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
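
For comparison, the counter-based variant described in the quoted text
above (an atomic_inc_return() on a global panic counter) could be
modelled in userspace C roughly as below. This is only a sketch under
stated assumptions: model_panic_count and model_first_panicker() are
hypothetical names, C11 atomics stand in for the kernel's atomic_t API,
and none of this is part of the patch.

```c
#include <stdatomic.h>

/* Hypothetical global counter; not part of the quoted patch. */
static atomic_int model_panic_count;

/*
 * Returns 1 for the first caller and 0 for every later one.
 * atomic_fetch_add() returns the old value, so adding 1 gives the same
 * result the kernel's atomic_inc_return() would.
 */
static int model_first_panicker(void)
{
	return atomic_fetch_add(&model_panic_count, 1) + 1 == 1;
}
```

In this model, a caller that gets 0 back would go straight into the
stop routine, and only the caller that gets 1 would carry on with the
rest of the panic path.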
>
> Michael
> ---
> From: Michael Holzheu<holzheu@linux.vnet.ibm.com>
> Subject: kdump: fix crash_kexec()/smp_send_stop() race in panic
>
> When two CPUs call panic at the same time there is a possible race
> condition that can stop kdump. The first CPU calls crash_kexec() and the
> second CPU calls smp_send_stop() in panic() before crash_kexec() finished
> on the first CPU. So the second CPU stops the first CPU and therefore
> kdump fails:
>
> 1st CPU:
> panic()->crash_kexec()->mutex_trylock(&kexec_mutex)-> do kdump
>
> 2nd CPU:
> panic()->crash_kexec()->kexec_mutex already held by 1st CPU
> ->smp_send_stop()-> stop 1st CPU (stop kdump)
>
> This patch fixes the problem by introducing a spinlock in panic that
> allows only one CPU to process crash_kexec() and the subsequent panic
> code.
>
> Signed-off-by: Michael Holzheu<holzheu@linux.vnet.ibm.com>
> ---
> kernel/panic.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -49,6 +49,15 @@ static long no_blink(int state)
> long (*panic_blink)(int state);
> EXPORT_SYMBOL(panic_blink);
>
> +/*
> + * Stop ourself in panic -- architecture code may override this
> + */
> +void __attribute__ ((weak)) panic_smp_self_stop(void)
> +{
> +	while (1)
> +		cpu_relax();
> +}
> +
> /**
> * panic - halt the system
> * @fmt: The text string to print
> @@ -59,6 +68,7 @@ EXPORT_SYMBOL(panic_blink);
> */
> NORET_TYPE void panic(const char * fmt, ...)
> {
> +	static DEFINE_SPINLOCK(panic_lock);
>  	static char buf[1024];
>  	va_list args;
>  	long i, i_next = 0;
> @@ -68,8 +78,14 @@ NORET_TYPE void panic(const char * fmt,
> * It's possible to come here directly from a panic-assertion and
> * not have preempt disabled. Some functions called from here want
> * preempt to be disabled. No point enabling it later though...
> + *
> + * Only one CPU is allowed to execute the panic code from here. For
> + * multiple parallel invocations of panic, all other CPUs either
> + * stop themself or will wait until they are stopped by the 1st CPU
> + * with smp_send_stop().
> */
> -	preempt_disable();
> +	if (!spin_trylock(&panic_lock))
> +		panic_smp_self_stop();
>
>  	console_verbose();
>  	bust_spinlocks(1);
>
>
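
To make the effect of the trylock gate concrete, here is a small
userspace model of it. It is a sketch, not kernel code: the model_*
names and NR_MODEL_CPUS are made up for illustration, a C11 atomic_flag
stands in for the spinlock, and the self-stop routine records the event
and returns instead of spinning forever as the real one does.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_MODEL_CPUS 4	/* hypothetical CPU count for the model */

/* Stands in for the static panic_lock spinlock in the patch. */
static atomic_flag model_panic_lock = ATOMIC_FLAG_INIT;

static int model_stopped[NR_MODEL_CPUS];	/* 1 if that CPU self-stopped */

/*
 * Stand-in for panic_smp_self_stop().  The kernel version spins forever;
 * the model only records the event so that the caller can return.
 */
static void model_self_stop(int cpu)
{
	model_stopped[cpu] = 1;
}

/*
 * Model of the gate at the top of panic().  Returns 1 if this CPU may
 * run the rest of the panic path (crash_kexec(), smp_send_stop(), ...),
 * 0 if it lost the race and stopped itself.
 */
static int model_panic(int cpu)
{
	assert(cpu >= 0 && cpu < NR_MODEL_CPUS);

	/* test_and_set returns the old value: true means already taken. */
	if (atomic_flag_test_and_set(&model_panic_lock)) {
		model_self_stop(cpu);
		return 0;
	}
	return 1;
}
```

Whichever CPU calls model_panic() first gets 1 back, and every later
caller gets 0 and is marked as stopped. No second CPU ever gets as far
as smp_send_stop() while the first one is still in crash_kexec(), which
is the property the patch is after.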
--
Chris Metcalf, Tilera Corp.
http://www.tilera.com