From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>, linux-arch@vger.kernel.org
Cc: heiko.carstens@de.ibm.com, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	schwidefsky@de.ibm.com, Vivek Goyal <vgoyal@redhat.com>
Subject: [PATCH v2] kdump: Fix crash_kexec - smp_send_stop race in panic
Date: Mon, 31 Oct 2011 13:34:19 +0100
Message-ID: <1320064459.2796.26.camel@br98xy6r>
In-Reply-To: <20111031033948.a0edb7f3.akpm@linux-foundation.org>

Hello Andrew, hello linux-arch,

On Mon, 2011-10-31 at 03:39 -0700, Andrew Morton wrote:
> On Mon, 31 Oct 2011 10:57:16 +0100 Michael Holzheu <holzheu@linux.vnet.ibm.com> wrote:
> 
> > > Should this be done earlier in the function?  As it stands we'll have
> > > multiple CPUs scribbling on buf[] at the same time and all trying to
> > > print the same thing at the same time, dumping their stacks, etc. 
> > > Perhaps it would be better to single-thread all that stuff
> > 
> > My first patch took the spinlock at the beginning of panic(). But then
> > Eric asked if it wouldn't be better to get both panic printk's and I
> > agreed.
> 
> Hm, why?  It will make a big mess.

@Andrew:

I thought it would be good to have both messages and to change the panic
behavior as little as possible...

But ok, I have no problem with getting the lock at the beginning of
panic(). Below, I attached the updated patch.

> > > Also...  this patch affects all CPU architectures, all configs, etc. 
> > > So we're expecting that every architecture's smp_send_stop() is able to
> > > stop a CPU which is spinning in spin_lock(), possibly with local
> > > interrupts disabled.  Will this work?
> > 
> > At least on s390 it will work. If there are architectures that can't
> > stop disabled CPUs then this problem is already there without this
> > patch.
> > 
> > Example:
> > 
> > 1. 1st CPU gets lock X and panics
> > 2. 2nd CPU is disabled and gets lock X
> 
> (irq-disabled)
> 
> > 3. 1st CPU calls smp_send_stop()
> >    -> 2nd CPU loops disabled and can't be stopped
> 
> Well OK.  Maybe some architectures do have this problem - who would
> notice?  If that is the case, we just made the failure cases much more
> common.  Could you check, please?

@linux-arch: 

This patch introduces a spinlock to prevent parallel execution of the
panic code. Andrew pointed out that this might be a problem for
architectures that can't do smp_send_stop() on remote CPUs that have
interrupts disabled. When irq-disabled CPUs execute panic() in parallel,
we would then have CPUs spinning on the panic lock that cannot be stopped.

So please speak up if somebody has a problem with this patch!

Michael
---
From: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Subject: kdump: fix crash_kexec()/smp_send_stop() race in panic

When two CPUs call panic() at the same time there is a possible race
condition that can stop kdump.  The first CPU calls crash_kexec() and the
second CPU calls smp_send_stop() in panic() before crash_kexec() has
finished on the first CPU.  So the second CPU stops the first CPU and
therefore kdump fails:

1st CPU:
panic()->crash_kexec()->mutex_trylock(&kexec_mutex)-> do kdump

2nd CPU:
panic()->crash_kexec()->kexec_mutex already held by 1st CPU
       ->smp_send_stop()-> stop 1st CPU (stop kdump)

This patch fixes the problem by introducing a spinlock in panic that
allows only one CPU to process crash_kexec() and the subsequent panic
code.

Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
---
 kernel/panic.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -59,6 +59,7 @@ EXPORT_SYMBOL(panic_blink);
  */
 NORET_TYPE void panic(const char * fmt, ...)
 {
+	static DEFINE_SPINLOCK(panic_lock);
 	static char buf[1024];
 	va_list args;
 	long i, i_next = 0;
@@ -68,8 +69,12 @@ NORET_TYPE void panic(const char * fmt,
 	 * It's possible to come here directly from a panic-assertion and
 	 * not have preempt disabled. Some functions called from here want
 	 * preempt to be disabled. No point enabling it later though...
+	 *
+	 * Only one CPU is allowed to execute the panic code from here. For
+	 * multiple parallel invocations of panic all other CPUs will wait on
+	 * the panic_lock. They are stopped afterwards by smp_send_stop().
 	 */
-	preempt_disable();
+	spin_lock_irq(&panic_lock);
 
 	console_verbose();
 	bust_spinlocks(1);
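
For readers who want to experiment with the locking scheme outside the
kernel, the following is a minimal userspace sketch, not kernel code. It
assumes POSIX threads stand in for CPUs, an atomic flag stands in for
panic_lock, and a boolean stands in for the effect of smp_send_stop(). All
names in it (model_panic, run_panic_race, panic_lock_model, stop_requested,
crash_path_entries) are hypothetical and do not exist in kernel/panic.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only. */
static atomic_flag panic_lock_model = ATOMIC_FLAG_INIT; /* models panic_lock */
static atomic_bool stop_requested;     /* models the effect of smp_send_stop() */
static atomic_int crash_path_entries;  /* "CPUs" that ran the panic body */

static void *model_panic(void *arg)
{
	(void)arg;

	/*
	 * Models spin_lock_irq(&panic_lock): the flag is never cleared, so
	 * every caller but the first spins here until it is "stopped".
	 */
	while (atomic_flag_test_and_set(&panic_lock_model)) {
		if (atomic_load(&stop_requested))
			return NULL; /* stopped while waiting on the lock */
	}

	/* Only the single lock holder gets here (crash_kexec() and onwards). */
	atomic_fetch_add(&crash_path_entries, 1);

	/* The lock holder stops all other "CPUs". */
	atomic_store(&stop_requested, true);
	return NULL;
}

/* Lets ncpus threads "panic" in parallel; returns how many ran the body. */
int run_panic_race(int ncpus)
{
	pthread_t threads[16];
	int i, ret;

	assert(ncpus > 0 && ncpus <= 16);

	/* Reset the model so the function can be called repeatedly. */
	atomic_flag_clear(&panic_lock_model);
	atomic_store(&stop_requested, false);
	atomic_store(&crash_path_entries, 0);

	for (i = 0; i < ncpus; i++) {
		ret = pthread_create(&threads[i], NULL, model_panic, NULL);
		assert(ret == 0);
	}
	for (i = 0; i < ncpus; i++)
		pthread_join(threads[i], NULL);

	return atomic_load(&crash_path_entries);
}
```

Calling run_panic_race(4) from a small main() should report that exactly one
thread entered the panic body. The sketch deliberately leaves out the
question raised above: it assumes a spinning waiter can always observe the
stop request, which is exactly what an architecture whose smp_send_stop()
cannot reach irq-disabled CPUs would not guarantee.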
