public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <andrea@suse.de>
To: Dave Hansen <haveblue@us.ibm.com>
Cc: "David S. Miller" <davem@redhat.com>,
	Andrew Morton <akpm@osdl.org>,
	Hannes.Reinecke@suse.de,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Dumb question: BKL on reboot ?
Date: Thu, 21 Aug 2003 17:33:59 +0200	[thread overview]
Message-ID: <20030821153359.GD29612@dualathlon.random> (raw)
In-Reply-To: <1061411024.9371.33.camel@nighthawk>

On Wed, Aug 20, 2003 at 01:23:44PM -0700, Dave Hansen wrote:
> On Wed, 2003-08-20 at 11:35, David S. Miller wrote:
> > On Wed, 20 Aug 2003 11:29:18 -0700
> > Andrew Morton <akpm@osdl.org> wrote:
> > 
> > > Where exactly does the rebooting CPU get stuck in machine_restart()?  If
> > > someone has done lock_kernel() with local interrupts disabled then yes,
> > > it'll deadlock.  But that's unlikely?  Confused.
> 
> I thought it was legal to do that.  The normal interrupts-off spinlock
> deadlock happens because a thread does a spin_lock() with no irq
> disable, then gets interrupted.  The interrupt handler tries to take the
> lock, and gets stuck.  

that's the regular spinlock case but it has nothing to do with the big
kernel lock during smp_call_function. holding the big kernel lock while
broadcasting the IPI with smp_call_function is perfectly legal. Infact
if something it's safer (and it can't explain any deadlock), and since
it's a so low performance path, where scalability doesn't matter,
leaving the BKL around that code can be acceptable.

The only illegal thing is to call smp_call_function with _local_irqs_
disabled (as Andrew noted above). the big kernel lock doesn't matter for
the smp_call_function callers.

> If the BKL is put in that situation, the interrupt handler will see
> current->lock_depth > 0, and not actually take the spinlock; it will
> just increment the lock_depth.  It won't deadlock.
> 
> Or, are you saying that a CPU could have the BKL, and have
> stop_this_cpu() called on it?  I guess we could add
> release_kernel_lock() to stop_this_cpu().

the *real* bug IMHO is that machine_restart isn't shutting down the
other cpus properly in smp, it has nothing to do with lock_kernel in
kernel/, the bug is in arch/. Dropping the big kernel lock in kernel/
can hide the bug, but it will showup again later in any other spinlock.

The problem is that machine_restart in arch/s390 (and arch/i386 too) can
easily hang a cpu in the IPI handler while it had zillon of spinlocks
held, that may later interfere with the main "reboot" cpu.

So at the very least a better fix than the one that drops the big kernel
lock from kernel/ is to execute current->lock_depth = -1 before entering
the infinite loop in the arch/ code.

A real fix would be to use the unplug cpu framework to allow all cpus to
reach the quiescent point, in this case it's easily doable even w/o
using RCU and w/o the pluggable cpu framework in 2.4, by simply spawning
a kernel thread bind to interesting cpu, and having this kernel thread
setting the bitflag after it become runnable (cut and pasting the
spawn_ksoftirqd will do the trick). The kernel thread will only execute
the infinite loop then. Hanging a cpu forever in a IPI is simply
deadlock prone for all other cpus, and that's the real bug.

Andrea

  parent reply	other threads:[~2003-08-21 15:32 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-08-20 10:22 Dumb question: BKL on reboot ? Hannes Reinecke
2003-08-20 18:29 ` Andrew Morton
2003-08-20 18:35   ` David S. Miller
2003-08-20 20:23     ` Dave Hansen
2003-08-21  8:05       ` Hannes Reinecke
2003-08-21 15:41         ` Andrea Arcangeli
2003-08-21 15:55           ` Hannes Reinecke
2003-08-21 16:39             ` Andrea Arcangeli
2003-08-21 16:58               ` Andrea Arcangeli
2003-08-22  6:39               ` Hannes Reinecke
2003-08-22 13:57                 ` Andrea Arcangeli
2003-08-24 21:43                 ` Andrea Arcangeli
2003-08-21 15:33       ` Andrea Arcangeli [this message]
     [not found] <3F434BD1.9050704@suse.de.suse.lists.linux.kernel>
2003-08-20 10:49 ` Andi Kleen
2003-08-20 11:51   ` Hannes Reinecke
2003-08-20 12:03     ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20030821153359.GD29612@dualathlon.random \
    --to=andrea@suse.de \
    --cc=Hannes.Reinecke@suse.de \
    --cc=akpm@osdl.org \
    --cc=davem@redhat.com \
    --cc=haveblue@us.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox