* Re: [kernel.org users] [KORG] Panics on master backend
[not found] ` <1314129133.8002.102.camel@twins>
@ 2011-08-23 21:32 ` James Bottomley
2011-08-24 9:59 ` Peter Zijlstra
0 siblings, 1 reply; 2+ messages in thread
From: James Bottomley @ 2011-08-23 21:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: J.H., users, linux-kernel, Oleg Nesterov, Frank Rowand,
yong.zhang0, mingo, Jens Axboe, Thomas Gleixner, scameron, hch,
linux-scsi
Cc: linux-scsi added
On Tue, 2011-08-23 at 21:52 +0200, Peter Zijlstra wrote:
> On Tue, 2011-08-23 at 11:09 -0700, J.H. wrote:
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.702389] <<EOE>> <IRQ> [<ffffffff81049469>] try_to_wake_up+0x89/0x1ca
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.709536] [<ffffffff810495d3>] ? wake_up_process+0x15/0x17
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.715422] [<ffffffff810495bc>] default_wake_function+0x12/0x14
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.721659] [<ffffffff8103d165>] __wake_up_common+0x4e/0x84
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.727489] [<ffffffffa0177066>] ? _xfs_buf_ioend+0x2f/0x36 [xfs]
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.733852] [<ffffffff810428b0>] complete+0x3f/0x52
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.738985] [<ffffffffa0177017>] xfs_buf_ioend+0xc2/0xe2 [xfs]
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.745079] [<ffffffffa0177066>] _xfs_buf_ioend+0x2f/0x36 [xfs]
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.751259] [<ffffffffa017727a>] xfs_buf_bio_end_io+0x2a/0x37 [xfs]
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.757758] [<ffffffff8114f684>] bio_endio+0x2d/0x2f
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.762945] [<ffffffff81229cd1>] req_bio_endio.clone.31+0x9e/0xa7
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.772811] [<ffffffff81229e6b>] blk_update_request+0x191/0x32e
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.782563] [<ffffffff8122a029>] blk_update_bidi_request+0x21/0x6d
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.792529] [<ffffffff8122a180>] blk_end_bidi_request+0x1f/0x5d
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.802230] [<ffffffff8122a1f8>] blk_end_request+0x10/0x12
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.811446] [<ffffffff81313300>] scsi_io_completion+0x1e2/0x4d8
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.821002] [<ffffffff814d47e7>] ? _raw_spin_unlock_irqrestore+0x17/0x19
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.831405] [<ffffffff8130bf80>] scsi_finish_command+0xe8/0xf1
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.840948] [<ffffffff8131305f>] scsi_softirq_done+0x109/0x112
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.850485] [<ffffffff8122e3c8>] blk_done_softirq+0x73/0x87
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.859729] [<ffffffff8105757a>] __do_softirq+0xc9/0x1b6
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.868705] [<ffffffff81014e01>] ? paravirt_read_tsc+0x9/0xd
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.878093] [<ffffffff814dd2ac>] call_softirq+0x1c/0x30
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.887025] [<ffffffff81010c5e>] do_softirq+0x46/0x84
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.895767] [<ffffffff81057849>] irq_exit+0x52/0xaf
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.904246] [<ffffffff814ddc21>] smp_apic_timer_interrupt+0x7c/0x8a
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.914171] [<ffffffff814dbb1e>] apic_timer_interrupt+0x6e/0x80
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.923618] <EOI> [<ffffffff81080526>] ? arch_local_irq_restore+0x6/0xd
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.933756] [<ffffffff814d47e7>] _raw_spin_unlock_irqrestore+0x17/0x19
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.943630] [<ffffffffa00446c4>] hpsa_scsi_queue_command+0x2f9/0x319 [hpsa]
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.953851] [<ffffffff8130c9bb>] scsi_dispatch_cmd+0x18c/0x251
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.962917] [<ffffffff81312d70>] scsi_request_fn+0x3cd/0x3f9
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.971781] [<ffffffff81225214>] __blk_run_queue+0x1b/0x1d
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.980439] [<ffffffff8123b637>] cfq_insert_request+0x4cf/0x504
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.989469] [<ffffffff812248d4>] __elv_add_request+0x19c/0x1e7
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145813.998481] [<ffffffff8122a934>] blk_flush_plug_list+0x166/0x19b
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145814.007670] [<ffffffff814d2ce4>] schedule+0x2a3/0x6dc
> > 20110823.log:Aug 23 16:49:53 console1 port05 RXDATA: [145814.015903] [<ffffffff814d35e1>] schedule_timeout+0x36/0xe3
>
> Whee, you must be real 'lucky' to hit this so reliable.
>
> Both panics included the above bit, so what I think happens is that we
> release rq->lock while calling that blk_flush_plug muck, it however then
> goes and re-enabled interrupts *FAIL*
>
> An interrupts happens and it (or in fact a pending softirq) then tries
> to wake the current task which is in the process of going to sleep.
> Since the task is still on the cpu we spin until its gone (assuming its
> a remote task since interrupts aren't supposed to happen here).
>
> Now I can fudge around this, see below, but really the whole
> block/scsi/hpsa crap should *NOT* re-enable interrupts here.
>
> It looks like the offender is:
>
> scsi_request_fn(), which does:
>
> /*
> * XXX(hch): This is rather suboptimal, scsi_dispatch_cmd will
> * take the lock again.
> */
> spin_unlock_irq(shost->host_lock);
Um, that one looks like a mistake missed in cleanup (probably years
ago). The other host_lock acquisitions aren't _irq, so this one
shouldn't be either. I can patch it up (or you can).
James
>
>
> ---
> kernel/sched.c | 9 +++------
> 1 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index ccacdbd..30d9788 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2630,7 +2630,6 @@ static void ttwu_queue_remote(struct task_struct *p, int cpu)
> smp_send_reschedule(cpu);
> }
>
> -#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
> static int ttwu_activate_remote(struct task_struct *p, int wake_flags)
> {
> struct rq *rq;
> @@ -2647,7 +2646,6 @@ static int ttwu_activate_remote(struct task_struct *p, int wake_flags)
> return ret;
>
> }
> -#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
> #endif /* CONFIG_SMP */
>
> static void ttwu_queue(struct task_struct *p, int cpu)
> @@ -2705,7 +2703,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> * this task as prev, wait until its done referencing the task.
> */
> while (p->on_cpu) {
> -#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
> /*
> * In case the architecture enables interrupts in
> * context_switch(), we cannot busy wait, since that
> @@ -2713,11 +2710,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
> * tries to wake up @prev. So bail and do a complete
> * remote wakeup.
> */
> - if (ttwu_activate_remote(p, wake_flags))
> + if (cpu == smp_processor_id() &&
> + ttwu_activate_remote(p, wake_flags))
> goto stat;
> -#else
> +
> cpu_relax();
> -#endif
> }
> /*
> * Pairs with the smp_wmb() in finish_lock_switch().
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [kernel.org users] [KORG] Panics on master backend
2011-08-23 21:32 ` [kernel.org users] [KORG] Panics on master backend James Bottomley
@ 2011-08-24 9:59 ` Peter Zijlstra
0 siblings, 0 replies; 2+ messages in thread
From: Peter Zijlstra @ 2011-08-24 9:59 UTC (permalink / raw)
To: James Bottomley
Cc: linux-scsi, Frank Rowand, Oleg Nesterov, linux-kernel, hch,
yong.zhang0, scameron, Jens Axboe, Thomas Gleixner, users
[-- Attachment #1: Type: text/plain, Size: 993 bytes --]
On Tue, 2011-08-23 at 16:32 -0500, James Bottomley wrote:
> > scsi_request_fn(), which does:
> >
> > /*
> > * XXX(hch): This is rather suboptimal, scsi_dispatch_cmd will
> > * take the lock again.
> > */
> > spin_unlock_irq(shost->host_lock);
>
> Um, that one looks like a mistake missed in cleanup (probably years
> ago). The other host_lock acquisitions aren't _irq, so this one
> shouldn't be either. I can patch it up (or you can).
I would much appreciate if someone who knows that whole stack would
audit it, my preferred solution is removing that blk plug stuff from the
scheduler guts since clearly nobody bothered making sure its sane.
Something like the three patches below, lifted from -rt:
sched-separate-the-scheduler-entry-for-preemption.patch
sched-move-blk_schedule_flush_plug-out-of-__schedule.patch
block-shorten-interrupt-disabled-regions.patch
[-- Attachment #2: block-shorten-interrupt-disabled-regions.patch --]
[-- Type: text/x-patch, Size: 3653 bytes --]
Subject: block: Shorten interrupt disabled regions
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 22 Jun 2011 19:47:02 +0200
Moving the blk_sched_flush_plug() call out of the interrupt/preempt
disabled region in the scheduler allows us to replace
local_irq_save/restore(flags) by local_irq_disable/enable() in
blk_flush_plug().
Now instead of doing this we disable interrupts explicitely when we
lock the request_queue and reenable them when we drop the lock. That
allows interrupts to be handled when the plug list contains requests
for more than one queue.
Aside of that this change makes the scope of the irq disabled region
more obvious. The current code confused the hell out of me when
looking at:
local_irq_save(flags);
spin_lock(q->queue_lock);
...
queue_unplugged(q...);
scsi_request_fn();
spin_unlock(q->queue_lock);
spin_lock(shost->host_lock);
spin_unlock_irq(shost->host_lock);
-------------------^^^ ????
spin_lock_irq(q->queue_lock);
spin_unlock(q->lock);
local_irq_restore(flags);
Also add a comment to __blk_run_queue() documenting that
q->request_fn() can drop q->queue_lock and reenable interrupts, but
must return with q->queue_lock held and interrupts disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110622174919.025446432@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
block/blk-core.c | 20 ++++++++------------
1 file changed, 8 insertions(+), 12 deletions(-)
Index: linux-2.6/block/blk-core.c
===================================================================
--- linux-2.6.orig/block/blk-core.c
+++ linux-2.6/block/blk-core.c
@@ -301,7 +301,11 @@ void __blk_run_queue(struct request_queu
{
if (unlikely(blk_queue_stopped(q)))
return;
-
+ /*
+ * q->request_fn() can drop q->queue_lock and reenable
+ * interrupts, but must return with q->queue_lock held and
+ * interrupts disabled.
+ */
q->request_fn(q);
}
EXPORT_SYMBOL(__blk_run_queue);
@@ -2667,11 +2671,11 @@ static void queue_unplugged(struct reque
* this lock).
*/
if (from_schedule) {
- spin_unlock(q->queue_lock);
+ spin_unlock_irq(q->queue_lock);
blk_run_queue_async(q);
} else {
__blk_run_queue(q);
- spin_unlock(q->queue_lock);
+ spin_unlock_irq(q->queue_lock);
}
}
@@ -2697,7 +2701,6 @@ static void flush_plug_callbacks(struct
void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
{
struct request_queue *q;
- unsigned long flags;
struct request *rq;
LIST_HEAD(list);
unsigned int depth;
@@ -2718,11 +2721,6 @@ void blk_flush_plug_list(struct blk_plug
q = NULL;
depth = 0;
- /*
- * Save and disable interrupts here, to avoid doing it for every
- * queue lock we have to take.
- */
- local_irq_save(flags);
while (!list_empty(&list)) {
rq = list_entry_rq(list.next);
list_del_init(&rq->queuelist);
@@ -2735,7 +2733,7 @@ void blk_flush_plug_list(struct blk_plug
queue_unplugged(q, depth, from_schedule);
q = rq->q;
depth = 0;
- spin_lock(q->queue_lock);
+ spin_lock_irq(q->queue_lock);
}
/*
* rq is already accounted, so use raw insert
@@ -2753,8 +2751,6 @@ void blk_flush_plug_list(struct blk_plug
*/
if (q)
queue_unplugged(q, depth, from_schedule);
-
- local_irq_restore(flags);
}
void blk_finish_plug(struct blk_plug *plug)
[-- Attachment #3: sched-move-blk_schedule_flush_plug-out-of-__schedule.patch --]
[-- Type: text/x-patch, Size: 2095 bytes --]
Subject: sched: Move blk_schedule_flush_plug() out of __schedule()
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 22 Jun 2011 19:47:01 +0200
There is no real reason to run blk_schedule_flush_plug() with
interrupts and preemption disabled.
Move it into schedule() and call it when the task is going voluntarily
to sleep. There might be false positives when the task is woken
between that call and actually scheduling, but that's not really
different from being woken immediately after switching away.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110622174918.917236198@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/sched.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4253,16 +4253,6 @@ need_resched:
if (to_wakeup)
try_to_wake_up_local(to_wakeup);
}
-
- /*
- * If we are going to sleep and we have plugged IO
- * queued, make sure to submit it to avoid deadlocks.
- */
- if (blk_needs_flush_plug(prev)) {
- raw_spin_unlock(&rq->lock);
- blk_schedule_flush_plug(prev);
- raw_spin_lock(&rq->lock);
- }
}
switch_count = &prev->nvcsw;
}
@@ -4301,8 +4291,23 @@ need_resched:
goto need_resched;
}
+static inline void sched_submit_work(struct task_struct *tsk)
+{
+ if (!tsk->state)
+ return;
+ /*
+ * If we are going to sleep and we have plugged IO queued,
+ * make sure to submit it to avoid deadlocks.
+ */
+ if (blk_needs_flush_plug(tsk))
+ blk_schedule_flush_plug(tsk);
+}
+
asmlinkage void schedule(void)
{
+ struct task_struct *tsk = current;
+
+ sched_submit_work(tsk);
__schedule();
}
EXPORT_SYMBOL(schedule);
[-- Attachment #4: sched-separate-the-scheduler-entry-for-preemption.patch --]
[-- Type: text/x-patch, Size: 2433 bytes --]
Subject: sched: Separate the scheduler entry for preemption
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 22 Jun 2011 19:47:00 +0200
Block-IO and workqueues call into notifier functions from the
scheduler core code with interrupts and preemption disabled. These
calls should be made before entering the scheduler core.
To simplify this, separate the scheduler core code into
__schedule(). __schedule() is directly called from the places which
set PREEMPT_ACTIVE and from schedule(). This allows us to add the work
checks into schedule(), so they are only called when a task voluntary
goes to sleep.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110622174918.813258321@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/sched.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -4210,9 +4210,9 @@ pick_next_task(struct rq *rq)
}
/*
- * schedule() is the main scheduler function.
+ * __schedule() is the main scheduler function.
*/
-asmlinkage void __sched schedule(void)
+static void __sched __schedule(void)
{
struct task_struct *prev, *next;
unsigned long *switch_count;
@@ -4300,6 +4300,11 @@ need_resched:
if (need_resched())
goto need_resched;
}
+
+asmlinkage void schedule(void)
+{
+ __schedule();
+}
EXPORT_SYMBOL(schedule);
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
@@ -4373,7 +4378,7 @@ asmlinkage void __sched notrace preempt_
do {
add_preempt_count_notrace(PREEMPT_ACTIVE);
- schedule();
+ __schedule();
sub_preempt_count_notrace(PREEMPT_ACTIVE);
/*
@@ -4401,7 +4406,7 @@ asmlinkage void __sched preempt_schedule
do {
add_preempt_count(PREEMPT_ACTIVE);
local_irq_enable();
- schedule();
+ __schedule();
local_irq_disable();
sub_preempt_count(PREEMPT_ACTIVE);
@@ -5526,7 +5531,7 @@ static inline int should_resched(void)
static void __cond_resched(void)
{
add_preempt_count(PREEMPT_ACTIVE);
- schedule();
+ __schedule();
sub_preempt_count(PREEMPT_ACTIVE);
}
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2011-08-24 10:03 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <4E53ECEF.7040109@kernel.org>
[not found] ` <1314129133.8002.102.camel@twins>
2011-08-23 21:32 ` [kernel.org users] [KORG] Panics on master backend James Bottomley
2011-08-24 9:59 ` Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox