From: Jens Axboe <jens.axboe@oracle.com>
To: Neil Brown <neilb@suse.de>
Cc: Chuck Ebbert <cebbert@redhat.com>,
Brad Campbell <brad@wasp.net.au>,
lkml <linux-kernel@vger.kernel.org>
Subject: Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Date: Wed, 18 Apr 2007 14:37:57 +0200 [thread overview]
Message-ID: <20070418123757.GC3796@kernel.dk> (raw)
In-Reply-To: <17956.22235.574867.179016@notabene.brown>
On Tue, Apr 17 2007, Neil Brown wrote:
> On Monday April 16, cebbert@redhat.com wrote:
> >
> > cfq_dispatch_insert() was called with rq == 0. This one is getting really
> > annoying... and md is involved again (RAID0 this time.)
>
> Yeah... weird.
> RAID0 is so light-weight and so different from RAID1 or RAID5 that I
> feel fairly safe concluding that the problem isn't in or near md.
> But that doesn't help you.
Well the fact is that we have 3-4 distinct reports of this oops, and all
of them are using md. No reports have been filed where md isn't managing
the disks. While this doesn't conclusively put the finger of blame on
md, it is still likely. It could bug a CFQ bug too of course, or (more
likely), some bad interaction between md and CFQ.
> This really feels like a locking problem.
Very much.
> The problem occurs when ->next_rq is NULL, but ->sort_list.rb_node is
> not NULL. That happens plenty of times in the code (particularly as
> the first request is inserted) but always under ->queue_lock so it
> should never be visible to cfq_dispatch_insert..
>
> Except that drivers/scsi/ide-scsi.c:idescsi_eh_reset calls
> elv_next_request which could ultimately call __cfq_dispatch_requests
> without taking ->queue_lock (that I can see). But you probably aren't
> using ide-scsi (does anyone?).
>
> Given that interrupts are always disabled when queue_lock is taken, it
> might be useful to add
> WARN_ON(!irqs_disabled());
> every time ->next_rq is set.
> Something like the following.
>
> It might show something useful.... if we are lucky.
I had something similar for generic_unplug_request() as well, but didn't
see/hear any reports of it being tried out. Here's a complete debugging
patch for this and other potential dangers.
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index b6491c0..8f749aa 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -587,8 +587,11 @@ static void cfq_remove_request(struct request *rq)
{
struct cfq_queue *cfqq = RQ_CFQQ(rq);
- if (cfqq->next_rq == rq)
+ BUG_ON(!irqs_disabled());
+ if (cfqq->next_rq == rq) {
cfqq->next_rq = cfq_find_next_rq(cfqq->cfqd, cfqq, rq);
+ BUG_ON(!cfqq->next_rq && !RB_EMPTY_ROOT(&cfqq->sort_list));
+ }
list_del_init(&rq->queuelist);
cfq_del_rq_rb(rq);
@@ -1637,6 +1640,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
/*
* check if this request is a better next-serve candidate)) {
*/
+ BUG_ON(!irqs_disabled());
cfqq->next_rq = cfq_choose_req(cfqd, cfqq->next_rq, rq);
BUG_ON(!cfqq->next_rq);
diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 3de0695..c16863e 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1602,6 +1602,8 @@ EXPORT_SYMBOL(__generic_unplug_device);
**/
void generic_unplug_device(request_queue_t *q)
{
+ BUG_ON(irqs_disabled());
+
spin_lock_irq(q->queue_lock);
__generic_unplug_device(q);
spin_unlock_irq(q->queue_lock);
--
Jens Axboe
next prev parent reply other threads:[~2007-04-18 12:39 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-04-15 10:14 [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert Brad Campbell
2007-04-15 10:49 ` Brad Campbell
2007-04-15 23:53 ` Adrian Bunk
2007-04-16 3:23 ` Brad Campbell
2007-04-16 22:39 ` Chuck Ebbert
2007-04-17 5:10 ` Neil Brown
2007-04-17 8:13 ` Brad Campbell
2007-04-17 11:48 ` Brad Campbell
2007-04-17 20:39 ` Bartlomiej Zolnierkiewicz
2007-04-18 12:37 ` Jens Axboe [this message]
2007-04-18 13:19 ` Brad Campbell
2007-04-18 13:21 ` Jens Axboe
2007-04-22 7:37 ` Brad Campbell
2007-04-23 7:35 ` Jens Axboe
2007-04-24 19:40 ` Brad Campbell
2007-04-25 8:34 ` Neil Brown
2007-04-25 8:46 ` Jens Axboe
2007-04-25 9:34 ` Jens Axboe
2007-04-25 9:37 ` Neil Brown
2007-04-25 9:47 ` Jens Axboe
2007-04-25 10:02 ` Brad Campbell
2007-04-25 10:18 ` Jens Axboe
2007-04-25 13:59 ` Roland Kuhn
2007-04-25 10:25 ` Neil Brown
2007-04-25 10:36 ` Jens Axboe
2007-04-25 9:54 ` Brad Campbell
2007-04-25 8:50 ` Brad Campbell
2007-04-25 10:06 ` Brad Campbell
2007-04-25 10:59 ` Neil Brown
2007-04-25 11:17 ` Degraded RAID performance - Was : " Brad Campbell
2007-04-18 13:19 ` Jens Axboe
[not found] <79880979-51BB-4D28-A3E8-3AE0F56F5B0A@e18.physik.tu-muenchen.de>
[not found] ` <20070424091807.GA3744@kernel.dk>
[not found] ` <6A6800B3-F9C8-4046-9E1C-A8CEA81B2CE0@e18.physik.tu-muenchen.de>
[not found] ` <20070424093904.GB3744@kernel.dk>
[not found] ` <20070424094003.GC3744@kernel.dk>
2007-04-24 12:27 ` Roland Kuhn
2007-04-24 12:32 ` Jens Axboe
2007-04-24 13:03 ` Roland Kuhn
2007-04-24 13:07 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070418123757.GC3796@kernel.dk \
--to=jens.axboe@oracle.com \
--cc=brad@wasp.net.au \
--cc=cebbert@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=neilb@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox