From: Jens Axboe <jens.axboe@oracle.com>
To: Neil Brown <neilb@suse.de>
Cc: Brad Campbell <brad@wasp.net.au>,
Chuck Ebbert <cebbert@redhat.com>,
lkml <linux-kernel@vger.kernel.org>
Subject: Re: [OOPS] 2.6.21-rc6-git5 in cfq_dispatch_insert
Date: Wed, 25 Apr 2007 10:46:07 +0200
Message-ID: <20070425084607.GM9715@kernel.dk>
In-Reply-To: <17967.4734.783140.512857@notabene.brown>
On Wed, Apr 25 2007, Neil Brown wrote:
> On Tuesday April 24, brad@wasp.net.au wrote:
> > [105449.653682] cfq: rbroot not empty, but ->next_rq == NULL! Fixing up, report the issue to
> > lkml@vger.kernel.org
> > [105449.683646] cfq: busy=1,drv=0,timer=0
> > [105449.694871] cfq rr_list:
> > [105449.702715] 3108: sort=0,next=00000000,q=0/1,a=1/0,d=0/0,f=69
> > [105449.720693] cfq busy_list:
> > [105449.729054] cfq idle_list:
> > [105449.737418] cfq cur_rr:
>
> Ok, I have a theory.
>
> An ELEVATOR_FRONT_MERGE occurs which changes req->sector and calls
> ->elevator_merged_fn which is cfq_merged_request.
>
> At this time there is already a request in cfq with the same sector
> number, and that request is the only other request on the queue.
>
> cfq_merged_request calls cfq_reposition_rq_rb which removes the
> req from ->sortlist and then calls cfq_add_rq_rb to add it back (at
> the new location because ->sector has changed).
>
> cfq_add_rq_rb finds there is already a request with the same sector
> number and so elv_rb_add returns an __alias which is passed to
> cfq_dispatch_insert.
> This calls cfq_remove_request and as that is the only request present,
> ->next_rq gets set to NULL.
> The old request with the new sector number is then added to the
> ->sortlist, but ->next_rq is never set - it remains NULL.
>
> How likely it would be to get two requests with the same sector number
> I don't know. I wouldn't expect it to ever happen - I have seen it
> before, but it was due to a bug in ext3. Maybe XFS does it
> intentionally sometimes?
>
> You could test this theory by putting a
> WARN_ON(cfqq->next_rq == NULL);
> at the end of cfq_reposition_rq_rb, just after the cfq_add_rq_rb call.
>
> I will leave the development of a suitable fix up to Jens if he agrees
> that this is possible.

That's pretty close to where I think the problem is (the front merging
and cfq_reposition_rq_rb()). The issue with that is that you'd only get
aliases for O_DIRECT and/or raw IO, and that doesn't seem to be the case
here. Given that front merges are also not very likely, I'd be
surprised if something like that has ever happened.

BUT... That may explain why we are only seeing it on md. Would md
ever be issuing such requests that trigger this condition?

I'll try and concoct a test case.

--
Jens Axboe