* Re: [Fwd: Block driver freezes when using CFQ]
[not found] <454313C9.4010602@adaptec.com>
@ 2006-10-30 8:22 ` Jens Axboe
2006-10-31 4:57 ` Block driver freezes when using CFQ Ravi Krishnamurthy
0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2006-10-30 8:22 UTC (permalink / raw)
To: Ravi Krishnamurthy; +Cc: linux-kernel
On Sat, Oct 28 2006, Ravi Krishnamurthy wrote:
> Hi all,
>
> I have written a block driver that registers a virtual device and
> routes requests to appropriate real devices after some re-mapping of
> the requests. I am testing the driver by creating a filesystem on the
> virtual device and copying a large number of files on to it. The test
> causes the device to become unresponsive after some time. After some
> debugging, I noticed that this happens only if the I/O scheduler being
> used is CFQ. I have not had any trouble if the scheduler is noop,
> anticipatory or deadline. The problem occurs on all the kernels I have
> tested - 2.6.18-rc2, 2.6.18-rc4, 2.6.19-rc3.
>
> Below are some details about the driver and what I have observed during
> testing:
>
> The request function registered by my driver is a simple loop -
>
>     while ((req = elv_next_request(q))) {
>         blkdev_dequeue_request(req);
>
>         /*
>          * Add request to an internal queue for further processing
>          * Wake up thread to start processing the queue
>          * Update some variables for book-keeping
>          */
>     }
>
> Completed requests are handled in a different thread -
>     while (work to be done) {
>         /*
>          * Dequeue completed requests from internal queue
>          * Call end_that_request_first() and end_that_request_last()
>          * Update some variables for book-keeping
>          */
>     }
The io scheduler is not obligated to recall your request handling
function, _unless_ you have no pending io at the point where
elv_next_request() returns NULL but there are things pending. IOW, when
you complete your requests you want to just recall your request handling
function. Just insert something ala:
    if (elv_next_request(q))
        q->request_fn(q);
when you are done completing requests.
Does that fix it?
--
Jens Axboe
* Re: Block driver freezes when using CFQ
2006-10-30 8:22 ` [Fwd: Block driver freezes when using CFQ] Jens Axboe
@ 2006-10-31 4:57 ` Ravi Krishnamurthy
2006-10-31 7:10 ` Jens Axboe
0 siblings, 1 reply; 5+ messages in thread
From: Ravi Krishnamurthy @ 2006-10-31 4:57 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel
Jens Axboe wrote:
> On Sat, Oct 28 2006, Ravi Krishnamurthy wrote:
>> Hi all,
>>
>> I have written a block driver that registers a virtual device and
>> routes requests to appropriate real devices after some re-mapping of
>> the requests. I am testing the driver by creating a filesystem on the
>> virtual device and copying a large number of files on to it. The test
>> causes the device to become unresponsive after some time. After some
>> debugging, I noticed that this happens only if the I/O scheduler being
>> used is CFQ. I have not had any trouble if the scheduler is noop,
>> anticipatory or deadline. The problem occurs on all the kernels I have
>> tested - 2.6.18-rc2, 2.6.18-rc4, 2.6.19-rc3.
>>
>
> The io scheduler is not obligated to recall your request handling
> function, _unless_ you have no pending io at the point where
> elv_next_request() returns NULL but there are things pending.
> IOW, when you complete your requests you want to just recall your request handling
> function. Just insert something ala:
>
> if (elv_next_request(q))
> q->request_fn(q);
>
> when you are done completing requests.
>
> Does that fix it?
I haven't had a chance to test this fix. A workaround I had tried was to
insert these lines at the end of the request function:
    if (!elv_queue_empty(q))
        blk_plug_device(q);
This worked for me. So I assume the fix you have suggested will surely
work.
I am curious to know why the problem does not occur when I am using the
anticipatory scheduler. Also, in the suggested fix, is it guaranteed that
elv_next_request() will not return NULL as long as the elevator queue is
not empty?
Thanks,
Ravi.
* Re: Block driver freezes when using CFQ
2006-10-31 4:57 ` Block driver freezes when using CFQ Ravi Krishnamurthy
@ 2006-10-31 7:10 ` Jens Axboe
0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2006-10-31 7:10 UTC (permalink / raw)
To: Ravi Krishnamurthy; +Cc: linux-kernel
On Tue, Oct 31 2006, Ravi Krishnamurthy wrote:
> Jens Axboe wrote:
> >On Sat, Oct 28 2006, Ravi Krishnamurthy wrote:
> >>Hi all,
> >>
> >> I have written a block driver that registers a virtual device and
> >>routes requests to appropriate real devices after some re-mapping of
> >>the requests. I am testing the driver by creating a filesystem on the
> >>virtual device and copying a large number of files on to it. The test
> >>causes the device to become unresponsive after some time. After some
> >>debugging, I noticed that this happens only if the I/O scheduler being
> >>used is CFQ. I have not had any trouble if the scheduler is noop,
> >>anticipatory or deadline. The problem occurs on all the kernels I have
> >>tested - 2.6.18-rc2, 2.6.18-rc4, 2.6.19-rc3.
> >>
>
>
> >
> >The io scheduler is not obligated to recall your request handling
> >function, _unless_ you have no pending io at the point where
> >elv_next_request() returns NULL but there are things pending.
> >IOW, when you complete your requests you want to just recall your request
> >handling
> >function. Just insert something ala:
> >
> > if (elv_next_request(q))
> > q->request_fn(q);
> >
> >when you are done completing requests.
> >
> >Does that fix it?
>
> I haven't had a chance to test this fix. A workaround I had tried was to
> insert these lines at the end of the request function:
> if (! elv_queue_empty(q))
> blk_plug_device(q);
>
> This worked for me. So I assume the fix you have suggested will surely
> work.
You don't want to do that. It is the duty of the plugger to unplug the
device again, and in your case that is probably deferred to the timer
auto-unplug. So don't involve plugging, it's a separate thing. Just
leave the request function when elv_next_request() returns NULL, and
always recall it when you are done completing requests.
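To illustrate the pattern (return from the request function once nothing
more is handed out, and re-run it after completions), here is a crude
userspace model. It is only a sketch under stated assumptions: every
toy_* name is made up for illustration and is not a kernel API, and the
"hold back while a request is in flight" rule is a deliberately
simplified stand-in for a scheduler that withholds requests, not a
description of what CFQ actually does.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are kernel APIs. */
struct toy_queue {
    int pending;    /* requests still held by the toy scheduler */
    int in_flight;  /* requests handed to the driver, not yet completed */
    int completed;  /* requests the driver has finished */
};

/*
 * Assumed scheduler behaviour: hand out at most one request at a time,
 * and report "nothing available" while one is in flight, even if more
 * requests are pending.
 */
static bool toy_has_next(const struct toy_queue *q)
{
    return q->pending > 0 && q->in_flight == 0;
}

/* Driver request function: take whatever the scheduler will give. */
static void toy_request_fn(struct toy_queue *q)
{
    while (toy_has_next(q)) {
        q->pending--;
        q->in_flight++;
    }
}

/* Completion path; 'rerun' selects whether the request function is re-run. */
static void toy_complete_one(struct toy_queue *q, bool rerun)
{
    assert(q->in_flight > 0);
    q->in_flight--;
    q->completed++;
    if (rerun && toy_has_next(q))
        toy_request_fn(q);
}

/*
 * Submit n requests, kick the request function once, then complete
 * requests until nothing is in flight. Returns how many completed.
 */
static int toy_run(int n, bool rerun)
{
    struct toy_queue q = { .pending = n, .in_flight = 0, .completed = 0 };

    toy_request_fn(&q);
    while (q.in_flight > 0)
        toy_complete_one(&q, rerun);
    return q.completed;
}
```

In this model, toy_run(5, false) stops after a single request because
nothing ever calls the request function again, while toy_run(5, true)
drains all five.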
> I am curious to know why the problem does not occur when I am using the
> anticipatory scheduler. Also, in the suggested fix, is it guaranteed that
> elv_next_request() will not return NULL as long as the elevator queue is
> not empty?
Perhaps it recalls ->request_fn() more often than it should. If you call
--
Jens Axboe
* Block driver freezes when using CFQ
@ 2006-10-28 6:58 Ravi Krishnamurthy
2006-10-30 6:22 ` Tejun Heo
0 siblings, 1 reply; 5+ messages in thread
From: Ravi Krishnamurthy @ 2006-10-28 6:58 UTC (permalink / raw)
To: linux-kernel; +Cc: axboe
Hi all,
I have written a block driver that registers a virtual device and
routes requests to appropriate real devices after some re-mapping of
the requests. I am testing the driver by creating a filesystem on the
virtual device and copying a large number of files on to it. The test
causes the device to become unresponsive after some time. After some
debugging, I noticed that this happens only if the I/O scheduler being
used is CFQ. I have not had any trouble if the scheduler is noop,
anticipatory or deadline. The problem occurs on all the kernels I have
tested - 2.6.18-rc2, 2.6.18-rc4, 2.6.19-rc3.
Below are some details about the driver and what I have observed during
testing:
The request function registered by my driver is a simple loop -
    while ((req = elv_next_request(q))) {
        blkdev_dequeue_request(req);
        /*
         * Add request to an internal queue for further processing
         * Wake up thread to start processing the queue
         * Update some variables for book-keeping
         */
    }
Completed requests are handled in a different thread -
    while (work to be done) {
        /*
         * Dequeue completed requests from internal queue
         * Call end_that_request_first() and end_that_request_last()
         * Update some variables for book-keeping
         */
    }
Several times during the test run, the while() loop in the request
function comes out without dequeuing any request even though the
elevator queue is not empty. (Confirmed by printing the return value of
elv_queue_empty(), and the values of q->rq.count[] outside the loop).
After one such occurrence, the request function is not called at all
and the device becomes unresponsive.
I added some code that lets me trigger the request function from userspace.
If I nudge the driver this way, I/Os continue for a short while and stop
again.
Since CFQ is the default I/O scheduler in current kernels, it has been
widely used and tested. So I suspect I am not doing something right in my
driver. Since the driver works well with the other schedulers, is there
something CFQ-specific that I should take care of?
Please Cc me on the responses since I am not subscribed to lkml.
Thanks,
Ravi.
* Re: Block driver freezes when using CFQ
2006-10-28 6:58 Ravi Krishnamurthy
@ 2006-10-30 6:22 ` Tejun Heo
0 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2006-10-30 6:22 UTC (permalink / raw)
To: Ravi Krishnamurthy; +Cc: linux-kernel, axboe
Hello, Ravi.
First of all, it usually attracts more people if you include full source
code of something runnable.
Ravi Krishnamurthy wrote:
[--snip--]
> Several times during the test run, the while() loop in the request
> function comes out without dequeuing any request even though the
> elevator queue is not empty. (Confirmed by printing the return value of
> elv_queue_empty(), and the values of q->rq.count[] outside the loop).
Yeap, both cfq and anticipatory pause queue processing to improve
performance. This is a bit counter-intuitive at first but the loooong
seek time justifies such pauses if they can reduce seeks.
> After one such occurrence, the request function is not called at all
> and the device becomes unresponsive.
> I added some code that lets me trigger the request function from userspace.
> If I nudge the driver this way, I/Os continue for a short while and stop
> again.
>
> Since CFQ is the default I/O scheduler in current kernels, it has been
> widely used and tested. So I suspect I am not doing something right in my
> driver. Since the driver works well with the other schedulers, is there
> something CFQ-specific that I should take care of?
After such pauses, cfq does the needed 'nudging' by itself. cfq has
changed quite a bit so I might be mistaken, but such 'nudging' ends up
calling blk_start_queueing(), which either runs request_fn directly or
unplugs the queue if it is plugged. So, does your driver's queue have a
proper unplug function? How is your queue initialized? (You can see why
it's much better to post full working source.)
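The two paths described above can be pictured with a small userspace
model. This is a sketch, not kernel code: the model_* names are
hypothetical, and it assumes that unplugging simply clears a "plugged"
flag and then runs the driver's request function.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; these names are illustrative, not kernel APIs. */
struct model_queue {
    bool plugged;          /* true while dispatch is being held back */
    int request_fn_calls;  /* how many times the driver was asked to run */
};

/* Stand-in for the driver's request function. */
static void model_request_fn(struct model_queue *q)
{
    q->request_fn_calls++;
}

/* Assumed unplug behaviour: clear the flag, then let the driver run. */
static void model_unplug(struct model_queue *q)
{
    if (!q->plugged)
        return;            /* nothing to do if the queue was not plugged */
    q->plugged = false;
    model_request_fn(q);
}

/* The "nudge": run the driver directly, or go through unplug if plugged. */
static void model_start_queueing(struct model_queue *q)
{
    if (!q->plugged)
        model_request_fn(q);
    else
        model_unplug(q);
}

/* Nudge a fresh queue once and report how often the driver ran. */
static int model_nudge(bool plugged)
{
    struct model_queue q = { .plugged = plugged, .request_fn_calls = 0 };

    model_start_queueing(&q);
    assert(!q.plugged);
    return q.request_fn_calls;
}
```

Either way the driver's request function runs once in this model; the
plugged path only works if the unplug step really ends by running it,
which is the point of the question about the unplug function.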
--
tejun