Re: CFQ: async queue blocks the whole system

From: Tao Ma <tm@tao.ma>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-kernel@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
Subject: Re: CFQ: async queue blocks the whole system
Date: Thu, 09 Jun 2011 23:44:21 +0800
Message-ID: <4DF0EA55.10209@tao.ma>
In-Reply-To: <20110609153738.GF29913@redhat.com>

On 06/09/2011 11:37 PM, Vivek Goyal wrote:
> On Thu, Jun 09, 2011 at 10:47:43PM +0800, Tao Ma wrote:
>> Hi Vivek,
>> 	Thanks for the quick response.
>> On 06/09/2011 10:14 PM, Vivek Goyal wrote:
>>> On Thu, Jun 09, 2011 at 06:49:37PM +0800, Tao Ma wrote:
>>>> Hi Jens and Vivek,
>>>> 	We are currently running some heavy ext4 metadata tests,
>>>> and we found a very severe problem with CFQ. Please correct me if
>>>> my statement below is wrong.
>>>>
>>>> CFQ has only one async queue for each priority of each class, and
>>>> these queues have a very low service priority, so if the system
>>>> has a large number of sync reads, these queues will be delayed for
>>>> a long time. As a result, the flushers are blocked, then the
>>>> journal, and finally our applications[1].
>>>>
>>>> I have tried to let jbd/2 use WRITE_SYNC so that they can checkpoint
>>>> in time, and the patches have been sent. But today we found another
>>>> similar block in kswapd, which makes me think that maybe CFQ should be
>>>> changed somehow so that all these callers can benefit from it.
>>>>
>>>> So is there any way to let the async queue be served in a timely manner,
>>>> or at least is there any deadline for the async queue to finish a
>>>> request in time even when there are many reads?
>>>>
>>>> btw, we have tested the deadline scheduler and it seems to work in our test.
>>>>
>>>> [1] the message we get from one system:
>>>> INFO: task flush-8:0:2950 blocked for more than 120 seconds.
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> flush-8:0       D ffff88062bfde738     0  2950      2 0x00000000
>>>>  ffff88062b137820 0000000000000046 ffff88062b137750 ffffffff812b7bc3
>>>>  ffff88032cddc000 ffff88062bfde380 ffff88032d3d8840 0000000c2be37400
>>>>  000000002be37601 0000000000000006 ffff88062b137760 ffffffff811c242e
>>>> Call Trace:
>>>>  [<ffffffff812b7bc3>] ? scsi_request_fn+0x345/0x3df
>>>>  [<ffffffff811c242e>] ? __blk_run_queue+0x1a/0x1c
>>>>  [<ffffffff811c57cc>] ? queue_unplugged+0x77/0x8e
>>>>  [<ffffffff813dbe67>] io_schedule+0x47/0x61
>>>>  [<ffffffff811c512c>] get_request_wait+0xe0/0x152
>>>
>>> Ok, so flush slept while trying to get a "request" allocated on the
>>> request queue. That means all the ASYNC request descriptors are already
>>> consumed and we are not making progress with ASYNC requests.
>>>
>>> A relatively recent patch allowed sync queues to always preempt async queues
>>> and schedule sync workload instead of async. This had the potential to
>>> starve async queues, and it looks like that's what we are running into.
>>>
>>> commit f8ae6e3eb8251be32c6e913393d9f8d9e0609489
>>> Author: Shaohua Li <shaohua.li@intel.com>
>>> Date:   Fri Jan 14 08:41:02 2011 +0100
>>>
>>>     block cfq: make queue preempt work for queues from different workload
>>>
>>> Do you have a few seconds of blktrace? I just wanted to verify that this
>>> is what we are running into.
>> We are using the latest kernel, so the patch is already there. :(
>>
>> You are right that all the requests have been allocated and the flusher
>> is waiting for requests to be available. But the root cause is that under
>> heavy sync reads, the async queue in cfq is delayed too much. I have
>> added some traces in the cfq code path and, after some investigation,
>> I found several interesting things and tried to improve them. But I am not
>> sure whether this is a bug or intentional design.
>>
>> 1. In cfq_dispatch_requests we select a sync queue to serve, but if the
>> queue has too many requests in flight, cfq_slice_used_soon may be
>> true and the cfqq isn't allowed to dispatch, so it wastes part of its
>> timeslice. Then why choose this cfqq? Why not choose a qualified one?
> 
> CFQ in general tries not to drive too deep a queue depth in an effort
> to improve latencies. CFQ is generally recommended for slow SATA drives
> and dispatching too many requests from a single queue can only serve to
> increase the latency.
ok, so do you mean that for a fast drive, cfq isn't recommended and
deadline is always preferred? ;) We have a SAS disk with queue_depth=128,
so it should be a fast drive I guess. :)
> 
>>
>> 2. The async queue isn't allowed to dispatch if there is some sync request
>> in flight, but as most devices now have a greater queue depth, should we
>> improve it somehow? I guess queue_depth should be a valid number to use maybe?
> 
> We seem to be running this batching thing in cfq_may_dispatch() where
> we drain sync requests before async is dispatched and vice versa.
> I am not sure how this batching helps. I think Jens would
> be a better person to comment on that.
> 
> I ran a fio job with a few readers and a few writers. I do see that a few
> times we scheduled the ASYNC workload/queue but did not dispatch a request
> from it, the reason being that there were sync requests in flight. And
> by the time the sync requests finish, the async queue gets preempted.
> 
> So the async queue does get scheduled but never gets a chance to dispatch
> a request because there was sync IO in flight.
yeah, that's one thing I found in my test.
> 
> If there is no major advantage of draining sync requests before async
> is dispatched, I think this should be an easy fix.
>  
>>
>> 3. Even when there is no sync I/O, the async queue isn't allowed to dispatch
>> too many requests because of the check in cfq_may_dispatch "Async queues
>> must wait a bit before being allowed dispatch", so in my test the async
>> queue has several chances to be selected, but it is only allowed
>> to dispatch one request at a time. It is really amazing.
> 
> Again, this is heavily weighted toward improving sync latencies. Say you
> have a queue depth of 128 and you fill it all with async requests because
> right now there is no sync request around. Then a sync request comes in.
> We don't have a way to give it priority, and it might happen that
> it gets executed after 128 async requests have finished (driver and
> drive dependent though).
> 
> So in an attempt to improve sync latencies we don't drive too
> high queue depths.
> 
> It's a latency vs throughput tradeoff.
ok, so it seems that all of these are by design, not bugs. Thanks for the
clarification.

btw, reverting the patch doesn't work. I can still get the livelock.

Regards,
Tao
