From: Jens Axboe <axboe@kernel.dk>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Tao Ma <tm@tao.ma>, linux-kernel@vger.kernel.org
Subject: Re: CFQ: async queue blocks the whole system
Date: Thu, 09 Jun 2011 16:34:08 +0200
Message-ID: <4DF0D9E0.1060107@kernel.dk>
In-Reply-To: <20110609141451.GD29913@redhat.com>

On 2011-06-09 16:14, Vivek Goyal wrote:
> On Thu, Jun 09, 2011 at 06:49:37PM +0800, Tao Ma wrote:
>> Hi Jens and Vivek,
>> We are currently running some heavy ext4 metadata tests,
>> and we found a very severe problem for CFQ. Please correct me if
>> my statement below is wrong.
>>
>> CFQ only has an async queue for every priority of every class and
>> these queues have a very low serving priority, so if the system
>> has a large number of sync reads, these queues will be delayed for a
>> long time. As a result, the flushers will be blocked, then the
>> journal and finally our applications[1].
>>
>> I have tried to let jbd/2 use WRITE_SYNC so that they can checkpoint
>> in time, and the patches have been sent. But today we found another similar
>> block in kswapd, which makes me think that maybe CFQ should be changed
>> somehow so that all these callers can benefit from it.
>>
>> So is there any way to let the async queue work in a timely manner, or at
>> least is there any deadline for the async queue to finish a request in time
>> even when there are many reads?
>>
>> btw, we have tested the deadline scheduler and it seems to work in our test.
>>
>> [1] the message we get from one system:
>> INFO: task flush-8:0:2950 blocked for more than 120 seconds.
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> flush-8:0 D ffff88062bfde738 0 2950 2 0x00000000
>> ffff88062b137820 0000000000000046 ffff88062b137750 ffffffff812b7bc3
>> ffff88032cddc000 ffff88062bfde380 ffff88032d3d8840 0000000c2be37400
>> 000000002be37601 0000000000000006 ffff88062b137760 ffffffff811c242e
>> Call Trace:
>> [<ffffffff812b7bc3>] ? scsi_request_fn+0x345/0x3df
>> [<ffffffff811c242e>] ? __blk_run_queue+0x1a/0x1c
>> [<ffffffff811c57cc>] ? queue_unplugged+0x77/0x8e
>> [<ffffffff813dbe67>] io_schedule+0x47/0x61
>> [<ffffffff811c512c>] get_request_wait+0xe0/0x152
>
> Ok, so flush slept while trying to get a "request" allocated on the request
> queue. That means all the ASYNC request descriptors are already consumed
> and we are not making progress with ASYNC requests.
>
> A relatively recent patch allowed sync queues to always preempt async queues
> and schedule sync workload instead of async. This had the potential to
> starve async queues and looks like that's what we are running into.
>
> commit f8ae6e3eb8251be32c6e913393d9f8d9e0609489
> Author: Shaohua Li <shaohua.li@intel.com>
> Date: Fri Jan 14 08:41:02 2011 +0100
>
> block cfq: make queue preempt work for queues from different workload
>
> Do you have a few seconds of blktrace? I just wanted to verify that this
> is what we are running into.

That's a good first step. Tao Ma, is this a known regression or is that
unknown?

On vacation this week, I'll look into it as soon as I get back.

--
Jens Axboe