From: Shaohua Li <shaohua.li@intel.com>
To: Jens Axboe <jaxboe@fusionio.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"hch@infradead.org" <hch@infradead.org>,
"jmoyer@redhat.com" <jmoyer@redhat.com>,
Vivek Goyal <vgoyal@redhat.com>
Subject: Re: [PATCH 04/10] block: initial patch for on-stack per-task plugging
Date: Fri, 18 Mar 2011 09:55:40 +0800
Message-ID: <1300413340.2337.129.camel@sli10-conroe>
In-Reply-To: <4D81D813.8060608@fusionio.com>
On Thu, 2011-03-17 at 17:44 +0800, Jens Axboe wrote:
> On 2011-03-17 04:19, Shaohua Li wrote:
> > On Thu, 2011-03-17 at 09:00 +0800, Shaohua Li wrote:
> >> On Thu, 2011-03-17 at 01:31 +0800, Vivek Goyal wrote:
> >>> On Wed, Mar 16, 2011 at 04:18:30PM +0800, Shaohua Li wrote:
> >>>> 2011/1/22 Jens Axboe <jaxboe@fusionio.com>:
> >>>>> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
> >>>>> ---
> >>>>> block/blk-core.c | 357 ++++++++++++++++++++++++++++++++------------
> >>>>> block/elevator.c | 6 +-
> >>>>> include/linux/blk_types.h | 2 +
> >>>>> include/linux/blkdev.h | 30 ++++
> >>>>> include/linux/elevator.h | 1 +
> >>>>> include/linux/sched.h | 6 +
> >>>>> kernel/exit.c | 1 +
> >>>>> kernel/fork.c | 3 +
> >>>>> kernel/sched.c | 11 ++-
> >>>>> 9 files changed, 317 insertions(+), 100 deletions(-)
> >>>>>
> >>>>> diff --git a/block/blk-core.c b/block/blk-core.c
> >>>>> index 960f12c..42dbfcc 100644
> >>>>> --- a/block/blk-core.c
> >>>>> +++ b/block/blk-core.c
> >>>>> @@ -27,6 +27,7 @@
> >>>>> #include <linux/writeback.h>
> >>>>> #include <linux/task_io_accounting_ops.h>
> >>>>> #include <linux/fault-inject.h>
> >>>>> +#include <linux/list_sort.h>
> >>>>>
> >>>>> #define CREATE_TRACE_POINTS
> >>>>> #include <trace/events/block.h>
> >>>>> @@ -213,7 +214,7 @@ static void blk_delay_work(struct work_struct *work)
> >>>>>
> >>>>> q = container_of(work, struct request_queue, delay_work.work);
> >>>>> spin_lock_irq(q->queue_lock);
> >>>>> - q->request_fn(q);
> >>>>> + __blk_run_queue(q);
> >>>>> spin_unlock_irq(q->queue_lock);
> >>>>> }
> >>>> Hi Jens,
> >>>> I have some questions about the per-task plugging. The request
> >>>> list is per-task, and each task delivers its requests either when it
> >>>> flushes the plug or when it schedules. But when one cpu delivers requests
> >>>> to the global queue, other cpus don't know. This seems to be a problem.
> >>>> For example:
> >>>> 1. get_request_wait() can only flush the current task's request list;
> >>>> other cpus/tasks might still have a lot of requests which haven't been
> >>>> sent to the request_queue.
> >>>
> >>> But these requests will be sent to the request queue very soon, as soon
> >>> as the task is either scheduled out or explicitly flushes the plug? So we
> >>> might wait a bit longer, but that might not matter in general, I guess.
> >> Yes, I understand there is just a bit of delay. I don't know how severe it
> >> is, but this still could be a problem, especially for fast storage or
> >> random I/O. My current tests show a slight regression (3% or so) with
> >> Jens's for 2.6.39/core branch. I'm still checking whether it's caused by
> >> the per-task plug, but the per-task plug is highly suspected.
> >>
> >>>> Your ioc-rq-alloc branch is for this, right? Will it
> >>>> be pushed to 2.6.39 too? I'm wondering if we should limit the per-task
> >>>> queue length. If there are enough requests there, we force a flush
> >>>> plug.
> >>>
> >>> That's the idea Jens had. But then came the question of maintaining
> >>> data structures per task per disk. That makes it complicated.
> >>>
> >>> Even if we move the accounting out of the request queue and do it, say,
> >>> at the bdi, ideally we should do per task per bdi accounting.
> >>>
> >>> Jens seemed to be suggesting that generally flusher threads are the
> >>> main culprit for submitting large amounts of IO. They are already per
> >>> bdi. So probably just maintain a per task limit for flusher threads.
> >> Yep, the flusher is the main spot in my mind. We need to call flush plug
> >> more often in the flusher thread.
> >>
> >>> I am not sure what happens to the direct reclaim path, AIO deep queue
> >>> paths etc.
> >> The direct reclaim path could build a deep write queue too. It
> >> uses .writepage, and currently there is no flush plug there. Maybe we need
> >> to add a flush plug in shrink_inactive_list too.
> >>
> >>>> 2. Some APIs like blk_delay_work, which call __blk_run_queue(), might
> >>>> not work, because other CPUs might not have dispatched their requests to
> >>>> the request queue. So __blk_run_queue() will find no requests,
> >>>> which might stall devices.
> >>>> Since one cpu doesn't know other cpus' request lists, I'm wondering if
> >>>> there are other similar issues.
> >>>
> >>> So again in this case, if the queue is empty at the time of
> >>> __blk_run_queue(), then we will probably just experience a little more
> >>> delay than intended till some task flushes. But it should not stall the
> >>> system?
> >> It does not stall the system, but the device stalls for a little time.
> > Jens,
> > I need the patch below to recover a ffsb fsync workload, which has about
> > a 30% regression with stack plug.
> > I guess the reason is that WRITE_SYNC_PLUG doesn't work now, so if a
> > context doesn't have a blk_plug, we lose the previous plugging (request
> > merge). This suggests all places where we used WRITE_SYNC_PLUG before (for
> > example, kjournald) should have a blk_plug context.
>
> Good point, those should be auto-converted. I'll take this patch and
> double check the others. Thanks!
>
> Does it remove that performance regression completely?
Yes, it removes the regression completely on my side.
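
To make the "limit the per-task queue length and force a flush plug" idea from
earlier in this thread concrete, here is a minimal userspace sketch. It is only
a toy model under stated assumptions: toy_queue, toy_plug, PLUG_MAX_PENDING and
the toy_plug_*() helpers are hypothetical names invented for illustration, not
the kernel's blk_plug API, and the cap of 4 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on requests held privately by one task's plug. */
#define PLUG_MAX_PENDING 4

/* Toy stand-in for the shared (global) request queue. */
struct toy_queue {
    size_t dispatched;   /* requests now visible to the device */
    size_t flushes;      /* number of batch hand-offs so far */
};

/* Toy stand-in for an on-stack, per-task plug. */
struct toy_plug {
    struct toy_queue *q;
    size_t pending;      /* requests held privately by this task */
};

/* Begin plugging: requests added from now on stay private to the task. */
static void toy_plug_start(struct toy_plug *plug, struct toy_queue *q)
{
    plug->q = q;
    plug->pending = 0;
}

/* Hand every privately held request over to the shared queue. */
static void toy_plug_flush(struct toy_plug *plug)
{
    assert(plug->q != NULL);
    if (plug->pending == 0)
        return;
    plug->q->dispatched += plug->pending;
    plug->q->flushes++;
    plug->pending = 0;
}

/* Add one request; force a flush once the private list hits the cap. */
static void toy_plug_add(struct toy_plug *plug)
{
    assert(plug->q != NULL);
    plug->pending++;
    if (plug->pending >= PLUG_MAX_PENDING)
        toy_plug_flush(plug);
}
```

In this model a task calls toy_plug_start(), then toy_plug_add() once per
request, and toy_plug_flush() when it is done. Until a flush happens, nothing
the task queued is visible in toy_queue, which is the delay discussed above;
the cap only bounds how many requests can be hidden at once.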