All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <jaxboe@fusionio.com>
To: Shaohua Li <shaohua.li@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"hch@infradead.org" <hch@infradead.org>,
	"jmoyer@redhat.com" <jmoyer@redhat.com>
Subject: Re: [PATCH 04/10] block: initial patch for on-stack per-task  plugging
Date: Fri, 18 Mar 2011 14:52:26 +0100	[thread overview]
Message-ID: <4D83639A.3060407@fusionio.com> (raw)
In-Reply-To: <4D83560E.8000002@fusionio.com>

On 2011-03-18 13:54, Jens Axboe wrote:
> On 2011-03-18 07:36, Shaohua Li wrote:
>> On Thu, 2011-03-17 at 17:43 +0800, Jens Axboe wrote:
>>> On 2011-03-17 02:00, Shaohua Li wrote:
>>>> On Thu, 2011-03-17 at 01:31 +0800, Vivek Goyal wrote:
>>>>> On Wed, Mar 16, 2011 at 04:18:30PM +0800, Shaohua Li wrote:
>>>>>> 2011/1/22 Jens Axboe <jaxboe@fusionio.com>:
>>>>>>> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
>>>>>>> ---
>>>>>>>  block/blk-core.c          |  357 ++++++++++++++++++++++++++++++++------------
>>>>>>>  block/elevator.c          |    6 +-
>>>>>>>  include/linux/blk_types.h |    2 +
>>>>>>>  include/linux/blkdev.h    |   30 ++++
>>>>>>>  include/linux/elevator.h  |    1 +
>>>>>>>  include/linux/sched.h     |    6 +
>>>>>>>  kernel/exit.c             |    1 +
>>>>>>>  kernel/fork.c             |    3 +
>>>>>>>  kernel/sched.c            |   11 ++-
>>>>>>>  9 files changed, 317 insertions(+), 100 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>>>>> index 960f12c..42dbfcc 100644
>>>>>>> --- a/block/blk-core.c
>>>>>>> +++ b/block/blk-core.c
>>>>>>> @@ -27,6 +27,7 @@
>>>>>>>  #include <linux/writeback.h>
>>>>>>>  #include <linux/task_io_accounting_ops.h>
>>>>>>>  #include <linux/fault-inject.h>
>>>>>>> +#include <linux/list_sort.h>
>>>>>>>
>>>>>>>  #define CREATE_TRACE_POINTS
>>>>>>>  #include <trace/events/block.h>
>>>>>>> @@ -213,7 +214,7 @@ static void blk_delay_work(struct work_struct *work)
>>>>>>>
>>>>>>>        q = container_of(work, struct request_queue, delay_work.work);
>>>>>>>        spin_lock_irq(q->queue_lock);
>>>>>>> -       q->request_fn(q);
>>>>>>> +       __blk_run_queue(q);
>>>>>>>        spin_unlock_irq(q->queue_lock);
>>>>>>>  }
>>>>>> Hi Jens,
>>>>>> I have some questions about the per-task plugging. Since the request
>>>>>> list is per-task, and each task delivers its requests at finish flush
>>>>>> or schedule. But when one cpu delivers requests to global queue, other
>>>>>> cpus don't know. This seems to have problem. For example:
>>>>>> 1. get_request_wait() can only flush current task's request list,
>>>>>> other cpus/tasks might still have a lot of requests, which aren't sent
>>>>>> to request_queue.
>>>>>
>>>>> But very soon these requests will be sent to request queue as soon task
>>>>> is either scheduled out or task explicitly flushes the plug? So we might
>>>>> wait a bit longer but that might not matter in general, i guess. 
>>>> Yes, I understand there is just a bit delay. I don't know how severe it
>>>> is, but this still could be a problem, especially for fast storage or
>>>> random I/O. My current tests show slight regression (3% or so) with
>>>> Jens's for 2.6.39/core branch. I'm still checking if it's caused by the
>>>> per-task plug, but the per-task plug is highly suspected.
>>>
>>> To check this particular case, you can always just bump the request
>>> limit. What test is showing a slowdown? 
>> this is a simple multi-threaded seq read. The issue tends to be request
>> merge related (not verified yet). The merge reduces about 60% with stack
>> plug from fio reported data. From trace, without stack plug, requests
>> from different threads get merged. But with it, such merge is impossible
>> because flush_plug doesn't check merge, I thought we need add it again.
> 
> What we could try is have the plug flush insert be
> ELEVATOR_INSERT_SORT_MERGE and have it lookup potential backmerges.
> 
> Here's a quick hack that does that, I have not tested it at all.

Gave it a quick test spin, as suspected it had a few issues. This one
seems to work. Can you toss it through that workload and see if it fares
better?

diff --git a/block/blk-core.c b/block/blk-core.c
index e1fcf7a..e1b29e7 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -55,7 +55,7 @@ struct kmem_cache *blk_requestq_cachep;
  */
 static struct workqueue_struct *kblockd_workqueue;
 
-static void drive_stat_acct(struct request *rq, int new_io)
+void drive_stat_acct(struct request *rq, int new_io)
 {
 	struct hd_struct *part;
 	int rw = rq_data_dir(rq);
@@ -2685,7 +2685,7 @@ static void flush_plug_list(struct blk_plug *plug)
 		/*
 		 * rq is already accounted, so use raw insert
 		 */
-		__elv_add_request(q, rq, ELEVATOR_INSERT_SORT);
+		__elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);
 	}
 
 	if (q) {
diff --git a/block/blk-merge.c b/block/blk-merge.c
index ea85e20..27a7926 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -429,12 +429,14 @@ static int attempt_merge(struct request_queue *q, struct request *req,
 
 	req->__data_len += blk_rq_bytes(next);
 
-	elv_merge_requests(q, req, next);
+	if (next->cmd_flags & REQ_SORTED) {
+		elv_merge_requests(q, req, next);
 
-	/*
-	 * 'next' is going away, so update stats accordingly
-	 */
-	blk_account_io_merge(next);
+		/*
+		 * 'next' is going away, so update stats accordingly
+		 */
+		blk_account_io_merge(next);
+	}
 
 	req->ioprio = ioprio_best(req->ioprio, next->ioprio);
 	if (blk_rq_cpu_valid(next))
@@ -465,3 +467,15 @@ int attempt_front_merge(struct request_queue *q, struct request *rq)
 
 	return 0;
 }
+
+int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
+			  struct request *next)
+{
+	int ret;
+
+	ret = attempt_merge(q, rq, next);
+	if (ret)
+		drive_stat_acct(rq, 0);
+
+	return ret;
+}
diff --git a/block/blk.h b/block/blk.h
index 49d21af..5b8ecbf 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -103,6 +103,9 @@ int ll_front_merge_fn(struct request_queue *q, struct request *req,
 		      struct bio *bio);
 int attempt_back_merge(struct request_queue *q, struct request *rq);
 int attempt_front_merge(struct request_queue *q, struct request *rq);
+int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
+				struct request *next);
+void drive_stat_acct(struct request *rq, int new_io);
 void blk_recalc_rq_segments(struct request *rq);
 void blk_rq_set_mixed_merge(struct request *rq);
 
diff --git a/block/elevator.c b/block/elevator.c
index 542ce82..88bdf81 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -521,6 +521,33 @@ int elv_merge(struct request_queue *q, struct request **req, struct bio *bio)
 	return ELEVATOR_NO_MERGE;
 }
 
+/*
+ * Returns true if we merged, false otherwise
+ */
+static bool elv_attempt_insert_merge(struct request_queue *q,
+				     struct request *rq)
+{
+	struct request *__rq;
+
+	if (blk_queue_nomerges(q) || blk_queue_noxmerges(q))
+		return false;
+
+	/*
+	 * First try one-hit cache.
+	 */
+	if (q->last_merge && blk_attempt_req_merge(q, q->last_merge, rq))
+		return true;
+
+	/*
+	 * See if our hash lookup can find a potential backmerge.
+	 */
+	__rq = elv_rqhash_find(q, blk_rq_pos(rq));
+	if (__rq && blk_attempt_req_merge(q, __rq, rq))
+		return true;
+
+	return false;
+}
+
 void elv_merged_request(struct request_queue *q, struct request *rq, int type)
 {
 	struct elevator_queue *e = q->elevator;
@@ -647,6 +674,9 @@ void elv_insert(struct request_queue *q, struct request *rq, int where)
 		__blk_run_queue(q, false);
 		break;
 
+	case ELEVATOR_INSERT_SORT_MERGE:
+		if (elv_attempt_insert_merge(q, rq))
+			break;
 	case ELEVATOR_INSERT_SORT:
 		BUG_ON(rq->cmd_type != REQ_TYPE_FS &&
 		       !(rq->cmd_flags & REQ_DISCARD));
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index ec6f72b..d93efcc 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -166,6 +166,7 @@ extern struct request *elv_rb_find(struct rb_root *, sector_t);
 #define ELEVATOR_INSERT_SORT	3
 #define ELEVATOR_INSERT_REQUEUE	4
 #define ELEVATOR_INSERT_FLUSH	5
+#define ELEVATOR_INSERT_SORT_MERGE	6
 
 /*
  * return values from elevator_may_queue_fn

-- 
Jens Axboe


  reply	other threads:[~2011-03-18 13:52 UTC|newest]

Thread overview: 158+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-22  1:17 [PATCH 0/10] On-stack explicit block queue plugging Jens Axboe
2011-01-22  1:17 ` [PATCH 01/10] block: add API for delaying work/request_fn a little bit Jens Axboe
2011-01-22  1:17 ` [PATCH 02/10] ide-cd: convert to blk_delay_queue() for a short pause Jens Axboe
2011-01-22  1:19   ` David Miller
2011-01-22  1:17 ` [PATCH 03/10] scsi: convert to blk_delay_queue() Jens Axboe
2011-01-22  1:17 ` [PATCH 04/10] block: initial patch for on-stack per-task plugging Jens Axboe
2011-01-24 19:36   ` Jeff Moyer
2011-01-24 21:23     ` Jens Axboe
2011-03-10 16:54   ` Vivek Goyal
2011-03-10 19:32     ` Jens Axboe
2011-03-10 19:46       ` Vivek Goyal
2011-03-16  8:18   ` Shaohua Li
2011-03-16 17:31     ` Vivek Goyal
2011-03-17  1:00       ` Shaohua Li
2011-03-17  3:19         ` Shaohua Li
2011-03-17  9:44           ` Jens Axboe
2011-03-18  1:55             ` Shaohua Li
2011-03-17  9:43         ` Jens Axboe
2011-03-18  6:36           ` Shaohua Li
2011-03-18 12:54             ` Jens Axboe
2011-03-18 13:52               ` Jens Axboe [this message]
2011-03-21  6:52                 ` Shaohua Li
2011-03-21  9:20                   ` Jens Axboe
2011-03-22  0:32                     ` Shaohua Li
2011-03-22  7:36                       ` Jens Axboe
2011-03-17  9:39     ` Jens Axboe
2011-01-22  1:17 ` [PATCH 05/10] block: remove per-queue plugging Jens Axboe
2011-01-22  1:31   ` Nick Piggin
2011-03-03 21:23   ` Mike Snitzer
2011-03-03 21:27     ` Mike Snitzer
2011-03-03 22:13     ` Mike Snitzer
2011-03-04 13:02       ` Shaohua Li
2011-03-04 13:20         ` Jens Axboe
2011-03-04 21:43         ` Mike Snitzer
2011-03-04 21:50           ` Jens Axboe
2011-03-04 22:27             ` Mike Snitzer
2011-03-05 20:54               ` Jens Axboe
2011-03-07 10:23                 ` Peter Zijlstra
2011-03-07 19:43                   ` Jens Axboe
2011-03-07 20:41                     ` Peter Zijlstra
2011-03-07 20:46                       ` Jens Axboe
2011-03-08  9:38                         ` Peter Zijlstra
2011-03-08  9:41                           ` Jens Axboe
2011-03-07  0:54             ` Shaohua Li
2011-03-07  8:07               ` Jens Axboe
2011-03-08 12:16       ` Jens Axboe
2011-03-08 20:21         ` Mike Snitzer
2011-03-08 20:27           ` Jens Axboe
2011-03-08 21:36             ` Jeff Moyer
2011-03-09  7:25               ` Jens Axboe
2011-03-08 22:05             ` Mike Snitzer
2011-03-10  0:58               ` Mike Snitzer
2011-04-05  3:05                 ` NeilBrown
2011-04-11  4:50                   ` NeilBrown
2011-04-11  4:50                     ` NeilBrown
2011-04-11  9:19                     ` Jens Axboe
2011-04-11 10:59                       ` NeilBrown
2011-04-11 11:04                         ` Jens Axboe
2011-04-11 11:26                           ` NeilBrown
2011-04-11 11:37                             ` Jens Axboe
2011-04-11 12:05                               ` NeilBrown
2011-04-11 12:11                                 ` Jens Axboe
2011-04-11 12:36                                   ` NeilBrown
2011-04-11 12:48                                     ` Jens Axboe
2011-04-12  1:12                                       ` hch
2011-04-12  8:36                                         ` Jens Axboe
2011-04-12 12:22                                           ` Dave Chinner
2011-04-12 12:28                                             ` Jens Axboe
2011-04-12 12:41                                               ` Dave Chinner
2011-04-12 12:58                                                 ` Jens Axboe
2011-04-12 13:31                                                   ` Dave Chinner
2011-04-12 13:45                                                     ` Jens Axboe
2011-04-12 14:34                                                       ` Dave Chinner
2011-04-12 21:08                                                         ` NeilBrown
2011-04-13  2:23                                                           ` Linus Torvalds
2011-04-13 11:12                                                             ` Peter Zijlstra
2011-04-13 11:23                                                               ` Jens Axboe
2011-04-13 11:41                                                                 ` Peter Zijlstra
2011-04-13 15:13                                                                 ` Linus Torvalds
2011-04-13 17:35                                                                   ` Jens Axboe
2011-04-12 16:58                                                     ` hch
2011-04-12 17:29                                                       ` Jens Axboe
2011-04-12 16:44                                                   ` hch
2011-04-12 16:49                                                     ` Jens Axboe
2011-04-12 16:54                                                       ` hch
2011-04-12 17:24                                                         ` Jens Axboe
2011-04-12 13:40                                               ` Dave Chinner
2011-04-12 13:48                                                 ` Jens Axboe
2011-04-12 23:35                                                   ` Dave Chinner
2011-04-12 16:50                                           ` hch
2011-04-15  4:26                                   ` hch
2011-04-15  6:34                                     ` Jens Axboe
2011-04-17 22:19                                   ` NeilBrown
2011-04-17 22:19                                     ` NeilBrown
2011-04-18  4:19                                     ` NeilBrown
2011-04-18  4:19                                       ` NeilBrown
2011-04-18  6:38                                     ` Jens Axboe
2011-04-18  7:25                                       ` NeilBrown
2011-04-18  8:10                                         ` Jens Axboe
2011-04-18  8:10                                           ` Jens Axboe
2011-04-18  8:33                                           ` NeilBrown
2011-04-18  8:42                                             ` Jens Axboe
2011-04-18  8:42                                               ` Jens Axboe
2011-04-18 21:23                                             ` hch
2011-04-22 15:39                                               ` hch
2011-04-22 16:01                                                 ` Vivek Goyal
2011-04-22 16:10                                                   ` Vivek Goyal
2011-04-18 21:30                                             ` hch
2011-04-18 22:38                                               ` NeilBrown
2011-04-20 10:55                                                 ` hch
2011-04-18  9:19                                           ` hch
2011-04-18  9:40                                             ` [dm-devel] " Hannes Reinecke
2011-04-18  9:40                                               ` Hannes Reinecke
2011-04-18  9:47                                               ` Jens Axboe
2011-04-18  9:46                                             ` Jens Axboe
2011-04-11 11:55                         ` NeilBrown
2011-04-11 12:12                           ` Jens Axboe
2011-04-11 22:58                             ` hch
2011-04-12  6:20                               ` Jens Axboe
2011-04-11 16:59                     ` hch
2011-04-11 21:14                       ` NeilBrown
2011-04-11 22:59                         ` hch
2011-04-12  6:18                         ` Jens Axboe
2011-03-17 15:51               ` Mike Snitzer
2011-03-17 18:31                 ` Jens Axboe
2011-03-17 18:46                   ` Mike Snitzer
2011-03-18  9:15                     ` hch
2011-03-08 12:15     ` Jens Axboe
2011-03-04  4:00   ` Vivek Goyal
2011-03-08 12:24     ` Jens Axboe
2011-03-08 22:10       ` blk-throttle: Use blk_plug in throttle code (Was: Re: [PATCH 05/10] block: remove per-queue plugging) Vivek Goyal
2011-03-09  7:26         ` Jens Axboe
2011-01-22  1:17 ` [PATCH 06/10] block: kill request allocation batching Jens Axboe
2011-01-22  9:31   ` Christoph Hellwig
2011-01-24 19:09     ` Jens Axboe
2011-01-22  1:17 ` [PATCH 07/10] fs: make generic file read/write functions plug Jens Axboe
2011-01-24  3:57   ` Dave Chinner
2011-01-24 19:11     ` Jens Axboe
2011-03-04  4:09   ` Vivek Goyal
2011-03-04 13:22     ` Jens Axboe
2011-03-04 13:25       ` hch
2011-03-04 13:40         ` Jens Axboe
2011-03-04 14:08           ` hch
2011-03-04 22:07             ` Jens Axboe
2011-03-04 23:12               ` hch
2011-03-08 12:38         ` Jens Axboe
2011-03-09 10:38           ` hch
2011-03-09 10:52             ` Jens Axboe
2011-01-22  1:17 ` [PATCH 08/10] read-ahead: use plugging Jens Axboe
2011-01-22  1:17 ` [PATCH 09/10] fs: make mpage read/write_pages() plug Jens Axboe
2011-01-22  1:17 ` [PATCH 10/10] fs: make aio plug Jens Axboe
2011-01-24 17:59   ` Jeff Moyer
2011-01-24 19:09     ` Jens Axboe
2011-01-24 19:15       ` Jeff Moyer
2011-01-24 19:22         ` Jens Axboe
2011-01-24 19:29           ` Jeff Moyer
2011-01-24 19:31             ` Jens Axboe
2011-01-24 19:38               ` Jeff Moyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D83639A.3060407@fusionio.com \
    --to=jaxboe@fusionio.com \
    --cc=hch@infradead.org \
    --cc=jmoyer@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shaohua.li@intel.com \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.