public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: "Alan D. Brunelle" <Alan.Brunelle@hp.com>
Cc: linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
	Corrado Zoccolo <czoccolo@gmail.com>
Subject: Re: [RFC] Block IO Controller V2 - some results
Date: Wed, 18 Nov 2009 10:32:27 -0500	[thread overview]
Message-ID: <20091118153227.GA5796@redhat.com> (raw)
In-Reply-To: <1258461527.2862.2.camel@cail>

On Tue, Nov 17, 2009 at 07:38:47AM -0500, Alan D. Brunelle wrote:
> On Mon, 2009-11-16 at 17:18 -0500, Vivek Goyal wrote:
> > On Mon, Nov 16, 2009 at 03:51:00PM -0500, Alan D. Brunelle wrote:
> > 
> > [..]
> > > ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> > > 
> > > The next thing to look at is to see what the "penalty" is for the
> > > additional code: see how much bandwidth we lose for the capability
> > > added. Here we see the sum of the system's throughput for the various
> > > tests:
> > > 
> > > ---- ---- - ----------- ----------- ----------- ----------- 
> > > Mode RdWr N    base       ioc off   ioc no idle  ioc idle   
> > > ---- ---- - ----------- ----------- ----------- ----------- 
> > >  rnd   rd 2        17.3        17.1         9.4         9.1 
> > >  rnd   rd 4        27.1        27.1         8.1         8.2 
> > >  rnd   rd 8        37.1        37.1         6.8         7.1 
> > > 
> > 

Hi Alan,

I got hold of a better system where few disks are striped. On this system
I can also see the performance drop and it does come from the fact that
we idle on empty sync-noidle service tree. So with multiple groups we
increase the instances of sync-noidle trees hence more idling and hence
reduced throughput.

With this patch, I had done a little optimization where I don't idle on
a sync-noidle service tree if there are no competing sync-idle or async
queues.

What that means is that if a group is doing only random IO and it does not
have sufficient IO to keep disk busy, then we will move onto next group
and start dispatching from that. So in your tests, with this patch you 
should see better results for random reads with group_idle=0. With
group_idle=1, results will still remain same as we continue to idle on
sync-noidle service tree.

It is working for me. Can you please try it out. You can just apply this
patch on top of V3. (You don't have to apply hw_tag patch from corrodo).

Thanks
Vivek


o Now we don't remove the queue from service tree until we expire it, even if
  queue is empty. So st->count = 0 is not a valid state while we have a cfqq
  at hand. Fix it in arm_slice_timer().

o wait_busy gets set only if group is emtpy. If st->count > 1, then group is
  not empty and wait_busy will not be set. Remove that extra check.

o There is no need to idle on aysnc service tree as it is backlogged most of
  the time if writes are on. Those queues don't get deleted hence don't wait
  on async service tree. Similiarly don't wait on sync-idle service tree as
  we do idling on individual queues if need be. Fixed cfq_should_idle().

o Currently we wait on sync-noidle service tree so that sync-noidle type of
  workload does not get swamped by sync-idle or async type of workload. Don't
  do this idling if there are no sync-idle or async type of queues in the group
  and there are other groups to dispatch the requests from and user has decided
  not to wait on slow groups to achieve better throughput. (group_idle=0).

  This will make sure if some group is doing just random IO and does not
  have sufficient IO to keep the disk busy, we will move onto other groups to
  dispatch the requests from and utilize the storage better. 

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 block/cfq-iosched.c |   35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

Index: linux6/block/cfq-iosched.c
===================================================================
--- linux6.orig/block/cfq-iosched.c	2009-11-17 14:44:09.000000000 -0500
+++ linux6/block/cfq-iosched.c	2009-11-18 10:09:33.000000000 -0500
@@ -899,6 +899,10 @@ static bool cfq_should_idle(struct cfq_d
 {
 	enum wl_prio_t prio = cfqq_prio(cfqq);
 	struct cfq_rb_root *service_tree = cfqq->service_tree;
+	struct cfq_group *cfqg = cfqq->cfqg;
+
+	BUG_ON(!service_tree);
+	BUG_ON(!service_tree->count);
 
 	/* We never do for idle class queues. */
 	if (prio == IDLE_WORKLOAD)
@@ -908,18 +912,18 @@ static bool cfq_should_idle(struct cfq_d
 	if (cfq_cfqq_idle_window(cfqq))
 		return true;
 
+	/* Don't idle on async and sync-idle service trees */
+	if (cfqd->serving_type != SYNC_NOIDLE_WORKLOAD)
+		return false;
 	/*
-	 * Otherwise, we do only if they are the last ones
-	 * in their service tree.
+	 * If there are other competing groups present, don't wait on service
+	 * tree if this is last queue in the group and there are no other
+	 * competing queues (sync-idle or async) queues present
 	 */
-	if (!service_tree)
-		service_tree = service_tree_for(cfqq->cfqg, prio,
-						cfqq_type(cfqq), cfqd);
-
-	if (service_tree->count == 0)
-		return true;
-
-	return (service_tree->count == 1 && cfq_rb_first(service_tree) == cfqq);
+	if (cfqd->nr_groups > 1 && !cfqd->cfq_group_idle)
+		return (service_tree->count == 1 && cfqg->nr_cfqq > 1);
+	else
+		return service_tree->count == 1;
 }
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
@@ -1102,9 +1106,6 @@ static inline bool cfqq_should_wait_busy
 	if (!RB_EMPTY_ROOT(&cfqq->sort_list) || cfqq->cfqg->nr_cfqq > 1)
 		return false;
 
-	if (!cfq_should_idle(cfqq->cfqd, cfqq))
-		return false;
-
 	return true;
 }
 
@@ -1801,7 +1802,10 @@ static bool cfq_arm_slice_timer(struct c
 	/*
 	 * idle is disabled, either manually or by past process history
 	 */
-	if (!cfqd->cfq_slice_idle || !cfq_should_idle(cfqd, cfqq))
+	if (!cfqd->cfq_slice_idle)
+		return false;
+
+	if (!cfq_should_idle(cfqd, cfqq) && !wait_busy)
 		return false;
 
 	/*
@@ -1837,8 +1841,7 @@ static bool cfq_arm_slice_timer(struct c
 	 */
 	st = service_tree_for(cfqq->cfqg, cfqd->serving_prio,
 				SYNC_NOIDLE_WORKLOAD, cfqd);
-	if (!wait_busy && cfqd->serving_type == SYNC_NOIDLE_WORKLOAD
-	    && st->count > 0) {
+	if (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD && st->count > 1) {
 		if (blk_queue_nonrot(cfqd->queue) || cfqd->hw_tag)
 			return false;
 		sl = min(sl, msecs_to_jiffies(CFQ_MIN_TT));

  parent reply	other threads:[~2009-11-18 15:34 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-16 20:51 [RFC] Block IO Controller V2 - some results Alan D. Brunelle
2009-11-16 21:14 ` Vivek Goyal
2009-11-16 21:32   ` Alan D. Brunelle
2009-11-16 21:37     ` Vivek Goyal
2009-11-16 22:18 ` Vivek Goyal
2009-11-17 12:38   ` Alan D. Brunelle
2009-11-17 14:14     ` Vivek Goyal
2009-11-17 16:17       ` Corrado Zoccolo
2009-11-17 16:40         ` Vivek Goyal
2009-11-17 17:30           ` Alan D. Brunelle
2009-11-17 17:44             ` Vivek Goyal
2009-11-17 20:59           ` Corrado Zoccolo
2009-11-17 22:38             ` Vivek Goyal
2009-11-17 23:11               ` Corrado Zoccolo
2009-11-19  0:04                 ` Vivek Goyal
2009-11-19 20:12                   ` Corrado Zoccolo
2009-11-17 16:45         ` Alan D. Brunelle
2009-11-18 15:32     ` Vivek Goyal [this message]
2009-11-18 16:20       ` Corrado Zoccolo
2009-11-18 22:56         ` Vivek Goyal
2009-11-18 23:35           ` Corrado Zoccolo
2009-11-20 14:18             ` Vivek Goyal
2009-11-20 14:28               ` Corrado Zoccolo
2009-11-20 15:04                 ` Vivek Goyal
2009-11-20 18:32                   ` Corrado Zoccolo
2009-11-20 18:42                     ` Vivek Goyal
2009-11-20 19:50                       ` Corrado Zoccolo
2009-11-21 17:57                         ` Corrado Zoccolo
2009-11-23 15:19                           ` Vivek Goyal
2009-11-23 16:22                             ` Corrado Zoccolo
2009-11-17 20:38 ` Alan D. Brunelle
2009-11-19 16:57   ` Vivek Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091118153227.GA5796@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=Alan.Brunelle@hp.com \
    --cc=czoccolo@gmail.com \
    --cc=jens.axboe@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox