public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] cfq-iosched: Implement group idle V2
@ 2010-07-19 17:14 Vivek Goyal
  2010-07-19 17:14 ` [PATCH 1/3] cfq-iosched: Improve time slice charging logic Vivek Goyal
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Vivek Goyal @ 2010-07-19 17:14 UTC (permalink / raw)
  To: linux-kernel, jens.axboe
  Cc: nauman, dpshah, guijianfeng, jmoyer, czoccolo, vgoyal


Hi,

This is V2 of the group_idle implementation patchset. I have done some more
testing since V1 and fixed a couple of bugs.

What's the problem
------------------
On high-end storage (tested on an HP EVA storage array with 12 SATA disks in
RAID 5), CFQ's model of dispatching requests from a single queue at a
time (sequential readers, sync writers, etc.) becomes a bottleneck.
Often we don't drive enough request queue depth to keep all the disks busy,
and overall throughput suffers a lot.

These problems primarily originate from two things: idling on a per-cfq-queue
basis, and the quantum (dispatching only a limited number of requests from a
single queue, and not allowing dispatch from other queues until then). Once
you set slice_idle=0 and raise the quantum, most of CFQ's problems on
high-end storage disappear.

This problem also becomes visible with the IO controller, where one creates
multiple groups and gets fairness, but overall throughput is lower. In
the following table, I am running an increasing number of sequential readers
(1, 2, 4, 8) in 8 groups with weights 100 to 800.
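For reference, the 8-group setup can be sketched roughly as below. This is
a sketch assuming the cgroup v1 blkio controller; the mount point and group
names are illustrative, not the exact script behind the numbers in the table.

```shell
# Create 8 blkio cgroups with weights 100..800, one per test group.
mount -t cgroup -o blkio none /cgroup/blkio
for i in 1 2 3 4 5 6 7 8; do
    mkdir -p /cgroup/blkio/test$i
    # blkio.weight is the v1 proportional-weight knob (range 100-1000).
    echo $((i * 100)) > /cgroup/blkio/test$i/blkio.weight
done
```

Each fio reader job is then run with its tasks placed in the corresponding
group's tasks file.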

Kernel=2.6.35-rc5-gi-sl-accounting+
GROUPMODE=1          NRGRP=8             
DIR=/mnt/iostestmnt/fio        DEV=/dev/dm-4                 
Workload=bsr      iosched=cfq     Filesz=512M bs=4K   
group_isolation=1 slice_idle=8    group_idle=8    quantum=8    
=========================================================================
AVERAGE[bsr]    [bw in KB/s]    
------- 
job     Set NR  test1  test2  test3  test4  test5  test6  test7  test8  total  
---     --- --  ---------------------------------------------------------------
bsr     1   1   6245   12776  16591  23471  28746  36799  43031  49778  217437 
bsr     1   2   5100   11063  17216  23136  23738  30391  35910  40874  187428 
bsr     1   4   4623   9718   14746  18356  22875  30407  33215  38073  172013 
bsr     1   8   4720   10143  13499  19115  22157  29126  31688  30784  161232 

Notice that overall throughput is just around 160MB/s with 8 sequential readers
in each group.

With this patch set, I set slice_idle=0 and re-ran the same test.

Kernel=2.6.35-rc5-gi-sl-accounting+
GROUPMODE=1          NRGRP=8             
DIR=/mnt/iostestmnt/fio        DEV=/dev/dm-4                 
Workload=bsr      iosched=cfq     Filesz=512M bs=4K   
group_isolation=1 slice_idle=0    group_idle=8    quantum=8    
=========================================================================
AVERAGE[bsr]    [bw in KB/s]    
------- 
job     Set NR  test1  test2  test3  test4  test5  test6  test7  test8  total  
---     --- --  ---------------------------------------------------------------
bsr     1   1   6789   12764  17174  23111  28528  36807  42753  48826  216752 
bsr     1   2   9845   20617  30521  39061  45514  51607  63683  63770  324618 
bsr     1   4   14835  24241  42444  55207  45914  51318  54661  60318  348938 
bsr     1   8   12022  24478  36732  48651  54333  60974  64856  72930  374976 

Notice how overall throughput has shot up to 374MB/s while retaining the
ability to do IO control.

So this is not the default mode. The new tunable, group_idle, allows one to
set slice_idle=0 to disable some of CFQ's features and primarily use the
group service differentiation feature.
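The tunables involved are per-device sysfs files; switching a disk into this
mode would look roughly like the sketch below, assuming the device uses CFQ
with this patchset applied (the device name sda is illustrative).

```shell
# CFQ per-device tunables live under /sys/block/<dev>/queue/iosched/.
# slice_idle=0 disables per-queue idling; group_idle (added by this
# patchset) keeps idling at the group level so service differentiation
# between groups is preserved.
echo 0 > /sys/block/sda/queue/iosched/slice_idle
echo 8 > /sys/block/sda/queue/iosched/group_idle
echo 8 > /sys/block/sda/queue/iosched/quantum
```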

If you have thoughts on other ways of solving the problem, I am all ears.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 9+ messages in thread
* [RFC PATCH] cfq-iosched: Implement group idle V2
@ 2010-07-19 17:20 Vivek Goyal
  2010-07-19 17:20 ` [PATCH 2/3] cfq-iosched: Implement a new tunable group_idle Vivek Goyal
  0 siblings, 1 reply; 9+ messages in thread
From: Vivek Goyal @ 2010-07-19 17:20 UTC (permalink / raw)
  To: linux-kernel, axboe; +Cc: nauman, dpshah, guijianfeng, jmoyer, czoccolo, vgoyal


[ Got Jens's mail id wrong in last post hence reposting. Sorry for cluttering
your mailboxes.]


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-07-19 21:04 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-19 17:14 [RFC PATCH] cfq-iosched: Implement group idle V2 Vivek Goyal
2010-07-19 17:14 ` [PATCH 1/3] cfq-iosched: Improve time slice charging logic Vivek Goyal
2010-07-19 17:14 ` [PATCH 2/3] cfq-iosched: Implement a new tunable group_idle Vivek Goyal
2010-07-19 20:54   ` Divyesh Shah
2010-07-19 21:04     ` Vivek Goyal
2010-07-19 17:14 ` [PATCH 3/3] cfq-iosched: Print per slice sectors dispatched in blktrace Vivek Goyal
  -- strict thread matches above, loose matches on Subject: below --
2010-07-19 17:20 [RFC PATCH] cfq-iosched: Implement group idle V2 Vivek Goyal
2010-07-19 17:20 ` [PATCH 2/3] cfq-iosched: Implement a new tunable group_idle Vivek Goyal
2010-07-19 18:58   ` Jeff Moyer
2010-07-19 20:20     ` Vivek Goyal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox