public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
@ 2010-07-22 21:29 Vivek Goyal
  2010-07-22 21:29 ` [PATCH 1/5] cfq-iosched: Do not idle on service tree if slice_idle=0 Vivek Goyal
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Vivek Goyal @ 2010-07-22 21:29 UTC (permalink / raw)
  To: linux-kernel, jaxboe
  Cc: nauman, dpshah, guijianfeng, jmoyer, czoccolo, vgoyal


Hi,

This is V4 of the patchset which implements a new tunable group_idle and also
implements IOPS mode for group fairness. Following are changes since V3.

- Cleaned up the code a bit to make it clear that IOPS mode is effective only
  for group scheduling and that cfq queue scheduling should not be affected.
  Note that CFQ currently uses slightly different algorithms for cfq queue and
  cfq group scheduling.

- Updated the documentation as per Christoph's comments.

What's the problem
------------------
On high end storage (I tested on an HP EVA storage array with 12 SATA disks in
RAID 5), CFQ's model of dispatching requests from a single queue at a
time (sequential readers/writers, sync writers, etc.) becomes a bottleneck.
Often we don't drive enough request queue depth to keep all the disks busy
and we suffer a lot in terms of overall throughput.

These problems primarily originate from two things: idling on a per cfq queue
basis, and the quantum (dispatching only a limited number of requests from a
single queue while not allowing dispatch from other queues until then). Once
you set slice_idle=0 and raise the quantum, most of CFQ's problems on higher
end storage disappear.

This problem also becomes visible with the IO controller, where one creates
multiple groups and gets fairness, but overall throughput suffers. In the
following table, I am running an increasing number of sequential readers
(1, 2, 4, 8) in 8 groups of weight 100 to 800.

Kernel=2.6.35-rc6-iops+       
GROUPMODE=1          NRGRP=8             
DIR=/mnt/iostestmnt/fio        DEV=/dev/dm-4                 
Workload=bsr      iosched=cfq     Filesz=512M bs=4K   
group_isolation=1 slice_idle=8    group_idle=8    quantum=8    
=========================================================================
AVERAGE[bsr]    [bw in KB/s]    
------- 
job     Set NR  cgrp1  cgrp2  cgrp3  cgrp4  cgrp5  cgrp6  cgrp7  cgrp8  total  
---     --- --  ---------------------------------------------------------------
bsr     1   1   6120   12596  16530  23408  28984  35579  42061  47335  212613 
bsr     1   2   5250   10545  16604  23717  24677  29997  36753  42571  190114 
bsr     1   4   4437   10372  12546  17231  26100  32241  38208  35419  176554 
bsr     1   8   4636   9367   11902  18948  24589  27472  30341  37262  164517 

Notice that overall throughput is just around 164MB/s with 8 sequential readers
in each group.
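
For reference, the kind of setup used above can be approximated with something
along these lines (mount point, directory layout and the reader command are
illustrative, not my exact scripts; the weight file is the one described in
Documentation/cgroups/blkio-controller.txt):

  # create 8 blkio cgroups with weights 100..800
  mount -t cgroup -o blkio none /cgroup
  for i in $(seq 8); do
          mkdir -p /cgroup/grp$i
          echo $((i * 100)) > /cgroup/grp$i/blkio.weight
  done

  # run N buffered sequential readers in each group (N=8 shown here)
  for i in $(seq 8); do
          for j in $(seq 8); do
                  dd if=/mnt/iostestmnt/fio/grp$i/file$j of=/dev/null bs=4K &
                  # move the reader into its group right after fork; for long
                  # sequential reads the small race here does not matter
                  echo $! > /cgroup/grp$i/tasks
          done
  done
  wait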

With this patch set applied, I set slice_idle=0 and re-ran the same test.

Kernel=2.6.35-rc6-iops+       
GROUPMODE=1          NRGRP=8             
DIR=/mnt/iostestmnt/fio        DEV=/dev/dm-4                 
Workload=bsr      iosched=cfq     Filesz=512M bs=4K   
group_isolation=1 slice_idle=0    group_idle=8    quantum=8    
=========================================================================
AVERAGE[bsr]    [bw in KB/s]    
------- 
job     Set NR  cgrp1  cgrp2  cgrp3  cgrp4  cgrp5  cgrp6  cgrp7  cgrp8  total  
---     --- --  ---------------------------------------------------------------
bsr     1   1   6548   12174  17870  24063  29992  35695  41439  47034  214815 
bsr     1   2   10299  20487  30460  39375  46812  52783  59455  64351  324022 
bsr     1   4   10648  21735  32565  43442  52756  59513  64425  70324  355408 
bsr     1   8   11818  24483  36779  48144  55623  62583  65478  72279  377187 


Notice how overall throughput has shot up to 377MB/s while retaining the ability
to do IO control.

This patchset implements a CFQ group IOPS fairness mode: if slice_idle=0 and
the storage supports NCQ, CFQ starts doing group accounting in terms of the
number of requests dispatched and not in terms of time.

This patchset also implements a new tunable, group_idle, which allows one to
set slice_idle=0 to disable slice idling on the cfq queue and service tree but
still idle on the group, so that we can achieve better throughput for certain
workloads (e.g. sequential reads) and still achieve service differentiation
among groups.
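
In terms of knobs, the intended configuration on fast NCQ storage would then
look something like this (device name is illustrative; the iosched tunables are
the ones introduced/documented in this series, and queue_depth is the standard
SCSI sysfs attribute):

  # sanity check that the device actually exposes a deeper queue depth (NCQ)
  cat /sys/block/sdb/device/queue_depth

  # do not idle on individual cfq queues / service trees
  echo 0 > /sys/block/sdb/queue/iosched/slice_idle
  # but still idle on the group for service differentiation
  echo 8 > /sys/block/sdb/queue/iosched/group_idle
  # stronger isolation between groups, as used in the tests above
  echo 1 > /sys/block/sdb/queue/iosched/group_isolation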

If you have thoughts on other ways of solving the problem, I am all ears.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/5] cfq-iosched: Do not idle on service tree if slice_idle=0
  2010-07-22 21:29 [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Vivek Goyal
@ 2010-07-22 21:29 ` Vivek Goyal
  2010-07-22 21:29 ` [PATCH 2/5] cfq-iosched: Implement IOPS mode for group scheduling Vivek Goyal
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Vivek Goyal @ 2010-07-22 21:29 UTC (permalink / raw)
  To: linux-kernel, jaxboe
  Cc: nauman, dpshah, guijianfeng, jmoyer, czoccolo, vgoyal

o Do not idle either on the cfq queue or the service tree if slice_idle=0. The
  user does not want any queue or service tree idling. Currently, even with
  slice_idle=0, we were idling on the service tree.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 block/cfq-iosched.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 7982b83..c5ec2eb 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1838,6 +1838,9 @@ static bool cfq_should_idle(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 	BUG_ON(!service_tree);
 	BUG_ON(!service_tree->count);
 
+	if (!cfqd->cfq_slice_idle)
+		return false;
+
 	/* We never do for idle class queues. */
 	if (prio == IDLE_WORKLOAD)
 		return false;
@@ -1878,7 +1881,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
 	/*
 	 * idle is disabled, either manually or by past process history
 	 */
-	if (!cfqd->cfq_slice_idle || !cfq_should_idle(cfqd, cfqq))
+	if (!cfq_should_idle(cfqd, cfqq))
 		return;
 
 	/*
-- 
1.7.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/5] cfq-iosched: Implement IOPS mode for group scheduling
  2010-07-22 21:29 [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Vivek Goyal
  2010-07-22 21:29 ` [PATCH 1/5] cfq-iosched: Do not idle on service tree if slice_idle=0 Vivek Goyal
@ 2010-07-22 21:29 ` Vivek Goyal
  2010-07-27  5:47   ` Gui Jianfeng
  2010-07-22 21:29 ` [PATCH 3/5] cfq-iosched: Implement a tunable group_idle Vivek Goyal
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Vivek Goyal @ 2010-07-22 21:29 UTC (permalink / raw)
  To: linux-kernel, jaxboe
  Cc: nauman, dpshah, guijianfeng, jmoyer, czoccolo, vgoyal

o Implement another CFQ mode where we charge a group in terms of the number
  of requests dispatched instead of measuring time. Measuring in terms
  of time is not possible when we are driving deeper queue depths and there
  are requests from multiple cfq queues in the request queue.

o This mode currently gets activated if one sets slice_idle=0 and the
  associated disk supports NCQ. Again, the idea is that on an NCQ disk with
  idling disabled, most of the queues will dispatch 1 or more requests before
  cfq queue expiry happens, and we don't have a way to measure time. So start
  providing fairness in terms of IOPS.

o Currently IOPS mode works only with cfq group scheduling. CFQ follows
  different scheduling algorithms for queue and group scheduling. These IOPS
  stats are used only for group scheduling, hence in non-cgroup mode nothing
  should change.

o For CFQ group scheduling one can disable slice idling so that we don't idle
  on the queue and can drive deeper request queue depths (achieving better
  throughput); at the same time group idle is enabled, so one should still get
  service differentiation among groups.
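
o Whether a slice actually got charged in IOPS mode can be seen from the cfq
  log message added below. cfq_log output goes into the blktrace stream, so
  something like this should show the per-slice charge (device name is
  illustrative):

	blktrace -d /dev/sdb -o - | blkparse -i - | grep "iops="

  A line with iops=1 means the group was charged in terms of requests
  dispatched rather than time.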

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 block/cfq-iosched.c |   30 ++++++++++++++++++++++++------
 1 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index c5ec2eb..9f82ec6 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -378,6 +378,21 @@ CFQ_CFQQ_FNS(wait_busy);
 			&cfqg->service_trees[i][j]: NULL) \
 
 
+static inline bool iops_mode(struct cfq_data *cfqd)
+{
+	/*
+	 * If we are not idling on queues and it is a NCQ drive, parallel
+	 * execution of requests is on and measuring time is not possible
+	 * in most of the cases until and unless we drive shallower queue
+	 * depths and that becomes a performance bottleneck. In such cases
+	 * switch to start providing fairness in terms of number of IOs.
+	 */
+	if (!cfqd->cfq_slice_idle && cfqd->hw_tag)
+		return true;
+	else
+		return false;
+}
+
 static inline enum wl_prio_t cfqq_prio(struct cfq_queue *cfqq)
 {
 	if (cfq_class_idle(cfqq))
@@ -905,7 +920,6 @@ static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
 			slice_used = cfqq->allocated_slice;
 	}
 
-	cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u", slice_used);
 	return slice_used;
 }
 
@@ -913,19 +927,21 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
 				struct cfq_queue *cfqq)
 {
 	struct cfq_rb_root *st = &cfqd->grp_service_tree;
-	unsigned int used_sl, charge_sl;
+	unsigned int used_sl, charge;
 	int nr_sync = cfqg->nr_cfqq - cfqg_busy_async_queues(cfqd, cfqg)
 			- cfqg->service_tree_idle.count;
 
 	BUG_ON(nr_sync < 0);
-	used_sl = charge_sl = cfq_cfqq_slice_usage(cfqq);
+	used_sl = charge = cfq_cfqq_slice_usage(cfqq);
 
-	if (!cfq_cfqq_sync(cfqq) && !nr_sync)
-		charge_sl = cfqq->allocated_slice;
+	if (iops_mode(cfqd))
+		charge = cfqq->slice_dispatch;
+	else if (!cfq_cfqq_sync(cfqq) && !nr_sync)
+		charge = cfqq->allocated_slice;
 
 	/* Can't update vdisktime while group is on service tree */
 	cfq_rb_erase(&cfqg->rb_node, st);
-	cfqg->vdisktime += cfq_scale_slice(charge_sl, cfqg);
+	cfqg->vdisktime += cfq_scale_slice(charge, cfqg);
 	__cfq_group_service_tree_add(st, cfqg);
 
 	/* This group is being expired. Save the context */
@@ -939,6 +955,8 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
 
 	cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
 					st->min_vdisktime);
+	cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u disp=%u charge=%u iops=%u",
+			used_sl, cfqq->slice_dispatch, charge, iops_mode(cfqd));
 	cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
 	cfq_blkiocg_set_start_empty_time(&cfqg->blkg);
 }
-- 
1.7.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/5] cfq-iosched: Implement a tunable group_idle
  2010-07-22 21:29 [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Vivek Goyal
  2010-07-22 21:29 ` [PATCH 1/5] cfq-iosched: Do not idle on service tree if slice_idle=0 Vivek Goyal
  2010-07-22 21:29 ` [PATCH 2/5] cfq-iosched: Implement IOPS mode for group scheduling Vivek Goyal
@ 2010-07-22 21:29 ` Vivek Goyal
  2010-07-22 21:29 ` [PATCH 4/5] cfq-iosched: Print number of sectors dispatched per cfqq slice Vivek Goyal
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Vivek Goyal @ 2010-07-22 21:29 UTC (permalink / raw)
  To: linux-kernel, jaxboe
  Cc: nauman, dpshah, guijianfeng, jmoyer, czoccolo, vgoyal

o Implement a new tunable, group_idle, which allows idling on the group
  instead of a cfq queue. Hence one can set slice_idle = 0 and not idle
  on the individual queues but still idle on the group. This way, on fast
  storage, we get fairness between groups while overall throughput improves.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 block/cfq-iosched.c |   57 ++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 50 insertions(+), 7 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 9f82ec6..e172fa1 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -30,6 +30,7 @@ static const int cfq_slice_sync = HZ / 10;
 static int cfq_slice_async = HZ / 25;
 static const int cfq_slice_async_rq = 2;
 static int cfq_slice_idle = HZ / 125;
+static int cfq_group_idle = HZ / 125;
 static const int cfq_target_latency = HZ * 3/10; /* 300 ms */
 static const int cfq_hist_divisor = 4;
 
@@ -198,6 +199,8 @@ struct cfq_group {
 	struct hlist_node cfqd_node;
 	atomic_t ref;
 #endif
+	/* number of requests that are on the dispatch list or inside driver */
+	int dispatched;
 };
 
 /*
@@ -271,6 +274,7 @@ struct cfq_data {
 	unsigned int cfq_slice[2];
 	unsigned int cfq_slice_async_rq;
 	unsigned int cfq_slice_idle;
+	unsigned int cfq_group_idle;
 	unsigned int cfq_latency;
 	unsigned int cfq_group_isolation;
 
@@ -1883,7 +1887,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
 {
 	struct cfq_queue *cfqq = cfqd->active_queue;
 	struct cfq_io_context *cic;
-	unsigned long sl;
+	unsigned long sl, group_idle = 0;
 
 	/*
 	 * SSD device without seek penalty, disable idling. But only do so
@@ -1899,8 +1903,13 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
 	/*
 	 * idle is disabled, either manually or by past process history
 	 */
-	if (!cfq_should_idle(cfqd, cfqq))
-		return;
+	if (!cfq_should_idle(cfqd, cfqq)) {
+		/* no queue idling. Check for group idling */
+		if (cfqd->cfq_group_idle)
+			group_idle = cfqd->cfq_group_idle;
+		else
+			return;
+	}
 
 	/*
 	 * still active requests from this queue, don't idle
@@ -1927,13 +1936,21 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
 		return;
 	}
 
+	/* There are other queues in the group, don't do group idle */
+	if (group_idle && cfqq->cfqg->nr_cfqq > 1)
+		return;
+
 	cfq_mark_cfqq_wait_request(cfqq);
 
-	sl = cfqd->cfq_slice_idle;
+	if (group_idle)
+		sl = cfqd->cfq_group_idle;
+	else
+		sl = cfqd->cfq_slice_idle;
 
 	mod_timer(&cfqd->idle_slice_timer, jiffies + sl);
 	cfq_blkiocg_update_set_idle_time_stats(&cfqq->cfqg->blkg);
-	cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu", sl);
+	cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu group_idle: %d", sl,
+			group_idle ? 1 : 0);
 }
 
 /*
@@ -1949,6 +1966,7 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
 	cfqq->next_rq = cfq_find_next_rq(cfqd, cfqq, rq);
 	cfq_remove_request(rq);
 	cfqq->dispatched++;
+	(RQ_CFQG(rq))->dispatched++;
 	elv_dispatch_sort(q, rq);
 
 	cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++;
@@ -2218,7 +2236,7 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
 			cfqq = NULL;
 			goto keep_queue;
 		} else
-			goto expire;
+			goto check_group_idle;
 	}
 
 	/*
@@ -2252,6 +2270,17 @@ static struct cfq_queue *cfq_select_queue(struct cfq_data *cfqd)
 		goto keep_queue;
 	}
 
+	/*
+	 * If group idle is enabled and there are requests dispatched from
+	 * this group, wait for requests to complete.
+	 */
+check_group_idle:
+	if (cfqd->cfq_group_idle && cfqq->cfqg->nr_cfqq == 1
+	    && cfqq->cfqg->dispatched) {
+		cfqq = NULL;
+		goto keep_queue;
+	}
+
 expire:
 	cfq_slice_expired(cfqd, 0);
 new_queue:
@@ -3394,6 +3423,7 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
 	WARN_ON(!cfqq->dispatched);
 	cfqd->rq_in_driver--;
 	cfqq->dispatched--;
+	(RQ_CFQG(rq))->dispatched--;
 	cfq_blkiocg_update_completion_stats(&cfqq->cfqg->blkg,
 			rq_start_time_ns(rq), rq_io_start_time_ns(rq),
 			rq_data_dir(rq), rq_is_sync(rq));
@@ -3423,7 +3453,10 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
 		 * the queue.
 		 */
 		if (cfq_should_wait_busy(cfqd, cfqq)) {
-			cfqq->slice_end = jiffies + cfqd->cfq_slice_idle;
+			unsigned long extend_sl = cfqd->cfq_slice_idle;
+			if (!cfqd->cfq_slice_idle)
+				extend_sl = cfqd->cfq_group_idle;
+			cfqq->slice_end = jiffies + extend_sl;
 			cfq_mark_cfqq_wait_busy(cfqq);
 			cfq_log_cfqq(cfqd, cfqq, "will busy wait");
 		}
@@ -3868,6 +3901,7 @@ static void *cfq_init_queue(struct request_queue *q)
 	cfqd->cfq_slice[1] = cfq_slice_sync;
 	cfqd->cfq_slice_async_rq = cfq_slice_async_rq;
 	cfqd->cfq_slice_idle = cfq_slice_idle;
+	cfqd->cfq_group_idle = cfq_group_idle;
 	cfqd->cfq_latency = 1;
 	cfqd->cfq_group_isolation = 0;
 	cfqd->hw_tag = -1;
@@ -3940,6 +3974,7 @@ SHOW_FUNCTION(cfq_fifo_expire_async_show, cfqd->cfq_fifo_expire[0], 1);
 SHOW_FUNCTION(cfq_back_seek_max_show, cfqd->cfq_back_max, 0);
 SHOW_FUNCTION(cfq_back_seek_penalty_show, cfqd->cfq_back_penalty, 0);
 SHOW_FUNCTION(cfq_slice_idle_show, cfqd->cfq_slice_idle, 1);
+SHOW_FUNCTION(cfq_group_idle_show, cfqd->cfq_group_idle, 1);
 SHOW_FUNCTION(cfq_slice_sync_show, cfqd->cfq_slice[1], 1);
 SHOW_FUNCTION(cfq_slice_async_show, cfqd->cfq_slice[0], 1);
 SHOW_FUNCTION(cfq_slice_async_rq_show, cfqd->cfq_slice_async_rq, 0);
@@ -3972,6 +4007,7 @@ STORE_FUNCTION(cfq_back_seek_max_store, &cfqd->cfq_back_max, 0, UINT_MAX, 0);
 STORE_FUNCTION(cfq_back_seek_penalty_store, &cfqd->cfq_back_penalty, 1,
 		UINT_MAX, 0);
 STORE_FUNCTION(cfq_slice_idle_store, &cfqd->cfq_slice_idle, 0, UINT_MAX, 1);
+STORE_FUNCTION(cfq_group_idle_store, &cfqd->cfq_group_idle, 0, UINT_MAX, 1);
 STORE_FUNCTION(cfq_slice_sync_store, &cfqd->cfq_slice[1], 1, UINT_MAX, 1);
 STORE_FUNCTION(cfq_slice_async_store, &cfqd->cfq_slice[0], 1, UINT_MAX, 1);
 STORE_FUNCTION(cfq_slice_async_rq_store, &cfqd->cfq_slice_async_rq, 1,
@@ -3993,6 +4029,7 @@ static struct elv_fs_entry cfq_attrs[] = {
 	CFQ_ATTR(slice_async),
 	CFQ_ATTR(slice_async_rq),
 	CFQ_ATTR(slice_idle),
+	CFQ_ATTR(group_idle),
 	CFQ_ATTR(low_latency),
 	CFQ_ATTR(group_isolation),
 	__ATTR_NULL
@@ -4046,6 +4083,12 @@ static int __init cfq_init(void)
 	if (!cfq_slice_idle)
 		cfq_slice_idle = 1;
 
+#ifdef CONFIG_CFQ_GROUP_IOSCHED
+	if (!cfq_group_idle)
+		cfq_group_idle = 1;
+#else
+		cfq_group_idle = 0;
+#endif
 	if (cfq_slab_setup())
 		return -ENOMEM;
 
-- 
1.7.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/5] cfq-iosched: Print number of sectors dispatched per cfqq slice
  2010-07-22 21:29 [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Vivek Goyal
                   ` (2 preceding siblings ...)
  2010-07-22 21:29 ` [PATCH 3/5] cfq-iosched: Implement a tunable group_idle Vivek Goyal
@ 2010-07-22 21:29 ` Vivek Goyal
  2010-07-22 21:29 ` [PATCH 5/5] cfq-iosched: Documentation update Vivek Goyal
  2010-07-23 14:03 ` [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Heinz Diehl
  5 siblings, 0 replies; 26+ messages in thread
From: Vivek Goyal @ 2010-07-22 21:29 UTC (permalink / raw)
  To: linux-kernel, jaxboe
  Cc: nauman, dpshah, guijianfeng, jmoyer, czoccolo, vgoyal

o Divyesh had gotten rid of this code in the past. I want to re-introduce it,
  as it helps me a lot during debugging.
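
  The new sect= field is appended to the existing per-slice log line, so with a
  blktrace capture one can, for example, pull out how much data each slice
  dispatched (just a sketch of how I read it; blkparse picks up the
  sdb.blktrace.* files from the current directory):

	blkparse -i sdb | grep -o "sl_used=[0-9]* disp=[0-9]* charge=[0-9]* iops=[0-9]* sect=[0-9]*"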

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Reviewed-by: Divyesh Shah <dpshah@google.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 block/cfq-iosched.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index e172fa1..147b3e8 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -148,6 +148,8 @@ struct cfq_queue {
 	struct cfq_queue *new_cfqq;
 	struct cfq_group *cfqg;
 	struct cfq_group *orig_cfqg;
+	/* Number of sectors dispatched from queue in single dispatch round */
+	unsigned long nr_sectors;
 };
 
 /*
@@ -959,8 +961,9 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
 
 	cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
 					st->min_vdisktime);
-	cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u disp=%u charge=%u iops=%u",
-			used_sl, cfqq->slice_dispatch, charge, iops_mode(cfqd));
+	cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u disp=%u charge=%u iops=%u"
+			" sect=%u", used_sl, cfqq->slice_dispatch, charge,
+			iops_mode(cfqd), cfqq->nr_sectors);
 	cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
 	cfq_blkiocg_set_start_empty_time(&cfqg->blkg);
 }
@@ -1608,6 +1611,7 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd,
 		cfqq->allocated_slice = 0;
 		cfqq->slice_end = 0;
 		cfqq->slice_dispatch = 0;
+		cfqq->nr_sectors = 0;
 
 		cfq_clear_cfqq_wait_request(cfqq);
 		cfq_clear_cfqq_must_dispatch(cfqq);
@@ -1970,6 +1974,7 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
 	elv_dispatch_sort(q, rq);
 
 	cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++;
+	cfqq->nr_sectors += blk_rq_sectors(rq);
 	cfq_blkiocg_update_dispatch_stats(&cfqq->cfqg->blkg, blk_rq_bytes(rq),
 					rq_data_dir(rq), rq_is_sync(rq));
 }
-- 
1.7.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/5] cfq-iosched: Documentation update
  2010-07-22 21:29 [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Vivek Goyal
                   ` (3 preceding siblings ...)
  2010-07-22 21:29 ` [PATCH 4/5] cfq-iosched: Print number of sectors dispatched per cfqq slice Vivek Goyal
@ 2010-07-22 21:29 ` Vivek Goyal
  2010-07-22 21:36   ` Randy Dunlap
  2010-07-23 14:03 ` [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Heinz Diehl
  5 siblings, 1 reply; 26+ messages in thread
From: Vivek Goyal @ 2010-07-22 21:29 UTC (permalink / raw)
  To: linux-kernel, jaxboe
  Cc: nauman, dpshah, guijianfeng, jmoyer, czoccolo, vgoyal

o Documentation update for group_idle tunable and Group IOPS mode.
---
 Documentation/block/cfq-iosched.txt        |   44 ++++++++++++++++++++++++++++
 Documentation/cgroups/blkio-controller.txt |   28 +++++++++++++++++
 2 files changed, 72 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/block/cfq-iosched.txt

diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt
new file mode 100644
index 0000000..6cc2151
--- /dev/null
+++ b/Documentation/block/cfq-iosched.txt
@@ -0,0 +1,44 @@
+CFQ ioscheduler tunables
+========================
+
+slice_idle
+----------
+This specifies how long CFQ should idle for next request on certain cfq queues
+(for sequential workloads) and service trees (for random workloads) before
+queue is expired and CFQ selects next queue to dispatch from.
+
+By default slice_idle is a non zero value. That means by default we idle on
+queues/service trees. This can be very helpful on highly seeky media like
+single spindle SATA/SAS disks where we can cut down on overall number of
+seeks and see improved throughput.
+
+Setting slice_idle to 0 will remove all the idling on queues/service tree
+level and one should see an overall improved throughput on faster storage
+devices like multiple SATA/SAS disks in hardware RAID configuration. The down
+side is that isolation provided from WRITES also goes down and notion of
+ioprio becomes weaker.
+
+So depending on storage and workload, it might be a useful to set slice_idle=0.
+In general I think for SATA/SAS disks and software RAID of SATA/SAS disks
+keeping slice_idle enabled should be useful. For any configurations where
+there are multiple spindles behind single LUN (Host based hardware RAID
+controller or for storage arrays), setting slice_idle=0 might end up in better
+throughput and acceptable latencies.
+
+CFQ IOPS Mode for group scheduling
+==================================
+Basic CFQ design is to provide prio based time slices. Higher prio process
+gets bigger time slice and lower prio process gets smaller time slice.
+Measuring time becomes harder if storage is fast and supports NCQ and it would
+be better to dispatch multiple requests from multiple cfq queues in request
+queue at a time. In such scenario, it is not possible to measure time consumed
+by single queue accurately.
+
+What is possible though to measure number of requests dispatched from a single
+queue and also allow dispatch from multiple cfqq at the same time. This
+effectively becomes the fairness in terms of IOPS (IO operations per second).
+
+If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches
+to IOPS mode and starts providing fairness in terms of number of requests
+dispatched. Note that this mode switching takes effect only for group
+scheduling. For non cgroup users nothing should change.
diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 48e0b21..6919d62 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -217,6 +217,7 @@ Details of cgroup files
 CFQ sysfs tunable
 =================
 /sys/block/<disk>/queue/iosched/group_isolation
+-----------------------------------------------
 
 If group_isolation=1, it provides stronger isolation between groups at the
 expense of throughput. By default group_isolation is 0. In general that
@@ -243,6 +244,33 @@ By default one should run with group_isolation=0. If that is not sufficient
 and one wants stronger isolation between groups, then set group_isolation=1
 but this will come at cost of reduced throughput.
 
+/sys/block/<disk>/queue/iosched/slice_idle
+------------------------------------------
+On a faster hardware CFQ can be slow, especially with sequential workload.
+This happens because CFQ idles on a single queue and single queue might not
+drive deeper request queue depths to keep the storage busy. In such scenarios
+one can try setting slice_idle=0 and that would switch CFQ to IOPS
+(IO operations per second) mode on NCQ supporting hardware.
+
+That means CFQ will not idle between cfq queues of a cfq group and hence be
+able to driver higher queue depth and achieve better throughput. That also
+means that cfq provides fairness among groups in terms of IOPS and not in
+terms of disk time.
+
+/sys/block/<disk>/queue/iosched/group_idle
+------------------------------------------
+If one disables idling on individual cfq queues and cfq service trees by
+setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
+on the group in an attempt to provide fairness among groups.
+
+By default group_idle is same as slice_idle and does not do anything if
+slice_idle is enabled.
+
+One can experience an overall throughput drop if you have created multiple
+groups and put applications in that group which are not driving enough
+IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
+on individual groups and throughput should improve.
+
 What works
 ==========
 - Currently only sync IO queues are support. All the buffered writes are
-- 
1.7.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] cfq-iosched: Documentation update
  2010-07-22 21:29 ` [PATCH 5/5] cfq-iosched: Documentation update Vivek Goyal
@ 2010-07-22 21:36   ` Randy Dunlap
  2010-07-23 20:22     ` Vivek Goyal
  0 siblings, 1 reply; 26+ messages in thread
From: Randy Dunlap @ 2010-07-22 21:36 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-kernel, jaxboe, nauman, dpshah, guijianfeng, jmoyer,
	czoccolo

On Thu, 22 Jul 2010 17:29:32 -0400 Vivek Goyal wrote:

> o Documentation update for group_idle tunable and Group IOPS mode.
> ---
>  Documentation/block/cfq-iosched.txt        |   44 ++++++++++++++++++++++++++++
>  Documentation/cgroups/blkio-controller.txt |   28 +++++++++++++++++
>  2 files changed, 72 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/block/cfq-iosched.txt
> 
> diff --git a/Documentation/block/cfq-iosched.txt b/Documentation/block/cfq-iosched.txt
> new file mode 100644
> index 0000000..6cc2151
> --- /dev/null
> +++ b/Documentation/block/cfq-iosched.txt
> @@ -0,0 +1,44 @@
> +CFQ ioscheduler tunables
> +========================
> +
> +slice_idle
> +----------
> +This specifies how long CFQ should idle for next request on certain cfq queues
> +(for sequential workloads) and service trees (for random workloads) before
> +queue is expired and CFQ selects next queue to dispatch from.
> +
> +By default slice_idle is a non zero value. That means by default we idle on

                              non-zero

> +queues/service trees. This can be very helpful on highly seeky media like
> +single spindle SATA/SAS disks where we can cut down on overall number of
> +seeks and see improved throughput.
> +
> +Setting slice_idle to 0 will remove all the idling on queues/service tree
> +level and one should see an overall improved throughput on faster storage
> +devices like multiple SATA/SAS disks in hardware RAID configuration. The down
> +side is that isolation provided from WRITES also goes down and notion of
> +ioprio becomes weaker.
> +
> +So depending on storage and workload, it might be a useful to set slice_idle=0.

                                            might be useful

> +In general I think for SATA/SAS disks and software RAID of SATA/SAS disks
> +keeping slice_idle enabled should be useful. For any configurations where
> +there are multiple spindles behind single LUN (Host based hardware RAID
> +controller or for storage arrays), setting slice_idle=0 might end up in better
> +throughput and acceptable latencies.
> +
> +CFQ IOPS Mode for group scheduling
> +==================================
> +Basic CFQ design is to provide prio based time slices. Higher prio process
> +gets bigger time slice and lower prio process gets smaller time slice.

s/prio/priority/ multiple places.

> +Measuring time becomes harder if storage is fast and supports NCQ and it would
> +be better to dispatch multiple requests from multiple cfq queues in request
> +queue at a time. In such scenario, it is not possible to measure time consumed
> +by single queue accurately.
> +
> +What is possible though to measure number of requests dispatched from a single

                    though is to measure (?)

> +queue and also allow dispatch from multiple cfqq at the same time. This

                   what is cfqq?               ^^^^

> +effectively becomes the fairness in terms of IOPS (IO operations per second).
> +
> +If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches
> +to IOPS mode and starts providing fairness in terms of number of requests
> +dispatched. Note that this mode switching takes effect only for group
> +scheduling. For non cgroup users nothing should change.

                   non-cgroup

> diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
> index 48e0b21..6919d62 100644
> --- a/Documentation/cgroups/blkio-controller.txt
> +++ b/Documentation/cgroups/blkio-controller.txt
> @@ -217,6 +217,7 @@ Details of cgroup files
>  CFQ sysfs tunable
>  =================
>  /sys/block/<disk>/queue/iosched/group_isolation
> +-----------------------------------------------
>  
>  If group_isolation=1, it provides stronger isolation between groups at the
>  expense of throughput. By default group_isolation is 0. In general that
> @@ -243,6 +244,33 @@ By default one should run with group_isolation=0. If that is not sufficient
>  and one wants stronger isolation between groups, then set group_isolation=1
>  but this will come at cost of reduced throughput.
>  
> +/sys/block/<disk>/queue/iosched/slice_idle
> +------------------------------------------
> +On a faster hardware CFQ can be slow, especially with sequential workload.
> +This happens because CFQ idles on a single queue and single queue might not
> +drive deeper request queue depths to keep the storage busy. In such scenarios
> +one can try setting slice_idle=0 and that would switch CFQ to IOPS
> +(IO operations per second) mode on NCQ supporting hardware.
> +
> +That means CFQ will not idle between cfq queues of a cfq group and hence be
> +able to driver higher queue depth and achieve better throughput. That also
> +means that cfq provides fairness among groups in terms of IOPS and not in
> +terms of disk time.
> +
> +/sys/block/<disk>/queue/iosched/group_idle
> +------------------------------------------
> +If one disables idling on individual cfq queues and cfq service trees by
> +setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
> +on the group in an attempt to provide fairness among groups.
> +
> +By default group_idle is same as slice_idle and does not do anything if
> +slice_idle is enabled.
> +
> +One can experience an overall throughput drop if you have created multiple
> +groups and put applications in that group which are not driving enough
> +IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
> +on individual groups and throughput should improve.
> +
>  What works
>  ==========
>  - Currently only sync IO queues are support. All the buffered writes are
> -- 


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-22 21:29 [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Vivek Goyal
                   ` (4 preceding siblings ...)
  2010-07-22 21:29 ` [PATCH 5/5] cfq-iosched: Documentation update Vivek Goyal
@ 2010-07-23 14:03 ` Heinz Diehl
  2010-07-23 14:13   ` Vivek Goyal
  5 siblings, 1 reply; 26+ messages in thread
From: Heinz Diehl @ 2010-07-23 14:03 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-kernel, jaxboe, nauman, dpshah, guijianfeng, jmoyer,
	czoccolo

On 23.07.2010, Vivek Goyal wrote: 

> This is V4 of the patchset which implements a new tunable group_idle and also
> implements IOPS mode for group fairness. Following are changes since V3.
[....]

Just for information: this patchset, applied to 2.6.35-rc6, gives about
20-25% increase in speed/throughput on my desktop system 
(Phenom 2.5GHz Quadcore, 3 disks) with the tunables set according 
to what you've used/reported here (the setup with slice_idle set to 0),
and it's measurable with fs_mark, too.

After 2 hours of hard testing, the machine remains stable and responsive.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-23 14:03 ` [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Heinz Diehl
@ 2010-07-23 14:13   ` Vivek Goyal
  2010-07-23 14:56     ` Heinz Diehl
  0 siblings, 1 reply; 26+ messages in thread
From: Vivek Goyal @ 2010-07-23 14:13 UTC (permalink / raw)
  To: Heinz Diehl
  Cc: linux-kernel, jaxboe, nauman, dpshah, guijianfeng, jmoyer,
	czoccolo

On Fri, Jul 23, 2010 at 04:03:43PM +0200, Heinz Diehl wrote:
> On 23.07.2010, Vivek Goyal wrote: 
> 
> > This is V4 of the patchset which implements a new tunable group_idle and also
> > implements IOPS mode for group fairness. Following are changes since V3.
> [....]
> 
> Just for information: this patchset, applied to 2.6.35-rc6, gives about
> 20-25% increase in speed/throughput on my desktop system 
> (Phenom 2.5GHz Quadcore, 3 disks) with the tunables set according 
> to what you've used/reported here (the setup with slice_idle set to 0),
> and it's measurable with fs_mark, too.
> 
> After 2 hours of hard testing, the machine remains stable and responsive.

Thanks for some testing Heinz. I am assuming you are not using cgroups
and blkio controller.

In that case, you are seeing improvements probably due to the first patch,
where we don't idle on the service tree if slice_idle=0. Hence we cut down on
overall idling and can see a throughput increase.

What kind of configuration are these 3 disks in on your system? Some hardware
RAID or software RAID?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-23 14:13   ` Vivek Goyal
@ 2010-07-23 14:56     ` Heinz Diehl
  2010-07-23 18:37       ` Vivek Goyal
  0 siblings, 1 reply; 26+ messages in thread
From: Heinz Diehl @ 2010-07-23 14:56 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-kernel, jaxboe, nauman, dpshah, guijianfeng, jmoyer,
	czoccolo

On 23.07.2010, Vivek Goyal wrote: 

> Thanks for some testing Heinz. I am assuming you are not using cgroups
> and blkio controller.

Not at all.

> In that case, you are seeing improvements probably due to the first patch,
> where we don't idle on the service tree if slice_idle=0. Hence we cut down on
> overall idling and can see a throughput increase.

Hmm, in any case it's not getting worse by setting slice_idle to 8. 

My main motivation to test your patches was that I thought
the other way 'round, and was just curious about how this patchset
would affect machines which are NOT a high end server/storage system :-)

> What kind of configuration are these 3 disks in on your system? Some hardware
> RAID or software RAID?

Just 3 SATA disks plugged into the onboard controller, no RAID whatsoever.

I used fs_mark for testing:
"fs_mark  -S  1  -D  10000  -N  100000  -d  /home/htd/fsmark/test  -s 65536  -t  1  -w  4096  -F"

These are the results with plain cfq (2.6.35-rc6) and the settings which
gave the best speed/throughput on my machine:

low_latency = 0
slice_idle = 4
quantum = 32

Setting slice_idle to 0 didn't improve anything, I tried this before.

FSUse%        Count         Size    Files/sec     App Overhead
    27         1000        65536        360.3            34133
    27         2000        65536        384.4            34657
    27         3000        65536        401.1            32994
    27         4000        65536        394.3            33781
    27         5000        65536        406.8            32569
    27         6000        65536        401.9            34001
    27         7000        65536        374.5            33192
    27         8000        65536        398.3            32839
    27         9000        65536        405.2            34110
    27        10000        65536        398.9            33887
    27        11000        65536        402.3            34111
    27        12000        65536        398.1            33652
    27        13000        65536        412.9            32443
    27        14000        65536        408.1            32197


And this is after applying your patchset, with your settings
(and slice_idle = 0):

FSUse%        Count         Size    Files/sec     App Overhead
    27         1000        65536        600.7            29579
    27         2000        65536        568.4            30650
    27         3000        65536        522.0            29171
    27         4000        65536        534.1            29751
    27         5000        65536        550.7            30168
    27         6000        65536        521.7            30158
    27         7000        65536        493.3            29211
    27         8000        65536        495.3            30183
    27         9000        65536        587.8            29881
    27        10000        65536        469.9            29602
    27        11000        65536        482.7            29557
    27        12000        65536        486.6            30700
    27        13000        65536        516.1            30243


There's some 2-3% further improvement on my system with these settings,
which after some fiddling turned out to give the most performance here
(I don't need the group settings, of course):

group_idle = 0
group_isolation = 0
low_latency = 1
quantum = 8
slice_idle = 8

Thanks,
Heinz.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-23 14:56     ` Heinz Diehl
@ 2010-07-23 18:37       ` Vivek Goyal
  2010-07-24  8:06         ` Heinz Diehl
  0 siblings, 1 reply; 26+ messages in thread
From: Vivek Goyal @ 2010-07-23 18:37 UTC (permalink / raw)
  To: Heinz Diehl
  Cc: linux-kernel, jaxboe, nauman, dpshah, guijianfeng, jmoyer,
	czoccolo

On Fri, Jul 23, 2010 at 04:56:31PM +0200, Heinz Diehl wrote:
> On 23.07.2010, Vivek Goyal wrote: 
> 
> > Thanks for some testing Heinz. I am assuming you are not using cgroups
> > and blkio controller.
> 
> Not at all.
> 
> > In that case, you are seeing improvements probably due to the first patch,
> > where we don't idle on the service tree if slice_idle=0. Hence we cut down on
> > overall idling and can see a throughput increase.
> 
> Hmm, in any case it's not getting worse by setting slice_idle to 8. 
> 
> My main motivation to test your patches was that I thought
> the other way 'round, and was just curious about how this patchset
> would affect machines which are NOT a high end server/storage system :-)
> 
> > What kind of configuration are these 3 disks in on your system? Some hardware
> > RAID or software RAID?
> 
> Just 3 SATA disks plugged into the onboard controller, no RAID whatsoever.
> 
> I used fs_mark for testing:
> "fs_mark  -S  1  -D  10000  -N  100000  -d  /home/htd/fsmark/test  -s 65536  -t  1  -w  4096  -F"
> 
> These are the results with plain cfq (2.6.35-rc6) and the settings which
> gave the best speed/throughput on my machine:
> 
> low_latency = 0
> slice_idle = 4
> quantum = 32
> 
> Setting slice_idle to 0 didn't improve anything, I tried this before.
> 
> FSUse%        Count         Size    Files/sec     App Overhead
>     27         1000        65536        360.3            34133
>     27         2000        65536        384.4            34657
>     27         3000        65536        401.1            32994
>     27         4000        65536        394.3            33781
>     27         5000        65536        406.8            32569
>     27         6000        65536        401.9            34001
>     27         7000        65536        374.5            33192
>     27         8000        65536        398.3            32839
>     27         9000        65536        405.2            34110
>     27        10000        65536        398.9            33887
>     27        11000        65536        402.3            34111
>     27        12000        65536        398.1            33652
>     27        13000        65536        412.9            32443
>     27        14000        65536        408.1            32197
> 
> 
> And this is after applying your patchset, with your settings
> (and slice_idle = 0):
> 
> FSUse%        Count         Size    Files/sec     App Overhead
>     27         1000        65536        600.7            29579
>     27         2000        65536        568.4            30650
>     27         3000        65536        522.0            29171
>     27         4000        65536        534.1            29751
>     27         5000        65536        550.7            30168
>     27         6000        65536        521.7            30158
>     27         7000        65536        493.3            29211
>     27         8000        65536        495.3            30183
>     27         9000        65536        587.8            29881
>     27        10000        65536        469.9            29602
>     27        11000        65536        482.7            29557
>     27        12000        65536        486.6            30700
>     27        13000        65536        516.1            30243
> 

I think the above improvement is due to the first patch and the changes in
cfq_should_idle(). cfq_should_idle() used to return 1 even if slice_idle=0,
and that created bottlenecks in some places: for example, in select_queue() we
would not expire a queue till a request from that queue completed. This
stopped a new queue from dispatching requests, etc.

Anyway, for the fs_mark problem, can you give the following patch a try?

https://patchwork.kernel.org/patch/113061/

The above patch should improve your fs_mark numbers even without setting
slice_idle=0.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/5] cfq-iosched: Documentation update
  2010-07-22 21:36   ` Randy Dunlap
@ 2010-07-23 20:22     ` Vivek Goyal
  0 siblings, 0 replies; 26+ messages in thread
From: Vivek Goyal @ 2010-07-23 20:22 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: linux-kernel, jaxboe, nauman, dpshah, guijianfeng, jmoyer,
	czoccolo

On Thu, Jul 22, 2010 at 02:36:59PM -0700, Randy Dunlap wrote:
> On Thu, 22 Jul 2010 17:29:32 -0400 Vivek Goyal wrote:
> 
> > o Documentation update for group_idle tunable and Group IOPS mode.
> > ---

Thanks Randy. I have taken care of your comments in the attached patch.

Vivek

---
 Documentation/block/cfq-iosched.txt        |   45 +++++++++++++++++++++++++++++
 Documentation/cgroups/blkio-controller.txt |   28 ++++++++++++++++++
 2 files changed, 73 insertions(+)

Index: linux-2.6/Documentation/block/cfq-iosched.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/Documentation/block/cfq-iosched.txt	2010-07-23 16:20:52.000000000 -0400
@@ -0,0 +1,45 @@
+CFQ ioscheduler tunables
+========================
+
+slice_idle
+----------
+This specifies how long CFQ should idle for next request on certain cfq queues
+(for sequential workloads) and service trees (for random workloads) before
+queue is expired and CFQ selects next queue to dispatch from.
+
+By default slice_idle is a non-zero value. That means by default we idle on
+queues/service trees. This can be very helpful on highly seeky media like
+single spindle SATA/SAS disks where we can cut down on overall number of
+seeks and see improved throughput.
+
+Setting slice_idle to 0 will remove all the idling on queues/service tree
+level and one should see an overall improved throughput on faster storage
+devices like multiple SATA/SAS disks in hardware RAID configuration. The down
+side is that isolation provided from WRITES also goes down and notion of
+IO priority becomes weaker.
+
+So depending on storage and workload, it might be useful to set slice_idle=0.
+In general I think for SATA/SAS disks and software RAID of SATA/SAS disks
+keeping slice_idle enabled should be useful. For any configurations where
+there are multiple spindles behind single LUN (Host based hardware RAID
+controller or for storage arrays), setting slice_idle=0 might end up in better
+throughput and acceptable latencies.
+
+CFQ IOPS Mode for group scheduling
+==================================
+Basic CFQ design is to provide priority based time slices. Higher priority
+process gets bigger time slice and lower priority process gets smaller time
+slice. Measuring time becomes harder if storage is fast and supports NCQ and
+it would be better to dispatch multiple requests from multiple cfq queues in
+request queue at a time. In such scenario, it is not possible to measure time
+consumed by single queue accurately.
+
+What is possible though is to measure number of requests dispatched from a
+single queue and also allow dispatch from multiple cfq queue at the same time.
+This effectively becomes the fairness in terms of IOPS (IO operations per
+second).
+
+If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches
+to IOPS mode and starts providing fairness in terms of number of requests
+dispatched. Note that this mode switching takes effect only for group
+scheduling. For non-cgroup users nothing should change.
Index: linux-2.6/Documentation/cgroups/blkio-controller.txt
===================================================================
--- linux-2.6.orig/Documentation/cgroups/blkio-controller.txt	2010-07-22 16:52:22.000000000 -0400
+++ linux-2.6/Documentation/cgroups/blkio-controller.txt	2010-07-23 16:16:09.000000000 -0400
@@ -217,6 +217,7 @@ Details of cgroup files
 CFQ sysfs tunable
 =================
 /sys/block/<disk>/queue/iosched/group_isolation
+-----------------------------------------------
 
 If group_isolation=1, it provides stronger isolation between groups at the
 expense of throughput. By default group_isolation is 0. In general that
@@ -243,6 +244,33 @@ By default one should run with group_iso
 and one wants stronger isolation between groups, then set group_isolation=1
 but this will come at cost of reduced throughput.
 
+/sys/block/<disk>/queue/iosched/slice_idle
+------------------------------------------
+On a faster hardware CFQ can be slow, especially with sequential workload.
+This happens because CFQ idles on a single queue and single queue might not
+drive deeper request queue depths to keep the storage busy. In such scenarios
+one can try setting slice_idle=0 and that would switch CFQ to IOPS
+(IO operations per second) mode on NCQ supporting hardware.
+
+That means CFQ will not idle between cfq queues of a cfq group and hence be
+able to driver higher queue depth and achieve better throughput. That also
+means that cfq provides fairness among groups in terms of IOPS and not in
+terms of disk time.
+
+/sys/block/<disk>/queue/iosched/group_idle
+------------------------------------------
+If one disables idling on individual cfq queues and cfq service trees by
+setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
+on the group in an attempt to provide fairness among groups.
+
+By default group_idle is same as slice_idle and does not do anything if
+slice_idle is enabled.
+
+One can experience an overall throughput drop if you have created multiple
+groups and put applications in that group which are not driving enough
+IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
+on individual groups and throughput should improve.
+
 What works
 ==========
 - Currently only sync IO queues are support. All the buffered writes are

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-23 18:37       ` Vivek Goyal
@ 2010-07-24  8:06         ` Heinz Diehl
  2010-07-26 13:43           ` Vivek Goyal
                             ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Heinz Diehl @ 2010-07-24  8:06 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-kernel, jaxboe, nauman, dpshah, guijianfeng, jmoyer,
	czoccolo

On 23.07.2010, Vivek Goyal wrote: 

> Anyway, for the fs_mark problem, can you give the following patch a try?
> https://patchwork.kernel.org/patch/113061/

Ported it to 2.6.35-rc6, and these are my results using the same fs_mark
call as before:

slice_idle = 0

FSUse%        Count         Size    Files/sec     App Overhead
    28         1000        65536        241.6            39574
    28         2000        65536        231.1            39939
    28         3000        65536        230.4            39722
    28         4000        65536        243.2            39646
    28         5000        65536        227.0            39892
    28         6000        65536        224.1            39555
    28         7000        65536        228.2            39761
    28         8000        65536        235.3            39766
    28         9000        65536        237.3            40518
    28        10000        65536        225.7            39861
    28        11000        65536        227.2            39441


slice_idle = 8

FSUse%        Count         Size    Files/sec     App Overhead
    28         1000        65536        502.2            30545
    28         2000        65536        407.6            29406
    28         3000        65536        381.8            30152
    28         4000        65536        438.1            30038
    28         5000        65536        447.5            30477
    28         6000        65536        422.0            29610
    28         7000        65536        383.1            30327
    28         8000        65536        415.3            30102
    28         9000        65536        397.6            31013
    28        10000        65536        401.4            29201
    28        11000        65536        408.8            29720
    28        12000        65536        391.2            29157

Huh...there's quite a difference! It's definitely the slice_idle settings
which affect the results here. Besides, this patch gives noticeably bad
desktop interactivity on my system.

Don't know if this is related, but I'm not quite sure if XFS (which I use
exclusively) uses the jbd/jbd2 journaling layer at all.

Thanks,
Heinz.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-24  8:06         ` Heinz Diehl
@ 2010-07-26 13:43           ` Vivek Goyal
  2010-07-26 13:48             ` Christoph Hellwig
  2010-07-26 16:15             ` Heinz Diehl
  2010-07-26 14:13           ` Christoph Hellwig
  2010-07-28 20:22           ` Vivek Goyal
  2 siblings, 2 replies; 26+ messages in thread
From: Vivek Goyal @ 2010-07-26 13:43 UTC (permalink / raw)
  To: Heinz Diehl
  Cc: linux-kernel, jaxboe, nauman, dpshah, guijianfeng, jmoyer,
	czoccolo, Christoph Hellwig

On Sat, Jul 24, 2010 at 10:06:13AM +0200, Heinz Diehl wrote:
> On 23.07.2010, Vivek Goyal wrote: 
> 
> > Anyway, for the fs_mark problem, can you give the following patch a try?
> > https://patchwork.kernel.org/patch/113061/
> 
> Ported it to 2.6.35-rc6, and these are my results using the same fs_mark
> call as before:
> 
> slice_idle = 0
> 
> FSUse%        Count         Size    Files/sec     App Overhead
>     28         1000        65536        241.6            39574
>     28         2000        65536        231.1            39939
>     28         3000        65536        230.4            39722
>     28         4000        65536        243.2            39646
>     28         5000        65536        227.0            39892
>     28         6000        65536        224.1            39555
>     28         7000        65536        228.2            39761
>     28         8000        65536        235.3            39766
>     28         9000        65536        237.3            40518
>     28        10000        65536        225.7            39861
>     28        11000        65536        227.2            39441
> 
> 
> slice_idle = 8
> 
> FSUse%        Count         Size    Files/sec     App Overhead
>     28         1000        65536        502.2            30545
>     28         2000        65536        407.6            29406
>     28         3000        65536        381.8            30152
>     28         4000        65536        438.1            30038
>     28         5000        65536        447.5            30477
>     28         6000        65536        422.0            29610
>     28         7000        65536        383.1            30327
>     28         8000        65536        415.3            30102
>     28         9000        65536        397.6            31013
>     28        10000        65536        401.4            29201
>     28        11000        65536        408.8            29720
>     28        12000        65536        391.2            29157
> 
> Huh...there's quite a difference! It's definitely the slice_idle settings
> which affect the results here.

In this case it is not slice_idle. This patch puts both the fsync writer and
the jbd thread on the same service tree. That way, once the fsync writer is
done there is no idling after it, and the jbd thread almost immediately gets
to dispatch requests to the disk, hence we see improved throughput.

> Besides, this patch gives noticeably bad desktop interactivity on my system.
> 

How do you measure it? IOW, are you also running something else on the
desktop in the background, like a heavy writer, and then measuring
how interactive the desktop feels?

> Don't know if this is related, but I'm not quite sure if XFS (which I use
> exclusively) uses the jbd/jbd2 journaling layer at all.

I also don't know. But because this patch is making a difference with your
XFS file system performance, maybe it does.

CCing Christoph, he can tell us.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-26 13:43           ` Vivek Goyal
@ 2010-07-26 13:48             ` Christoph Hellwig
  2010-07-26 13:54               ` Vivek Goyal
  2010-07-26 16:15             ` Heinz Diehl
  1 sibling, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2010-07-26 13:48 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Heinz Diehl, linux-kernel, jaxboe, nauman, dpshah, guijianfeng,
	jmoyer, czoccolo, Christoph Hellwig

On Mon, Jul 26, 2010 at 09:43:29AM -0400, Vivek Goyal wrote:
> > Don't know if this is related, but I'm not quite sure if XFS (which I use
> > exclusively) uses the jbd/jbd2 journaling layer at all.
> 
> > I also don't know. But because this patch is making a difference with your
> > XFS file system performance, maybe it does.
> 
> CCing Christoph, he can tell us.

No, of course XFS doesn't use jbd.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-26 13:48             ` Christoph Hellwig
@ 2010-07-26 13:54               ` Vivek Goyal
  0 siblings, 0 replies; 26+ messages in thread
From: Vivek Goyal @ 2010-07-26 13:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Heinz Diehl, linux-kernel, jaxboe, nauman, dpshah, guijianfeng,
	jmoyer, czoccolo

On Mon, Jul 26, 2010 at 09:48:18AM -0400, Christoph Hellwig wrote:
> On Mon, Jul 26, 2010 at 09:43:29AM -0400, Vivek Goyal wrote:
> > > Don't know if this is related, but I'm not quite sure if XFS (which I use
> > > exclusively) uses the jbd/jbd2 journaling layer at all.
> > 
> > I also don't know. But because this patch is making a difference with your
> > XFS file system performance, maybe it does.
> > 
> > CCing Christoph, he can tell us.
> 
> No, of course XFS doesn't use jbd.

Hmm, interesting. So somewhere WRITE_SYNC idling in CFQ is hurting XFS
performance too, this time for some reason other than jbd/jbd2.

Vivek



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-24  8:06         ` Heinz Diehl
  2010-07-26 13:43           ` Vivek Goyal
@ 2010-07-26 14:13           ` Christoph Hellwig
  2010-07-27  7:48             ` Heinz Diehl
  2010-07-28 20:22           ` Vivek Goyal
  2 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2010-07-26 14:13 UTC (permalink / raw)
  To: Heinz Diehl
  Cc: Vivek Goyal, linux-kernel, jaxboe, nauman, dpshah, guijianfeng,
	jmoyer, czoccolo

Just curious, what numbers do you see when simply using the deadline
I/O scheduler?  That's what we recommend for use with XFS anyway.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-26 13:43           ` Vivek Goyal
  2010-07-26 13:48             ` Christoph Hellwig
@ 2010-07-26 16:15             ` Heinz Diehl
  1 sibling, 0 replies; 26+ messages in thread
From: Heinz Diehl @ 2010-07-26 16:15 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-kernel, jaxboe, nauman, dpshah, guijianfeng, jmoyer,
	czoccolo, Christoph Hellwig

On 26.07.2010, Vivek Goyal wrote: 

> How do you measure it? IOW, are you running something else on the desktop
> in the background, like a heavy writer, and then measuring how interactive
> the desktop feels?

I used Linus' "bigfile torture test" in the background:

 while : ; do time sh -c "dd if=/dev/zero of=bigfile bs=8M count=256 ;
 sync; rm bigfile"; done

and Theodore Ts'o's "fsync-tester" to benchmark interactivity.
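
For reference, fsync-tester essentially times an fsync() after writing a chunk
of data, in a loop. A minimal sketch of that kind of probe is below (the 1 MB
write size, the file name and the per-iteration sleep are assumptions here,
not necessarily what the actual tool uses):

 #include <stdio.h>
 #include <string.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <sys/time.h>

 /* Illustrative sketch of an fsync latency probe, not the real fsync-tester. */
 int main(void)
 {
 	static char buf[1024 * 1024];	/* 1 MB written before each fsync */
 	struct timeval start, end;
 	int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);

 	if (fd < 0) {
 		perror("open");
 		return 1;
 	}
 	memset(buf, 'a', sizeof(buf));
 	for (;;) {
 		if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf)) {
 			perror("write");
 			return 1;
 		}
 		gettimeofday(&start, NULL);
 		fsync(fd);	/* this is the latency being reported */
 		gettimeofday(&end, NULL);
 		printf("fsync time: %.4f\n",
 		       (end.tv_sec - start.tv_sec) +
 		       (end.tv_usec - start.tv_usec) / 1000000.0);
 		sleep(1);
 	}
 }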

I didn't save any results, as I wasn't expecting this to be of any
further interest (but I can run the tests again, if desired).

Thanks,
Heinz.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/5] cfq-iosched: Implement IOPS mode for group scheduling
  2010-07-22 21:29 ` [PATCH 2/5] cfq-iosched: Implement IOPS mode for group scheduling Vivek Goyal
@ 2010-07-27  5:47   ` Gui Jianfeng
  2010-07-27 13:09     ` Vivek Goyal
  0 siblings, 1 reply; 26+ messages in thread
From: Gui Jianfeng @ 2010-07-27  5:47 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-kernel, jaxboe, nauman, dpshah, jmoyer, czoccolo

Vivek Goyal wrote:
> o Implement another CFQ mode where we charge group in terms of number
>   of requests dispatched instead of measuring the time. Measuring in terms
>   of time is not possible when we are driving deeper queue depths and there
>   are requests from multiple cfq queues in the request queue.
> 
> o This mode currently gets activated if one sets slice_idle=0 and associated
>   disk supports NCQ. Again the idea is that on an NCQ disk with idling disabled
>   most of the queues will dispatch 1 or more requests and then cfq queue
>   expiry happens and we don't have a way to measure time. So start providing
>   fairness in terms of IOPS.
> 
> o Currently IOPS mode works only with cfq group scheduling. CFQ is following
>   different scheduling algorithms for queue and group scheduling. These IOPS
>   stats are used only for group scheduling, hence in non-group mode nothing
>   should change.
> 
> o For CFQ group scheduling one can disable slice idling so that we don't idle
>   on queues and can drive deeper request queue depths (achieving better
>   throughput), while group idle remains enabled so one should still get service
>   differentiation among groups.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  block/cfq-iosched.c |   30 ++++++++++++++++++++++++------
>  1 files changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index c5ec2eb..9f82ec6 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -378,6 +378,21 @@ CFQ_CFQQ_FNS(wait_busy);
>  			&cfqg->service_trees[i][j]: NULL) \
>  
>  
> +static inline bool iops_mode(struct cfq_data *cfqd)
> +{
> +	/*
> +	 * If we are not idling on queues and it is a NCQ drive, parallel
> +	 * execution of requests is on and measuring time is not possible
> +	 * in most of the cases until and unless we drive shallower queue
> +	 * depths and that becomes a performance bottleneck. In such cases
> +	 * switch to start providing fairness in terms of number of IOs.
> +	 */
> +	if (!cfqd->cfq_slice_idle && cfqd->hw_tag)
> +		return true;
> +	else
> +		return false;
> +}
> +
>  static inline enum wl_prio_t cfqq_prio(struct cfq_queue *cfqq)
>  {
>  	if (cfq_class_idle(cfqq))
> @@ -905,7 +920,6 @@ static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
>  			slice_used = cfqq->allocated_slice;
>  	}
>  
> -	cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u", slice_used);
>  	return slice_used;
>  }
>  
> @@ -913,19 +927,21 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
>  				struct cfq_queue *cfqq)
>  {
>  	struct cfq_rb_root *st = &cfqd->grp_service_tree;
> -	unsigned int used_sl, charge_sl;
> +	unsigned int used_sl, charge;
>  	int nr_sync = cfqg->nr_cfqq - cfqg_busy_async_queues(cfqd, cfqg)
>  			- cfqg->service_tree_idle.count;
>  
>  	BUG_ON(nr_sync < 0);
> -	used_sl = charge_sl = cfq_cfqq_slice_usage(cfqq);
> +	used_sl = charge = cfq_cfqq_slice_usage(cfqq);
>  
> -	if (!cfq_cfqq_sync(cfqq) && !nr_sync)
> -		charge_sl = cfqq->allocated_slice;
> +	if (iops_mode(cfqd))
> +		charge = cfqq->slice_dispatch;

Hi Vivek,

At this point, requests may still be sitting in the dispatch list. Shall we add a new
variable in cfqq to keep track of the number of requests that actually go into the
driver, and charge that number instead?

Thanks
Gui

> +	else if (!cfq_cfqq_sync(cfqq) && !nr_sync)
> +		charge = cfqq->allocated_slice;
>  
>  	/* Can't update vdisktime while group is on service tree */
>  	cfq_rb_erase(&cfqg->rb_node, st);
> -	cfqg->vdisktime += cfq_scale_slice(charge_sl, cfqg);
> +	cfqg->vdisktime += cfq_scale_slice(charge, cfqg);
>  	__cfq_group_service_tree_add(st, cfqg);
>  
>  	/* This group is being expired. Save the context */
> @@ -939,6 +955,8 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
>  
>  	cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
>  					st->min_vdisktime);
> +	cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u disp=%u charge=%u iops=%u",
> +			used_sl, cfqq->slice_dispatch, charge, iops_mode(cfqd));
>  	cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
>  	cfq_blkiocg_set_start_empty_time(&cfqg->blkg);
>  }

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-26 14:13           ` Christoph Hellwig
@ 2010-07-27  7:48             ` Heinz Diehl
  0 siblings, 0 replies; 26+ messages in thread
From: Heinz Diehl @ 2010-07-27  7:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Vivek Goyal, linux-kernel, jaxboe, nauman, dpshah, guijianfeng,
	jmoyer, czoccolo

On 26.07.2010, Christoph Hellwig wrote: 

> Just curious, what numbers do you see when simply using the deadline
> I/O scheduler?  That's what we recommend for use with XFS anyway.

Some fs_mark testing first:

Deadline, 1 thread:

#  ./fs_mark  -S  1  -D  10000  -N  100000  -d  /home/htd/fsmark/test  -s  65536  -t  1  -w  4096  -F 

FSUse%        Count         Size    Files/sec     App Overhead
    26         1000        65536        227.7            39998
    26         2000        65536        229.2            39309
    26         3000        65536        236.4            40232
    26         4000        65536        231.1            39294
    26         5000        65536        233.4            39728
    26         6000        65536        234.2            39719
    26         7000        65536        227.9            39463
    26         8000        65536        239.0            39477
    26         9000        65536        233.1            39563
    26        10000        65536        233.1            39878
    26        11000        65536        233.2            39560

Deadline, 4 threads:

#  ./fs_mark  -S  1  -D  10000  -N  100000  -d  /home/htd/fsmark/test  -s  65536  -t  4  -w  4096  -F 

FSUse%        Count         Size    Files/sec     App Overhead
    26         4000        65536        465.6           148470
    26         8000        65536        398.6           152827
    26        12000        65536        472.7           147235
    26        16000        65536        477.0           149344
    27        20000        65536        489.7           148055
    27        24000        65536        444.3           152806
    27        28000        65536        515.5           144821
    27        32000        65536        501.0           146561
    27        36000        65536        456.8           150124
    27        40000        65536        427.8           148830
    27        44000        65536        489.6           149843
    27        48000        65536        467.8           147501


CFQ, 1 thread:

#  ./fs_mark  -S  1  -D  10000  -N  100000  -d  /home/htd/fsmark/test  -s  65536  -t  1  -w  4096  -F 

FSUse%        Count         Size    Files/sec     App Overhead
    27         1000        65536        439.3            30158
    27         2000        65536        457.7            30274
    27         3000        65536        432.0            30572
    27         4000        65536        413.9            29641
    27         5000        65536        410.4            30289
    27         6000        65536        458.5            29861
    27         7000        65536        441.1            30268
    27         8000        65536        459.3            28900
    27         9000        65536        420.1            30439
    27        10000        65536        426.1            30628
    27        11000        65536        479.7            30058

CFQ, 4 threads:

#  ./fs_mark  -S  1  -D  10000  -N  100000  -d  /home/htd/fsmark/test  -s  65536  -t  4  -w  4096  -F 

FSUse%        Count         Size    Files/sec     App Overhead
    27         4000        65536        540.7           149177
    27         8000        65536        469.6           147957
    27        12000        65536        507.6           149185
    27        16000        65536        460.0           145953
    28        20000        65536        534.3           151936
    28        24000        65536        542.1           147083
    28        28000        65536        516.0           149363
    28        32000        65536        534.3           148655
    28        36000        65536        511.1           146989
    28        40000        65536        499.9           147884
    28        44000        65536        514.3           147846
    28        48000        65536        467.1           148099
    28        52000        65536        454.7           149052


Here are the results of the fsync-tester, doing

 "while : ; do time sh -c "dd if=/dev/zero of=bigfile bs=8M count=256 ;
 sync; rm bigfile"; done"
 
in the background on the root fs and running fsync-tester on /home.

Deadline:

liesel:~/test # ./fsync-tester
fsync time: 7.7866
fsync time: 9.5638
fsync time: 5.8163
fsync time: 5.5412
fsync time: 5.2630
fsync time: 8.6688
fsync time: 3.9947
fsync time: 5.4753
fsync time: 14.7666
fsync time: 4.0060
fsync time: 3.9231
fsync time: 4.0635
fsync time: 1.6129
^C

CFQ:

liesel:/home/htd/fs # ./fsync-tester
fsync time: 0.2457
fsync time: 0.3045
fsync time: 0.1980
fsync time: 0.2011
fsync time: 0.1941
fsync time: 0.2580
fsync time: 0.2041
fsync time: 0.2671
fsync time: 0.0320
fsync time: 0.2372
^C

The same setup here, running both the "bigfile torture test" and
fsync-tester on /home:

Deadline:

htd@liesel:~/fs> ./fsync-tester
fsync time: 11.0455
fsync time: 18.3555
fsync time: 6.8022
fsync time: 14.2020
fsync time: 9.4786
fsync time: 10.3002
fsync time: 7.2607
fsync time: 8.2169
fsync time: 3.7805
fsync time: 7.0325
fsync time: 12.0827
^C


CFQ:
htd@liesel:~/fs> ./fsync-tester
fsync time: 13.1126
fsync time: 4.9432
fsync time: 4.7833
fsync time: 0.2117
fsync time: 0.0167
fsync time: 14.6472
fsync time: 10.7527
fsync time: 4.3230
fsync time: 0.0151
fsync time: 15.1668
fsync time: 10.7662
fsync time: 0.1670
fsync time: 0.0156
^C

All partitions are XFS formatted using

 mkfs.xfs -f -l lazy-count=1,version=2 -i attr=2 -d agcount=4

and mounted this way:

 (rw,noatime,logbsize=256k,logbufs=2,nobarrier)

Kernel is 2.6.35-rc6.


Thanks, Heinz.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/5] cfq-iosched: Implement IOPS mode for group scheduling
  2010-07-27  5:47   ` Gui Jianfeng
@ 2010-07-27 13:09     ` Vivek Goyal
  0 siblings, 0 replies; 26+ messages in thread
From: Vivek Goyal @ 2010-07-27 13:09 UTC (permalink / raw)
  To: Gui Jianfeng; +Cc: linux-kernel, jaxboe, nauman, dpshah, jmoyer, czoccolo

On Tue, Jul 27, 2010 at 01:47:39PM +0800, Gui Jianfeng wrote:

[..]
> > @@ -913,19 +927,21 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
> >  				struct cfq_queue *cfqq)
> >  {
> >  	struct cfq_rb_root *st = &cfqd->grp_service_tree;
> > -	unsigned int used_sl, charge_sl;
> > +	unsigned int used_sl, charge;
> >  	int nr_sync = cfqg->nr_cfqq - cfqg_busy_async_queues(cfqd, cfqg)
> >  			- cfqg->service_tree_idle.count;
> >  
> >  	BUG_ON(nr_sync < 0);
> > -	used_sl = charge_sl = cfq_cfqq_slice_usage(cfqq);
> > +	used_sl = charge = cfq_cfqq_slice_usage(cfqq);
> >  
> > -	if (!cfq_cfqq_sync(cfqq) && !nr_sync)
> > -		charge_sl = cfqq->allocated_slice;
> > +	if (iops_mode(cfqd))
> > +		charge = cfqq->slice_dispatch;
> 
> Hi Vivek,
> 
> At this point, requests may still be sitting in the dispatch list. Shall we add a new
> variable in cfqq to keep track of the number of requests that actually go into the
> driver, and charge that number instead?
> 

Hi Gui,

How does that help? Even if a request is in the dispatch list, sooner or later
it will be dispatched. As long as we can make sure that requests in the
dispatch list are in proportion to group weights, things should be just
fine.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-24  8:06         ` Heinz Diehl
  2010-07-26 13:43           ` Vivek Goyal
  2010-07-26 14:13           ` Christoph Hellwig
@ 2010-07-28 20:22           ` Vivek Goyal
  2010-07-28 23:57             ` Christoph Hellwig
  2 siblings, 1 reply; 26+ messages in thread
From: Vivek Goyal @ 2010-07-28 20:22 UTC (permalink / raw)
  To: Heinz Diehl
  Cc: linux-kernel, jaxboe, nauman, dpshah, guijianfeng, jmoyer,
	czoccolo

On Sat, Jul 24, 2010 at 10:06:13AM +0200, Heinz Diehl wrote:
> On 23.07.2010, Vivek Goyal wrote: 
> 
> > Anyway, for the fs_mark problem, can you give the following patch a try.
> > https://patchwork.kernel.org/patch/113061/
> 
> Ported it to 2.6.35-rc6, and these are my results using the same fs_mark
> call as before:
> 
> slice_idle = 0
> 
> FSUse%        Count         Size    Files/sec     App Overhead
>     28         1000        65536        241.6            39574
>     28         2000        65536        231.1            39939
>     28         3000        65536        230.4            39722
>     28         4000        65536        243.2            39646
>     28         5000        65536        227.0            39892
>     28         6000        65536        224.1            39555
>     28         7000        65536        228.2            39761
>     28         8000        65536        235.3            39766
>     28         9000        65536        237.3            40518
>     28        10000        65536        225.7            39861
>     28        11000        65536        227.2            39441
> 
> 
> slice_idle = 8
> 
> FSUse%        Count         Size    Files/sec     App Overhead
>     28         1000        65536        502.2            30545
>     28         2000        65536        407.6            29406
>     28         3000        65536        381.8            30152
>     28         4000        65536        438.1            30038
>     28         5000        65536        447.5            30477
>     28         6000        65536        422.0            29610
>     28         7000        65536        383.1            30327
>     28         8000        65536        415.3            30102
>     28         9000        65536        397.6            31013
>     28        10000        65536        401.4            29201
>     28        11000        65536        408.8            29720
>     28        12000        65536        391.2            29157
> 
> Huh...there's quite a difference! It's definitely the slice_idle settings
> which affect the results here.


> Besides, this patch gives noticeably bad desktop interactivity on my system.

Heinz,

I also ran the Linus torture test and fsync-tester on an ext3 file system on my
SATA disk, and with Corrado's fsync patch applied I in fact see better
results.

2.6.35-rc6 kernel
=================
fsync time: 1.2109
fsync time: 2.7531
fsync time: 1.3770
fsync time: 2.0839
fsync time: 1.4243
fsync time: 1.3211
fsync time: 1.1672
fsync time: 2.8345
fsync time: 1.4798
fsync time: 0.0170
fsync time: 0.0199
fsync time: 0.0204
fsync time: 0.2794
fsync time: 1.3525
fsync time: 2.2679
fsync time: 1.4629
fsync time: 1.5234
fsync time: 1.5693
fsync time: 1.7263
fsync time: 3.5739
fsync time: 1.4114
fsync time: 1.5517
fsync time: 1.5675
fsync time: 1.3818
fsync time: 1.8127
fsync time: 1.6394

2.6.35-rc6-fsync
================
fsync time: 3.8638
fsync time: 0.1209
fsync time: 2.3390
fsync time: 3.1501
fsync time: 0.1348
fsync time: 0.0879
fsync time: 1.0642
fsync time: 0.2153
fsync time: 0.1166
fsync time: 0.2744
fsync time: 0.1227
fsync time: 0.2072
fsync time: 0.0666
fsync time: 0.1818
fsync time: 0.2170
fsync time: 0.1814
fsync time: 0.0501
fsync time: 0.0198
fsync time: 0.1950
fsync time: 0.2099
fsync time: 0.0877
fsync time: 0.8291
fsync time: 0.0821
fsync time: 0.0777
fsync time: 0.0258
fsync time: 0.0574
fsync time: 0.1152
fsync time: 1.1466
fsync time: 0.2349
fsync time: 0.9589
fsync time: 1.1013
fsync time: 0.1681
fsync time: 0.0902
fsync time: 0.2052
fsync time: 0.0673

I also did "time firefox &" testing to see how long firefox takes to
launch when linus torture test is running and without patch it took
around 20 seconds and with patch it took around 17 seconds. 

So to me above test results suggest that this patch does not worsen
the performance. In fact it helps. (at least on ext3 file system.)

Not sure why you are seeing different results with XFS.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable
  2010-07-28 20:22           ` Vivek Goyal
@ 2010-07-28 23:57             ` Christoph Hellwig
  2010-07-29  4:34               ` cfq fsync patch testing results (Was: Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable) Vivek Goyal
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2010-07-28 23:57 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Heinz Diehl, linux-kernel, jaxboe, nauman, dpshah, guijianfeng,
	jmoyer, czoccolo

On Wed, Jul 28, 2010 at 04:22:12PM -0400, Vivek Goyal wrote:
> I also did "time firefox &" testing to see how long firefox takes to
> launch when linus torture test is running and without patch it took
> around 20 seconds and with patch it took around 17 seconds. 
> 
> So to me above test results suggest that this patch does not worsen
> the performance. In fact it helps. (at least on ext3 file system.)
> 
> Not sure why you are seeing different results with XFS.

So why didn't you test it with XFS to verify his results?  We all know
that different filesystems have different I/O patterns, and we have
a history of really nasty regressions in one filesystem caused by well-meaning
changes to the I/O scheduler.

ext3 in fact is a particularly bad test case as it not only doesn't have
I/O barriers enabled, but also has particularly bad I/O patterns
compared to modern filesystems.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* cfq fsync patch testing results (Was: Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable)
  2010-07-28 23:57             ` Christoph Hellwig
@ 2010-07-29  4:34               ` Vivek Goyal
  2010-07-29 14:56                 ` Vivek Goyal
  0 siblings, 1 reply; 26+ messages in thread
From: Vivek Goyal @ 2010-07-29  4:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Heinz Diehl, linux-kernel, jaxboe, nauman, dpshah, guijianfeng,
	jmoyer, czoccolo

On Wed, Jul 28, 2010 at 07:57:16PM -0400, Christoph Hellwig wrote:
> On Wed, Jul 28, 2010 at 04:22:12PM -0400, Vivek Goyal wrote:
> > > I also did "time firefox &" testing to see how long firefox takes to
> > > launch while the Linus torture test is running; without the patch it took
> > > around 20 seconds and with the patch around 17 seconds.
> > > 
> > > So to me the above test results suggest that this patch does not worsen
> > > performance. In fact it helps (at least on the ext3 file system).
> > 
> > > Not sure why you are seeing different results with XFS.
> 
> So why didn't you test it with XFS to verify his results?

Just got a little lazy. Find the testing results with ext3, ext4 and
xfs below.

>  We all know
> that different filesystems have different I/O patterns, and we have
> a history of really nasty regressions in one filesystem caused by well-meaning
> changes to the I/O scheduler.
> 
> ext3 in fact is a particularly bad test case as it not only doesn't have
> I/O barriers enabled, but also has particularly bad I/O patterns
> compared to modern filesystems.

Ext3 results
============
ext3 (2.6.35-rc6)	ext3 (35-rc6-fsync)
-----------------	-------------------
fsync time: 3.4173	fsync time: 0.0171
fsync time: 0.8831	fsync time: 0.0951
fsync time: 0.6985	fsync time: 0.0848
fsync time: 8.9449	fsync time: 0.1206
fsync time: 4.3075	fsync time: 0.4150
fsync time: 6.0146	fsync time: 0.0856
fsync time: 9.7134	fsync time: 0.1151
fsync time: 9.2247	fsync time: 0.1083
fsync time: 6.5061	fsync time: 0.1218
fsync time: 6.1862	fsync time: 4.1666
fsync time: 6.1136	fsync time: 0.1075
fsync time: 3.3593	fsync time: 0.3442
fsync time: 4.3309	fsync time: 0.1062
fsync time: 2.3596	fsync time: 2.8502
fsync time: 0.0151	fsync time: 0.0433
fsync time: 0.0180	fsync time: 4.0526
fsync time: 0.3685	fsync time: 0.1819
fsync time: 2.7396	fsync time: 0.1479
fsync time: 3.1537	fsync time: 0.1480
fsync time: 2.4474	fsync time: 0.1715
fsync time: 2.7085	fsync time: 0.0079
fsync time: 3.1629	fsync time: 0.0181
fsync time: 2.9186	fsync time: 0.0134

XFS results
==========
XFS (2.6.35-rc6)	XFS (with fsync patch)
fsync time: 5.0746	fsync time: 1.8025
fsync time: 3.0057	fsync time: 2.3392
fsync time: 3.0960	fsync time: 2.2810
fsync time: 2.8392	fsync time: 2.2894
fsync time: 2.4901	fsync time: 2.3059
fsync time: 2.3151	fsync time: 2.3061
fsync time: 2.3066	fsync time: 2.9825
fsync time: 0.6608	fsync time: 2.3144
fsync time: 0.0595	fsync time: 2.2894
fsync time: 2.0977	fsync time: 0.0508
fsync time: 2.3236	fsync time: 2.3396
fsync time: 2.3229	fsync time: 2.3310
fsync time: 2.3065	fsync time: 2.3061
fsync time: 2.3234	fsync time: 2.3060
fsync time: 2.3150	fsync time: 2.3561
fsync time: 2.3149	fsync time: 2.3313
fsync time: 2.3234	fsync time: 2.0221
fsync time: 2.3066	fsync time: 2.2891
fsync time: 2.3232	fsync time: 2.3144
fsync time: 2.3317	fsync time: 2.3144
fsync time: 2.3321	fsync time: 2.2894
fsync time: 2.3232	fsync time: 2.3228
fsync time: 0.0514	fsync time: 2.3144
fsync time: 2.2480	fsync time: 0.0506

Ext4
====
ext4 (vanilla)		ext4 (patched)
fsync time: 3.4080	fsync time: 2.9109
fsync time: 17.8330	fsync time: 25.0503
fsync time: 0.0922	fsync time: 2.5495
fsync time: 0.0710	fsync time: 0.0943
fsync time: 19.7977	fsync time: 0.0770
fsync time: 20.6592	fsync time: 16.3287
fsync time: 0.1020	fsync time: 24.4983
fsync time: 0.0689	fsync time: 0.1006
fsync time: 19.9981	fsync time: 0.0783
fsync time: 20.6605	fsync time: 19.1181
fsync time: 0.0930	fsync time: 22.0860
fsync time: 0.0776	fsync time: 0.0909


Notes:
======
- The above results are with and without Corrado's fsync patch. We
  happen to be discussing it in a different thread though, hence
  mentioning it explicitly here.

- I am running the Linus torture test and also running Ted Ts'o's fsync-tester
  to monitor fsync latencies.

- Looks like ext3 fsync times have improved.
- XFS fsync times have remained unchanged.
- ext4 fsync times seem to have gone up a bit.

I used default mount options. So I am assuming the high fsync times of ext4
come from the fact that barriers must be enabled by default. Will do
some blktracing on the ext4 case tomorrow; otherwise I think this patch
looks good.

Thanks
Vivek 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: cfq fsync patch testing results (Was: Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable)
  2010-07-29  4:34               ` cfq fsync patch testing results (Was: Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable) Vivek Goyal
@ 2010-07-29 14:56                 ` Vivek Goyal
  2010-07-29 19:39                   ` Jeff Moyer
  0 siblings, 1 reply; 26+ messages in thread
From: Vivek Goyal @ 2010-07-29 14:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Heinz Diehl, linux-kernel, jaxboe, nauman, dpshah, guijianfeng,
	jmoyer, czoccolo, Ted Ts'o

On Thu, Jul 29, 2010 at 12:34:43AM -0400, Vivek Goyal wrote:
> On Wed, Jul 28, 2010 at 07:57:16PM -0400, Christoph Hellwig wrote:
> > On Wed, Jul 28, 2010 at 04:22:12PM -0400, Vivek Goyal wrote:
> > > I also did "time firefox &" testing to see how long firefox takes to
> > > launch while the Linus torture test is running; without the patch it took
> > > around 20 seconds and with the patch around 17 seconds.
> > > 
> > > So to me the above test results suggest that this patch does not worsen
> > > performance. In fact it helps (at least on the ext3 file system).
> > > 
> > > Not sure why you are seeing different results with XFS.
> > 
> > So why didn't you test it with XFS to verify his results?
> 
> Just got a little lazy. Find the testing results with ext3, ext4 and
> xfs below.
> 
> >  We all know
> > that different filesystems have different I/O patterns, and we have
> > a history of really nasty regressions in one filesystem caused by well-meaning
> > changes to the I/O scheduler.
> > 
> > ext3 in fact is a particularly bad test case as it not only doesn't have
> > I/O barriers enabled, but also has particularly bad I/O patterns
> > compared to modern filesystems.
> 
> Ext3 results
> ============
> ext3 (2.6.35-rc6)	ext3 (35-rc6-fsync)
> -----------------	-------------------
> fsync time: 3.4173	fsync time: 0.0171
> fsync time: 0.8831	fsync time: 0.0951
> fsync time: 0.6985	fsync time: 0.0848
> fsync time: 8.9449	fsync time: 0.1206
> fsync time: 4.3075	fsync time: 0.4150
> fsync time: 6.0146	fsync time: 0.0856
> fsync time: 9.7134	fsync time: 0.1151
> fsync time: 9.2247	fsync time: 0.1083
> fsync time: 6.5061	fsync time: 0.1218
> fsync time: 6.1862	fsync time: 4.1666
> fsync time: 6.1136	fsync time: 0.1075
> fsync time: 3.3593	fsync time: 0.3442
> fsync time: 4.3309	fsync time: 0.1062
> fsync time: 2.3596	fsync time: 2.8502
> fsync time: 0.0151	fsync time: 0.0433
> fsync time: 0.0180	fsync time: 4.0526
> fsync time: 0.3685	fsync time: 0.1819
> fsync time: 2.7396	fsync time: 0.1479
> fsync time: 3.1537	fsync time: 0.1480
> fsync time: 2.4474	fsync time: 0.1715
> fsync time: 2.7085	fsync time: 0.0079
> fsync time: 3.1629	fsync time: 0.0181
> fsync time: 2.9186	fsync time: 0.0134
> 
> XFS results
> ==========
> XFS (2.6.35-rc6)	XFS (with fsync patch)
> fsync time: 5.0746	fsync time: 1.8025
> fsync time: 3.0057	fsync time: 2.3392
> fsync time: 3.0960	fsync time: 2.2810
> fsync time: 2.8392	fsync time: 2.2894
> fsync time: 2.4901	fsync time: 2.3059
> fsync time: 2.3151	fsync time: 2.3061
> fsync time: 2.3066	fsync time: 2.9825
> fsync time: 0.6608	fsync time: 2.3144
> fsync time: 0.0595	fsync time: 2.2894
> fsync time: 2.0977	fsync time: 0.0508
> fsync time: 2.3236	fsync time: 2.3396
> fsync time: 2.3229	fsync time: 2.3310
> fsync time: 2.3065	fsync time: 2.3061
> fsync time: 2.3234	fsync time: 2.3060
> fsync time: 2.3150	fsync time: 2.3561
> fsync time: 2.3149	fsync time: 2.3313
> fsync time: 2.3234	fsync time: 2.0221
> fsync time: 2.3066	fsync time: 2.2891
> fsync time: 2.3232	fsync time: 2.3144
> fsync time: 2.3317	fsync time: 2.3144
> fsync time: 2.3321	fsync time: 2.2894
> fsync time: 2.3232	fsync time: 2.3228
> fsync time: 0.0514	fsync time: 2.3144
> fsync time: 2.2480	fsync time: 0.0506
> 
> Ext4
> ====
> ext4 (vanilla)		ext4 (patched)
> fsync time: 3.4080	fsync time: 2.9109
> fsync time: 17.8330	fsync time: 25.0503
> fsync time: 0.0922	fsync time: 2.5495
> fsync time: 0.0710	fsync time: 0.0943
> fsync time: 19.7977	fsync time: 0.0770
> fsync time: 20.6592	fsync time: 16.3287
> fsync time: 0.1020	fsync time: 24.4983
> fsync time: 0.0689	fsync time: 0.1006
> fsync time: 19.9981	fsync time: 0.0783
> fsync time: 20.6605	fsync time: 19.1181
> fsync time: 0.0930	fsync time: 22.0860
> fsync time: 0.0776	fsync time: 0.0909
> 
> 
> Notes:
> ======
> - The above results are with and without Corrado's fsync patch. We
>   happen to be discussing it in a different thread though, hence
>   mentioning it explicitly here.
> 
> - I am running the Linus torture test and also running Ted Ts'o's fsync-tester
>   to monitor fsync latencies.
> 
> - Looks like ext3 fsync times have improved.
> - XFS fsync times have remained unchanged.
> - ext4 fsync times seem to have gone up a bit.
> 
> I used default mount options. So I am assuming the high fsync times of ext4
> come from the fact that barriers must be enabled by default. Will do
> some blktracing on the ext4 case tomorrow; otherwise I think this patch
> looks good.

For the sake of completeness, I also ran same tests on ext3 with barrier
enabled.

ext3 (barrier=1)	ext3 (barrier=1)
fsync time: 2.7601	fsync time: 1.5323
fsync time: 2.2352	fsync time: 1.5254
fsync time: 2.1689	fsync time: 1.4228
fsync time: 2.1666	fsync time: 1.8404
fsync time: 2.3017	fsync time: 5.6249
fsync time: 2.2256	fsync time: 1.6099
fsync time: 2.1588	fsync time: 1.5318
fsync time: 5.1648	fsync time: 2.0092
fsync time: 5.8390	fsync time: 1.9966
fsync time: 0.2109	fsync time: 2.0055
fsync time: 0.0906	fsync time: 2.0054
fsync time: 3.6327	fsync time: 0.1778
fsync time: 3.0161	fsync time: 0.0827
fsync time: 2.3194	fsync time: 2.3796
fsync time: 2.0581	fsync time: 1.5960
fsync time: 2.2850	fsync time: 1.5074
fsync time: 2.2002	fsync time: 1.8653
fsync time: 2.1932	fsync time: 1.8910
fsync time: 2.1753	fsync time: 1.9091
fsync time: 2.1669	fsync time: 1.8322
fsync time: 2.1671	fsync time: 1.8744
fsync time: 1.9552	fsync time: 1.8254
fsync time: 3.9870	fsync time: 1.8662
fsync time: 2.5140	fsync time: 1.8587
fsync time: 0.0867	fsync time: 1.7981

It is hard to say whether things improved or not with the patch. I guess
there is a slight improvement.

What is interesting, though, is that this fsync-tester test case works well
with ext3 and xfs, but with ext4 there seem to be large spikes in
fsync times.

[CCing Ted Tso]

Thanks
Vivek

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: cfq fsync patch testing results (Was: Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable)
  2010-07-29 14:56                 ` Vivek Goyal
@ 2010-07-29 19:39                   ` Jeff Moyer
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff Moyer @ 2010-07-29 19:39 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Christoph Hellwig, Heinz Diehl, linux-kernel, jaxboe, nauman,
	dpshah, guijianfeng, czoccolo, Ted Ts'o

Vivek Goyal <vgoyal@redhat.com> writes:

> On Thu, Jul 29, 2010 at 12:34:43AM -0400, Vivek Goyal wrote:
>> On Wed, Jul 28, 2010 at 07:57:16PM -0400, Christoph Hellwig wrote:
>> > On Wed, Jul 28, 2010 at 04:22:12PM -0400, Vivek Goyal wrote:
>> > > I also did "time firefox &" testing to see how long firefox takes to
>> > > launch while the Linus torture test is running; without the patch it took
>> > > around 20 seconds and with the patch around 17 seconds.
>> > > 
>> > > So to me the above test results suggest that this patch does not worsen
>> > > performance. In fact it helps (at least on the ext3 file system).
>> > > 
>> > > Not sure why you are seeing different results with XFS.
>> > 
>> > So why didn't you test it with XFS to verify his results?
>> 
>> Just got a little lazy. Find the testing results with ext3, ext4 and
>> xfs below.
>> 
>> >  We all know
>> > that different filesystems have different I/O patterns, and we have
>> > a history of really nasty regressions in one filesystem caused by well-meaning
>> > changes to the I/O scheduler.
>> > 
>> > ext3 in fact is a particularly bad test case as it not only doesn't have
>> > I/O barriers enabled, but also has particularly bad I/O patterns
>> > compared to modern filesystems.

A string of numbers is hard for me to parse.  In the hopes that this
will help others, here is some awk-fu that I shamelessly stole from the
internets:

awk '{total1+=$3; total2+=$6; array1[NR]=$3; array2[NR]=$6} END{for(x=1;x<=NR;x++){sumsq1+=((array1[x]-(total1/NR))**2); sumsq2+=((array2[x]-(total2/NR))**2);}print total1/NR " " sqrt(sumsq1/NR) " " total2/NR " " sqrt(sumsq2/NR)}'
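
(The input here is the side-by-side fsync listings quoted above, i.e. lines of
the form "fsync time: <vanilla>   fsync time: <patched>", so $3 and $6 pick out
the two latency columns; the script prints the mean and standard deviation of
the first column followed by the mean and standard deviation of the second.)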

>> Ext3 results
>> ============
>> ext3 (2.6.35-rc6)	ext3 (35-rc6-fsync)
>> -----------------	-------------------
avg     stddev           avg       stddev
3.8953  2.80654          0.587943  1.22399

>> 
>> XFS results
>> ==========
>> XFS (2.6.35-rc6)	XFS (with fsync patch)
2.2538 0.95565          2.11869 0.649704

>> Ext4
>> ====
>> ext4 (vanilla)		ext4 (patched)
8.57177 9.54596                 9.41524 10.4037

> ext3 (barrier=1)	ext3 (barrier=1)
2.40316 1.26992         1.82272 0.922305

It is interesting that ext4 does worse with the patch (though, realistically,
not by much; the difference is well within the run-to-run variance shown by
the standard deviations).

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2010-07-29 19:40 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-22 21:29 [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Vivek Goyal
2010-07-22 21:29 ` [PATCH 1/5] cfq-iosched: Do not idle on service tree if slice_idle=0 Vivek Goyal
2010-07-22 21:29 ` [PATCH 2/5] cfq-iosched: Implement IOPS mode for group scheduling Vivek Goyal
2010-07-27  5:47   ` Gui Jianfeng
2010-07-27 13:09     ` Vivek Goyal
2010-07-22 21:29 ` [PATCH 3/5] cfq-iosched: Implement a tunable group_idle Vivek Goyal
2010-07-22 21:29 ` [PATCH 4/5] cfq-iosched: Print number of sectors dispatched per cfqq slice Vivek Goyal
2010-07-22 21:29 ` [PATCH 5/5] cfq-iosched: Documentation update Vivek Goyal
2010-07-22 21:36   ` Randy Dunlap
2010-07-23 20:22     ` Vivek Goyal
2010-07-23 14:03 ` [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable Heinz Diehl
2010-07-23 14:13   ` Vivek Goyal
2010-07-23 14:56     ` Heinz Diehl
2010-07-23 18:37       ` Vivek Goyal
2010-07-24  8:06         ` Heinz Diehl
2010-07-26 13:43           ` Vivek Goyal
2010-07-26 13:48             ` Christoph Hellwig
2010-07-26 13:54               ` Vivek Goyal
2010-07-26 16:15             ` Heinz Diehl
2010-07-26 14:13           ` Christoph Hellwig
2010-07-27  7:48             ` Heinz Diehl
2010-07-28 20:22           ` Vivek Goyal
2010-07-28 23:57             ` Christoph Hellwig
2010-07-29  4:34               ` cfq fsync patch testing results (Was: Re: [RFC PATCH] cfq-iosched: IOPS mode for group scheduling and new group_idle tunable) Vivek Goyal
2010-07-29 14:56                 ` Vivek Goyal
2010-07-29 19:39                   ` Jeff Moyer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox