* [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
@ 2010-05-21 8:40 Gui Jianfeng
2010-05-21 8:43 ` [PATCH 1/4] io-controller: a new interface to keep track of bytes during group is backlogged Gui Jianfeng
` (4 more replies)
0 siblings, 5 replies; 15+ messages in thread
From: Gui Jianfeng @ 2010-05-21 8:40 UTC (permalink / raw)
To: Vivek Goyal, Jens Axboe; +Cc: linux kernel mailing list
Hi,
This series implements three new interfaces to keep track of transferred bytes,
elapsed time and io rate since the group became backlogged. When the group is
dequeued from the service tree, these three interfaces are reset and show zero.
Thanks
Gui
* [PATCH 1/4] io-controller: a new interface to keep track of bytes during group is backlogged
2010-05-21 8:40 [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Gui Jianfeng
@ 2010-05-21 8:43 ` Gui Jianfeng
2010-05-21 8:44 ` [PATCH 2/4] io-controller: a new interface to keep track of the time since group bacame backlogged Gui Jianfeng
` (3 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Gui Jianfeng @ 2010-05-21 8:43 UTC (permalink / raw)
To: Vivek Goyal, Jens Axboe; +Cc: linux kernel mailing list
Add a new interface to keep track of how many bytes have been transferred since
this group became backlogged. It is reset when the group is dequeued.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
block/blk-cgroup.c | 24 ++++++++++++++++++++++++
block/blk-cgroup.h | 6 ++++++
block/cfq-iosched.c | 1 +
3 files changed, 31 insertions(+), 0 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 60bb049..749fc6b 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -292,6 +292,22 @@ void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time)
}
EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used);
+void blkiocg_reset_active_bytes(struct blkio_group *blkg)
+{
+ int i;
+ struct blkio_group_stats *stats;
+ unsigned long flags;
+
+ spin_lock_irqsave(&blkg->stats_lock, flags);
+
+ stats = &blkg->stats;
+ for (i = 0; i < BLKIO_STAT_TOTAL; i++)
+ stats->stat_arr[BLKIO_STAT_ACTIVE_BYTES][i] = 0;
+
+ spin_unlock_irqrestore(&blkg->stats_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blkiocg_reset_active_bytes);
+
void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
uint64_t bytes, bool direction, bool sync)
{
@@ -305,6 +321,8 @@ void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
sync);
blkio_add_stat(stats->stat_arr[BLKIO_STAT_SERVICE_BYTES], bytes,
direction, sync);
+ blkio_add_stat(stats->stat_arr[BLKIO_STAT_ACTIVE_BYTES], bytes,
+ direction, sync);
spin_unlock_irqrestore(&blkg->stats_lock, flags);
}
EXPORT_SYMBOL_GPL(blkiocg_update_dispatch_stats);
@@ -625,6 +643,7 @@ static int blkiocg_##__VAR##_read(struct cgroup *cgroup, \
SHOW_FUNCTION_PER_GROUP(time, BLKIO_STAT_TIME, 0);
SHOW_FUNCTION_PER_GROUP(sectors, BLKIO_STAT_SECTORS, 0);
SHOW_FUNCTION_PER_GROUP(io_service_bytes, BLKIO_STAT_SERVICE_BYTES, 1);
+SHOW_FUNCTION_PER_GROUP(io_active_bytes, BLKIO_STAT_ACTIVE_BYTES, 1);
SHOW_FUNCTION_PER_GROUP(io_serviced, BLKIO_STAT_SERVICED, 1);
SHOW_FUNCTION_PER_GROUP(io_service_time, BLKIO_STAT_SERVICE_TIME, 1);
SHOW_FUNCTION_PER_GROUP(io_wait_time, BLKIO_STAT_WAIT_TIME, 1);
@@ -851,6 +870,11 @@ struct cftype blkio_files[] = {
.read_map = blkiocg_io_service_bytes_read,
},
{
+ .name = "io_active_bytes",
+ .read_map = blkiocg_io_active_bytes_read,
+ },
+
+ {
.name = "io_serviced",
.read_map = blkiocg_io_serviced_read,
},
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 2b866ec..67d4284 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -30,6 +30,11 @@ enum stat_type {
BLKIO_STAT_SERVICE_TIME = 0,
/* Total bytes transferred */
BLKIO_STAT_SERVICE_BYTES,
+ /*
+ * Total bytes transferred since group became backlogged, will reset
+ * when group dequeued.
+ */
+ BLKIO_STAT_ACTIVE_BYTES,
/* Total IOs serviced, post merge */
BLKIO_STAT_SERVICED,
/* Total time spent waiting in scheduler queue in ns */
@@ -144,6 +149,7 @@ struct blkio_policy_type {
/* Blkio controller policy registration */
extern void blkio_policy_register(struct blkio_policy_type *);
extern void blkio_policy_unregister(struct blkio_policy_type *);
+extern void blkiocg_reset_active_bytes(struct blkio_group *);
static inline char *blkg_path(struct blkio_group *blkg)
{
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 0f3eb70..d4a3525 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -858,6 +858,7 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg)
cfq_rb_erase(&cfqg->rb_node, st);
cfqg->saved_workload_slice = 0;
blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
+ blkiocg_reset_active_bytes(&cfqg->blkg);
}
static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
-- 1.5.4.rc3
* [PATCH 2/4] io-controller: a new interface to keep track of the time since group bacame backlogged
2010-05-21 8:40 [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Gui Jianfeng
2010-05-21 8:43 ` [PATCH 1/4] io-controller: a new interface to keep track of bytes during group is backlogged Gui Jianfeng
@ 2010-05-21 8:44 ` Gui Jianfeng
2010-05-21 8:45 ` [PATCH 3/4] io-controller: a new interface to keep track of io rate when group is backlogged Gui Jianfeng
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Gui Jianfeng @ 2010-05-21 8:44 UTC (permalink / raw)
To: Vivek Goyal, Jens Axboe; +Cc: linux kernel mailing list
Add a new interface to keep track of the time elapsed since the group became
backlogged. It is reset when the group is enqueued.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
block/blk-cgroup.c | 42 +++++++++++++++++++++++++++++++++++++++++-
block/blk-cgroup.h | 9 +++++++++
block/cfq-iosched.c | 14 ++++++++++++++
3 files changed, 64 insertions(+), 1 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 749fc6b..5b47655 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -308,6 +308,20 @@ void blkiocg_reset_active_bytes(struct blkio_group *blkg)
}
EXPORT_SYMBOL_GPL(blkiocg_reset_active_bytes);
+void blkiocg_update_active_start_time(struct blkio_group *blkg)
+{
+ struct blkio_group_stats *stats;
+ unsigned long flags;
+
+ spin_lock_irqsave(&blkg->stats_lock, flags);
+
+ stats = &blkg->stats;
+ stats->active_start_time = get_jiffies_64();
+
+ spin_unlock_irqrestore(&blkg->stats_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blkiocg_update_active_start_time);
+
void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
uint64_t bytes, bool direction, bool sync)
{
@@ -568,6 +582,8 @@ static uint64_t blkio_get_stat(struct blkio_group *blkg,
uint64_t disk_total;
char key_str[MAX_KEY_LEN];
enum stat_sub_type sub_type;
+ struct blkio_policy_type *blkiop;
+ int ret;
if (type == BLKIO_STAT_TIME)
return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
@@ -575,6 +591,26 @@ static uint64_t blkio_get_stat(struct blkio_group *blkg,
if (type == BLKIO_STAT_SECTORS)
return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
blkg->stats.sectors, cb, dev);
+ if (type == BLKIO_STAT_ACTIVE_TIME) {
+ uint64_t delta;
+
+ delta = get_jiffies_64() -
+ blkg->stats.active_start_time;
+
+ delta = jiffies_to_msecs(delta);
+ list_for_each_entry(blkiop, &blkio_list, list) {
+ ret = blkiop->ops.blkio_is_blkg_active_fn(blkg);
+ /* If group isn't backlogged, don't show */
+ if (!ret) {
+ delta = 0;
+ break;
+ }
+ }
+
+ return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+ delta, cb, dev);
+ }
+
#ifdef CONFIG_DEBUG_BLK_CGROUP
if (type == BLKIO_STAT_AVG_QUEUE_SIZE) {
uint64_t sum = blkg->stats.avg_queue_size_sum;
@@ -644,6 +680,7 @@ SHOW_FUNCTION_PER_GROUP(time, BLKIO_STAT_TIME, 0);
SHOW_FUNCTION_PER_GROUP(sectors, BLKIO_STAT_SECTORS, 0);
SHOW_FUNCTION_PER_GROUP(io_service_bytes, BLKIO_STAT_SERVICE_BYTES, 1);
SHOW_FUNCTION_PER_GROUP(io_active_bytes, BLKIO_STAT_ACTIVE_BYTES, 1);
+SHOW_FUNCTION_PER_GROUP(io_active_time, BLKIO_STAT_ACTIVE_TIME, 0);
SHOW_FUNCTION_PER_GROUP(io_serviced, BLKIO_STAT_SERVICED, 1);
SHOW_FUNCTION_PER_GROUP(io_service_time, BLKIO_STAT_SERVICE_TIME, 1);
SHOW_FUNCTION_PER_GROUP(io_wait_time, BLKIO_STAT_WAIT_TIME, 1);
@@ -873,7 +910,10 @@ struct cftype blkio_files[] = {
.name = "io_active_bytes",
.read_map = blkiocg_io_active_bytes_read,
},
-
+ {
+ .name = "io_active_time",
+ .read_map = blkiocg_io_active_time_read,
+ },
{
.name = "io_serviced",
.read_map = blkiocg_io_serviced_read,
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 67d4284..2ae7976 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -35,6 +35,11 @@ enum stat_type {
* when group dequeued.
*/
BLKIO_STAT_ACTIVE_BYTES,
+ /*
+ * Total time (in ms) elapsed since group became backlogged, will
+ * reset when group enqueued.
+ */
+ BLKIO_STAT_ACTIVE_TIME,
/* Total IOs serviced, post merge */
BLKIO_STAT_SERVICED,
/* Total time spent waiting in scheduler queue in ns */
@@ -83,6 +88,7 @@ struct blkio_group_stats {
uint64_t time;
uint64_t sectors;
uint64_t stat_arr[BLKIO_STAT_QUEUED + 1][BLKIO_STAT_TOTAL];
+ uint64_t active_start_time;
#ifdef CONFIG_DEBUG_BLK_CGROUP
/* Sum of number of IOs queued across all samples */
uint64_t avg_queue_size_sum;
@@ -135,10 +141,12 @@ extern unsigned int blkcg_get_weight(struct blkio_cgroup *blkcg,
typedef void (blkio_unlink_group_fn) (void *key, struct blkio_group *blkg);
typedef void (blkio_update_group_weight_fn) (struct blkio_group *blkg,
unsigned int weight);
+typedef bool (blkio_is_blkg_active) (struct blkio_group *blkg);
struct blkio_policy_ops {
blkio_unlink_group_fn *blkio_unlink_group_fn;
blkio_update_group_weight_fn *blkio_update_group_weight_fn;
+ blkio_is_blkg_active *blkio_is_blkg_active_fn;
};
struct blkio_policy_type {
@@ -150,6 +158,7 @@ struct blkio_policy_type {
extern void blkio_policy_register(struct blkio_policy_type *);
extern void blkio_policy_unregister(struct blkio_policy_type *);
extern void blkiocg_reset_active_bytes(struct blkio_group *);
+extern void blkiocg_update_active_start_time(struct blkio_group *);
static inline char *blkg_path(struct blkio_group *blkg)
{
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index d4a3525..b21ae10 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -834,6 +834,7 @@ cfq_group_service_tree_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
__cfq_group_service_tree_add(st, cfqg);
cfqg->on_st = true;
st->total_weight += cfqg->weight;
+ blkiocg_update_active_start_time(&cfqg->blkg);
}
static void
@@ -930,6 +931,18 @@ static inline struct cfq_group *cfqg_of_blkg(struct blkio_group *blkg)
return NULL;
}
+bool is_blkg_active(struct blkio_group *blkg)
+{
+ struct cfq_group *cfqg;
+
+ cfqg = cfqg_of_blkg(blkg);
+ if (cfqg && cfqg->on_st)
+ return true;
+
+ return false;
+}
+EXPORT_SYMBOL_GPL(is_blkg_active);
+
void
cfq_update_blkio_group_weight(struct blkio_group *blkg, unsigned int weight)
{
@@ -3944,6 +3957,7 @@ static struct blkio_policy_type blkio_policy_cfq = {
.ops = {
.blkio_unlink_group_fn = cfq_unlink_blkio_group,
.blkio_update_group_weight_fn = cfq_update_blkio_group_weight,
+ .blkio_is_blkg_active_fn = is_blkg_active,
},
};
#else
-- 1.5.4.rc3
* [PATCH 3/4] io-controller: a new interface to keep track of io rate when group is backlogged
2010-05-21 8:40 [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Gui Jianfeng
2010-05-21 8:43 ` [PATCH 1/4] io-controller: a new interface to keep track of bytes during group is backlogged Gui Jianfeng
2010-05-21 8:44 ` [PATCH 2/4] io-controller: a new interface to keep track of the time since group bacame backlogged Gui Jianfeng
@ 2010-05-21 8:45 ` Gui Jianfeng
2010-05-21 8:46 ` [PATCH 4/4] io-controller: Document for active bytes, time and rate Gui Jianfeng
2010-05-21 13:17 ` [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Vivek Goyal
4 siblings, 0 replies; 15+ messages in thread
From: Gui Jianfeng @ 2010-05-21 8:45 UTC (permalink / raw)
To: Vivek Goyal, Jens Axboe; +Cc: linux kernel mailing list
Add a new interface to keep track of io rate of a group when it's backlogged.
If the group is dequeued, io rate isn't calculated.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
block/blk-cgroup.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
block/blk-cgroup.h | 5 +++++
2 files changed, 53 insertions(+), 0 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 5b47655..01a8c4e 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -611,6 +611,49 @@ static uint64_t blkio_get_stat(struct blkio_group *blkg,
delta, cb, dev);
}
+ if (type == BLKIO_STAT_ACTIVE_RATE) {
+ uint64_t bytes;
+ uint64_t delta;
+
+ delta = get_jiffies_64() -
+ blkg->stats.active_start_time;
+
+ list_for_each_entry(blkiop, &blkio_list, list) {
+ ret = blkiop->ops.blkio_is_blkg_active_fn(blkg);
+ if (!ret) {
+ delta = 0;
+ break;
+ }
+ }
+
+ if (delta == 0)
+ return 0;
+
+ delta = jiffies_to_msecs(delta);
+
+ for (sub_type = BLKIO_STAT_READ; sub_type < BLKIO_STAT_TOTAL;
+ sub_type++) {
+ blkio_get_key_name(sub_type, dev, key_str,
+ MAX_KEY_LEN, false);
+
+ bytes = blkg->stats.stat_arr
+ [BLKIO_STAT_ACTIVE_BYTES][sub_type];
+ do_div(bytes, delta);
+ cb->fill(cb, key_str, bytes);
+ }
+
+ disk_total = blkg->stats.stat_arr[BLKIO_STAT_ACTIVE_BYTES]
+ [BLKIO_STAT_READ] +
+ blkg->stats.stat_arr[BLKIO_STAT_ACTIVE_BYTES]
+ [BLKIO_STAT_WRITE];
+
+ do_div(disk_total, delta);
+ blkio_get_key_name(BLKIO_STAT_TOTAL, dev,
+ key_str, MAX_KEY_LEN, false);
+ cb->fill(cb, key_str, disk_total);
+ return 0;
+ }
+
#ifdef CONFIG_DEBUG_BLK_CGROUP
if (type == BLKIO_STAT_AVG_QUEUE_SIZE) {
uint64_t sum = blkg->stats.avg_queue_size_sum;
@@ -681,6 +724,7 @@ SHOW_FUNCTION_PER_GROUP(sectors, BLKIO_STAT_SECTORS, 0);
SHOW_FUNCTION_PER_GROUP(io_service_bytes, BLKIO_STAT_SERVICE_BYTES, 1);
SHOW_FUNCTION_PER_GROUP(io_active_bytes, BLKIO_STAT_ACTIVE_BYTES, 1);
SHOW_FUNCTION_PER_GROUP(io_active_time, BLKIO_STAT_ACTIVE_TIME, 0);
+SHOW_FUNCTION_PER_GROUP(io_active_rate, BLKIO_STAT_ACTIVE_RATE, 0);
SHOW_FUNCTION_PER_GROUP(io_serviced, BLKIO_STAT_SERVICED, 1);
SHOW_FUNCTION_PER_GROUP(io_service_time, BLKIO_STAT_SERVICE_TIME, 1);
SHOW_FUNCTION_PER_GROUP(io_wait_time, BLKIO_STAT_WAIT_TIME, 1);
@@ -915,6 +959,10 @@ struct cftype blkio_files[] = {
.read_map = blkiocg_io_active_time_read,
},
{
+ .name = "io_active_rate",
+ .read_map = blkiocg_io_active_rate_read,
+ },
+ {
.name = "io_serviced",
.read_map = blkiocg_io_serviced_read,
},
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 2ae7976..cd72e7f 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -40,6 +40,11 @@ enum stat_type {
* reset when group enqueued.
*/
BLKIO_STAT_ACTIVE_TIME,
+ /*
+ * io rate (in bytes/ms) of the group since group became backlogged,
+ * if group is dequeued, show zero.
+ */
+ BLKIO_STAT_ACTIVE_RATE,
/* Total IOs serviced, post merge */
BLKIO_STAT_SERVICED,
/* Total time spent waiting in scheduler queue in ns */
-- 1.5.4.rc3
* [PATCH 4/4] io-controller: Document for active bytes, time and rate.
2010-05-21 8:40 [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Gui Jianfeng
` (2 preceding siblings ...)
2010-05-21 8:45 ` [PATCH 3/4] io-controller: a new interface to keep track of io rate when group is backlogged Gui Jianfeng
@ 2010-05-21 8:46 ` Gui Jianfeng
2010-05-26 18:57 ` Randy Dunlap
2010-05-21 13:17 ` [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Vivek Goyal
4 siblings, 1 reply; 15+ messages in thread
From: Gui Jianfeng @ 2010-05-21 8:46 UTC (permalink / raw)
To: Vivek Goyal, Jens Axboe; +Cc: linux kernel mailing list
Document the active bytes, time and rate interfaces.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
---
Documentation/cgroups/blkio-controller.txt | 21 +++++++++++++++++++++
1 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 48e0b21..2b015e8 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -168,6 +168,27 @@ Details of cgroup files
cgroup. This is further divided by the type of operation - read or
write, sync or async.
+- blkio.io_active_bytes
+ - Total number of bytes transferred to/from the disk by the group since
+ the group got backlogged. First two fields specify the major and
+ minor number of the device, the third field specifies the operation
+ type and the fourth field specifies the number of bytes. It will be
+ reset when group gets dequeued from service tree.
+
+- blkio.io_active_time
+ - Total amount of time (in ms) elapsed since the group got backlogged.
+ First two fields specify the major and minor number of the device,
+ the third field specifies how long it has been since the group got
+ backlogged. It will be reset when group gets enqueued onto service
+ tree.
+
+- blkio.io_active_rate
+ - The io rate (in bytes/ms) of the group when group is issuing io.
+ First two fields specify the major and minor number of the device,
+ the third field specifies the operation type and the fourth field
+ specifies the io rate. If there isn't any io on the group, io rate
+ won't be computed.
+
- blkio.avg_queue_size
- Debugging aid only enabled if CONFIG_DEBUG_BLK_CGROUP=y.
The average queue size for this cgroup over the entire time of this
-- 1.5.4.rc3
* Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
2010-05-21 8:40 [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Gui Jianfeng
` (3 preceding siblings ...)
2010-05-21 8:46 ` [PATCH 4/4] io-controller: Document for active bytes, time and rate Gui Jianfeng
@ 2010-05-21 13:17 ` Vivek Goyal
2010-05-24 1:12 ` Gui Jianfeng
4 siblings, 1 reply; 15+ messages in thread
From: Vivek Goyal @ 2010-05-21 13:17 UTC (permalink / raw)
To: Gui Jianfeng; +Cc: Jens Axboe, linux kernel mailing list
On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
> Hi,
>
> This series implements three new interfaces to keep track of tranferred bytes,
> elapsing time and io rate since group getting backlogged. If the group dequeues
> from service tree, these three interfaces will reset and shows zero.
Hi Gui,
Can you give some details regarding how this functionality is useful? Why
would somebody be interested only in the stats since the group became
backlogged and not in the total stats?
Groups can come and go so fast and these stats will reset so many times
that I am not able to visualize how these stats will be useful.
Thanks
Vivek
>
> Thanks
> Gui
* Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
2010-05-21 13:17 ` [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Vivek Goyal
@ 2010-05-24 1:12 ` Gui Jianfeng
2010-05-24 21:22 ` Vivek Goyal
0 siblings, 1 reply; 15+ messages in thread
From: Gui Jianfeng @ 2010-05-24 1:12 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Jens Axboe, linux kernel mailing list
Vivek Goyal wrote:
> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
>> Hi,
>>
>> This series implements three new interfaces to keep track of tranferred bytes,
>> elapsing time and io rate since group getting backlogged. If the group dequeues
>> from service tree, these three interfaces will reset and shows zero.
>
> Hi Gui,
>
> Can you give some details regarding how this functionality is useful? Why
> would somebody be interested in only in stats of till group was
> backlogged and not in total stats?
>
> Groups can come and go so fast and these stats will reset so many times
> that I am not able to visualize how these stats will be useful.
Hi Vivek,
Currently we assign a weight to a group, but the user still doesn't know how
fast the group runs. With the io rate interface, users can check the rate of a
group at any moment and determine whether the weight assigned to it is enough.
The bytes and time interfaces are just for debugging purposes.
Thanks,
Gui
>
> Thanks
> Vivek
>
>> Thanks
>> Gui
>
>
* Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
2010-05-24 1:12 ` Gui Jianfeng
@ 2010-05-24 21:22 ` Vivek Goyal
2010-05-25 1:37 ` Gui Jianfeng
0 siblings, 1 reply; 15+ messages in thread
From: Vivek Goyal @ 2010-05-24 21:22 UTC (permalink / raw)
To: Gui Jianfeng; +Cc: Jens Axboe, linux kernel mailing list
On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
> Vivek Goyal wrote:
> > On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
> >> Hi,
> >>
> >> This series implements three new interfaces to keep track of tranferred bytes,
> >> elapsing time and io rate since group getting backlogged. If the group dequeues
> >> from service tree, these three interfaces will reset and shows zero.
> >
> > Hi Gui,
> >
> > Can you give some details regarding how this functionality is useful? Why
> > would somebody be interested in only in stats of till group was
> > backlogged and not in total stats?
> >
> > Groups can come and go so fast and these stats will reset so many times
> > that I am not able to visualize how these stats will be useful.
>
> Hi Vivek,
>
> Currently, we assign weight to a group, but user still doesn't know how fast the
> group runs. With io rate interface, users can check the rate of a group at any
> moment, or to determine whether the weight assigned to a group is enough.
> bytes and time interface is just for debug purpose.
Gui,
I still don't understand why the blkio.sectors, blkio.io_service_bytes
or blkio.io_serviced interfaces are not good enough to determine at what
rate a group is doing IO.
I think we can very well write something in userspace like "iostat" to
display the per group rate. The utility can read any of the above files,
say at an interval of 1s, calculate the diff between the values and
display that as the group's effective rate.
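As a rough illustration of the userspace approach described above, here is a
minimal C sketch of the per-interval calculation. The struct and function
names are hypothetical, and reading and parsing the cgroup file is left out.
It only assumes two readings of a cumulative byte counter (for example the
Total line of blkio.io_service_bytes), each paired with a timestamp.

```c
#include <stdint.h>

/* One reading of a cumulative byte counter together with the time at
 * which it was taken. Both fields are assumptions for this sketch. */
struct sample {
	uint64_t bytes;   /* cumulative bytes at sample time */
	uint64_t time_ms; /* monotonic timestamp in milliseconds */
};

/* Effective rate in bytes per second between two samples. Returns 0 if
 * no time has elapsed or the counter went backwards (e.g. after a
 * stats reset), so the caller never divides by zero. */
static uint64_t group_rate_bps(struct sample prev, struct sample cur)
{
	if (cur.time_ms <= prev.time_ms || cur.bytes < prev.bytes)
		return 0;
	return (cur.bytes - prev.bytes) * 1000 / (cur.time_ms - prev.time_ms);
}
```

A monitoring loop would take a sample, sleep for the chosen interval, take
another sample and print group_rate_bps() for the pair. A longer interval
averages over more service slices and so gives a steadier figure.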
Thanks
Vivek
* Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
2010-05-24 21:22 ` Vivek Goyal
@ 2010-05-25 1:37 ` Gui Jianfeng
2010-05-25 2:03 ` Vivek Goyal
0 siblings, 1 reply; 15+ messages in thread
From: Gui Jianfeng @ 2010-05-25 1:37 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Jens Axboe, linux kernel mailing list
Vivek Goyal wrote:
> On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>>> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
>>>> Hi,
>>>>
>>>> This series implements three new interfaces to keep track of tranferred bytes,
>>>> elapsing time and io rate since group getting backlogged. If the group dequeues
>>>> from service tree, these three interfaces will reset and shows zero.
>>> Hi Gui,
>>>
>>> Can you give some details regarding how this functionality is useful? Why
>>> would somebody be interested in only in stats of till group was
>>> backlogged and not in total stats?
>>>
>>> Groups can come and go so fast and these stats will reset so many times
>>> that I am not able to visualize how these stats will be useful.
>> Hi Vivek,
>>
>> Currently, we assign weight to a group, but user still doesn't know how fast the
>> group runs. With io rate interface, users can check the rate of a group at any
>> moment, or to determine whether the weight assigned to a group is enough.
>> bytes and time interface is just for debug purpose.
>
> Gui,
>
> I still don't understand that why blkio.sectors or blkio.io_service_bytes
> or blkio.io_serviced interfaces are not good enough to determine at what
> rate a group is doing IO.
>
> I think we can very well write something in userspace like "iostat" to
> display the per group rate. Utility can read the any of the above files
> say at the interfval of 1s, calculate the diff between the values and
> display that as group effective rate.
Hi Vivek,
blkio.io_active_rate reflects the rate since the group became backlogged, so the rate is a
smooth value. This value represents the actual rate at which a group runs. IMO, an io rate
calculated from user space is not accurate in the following two scenarios:
1 The userspace app chooses an interval of 1s. If the group is backlogged for 0.5s and not
backlogged for the other 0.5s, the rate calculated over this interval doesn't make sense.
2 Consider several groups waiting for service, where most of the interval happens to fall
into the period in which this group is under service. The rate calculated by the user app
isn't accurate, and rate bursts might occur.
Furthermore, once max weight control is available, we can make use of such an interface to
see how well this group works.
Thanks,
Gui
>
> Thanks
> Vivek
>
>
* Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
2010-05-25 1:37 ` Gui Jianfeng
@ 2010-05-25 2:03 ` Vivek Goyal
2010-05-25 3:00 ` Gui Jianfeng
0 siblings, 1 reply; 15+ messages in thread
From: Vivek Goyal @ 2010-05-25 2:03 UTC (permalink / raw)
To: Gui Jianfeng; +Cc: Jens Axboe, linux kernel mailing list
On Tue, May 25, 2010 at 09:37:31AM +0800, Gui Jianfeng wrote:
> Vivek Goyal wrote:
> > On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
> >> Vivek Goyal wrote:
> >>> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
> >>>> Hi,
> >>>>
> >>>> This series implements three new interfaces to keep track of tranferred bytes,
> >>>> elapsing time and io rate since group getting backlogged. If the group dequeues
> >>>> from service tree, these three interfaces will reset and shows zero.
> >>> Hi Gui,
> >>>
> >>> Can you give some details regarding how this functionality is useful? Why
> >>> would somebody be interested in only in stats of till group was
> >>> backlogged and not in total stats?
> >>>
> >>> Groups can come and go so fast and these stats will reset so many times
> >>> that I am not able to visualize how these stats will be useful.
> >> Hi Vivek,
> >>
> >> Currently, we assign weight to a group, but user still doesn't know how fast the
> >> group runs. With io rate interface, users can check the rate of a group at any
> >> moment, or to determine whether the weight assigned to a group is enough.
> >> bytes and time interface is just for debug purpose.
> >
> > Gui,
> >
> > I still don't understand that why blkio.sectors or blkio.io_service_bytes
> > or blkio.io_serviced interfaces are not good enough to determine at what
> > rate a group is doing IO.
> >
> > I think we can very well write something in userspace like "iostat" to
> > display the per group rate. Utility can read the any of the above files
> > say at the interfval of 1s, calculate the diff between the values and
> > display that as group effective rate.
>
> Hi Vivek,
>
> blkio.io_active_rate reflects the rate since group get backlogged, so the rate is a smooth
> value. This value represents the actual rate a group runs. IMO, io rate calculated from
> user space is not accurate in following two scenarios:
>
> 1 Userspace app chooses the interval of 1s, if 0.5s is backlogged and 0.5s is not, the
> rate calculated in this interval doesn't make sense.
>
If you are not servicing groups for a long time, that is very bad for
latency anyway. That's why the soft limit of 300ms in CFQ makes sense, and
in practice I am not sure you will be blocking groups for 0.5s.
Even if you do, the user just needs to choose a bigger interval to see
smoother rates. Reduce the interval and you might see a slightly
bursty rate.
And why do you say that "io_active_rate" is a smooth interface? IIUC, the
value of the group rate will vary depending on when I read the file.
Assume a group gets serviced for 30ms, is then put back in the queue,
and is serviced again after 50ms. If I read "io_active_rate"
immediately after the group has been serviced I should see a high rate
value, and if I read the same file another 30ms later I would see a
reduced rate.
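To put numbers on the example above, here is a small C sketch of the
bytes/ms calculation that the proposed interface performs: active bytes
divided by the time since the group became backlogged. The function name and
the 3,000,000 byte figure are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

/* Rate in bytes/ms since the group became backlogged: the active byte
 * count divided by the elapsed time. Returns 0 when no time has
 * elapsed, to avoid dividing by zero. */
static uint64_t active_rate(uint64_t active_bytes, uint64_t elapsed_ms)
{
	return elapsed_ms ? active_bytes / elapsed_ms : 0;
}
```

If the group transferred 3,000,000 bytes during its 30ms of service,
active_rate(3000000, 30) gives 100000 bytes/ms right after the slice. Reading
30ms later, with no further service in between, active_rate(3000000, 60)
gives 50000 bytes/ms, so the reported value depends on when the file is read.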
The point is that to get a better idea of the average rate of a group, we
need to observe the bytes transferred over a somewhat longer period. If you
sample bytes transferred from a group over a very short interval, you can
expect bursty output. There is no way to avoid that.
> 2 Consider there're several groups are waiting for service, but most part of the interval
> is just fall into the period that the group is under-service. such rate calculated by user
> app isn't acurate, rate burst might occur.
Actually I think the whole notion of relying on CFQ's time calculations
is not very good. These are very approximate time calculations. There are
many situations where calculating time is not possible and we approximate
slice_used to 1ms. So relying on that time for rate calculation is
much more inaccurate.
Hence I think calculating the group's rate in user space makes much more
sense.
>
> Further more, once max weight control is available, we can make use of such interface to realize
> how well this group works.
Again I don't understand with max BW controller, why can't we monitor the
group's BW in userspace accurately?
Vivek
* Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
2010-05-25 2:03 ` Vivek Goyal
@ 2010-05-25 3:00 ` Gui Jianfeng
2010-05-25 13:25 ` Vivek Goyal
0 siblings, 1 reply; 15+ messages in thread
From: Gui Jianfeng @ 2010-05-25 3:00 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Jens Axboe, linux kernel mailing list
Vivek Goyal wrote:
> On Tue, May 25, 2010 at 09:37:31AM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>>> On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
>>>> Vivek Goyal wrote:
>>>>> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
>>>>>> Hi,
>>>>>>
>>>>>> This series implements three new interfaces to keep track of tranferred bytes,
>>>>>> elapsing time and io rate since group getting backlogged. If the group dequeues
>>>>>> from service tree, these three interfaces will reset and shows zero.
>>>>> Hi Gui,
>>>>>
>>>>> Can you give some details regarding how this functionality is useful? Why
>>>>> would somebody be interested in only in stats of till group was
>>>>> backlogged and not in total stats?
>>>>>
>>>>> Groups can come and go so fast and these stats will reset so many times
>>>>> that I am not able to visualize how these stats will be useful.
>>>> Hi Vivek,
>>>>
>>>> Currently, we assign weight to a group, but user still doesn't know how fast the
>>>> group runs. With io rate interface, users can check the rate of a group at any
>>>> moment, or to determine whether the weight assigned to a group is enough.
>>>> bytes and time interface is just for debug purpose.
>>> Gui,
>>>
>>> I still don't understand that why blkio.sectors or blkio.io_service_bytes
>>> or blkio.io_serviced interfaces are not good enough to determine at what
>>> rate a group is doing IO.
>>>
>>> I think we can very well write something in userspace like "iostat" to
>>> display the per group rate. Utility can read the any of the above files
>>> say at the interfval of 1s, calculate the diff between the values and
>>> display that as group effective rate.
>> Hi Vivek,
>>
>> blkio.io_active_rate reflects the rate since group get backlogged, so the rate is a smooth
>> value. This value represents the actual rate a group runs. IMO, io rate calculated from
>> user space is not accurate in following two scenarios:
>>
>> 1 Userspace app chooses the interval of 1s, if 0.5s is backlogged and 0.5s is not, the
>> rate calculated in this interval doesn't make sense.
>>
>
> If you are not servicing groups for long time, anyway it is very bad for
> latency. So that's why soft limit of 300ms of CFQ makes sense and
> practically I am not sure you will be blocking groups for .5s.
>
> Even if you do, then user just needs to choose a bigger interval and you
> will see more smooth rates. Reduce the interval and you might see little
> bursty rate.
Vivek,

IIUC, the biggest problem for a userspace app is that it doesn't know how long
the group was dequeued during the interval. For example, if the user chooses a
10s interval, 8s of which the group is not backlogged, that 8s is still included
when the app calculates the io rate. So this rate isn't what we want. Am I
missing something?

"io_active_rate" never takes un-backlogged time into account when calculating
the io rate.
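
The arithmetic behind this example can be sketched as follows. The byte count is purely hypothetical (100 MiB moved during the 2s the group was backlogged), and the helper name is made up for illustration; only the 10s interval and the 8s of idle time come from the example above.

```python
def rate_bytes_per_sec(total_bytes: int, seconds: float) -> float:
    """Average rate over the given span; 0.0 if the span is empty."""
    return total_bytes / seconds if seconds > 0 else 0.0

# From the example: 10s sampling interval, 8s of it not backlogged.
interval_s = 10.0
backlogged_s = interval_s - 8.0

# Assumption for illustration only: 100 MiB transferred while backlogged.
transferred = 100 * 1024 * 1024

# What a userspace sampler sees when it divides by the whole interval.
wall_clock_rate = rate_bytes_per_sec(transferred, interval_s)
# What a rate restricted to backlogged time would report.
backlogged_rate = rate_bytes_per_sec(transferred, backlogged_s)

print(f"rate over whole interval: {wall_clock_rate / 2**20:.1f} MiB/s")
print(f"rate while backlogged:    {backlogged_rate / 2**20:.1f} MiB/s")
```

With these assumed numbers the two figures differ by a factor of five, which is the gap being argued about here.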
>
> And, why do you say that "io_active_rate" is smooth interface. IIUC, the
> value of group rate will vary depending on time when I read the file.
"io_active_rate" always shows the io rate while the group is backlogged. Maybe
when io_active_time is very small, we don't need to calculate "io_active_rate"
at all.

Thanks
Gui
>
> Assume a group gets serviced for 30ms and then is put back in the queue
> and is serviced again after 50ms. If I read the "io_active_rate"
> immediately after group has been serviced I should see a high rate value
> and if I read the same file after another 30ms I would see a reduced rate.
>
> Point being that to get a better idea of average rate of group, we need
> to observe byte transferred over a little longer period. If you sample
> bytes transferred from a group over a very short interval then you can
> expect bursty output. There is no way to avoid that?
>
>> 2 Consider that several groups are waiting for service, but most of the interval
>> just falls into the period when the group is under service. The rate calculated by the user
>> app isn't accurate; a rate burst might occur.
>
> Actually I think that whole notion of relying on time calculations of CFQ
> is not very good. these are very approximate time calculations. There are
> many situations where calculating time is not possible and we approximate
> the slice_used to 1ms. So relying on that time for rate calculation is
> much more inaccurate.
>
> Hence I think calculating group's rate in user space makes much more
> sense.
>
>> Further more, once max weight control is available, we can make use of such interface to realize
>> how well this group works.
>
> Again I don't understand with max BW controller, why can't we monitor the
> group's BW in userspace accurately?
>
> Vivek
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
2010-05-25 3:00 ` Gui Jianfeng
@ 2010-05-25 13:25 ` Vivek Goyal
2010-06-11 5:10 ` Divyesh Shah
0 siblings, 1 reply; 15+ messages in thread
From: Vivek Goyal @ 2010-05-25 13:25 UTC (permalink / raw)
To: Gui Jianfeng; +Cc: Jens Axboe, linux kernel mailing list
On Tue, May 25, 2010 at 11:00:54AM +0800, Gui Jianfeng wrote:
> Vivek Goyal wrote:
> > On Tue, May 25, 2010 at 09:37:31AM +0800, Gui Jianfeng wrote:
> >> Vivek Goyal wrote:
> >>> On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
> >>>> Vivek Goyal wrote:
> >>>>> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> This series implements three new interfaces to keep track of transferred bytes,
> >>>>>> elapsing time and io rate since group getting backlogged. If the group dequeues
> >>>>>> from service tree, these three interfaces will reset and shows zero.
> >>>>> Hi Gui,
> >>>>>
> >>>>> Can you give some details regarding how this functionality is useful? Why
> >>>>> would somebody be interested in only in stats of till group was
> >>>>> backlogged and not in total stats?
> >>>>>
> >>>>> Groups can come and go so fast and these stats will reset so many times
> >>>>> that I am not able to visualize how these stats will be useful.
> >>>> Hi Vivek,
> >>>>
> >>>> Currently, we assign weight to a group, but user still doesn't know how fast the
> >>>> group runs. With io rate interface, users can check the rate of a group at any
> >>>> moment, or to determine whether the weight assigned to a group is enough.
> >>>> bytes and time interface is just for debug purpose.
> >>> Gui,
> >>>
> >>> I still don't understand that why blkio.sectors or blkio.io_service_bytes
> >>> or blkio.io_serviced interfaces are not good enough to determine at what
> >>> rate a group is doing IO.
> >>>
> >>> I think we can very well write something in userspace like "iostat" to
> >>> display the per group rate. Utility can read any of the above files
> >>> say at the interval of 1s, calculate the diff between the values and
> >>> display that as group effective rate.
> >> Hi Vivek,
> >>
> >> blkio.io_active_rate reflects the rate since group get backlogged, so the rate is a smooth
> >> value. This value represents the actual rate a group runs. IMO, io rate calculated from
> >> user space is not accurate in following two scenarios:
> >>
> >> 1 Userspace app chooses the interval of 1s, if 0.5s is backlogged and 0.5s is not, the
> >> rate calculated in this interval doesn't make sense.
> >>
> >
> > If you are not servicing groups for long time, anyway it is very bad for
> > latency. So that's why soft limit of 300ms of CFQ makes sense and
> > practically I am not sure you will be blocking groups for .5s.
> >
> > Even if you do, then user just needs to choose a bigger interval and you
> > will see more smooth rates. Reduce the interval and you might see little
> > bursty rate.
>
> Vivek,
>
> IIUC, the most big problem for user app is the user app doesn't know how long
> the group has been dequeued during the interval. For example, user choose
> 10s interval, 8s of which is not backlogged, but when user app calculates
> io rate, this 8s still include. So this rate isn't what we want. Am i missing
> something?
Gui,

If the user application is not doing enough IO and the group is getting deleted
quickly, io_active_rate is not going to give you any meaningful data, as it
will be lost the moment the group gets deleted.

Hence one needs to monitor the IO rate while a workload is running and is
keeping the disk busy more or less all the time.

Even in your example, if you monitored the IO rate over a 10 second interval and
the group is not doing any IO, you just can't do anything about it. It's just
that your measurement method is wrong. Even io_active_rate will not help you
here, as by the time you read the file the group is gone and there is no data.

The very reason you want to monitor the rate is that you want to make sure the
group is getting enough BW. If the group is not doing IO, then one can look at
the blkio.dequeue file and see if the group is getting deleted too frequently. If
yes, that means the group is not doing enough IO to keep the disk busy. One
can also try increasing the weight of the group, but that will not help
much if the group does not remain backlogged for a significant amount of time.
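
A minimal sketch of the blkio.dequeue check described above, assuming the file holds one "major:minor count" line per device (and that the kernel was built with the blkio debug stats that expose it). The snapshot strings and function names are hypothetical; a real tool would read the file from the cgroup directory twice, some seconds apart.

```python
def parse_per_device(text: str) -> dict[str, int]:
    """Parse lines of the form 'major:minor value' into {device: value}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and ":" in fields[0]:
            stats[fields[0]] = int(fields[1])
    return stats

def dequeues_per_sec(before: str, after: str, interval_s: float) -> dict[str, float]:
    """Per-device dequeue frequency between two snapshots of blkio.dequeue."""
    old, new = parse_per_device(before), parse_per_device(after)
    return {dev: (new[dev] - old.get(dev, 0)) / interval_s for dev in new}

# Hypothetical snapshots taken 5 seconds apart.
snap0 = "8:0 120\n8:16 4\n"
snap1 = "8:0 620\n8:16 4\n"

result = dequeues_per_sec(snap0, snap1, 5.0)
print(result)
```

A consistently high figure for a device would suggest the group rarely stays backlogged there, in which case raising its weight is unlikely to help much.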
> "io_active_rate" will never take un-backlogged time into account when calculating
> io rate.
>
Theoretically, doesn't blkio.sectors/blkio.time already give the rate excluding
the time when the group was not backlogged?

But I would not recommend using blkio.time, as it is very approximate.
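
For illustration, the blkio.sectors/blkio.time ratio could be computed in userspace roughly as below. This is a sketch under stated assumptions: both files are taken to hold "major:minor value" lines, blkio.time is in milliseconds, and sectors are 512 bytes. The sample contents and function names are invented, and the result inherits whatever inaccuracy blkio.time has.

```python
SECTOR_BYTES = 512  # assumption: sectors are accounted in 512-byte units

def parse_per_device(text: str) -> dict[str, int]:
    """Parse lines of the form 'major:minor value' into {device: value}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and ":" in fields[0]:
            stats[fields[0]] = int(fields[1])
    return stats

def active_rate(sectors_text: str, time_text: str) -> dict[str, float]:
    """Bytes per millisecond of allocated disk time, per device."""
    sectors = parse_per_device(sectors_text)
    times = parse_per_device(time_text)
    rates = {}
    for dev, ms in times.items():
        # Skip devices with no disk time yet to avoid dividing by zero.
        if ms > 0 and dev in sectors:
            rates[dev] = sectors[dev] * SECTOR_BYTES / ms
    return rates

# Hypothetical file contents for one device.
sample_sectors = "8:0 204800\n"
sample_time = "8:0 2000\n"

rates = active_rate(sample_sectors, sample_time)
print(rates)
```

Sampling both files twice and dividing the deltas instead of the totals would give the same ratio for a recent window rather than for the group's whole lifetime.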
I am really not able to see what this interface is buying you.

Thanks
Vivek
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] io-controller: Document for active bytes, time and rate.
2010-05-21 8:46 ` [PATCH 4/4] io-controller: Document for active bytes, time and rate Gui Jianfeng
@ 2010-05-26 18:57 ` Randy Dunlap
0 siblings, 0 replies; 15+ messages in thread
From: Randy Dunlap @ 2010-05-26 18:57 UTC (permalink / raw)
To: Gui Jianfeng; +Cc: Vivek Goyal, Jens Axboe, linux kernel mailing list
On Fri, 21 May 2010 16:46:59 +0800 Gui Jianfeng wrote:
> Document for active bytes time and rate.
>
> Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
> ---
> Documentation/cgroups/blkio-controller.txt | 21 +++++++++++++++++++++
> 1 files changed, 21 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
> index 48e0b21..2b015e8 100644
> --- a/Documentation/cgroups/blkio-controller.txt
> +++ b/Documentation/cgroups/blkio-controller.txt
> @@ -168,6 +168,27 @@ Details of cgroup files
> cgroup. This is further divided by the type of operation - read or
> write, sync or async.
>
> +- blkio.io_active_bytes
> + - Total number bytes transferred to/from the disk by the group since
Total number of bytes
> + the group got backlogged. First two fields specify the major and
> + minor number of the device, the third field specifies the operation
> + type and the fourth field specifies the number of bytes. It will be
> + reset when group gets dequeued from service tree.
> +
> +- blkio.io_active_time
> + - Total amount of time(in ms) spent since the group got backlogged.
time (in ms)
> + First two fields specify the major and minor number of the device,
> + third field specify how long it spents since the group got
specifies how long it spent
> + backlogged. It will be reset when group gets enqueued onto service
> + tree.
> +
> +- blkio.io_active_rate
> + - The io rate (in bytes/ms) of the group when group is issuing io.
s/io/IO/ in multiple places (or s:io:I/O: if you prefer the '/')
> + First two fields specify the major and minor number of the device,
> + the third field specifies the operation type and the fourth field
> + specifies the io rate. If there isn't any io on the group, io rate
> + won't be computed.
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
2010-05-25 13:25 ` Vivek Goyal
@ 2010-06-11 5:10 ` Divyesh Shah
2010-06-11 6:31 ` Gui Jianfeng
0 siblings, 1 reply; 15+ messages in thread
From: Divyesh Shah @ 2010-06-11 5:10 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Gui Jianfeng, Jens Axboe, linux kernel mailing list
On Tue, May 25, 2010 at 6:25 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Tue, May 25, 2010 at 11:00:54AM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>> > On Tue, May 25, 2010 at 09:37:31AM +0800, Gui Jianfeng wrote:
>> >> Vivek Goyal wrote:
>> >>> On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
>> >>>> Vivek Goyal wrote:
>> >>>>> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> This series implements three new interfaces to keep track of transferred bytes,
>> >>>>>> elapsing time and io rate since group getting backlogged. If the group dequeues
>> >>>>>> from service tree, these three interfaces will reset and shows zero.
>> >>>>> Hi Gui,
>> >>>>>
>> >>>>> Can you give some details regarding how this functionality is useful? Why
>> >>>>> would somebody be interested in only in stats of till group was
>> >>>>> backlogged and not in total stats?
>> >>>>>
>> >>>>> Groups can come and go so fast and these stats will reset so many times
>> >>>>> that I am not able to visualize how these stats will be useful.
>> >>>> Hi Vivek,
>> >>>>
>> >>>> Currently, we assign weight to a group, but user still doesn't know how fast the
>> >>>> group runs. With io rate interface, users can check the rate of a group at any
>> >>>> moment, or to determine whether the weight assigned to a group is enough.
>> >>>> bytes and time interface is just for debug purpose.
>> >>> Gui,
>> >>>
>> >>> I still don't understand that why blkio.sectors or blkio.io_service_bytes
>> >>> or blkio.io_serviced interfaces are not good enough to determine at what
>> >>> rate a group is doing IO.
>> >>>
>> >>> I think we can very well write something in userspace like "iostat" to
>> >>> display the per group rate. Utility can read any of the above files
>> >>> say at the interval of 1s, calculate the diff between the values and
>> >>> display that as group effective rate.
>> >> Hi Vivek,
>> >>
>> >> blkio.io_active_rate reflects the rate since group get backlogged, so the rate is a smooth
>> >> value. This value represents the actual rate a group runs. IMO, io rate calculated from
>> >> user space is not accurate in following two scenarios:
>> >>
>> >> 1 Userspace app chooses the interval of 1s, if 0.5s is backlogged and 0.5s is not, the
>> >> rate calculated in this interval doesn't make sense.
>> >>
>> >
>> > If you are not servicing groups for long time, anyway it is very bad for
>> > latency. So that's why soft limit of 300ms of CFQ makes sense and
>> > practically I am not sure you will be blocking groups for .5s.
>> >
>> > Even if you do, then user just needs to choose a bigger interval and you
>> > will see more smooth rates. Reduce the interval and you might see little
>> > bursty rate.
>>
>> Vivek,
>>
>> IIUC, the most big problem for user app is the user app doesn't know how long
>> the group has been dequeued during the interval. For example, user choose
>> 10s interval, 8s of which is not backlogged, but when user app calculates
>> io rate, this 8s still include. So this rate isn't what we want. Am i missing
>> something?
>
> Gui,
>
> If user application is not doing enough IO and group is getting deleted
> fast, io_active_rate is not going to give you any meaningful data as it
> will be lost the moment group gets deleted.
>
> Hence one needs to monitor the IO rate when a workload is running and is
> keeping disk busy more or less all the time.
>
> Even in your example, if you monitored IO rate over 10 second interval and
> group is not doing any IO, you just can't do anything about it. Just that
> your measurement method is wrong. Even io_active_rate will not help you
> here as by the time you read the file, group is gone and there is no data.
>
> The very reason you want to monitor rate is that you want to make sure
> group is getting enough BW. If group is not doing IO then one can look at
> blkio.dequeue file and see if group is getting deleted too frequently. If
> yes, that means group is not doing enough IO to keep the disk busy. One
> can also try increasing the weight of the group but that will not help
> much if group does not remain backlogged for significant amount of time.
>
>> "io_active_rate" will never take un-backlogged time into account when calculating
>> io rate.
>>
>
> Theoretically blkio.sectors/blkio.time gives the rate excluding the time
> when group was not backlogged?
I agree with Vivek here. We use blkio.time as the source for computing the IO
rate of each cgroup, knowing that it is not entirely accurate but a
good enough approximation.

Gui, if you want to find out whether the cgroup has enough weight or
not, I'd recommend looking at the wait_time stat in addition to
blkio.time. It has been very useful in identifying jobs that are not
getting enough IO done because too little weight is assigned to them.
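
One way to combine the two stats is sketched below. It assumes the wait_time stat is exposed as blkio.io_wait_time with "major:minor Operation value" lines in nanoseconds, and blkio.time as "major:minor value" in milliseconds. The ratio itself, the function names, and the sample contents are illustrative assumptions, not the exact metric used in the setup described above.

```python
def parse_totals(text: str) -> dict[str, int]:
    """Extract per-device 'Total' rows from 'major:minor Op value' lines."""
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] == "Total":
            totals[fields[0]] = int(fields[2])
    return totals

def parse_per_device(text: str) -> dict[str, int]:
    """Parse lines of the form 'major:minor value' into {device: value}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and ":" in fields[0]:
            stats[fields[0]] = int(fields[1])
    return stats

def wait_to_service_ratio(wait_text: str, time_text: str) -> dict[str, float]:
    """Milliseconds spent waiting per millisecond of allocated disk time."""
    wait_ns = parse_totals(wait_text)
    time_ms = parse_per_device(time_text)
    return {
        dev: (wait_ns.get(dev, 0) / 1e6) / ms
        for dev, ms in time_ms.items()
        if ms > 0
    }

# Hypothetical file contents for one device.
sample_wait = (
    "8:0 Read 3000000000\n8:0 Write 1000000000\n"
    "8:0 Sync 3500000000\n8:0 Async 500000000\n"
    "8:0 Total 4000000000\nTotal 4000000000\n"
)
sample_time = "8:0 2000\n"

ratio = wait_to_service_ratio(sample_wait, sample_time)
print(ratio)
```

Because wait time is summed over individual IOs, the ratio can exceed 1 even on a healthy group; it is more useful for comparing groups, or the same group before and after a weight change, than as an absolute number.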
>
> But I will not recommend using blkio.time as it is very approximate.
>
> I really am not able to see what this interface is really buying you.
>
> Thanks
> Vivek
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status
2010-06-11 5:10 ` Divyesh Shah
@ 2010-06-11 6:31 ` Gui Jianfeng
0 siblings, 0 replies; 15+ messages in thread
From: Gui Jianfeng @ 2010-06-11 6:31 UTC (permalink / raw)
To: Divyesh Shah; +Cc: Vivek Goyal, Jens Axboe, linux kernel mailing list
Divyesh Shah wrote:
> On Tue, May 25, 2010 at 6:25 AM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> On Tue, May 25, 2010 at 11:00:54AM +0800, Gui Jianfeng wrote:
>>> Vivek Goyal wrote:
>>>> On Tue, May 25, 2010 at 09:37:31AM +0800, Gui Jianfeng wrote:
>>>>> Vivek Goyal wrote:
>>>>>> On Mon, May 24, 2010 at 09:12:05AM +0800, Gui Jianfeng wrote:
>>>>>>> Vivek Goyal wrote:
>>>>>>>> On Fri, May 21, 2010 at 04:40:50PM +0800, Gui Jianfeng wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This series implements three new interfaces to keep track of transferred bytes,
>>>>>>>>> elapsing time and io rate since group getting backlogged. If the group dequeues
>>>>>>>>> from service tree, these three interfaces will reset and shows zero.
>>>>>>>> Hi Gui,
>>>>>>>>
>>>>>>>> Can you give some details regarding how this functionality is useful? Why
>>>>>>>> would somebody be interested in only in stats of till group was
>>>>>>>> backlogged and not in total stats?
>>>>>>>>
>>>>>>>> Groups can come and go so fast and these stats will reset so many times
>>>>>>>> that I am not able to visualize how these stats will be useful.
>>>>>>> Hi Vivek,
>>>>>>>
>>>>>>> Currently, we assign weight to a group, but user still doesn't know how fast the
>>>>>>> group runs. With io rate interface, users can check the rate of a group at any
>>>>>>> moment, or to determine whether the weight assigned to a group is enough.
>>>>>>> bytes and time interface is just for debug purpose.
>>>>>> Gui,
>>>>>>
>>>>>> I still don't understand that why blkio.sectors or blkio.io_service_bytes
>>>>>> or blkio.io_serviced interfaces are not good enough to determine at what
>>>>>> rate a group is doing IO.
>>>>>>
>>>>>> I think we can very well write something in userspace like "iostat" to
>>>>>> display the per group rate. Utility can read any of the above files
>>>>>> say at the interval of 1s, calculate the diff between the values and
>>>>>> display that as group effective rate.
>>>>> Hi Vivek,
>>>>>
>>>>> blkio.io_active_rate reflects the rate since group get backlogged, so the rate is a smooth
>>>>> value. This value represents the actual rate a group runs. IMO, io rate calculated from
>>>>> user space is not accurate in following two scenarios:
>>>>>
>>>>> 1 Userspace app chooses the interval of 1s, if 0.5s is backlogged and 0.5s is not, the
>>>>> rate calculated in this interval doesn't make sense.
>>>>>
>>>> If you are not servicing groups for long time, anyway it is very bad for
>>>> latency. So that's why soft limit of 300ms of CFQ makes sense and
>>>> practically I am not sure you will be blocking groups for .5s.
>>>>
>>>> Even if you do, then user just needs to choose a bigger interval and you
>>>> will see more smooth rates. Reduce the interval and you might see little
>>>> bursty rate.
>>> Vivek,
>>>
>>> IIUC, the most big problem for user app is the user app doesn't know how long
>>> the group has been dequeued during the interval. For example, user choose
>>> 10s interval, 8s of which is not backlogged, but when user app calculates
>>> io rate, this 8s still include. So this rate isn't what we want. Am i missing
>>> something?
>> Gui,
>>
>> If user application is not doing enough IO and group is getting deleted
>> fast, io_active_rate is not going to give you any meaningful data as it
>> will be lost the moment group gets deleted.
>>
>> Hence one needs to monitor the IO rate when a workload is running and is
>> keeping disk busy more or less all the time.
>>
>> Even in your example, if you monitored IO rate over 10 second interval and
>> group is not doing any IO, you just can't do anything about it. Just that
>> your measurement method is wrong. Even io_active_rate will not help you
>> here as by the time you read the file, group is gone and there is no data.
>>
>> The very reason you want to monitor rate is that you want to make sure
>> group is getting enough BW. If group is not doing IO then one can look at
>> blkio.dequeue file and see if group is getting deleted too frequently. If
>> yes, that means group is not doing enough IO to keep the disk busy. One
>> can also try increasing the weight of the group but that will not help
>> much if group does not remain backlogged for significant amount of time.
>>
>>> "io_active_rate" will never take un-backlogged time into account when calculating
>>> io rate.
>>>
>> Theoretically blkio.sectors/blkio.time gives the rate excluding the time
>> when group was not backlogged?
>
> I agree with Vivek here. We use blkio.time as a source for io rate
> count for each cgroup, knowing that it is not entirely accurate but a
> good enough approximation.
>
> Gui, if you want to find out whether the cgroup has enough weight or
> not, I'd recommend looking at the wait_time stat in addition to
> blkio.time. It has been very useful in identifying jobs that are not
> getting enough IO done due to less weight assigned to them.
OK, I see. :)

Thanks,
Gui
>
>> But I will not recommend using blkio.time as it is very approximate.
>>
>> I really am not able to see what this interface is really buying you.
>>
>> Thanks
>> Vivek
>
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2010-06-11 6:33 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-21 8:40 [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Gui Jianfeng
2010-05-21 8:43 ` [PATCH 1/4] io-controller: a new interface to keep track of bytes during group is backlogged Gui Jianfeng
2010-05-21 8:44 ` [PATCH 2/4] io-controller: a new interface to keep track of the time since group bacame backlogged Gui Jianfeng
2010-05-21 8:45 ` [PATCH 3/4] io-controller: a new interface to keep track of io rate when group is backlogged Gui Jianfeng
2010-05-21 8:46 ` [PATCH 4/4] io-controller: Document for active bytes, time and rate Gui Jianfeng
2010-05-26 18:57 ` Randy Dunlap
2010-05-21 13:17 ` [PATCH 0/4] io-controller: Add new interfaces to trace backlogged group status Vivek Goyal
2010-05-24 1:12 ` Gui Jianfeng
2010-05-24 21:22 ` Vivek Goyal
2010-05-25 1:37 ` Gui Jianfeng
2010-05-25 2:03 ` Vivek Goyal
2010-05-25 3:00 ` Gui Jianfeng
2010-05-25 13:25 ` Vivek Goyal
2010-06-11 5:10 ` Divyesh Shah
2010-06-11 6:31 ` Gui Jianfeng