* [PATCH 0/3][v3] blkio: IO controller stats
@ 2010-04-06 3:35 Divyesh Shah
2010-04-06 3:36 ` [PATCH 1/3][v3] blkio: Remove per-cfqq nr_sectors as we'll be passing Divyesh Shah
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Divyesh Shah @ 2010-04-06 3:35 UTC (permalink / raw)
To: jens.axboe, vgoyal; +Cc: linux-kernel, nauman, ctalbott
The following series implements some additional stats for the IO controller.
These stats have helped us debug issues with earlier IO controller
versions and should be useful now as well.
We've been using these stats for monitoring and debugging problems after the
fact, since they can be collected and stored for later analysis.
One might argue that most of this information can be exported using blktrace
when debugging. However, blktrace has a non-trivial performance impact and
cannot always be left on. These stats provide a way for continuous monitoring
without losing much performance on rotational disks. We've been able to look
at these stats and debug issues after problems have been reported in the wild
and understand the IO pattern of the affected workloads.
Some of these stats are also a good data source for high-level analysis and
capacity planning.
This patchset adds 4 stats. I will send out another patchset later for
stats like io_merged and some stats that can be turned on only for
debugging - idle_time (total time spent idling for this blkio_group) and
wait_time (total time spent by the blkio_group waiting before any one of its
queues got a timeslice). I've tried to break the stats down and am sending the
most basic ones here.
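Since these stats are meant to be scraped continuously by a monitoring agent, here is a minimal userspace sketch of parsing the `major:minor <Op> <value>` lines that the stat files emit. The line layout is taken from the stat descriptions later in this thread; the function name is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse one "major:minor Op value" line as printed by files like
 * blkio.io_service_bytes. Returns 1 on success, 0 on a malformed line. */
static int parse_blkio_stat_line(const char *line, int *major, int *minor,
                                 char *op, size_t op_len, uint64_t *val)
{
        char opbuf[16];
        unsigned long long v;

        if (sscanf(line, "%d:%d %15s %llu", major, minor, opbuf, &v) != 4)
                return 0;
        snprintf(op, op_len, "%s", opbuf);
        *val = v;
        return 1;
}
```

A monitoring loop would call this per line of each stat file and aggregate by device and operation type.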
Changelog from v2 (most based on Vivek Goyal's comments):
o Initialize blkg->stats_lock
o rename io_add_stat to blkio_add_stat and declare it static
o use bool for direction and sync
o derive direction and sync info from existing rq methods
o use 12 for major:minor string length
o define io_service_time better to cover the NCQ case
o add a separate reset_stats interface
o make the indexed stats a 2d array to simplify macro and function pointer code
Changelog from v1 (most based on Vivek Goyal's comments):
o blkio.time now exports in jiffies as before
o Added stats description in patch description and
Documentation/cgroup/blkio-controller.txt
o Prefix all stats functions with blkio and make them static as applicable
o replace IO_TYPE_MAX with IO_TYPE_TOTAL
o Moved #define constant to top of blk-cgroup.c
o Pass dev_t around instead of char *
o Add note to documentation file about resetting stats
o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef
statements
o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has
rq_direction() and rq_sync() functions which are used by CFQ and when using
io-controller at a higher level, bio_* functions can be added.
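The last changelog point can be sketched as follows. The flag bits and struct here are invented stand-ins (the real helpers inspect `struct request` fields), but they show the shape of the bool-returning direction/sync accessors the text describes:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative only: these flag bits are NOT the real struct request
 * layout, just a way to show the accessor shape. */
#define EX_REQ_WRITE    (1u << 0)
#define EX_REQ_SYNC     (1u << 1)

struct ex_request {
        unsigned int flags;
};

/* true for writes, false for reads */
static bool ex_rq_direction(const struct ex_request *rq)
{
        return rq->flags & EX_REQ_WRITE;
}

/* true for sync IO, false for async */
static bool ex_rq_sync(const struct ex_request *rq)
{
        return rq->flags & EX_REQ_SYNC;
}
```

Keeping these accessors in blk-cgroup.h means a higher-level user can supply bio_*-based equivalents without blk-cgroup knowing about `struct request` at all.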
---
Divyesh Shah (3):
Increment the blkio cgroup stats for real now.
Add io controller stats like
Remove per-cfqq nr_sectors as we'll be passing that info at request dispatch
Documentation/cgroups/blkio-controller.txt | 40 +++++
block/blk-cgroup.c | 225 ++++++++++++++++++++++++++--
block/blk-cgroup.h | 79 ++++++++--
block/blk-core.c | 6 -
block/cfq-iosched.c | 18 +-
include/linux/blkdev.h | 38 +++++
6 files changed, 356 insertions(+), 50 deletions(-)
--
Divyesh
^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3][v3] blkio: Remove per-cfqq nr_sectors as we'll be passing
  2010-04-06  3:35 [PATCH 0/3][v3] blkio: IO controller stats Divyesh Shah
@ 2010-04-06  3:36 ` Divyesh Shah
  2010-04-06 15:30   ` Vivek Goyal
  2010-04-06  3:37 ` [PATCH 2/3][v3] blkio: Add io controller stats like Divyesh Shah
  2010-04-06  3:37 ` [PATCH 3/3][v3] blkio: Increment the blkio cgroup stats for real now Divyesh Shah
  2 siblings, 1 reply; 9+ messages in thread
From: Divyesh Shah @ 2010-04-06  3:36 UTC (permalink / raw)
  To: jens.axboe, vgoyal; +Cc: linux-kernel, nauman, ctalbott

that info at request dispatch with other stats now. This patch removes the
existing support for accounting sectors for a blkio_group. This will be added
back differently in the next two patches.

Signed-off-by: Divyesh Shah <dpshah@google.com>
---

 block/blk-cgroup.c  |  3 +--
 block/blk-cgroup.h  |  6 ++----
 block/cfq-iosched.c | 10 ++--------
 3 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 4b686ad..5be3981 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -56,10 +56,9 @@ struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup)
 EXPORT_SYMBOL_GPL(cgroup_to_blkio_cgroup);
 
 void blkiocg_update_blkio_group_stats(struct blkio_group *blkg,
-                        unsigned long time, unsigned long sectors)
+                        unsigned long time)
 {
         blkg->time += time;
-        blkg->sectors += sectors;
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_blkio_group_stats);
 
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 8ccc204..fe44517 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -106,7 +106,7 @@ extern int blkiocg_del_blkio_group(struct blkio_group *blkg);
 extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
                                                 void *key);
 void blkiocg_update_blkio_group_stats(struct blkio_group *blkg,
-                        unsigned long time, unsigned long sectors);
+                        unsigned long time);
 #else
 struct cgroup;
 static inline struct blkio_cgroup *
@@ -123,8 +123,6 @@ blkiocg_del_blkio_group(struct blkio_group *blkg) { return 0; }
 static inline struct blkio_group *
 blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; }
 static inline void blkiocg_update_blkio_group_stats(struct blkio_group *blkg,
-                        unsigned long time, unsigned long sectors)
-{
-}
+                        unsigned long time) {}
 #endif
 #endif /* _BLK_CGROUP_H */
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index ef1680b..c18e348 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -141,8 +141,6 @@ struct cfq_queue {
         struct cfq_queue *new_cfqq;
         struct cfq_group *cfqg;
         struct cfq_group *orig_cfqg;
-        /* Sectors dispatched in current dispatch round */
-        unsigned long nr_sectors;
 };
 
 /*
@@ -882,8 +880,7 @@ static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
                 slice_used = cfqq->allocated_slice;
         }
 
-        cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u sect=%lu", slice_used,
-                        cfqq->nr_sectors);
+        cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u", slice_used);
         return slice_used;
 }
 
@@ -917,8 +914,7 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
 
         cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
                         st->min_vdisktime);
-        blkiocg_update_blkio_group_stats(&cfqg->blkg, used_sl,
-                                                cfqq->nr_sectors);
+        blkiocg_update_blkio_group_stats(&cfqg->blkg, used_sl);
 }
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
@@ -1524,7 +1520,6 @@ static void __cfq_set_active_queue(struct cfq_data *cfqd,
         cfqq->allocated_slice = 0;
         cfqq->slice_end = 0;
         cfqq->slice_dispatch = 0;
-        cfqq->nr_sectors = 0;
 
         cfq_clear_cfqq_wait_request(cfqq);
         cfq_clear_cfqq_must_dispatch(cfqq);
@@ -1869,7 +1864,6 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
         elv_dispatch_sort(q, rq);
 
         cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++;
-        cfqq->nr_sectors += blk_rq_sectors(rq);
 }
 
 /*
^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/3][v3] blkio: Remove per-cfqq nr_sectors as we'll be passing
  2010-04-06  3:36 ` [PATCH 1/3][v3] blkio: Remove per-cfqq nr_sectors as we'll be passing Divyesh Shah
@ 2010-04-06 15:30   ` Vivek Goyal
  0 siblings, 0 replies; 9+ messages in thread
From: Vivek Goyal @ 2010-04-06 15:30 UTC (permalink / raw)
  To: Divyesh Shah; +Cc: jens.axboe, linux-kernel, nauman, ctalbott

On Mon, Apr 05, 2010 at 08:36:24PM -0700, Divyesh Shah wrote:
> that info at request dispatch with other stats now. This patch removes the
> existing support for accounting sectors for a blkio_group. This will be added
> back differently in the next two patches.
>
> Signed-off-by: Divyesh Shah <dpshah@google.com>

Acked-by: Vivek Goyal <vgoyal@redhat.com>

Vivek

[..]

^ permalink raw reply	[flat|nested] 9+ messages in thread
* [PATCH 2/3][v3] blkio: Add io controller stats like
  2010-04-06  3:35 [PATCH 0/3][v3] blkio: IO controller stats Divyesh Shah
  2010-04-06  3:36 ` [PATCH 1/3][v3] blkio: Remove per-cfqq nr_sectors as we'll be passing Divyesh Shah
@ 2010-04-06  3:37 ` Divyesh Shah
  2010-04-06 15:16   ` Vivek Goyal
  2010-04-06  3:37 ` [PATCH 3/3][v3] blkio: Increment the blkio cgroup stats for real now Divyesh Shah
  2 siblings, 1 reply; 9+ messages in thread
From: Divyesh Shah @ 2010-04-06  3:37 UTC (permalink / raw)
  To: jens.axboe, vgoyal; +Cc: linux-kernel, nauman, ctalbott

- io_service_time (the time between request dispatch and completion for IOs
  in the cgroup)
- io_wait_time (the time spent waiting in the IO scheduler queues before
  getting serviced)
- io_serviced (number of IOs serviced from this blkio_group)
- io_service_bytes (Number of bytes served for this cgroup)

These stats are accumulated per operation type, helping us to distinguish
between read and write, and sync and async IO. This patch does not increment
any of these stats.

Signed-off-by: Divyesh Shah <dpshah@google.com>
---

 Documentation/cgroups/blkio-controller.txt |  40 +++++
 block/blk-cgroup.c                         | 166 +++++++++++++++++++++++++---
 block/blk-cgroup.h                         |  60 ++++++++--
 block/cfq-iosched.c                        |   3 -
 4 files changed, 239 insertions(+), 30 deletions(-)

diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index 630879c..087925b 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -77,7 +77,6 @@ Details of cgroup files
 =======================
 - blkio.weight
         - Specifies per cgroup weight.
-
           Currently allowed range of weights is from 100 to 1000.
 
 - blkio.time
@@ -92,6 +91,41 @@ Details of cgroup files
           third field specifies the number of sectors transferred by the
           group to/from the device.
 
+- blkio.io_service_bytes
+        - Number of bytes transferred to/from the disk by the group. These
+          are further divided by the type of operation - read or write, sync
+          or async. First two fields specify the major and minor number of the
+          device, third field specifies the operation type and the fourth field
+          specifies the number of bytes.
+
+- blkio.io_serviced
+        - Number of IOs completed to/from the disk by the group. These
+          are further divided by the type of operation - read or write, sync
+          or async. First two fields specify the major and minor number of the
+          device, third field specifies the operation type and the fourth field
+          specifies the number of IOs.
+
+- blkio.io_service_time
+        - Total amount of time between request dispatch and request completion
+          for the IOs done by this cgroup. This is in nanoseconds to make it
+          meaningful for flash devices too. For devices with queue depth of 1,
+          this time represents the actual service time. When queue_depth > 1,
+          that is no longer true as requests may be served out of order.
+          This time is further divided by the type of operation -
+          read or write, sync or async. First two fields specify the major and
+          minor number of the device, third field specifies the operation type
+          and the fourth field specifies the io_service_time in ns.
+
+- blkio.io_wait_time
+        - Total amount of time the IO spent waiting in the scheduler queues for
+          service. This can be greater than the total time elapsed since it is
+          cumulative io_wait_time for all IOs. This is in nanoseconds to make it
+          meaningful for flash devices too. This time is further divided by the
+          type of operation - read or write, sync or async. First two fields
+          specify the major and minor number of the device, third field
+          specifies the operation type and the fourth field specifies the
+          io_wait_time in ns.
+
 - blkio.dequeue
         - Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This
           gives the statistics about how many times a group was dequeued
@@ -99,6 +133,10 @@ Details of cgroup files
           and minor number of the device and third field specifies the number
           of times a group was dequeued from a particular device.
 
+- blkio.reset_stats
+        - Writing an int to this file will result in resetting all the stats
+          for that cgroup.
+
 CFQ sysfs tunable
 =================
 /sys/block/<disk>/queue/iosched/group_isolation
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 5be3981..d585a05 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -17,6 +17,8 @@
 #include <linux/err.h>
 #include "blk-cgroup.h"
 
+#define MAX_KEY_LEN 100
+
 static DEFINE_SPINLOCK(blkio_list_lock);
 static LIST_HEAD(blkio_list);
 
@@ -55,12 +57,21 @@ struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup)
 }
 EXPORT_SYMBOL_GPL(cgroup_to_blkio_cgroup);
 
-void blkiocg_update_blkio_group_stats(struct blkio_group *blkg,
-                        unsigned long time)
+void blkio_group_init(struct blkio_group *blkg)
+{
+        spin_lock_init(&blkg->stats_lock);
+}
+EXPORT_SYMBOL_GPL(blkio_group_init);
+
+void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time)
 {
-        blkg->time += time;
+        unsigned long flags;
+
+        spin_lock_irqsave(&blkg->stats_lock, flags);
+        blkg->stats.time += time;
+        spin_unlock_irqrestore(&blkg->stats_lock, flags);
 }
-EXPORT_SYMBOL_GPL(blkiocg_update_blkio_group_stats);
+EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used);
 
 void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
                 struct blkio_group *blkg, void *key, dev_t dev)
@@ -170,13 +181,107 @@ blkiocg_weight_write(struct cgroup *cgroup, struct cftype *cftype, u64 val)
         return 0;
 }
 
-#define SHOW_FUNCTION_PER_GROUP(__VAR)                                  \
+static int
+blkiocg_reset_write(struct cgroup *cgroup, struct cftype *cftype, u64 val)
+{
+        struct blkio_cgroup *blkcg;
+        struct blkio_group *blkg;
+        struct hlist_node *n;
+        struct blkio_group_stats *stats;
+
+        blkcg = cgroup_to_blkio_cgroup(cgroup);
+        spin_lock_irq(&blkcg->lock);
+        hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
+                spin_lock(&blkg->stats_lock);
+                stats = &blkg->stats;
+                memset(stats, 0, sizeof(struct blkio_group_stats));
+                spin_unlock(&blkg->stats_lock);
+        }
+        spin_unlock_irq(&blkcg->lock);
+        return 0;
+}
+
+static void blkio_get_key_name(enum stat_sub_type type, dev_t dev, char *str,
+                                int chars_left, bool diskname_only)
+{
+        snprintf(str, chars_left, "%d:%d", MAJOR(dev), MINOR(dev));
+        chars_left -= strlen(str);
+        if (chars_left <= 0) {
+                printk(KERN_WARNING
+                        "Possibly incorrect cgroup stat display format");
+                return;
+        }
+        if (diskname_only)
+                return;
+        switch (type) {
+        case BLKIO_STAT_READ:
+                strlcat(str, " Read", chars_left);
+                break;
+        case BLKIO_STAT_WRITE:
+                strlcat(str, " Write", chars_left);
+                break;
+        case BLKIO_STAT_SYNC:
+                strlcat(str, " Sync", chars_left);
+                break;
+        case BLKIO_STAT_ASYNC:
+                strlcat(str, " Async", chars_left);
+                break;
+        case BLKIO_STAT_TOTAL:
+                strlcat(str, " Total", chars_left);
+                break;
+        default:
+                strlcat(str, " Invalid", chars_left);
+        }
+}
+
+static uint64_t blkio_fill_stat(char *str, int chars_left, uint64_t val,
+                                struct cgroup_map_cb *cb, dev_t dev)
+{
+        blkio_get_key_name(0, dev, str, chars_left, true);
+        cb->fill(cb, str, val);
+        return val;
+}
+
+/* This should be called with blkg->stats_lock held */
+static uint64_t blkio_get_stat(struct blkio_group *blkg,
+                struct cgroup_map_cb *cb, dev_t dev, enum stat_type type)
+{
+        uint64_t disk_total;
+        char key_str[MAX_KEY_LEN];
+        enum stat_sub_type sub_type;
+
+        if (type == BLKIO_STAT_TIME)
+                return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+                                        blkg->stats.time, cb, dev);
+        if (type == BLKIO_STAT_SECTORS)
+                return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+                                        blkg->stats.sectors, cb, dev);
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+        if (type == BLKIO_STAT_DEQUEUE)
+                return blkio_fill_stat(key_str, MAX_KEY_LEN - 1,
+                                        blkg->stats.dequeue, cb, dev);
+#endif
+
+        for (sub_type = BLKIO_STAT_READ; sub_type < BLKIO_STAT_TOTAL;
+                        sub_type++) {
+                blkio_get_key_name(sub_type, dev, key_str, MAX_KEY_LEN, false);
+                cb->fill(cb, key_str, blkg->stats.stat_arr[type][sub_type]);
+        }
+        disk_total = blkg->stats.stat_arr[type][BLKIO_STAT_READ] +
+                        blkg->stats.stat_arr[type][BLKIO_STAT_WRITE];
+        blkio_get_key_name(BLKIO_STAT_TOTAL, dev, key_str, MAX_KEY_LEN, false);
+        cb->fill(cb, key_str, disk_total);
+        return disk_total;
+}
+
+#define SHOW_FUNCTION_PER_GROUP(__VAR, type, show_total)                \
 static int blkiocg_##__VAR##_read(struct cgroup *cgroup,                \
-                struct cftype *cftype, struct seq_file *m)              \
+                struct cftype *cftype, struct cgroup_map_cb *cb)        \
 {                                                                       \
         struct blkio_cgroup *blkcg;                                     \
         struct blkio_group *blkg;                                       \
         struct hlist_node *n;                                           \
+        uint64_t cgroup_total = 0;                                      \
                                                                         \
         if (!cgroup_lock_live_group(cgroup))                            \
                 return -ENODEV;                                         \
@@ -184,19 +289,28 @@ static int blkiocg_##__VAR##_read(struct cgroup *cgroup,                \
         blkcg = cgroup_to_blkio_cgroup(cgroup);                         \
         rcu_read_lock();                                                \
         hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node) {\
-                if (blkg->dev)                                          \
-                        seq_printf(m, "%u:%u %lu\n", MAJOR(blkg->dev),  \
-                                MINOR(blkg->dev), blkg->__VAR);         \
+                if (blkg->dev) {                                        \
+                        spin_lock_irq(&blkg->stats_lock);               \
+                        cgroup_total += blkio_get_stat(blkg, cb,        \
+                                                blkg->dev, type);       \
+                        spin_unlock_irq(&blkg->stats_lock);             \
+                }                                                       \
         }                                                               \
+        if (show_total)                                                 \
+                cb->fill(cb, "Total", cgroup_total);                    \
         rcu_read_unlock();                                              \
         cgroup_unlock();                                                \
         return 0;                                                       \
 }
 
-SHOW_FUNCTION_PER_GROUP(time);
-SHOW_FUNCTION_PER_GROUP(sectors);
+SHOW_FUNCTION_PER_GROUP(time, BLKIO_STAT_TIME, 0);
+SHOW_FUNCTION_PER_GROUP(sectors, BLKIO_STAT_SECTORS, 0);
+SHOW_FUNCTION_PER_GROUP(io_service_bytes, BLKIO_STAT_SERVICE_BYTES, 1);
+SHOW_FUNCTION_PER_GROUP(io_serviced, BLKIO_STAT_SERVICED, 1);
+SHOW_FUNCTION_PER_GROUP(io_service_time, BLKIO_STAT_SERVICE_TIME, 1);
+SHOW_FUNCTION_PER_GROUP(io_wait_time, BLKIO_STAT_WAIT_TIME, 1);
 #ifdef CONFIG_DEBUG_BLK_CGROUP
-SHOW_FUNCTION_PER_GROUP(dequeue);
+SHOW_FUNCTION_PER_GROUP(dequeue, BLKIO_STAT_DEQUEUE, 0);
 #endif
 #undef SHOW_FUNCTION_PER_GROUP
 
@@ -204,7 +318,7 @@ SHOW_FUNCTION_PER_GROUP(dequeue);
 void blkiocg_update_blkio_group_dequeue_stats(struct blkio_group *blkg,
                         unsigned long dequeue)
 {
-        blkg->dequeue += dequeue;
+        blkg->stats.dequeue += dequeue;
 }
 EXPORT_SYMBOL_GPL(blkiocg_update_blkio_group_dequeue_stats);
 #endif
@@ -217,16 +331,36 @@ struct cftype blkio_files[] = {
         },
         {
                 .name = "time",
-                .read_seq_string = blkiocg_time_read,
+                .read_map = blkiocg_time_read,
         },
         {
                 .name = "sectors",
-                .read_seq_string = blkiocg_sectors_read,
+                .read_map = blkiocg_sectors_read,
+        },
+        {
+                .name = "io_service_bytes",
+                .read_map = blkiocg_io_service_bytes_read,
+        },
+        {
+                .name = "io_serviced",
+                .read_map = blkiocg_io_serviced_read,
+        },
+        {
+                .name = "io_service_time",
+                .read_map = blkiocg_io_service_time_read,
+        },
+        {
+                .name = "io_wait_time",
+                .read_map = blkiocg_io_wait_time_read,
+        },
+        {
+                .name = "reset_stats",
+                .write_u64 = blkiocg_reset_write,
         },
 #ifdef CONFIG_DEBUG_BLK_CGROUP
         {
                 .name = "dequeue",
-                .read_seq_string = blkiocg_dequeue_read,
+                .read_map = blkiocg_dequeue_read,
         },
 #endif
 };
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index fe44517..a4bc4bb 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -23,6 +23,33 @@ extern struct cgroup_subsys blkio_subsys;
 #define blkio_subsys_id blkio_subsys.subsys_id
 #endif
 
+enum stat_type {
+        /* Total time spent (in ns) between request dispatch to the driver and
+         * request completion for IOs done by this cgroup. This may not be
+         * accurate when NCQ is turned on. */
+        BLKIO_STAT_SERVICE_TIME = 0,
+        /* Total bytes transferred */
+        BLKIO_STAT_SERVICE_BYTES,
+        /* Total IOs serviced, post merge */
+        BLKIO_STAT_SERVICED,
+        /* Total time spent waiting in scheduler queue in ns */
+        BLKIO_STAT_WAIT_TIME,
+        /* All the single valued stats go below this */
+        BLKIO_STAT_TIME,
+        BLKIO_STAT_SECTORS,
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+        BLKIO_STAT_DEQUEUE
+#endif
+};
+
+enum stat_sub_type {
+        BLKIO_STAT_READ = 0,
+        BLKIO_STAT_WRITE,
+        BLKIO_STAT_SYNC,
+        BLKIO_STAT_ASYNC,
+        BLKIO_STAT_TOTAL
+};
+
 struct blkio_cgroup {
         struct cgroup_subsys_state css;
         unsigned int weight;
@@ -30,6 +57,17 @@ struct blkio_cgroup {
         struct hlist_head blkg_list;
 };
 
+struct blkio_group_stats {
+        /* total disk time and nr sectors dispatched by this group */
+        uint64_t time;
+        uint64_t sectors;
+        uint64_t stat_arr[BLKIO_STAT_WAIT_TIME + 1][BLKIO_STAT_TOTAL];
+#ifdef CONFIG_DEBUG_BLK_CGROUP
+        /* How many times this group has been removed from service tree */
+        unsigned long dequeue;
+#endif
+};
+
 struct blkio_group {
         /* An rcu protected unique identifier for the group */
         void *key;
@@ -38,15 +76,13 @@ struct blkio_group {
 #ifdef CONFIG_DEBUG_BLK_CGROUP
         /* Store cgroup path */
         char path[128];
-        /* How many times this group has been removed from service tree */
-        unsigned long dequeue;
 #endif
         /* The device MKDEV(major, minor), this group has been created for */
-        dev_t   dev;
+        dev_t dev;
 
-        /* total disk time and nr sectors dispatched by this group */
-        unsigned long time;
-        unsigned long sectors;
+        /* Need to serialize the stats in the case of reset/update */
+        spinlock_t stats_lock;
+        struct blkio_group_stats stats;
 };
 
 typedef void (blkio_unlink_group_fn) (void *key, struct blkio_group *blkg);
@@ -105,24 +141,24 @@ extern void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
 extern int blkiocg_del_blkio_group(struct blkio_group *blkg);
 extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
                                                 void *key);
-void blkiocg_update_blkio_group_stats(struct blkio_group *blkg,
-                        unsigned long time);
+void blkio_group_init(struct blkio_group *blkg);
+void blkiocg_update_timeslice_used(struct blkio_group *blkg,
+                        unsigned long time);
 #else
 struct cgroup;
 static inline struct blkio_cgroup *
 cgroup_to_blkio_cgroup(struct cgroup *cgroup) { return NULL; }
 
+static inline void blkio_group_init(struct blkio_group *blkg) {}
 static inline void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
-                struct blkio_group *blkg, void *key, dev_t dev)
-{
-}
+                struct blkio_group *blkg, void *key, dev_t dev) {}
 
 static inline int
 blkiocg_del_blkio_group(struct blkio_group *blkg) { return 0; }
 
 static inline struct blkio_group *
 blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; }
-static inline void blkiocg_update_blkio_group_stats(struct blkio_group *blkg,
+static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg,
                         unsigned long time) {}
 #endif
 #endif /* _BLK_CGROUP_H */
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index c18e348..cf11548 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -914,7 +914,7 @@ static void cfq_group_served(struct cfq_data *cfqd, struct cfq_group *cfqg,
 
         cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
                         st->min_vdisktime);
-        blkiocg_update_blkio_group_stats(&cfqg->blkg, used_sl);
+        blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
 }
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
@@ -954,6 +954,7 @@ cfq_find_alloc_cfqg(struct cfq_data *cfqd, struct cgroup *cgroup, int create)
         for_each_cfqg_st(cfqg, i, j, st)
                 *st = CFQ_RB_ROOT;
         RB_CLEAR_NODE(&cfqg->rb_node);
+        blkio_group_init(&cfqg->blkg);
 
         /*
          * Take the initial reference that will be released on destroy

^ permalink raw reply related	[flat|nested] 9+ messages in thread
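The 2D `stat_arr` above (one row per indexed `enum stat_type`, one column per `enum stat_sub_type`) is what lets a single update routine serve every per-type stat. A userspace sketch of how the increment helper from the follow-on patch presumably works; only the enum names are taken from the diff, while `add_stat()` and the `BLKIO_STAT_INDEXED_NR` bound are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the enums added to block/blk-cgroup.h above. */
enum stat_type {
        BLKIO_STAT_SERVICE_TIME = 0,
        BLKIO_STAT_SERVICE_BYTES,
        BLKIO_STAT_SERVICED,
        BLKIO_STAT_WAIT_TIME,
        BLKIO_STAT_INDEXED_NR,  /* sketch-only bound, not in the patch */
};

enum stat_sub_type {
        BLKIO_STAT_READ = 0,
        BLKIO_STAT_WRITE,
        BLKIO_STAT_SYNC,
        BLKIO_STAT_ASYNC,
        BLKIO_STAT_TOTAL,
};

static uint64_t stat_arr[BLKIO_STAT_INDEXED_NR][BLKIO_STAT_TOTAL];

/* One sample is charged to both its direction bucket and its sync
 * bucket, which is why Read+Write == Sync+Async for every indexed stat. */
static void add_stat(enum stat_type type, uint64_t val, bool direction,
                     bool sync)
{
        stat_arr[type][direction ? BLKIO_STAT_WRITE : BLKIO_STAT_READ] += val;
        stat_arr[type][sync ? BLKIO_STAT_SYNC : BLKIO_STAT_ASYNC] += val;
}
```

The macro and function-pointer simplification mentioned in the v3 changelog follows directly from this layout: the show path only needs a `type` row index instead of a per-stat accessor.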
* Re: [PATCH 2/3][v3] blkio: Add io controller stats like
  2010-04-06  3:37 ` [PATCH 2/3][v3] blkio: Add io controller stats like Divyesh Shah
@ 2010-04-06 15:16   ` Vivek Goyal
  2010-04-06 16:59     ` Divyesh Shah
  0 siblings, 1 reply; 9+ messages in thread
From: Vivek Goyal @ 2010-04-06 15:16 UTC (permalink / raw)
  To: Divyesh Shah; +Cc: jens.axboe, linux-kernel, nauman, ctalbott

On Mon, Apr 05, 2010 at 08:37:01PM -0700, Divyesh Shah wrote:
> - io_service_time (the time between request dispatch and completion for IOs
>   in the cgroup)
> - io_wait_time (the time spent waiting in the IO scheduler queues before
>   getting serviced)
> - io_serviced (number of IOs serviced from this blkio_group)
> - io_service_bytes (Number of bytes served for this cgroup)
>
> These stats are accumulated per operation type helping us to distinguish
> between read and write, and sync and async IO. This patch does not increment
> any of these stats.
>
> Signed-off-by: Divyesh Shah <dpshah@google.com>

[..]

Hi Divyesh,

V3 looks much better. Just a couple of minor nits.

> +- blkio.io_service_time
> +	- Total amount of time between request dispatch and request completion
> +	  for the IOs done by this cgroup. This is in nanoseconds to make it
> +	  meaningful for flash devices too. For devices with queue depth of 1,
> +	  this time represents the actual service time. When queue_depth > 1,
> +	  that is no longer true as requests may be served out of order.

Like io_wait_time, can you also mention here that this time is cumulative
time and on NCQ disks it can be more than actual time elapsed.

I did a quick run with your patches. I ran a sequential workload for 30s
in two groups of weight 100 and 200. Following is the output of
blkio.io_service_time (I have kept stats for only disk 253:3 in output).

# cat test1/blkio.io_service_time
253:3 Read 18019970625
253:3 Write 0
253:3 Sync 18019970625
253:3 Async 0
253:3 Total 18019970625

# cat test2/blkio.io_service_time
253:3 Read 35479070171
253:3 Write 0
253:3 Sync 35479070171
253:3 Async 0

This storage supports NCQ. We see that though I ran the test only for 30
seconds, total service time for the cgroup is close to 18+35=53 seconds.

[..]

> +- blkio.reset_stats
> +	- Writing an int to this file will result in resetting all the stats
> +	  for that cgroup.
> +

Personally, I like adding a separate file to reset the stats. Now one does
not get surprised by the fact that writing to blkio.io_service_time also
resets the rest of the stats.

[..]

> +static int
> +blkiocg_reset_write(struct cgroup *cgroup, struct cftype *cftype, u64 val)
> +{

I guess we can rename this function to blkiocg_reset_stats().

[..]

> +	{
> +		.name = "reset_stats",
> +		.write_u64 = blkiocg_reset_write,

use blkiocg_reset_stats?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH 2/3][v3] blkio: Add io controller stats like 2010-04-06 15:16 ` Vivek Goyal @ 2010-04-06 16:59 ` Divyesh Shah 2010-04-06 17:42 ` Vivek Goyal 0 siblings, 1 reply; 9+ messages in thread From: Divyesh Shah @ 2010-04-06 16:59 UTC (permalink / raw) To: Vivek Goyal; +Cc: jens.axboe, linux-kernel, nauman, ctalbott Vivek, I'll send out a v3.1 only for this patch since you've Acked the other 2 patches. Thanks a lot for the detailed reviews again! On Tue, Apr 6, 2010 at 8:16 AM, Vivek Goyal <vgoyal@redhat.com> wrote: > On Mon, Apr 05, 2010 at 08:37:01PM -0700, Divyesh Shah wrote: >> - io_service_time (the time between request dispatch and completion for IOs >> in the cgroup) >> - io_wait_time (the time spent waiting in the IO scheduler queues before >> getting serviced) >> - io_serviced (number of IOs serviced from this blkio_group) >> - io_service_bytes (Number of bytes served for this cgroup) >> >> These stats are accumulated per operation type helping us to distinguish between >> read and write, and sync and async IO. This patch does not increment any of >> these stats. >> >> Signed-off-by: Divyesh Shah<dpshah@google.com> >> --- >> >> Documentation/cgroups/blkio-controller.txt | 40 +++++++ >> block/blk-cgroup.c | 166 +++++++++++++++++++++++++--- >> block/blk-cgroup.h | 60 ++++++++-- >> block/cfq-iosched.c | 3 - >> 4 files changed, 239 insertions(+), 30 deletions(-) >> >> diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt >> index 630879c..087925b 100644 >> --- a/Documentation/cgroups/blkio-controller.txt >> +++ b/Documentation/cgroups/blkio-controller.txt >> @@ -77,7 +77,6 @@ Details of cgroup files >> ======================= >> - blkio.weight >> - Specifies per cgroup weight. >> - >> Currently allowed range of weights is from 100 to 1000. >> >> - blkio.time >> @@ -92,6 +91,41 @@ Details of cgroup files >> third field specifies the number of sectors transferred by the >> group to/from the device. 
>> >> +- blkio.io_service_bytes >> + - Number of bytes transferred to/from the disk by the group. These >> + are further divided by the type of operation - read or write, sync >> + or async. First two fields specify the major and minor number of the >> + device, third field specifies the operation type and the fourth field >> + specifies the number of bytes. >> + >> +- blkio.io_serviced >> + - Number of IOs completed to/from the disk by the group. These >> + are further divided by the type of operation - read or write, sync >> + or async. First two fields specify the major and minor number of the >> + device, third field specifies the operation type and the fourth field >> + specifies the number of IOs. >> + > > Hi Divyesh, > > V3 looks much better. Just a couple of minor nits. > >> +- blkio.io_service_time >> + - Total amount of time between request dispatch and request completion >> + for the IOs done by this cgroup. This is in nanoseconds to make it >> + meaningful for flash devices too. For devices with queue depth of 1, >> + this time represents the actual service time. When queue_depth > 1, >> + that is no longer true as requests may be served out of order. > > Like io_wait_time, can you also mention here that this time is cumulative > time and on NCQ disks it can be more than actual time elapsed. Will do. > > I did a quick run with your patches. I ran a sequential workload for 30s > in two groups of weight 100 and 200. Following is the output of > blkio.io_service_time (I have kept stats for only disk 253:3 in output). > > # cat test1/blkio.io_service_time > 253:3 Read 18019970625 > 253:3 Write 0 > 253:3 Sync 18019970625 > 253:3 Async 0 > 253:3 Total 18019970625 > > # cat test2/blkio.io_service_time > 253:3 Read 35479070171 > 253:3 Write 0 > 253:3 Sync 35479070171 > 253:3 Async 0 > > This storage supports NCQ. We see that though I ran the test only for > 30 seconds, total service time for cgroup is close to 18+35=53 seconds. 
Yes, that is expected. As we discussed for NCQ, any IO can have the service time of multiple IOs coupled together when serviced out of order. > >> + This time is further divided by the type of operation - >> + read or write, sync or async. First two fields specify the major and >> + minor number of the device, third field specifies the operation type >> + and the fourth field specifies the io_service_time in ns. >> + >> +- blkio.io_wait_time >> + - Total amount of time the IO spent waiting in the scheduler queues for >> + service. This can be greater than the total time elapsed since it is >> + cumulative io_wait_time for all IOs. This is in nanoseconds to make it >> + meaningful for flash devices too. This time is further divided by the >> + type of operation - read or write, sync or async. First two fields >> + specify the major and minor number of the device, third field >> + specifies the operation type and the fourth field specifies the >> + io_wait_time in ns. >> + >> - blkio.dequeue >> - Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This >> gives the statistics about how many a times a group was dequeued >> @@ -99,6 +133,10 @@ Details of cgroup files >> and minor number of the device and third field specifies the number >> of times a group was dequeued from a particular device. >> >> +- blkio.reset_stats >> + - Writing an int to this file will result in resetting all the stats >> + for that cgroup. >> + > > Personally, I like adding a separate file to reset the stats. Now one does > not get surprised by the fact that writing to blkio.io_service_time also > resets the rest of the stats. 
> >> CFQ sysfs tunable >> ================= >> /sys/block/<disk>/queue/iosched/group_isolation >> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c >> index 5be3981..d585a05 100644 >> --- a/block/blk-cgroup.c >> +++ b/block/blk-cgroup.c >> @@ -17,6 +17,8 @@ >> #include <linux/err.h> >> #include "blk-cgroup.h" >> >> +#define MAX_KEY_LEN 100 >> + >> static DEFINE_SPINLOCK(blkio_list_lock); >> static LIST_HEAD(blkio_list); >> >> @@ -55,12 +57,21 @@ struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup) >> } >> EXPORT_SYMBOL_GPL(cgroup_to_blkio_cgroup); >> >> -void blkiocg_update_blkio_group_stats(struct blkio_group *blkg, >> - unsigned long time) >> +void blkio_group_init(struct blkio_group *blkg) >> +{ >> + spin_lock_init(&blkg->stats_lock); >> +} >> +EXPORT_SYMBOL_GPL(blkio_group_init); >> + >> +void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time) >> { >> - blkg->time += time; >> + unsigned long flags; >> + >> + spin_lock_irqsave(&blkg->stats_lock, flags); >> + blkg->stats.time += time; >> + spin_unlock_irqrestore(&blkg->stats_lock, flags); >> } >> -EXPORT_SYMBOL_GPL(blkiocg_update_blkio_group_stats); >> +EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used); >> >> void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg, >> struct blkio_group *blkg, void *key, dev_t dev) >> @@ -170,13 +181,107 @@ blkiocg_weight_write(struct cgroup *cgroup, struct cftype *cftype, u64 val) >> return 0; >> } >> >> -#define SHOW_FUNCTION_PER_GROUP(__VAR) \ >> +static int >> +blkiocg_reset_write(struct cgroup *cgroup, struct cftype *cftype, u64 val) >> +{ > > I guess we can rename this function to blkiocg_reset_stats(). Will do > > [..] 
>> @@ -217,16 +331,36 @@ struct cftype blkio_files[] = { >> }, >> { >> .name = "time", >> - .read_seq_string = blkiocg_time_read, >> + .read_map = blkiocg_time_read, >> }, >> { >> .name = "sectors", >> - .read_seq_string = blkiocg_sectors_read, >> + .read_map = blkiocg_sectors_read, >> + }, >> + { >> + .name = "io_service_bytes", >> + .read_map = blkiocg_io_service_bytes_read, >> + }, >> + { >> + .name = "io_serviced", >> + .read_map = blkiocg_io_serviced_read, >> + }, >> + { >> + .name = "io_service_time", >> + .read_map = blkiocg_io_service_time_read, >> + }, >> + { >> + .name = "io_wait_time", >> + .read_map = blkiocg_io_wait_time_read, >> + }, >> + { >> + .name = "reset_stats", >> + .write_u64 = blkiocg_reset_write, > > use blkiocg_reset_stats? Will do > > Thanks > Vivek > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/3][v3] blkio: Add io controller stats like 2010-04-06 16:59 ` Divyesh Shah @ 2010-04-06 17:42 ` Vivek Goyal 0 siblings, 0 replies; 9+ messages in thread From: Vivek Goyal @ 2010-04-06 17:42 UTC (permalink / raw) To: Divyesh Shah; +Cc: jens.axboe, linux-kernel, nauman, ctalbott On Tue, Apr 06, 2010 at 09:59:32AM -0700, Divyesh Shah wrote: > Vivek, > I'll send out a v3.1 only for this patch since you've Acked the > other 2 patches. Thanks a lot for the detailed reviews again! > That's fine. You can just reply to the same mail and attach the new patch. Just that we need to explicitly mention to Jens to pick this new patch with minor modifications. Jens, how should we handle this patchset now? Original patches have already been applied to your for-2.6.35 branch. Should Divyesh now send cleanup patches on top of these or something else? Thanks Vivek > On Tue, Apr 6, 2010 at 8:16 AM, Vivek Goyal <vgoyal@redhat.com> wrote: > > On Mon, Apr 05, 2010 at 08:37:01PM -0700, Divyesh Shah wrote: > >> - io_service_time (the time between request dispatch and completion for IOs > >> in the cgroup) > >> - io_wait_time (the time spent waiting in the IO scheduler queues before > >> getting serviced) > >> - io_serviced (number of IOs serviced from this blkio_group) > >> - io_service_bytes (Number of bytes served for this cgroup) > >> > >> These stats are accumulated per operation type helping us to distinguish between > >> read and write, and sync and async IO. This patch does not increment any of > >> these stats. 
> >> > >> Signed-off-by: Divyesh Shah<dpshah@google.com> > >> --- > >> > >> Documentation/cgroups/blkio-controller.txt | 40 +++++++ > >> block/blk-cgroup.c | 166 +++++++++++++++++++++++++--- > >> block/blk-cgroup.h | 60 ++++++++-- > >> block/cfq-iosched.c | 3 - > >> 4 files changed, 239 insertions(+), 30 deletions(-) > >> > >> diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt > >> index 630879c..087925b 100644 > >> --- a/Documentation/cgroups/blkio-controller.txt > >> +++ b/Documentation/cgroups/blkio-controller.txt > >> @@ -77,7 +77,6 @@ Details of cgroup files > >> ======================= > >> - blkio.weight > >> - Specifies per cgroup weight. > >> - > >> Currently allowed range of weights is from 100 to 1000. > >> > >> - blkio.time > >> @@ -92,6 +91,41 @@ Details of cgroup files > >> third field specifies the number of sectors transferred by the > >> group to/from the device. > >> > >> +- blkio.io_service_bytes > >> + - Number of bytes transferred to/from the disk by the group. These > >> + are further divided by the type of operation - read or write, sync > >> + or async. First two fields specify the major and minor number of the > >> + device, third field specifies the operation type and the fourth field > >> + specifies the number of bytes. > >> + > >> +- blkio.io_serviced > >> + - Number of IOs completed to/from the disk by the group. These > >> + are further divided by the type of operation - read or write, sync > >> + or async. First two fields specify the major and minor number of the > >> + device, third field specifies the operation type and the fourth field > >> + specifies the number of IOs. > >> + > > > > Hi Divyesh, > > > > V3 looks much better. Just couple of minor nits. > > > >> +- blkio.io_service_time > >> + - Total amount of time between request dispatch and request completion > >> + for the IOs done by this cgroup. This is in nanoseconds to make it > >> + meaningful for flash devices too. 
For devices with queue depth of 1, > >> + this time represents the actual service time. When queue_depth > 1, > >> + that is no longer true as requests may be served out of order. > > > > Like io_wait_time, can you also mention here that this time is cumulative > > time and on NCQ disks it can be more than actual time elapsed. > > Will do. > > > > > I did a quick run with your patches. I ran a sequential workload for 30s > > in two groups of weight 100 and 200. Following is the output of > > blkio.io_service_time (I have kept stats for only disk 253:3 in output). > > > > # cat test1/blkio.io_service_time > > 253:3 Read 18019970625 > > 253:3 Write 0 > > 253:3 Sync 18019970625 > > 253:3 Async 0 > > 253:3 Total 18019970625 > > > > # cat test2/blkio.io_service_time > > 253:3 Read 35479070171 > > 253:3 Write 0 > > 253:3 Sync 35479070171 > > 253:3 Async 0 > > > > This storage supports NCQ. We see that though I ran test only for > > 30 seconds, total service ime for cgroup is close to 18+35=53 seconds. > > Yes that is expected as from our discussion about NCQ as any IO can > have the service time of multiple IOs coupled together when serviced > out of order. > > > > >> + This time is further divided by the type of operation - > >> + read or write, sync or async. First two fields specify the major and > >> + minor number of the device, third field specifies the operation type > >> + and the fourth field specifies the io_service_time in ns. > >> + > >> +- blkio.io_wait_time > >> + - Total amount of time the IO spent waiting in the scheduler queues for > >> + service. This can be greater than the total time elapsed since it is > >> + cumulative io_wait_time for all IOs. This is in nanoseconds to make it > >> + meaningful for flash devices too. This time is further divided by the > >> + type of operation - read or write, sync or async. 
First two fields > >> + specify the major and minor number of the device, third field > >> + specifies the operation type and the fourth field specifies the > >> + io_wait_time in ns. > >> + > >> - blkio.dequeue > >> - Debugging aid only enabled if CONFIG_DEBUG_CFQ_IOSCHED=y. This > >> gives the statistics about how many a times a group was dequeued > >> @@ -99,6 +133,10 @@ Details of cgroup files > >> and minor number of the device and third field specifies the number > >> of times a group was dequeued from a particular device. > >> > >> +- blkio.reset_stats > >> + - Writing an int to this file will result in resetting all the stats > >> + for that cgroup. > >> + > > > > Personally, I like adding a separate file to reset the stats. Now one does > > not get surprised by the fact that writting to blkio.io_service_time, also > > reset rest of the stats. > > > >> CFQ sysfs tunable > >> ================= > >> /sys/block/<disk>/queue/iosched/group_isolation > >> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > >> index 5be3981..d585a05 100644 > >> --- a/block/blk-cgroup.c > >> +++ b/block/blk-cgroup.c > >> @@ -17,6 +17,8 @@ > >> #include <linux/err.h> > >> #include "blk-cgroup.h" > >> > >> +#define MAX_KEY_LEN 100 > >> + > >> static DEFINE_SPINLOCK(blkio_list_lock); > >> static LIST_HEAD(blkio_list); > >> > >> @@ -55,12 +57,21 @@ struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup) > >> } > >> EXPORT_SYMBOL_GPL(cgroup_to_blkio_cgroup); > >> > >> -void blkiocg_update_blkio_group_stats(struct blkio_group *blkg, > >> - unsigned long time) > >> +void blkio_group_init(struct blkio_group *blkg) > >> +{ > >> + spin_lock_init(&blkg->stats_lock); > >> +} > >> +EXPORT_SYMBOL_GPL(blkio_group_init); > >> + > >> +void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time) > >> { > >> - blkg->time += time; > >> + unsigned long flags; > >> + > >> + spin_lock_irqsave(&blkg->stats_lock, flags); > >> + blkg->stats.time += time; > >> + 
spin_unlock_irqrestore(&blkg->stats_lock, flags); > >> } > >> -EXPORT_SYMBOL_GPL(blkiocg_update_blkio_group_stats); > >> +EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used); > >> > >> void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg, > >> struct blkio_group *blkg, void *key, dev_t dev) > >> @@ -170,13 +181,107 @@ blkiocg_weight_write(struct cgroup *cgroup, struct cftype *cftype, u64 val) > >> return 0; > >> } > >> > >> -#define SHOW_FUNCTION_PER_GROUP(__VAR) \ > >> +static int > >> +blkiocg_reset_write(struct cgroup *cgroup, struct cftype *cftype, u64 val) > >> +{ > > > > I guess we can rename this function to blkiocg_reset_stats(). > Will do > > > > > [..] > >> @@ -217,16 +331,36 @@ struct cftype blkio_files[] = { > >> }, > >> { > >> .name = "time", > >> - .read_seq_string = blkiocg_time_read, > >> + .read_map = blkiocg_time_read, > >> }, > >> { > >> .name = "sectors", > >> - .read_seq_string = blkiocg_sectors_read, > >> + .read_map = blkiocg_sectors_read, > >> + }, > >> + { > >> + .name = "io_service_bytes", > >> + .read_map = blkiocg_io_service_bytes_read, > >> + }, > >> + { > >> + .name = "io_serviced", > >> + .read_map = blkiocg_io_serviced_read, > >> + }, > >> + { > >> + .name = "io_service_time", > >> + .read_map = blkiocg_io_service_time_read, > >> + }, > >> + { > >> + .name = "io_wait_time", > >> + .read_map = blkiocg_io_wait_time_read, > >> + }, > >> + { > >> + .name = "reset_stats", > >> + .write_u64 = blkiocg_reset_write, > > > > use blkiocg_reset_stats? > Will do > > > > Thanks > > Vivek > > ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3][v3] blkio: Increment the blkio cgroup stats for real now. 2010-04-06 3:35 [PATCH 0/3][v3] blkio: IO controller stats Divyesh Shah 2010-04-06 3:36 ` [PATCH 1/3][v3] blkio: Remove per-cfqq nr_sectors as we'll be passing Divyesh Shah 2010-04-06 3:37 ` [PATCH 2/3][v3] blkio: Add io controller stats like Divyesh Shah @ 2010-04-06 3:37 ` Divyesh Shah 2010-04-06 15:30 ` Vivek Goyal 2 siblings, 1 reply; 9+ messages in thread From: Divyesh Shah @ 2010-04-06 3:37 UTC (permalink / raw) To: jens.axboe, vgoyal; +Cc: linux-kernel, nauman, ctalbott We also add start_time_ns and io_start_time_ns fields to struct request here to record the time when a request is created and when it is dispatched to device. We use ns uints here as ms and jiffies are not very useful for non-rotational media. Signed-off-by: Divyesh Shah<dpshah@google.com> --- block/blk-cgroup.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++-- block/blk-cgroup.h | 15 ++++++++++-- block/blk-core.c | 6 +++-- block/cfq-iosched.c | 7 +++++- include/linux/blkdev.h | 38 +++++++++++++++++++++++++++++++ 5 files changed, 115 insertions(+), 9 deletions(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index d585a05..8bd607c 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -15,6 +15,7 @@ #include <linux/kdev_t.h> #include <linux/module.h> #include <linux/err.h> +#include <linux/blkdev.h> #include "blk-cgroup.h" #define MAX_KEY_LEN 100 @@ -63,6 +64,23 @@ void blkio_group_init(struct blkio_group *blkg) } EXPORT_SYMBOL_GPL(blkio_group_init); +/* + * Add to the appropriate stat variable depending on the request type. + * This should be called with the blkg->stats_lock held. 
+ */ +static void blkio_add_stat(uint64_t *stat, uint64_t add, bool direction, + bool sync) +{ + if (direction) + stat[BLKIO_STAT_WRITE] += add; + else + stat[BLKIO_STAT_READ] += add; + if (sync) + stat[BLKIO_STAT_SYNC] += add; + else + stat[BLKIO_STAT_ASYNC] += add; +} + void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time) { unsigned long flags; @@ -73,6 +91,42 @@ void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time) } EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used); +void blkiocg_update_dispatch_stats(struct blkio_group *blkg, + uint64_t bytes, bool direction, bool sync) +{ + struct blkio_group_stats *stats; + unsigned long flags; + + spin_lock_irqsave(&blkg->stats_lock, flags); + stats = &blkg->stats; + stats->sectors += bytes >> 9; + blkio_add_stat(stats->stat_arr[BLKIO_STAT_SERVICED], 1, direction, + sync); + blkio_add_stat(stats->stat_arr[BLKIO_STAT_SERVICE_BYTES], bytes, + direction, sync); + spin_unlock_irqrestore(&blkg->stats_lock, flags); +} +EXPORT_SYMBOL_GPL(blkiocg_update_dispatch_stats); + +void blkiocg_update_completion_stats(struct blkio_group *blkg, + uint64_t start_time, uint64_t io_start_time, bool direction, bool sync) +{ + struct blkio_group_stats *stats; + unsigned long flags; + unsigned long long now = sched_clock(); + + spin_lock_irqsave(&blkg->stats_lock, flags); + stats = &blkg->stats; + if (time_after64(now, io_start_time)) + blkio_add_stat(stats->stat_arr[BLKIO_STAT_SERVICE_TIME], + now - io_start_time, direction, sync); + if (time_after64(io_start_time, start_time)) + blkio_add_stat(stats->stat_arr[BLKIO_STAT_WAIT_TIME], + io_start_time - start_time, direction, sync); + spin_unlock_irqrestore(&blkg->stats_lock, flags); +} +EXPORT_SYMBOL_GPL(blkiocg_update_completion_stats); + void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg, struct blkio_group *blkg, void *key, dev_t dev) { @@ -315,12 +369,12 @@ SHOW_FUNCTION_PER_GROUP(dequeue, BLKIO_STAT_DEQUEUE, 0); #undef 
SHOW_FUNCTION_PER_GROUP #ifdef CONFIG_DEBUG_BLK_CGROUP -void blkiocg_update_blkio_group_dequeue_stats(struct blkio_group *blkg, +void blkiocg_update_dequeue_stats(struct blkio_group *blkg, unsigned long dequeue) { blkg->stats.dequeue += dequeue; } -EXPORT_SYMBOL_GPL(blkiocg_update_blkio_group_dequeue_stats); +EXPORT_SYMBOL_GPL(blkiocg_update_dequeue_stats); #endif struct cftype blkio_files[] = { diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index a4bc4bb..b22e553 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -125,12 +125,12 @@ static inline char *blkg_path(struct blkio_group *blkg) { return blkg->path; } -void blkiocg_update_blkio_group_dequeue_stats(struct blkio_group *blkg, +void blkiocg_update_dequeue_stats(struct blkio_group *blkg, unsigned long dequeue); #else static inline char *blkg_path(struct blkio_group *blkg) { return NULL; } -static inline void blkiocg_update_blkio_group_dequeue_stats( - struct blkio_group *blkg, unsigned long dequeue) {} +static inline void blkiocg_update_dequeue_stats(struct blkio_group *blkg, + unsigned long dequeue) {} #endif #if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE) @@ -144,6 +144,10 @@ extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg, void blkio_group_init(struct blkio_group *blkg); void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time); +void blkiocg_update_dispatch_stats(struct blkio_group *blkg, uint64_t bytes, + bool direction, bool sync); +void blkiocg_update_completion_stats(struct blkio_group *blkg, + uint64_t start_time, uint64_t io_start_time, bool direction, bool sync); #else struct cgroup; static inline struct blkio_cgroup * @@ -160,5 +164,10 @@ static inline struct blkio_group * blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; } static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time) {} +static inline void blkiocg_update_dispatch_stats(struct blkio_group 
*blkg, + uint64_t bytes, bool direction, bool sync) {} +static inline void blkiocg_update_completion_stats(struct blkio_group *blkg, + uint64_t start_time, uint64_t io_start_time, bool direction, + bool sync) {} #endif #endif /* _BLK_CGROUP_H */ diff --git a/block/blk-core.c b/block/blk-core.c index 9fe174d..1d94f15 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -127,6 +127,7 @@ void blk_rq_init(struct request_queue *q, struct request *rq) rq->tag = -1; rq->ref_count = 1; rq->start_time = jiffies; + set_start_time_ns(rq); } EXPORT_SYMBOL(blk_rq_init); @@ -1855,8 +1856,10 @@ void blk_dequeue_request(struct request *rq) * and to it is freed is accounted as io that is in progress at * the driver side. */ - if (blk_account_rq(rq)) + if (blk_account_rq(rq)) { q->in_flight[rq_is_sync(rq)]++; + set_io_start_time_ns(rq); + } } /** @@ -2517,4 +2520,3 @@ int __init blk_dev_init(void) return 0; } - diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index cf11548..9102ffc 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -854,7 +854,7 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg) if (!RB_EMPTY_NODE(&cfqg->rb_node)) cfq_rb_erase(&cfqg->rb_node, st); cfqg->saved_workload_slice = 0; - blkiocg_update_blkio_group_dequeue_stats(&cfqg->blkg, 1); + blkiocg_update_dequeue_stats(&cfqg->blkg, 1); } static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq) @@ -1865,6 +1865,8 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq) elv_dispatch_sort(q, rq); cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++; + blkiocg_update_dispatch_stats(&cfqq->cfqg->blkg, blk_rq_bytes(rq), + rq_data_dir(rq), rq_is_sync(rq)); } /* @@ -3285,6 +3287,9 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq) WARN_ON(!cfqq->dispatched); cfqd->rq_in_driver--; cfqq->dispatched--; + blkiocg_update_completion_stats(&cfqq->cfqg->blkg, rq_start_time_ns(rq), + rq_io_start_time_ns(rq), rq_data_dir(rq), + 
rq_is_sync(rq)); cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]--; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index ebd22db..ac1f30d 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -193,7 +193,10 @@ struct request { struct gendisk *rq_disk; unsigned long start_time; - +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE) + unsigned long long start_time_ns; + unsigned long long io_start_time_ns; /* when passed to hardware */ +#endif /* Number of scatter-gather DMA addr+len pairs after * physical address coalescing is performed. */ @@ -1219,6 +1222,39 @@ static inline void put_dev_sector(Sector p) struct work_struct; int kblockd_schedule_work(struct request_queue *q, struct work_struct *work); +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE) +static inline void set_start_time_ns(struct request *req) +{ + req->start_time_ns = sched_clock(); +} + +static inline void set_io_start_time_ns(struct request *req) +{ + req->io_start_time_ns = sched_clock(); +} + +static inline uint64_t rq_start_time_ns(struct request *req) +{ + return req->start_time_ns; +} + +static inline uint64_t rq_io_start_time_ns(struct request *req) +{ + return req->io_start_time_ns; +} +#else +static inline void set_start_time_ns(struct request *req) {} +static inline void set_io_start_time_ns(struct request *req) {} +static inline uint64_t rq_start_time_ns(struct request *req) +{ + return 0; +} +static inline uint64_t rq_io_start_time_ns(struct request *req) +{ + return 0; +} +#endif + #define MODULE_ALIAS_BLOCKDEV(major,minor) \ MODULE_ALIAS("block-major-" __stringify(major) "-" __stringify(minor)) #define MODULE_ALIAS_BLOCKDEV_MAJOR(major) \ ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 3/3][v3] blkio: Increment the blkio cgroup stats for real now. 2010-04-06 3:37 ` [PATCH 3/3][v3] blkio: Increment the blkio cgroup stats for real now Divyesh Shah @ 2010-04-06 15:30 ` Vivek Goyal 0 siblings, 0 replies; 9+ messages in thread From: Vivek Goyal @ 2010-04-06 15:30 UTC (permalink / raw) To: Divyesh Shah; +Cc: jens.axboe, linux-kernel, nauman, ctalbott On Mon, Apr 05, 2010 at 08:37:38PM -0700, Divyesh Shah wrote: > We also add start_time_ns and io_start_time_ns fields to struct request > here to record the time when a request is created and when it is > dispatched to device. We use ns uints here as ms and jiffies are > not very useful for non-rotational media. > > Signed-off-by: Divyesh Shah<dpshah@google.com> Looks good to me. Acked-by: Vivek Goyal <vgoyal@redhat.com> Vivek > --- > > block/blk-cgroup.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++-- > block/blk-cgroup.h | 15 ++++++++++-- > block/blk-core.c | 6 +++-- > block/cfq-iosched.c | 7 +++++- > include/linux/blkdev.h | 38 +++++++++++++++++++++++++++++++ > 5 files changed, 115 insertions(+), 9 deletions(-) > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c > index d585a05..8bd607c 100644 > --- a/block/blk-cgroup.c > +++ b/block/blk-cgroup.c > @@ -15,6 +15,7 @@ > #include <linux/kdev_t.h> > #include <linux/module.h> > #include <linux/err.h> > +#include <linux/blkdev.h> > #include "blk-cgroup.h" > > #define MAX_KEY_LEN 100 > @@ -63,6 +64,23 @@ void blkio_group_init(struct blkio_group *blkg) > } > EXPORT_SYMBOL_GPL(blkio_group_init); > > +/* > + * Add to the appropriate stat variable depending on the request type. > + * This should be called with the blkg->stats_lock held. 
> + */
> +static void blkio_add_stat(uint64_t *stat, uint64_t add, bool direction,
> +				bool sync)
> +{
> +	if (direction)
> +		stat[BLKIO_STAT_WRITE] += add;
> +	else
> +		stat[BLKIO_STAT_READ] += add;
> +	if (sync)
> +		stat[BLKIO_STAT_SYNC] += add;
> +	else
> +		stat[BLKIO_STAT_ASYNC] += add;
> +}
> +
>  void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time)
>  {
>  	unsigned long flags;
> @@ -73,6 +91,42 @@ void blkiocg_update_timeslice_used(struct blkio_group *blkg, unsigned long time)
>  }
>  EXPORT_SYMBOL_GPL(blkiocg_update_timeslice_used);
> 
> +void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
> +			uint64_t bytes, bool direction, bool sync)
> +{
> +	struct blkio_group_stats *stats;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&blkg->stats_lock, flags);
> +	stats = &blkg->stats;
> +	stats->sectors += bytes >> 9;
> +	blkio_add_stat(stats->stat_arr[BLKIO_STAT_SERVICED], 1, direction,
> +			sync);
> +	blkio_add_stat(stats->stat_arr[BLKIO_STAT_SERVICE_BYTES], bytes,
> +			direction, sync);
> +	spin_unlock_irqrestore(&blkg->stats_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(blkiocg_update_dispatch_stats);
> +
> +void blkiocg_update_completion_stats(struct blkio_group *blkg,
> +	uint64_t start_time, uint64_t io_start_time, bool direction, bool sync)
> +{
> +	struct blkio_group_stats *stats;
> +	unsigned long flags;
> +	unsigned long long now = sched_clock();
> +
> +	spin_lock_irqsave(&blkg->stats_lock, flags);
> +	stats = &blkg->stats;
> +	if (time_after64(now, io_start_time))
> +		blkio_add_stat(stats->stat_arr[BLKIO_STAT_SERVICE_TIME],
> +				now - io_start_time, direction, sync);
> +	if (time_after64(io_start_time, start_time))
> +		blkio_add_stat(stats->stat_arr[BLKIO_STAT_WAIT_TIME],
> +				io_start_time - start_time, direction, sync);
> +	spin_unlock_irqrestore(&blkg->stats_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(blkiocg_update_completion_stats);
> +
>  void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg,
>  		struct blkio_group *blkg, void *key, dev_t dev)
>  {
> @@ -315,12 +369,12 @@ SHOW_FUNCTION_PER_GROUP(dequeue, BLKIO_STAT_DEQUEUE, 0);
>  #undef SHOW_FUNCTION_PER_GROUP
> 
>  #ifdef CONFIG_DEBUG_BLK_CGROUP
> -void blkiocg_update_blkio_group_dequeue_stats(struct blkio_group *blkg,
> +void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
>  			unsigned long dequeue)
>  {
>  	blkg->stats.dequeue += dequeue;
>  }
> -EXPORT_SYMBOL_GPL(blkiocg_update_blkio_group_dequeue_stats);
> +EXPORT_SYMBOL_GPL(blkiocg_update_dequeue_stats);
>  #endif
> 
>  struct cftype blkio_files[] = {
> diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
> index a4bc4bb..b22e553 100644
> --- a/block/blk-cgroup.h
> +++ b/block/blk-cgroup.h
> @@ -125,12 +125,12 @@ static inline char *blkg_path(struct blkio_group *blkg)
>  {
>  	return blkg->path;
>  }
> -void blkiocg_update_blkio_group_dequeue_stats(struct blkio_group *blkg,
> +void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
>  			unsigned long dequeue);
>  #else
>  static inline char *blkg_path(struct blkio_group *blkg) { return NULL; }
> -static inline void blkiocg_update_blkio_group_dequeue_stats(
> -			struct blkio_group *blkg, unsigned long dequeue) {}
> +static inline void blkiocg_update_dequeue_stats(struct blkio_group *blkg,
> +			unsigned long dequeue) {}
>  #endif
> 
>  #if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> @@ -144,6 +144,10 @@ extern struct blkio_group *blkiocg_lookup_group(struct blkio_cgroup *blkcg,
>  void blkio_group_init(struct blkio_group *blkg);
>  void blkiocg_update_timeslice_used(struct blkio_group *blkg,
>  					unsigned long time);
> +void blkiocg_update_dispatch_stats(struct blkio_group *blkg, uint64_t bytes,
> +					bool direction, bool sync);
> +void blkiocg_update_completion_stats(struct blkio_group *blkg,
> +	uint64_t start_time, uint64_t io_start_time, bool direction, bool sync);
>  #else
>  struct cgroup;
>  static inline struct blkio_cgroup *
> @@ -160,5 +164,10 @@ static inline struct blkio_group *
>  blkiocg_lookup_group(struct blkio_cgroup *blkcg, void *key) { return NULL; }
>  static inline void blkiocg_update_timeslice_used(struct blkio_group *blkg,
>  						unsigned long time) {}
> +static inline void blkiocg_update_dispatch_stats(struct blkio_group *blkg,
> +		uint64_t bytes, bool direction, bool sync) {}
> +static inline void blkiocg_update_completion_stats(struct blkio_group *blkg,
> +		uint64_t start_time, uint64_t io_start_time, bool direction,
> +		bool sync) {}
>  #endif
>  #endif /* _BLK_CGROUP_H */
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 9fe174d..1d94f15 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -127,6 +127,7 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
>  	rq->tag = -1;
>  	rq->ref_count = 1;
>  	rq->start_time = jiffies;
> +	set_start_time_ns(rq);
>  }
>  EXPORT_SYMBOL(blk_rq_init);
> 
> @@ -1855,8 +1856,10 @@ void blk_dequeue_request(struct request *rq)
>  	 * and to it is freed is accounted as io that is in progress at
>  	 * the driver side.
>  	 */
> -	if (blk_account_rq(rq))
> +	if (blk_account_rq(rq)) {
>  		q->in_flight[rq_is_sync(rq)]++;
> +		set_io_start_time_ns(rq);
> +	}
>  }
> 
>  /**
> @@ -2517,4 +2520,3 @@ int __init blk_dev_init(void)
> 
>  	return 0;
>  }
> -
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index cf11548..9102ffc 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -854,7 +854,7 @@ cfq_group_service_tree_del(struct cfq_data *cfqd, struct cfq_group *cfqg)
>  	if (!RB_EMPTY_NODE(&cfqg->rb_node))
>  		cfq_rb_erase(&cfqg->rb_node, st);
>  	cfqg->saved_workload_slice = 0;
> -	blkiocg_update_blkio_group_dequeue_stats(&cfqg->blkg, 1);
> +	blkiocg_update_dequeue_stats(&cfqg->blkg, 1);
>  }
> 
>  static inline unsigned int cfq_cfqq_slice_usage(struct cfq_queue *cfqq)
> @@ -1865,6 +1865,8 @@ static void cfq_dispatch_insert(struct request_queue *q, struct request *rq)
>  	elv_dispatch_sort(q, rq);
> 
>  	cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++;
> +	blkiocg_update_dispatch_stats(&cfqq->cfqg->blkg, blk_rq_bytes(rq),
> +					rq_data_dir(rq), rq_is_sync(rq));
>  }
> 
>  /*
> @@ -3285,6 +3287,9 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
>  	WARN_ON(!cfqq->dispatched);
>  	cfqd->rq_in_driver--;
>  	cfqq->dispatched--;
> +	blkiocg_update_completion_stats(&cfqq->cfqg->blkg, rq_start_time_ns(rq),
> +			rq_io_start_time_ns(rq), rq_data_dir(rq),
> +			rq_is_sync(rq));
> 
>  	cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]--;
> 
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index ebd22db..ac1f30d 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -193,7 +193,10 @@ struct request {
> 
>  	struct gendisk *rq_disk;
>  	unsigned long start_time;
> -
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> +	unsigned long long start_time_ns;
> +	unsigned long long io_start_time_ns;	/* when passed to hardware */
> +#endif
>  	/* Number of scatter-gather DMA addr+len pairs after
>  	 * physical address coalescing is performed.
>  	 */
> @@ -1219,6 +1222,39 @@ static inline void put_dev_sector(Sector p)
>  struct work_struct;
>  int kblockd_schedule_work(struct request_queue *q, struct work_struct *work);
> 
> +#if defined(CONFIG_BLK_CGROUP) || defined(CONFIG_BLK_CGROUP_MODULE)
> +static inline void set_start_time_ns(struct request *req)
> +{
> +	req->start_time_ns = sched_clock();
> +}
> +
> +static inline void set_io_start_time_ns(struct request *req)
> +{
> +	req->io_start_time_ns = sched_clock();
> +}
> +
> +static inline uint64_t rq_start_time_ns(struct request *req)
> +{
> +	return req->start_time_ns;
> +}
> +
> +static inline uint64_t rq_io_start_time_ns(struct request *req)
> +{
> +	return req->io_start_time_ns;
> +}
> +#else
> +static inline void set_start_time_ns(struct request *req) {}
> +static inline void set_io_start_time_ns(struct request *req) {}
> +static inline uint64_t rq_start_time_ns(struct request *req)
> +{
> +	return 0;
> +}
> +static inline uint64_t rq_io_start_time_ns(struct request *req)
> +{
> +	return 0;
> +}
> +#endif
> +
>  #define MODULE_ALIAS_BLOCKDEV(major,minor) \
>  	MODULE_ALIAS("block-major-" __stringify(major) "-" __stringify(minor))
>  #define MODULE_ALIAS_BLOCKDEV_MAJOR(major) \
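Stripped of the locking and kernel plumbing, the accounting the new helpers perform reduces to two small rules: each sample is bucketed once under READ or WRITE and once under SYNC or ASYNC, and at completion time wait_time is the queueing delay (dispatch minus creation) while service_time is the device time (completion minus dispatch). The sketch below is a hypothetical userspace mirror of that logic; the enum names, their ordering, and the helper signatures are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-type bucket indices; the kernel's actual enum lives in
 * block/blk-cgroup.h and may be ordered differently. */
enum { STAT_READ, STAT_WRITE, STAT_SYNC, STAT_ASYNC, STAT_NR };

/* Same bucketing rule as the patch's blkio_add_stat(): one READ/WRITE
 * bucket and one SYNC/ASYNC bucket per sample. */
static void add_stat(uint64_t *stat, uint64_t add, bool direction, bool sync)
{
	stat[direction ? STAT_WRITE : STAT_READ] += add;
	stat[sync ? STAT_SYNC : STAT_ASYNC] += add;
}

/* Mirrors blkiocg_update_completion_stats(): service_time is completion
 * minus dispatch-to-hardware, wait_time is dispatch minus request creation.
 * Plain comparisons stand in for the patch's time_after64() guards against
 * sched_clock() skew. */
static void update_completion(uint64_t *service_time, uint64_t *wait_time,
			      uint64_t now, uint64_t start_ns,
			      uint64_t io_start_ns, bool dir, bool sync)
{
	if (now > io_start_ns)
		add_stat(service_time, now - io_start_ns, dir, sync);
	if (io_start_ns > start_ns)
		add_stat(wait_time, io_start_ns - start_ns, dir, sync);
}
```

Because io_start_time_ns is stamped in blk_dequeue_request() when the request is handed to the driver, service_time measured this way stays meaningful with NCQ, where several requests are on the device at once, which is the "define io_service_time better to cover the NCQ case" item from the changelog.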