* [RFC PATCH 0/2] CPU controller statistics - v5 @ 2009-02-25 10:57 Bharata B Rao 2009-02-25 10:58 ` [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) Bharata B Rao 2009-02-25 10:59 ` [RFC PATCH 2/2] Add per-cgroup CPU controller statistics Bharata B Rao 0 siblings, 2 replies; 14+ messages in thread From: Bharata B Rao @ 2009-02-25 10:57 UTC (permalink / raw) To: linux-kernel Cc: Balaji Rao, Dhaval Giani, Balbir Singh, Li Zefan, Paul Menage, Andrew Morton, Ingo Molnar, Peter Zijlstra Hi, Last year, Balaji posted a patch to collect CPU controller statistics. After a few initial versions, I couldn't see any activity. Here I am posting the next version of the patch for review. Changes for v5: - Updated to 2.6.29-rc6. - Separated cgroup modifications into a different patch. - Changed the prototype of the ->initialize() subsystem API from cgroup_subsys->initialize(int early) to cgroup_subsys->initialize(void) and calling it only from cgroup_init() and not from cgroup_init_early(). - Updated documentation for new API. - Added guest time metric as another cpu controller statistic. - Free percpu statistics counters when the cgroup is brought down. - Account irq and softirq time also as system time for cgroup accounting. - Separate out stats collection code under CONFIG_CGROUP_SCHED to a helper function to reduce ifdefs. v4: - http://lkml.org/lkml/2008/5/11/163 I have tried to address most of the comments given for the 4th version of the patch. A few questions still remain: - percpu counters are used for stats collection. Since percpu counters aren't usable during cgroup_init_early(), we have to allocate percpu stats counters for init_task_group separately later during cgroup_init(). Because of this, in the stats collection code, we end up having an 'is stats counter allocated?' check. - This patch collects per cgroup cpu controller stats. Does it make sense to account stime and utime hierarchically ?
If so, we would probably be duplicating what the cpuacct controller already does. - Is steal time a useful per-cgroup metric ? Perhaps in container-based virtualized environments ? Regards, Bharata. ^ permalink raw reply [flat|nested] 14+ messages in thread
* [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) 2009-02-25 10:57 [RFC PATCH 0/2] CPU controller statistics - v5 Bharata B Rao @ 2009-02-25 10:58 ` Bharata B Rao 2009-02-26 2:55 ` Li Zefan 2009-02-25 10:59 ` [RFC PATCH 2/2] Add per-cgroup CPU controller statistics Bharata B Rao 1 sibling, 1 reply; 14+ messages in thread From: Bharata B Rao @ 2009-02-25 10:58 UTC (permalink / raw) To: linux-kernel Cc: Balaji Rao, Dhaval Giani, Balbir Singh, Li Zefan, Paul Menage, Andrew Morton, Ingo Molnar, Peter Zijlstra From: Balaji Rao <balajirrao@gmail.com> cgroup: Add ->initialize() to cgroup_subsys structure Some cgroup subsystems (like cpu controller) would need subsystem specific initialization. Such subsystems can define ->initialize() which gets called during cgroup_init() (and not cgroup_init_early()). Signed-off-by: Balaji Rao <balajirrao@gmail.com> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> --- Documentation/cgroups/cgroups.txt | 6 ++++++ include/linux/cgroup.h | 1 + kernel/cgroup.c | 2 ++ 3 files changed, 9 insertions(+) --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -534,6 +534,12 @@ and root cgroup. Currently this will onl the default hierarchy (which never has sub-cgroups) and a hierarchy that is being created/destroyed (and hence has no sub-cgroups). +void initialize(void) + +Called during cgroup initialization (cgroup_init()) to do any subsystem +specific initialization. This is not called during early cgroup +initialization (cgroup_init_early()). + 4.
Questions ============ --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -359,6 +359,7 @@ struct cgroup_subsys { struct cgroup *cgrp); void (*post_clone)(struct cgroup_subsys *ss, struct cgroup *cgrp); void (*bind)(struct cgroup_subsys *ss, struct cgroup *root); + void (*initialize)(void); int subsys_id; int active; --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -2707,6 +2707,8 @@ int __init cgroup_init(void) struct cgroup_subsys *ss = subsys[i]; if (!ss->early_init) cgroup_init_subsys(ss); + if (ss->initialize) + ss->initialize(); } /* Add init_css_set to the hash table */ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) 2009-02-25 10:58 ` [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) Bharata B Rao @ 2009-02-26 2:55 ` Li Zefan 2009-02-26 7:52 ` Bharata B Rao 0 siblings, 1 reply; 14+ messages in thread From: Li Zefan @ 2009-02-26 2:55 UTC (permalink / raw) To: bharata Cc: linux-kernel, Balaji Rao, Dhaval Giani, Balbir Singh, Paul Menage, Andrew Morton, Ingo Molnar, Peter Zijlstra Bharata B Rao wrote: > From: Balaji Rao <balajirrao@gmail.com> > > cgroup: Add ->initialize() to cgroup_subsys structure > > Some cgroup subsystems (like cpu controller) would need subsystem > specific initialization. Such subsystems can define ->initialize() > which gets called during cgroup_init() (and not cgroup_init_early()). > I think it's better to avoid adding this. It would be best if we can add a hook to initialize init_task_group.stat where kmalloc is available but account_xxx_time() hasn't been called. Otherwise, we have to check (tg->stat == NULL) in account_task_group_time(), then why not add a hook in smp_init_smp() to do initialization? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) 2009-02-26 2:55 ` Li Zefan @ 2009-02-26 7:52 ` Bharata B Rao 2009-02-26 8:11 ` Li Zefan 2009-02-26 8:48 ` Dhaval Giani 0 siblings, 2 replies; 14+ messages in thread From: Bharata B Rao @ 2009-02-26 7:52 UTC (permalink / raw) To: Li Zefan Cc: linux-kernel, Balaji Rao, Dhaval Giani, Balbir Singh, Paul Menage, Andrew Morton, Ingo Molnar, Peter Zijlstra On Thu, Feb 26, 2009 at 10:55:54AM +0800, Li Zefan wrote: > Bharata B Rao wrote: > > From: Balaji Rao <balajirrao@gmail.com> > > > > cgroup: Add ->initialize() to cgroup_subsys structure > > > > Some cgroup subsystems (like cpu controller) would need subsystem > > specific initialization. Such subsystems can define ->initialize() > > which gets called during cgroup_init() (and not cgroup_init_early()). > > > > I think it's better to avoid adding this. > > It would be best if we can add a hook to initialize init_task_group.stat where > kmalloc is available but acount_xxx_time() hasn't been called. Otherwise, we > have to check (tg->stat == NULL) in account_task_group_time(), then why not add > a hook in smp_init_smp() to do initialization? account_xxx_time() is called from scheduler ticks and AFAICS they end up getting called much before kmalloc is available. In any case, I would think any hook to just initialize stats for init_task_group would be very very (cpu controller) subsystem specific. Isn't that bad ? Another solution I see which can prevent all this is not to collect stats for init_task_group at all, with the understanding that the system wide stime/utime accounting (which is already present) is essentially the accounting for init_task_group because init_task_group comprises all the tasks in the system. But this would necessitate us to make collection of cpu controller stats hierarchical. This was one of the questions I asked in my 0/2 thread. Shouldn't we be doing hierarchical accounting for the cpu controller ?
Another thing that could be done is to enhance the already existing cpuacct controller to do stime/utime accounting also. The cpuacct controller exists precisely for doing per-cgroup accounting, so is there any reason why these stats shouldn't be part of the cpuacct controller? If we do this, users would be forced to use the cpu controller and cpuacct controller together. Is that a problem ? Regards, Bharata. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) 2009-02-26 7:52 ` Bharata B Rao @ 2009-02-26 8:11 ` Li Zefan 2009-02-26 8:20 ` KAMEZAWA Hiroyuki 2009-02-26 10:12 ` Bharata B Rao 1 sibling, 2 replies; 14+ messages in thread From: Li Zefan @ 2009-02-26 8:11 UTC (permalink / raw) To: bharata Cc: linux-kernel, Balaji Rao, Dhaval Giani, Balbir Singh, Paul Menage, Andrew Morton, Ingo Molnar, Peter Zijlstra Bharata B Rao wrote: > On Thu, Feb 26, 2009 at 10:55:54AM +0800, Li Zefan wrote: >> Bharata B Rao wrote: >>> From: Balaji Rao <balajirrao@gmail.com> >>> >>> cgroup: Add ->initialize() to cgroup_subsys structure >>> >>> Some cgroup subsystems (like cpu controller) would need subsystem >>> specific initialization. Such subsystems can define ->initialize() >>> which gets called during cgroup_init() (and not cgroup_init_early()). >>> >> I think it's better to avoid adding this. >> >> It would be best if we can add a hook to initialize init_task_group.stat where >> kmalloc is available but acount_xxx_time() hasn't been called. Otherwise, we >> have to check (tg->stat == NULL) in account_task_group_time(), then why not add >> a hook in smp_init_smp() to do initialization? > > account_xxx_time() is called from scheduler ticks and AFAICS they end up > getting called much before kmalloc is available. In any case, I would think > any hook to just initialize stats for init_task_group would be > very very (cpu controller) subsytem specific. Isn't that bad ? > Since it's very much cpu subsystem specific, it's better to use its own hook (and because the initialize() API is not so elegant...) > Another solution I see which can prevent all this is not to collect > stats for init_task_group at all with the understanding that system wide This came to my mind too.
But this would necessiate us to make collection > of cpu controller stats hierarchial. This was one of the questions I asked > in my 0/2 thread. Shouldn't we be doing hierarchial accounting for > cpu controller ? > Don't know. I have no strong opinion about this. I'm a bit doubtful about how useful this is. > Another thing that could be done is to enhance already existing > cpuacct controller to do stime/utime accouting also. cpuacct controller > exists precisely for doing per-cgroup accounting and is there any reason > why these stats shouldn't be part of cpuacct controller. If we do this, > users would be forced to use cpu controller and cpuacct controller > together. Is that a problem ? > I wondered why these stats are part of the cpu subsystem and not cpuacct. And I don't see any problem with using these 2 subsystems together. Actually this is one of the advantages of cgroups. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) 2009-02-26 8:11 ` Li Zefan @ 2009-02-26 8:20 ` KAMEZAWA Hiroyuki 2009-02-26 10:12 ` Bharata B Rao 1 sibling, 0 replies; 14+ messages in thread From: KAMEZAWA Hiroyuki @ 2009-02-26 8:20 UTC (permalink / raw) To: Li Zefan Cc: bharata, linux-kernel, Balaji Rao, Dhaval Giani, Balbir Singh, Paul Menage, Andrew Morton, Ingo Molnar, Peter Zijlstra On Thu, 26 Feb 2009 16:11:36 +0800 Li Zefan <lizf@cn.fujitsu.com> wrote: > Bharata B Rao wrote: > > On Thu, Feb 26, 2009 at 10:55:54AM +0800, Li Zefan wrote: > >> Bharata B Rao wrote: > >>> From: Balaji Rao <balajirrao@gmail.com> > > Another thing that could be done is to enhance already existing > > cpuacct controller to do stime/utime accouting also. cpuacct controller > > exists precisely for doing per-cgroup accounting and is there any reason > > why these stats shouldn't be part of cpuacct controller. If we do this, > > users would be forced to use cpu controller and cpuacct controller > > together. Is that a problem ? > > > > I wondered why these stats is part of cpu subsystem but not cpuacct. > And I don't see any problem to use these 2 subsystems together. Actually > this is one of the advantage of cgroup. > IMHO, the cpuacct subsystem should count utime/stime. The current statistics are not so useful. BTW, I wonder why the cpuacct subsys walks up to root at charging... I think we can gather all subtree information at read... Thanks -Kame ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) 2009-02-26 8:11 ` Li Zefan 2009-02-26 8:20 ` KAMEZAWA Hiroyuki @ 2009-02-26 10:12 ` Bharata B Rao 1 sibling, 0 replies; 14+ messages in thread From: Bharata B Rao @ 2009-02-26 10:12 UTC (permalink / raw) To: Li Zefan Cc: linux-kernel, Balaji Rao, Dhaval Giani, Balbir Singh, Paul Menage, Andrew Morton, Ingo Molnar, Peter Zijlstra On Thu, Feb 26, 2009 at 04:11:36PM +0800, Li Zefan wrote: > Bharata B Rao wrote: > > On Thu, Feb 26, 2009 at 10:55:54AM +0800, Li Zefan wrote: > >> Bharata B Rao wrote: > >>> From: Balaji Rao <balajirrao@gmail.com> > >>> > >>> cgroup: Add ->initialize() to cgroup_subsys structure > >>> > >>> Some cgroup subsystems (like cpu controller) would need subsystem > >>> specific initialization. Such subsystems can define ->initialize() > >>> which gets called during cgroup_init() (and not cgroup_init_early()). > >>> > >> I think it's better to avoid adding this. > >> > >> It would be best if we can add a hook to initialize init_task_group.stat where > >> kmalloc is available but acount_xxx_time() hasn't been called. Otherwise, we > >> have to check (tg->stat == NULL) in account_task_group_time(), then why not add > >> a hook in smp_init_smp() to do initialization? > > > > account_xxx_time() is called from scheduler ticks and AFAICS they end up > > getting called much before kmalloc is available. In any case, I would think > > any hook to just initialize stats for init_task_group would be > > very very (cpu controller) subsytem specific. Isn't that bad ? > > > > Since it's very very cpu subsystem specific, so it's better to use it's own hook. > (and because the initialize() API is not so elegant..) > > > Another solution I see which can prevent all this is not to collect > > stats for init_task_group at all with the understanding that system wide > > This came to my mind too. 
;) > > > stime/utime accounting (which is already present) is essentially the > > accounting for init_task_group because init_task_group comprises of all > > the tasks in the system. But this would necessiate us to make collection > > of cpu controller stats hierarchial. This was one of the questions I asked > > in my 0/2 thread. Shouldn't we be doing hierarchial accounting for > > cpu controller ? > > > > Don't know. I have no strong opinion about this. I'm a bit doubt how useful > this is. > > > Another thing that could be done is to enhance already existing > > cpuacct controller to do stime/utime accouting also. cpuacct controller > > exists precisely for doing per-cgroup accounting and is there any reason > > why these stats shouldn't be part of cpuacct controller. If we do this, > > users would be forced to use cpu controller and cpuacct controller > > together. Is that a problem ? > > > > I wondered why these stats is part of cpu subsystem but not cpuacct. > And I don't see any problem to use these 2 subsystems together. Actually > this is one of the advantage of cgroup. Ok, so if there are no objections, my next version would move these stats to cpuacct subsystem. Regards, Bharata. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) 2009-02-26 7:52 ` Bharata B Rao 2009-02-26 8:11 ` Li Zefan @ 2009-02-26 8:48 ` Dhaval Giani 2009-02-26 8:52 ` Li Zefan 1 sibling, 1 reply; 14+ messages in thread From: Dhaval Giani @ 2009-02-26 8:48 UTC (permalink / raw) To: Bharata B Rao Cc: Li Zefan, linux-kernel, Balaji Rao, Balbir Singh, Paul Menage, Andrew Morton, Ingo Molnar, Peter Zijlstra On Thu, Feb 26, 2009 at 01:22:59PM +0530, Bharata B Rao wrote: > On Thu, Feb 26, 2009 at 10:55:54AM +0800, Li Zefan wrote: > > Bharata B Rao wrote: > > > From: Balaji Rao <balajirrao@gmail.com> > > > > > > cgroup: Add ->initialize() to cgroup_subsys structure > > > > > > Some cgroup subsystems (like cpu controller) would need subsystem > > > specific initialization. Such subsystems can define ->initialize() > > > which gets called during cgroup_init() (and not cgroup_init_early()). > > > > > > > I think it's better to avoid adding this. > > > > It would be best if we can add a hook to initialize init_task_group.stat where > > kmalloc is available but acount_xxx_time() hasn't been called. Otherwise, we > > have to check (tg->stat == NULL) in account_task_group_time(), then why not add > > a hook in smp_init_smp() to do initialization? > > account_xxx_time() is called from scheduler ticks and AFAICS they end up > getting called much before kmalloc is available. In any case, I would think > any hook to just initialize stats for init_task_group would be > very very (cpu controller) subsytem specific. Isn't that bad ? > > Another solution I see which can prevent all this is not to collect > stats for init_task_group at all with the understanding that system wide > stime/utime accounting (which is already present) is essentially the > accounting for init_task_group because init_task_group comprises of all > the tasks in the system. But this would necessiate us to make collection > of cpu controller stats hierarchial. 
This was one of the questions I asked > in my 0/2 thread. Shouldn't we be doing hierarchial accounting for > cpu controller ? > I believe hierarchical accounting is natural. Since the CPU controller divides its bandwidth as per hierarchy, it makes sense for the accounting to also be hierarchical. I guess this is a reasonable compromise. But how do you still check if the group you are accounting for is the root group or not? (Unless I am missing something obvious) thanks, -- regards, Dhaval ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) 2009-02-26 8:48 ` Dhaval Giani @ 2009-02-26 8:52 ` Li Zefan 0 siblings, 0 replies; 14+ messages in thread From: Li Zefan @ 2009-02-26 8:52 UTC (permalink / raw) To: Dhaval Giani Cc: Bharata B Rao, linux-kernel, Balaji Rao, Balbir Singh, Paul Menage, Andrew Morton, Ingo Molnar, Peter Zijlstra > I believe hierarchical accounting is natural. Since the CPU controller > divides its bandwidth as per hierarchy, it makes sense for the > accounting to also be hierarchical. I guess this is a reasonable > compromise. But how do you still check if the group you are accounting > for is the root group or not? (Unless I am missing something obvious) > As easy as: if (cgrp->parent == NULL) or if (cgrp == cgrp->top_cgroup) ^ permalink raw reply [flat|nested] 14+ messages in thread
* [RFC PATCH 2/2] Add per-cgroup CPU controller statistics 2009-02-25 10:57 [RFC PATCH 0/2] CPU controller statistics - v5 Bharata B Rao 2009-02-25 10:58 ` [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) Bharata B Rao @ 2009-02-25 10:59 ` Bharata B Rao 2009-02-25 11:04 ` Peter Zijlstra 1 sibling, 1 reply; 14+ messages in thread From: Bharata B Rao @ 2009-02-25 10:59 UTC (permalink / raw) To: linux-kernel Cc: Balaji Rao, Dhaval Giani, Balbir Singh, Li Zefan, Paul Menage, Andrew Morton, Ingo Molnar, Peter Zijlstra From: Balaji Rao <balajirrao@gmail.com> sched: Add cpu controller statistics Add per-cgroup cpu controller statistics like system time and user time consumed by groups of tasks. Signed-off-by: Balaji Rao <balajirrao@gmail.com> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> --- kernel/sched.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 106 insertions(+) --- a/kernel/sched.c +++ b/kernel/sched.c @@ -263,10 +263,25 @@ struct cfs_rq; static LIST_HEAD(task_groups); +#ifdef CONFIG_CGROUP_SCHED +enum cpu_cgroup_stat_index { + CPU_CGROUP_STAT_UTIME, /* User time of the task group */ + CPU_CGROUP_STAT_STIME, /* Kernel time of the task group */ + CPU_CGROUP_STAT_GTIME, /* Guest time of the task group */ + + CPU_CGROUP_STAT_NSTATS, +}; + +struct cpu_cgroup_stat { + struct percpu_counter cpustat[CPU_CGROUP_STAT_NSTATS]; +}; +#endif + /* task group related information */ struct task_group { #ifdef CONFIG_CGROUP_SCHED struct cgroup_subsys_state css; + struct cpu_cgroup_stat *stat; #endif #ifdef CONFIG_USER_SCHED @@ -4159,6 +4174,19 @@ unsigned long long task_delta_exec(struc return ns; } +#ifdef CONFIG_CGROUP_SCHED +static void account_task_group_time(struct task_struct *p, + enum cpu_cgroup_stat_index idx, int val) +{ + struct task_group *tg = task_group(p); + + if (likely(tg->stat)) + percpu_counter_add(&tg->stat->cpustat[idx], val); +} +#else +#define account_task_group_time(x, y, z) { 0; } +#endif + /* * 
Account user cpu time to a process. * @p: the process that the cpu time gets accounted to @@ -4182,6 +4210,10 @@ void account_user_time(struct task_struc cpustat->nice = cputime64_add(cpustat->nice, tmp); else cpustat->user = cputime64_add(cpustat->user, tmp); + + account_task_group_time(p, CPU_CGROUP_STAT_UTIME, + cputime_to_msecs(cputime)); + /* Account for user time used */ acct_update_integrals(p); } @@ -4206,6 +4238,9 @@ static void account_guest_time(struct ta account_group_user_time(p, cputime); p->gtime = cputime_add(p->gtime, cputime); + account_task_group_time(p, CPU_CGROUP_STAT_GTIME, + cputime_to_msecs(cputime)); + /* Add guest time to cpustat. */ cpustat->user = cputime64_add(cpustat->user, tmp); cpustat->guest = cputime64_add(cpustat->guest, tmp); @@ -4243,6 +4278,8 @@ void account_system_time(struct task_str else cpustat->system = cputime64_add(cpustat->system, tmp); + account_task_group_time(p, CPU_CGROUP_STAT_STIME, + cputime_to_msecs(cputime)); /* Account for system time used */ acct_update_integrals(p); } @@ -9290,6 +9327,7 @@ static struct cgroup_subsys_state * cpu_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cgrp) { struct task_group *tg, *parent; + int i; if (!cgrp->parent) { /* This is early initialization for the top cgroup */ @@ -9301,6 +9339,15 @@ cpu_cgroup_create(struct cgroup_subsys * if (IS_ERR(tg)) return ERR_PTR(-ENOMEM); + tg->stat = kmalloc(sizeof(struct cpu_cgroup_stat), GFP_KERNEL); + if (!tg->stat) { + sched_destroy_group(tg); + return ERR_PTR(-ENOMEM); + } + + for (i = 0; i < CPU_CGROUP_STAT_NSTATS; i++) + percpu_counter_init(&tg->stat->cpustat[i], 0); + return &tg->css; } @@ -9308,6 +9355,13 @@ static void cpu_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp) { struct task_group *tg = cgroup_tg(cgrp); + int i; + + if (tg->stat) { + for (i = 0; i < CPU_CGROUP_STAT_NSTATS; i++) + percpu_counter_destroy(&tg->stat->cpustat[i]); + kfree(tg->stat); + } sched_destroy_group(tg); } @@ -9336,6 +9390,53 @@ 
cpu_cgroup_attach(struct cgroup_subsys * sched_move_task(tsk); } +static void cpu_cgroup_initialize(void) +{ + int i; + struct cpu_cgroup_stat *stat; + + stat = kmalloc(sizeof(struct cpu_cgroup_stat) , GFP_KERNEL); + if (!stat) + return; + + for (i = 0; i < CPU_CGROUP_STAT_NSTATS; i++) + percpu_counter_init(&stat->cpustat[i], 0); + init_task_group.stat = stat; +} + +static s64 cpu_cgroup_read_stat(struct cpu_cgroup_stat *stat, + enum cpu_cgroup_stat_index idx) +{ + if (stat) + return percpu_counter_read(&stat->cpustat[idx]); + return 0; +} + +static const struct cpu_cgroup_stat_desc { + const char *msg; + u64 unit; +} cpu_cgroup_stat_desc[] = { + [CPU_CGROUP_STAT_UTIME] = { "utime", 1, }, + [CPU_CGROUP_STAT_STIME] = { "stime", 1, }, + [CPU_CGROUP_STAT_GTIME] = { "gtime", 1, }, +}; + +static int cpu_cgroup_stats_show(struct cgroup *cgrp, struct cftype *cft, + struct cgroup_map_cb *cb) +{ + struct task_group *tg = cgroup_tg(cgrp); + struct cpu_cgroup_stat *stat = tg->stat; + int i; + + for (i = 0; i < CPU_CGROUP_STAT_NSTATS; i++) { + s64 val; + val = cpu_cgroup_read_stat(stat, i); + val *= cpu_cgroup_stat_desc[i].unit; + cb->fill(cb, cpu_cgroup_stat_desc[i].msg, val); + } + return 0; +} + #ifdef CONFIG_FAIR_GROUP_SCHED static int cpu_shares_write_u64(struct cgroup *cgrp, struct cftype *cftype, u64 shareval) @@ -9395,6 +9496,10 @@ static struct cftype cpu_files[] = { .write_u64 = cpu_rt_period_write_uint, }, #endif + { + .name = "stat", + .read_map = cpu_cgroup_stats_show, + }, }; static int cpu_cgroup_populate(struct cgroup_subsys *ss, struct cgroup *cont) @@ -9409,6 +9514,7 @@ struct cgroup_subsys cpu_cgroup_subsys = .can_attach = cpu_cgroup_can_attach, .attach = cpu_cgroup_attach, .populate = cpu_cgroup_populate, + .initialize = cpu_cgroup_initialize, .subsys_id = cpu_cgroup_subsys_id, .early_init = 1, }; ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] Add per-cgroup CPU controller statistics 2009-02-25 10:59 ` [RFC PATCH 2/2] Add per-cgroup CPU controller statistics Bharata B Rao @ 2009-02-25 11:04 ` Peter Zijlstra 2009-02-25 11:20 ` Bharata B Rao 0 siblings, 1 reply; 14+ messages in thread From: Peter Zijlstra @ 2009-02-25 11:04 UTC (permalink / raw) To: bharata Cc: linux-kernel, Balaji Rao, Dhaval Giani, Balbir Singh, Li Zefan, Paul Menage, Andrew Morton, Ingo Molnar On Wed, 2009-02-25 at 16:29 +0530, Bharata B Rao wrote: > From: Balaji Rao <balajirrao@gmail.com> > > sched: Add cpu controller statistics > > Add per-cgroup cpu controller statistics like system time and user time > consumed by groups of tasks. Do we want this unconditionally? > Signed-off-by: Balaji Rao <balajirrao@gmail.com> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> > --- > kernel/sched.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 106 insertions(+) > > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -263,10 +263,25 @@ struct cfs_rq; > > static LIST_HEAD(task_groups); > > +#ifdef CONFIG_CGROUP_SCHED > +enum cpu_cgroup_stat_index { > + CPU_CGROUP_STAT_UTIME, /* User time of the task group */ > + CPU_CGROUP_STAT_STIME, /* Kernel time of the task group */ > + CPU_CGROUP_STAT_GTIME, /* Guest time of the task group */ > + > + CPU_CGROUP_STAT_NSTATS, > +}; > + > +struct cpu_cgroup_stat { > + struct percpu_counter cpustat[CPU_CGROUP_STAT_NSTATS]; > +}; > +#endif > + > /* task group related information */ > struct task_group { > #ifdef CONFIG_CGROUP_SCHED > struct cgroup_subsys_state css; > + struct cpu_cgroup_stat *stat; > #endif > > #ifdef CONFIG_USER_SCHED > @@ -4159,6 +4174,19 @@ unsigned long long task_delta_exec(struc > return ns; > } > > +#ifdef CONFIG_CGROUP_SCHED > +static void account_task_group_time(struct task_struct *p, > + enum cpu_cgroup_stat_index idx, int val) > +{ > + struct task_group *tg = task_group(p); > + > + if (likely(tg->stat)) > + 
percpu_counter_add(&tg->stat->cpustat[idx], val); > +} > +#else > +#define account_task_group_time(x, y, z) { 0; } inline please, so we get argument validation. > +#endif > + > /* > * Account user cpu time to a process. > * @p: the process that the cpu time gets accounted to > @@ -4182,6 +4210,10 @@ void account_user_time(struct task_struc > cpustat->nice = cputime64_add(cpustat->nice, tmp); > else > cpustat->user = cputime64_add(cpustat->user, tmp); > + > + account_task_group_time(p, CPU_CGROUP_STAT_UTIME, > + cputime_to_msecs(cputime)); > + > /* Account for user time used */ > acct_update_integrals(p); > } > @@ -4206,6 +4238,9 @@ static void account_guest_time(struct ta > account_group_user_time(p, cputime); > p->gtime = cputime_add(p->gtime, cputime); > > + account_task_group_time(p, CPU_CGROUP_STAT_GTIME, > + cputime_to_msecs(cputime)); > + > /* Add guest time to cpustat. */ > cpustat->user = cputime64_add(cpustat->user, tmp); > cpustat->guest = cputime64_add(cpustat->guest, tmp); > @@ -4243,6 +4278,8 @@ void account_system_time(struct task_str > else > cpustat->system = cputime64_add(cpustat->system, tmp); > > + account_task_group_time(p, CPU_CGROUP_STAT_STIME, > + cputime_to_msecs(cputime)); > /* Account for system time used */ > acct_update_integrals(p); > } > @@ -9290,6 +9327,7 @@ static struct cgroup_subsys_state * > cpu_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cgrp) > { > struct task_group *tg, *parent; > + int i; > > if (!cgrp->parent) { > /* This is early initialization for the top cgroup */ > @@ -9301,6 +9339,15 @@ cpu_cgroup_create(struct cgroup_subsys * > if (IS_ERR(tg)) > return ERR_PTR(-ENOMEM); > > + tg->stat = kmalloc(sizeof(struct cpu_cgroup_stat), GFP_KERNEL); > + if (!tg->stat) { > + sched_destroy_group(tg); > + return ERR_PTR(-ENOMEM); > + } > + > + for (i = 0; i < CPU_CGROUP_STAT_NSTATS; i++) > + percpu_counter_init(&tg->stat->cpustat[i], 0); percpu_counter_init() can fail with -ENOMEM. 
> return &tg->css; > } > > @@ -9308,6 +9355,13 @@ static void > cpu_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp) > { > struct task_group *tg = cgroup_tg(cgrp); > + int i; > + > + if (tg->stat) { > + for (i = 0; i < CPU_CGROUP_STAT_NSTATS; i++) > + percpu_counter_destroy(&tg->stat->cpustat[i]); > + kfree(tg->stat); > + } > > sched_destroy_group(tg); > } > @@ -9336,6 +9390,53 @@ cpu_cgroup_attach(struct cgroup_subsys * > sched_move_task(tsk); > } > > +static void cpu_cgroup_initialize(void) > +{ > + int i; > + struct cpu_cgroup_stat *stat; > + > + stat = kmalloc(sizeof(struct cpu_cgroup_stat) , GFP_KERNEL); > + if (!stat) > + return; > + > + for (i = 0; i < CPU_CGROUP_STAT_NSTATS; i++) > + percpu_counter_init(&stat->cpustat[i], 0); > + init_task_group.stat = stat; > +} > + > +static s64 cpu_cgroup_read_stat(struct cpu_cgroup_stat *stat, > + enum cpu_cgroup_stat_index idx) > +{ > + if (stat) > + return percpu_counter_read(&stat->cpustat[idx]); > + return 0; > +} > + > +static const struct cpu_cgroup_stat_desc { > + const char *msg; > + u64 unit; > +} cpu_cgroup_stat_desc[] = { > + [CPU_CGROUP_STAT_UTIME] = { "utime", 1, }, > + [CPU_CGROUP_STAT_STIME] = { "stime", 1, }, > + [CPU_CGROUP_STAT_GTIME] = { "gtime", 1, }, > +}; > + > +static int cpu_cgroup_stats_show(struct cgroup *cgrp, struct cftype *cft, > + struct cgroup_map_cb *cb) > +{ > + struct task_group *tg = cgroup_tg(cgrp); > + struct cpu_cgroup_stat *stat = tg->stat; > + int i; > + > + for (i = 0; i < CPU_CGROUP_STAT_NSTATS; i++) { > + s64 val; > + val = cpu_cgroup_read_stat(stat, i); > + val *= cpu_cgroup_stat_desc[i].unit; > + cb->fill(cb, cpu_cgroup_stat_desc[i].msg, val); > + } > + return 0; > +} > + > #ifdef CONFIG_FAIR_GROUP_SCHED > static int cpu_shares_write_u64(struct cgroup *cgrp, struct cftype *cftype, > u64 shareval) > @@ -9395,6 +9496,10 @@ static struct cftype cpu_files[] = { > .write_u64 = cpu_rt_period_write_uint, > }, > #endif > + { > + .name = "stat", > + .read_map = 
cpu_cgroup_stats_show, > + }, > }; > > static int cpu_cgroup_populate(struct cgroup_subsys *ss, struct cgroup *cont) > @@ -9409,6 +9514,7 @@ struct cgroup_subsys cpu_cgroup_subsys = > .can_attach = cpu_cgroup_can_attach, > .attach = cpu_cgroup_attach, > .populate = cpu_cgroup_populate, > + .initialize = cpu_cgroup_initialize, > .subsys_id = cpu_cgroup_subsys_id, > .early_init = 1, > }; ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] Add per-cgroup CPU controller statistics 2009-02-25 11:04 ` Peter Zijlstra @ 2009-02-25 11:20 ` Bharata B Rao 2009-02-25 11:24 ` Peter Zijlstra 0 siblings, 1 reply; 14+ messages in thread From: Bharata B Rao @ 2009-02-25 11:20 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Balaji Rao, Dhaval Giani, Balbir Singh, Li Zefan, Paul Menage, Andrew Morton, Ingo Molnar On Wed, Feb 25, 2009 at 12:04:10PM +0100, Peter Zijlstra wrote: > On Wed, 2009-02-25 at 16:29 +0530, Bharata B Rao wrote: > > From: Balaji Rao <balajirrao@gmail.com> > > > > sched: Add cpu controller statistics > > > > Add per-cgroup cpu controller statistics like system time and user time > > consumed by groups of tasks. > > Do we want this unconditionally? Not sure if I understand you correctly here, but we are collecting stats only when CONFIG_CGROUP_SCHED is enabled. Or do you mean we need a config option to enable stats collection ? If so, the memory controller already provides stats unconditionally. > > > > +#ifdef CONFIG_CGROUP_SCHED > > +static void account_task_group_time(struct task_struct *p, > > + enum cpu_cgroup_stat_index idx, int val) > > +{ > > + struct task_group *tg = task_group(p); > > + > > + if (likely(tg->stat)) > > + percpu_counter_add(&tg->stat->cpustat[idx], val); > > +} > > +#else > > +#define account_task_group_time(x, y, z) { 0; } > > inline please, so we get argument validation. Actually wanted to inline, but one of the arguments (cpu_cgroup_stat_index) is defined only under CONFIG_CGROUP_SCHED, hence couldn't have argument validation here. > > + > > + for (i = 0; i < CPU_CGROUP_STAT_NSTATS; i++) > > + percpu_counter_init(&tg->stat->cpustat[i], 0); > > percpu_counter_init() can fail with -ENOMEM. Ok, will fix. Thanks for your review, Peter. Regards, Bharata. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] Add per-cgroup CPU controller statistics
  2009-02-25 11:20 ` Bharata B Rao
@ 2009-02-25 11:24 ` Peter Zijlstra
  2009-02-25 11:41 ` Bharata B Rao
  0 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2009-02-25 11:24 UTC (permalink / raw)
To: bharata
Cc: linux-kernel, Balaji Rao, Dhaval Giani, Balbir Singh, Li Zefan,
    Paul Menage, Andrew Morton, Ingo Molnar

On Wed, 2009-02-25 at 16:50 +0530, Bharata B Rao wrote:
> On Wed, Feb 25, 2009 at 12:04:10PM +0100, Peter Zijlstra wrote:
> > On Wed, 2009-02-25 at 16:29 +0530, Bharata B Rao wrote:
> > > From: Balaji Rao <balajirrao@gmail.com>
> > >
> > > sched: Add cpu controller statistics
> > >
> > > Add per-cgroup cpu controller statistics like system time and user time
> > > consumed by groups of tasks.
> >
> > Do we want this unconditionally?
>
> Not sure if I understand you correctly here, but we collect stats
> only when CONFIG_CGROUP_SCHED is enabled.
>
> Or do you mean we need a config option to enable stats collection?
> If so, note that the memory controller already provides stats
> unconditionally.

OK, I guess so ;-)

> > > +#ifdef CONFIG_CGROUP_SCHED
> > > +static void account_task_group_time(struct task_struct *p,
> > > +		enum cpu_cgroup_stat_index idx, int val)
> > > +{
> > > +	struct task_group *tg = task_group(p);
> > > +
> > > +	if (likely(tg->stat))
> > > +		percpu_counter_add(&tg->stat->cpustat[idx], val);
> > > +}
> > > +#else
> > > +#define account_task_group_time(x, y, z) { 0; }
> >
> > inline please, so we get argument validation.
>
> I actually wanted to inline it, but the type of one of the arguments
> (enum cpu_cgroup_stat_index) is defined only under CONFIG_CGROUP_SCHED.
> Hence we can't get argument validation here.

Just pull the enum outside of the ifdef?

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [RFC PATCH 2/2] Add per-cgroup CPU controller statistics
  2009-02-25 11:24 ` Peter Zijlstra
@ 2009-02-25 11:41 ` Bharata B Rao
  0 siblings, 0 replies; 14+ messages in thread
From: Bharata B Rao @ 2009-02-25 11:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, Balaji Rao, Dhaval Giani, Balbir Singh, Li Zefan,
    Paul Menage, Andrew Morton, Ingo Molnar

On Wed, Feb 25, 2009 at 12:24:58PM +0100, Peter Zijlstra wrote:
> On Wed, 2009-02-25 at 16:50 +0530, Bharata B Rao wrote:
> > On Wed, Feb 25, 2009 at 12:04:10PM +0100, Peter Zijlstra wrote:
> > > On Wed, 2009-02-25 at 16:29 +0530, Bharata B Rao wrote:
> > > >
> > > > +#ifdef CONFIG_CGROUP_SCHED
> > > > +static void account_task_group_time(struct task_struct *p,
> > > > +		enum cpu_cgroup_stat_index idx, int val)
> > > > +{
> > > > +	struct task_group *tg = task_group(p);
> > > > +
> > > > +	if (likely(tg->stat))
> > > > +		percpu_counter_add(&tg->stat->cpustat[idx], val);
> > > > +}
> > > > +#else
> > > > +#define account_task_group_time(x, y, z) { 0; }
> > >
> > > inline please, so we get argument validation.
> >
> > I actually wanted to inline it, but the type of one of the arguments
> > (enum cpu_cgroup_stat_index) is defined only under CONFIG_CGROUP_SCHED.
> > Hence we can't get argument validation here.
>
> Just pull the enum outside of the ifdef?

Ok, will do it in the next iteration if that's acceptable and argument
validation is important.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2009-02-26 10:11 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2009-02-25 10:57 [RFC PATCH 0/2] CPU controller statistics - v5 Bharata B Rao
2009-02-25 10:58 ` [RFC PATCH 1/2] New cgroup subsystem API (->initialize()) Bharata B Rao
2009-02-26  2:55   ` Li Zefan
2009-02-26  7:52     ` Bharata B Rao
2009-02-26  8:11       ` Li Zefan
2009-02-26  8:20         ` KAMEZAWA Hiroyuki
2009-02-26 10:12           ` Bharata B Rao
2009-02-26  8:48   ` Dhaval Giani
2009-02-26  8:52     ` Li Zefan
2009-02-25 10:59 ` [RFC PATCH 2/2] Add per-cgroup CPU controller statistics Bharata B Rao
2009-02-25 11:04   ` Peter Zijlstra
2009-02-25 11:20     ` Bharata B Rao
2009-02-25 11:24       ` Peter Zijlstra
2009-02-25 11:41         ` Bharata B Rao