From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [RFC 0/5] forced comounts for cgroups. Date: Tue, 4 Sep 2012 18:18:15 +0400 Message-ID: <1346768300-10282-1-git-send-email-glommer@parallels.com> Return-path: Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org Hi, As we have been extensively discussing, the cost and pain points for cgroups come from many places. But at least one of those is the arbitrary nature of hierarchies. Many people, including at least Tejun and me would like this to go away altogether. Problem so far, is breaking compatibility with existing setups. I am proposing here a default-n Kconfig option that will guarantee that the cpu cgroups (for now) will be comounted. I started with them because the cpu/cpuacct division is clearly the worst offender. Also, the default-n is here so distributions will have time to adapt: Forcing this flag to be on without userspace changes will just lead to cgroups failing to mount, which we don't want. Although I've tested it and it works, I haven't compile-tested all possible config combinations. So this is mostly for your eyes. If this gets traction, I'll submit it properly, along with any changes that you might require. Thanks. Glauber Costa (5): cgroup: allow some comounts to be forced. 
sched: adjust exec_clock to use it as cpu usage metric sched: do not call cpuacct_charge when cpu and cpuacct are comounted cpuacct: do not gather cpuacct statistics when not mounted sched: add cpusets to comounts list include/linux/cgroup.h | 6 ++ init/Kconfig | 23 ++++++++ kernel/cgroup.c | 29 +++++++++- kernel/cpuset.c | 4 ++ kernel/sched/core.c | 149 +++++++++++++++++++++++++++++++++++++++++++++---- kernel/sched/rt.c | 1 + kernel/sched/sched.h | 20 ++++++- 7 files changed, 220 insertions(+), 12 deletions(-) -- 1.7.11.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [RFC 1/5] cgroup: allow some comounts to be forced. Date: Tue, 4 Sep 2012 18:18:16 +0400 Message-ID: <1346768300-10282-2-git-send-email-glommer@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> Return-path: In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org, Glauber Costa One of the pain points we have today with cgroups, is the excessive flexibility coming from the fact that controllers can be mounted at will, without any relationship with each other. Although this is nice in principle, this comes with a cost that is not always welcome in practice. The very fact of this being possible is already enough to trigger those costs. We cannot assume a common hierarchy between controllers, and then hierarchy walks have to be done more than once. This happens in hotpaths as well. 
This patch introduces a Kconfig option, default n, that will force some controllers to be comounted. After some time, we may be able to deprecate this mode of operation. Signed-off-by: Glauber Costa CC: Dave Jones CC: Ben Hutchings CC: Peter Zijlstra CC: Paul Turner CC: Lennart Poettering CC: Kay Sievers CC: Tejun Heo --- include/linux/cgroup.h | 6 ++++++ init/Kconfig | 4 ++++ kernel/cgroup.c | 29 ++++++++++++++++++++++++++++- 3 files changed, 38 insertions(+), 1 deletion(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index d3f5fba..f986ad1 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -531,6 +531,12 @@ struct cgroup_subsys { /* should be defined only by modular subsystems */ struct module *module; + +#ifdef CONFIG_CGROUP_FORCE_COMOUNT + /* List of groups that we must be comounted with */ + int comounts; + int must_comount[3]; +#endif }; #define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys; diff --git a/init/Kconfig b/init/Kconfig index f64f888..d7d693d 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -680,6 +680,10 @@ config CGROUP_CPUACCT Provides a simple Resource Controller for monitoring the total CPU consumed by the tasks in a cgroup. +config CGROUP_FORCE_COMOUNT + bool + default n + config RESOURCE_COUNTERS bool "Resource counters" help diff --git a/kernel/cgroup.c b/kernel/cgroup.c index b303dfc..137ac62 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1058,6 +1058,33 @@ static int rebind_subsystems(struct cgroupfs_root *root, if (root->number_of_cgroups > 1) return -EBUSY; +#ifdef CONFIG_CGROUP_FORCE_COMOUNT + /* + * Some subsystems should not be allowed to be freely mounted in + * separate hierarchies. They may not be present, but if they are, they + * should be together. For compatibility with older kernels, we'll allow + * this to live inside a separate Kconfig option. Each subsys will be + * able to tell us which other subsys it expects to be mounted with. 
+ * + * We do a separate path for this, to avoid unwinding our modifications + * in case of an error. + */ + for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { + unsigned long bit = 1UL << i; + int j; + + if (!(bit & added_bits)) + continue; + + for (j = 0; j < subsys[i]->comounts; j++) { + int comount_id = subsys[i]->must_comount[j]; + struct cgroup_subsys *ss = subsys[comount_id]; + if ((ss->root != &rootnode) && (ss->root != root)) + return -EINVAL; + } + } +#endif + /* Process each subsystem */ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { struct cgroup_subsys *ss = subsys[i]; @@ -1634,7 +1661,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, goto unlock_drop; ret = rebind_subsystems(root, root->subsys_bits); - if (ret == -EBUSY) { + if ((ret == -EBUSY) || (ret == -EINVAL)) { free_cg_links(&tmp_cg_links); goto unlock_drop; } -- 1.7.11.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [RFC 2/5] sched: adjust exec_clock to use it as cpu usage metric Date: Tue, 4 Sep 2012 18:18:17 +0400 Message-ID: <1346768300-10282-3-git-send-email-glommer@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> Return-path: In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org, Glauber Costa exec_clock already provides per-group cpu usage metrics, and can be reused by cpuacct in case cpu and cpuacct are comounted. However, it is only provided by tasks in fair class. 
Doing the same for rt is easy, and can be done in an already existing hierarchy loop. This is an improvement over the independent hierarchy walk executed by cpuacct. Signed-off-by: Glauber Costa CC: Dave Jones CC: Ben Hutchings CC: Peter Zijlstra CC: Paul Turner CC: Lennart Poettering CC: Kay Sievers CC: Tejun Heo --- kernel/sched/rt.c | 1 + kernel/sched/sched.h | 3 +++ 2 files changed, 4 insertions(+) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 573e1ca..40ef6af 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -930,6 +930,7 @@ static void update_curr_rt(struct rq *rq) for_each_sched_rt_entity(rt_se) { rt_rq = rt_rq_of_se(rt_se); + schedstat_add(rt_rq, exec_clock, delta_exec); if (sched_rt_runtime(rt_rq) != RUNTIME_INF) { raw_spin_lock(&rt_rq->rt_runtime_lock); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 55844f2..8da579d 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -204,6 +204,7 @@ struct cfs_rq { unsigned int nr_running, h_nr_running; u64 exec_clock; + u64 prev_exec_clock; u64 min_vruntime; #ifndef CONFIG_64BIT u64 min_vruntime_copy; @@ -295,6 +296,8 @@ struct rt_rq { struct plist_head pushable_tasks; #endif int rt_throttled; + u64 exec_clock; + u64 prev_exec_clock; u64 rt_time; u64 rt_runtime; /* Nests inside the rq lock: */ -- 1.7.11.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [RFC 5/5] sched: add cpusets to comounts list Date: Tue, 4 Sep 2012 18:18:20 +0400 Message-ID: <1346768300-10282-6-git-send-email-glommer@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> Return-path: In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org, Glauber Costa Although we have not yet identified any place where cpusets could be improved performance-wise by guaranteeing comounts with the other two cpu cgroups, it is a sane choice to mount them together. We can preemptively benefit from it and avoid a growing mess, by guaranteeing that subsystems that mostly constrain the same kind of resource will live together. With cgroups it is never that simple, and things cross boundaries quite often. But I hope this can be seen as a potential improvement. 
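For illustration only, the comount rule that this list feeds into can be modelled in user space. The block below is a minimal sketch, not kernel code: toy_subsys, NO_ROOT, toy_init() and toy_can_bind() are hypothetical names, hierarchies are plain integers, and the table is assumed to look the way it would once cpu, cpuacct and cpuset all name each other in must_comount[].

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of the comount rule; every name here is hypothetical. */
#define NO_ROOT      (-1)   /* stands in for "still on the dummy rootnode" */
#define MAX_COMOUNTS 3

enum { TOY_CPU, TOY_CPUACCT, TOY_CPUSET, TOY_MEMORY, TOY_COUNT };

struct toy_subsys {
	int root;                        /* hierarchy id, or NO_ROOT if unmounted */
	int comounts;                    /* entries used in must_comount[] */
	int must_comount[MAX_COMOUNTS];  /* peers that must share our hierarchy */
};

/* Fill the table as assumed above: the three cpu controllers are tied together. */
static void toy_init(struct toy_subsys *ss)
{
	for (int i = 0; i < TOY_COUNT; i++) {
		ss[i].root = NO_ROOT;
		ss[i].comounts = 0;
	}
	ss[TOY_CPU].comounts = 2;
	ss[TOY_CPU].must_comount[0] = TOY_CPUACCT;
	ss[TOY_CPU].must_comount[1] = TOY_CPUSET;
	ss[TOY_CPUACCT].comounts = 2;
	ss[TOY_CPUACCT].must_comount[0] = TOY_CPU;
	ss[TOY_CPUACCT].must_comount[1] = TOY_CPUSET;
	ss[TOY_CPUSET].comounts = 2;
	ss[TOY_CPUSET].must_comount[0] = TOY_CPU;
	ss[TOY_CPUSET].must_comount[1] = TOY_CPUACCT;
}

/*
 * Returns 1 if every subsystem selected in added_bits may bind to new_root,
 * 0 otherwise. A peer is acceptable when it is unmounted or already lives
 * on the target hierarchy; a peer mounted anywhere else vetoes the bind.
 */
static int toy_can_bind(const struct toy_subsys *ss, unsigned long added_bits,
			int new_root)
{
	for (int i = 0; i < TOY_COUNT; i++) {
		if (!(added_bits & (1UL << i)))
			continue;
		for (int j = 0; j < ss[i].comounts; j++) {
			const struct toy_subsys *peer = &ss[ss[i].must_comount[j]];

			if (peer->root != NO_ROOT && peer->root != new_root)
				return 0;
		}
	}
	return 1;
}
```

In this model, once cpu and cpuacct sit on hierarchy 1, binding cpuset to hierarchy 2 is refused while binding it to hierarchy 1 is allowed, and a controller with an empty list (TOY_MEMORY here) is never restricted.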
Signed-off-by: Glauber Costa CC: Dave Jones CC: Ben Hutchings CC: Peter Zijlstra CC: Paul Turner CC: Lennart Poettering CC: Kay Sievers CC: Tejun Heo --- kernel/cpuset.c | 4 ++++ kernel/sched/core.c | 8 ++++---- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/kernel/cpuset.c b/kernel/cpuset.c index 8c8bd65..f8e1c49 100644 --- a/kernel/cpuset.c +++ b/kernel/cpuset.c @@ -1879,6 +1879,10 @@ struct cgroup_subsys cpuset_subsys = { .post_clone = cpuset_post_clone, .subsys_id = cpuset_subsys_id, .base_cftypes = files, +#ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU + .comounts = 2, + .must_comount = { cpu_cgroup_subsys_id, cpuacct_subsys_id, }, +#endif .early_init = 1, }; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index d654bd1..aeff02c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -8301,8 +8301,8 @@ struct cgroup_subsys cpu_cgroup_subsys = { .subsys_id = cpu_cgroup_subsys_id, .base_cftypes = cpu_files, #ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU - .comounts = 1, - .must_comount = { cpuacct_subsys_id, }, + .comounts = 2, + .must_comount = { cpuacct_subsys_id, cpuset_subsys_id, }, .bind = cpu_cgroup_bind, #endif .early_init = 1, @@ -8637,8 +8637,8 @@ struct cgroup_subsys cpuacct_subsys = { .base_cftypes = files, .bind = cpuacct_bind, #ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU - .comounts = 1, - .must_comount = { cpu_cgroup_subsys_id, }, + .comounts = 2, + .must_comount = { cpu_cgroup_subsys_id, cpuset_subsys_id, }, #endif }; #endif /* CONFIG_CGROUP_CPUACCT */ -- 1.7.11.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [RFC 3/5] sched: do not call cpuacct_charge when cpu and cpuacct are comounted Date: Tue, 4 Sep 2012 18:18:18 +0400 Message-ID: <1346768300-10282-4-git-send-email-glommer@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> Return-path: In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org, Glauber Costa cpuacct_charge() incurs some quite expensive operations to achieve its measurement goal. To make matters worse, this cost is not constant, but grows with the depth of the cgroup hierarchy tree. Also, all this data is already available anyway in the scheduler core. The fact that the cpuacct cgroup cannot be guaranteed to be mounted in the same hierarchy as the scheduler core cgroup (cpu), forces us to go gather them all again. With the introduction of CONFIG_CGROUP_FORCE_COMOUNT_CPU, we will be able to be absolutely sure that such a coupling exists. After that, the hierarchy walks can be completely abandoned. 
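The trade-off can be modelled in user space. What follows is a rough sketch under stated assumptions, not the kernel implementation: toy_group and all toy_*() helpers are hypothetical, and the scheduler is assumed to keep a per-group exec_clock up to date in a walk it performs anyway. The separate accounting walk touches one group per hierarchy level, while the comounted case only reads the difference between exec_clock and a snapshot taken at the last reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model only; all names are hypothetical. */
struct toy_group {
	struct toy_group *parent;
	uint64_t cpuusage;         /* maintained by the separate accounting walk */
	uint64_t exec_clock;       /* assumed to be maintained by the scheduler */
	uint64_t prev_exec_clock;  /* snapshot taken when usage is reset */
};

/* Separate accounting walk: one update per level. Returns levels touched. */
static int toy_charge(struct toy_group *g, uint64_t delta)
{
	int steps = 0;

	for (; g; g = g->parent) {
		g->cpuusage += delta;
		steps++;
	}
	return steps;
}

/* Stand-in for the walk the scheduler performs regardless of accounting. */
static void toy_sched_account(struct toy_group *g, uint64_t delta)
{
	for (; g; g = g->parent)
		g->exec_clock += delta;
}

/* Comounted case: usage is derived from the scheduler's clock, no extra walk. */
static uint64_t toy_usage_from_sched(const struct toy_group *g)
{
	return g->exec_clock - g->prev_exec_clock;
}

/* Resetting usage only moves the snapshot; exec_clock itself keeps counting. */
static void toy_usage_reset(struct toy_group *g)
{
	g->prev_exec_clock = g->exec_clock;
}
```

With a three-level chain root <- mid <- leaf, toy_charge(&leaf, d) touches three groups for every charge, whereas toy_usage_from_sched() yields the same per-group totals from data that is already there.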
Signed-off-by: Glauber Costa CC: Dave Jones CC: Ben Hutchings CC: Peter Zijlstra CC: Paul Turner CC: Lennart Poettering CC: Kay Sievers CC: Tejun Heo --- init/Kconfig | 19 +++++++ kernel/sched/core.c | 141 +++++++++++++++++++++++++++++++++++++++++++++++---- kernel/sched/sched.h | 14 ++++- 3 files changed, 163 insertions(+), 11 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index d7d693d..694944e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -684,6 +684,25 @@ config CGROUP_FORCE_COMOUNT bool default n +config CGROUP_FORCE_COMOUNT_CPU + bool "Enforce single hierarchy for the cpu related cgroups" + depends on CGROUP_SCHED || CPUSETS || CGROUP_CPUACCT + select SCHEDSTATS + select CGROUP_FORCE_COMOUNT + default n + help + Throughout cgroup's life, it was always possible to mount the + controllers in completely independent hierarchies. However, the + costs incurred by allowing are considerably big. Hotpaths in the + scheduler needs to call expensive hierarchy walks more than once in + the same place just to account for the fact that multiple controllers + can be mounted in different places. + + Setting this option will disallow cpu, cpuacct and cpuset to be + mounted in different hierarchies. Distributions are highly encouraged + to set this option and comount those groups. 
+ + config RESOURCE_COUNTERS bool "Resource counters" help diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 468bdd4..e46871d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -8282,6 +8282,15 @@ static struct cftype cpu_files[] = { { } /* terminate */ }; +bool cpuacct_from_cpu; + +#ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU +void cpu_cgroup_bind(struct cgroup *root) +{ + cpuacct_from_cpu = root->root == root_task_group.css.cgroup->root; +} +#endif + struct cgroup_subsys cpu_cgroup_subsys = { .name = "cpu", .create = cpu_cgroup_create, @@ -8291,6 +8300,11 @@ struct cgroup_subsys cpu_cgroup_subsys = { .exit = cpu_cgroup_exit, .subsys_id = cpu_cgroup_subsys_id, .base_cftypes = cpu_files, +#ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU + .comounts = 1, + .must_comount = { cpuacct_subsys_id, }, + .bind = cpu_cgroup_bind, +#endif .early_init = 1, }; @@ -8345,8 +8359,102 @@ static void cpuacct_destroy(struct cgroup *cgrp) kfree(ca); } -static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu) +#ifdef CONFIG_CGROUP_SCHED +#ifdef CONFIG_FAIR_GROUP_SCHED +static struct cfs_rq * +cpu_cgroup_cfs_rq(struct cgroup *cgrp, int cpu) +{ + struct task_group *tg = cgroup_tg(cgrp); + + if (tg == &root_task_group) + return &cpu_rq(cpu)->cfs; + + return tg->cfs_rq[cpu]; +} + +static void cpu_cgroup_update_cpuusage_cfs(struct cgroup *cgrp, int cpu) +{ + struct cfs_rq *cfs = cpu_cgroup_cfs_rq(cgrp, cpu); + cfs->prev_exec_clock = cfs->exec_clock; +} +static u64 cpu_cgroup_cpuusage_cfs(struct cgroup *cgrp, int cpu) +{ + struct cfs_rq *cfs = cpu_cgroup_cfs_rq(cgrp, cpu); + return cfs->exec_clock - cfs->prev_exec_clock; +} +#else +static void cpu_cgroup_update_cpuusage_cfs(struct cgroup *cgrp, int cpu) { +} + +static u64 cpu_cgroup_cpuusage_cfs(struct cgroup *cgrp, int cpu) +{ + return 0; +} +#endif + +#ifdef CONFIG_RT_GROUP_SCHED +static struct rt_rq * +cpu_cgroup_rt_rq(struct cgroup *cgrp, int cpu) +{ + struct task_group *tg = cgroup_tg(cgrp); + if (tg == &root_task_group) + 
return &cpu_rq(cpu)->rt; + + return tg->rt_rq[cpu]; + +} +static void cpu_cgroup_update_cpuusage_rt(struct cgroup *cgrp, int cpu) +{ + struct rt_rq *rt = cpu_cgroup_rt_rq(cgrp, cpu); + rt->prev_exec_clock = rt->exec_clock; +} + +static u64 cpu_cgroup_cpuusage_rt(struct cgroup *cgrp, int cpu) +{ + struct rt_rq *rt = cpu_cgroup_rt_rq(cgrp, cpu); + return rt->exec_clock - rt->prev_exec_clock; +} +#else +static void cpu_cgroup_update_cpuusage_rt(struct cgroup *cgrp, int cpu) +{ +} +static u64 cpu_cgroup_cpuusage_rt(struct cgroup *cgrp, int cpu) +{ + return 0; +} +#endif + +static int cpu_cgroup_cpuusage_write(struct cgroup *cgrp, int cpu, u64 val) +{ + cpu_cgroup_update_cpuusage_cfs(cgrp, cpu); + cpu_cgroup_update_cpuusage_rt(cgrp, cpu); + return 0; +} + +static u64 cpu_cgroup_cpuusage_read(struct cgroup *cgrp, int cpu) +{ + return cpu_cgroup_cpuusage_cfs(cgrp, cpu) + + cpu_cgroup_cpuusage_rt(cgrp, cpu); +} + +#else +static u64 cpu_cgroup_cpuusage_read(struct cgroup *cgrp, int i) +{ + BUG(); + return 0; +} + +static int cpu_cgroup_cpuusage_write(struct cgroup *cgrp, int cpu, u64 val) +{ + BUG(); + return 0; +} +#endif /* CONFIG_CGROUP_SCHED */ + +static u64 cpuacct_cpuusage_read(struct cgroup *cgrp, int cpu) +{ + struct cpuacct *ca = cgroup_ca(cgrp); u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); u64 data; @@ -8364,8 +8472,9 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu) return data; } -static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) +static void cpuacct_cpuusage_write(struct cgroup *cgrp, int cpu, u64 val) { + struct cpuacct *ca = cgroup_ca(cgrp); u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); #ifndef CONFIG_64BIT @@ -8380,15 +8489,21 @@ static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) #endif } +static u64 cpuusage_read_percpu(struct cgroup *cgrp, int cpu) +{ + if (cpuacct_from_cpu) + return cpu_cgroup_cpuusage_read(cgrp, cpu); + return cpuacct_cpuusage_read(cgrp, cpu); +} + /* return total cpu usage 
(in nanoseconds) of a group */ static u64 cpuusage_read(struct cgroup *cgrp, struct cftype *cft) { - struct cpuacct *ca = cgroup_ca(cgrp); u64 totalcpuusage = 0; int i; for_each_present_cpu(i) - totalcpuusage += cpuacct_cpuusage_read(ca, i); + totalcpuusage += cpuusage_read_percpu(cgrp, i); return totalcpuusage; } @@ -8396,7 +8511,6 @@ static u64 cpuusage_read(struct cgroup *cgrp, struct cftype *cft) static int cpuusage_write(struct cgroup *cgrp, struct cftype *cftype, u64 reset) { - struct cpuacct *ca = cgroup_ca(cgrp); int err = 0; int i; @@ -8405,8 +8519,12 @@ static int cpuusage_write(struct cgroup *cgrp, struct cftype *cftype, goto out; } - for_each_present_cpu(i) - cpuacct_cpuusage_write(ca, i, 0); + for_each_present_cpu(i) { + if (cpuacct_from_cpu) + cpu_cgroup_cpuusage_write(cgrp, i, 0); + else + cpuacct_cpuusage_write(cgrp, i, 0); + } out: return err; @@ -8415,12 +8533,11 @@ out: static int cpuacct_percpu_seq_read(struct cgroup *cgroup, struct cftype *cft, struct seq_file *m) { - struct cpuacct *ca = cgroup_ca(cgroup); u64 percpu; int i; for_each_present_cpu(i) { - percpu = cpuacct_cpuusage_read(ca, i); + percpu = cpuusage_read_percpu(cgroup, i); seq_printf(m, "%llu ", (unsigned long long) percpu); } seq_printf(m, "\n"); @@ -8483,7 +8600,7 @@ static struct cftype files[] = { * * called with rq->lock held. 
*/ -void cpuacct_charge(struct task_struct *tsk, u64 cputime) +void __cpuacct_charge(struct task_struct *tsk, u64 cputime) { struct cpuacct *ca; int cpu; @@ -8511,5 +8628,9 @@ struct cgroup_subsys cpuacct_subsys = { .destroy = cpuacct_destroy, .subsys_id = cpuacct_subsys_id, .base_cftypes = files, +#ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU + .comounts = 1, + .must_comount = { cpu_cgroup_subsys_id, }, +#endif }; #endif /* CONFIG_CGROUP_CPUACCT */ diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 8da579d..1da9fa8 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -885,6 +885,9 @@ extern void update_idle_cpu_load(struct rq *this_rq); #ifdef CONFIG_CGROUP_CPUACCT #include + +extern bool cpuacct_from_cpu; + /* track cpu usage of a group of tasks and its child groups */ struct cpuacct { struct cgroup_subsys_state css; @@ -914,7 +917,16 @@ static inline struct cpuacct *parent_ca(struct cpuacct *ca) return cgroup_ca(ca->css.cgroup->parent); } -extern void cpuacct_charge(struct task_struct *tsk, u64 cputime); +extern void __cpuacct_charge(struct task_struct *tsk, u64 cputime); + +static inline void cpuacct_charge(struct task_struct *tsk, u64 cputime) +{ +#ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU + if (likely(!cpuacct_from_cpu)) + return; +#endif + __cpuacct_charge(tsk, cputime); +} #else static inline void cpuacct_charge(struct task_struct *tsk, u64 cputime) {} #endif -- 1.7.11.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: [RFC 4/5] cpuacct: do not gather cpuacct statistics when not mounted Date: Tue, 4 Sep 2012 18:18:19 +0400 Message-ID: <1346768300-10282-5-git-send-email-glommer@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> Return-path: In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: linux-kernel@vger.kernel.org Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org, Glauber Costa Currently, the only test that prevents us from running the expensive cpuacct_charge() is cpuacct_subsys.active == true. This will hold at all times after the subsystem is activated, even if it is not mounted. IOW, use it or not, you pay it. By hooking with the bind() callback, we can detect when cpuacct is mounted or umounted, and stop collecting statistics when this cgroup is not in use. Signed-off-by: Glauber Costa CC: Dave Jones CC: Ben Hutchings CC: Peter Zijlstra CC: Paul Turner CC: Lennart Poettering CC: Kay Sievers CC: Tejun Heo --- kernel/sched/core.c | 8 ++++++++ kernel/sched/sched.h | 3 +++ 2 files changed, 11 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e46871d..d654bd1 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -8595,6 +8595,13 @@ static struct cftype files[] = { { } /* terminate */ }; +bool cpuacct_mounted; + +void cpuacct_bind(struct cgroup *root) +{ + cpuacct_mounted = root->root == root_cpuacct.css.cgroup->root; +} + /* * charge this task's execution time to its accounting group. 
* @@ -8628,6 +8635,7 @@ struct cgroup_subsys cpuacct_subsys = { .destroy = cpuacct_destroy, .subsys_id = cpuacct_subsys_id, .base_cftypes = files, + .bind = cpuacct_bind, #ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU .comounts = 1, .must_comount = { cpu_cgroup_subsys_id, }, diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 1da9fa8..d33f777 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -887,6 +887,7 @@ extern void update_idle_cpu_load(struct rq *this_rq); #include extern bool cpuacct_from_cpu; +extern bool cpuacct_mounted; /* track cpu usage of a group of tasks and its child groups */ struct cpuacct { @@ -921,6 +922,8 @@ extern void __cpuacct_charge(struct task_struct *tsk, u64 cputime); static inline void cpuacct_charge(struct task_struct *tsk, u64 cputime) { + if (unlikely(!cpuacct_mounted)) + return; #ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU if (likely(!cpuacct_from_cpu)) return; -- 1.7.11.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Date: Tue, 4 Sep 2012 14:46:02 -0700 Message-ID: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <1346768300-10282-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org Hello, Glauber. On Tue, Sep 04, 2012 at 06:18:15PM +0400, Glauber Costa wrote: > As we have been extensively discussing, the cost and pain points for cgroups > come from many places. But at least one of those is the arbitrary nature of > hierarchies. Many people, including at least Tejun and me would like this to go > away altogether. 
Problem so far, is breaking compatibility with existing setups. > > I am proposing here a default-n Kconfig option that will guarantee that the cpu > cgroups (for now) will be comounted. I started with them because the > cpu/cpuacct division is clearly the worst offender. Also, the default-n is here > so distributions will have time to adapt: Forcing this flag to be on without > userspace changes will just lead to cgroups failing to mount, which we don't > want. > > Although I've tested it and it works, I haven't compile-tested all possible > config combinations. So this is mostly for your eyes. If this gets traction, > I'll submit it properly, along with any changes that you might require. As I said during the discussion, I'm skeptical about how useful this is. This can't nudge existing users in any meaningfully gradual way. Kconfig doesn't make it any better. It's still an abrupt behavior change when seen from userland. Also, I really don't see much point in enforcing this almost arbitrary grouping of controllers. It doesn't simplify anything and using cpuacct in more granular way than cpu actually is one of the better justified use of multiple hierarchies. Also, what about memcg and blkcg? Do they *really* coincide? Note that both blkcg and memcg involve non-trivial overhead and blkcg is essentially broken hierarchy-wise. Currently, from userland visible behavior POV, the crazy parts are 1. The flat hierarchy thing. This just should go away. 2. Orthogonal multiple hierarchies. I think we agree that #1 should go away one way or the other. I *really* wanna get rid of #2 but am not sure how. I'll give it another stab once the writeback thing is resolved. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Date: Wed, 5 Sep 2012 12:03:25 +0400 Message-ID: <5047074D.1030104@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120904214602.GA9092-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org On 09/05/2012 01:46 AM, Tejun Heo wrote: > Hello, Glauber. > > On Tue, Sep 04, 2012 at 06:18:15PM +0400, Glauber Costa wrote: >> As we have been extensively discussing, the cost and pain points for cgroups >> come from many places. But at least one of those is the arbitrary nature of >> hierarchies. Many people, including at least Tejun and me would like this to go >> away altogether. Problem so far, is breaking compatibility with existing setups. >> >> I am proposing here a default-n Kconfig option that will guarantee that the cpu >> cgroups (for now) will be comounted. I started with them because the >> cpu/cpuacct division is clearly the worst offender. Also, the default-n is here >> so distributions will have time to adapt: Forcing this flag to be on without >> userspace changes will just lead to cgroups failing to mount, which we don't >> want. >> >> Although I've tested it and it works, I haven't compile-tested all possible >> config combinations. So this is mostly for your eyes. 
If this gets traction, >> I'll submit it properly, along with any changes that you might require. > > As I said during the discussion, I'm skeptical about how useful this > is. This can't nudge existing users in any meaningfully gradual way. > Kconfig doesn't make it any better. It's still an abrupt behavior > change when seen from userland. > The goal here is to have distributions to do it, because they tend to have a well defined lifecycle management, much more than upstream. Whoever sets this option, can coordinate with upstream. Aside from enforcing it, we can pretty much warn() as well, to direct people towards flipping the switch. > Also, I really don't see much point in enforcing this almost arbitrary > grouping of controllers. It doesn't simplify anything and using > cpuacct in more granular way than cpu actually is one of the better > justified use of multiple hierarchies. Also, what about memcg and > blkcg? Do they *really* coincide? Note that both blkcg and memcg > involve non-trivial overhead and blkcg is essentially broken > hierarchy-wise. > Where did I mention memcg or blkcg in this patch ? From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Date: Wed, 5 Sep 2012 01:14:39 -0700 Message-ID: <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> Mime-Version: 1.0 Return-path: Content-Disposition: inline In-Reply-To: <5047074D.1030104@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Hello, Glauber. On Wed, Sep 05, 2012 at 12:03:25PM +0400, Glauber Costa wrote: > The goal here is to have distributions to do it, because they tend to > have a well defined lifecycle management, much more than upstream. Whoever > sets this option, can coordinate with upstream. Distros can just co-mount them during boot. What's the point of the config options? > > Also, I really don't see much point in enforcing this almost arbitrary > > grouping of controllers. It doesn't simplify anything and using > > cpuacct in more granular way than cpu actually is one of the better > > justified use of multiple hierarchies. Also, what about memcg and > > blkcg? 
Do they *really* coincide? Note that both blkcg and memcg > > involve non-trivial overhead and blkcg is essentially broken > > hierarchy-wise. > > Where did I mention memcg or blkcg in this patch ? Differing hierarchies in memcg and blkcg currently is the most prominent case where the intersection in writeback is problematic and your proposed solution doesn't help one way or the other. What's the point? Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Wed, 5 Sep 2012 12:17:11 +0400 Message-ID: <50470A87.1040701@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On 09/05/2012 12:14 PM, Tejun Heo wrote: > Hello, Glauber. > > On Wed, Sep 05, 2012 at 12:03:25PM +0400, Glauber Costa wrote: >> The goal here is to have distributions to do it, because they tend to >> have a well defined lifecycle management, much more than upstream. Whoever >> sets this option, can coordinate with upstream. > > Distros can just co-mount them during boot. What's the point of the > config options? > Pretty simple. The kernel can't assume the distro did.
And then we still need to pay a stupid big price in the scheduler. After this patchset, we can assume this. And cpuusage can totally be derived from the cpu cgroup. Because much more than "they can comount", we can assume they did. >>> Also, I really don't see much point in enforcing this almost arbitrary >>> grouping of controllers. It doesn't simplify anything and using >>> cpuacct in more granular way than cpu actually is one of the better >>> justified use of multiple hierarchies. Also, what about memcg and >>> blkcg? Do they *really* coincide? Note that both blkcg and memcg >>> involve non-trivial overhead and blkcg is essentially broken >>> hierarchy-wise. >> >> Where did I mention memcg or blkcg in this patch ? > > Differing hierarchies in memcg and blkcg currently is the most > prominent case where the intersection in writeback is problematic and > your proposed solution doesn't help one way or the other. What's the > point? > The point is that I am focusing on one problem at a time. But FWIW, I don't see why memcg/blkcg can't use a step just like this one in a separate pass. If the goal is comounting them eventually, at some point when the issues are sorted out, just do it. Get a switch like this one, and then you will start being able to assume a lot of things in the code. Miracles can happen. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups.
Date: Wed, 5 Sep 2012 01:29:47 -0700 Message-ID: <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=HHI51I3YB2Ium7hPabS2GTUS8WOkCFeam3tbnTTOwls=; b=O/15yqDUJUAbhp9nrqmkH51AN6TuXv4b6lv1PGVoEGkx8b7oNW+ObLArYQ0qqjnhDH VHVcoNoMHXgdCKruLGTy/eFqaw9bRNaspjknYfjKAXB+MWVKhmQaYXLbWTKPSwftCBT1 +DFmnagnhq0CMpF42nZLxoWHHNa0okPX71vcrBDKQQk/2bjZsSKzeeUgff8JpT/72Zgg h5AaiEnOVRbCaOojRpq6MPua9G0KR0Pb1UAVz7v7KNUJQyInsiBZGzzJWc2Nr6weSRHn XHGlbdVhBfIO0DMCxG4JbTtRlvJ99TUEAFN/oHvQW74h3GuJCPuDCr5jbCMOwBomT0i8 M1lg== Content-Disposition: inline In-Reply-To: <50470A87.1040701@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Hello, Glauber. On Wed, Sep 05, 2012 at 12:17:11PM +0400, Glauber Costa wrote: > > Distros can just co-mount them during boot. What's the point of the > > config options? > > Pretty simple. The kernel can't assume the distro did. And then we still > need to pay a stupid big price in the scheduler. > > After this patchset, We can assume this. And cpuusage can totally be > derived from the cpu cgroup. Because much more than "they can comount", > we can assume they did. 
As long as cpuacct and cpu are separate, I think it makes sense to assume that they at least could be at different granularity. As for optimization for co-mounted case, if that is *really* necessary, couldn't it be done dynamically? It's not like CONFIG_XXX blocks are pretty things and they're worse for runtime code path coverage. > > Differing hierarchies in memcg and blkcg currently is the most > > prominent case where the intersection in writeback is problematic and > > your proposed solution doesn't help one way or the other. What's the > > point? > > The point is that I am focusing at one problem at a time. But FWIW, I > don't see why memcg/blkcg can't use a step just like this one in a > separate pass. > > If the goal is comounting them eventually, at some point when the issues > are sorted out, just do it. Get a switch like this one, and then you > will start being able to assume a lot of things in the code. Miracles > can happen. The problem is that I really don't see how this leads to where we eventually wanna be. Orthogonal hierarchies are bad because, * It complicates the code. This doesn't really help there much. * Intersections between controllers are cumbersome to handle. Again, this doesn't help much. And this restricts the only valid use case for multiple hierarchies which is applying differing level of granularity depending on controllers. So, I don't know. Doesn't seem like a good idea to me. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [RFC 0/5] forced comounts for cgroups.
Date: Wed, 5 Sep 2012 12:35:11 +0400 Message-ID: <50470EBF.9070109@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120905082947.GD3195-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org On 09/05/2012 12:29 PM, Tejun Heo wrote: > Hello, Glauber. > > On Wed, Sep 05, 2012 at 12:17:11PM +0400, Glauber Costa wrote: >>> Distros can just co-mount them during boot. What's the point of the >>> config options? >> >> Pretty simple. The kernel can't assume the distro did. And then we still >> need to pay a stupid big price in the scheduler. >> >> After this patchset, We can assume this. And cpuusage can totally be >> derived from the cpu cgroup. Because much more than "they can comount", >> we can assume they did. > > As long as cpuacct and cpu are separate, I think it makes sense to > assume that they at least could be at different granularity. If they are comounted, and more: forceably comounted, I don't see how to call them separate. 
At the very best, they are this way for compatibility purposes only, to lay a path that would allow us to get rid of the separation eventually. > As for > optimization for co-mounted case, if that is *really* necessary, > couldn't it be done dynamically? It's not like CONFIG_XXX blocks are > pretty things and they're worse for runtime code path coverage. > I've done it dynamically, as you know. But if you think that complicated the code less than this, we're operating by very different standards... CONFIG options can make the code uglier, but they are a lot more predictable. They also guarantee that no state changes will happen during the lifecycle of the machine. Doing it dynamically makes the code prettier, but still large, and prone to subtle bugs, as we've already seen in practice. >>> Differing hierarchies in memcg and blkcg currently is the most >>> prominent case where the intersection in writeback is problematic and >>> your proposed solution doesn't help one way or the other. What's the >>> point? >> >> The point is that I am focusing at one problem at a time. But FWIW, I >> don't see why memcg/blkcg can't use a step just like this one in a >> separate pass. >> >> If the goal is comounting them eventually, at some point when the issues >> are sorted out, just do it. Get a switch like this one, and then you >> will start being able to assume a lot of things in the code. Miracles >> can happen. > > The problem is that I really don't see how this leads to where we > eventually wanna be. Orthogonal hierarchies are bad because, > > * It complicates the code. This doesn't really help there much. > The way I see it, it is the price we pay for having screwed up before. And Kconfig options don't necessarily complicate the code. They make it bigger, and possibly slightly harder to follow. But I myself > * Intersections between controllers are cumbersome to handle. Again, > this doesn't help much. > They are only cumbersome because we can't assume anything.
The cpuacct is the perfect example. Once we can start assuming, they become a lot less so. > And this restricts the only valid use case for multiple hierarchies > which is applying differing level of granularity depending on > controllers. So, I don't know. Doesn't seem like a good idea to me. > > Thanks. > From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Wed, 5 Sep 2012 01:47:40 -0700 Message-ID: <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=hevLXdrLyIHQ05osFzD82K9027QEaxHaaomIeyHxQxE=; b=r8HNs12+k3id8ZeKanGnb8wRuBMa+AXQ/K5fu83R9EIddG+UpkjNEPt6BkFozi/omA 6NeicW/5P/Xm71UDq4EMod1IGsoJlGhJHdizkISCejtd3CiwFBN4ZwU+3MuLq6+tJJOx RDDoiQaeQk8MN17WSbAScHR95MNKMckjc2FahaNugTYVXzhNk0FohtWk54cQbZXlsZaJ dXXOoxSi8huLz91ZI0dR/HCRffVWbFzz5xFqBxviG+uLtZIzN3b87aSBqSRYexStC9Xd MLipMxN72eZWvRXUw6MtcL+b9a9cS/ptRVvZkiq2Mh6HZXbR/Z1pgIcAbDSQxJ+iN1Cl fWXA== Content-Disposition: inline In-Reply-To: <50470EBF.9070109@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Hello, Glauber. 
On Wed, Sep 05, 2012 at 12:35:11PM +0400, Glauber Costa wrote: > > As long as cpuacct and cpu are separate, I think it makes sense to > > assume that they at least could be at different granularity. > > If they are comounted, and more: forceably comounted, I don't see how to > call them separate. At the very best, they are this way for > compatibility purposes only, to lay a path that would allow us to get > rid of the separation eventually. I think this is where we disagree. I didn't mean that all controllers should be using exactly the same hierarchy when I was talking about unified hierarchy. I do think it's useful and maybe even essential to allow differing levels of granularity. cpu and cpuacct could be a valid example for this. Likely blkcg and memcg too. So, I think it's desirable for all controllers to be able to handle hierarchies the same way and to have the ability to tag something as belonging to certain group in the hierarchy for all controllers but I don't think it's desirable or feasible to require all of them to follow exactly the same grouping at all levels. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [RFC 0/5] forced comounts for cgroups.
Date: Wed, 5 Sep 2012 12:55:21 +0400 Message-ID: <50471379.3060603@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120905084740.GE3195-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org On 09/05/2012 12:47 PM, Tejun Heo wrote: > Hello, Glauber. > > On Wed, Sep 05, 2012 at 12:35:11PM +0400, Glauber Costa wrote: >>> As long as cpuacct and cpu are separate, I think it makes sense to >>> assume that they at least could be at different granularity. >> >> If they are comounted, and more: forceably comounted, I don't see how to >> call them separate. At the very best, they are this way for >> compatibility purposes only, to lay a path that would allow us to get >> rid of the separation eventually. > > I think this is where we disagree. I didn't mean that all controllers > should be using exactly the same hierarchy when I was talking about > unified hierarchy. 
I do think it's useful and maybe even essential to > allow differing levels of granularity. cpu and cpuacct could be a > valid example for this. Likely blkcg and memcg too. > > So, I think it's desirable for all controllers to be able to handle > hierarchies the same way and to have the ability to tag something as > belonging to certain group in the hierarchy for all controllers but I > don't think it's desirable or feasible to require all of them to > follow exactly the same grouping at all levels. > By "different levels of granularity" do you mean having just a subset of them turned on at a particular place? If yes, having them guaranteed to be comounted is still perceived by me as a good first step. A natural following would be to turn them on/off on a per-group basis. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Peter Zijlstra Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Wed, 05 Sep 2012 11:06:33 +0200 Message-ID: <1346835993.2600.9.camel@twins> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On Wed, 2012-09-05 at 01:47 -0700, Tejun Heo wrote: > I think this is where we disagree. 
I didn't mean that all controllers > should be using exactly the same hierarchy when I was talking about > unified hierarchy. I do think it's useful and maybe even essential to > allow differing levels of granularity. cpu and cpuacct could be a > valid example for this. Likely blkcg and memcg too. > > So, I think it's desirable for all controllers to be able to handle > hierarchies the same way and to have the ability to tag something as > belonging to certain group in the hierarchy for all controllers but I > don't think it's desirable or feasible to require all of them to > follow exactly the same grouping at all levels. *confused* I always thought that was exactly what you meant with unified hierarchy. Doing all this runtime is just going to make the mess even bigger, because now we have to deal with even more stupid cases. So either we go and try to contain this mess as proposed by Glauber or we go delete controllers.. I've had it with this crap. --- Documentation/cgroups/00-INDEX | 2 - Documentation/cgroups/cpuacct.txt | 49 -------- include/linux/cgroup_subsys.h | 6 - init/Kconfig | 6 - kernel/sched/core.c | 247 -------------------------------------- kernel/sched/fair.c | 1 - kernel/sched/rt.c | 1 - kernel/sched/sched.h | 45 ------- kernel/sched/stop_task.c | 1 - 9 files changed, 358 deletions(-) diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX index 3f58fa3..9f100cc 100644 --- a/Documentation/cgroups/00-INDEX +++ b/Documentation/cgroups/00-INDEX @@ -2,8 +2,6 @@ - this file cgroups.txt - Control Groups definition, implementation details, examples and API. -cpuacct.txt - - CPU Accounting Controller; account CPU usage for groups of tasks. cpusets.txt - documents the cpusets feature; assign CPUs and Mem to a set of tasks.
devices.txt diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt deleted file mode 100644 index 9d73cc0..0000000 --- a/Documentation/cgroups/cpuacct.txt +++ /dev/null @@ -1,49 +0,0 @@ -CPU Accounting Controller -------------------------- - -The CPU accounting controller is used to group tasks using cgroups and -account the CPU usage of these groups of tasks. - -The CPU accounting controller supports multi-hierarchy groups. An accounting -group accumulates the CPU usage of all of its child groups and the tasks -directly present in its group. - -Accounting groups can be created by first mounting the cgroup filesystem. - -# mount -t cgroup -ocpuacct none /sys/fs/cgroup - -With the above step, the initial or the parent accounting group becomes -visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in -the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup. -/sys/fs/cgroup/cpuacct.usage gives the CPU time (in nanoseconds) obtained -by this group which is essentially the CPU time obtained by all the tasks -in the system. - -New accounting groups can be created under the parent group /sys/fs/cgroup. - -# cd /sys/fs/cgroup -# mkdir g1 -# echo $$ > g1/tasks - -The above steps create a new group g1 and move the current shell -process (bash) into it. CPU time consumed by this bash and its children -can be obtained from g1/cpuacct.usage and the same is accumulated in -/sys/fs/cgroup/cpuacct.usage also. - -cpuacct.stat file lists a few statistics which further divide the -CPU time obtained by the cgroup into user and system times. Currently -the following statistics are supported: - -user: Time spent by tasks of the cgroup in user mode. -system: Time spent by tasks of the cgroup in kernel mode. - -user and system are in USER_HZ unit. - -cpuacct controller uses percpu_counter interface to collect user and -system times.
This has two side effects: - -- It is theoretically possible to see wrong values for user and system times. - This is because percpu_counter_read() on 32bit systems isn't safe - against concurrent writes. -- It is possible to see slightly outdated values for user and system times - due to the batch processing nature of percpu_counter. diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h index dfae957..73b7cc1 100644 --- a/include/linux/cgroup_subsys.h +++ b/include/linux/cgroup_subsys.h @@ -25,12 +25,6 @@ SUBSYS(cpu_cgroup) /* */ -#ifdef CONFIG_CGROUP_CPUACCT -SUBSYS(cpuacct) -#endif - -/* */ - #ifdef CONFIG_MEMCG SUBSYS(mem_cgroup) #endif diff --git a/init/Kconfig b/init/Kconfig index af6c7f8..3ac9e1c 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -674,12 +674,6 @@ config PROC_PID_CPUSET depends on CPUSETS default y -config CGROUP_CPUACCT - bool "Simple CPU accounting cgroup subsystem" - help - Provides a simple Resource Controller for monitoring the - total CPU consumed by the tasks in a cgroup. - config RESOURCE_COUNTERS bool "Resource counters" help diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 4376c9f..47c7cdb 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2809,18 +2809,9 @@ unsigned long long task_sched_runtime(struct task_struct *p) return ns; } -#ifdef CONFIG_CGROUP_CPUACCT -struct cgroup_subsys cpuacct_subsys; -struct cpuacct root_cpuacct; -#endif - static inline void task_group_account_field(struct task_struct *p, int index, u64 tmp) { -#ifdef CONFIG_CGROUP_CPUACCT - struct kernel_cpustat *kcpustat; - struct cpuacct *ca; -#endif /* * Since all updates are sure to touch the root cgroup, we * get ourselves ahead and touch it first.
If the root cgroup @@ -2828,20 +2819,6 @@ static inline void task_group_account_field(struct task_struct *p, int index, * */ __get_cpu_var(kernel_cpustat).cpustat[index] += tmp; - -#ifdef CONFIG_CGROUP_CPUACCT - if (unlikely(!cpuacct_subsys.active)) - return; - - rcu_read_lock(); - ca = task_ca(p); - while (ca && (ca != &root_cpuacct)) { - kcpustat = this_cpu_ptr(ca->cpustat); - kcpustat->cpustat[index] += tmp; - ca = parent_ca(ca); - } - rcu_read_unlock(); -#endif } @@ -7351,12 +7328,6 @@ void __init sched_init(void) #endif /* CONFIG_CGROUP_SCHED */ -#ifdef CONFIG_CGROUP_CPUACCT - root_cpuacct.cpustat = &kernel_cpustat; - root_cpuacct.cpuusage = alloc_percpu(u64); - /* Too early, not expected to fail */ - BUG_ON(!root_cpuacct.cpuusage); -#endif for_each_possible_cpu(i) { struct rq *rq; @@ -8409,221 +8380,3 @@ struct cgroup_subsys cpu_cgroup_subsys = { }; #endif /* CONFIG_CGROUP_SCHED */ - -#ifdef CONFIG_CGROUP_CPUACCT - -/* - * CPU accounting code for task groups. - * - * Based on the work by Paul Menage (menage@google.com) and Balbir Singh - * (balbir@in.ibm.com).
- */ - -/* create a new cpu accounting group */ -static struct cgroup_subsys_state *cpuacct_create(struct cgroup *cgrp) -{ - struct cpuacct *ca; - - if (!cgrp->parent) - return &root_cpuacct.css; - - ca = kzalloc(sizeof(*ca), GFP_KERNEL); - if (!ca) - goto out; - - ca->cpuusage = alloc_percpu(u64); - if (!ca->cpuusage) - goto out_free_ca; - - ca->cpustat = alloc_percpu(struct kernel_cpustat); - if (!ca->cpustat) - goto out_free_cpuusage; - - return &ca->css; - -out_free_cpuusage: - free_percpu(ca->cpuusage); -out_free_ca: - kfree(ca); -out: - return ERR_PTR(-ENOMEM); -} - -/* destroy an existing cpu accounting group */ -static void cpuacct_destroy(struct cgroup *cgrp) -{ - struct cpuacct *ca = cgroup_ca(cgrp); - - free_percpu(ca->cpustat); - free_percpu(ca->cpuusage); - kfree(ca); -} - -static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu) -{ - u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - u64 data; - -#ifndef CONFIG_64BIT - /* - * Take rq->lock to make 64-bit read safe on 32-bit platforms. - */ - raw_spin_lock_irq(&cpu_rq(cpu)->lock); - data = *cpuusage; - raw_spin_unlock_irq(&cpu_rq(cpu)->lock); -#else - data = *cpuusage; -#endif - - return data; -} - -static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) -{ - u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - -#ifndef CONFIG_64BIT - /* - * Take rq->lock to make 64-bit write safe on 32-bit platforms.
- */ - raw_spin_lock_irq(&cpu_rq(cpu)->lock); - *cpuusage = val; - raw_spin_unlock_irq(&cpu_rq(cpu)->lock); -#else - *cpuusage = val; -#endif -} - -/* return total cpu usage (in nanoseconds) of a group */ -static u64 cpuusage_read(struct cgroup *cgrp, struct cftype *cft) -{ - struct cpuacct *ca = cgroup_ca(cgrp); - u64 totalcpuusage = 0; - int i; - - for_each_present_cpu(i) - totalcpuusage += cpuacct_cpuusage_read(ca, i); - - return totalcpuusage; -} - -static int cpuusage_write(struct cgroup *cgrp, struct cftype *cftype, - u64 reset) -{ - struct cpuacct *ca = cgroup_ca(cgrp); - int err = 0; - int i; - - if (reset) { - err = -EINVAL; - goto out; - } - - for_each_present_cpu(i) - cpuacct_cpuusage_write(ca, i, 0); - -out: - return err; -} - -static int cpuacct_percpu_seq_read(struct cgroup *cgroup, struct cftype *cft, - struct seq_file *m) -{ - struct cpuacct *ca = cgroup_ca(cgroup); - u64 percpu; - int i; - - for_each_present_cpu(i) { - percpu = cpuacct_cpuusage_read(ca, i); - seq_printf(m, "%llu ", (unsigned long long) percpu); - } - seq_printf(m, "\n"); - return 0; -} - -static const char *cpuacct_stat_desc[] = { - [CPUACCT_STAT_USER] = "user", - [CPUACCT_STAT_SYSTEM] = "system", -}; - -static int cpuacct_stats_show(struct cgroup *cgrp, struct cftype *cft, - struct cgroup_map_cb *cb) -{ - struct cpuacct *ca = cgroup_ca(cgrp); - int cpu; - s64 val = 0; - - for_each_online_cpu(cpu) { - struct kernel_cpustat *kcpustat = per_cpu_ptr(ca->cpustat, cpu); - val += kcpustat->cpustat[CPUTIME_USER]; - val += kcpustat->cpustat[CPUTIME_NICE]; - } - val = cputime64_to_clock_t(val); - cb->fill(cb, cpuacct_stat_desc[CPUACCT_STAT_USER], val); - - val = 0; - for_each_online_cpu(cpu) { - struct kernel_cpustat *kcpustat = per_cpu_ptr(ca->cpustat, cpu); - val += kcpustat->cpustat[CPUTIME_SYSTEM]; - val += kcpustat->cpustat[CPUTIME_IRQ]; - val += kcpustat->cpustat[CPUTIME_SOFTIRQ]; - } - - val = cputime64_to_clock_t(val); -
cb->fill(cb, cpuacct_stat_desc[CPUACCT_STAT_SYSTEM], val); - - return 0; -} - -static struct cftype files[] = { - { - .name = "usage", - .read_u64 = cpuusage_read, - .write_u64 = cpuusage_write, - }, - { - .name = "usage_percpu", - .read_seq_string = cpuacct_percpu_seq_read, - }, - { - .name = "stat", - .read_map = cpuacct_stats_show, - }, - { } /* terminate */ -}; - -/* - * charge this task's execution time to its accounting group. - * - * called with rq->lock held. - */ -void cpuacct_charge(struct task_struct *tsk, u64 cputime) -{ - struct cpuacct *ca; - int cpu; - - if (unlikely(!cpuacct_subsys.active)) - return; - - cpu = task_cpu(tsk); - - rcu_read_lock(); - - ca = task_ca(tsk); - - for (; ca; ca = parent_ca(ca)) { - u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - *cpuusage += cputime; - } - - rcu_read_unlock(); -} - -struct cgroup_subsys cpuacct_subsys = { - .name = "cpuacct", - .create = cpuacct_create, - .destroy = cpuacct_destroy, - .subsys_id = cpuacct_subsys_id, - .base_cftypes = files, -}; -#endif /* CONFIG_CGROUP_CPUACCT */ diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 01d3eda..bff5b6e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -706,7 +706,6 @@ static void update_curr(struct cfs_rq *cfs_rq) struct task_struct *curtask = task_of(curr); trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime); - cpuacct_charge(curtask, delta_exec); account_group_exec_runtime(curtask, delta_exec); } diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 944cb68..8e5805e 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -934,7 +934,6 @@ static void update_curr_rt(struct rq *rq) account_group_exec_runtime(curr, delta_exec); curr->se.exec_start = rq->clock_task; - cpuacct_charge(curr, delta_exec); sched_rt_avg_update(rq, delta_exec); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index f6714d0..00ca3f6 100644 --- a/kernel/sched/sched.h +++
b/kernel/sched/sched.h @@ -833,15 +833,6 @@ static const u32 prio_to_wmult[40] = { /* 15 */ 119304647, 148102320, 186737708, 238609294, 286331153, }; -/* Time spent by the tasks of the cpu accounting group executing in ... */ -enum cpuacct_stat_index { - CPUACCT_STAT_USER, /* ... user mode */ - CPUACCT_STAT_SYSTEM, /* ... kernel mode */ - - CPUACCT_STAT_NSTATS, -}; - - #define sched_class_highest (&stop_sched_class) #define for_each_class(class) \ for (class = sched_class_highest; class; class = class->next) @@ -881,42 +872,6 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime extern void update_idle_cpu_load(struct rq *this_rq); -#ifdef CONFIG_CGROUP_CPUACCT -#include -/* track cpu usage of a group of tasks and its child groups */ -struct cpuacct { - struct cgroup_subsys_state css; - /* cpuusage holds pointer to a u64-type object on every cpu */ - u64 __percpu *cpuusage; - struct kernel_cpustat __percpu *cpustat; -}; - -/* return cpu accounting group corresponding to this container */ -static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp) -{ - return container_of(cgroup_subsys_state(cgrp, cpuacct_subsys_id), - struct cpuacct, css); -} - -/* return cpu accounting group to which this task belongs */ -static inline struct cpuacct *task_ca(struct task_struct *tsk) -{ - return container_of(task_subsys_state(tsk, cpuacct_subsys_id), - struct cpuacct, css); -} - -static inline struct cpuacct *parent_ca(struct cpuacct *ca) -{ - if (!ca || !ca->css.cgroup->parent) - return NULL; - return cgroup_ca(ca->css.cgroup->parent); -} - -extern void cpuacct_charge(struct task_struct *tsk, u64 cputime); -#else -static inline void cpuacct_charge(struct task_struct *tsk, u64 cputime) {} -#endif - static inline void inc_nr_running(struct rq *rq) { rq->nr_running++; diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c index da5eb5b..fda1cbe 100644 --- a/kernel/sched/stop_task.c +++ b/kernel/sched/stop_task.c @@ -68,7
+68,6 @@ static void put_prev_task_stop(struct rq *rq, struct task= _struct *prev) account_group_exec_runtime(curr, delta_exec); =20 curr->se.exec_start =3D rq->clock_task; - cpuacct_charge(curr, delta_exec); } =20 static void task_tick_stop(struct rq *rq, struct task_struct *curr, int qu= eued) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Peter Zijlstra Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Wed, 05 Sep 2012 11:07:21 +0200 Message-ID: <1346836041.2600.10.camel@twins> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <1346835993.2600.9.camel@twins> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On Wed, 2012-09-05 at 11:06 +0200, Peter Zijlstra wrote: >=20 > So either we go and try to contain this mess as proposed by Glauber or > we go delete controllers.. I've had it with this crap. >=20 >=20 Glauber, the other approach is sending a patch that doesn't touch cgroup.c but only the controllers and I'll merge it regardless of what tj thinks. We need some movement here. 
-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Wed, 5 Sep 2012 02:07:44 -0700 Message-ID: <20120905090744.GG3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471379.3060603@parallels.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=xYfY0AIurviK0E647+YvgapuXsG00je1RRDu9G2mJCU=; b=ymTST9+7TdKNZQYcztFkEGaMAeji56SXfGEu9SqrQXNsTClr9fAK950hHwxy9rPCQN vhvTie/lrMytjWr7cohnxRqMR5vocQGRMFWAmiXsprpNVpxPaOT+S8+2CaQJJL85V9pl kIUSUH695RRQrgriGRqdk1P3RFOykVR4aUPE92HL6zmlbb2lp9Cmmo5DFpdDquTEvw4A WgkpypVsCKmPADOYYQ5hlUf/HpvtGspK5Mvr2QWxSNgDDjyzM0UZO8WLQrnxSgo/UJkN 0BIGwNexu7HdP+PXEVRPXTl6QfxQIKbOBGks8kbG4wakBzlcIVOMDTlbZ5sp/Jm7GsHx 0pnA== Content-Disposition: inline In-Reply-To: <50471379.3060603@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Hello, Glauber. 
On Wed, Sep 05, 2012 at 12:55:21PM +0400, Glauber Costa wrote:
> > So, I think it's desirable for all controllers to be able to handle
> > hierarchies the same way and to have the ability to tag something as
> > belonging to certain group in the hierarchy for all controllers but I
> > don't think it's desirable or feasible to require all of them to
> > follow exactly the same grouping at all levels.
>
> By "different levels of granularity" do you mean having just a subset of
> them turned on at a particular place?

Heh, this is tricky to describe and I'm not really following what you
mean. They're all on the same tree but a controller should be able to
handle a given subtree as single group. e.g. if you draw the tree,
different controllers should be able to draw different enclosing
circles and operate on the simplified tree. How flexible that should
be, I don't know. Maybe it would be enough to be able to say "treat
all children of this node as belonging to this node for controllers X
and Y".

> If yes, having them guaranteed to be comounted is still perceived by me
> as a good first step. A natural following would be to turn them on/off
> on a per-group basis.

I don't agree with that. If we do it that way, we would lose
differing granularity from forcing co-mounting and then restore it
later when the subtree handling is implemented. If we can do away
with differing granularity, that's fine; otherwise, it doesn't make
much sense to remove and then restore it.

Thanks.

--
tejun

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Glauber Costa
Subject: Re: [RFC 0/5] forced comounts for cgroups.
Date: Wed, 5 Sep 2012 13:06:39 +0400 Message-ID: <5047161F.60503@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471379.3060603@parallels.com> <20120905090744.GG3195@dhcp-172-17-108-109.mtv.corp.google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120905090744.GG3195-RcKxWJ4Cfj1J2suj2OqeGauc2jM2gXBXkQQo+JxHRPFibQn6LdNjmg@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org On 09/05/2012 01:07 PM, Tejun Heo wrote: > Hello, Glauber. > > On Wed, Sep 05, 2012 at 12:55:21PM +0400, Glauber Costa wrote: >>> So, I think it's desirable for all controllers to be able to handle >>> hierarchies the same way and to have the ability to tag something as >>> belonging to certain group in the hierarchy for all controllers but I >>> don't think it's desirable or feasible to require all of them to >>> follow exactly the same grouping at all levels. >> >> By "different levels of granularity" do you mean having just a subset of >> them turned on at a particular place? 
> > Heh, this is tricky to describe and I'm not really following what you > mean. Do we really want to start cleaning up all this by changing the interface to something that is described as "tricky" ? From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Wed, 5 Sep 2012 02:11:40 -0700 Message-ID: <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=pZ62LlHGOIiPW4wMhy5G+KVp/i3omi6EnvyF6ogOTNU=; b=g6tcryjJ3RRfZ+ARQwT5jqZ2oUGs/gP91NS+NTzwJESdORJlDPd4XHnCr9QMZ4qd93 RHX2FfmrX7OHo5MHPb1/3mQRGoIpJ7CAs3Ajo3CApDdjwVxyKTaGjxu+pbkfZKoFsq+U 6YIFELf89ADcMTedTtjb5izf4I908LO7Zma1yRI6+uYEBtdhcw/T9Xfl+vJyfetwlKlr hITupbFKONm4mEJnjcny1WoIbEoVQ4y8zuGoYAMcbyqNnr9tfvlI70ATWC22GruuC1u6 lR8hA2i/bUkCVGPcA7zy0UH8k4QRXkIhjMJkPXgXnstV1LnPIYEB6g82mn+z7CcOjOkG nQZQ== Content-Disposition: inline In-Reply-To: <1346835993.2600.9.camel@twins> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Peter Zijlstra Cc: Glauber Costa , linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, 
ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org

Hello, Peter.

On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote:
> *confused* I always thought that was exactly what you meant with unified
> hierarchy.

No, I never counted out differing granularity.

> Doing all this runtime is just going to make the mess even bigger,
> because now we have to deal with even more stupid cases.
>
> So either we go and try to contain this mess as proposed by Glauber or
> we go delete controllers.. I've had it with this crap.

If cpuacct can really go away, that's great, but I don't think the
problem at hand is unsolvable, so let's not jump it. cpuacct and cpu
aren't the only problem cases after all. We need to solve it for
other controllers too.

Thanks.

--
tejun

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tejun Heo
Subject: Re: [RFC 0/5] forced comounts for cgroups.
Date: Wed, 5 Sep 2012 02:14:56 -0700
Message-ID: <20120905091456.GI3195@dhcp-172-17-108-109.mtv.corp.google.com>
References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471379.3060603@parallels.com> <20120905090744.GG3195@dhcp-172-17-108-109.mtv.corp.google.com> <5047161F.60503@parallels.com>
Mime-Version: 1.0
Return-path:
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=6SWLZ13zYnJxvrM4JhUPij+DGMhIiz8tMIFylO+XhDU=; b=KVCP5X0MQ4Wss94tGk/WD3ipkZFkJXVBjQw+zN+dHTAaGiLtEdojClHyc/92Vo0W8u
N299dX0LKbmsrO4P2dqsov0aLOBPhHKJ0uyXG1XaG4kZLfmDi1OqaxrdNOHSkyaZpewU pMuUs6FD0GGGSJMIitPrf7QEu36psAdudNAbG2XSOEn7qcX3GXTJUPUIRGJcMKPW6QIe 6mL09KVjWmjvUi15rKYJ1Us0fYjh6Ua+SU03mlyi+BTTjOEAYnJjDnvP6v3GEJXurz5J BrqZyT13sNiyJGFRPKktbQLnCIVP5oMY4eRWUOxqIbiuciIXvDmAmx/2z/u1SPoVosB+ 7OrQ== Content-Disposition: inline In-Reply-To: <5047161F.60503-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org On Wed, Sep 05, 2012 at 01:06:39PM +0400, Glauber Costa wrote: > > Heh, this is tricky to describe and I'm not really following what you > > mean. > > Do we really want to start cleaning up all this by changing the > interface to something that is described as "tricky" ? The concept is not tricky. I just can't find the appropriate words. I *suspect* this can mostly re-use the existing css_set thing. It mostly becomes that css_set belongs to the unified hierarchy rather than each task. The user interface part isn't trivial and maybe "don't nest beyond this level" is the only thing reasonable. Not sure yet whether that would be enough tho. Need to think more about it. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Date: Wed, 5 Sep 2012 13:12:34 +0400
Message-ID: <50471782.6060800@parallels.com>
References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com>
Sender: owner-linux-mm@kvack.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
To: Tejun Heo
Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org

On 09/05/2012 01:11 PM, Tejun Heo wrote:
> Hello, Peter.
>
> On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote:
>> *confused* I always thought that was exactly what you meant with unified
>> hierarchy.
>
> No, I never counted out differing granularity.
>

Can you elaborate on which interface do you envision to make it work?
They will clearly be mounted in the same hierarchy, or as said
alternatively, comounted.

If you can turn them on/off on a per-subtree basis, which interface
exactly do you propose for that?

Would a pair of cgroup core files like available_controllers and
current_controllers, as a lot of drivers do, suffice?

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tejun Heo
Subject: Re: [RFC 0/5] forced comounts for cgroups.
Date: Wed, 5 Sep 2012 02:19:25 -0700 Message-ID: <20120905091925.GJ3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=1CHxEmUyDGD9klRbXkctZczz3cgFF8Va97PWvdS8pFM=; b=dtbbOB1YseUFthtoNLEp4L87EQPrALxkZMUVqm7lsVXtwasrpvoPs1Urp64sLmiopf DgzutNlsyTgVLEKWxcG1WriFdfoningbt89qla2cuNxzNEdiCiRX5QgYlIT0VBhqk0FU k6JYfQkP6fKYNu49M7z35Rijv9Vu3tv10y5LJ5A0W7y3ao29QD/7WIfKW5xaszjSG253 b46WV3JMKYRXB8XNgCoFifzLo+btcb0APxHVbU5oS7CmHQWQgtxbPc1d/wzZFAoZ8cZs E8J3NRH/KuhmS7rUw9XikfeS3nldYwAwa9liXcTIbKKkcTwu+ZpjLGVNr9T/A34tpRTG LISQ== Content-Disposition: inline In-Reply-To: <50471782.6060800-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: Peter Zijlstra , linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org On Wed, Sep 05, 2012 at 01:12:34PM +0400, Glauber Costa wrote: > > No, I never counted 
out differing granularity. > > Can you elaborate on which interface do you envision to make it work? > They will clearly be mounted in the same hierarchy, or as said > alternatively, comounted. I'm not sure yet. At the simplest, mask of controllers which should honor (or ignore) nesting beyond the node. That should be understandable enough. Not sure whether that would be flexible enough yet tho. In the end, they should be comounted but again I don't think enforcing comounting at the moment is a step towards that. It's more like a step sideways. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Wed, 5 Sep 2012 02:22:16 -0700 Message-ID: <20120905092216.GK3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <1346836041.2600.10.camel@twins> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=OpBbMZDBvbXui/W34c1T178X/Wfoq5tWXpVEia65tlo=; b=V1nWHVq/Q6y6rea3Tye8P5BhrSvQ9W4wJNdZ787Q1kbJg9aw8wVM3OcOGOsInyZrzc HrxtWJ/exSW2EtMWlBNXzSDF7IWh3WmFa1HI90CmdBfgWN0pfGSWJnae/z0W3O1CamBG Evi6xSUWVQtvBj11SbB/aaDetjDa45iLVrxT9qHnADZjNg5sCYfjOhbRMlEjJh/yfGRK h+WgNXA1+jl42F+Y+F6oZ4GDI7oXnKd5gf/fTVAzx5OEFDtBnCeG5g+Sb6VB7ixFv6wm pXQCLEtLUMcz7BnB0lZCsg621pw8QxQYhLiCdy7HPqtvYfZ7ULpTKX3GX/pM1tJoL0sg HkIg== Content-Disposition: inline In-Reply-To: <1346836041.2600.10.camel@twins> Sender: 
owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Peter Zijlstra Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Hey, On Wed, Sep 05, 2012 at 11:07:21AM +0200, Peter Zijlstra wrote: > Glauber, the other approach is sending a patch that doesn't touch > cgroup.c but only the controllers and I'll merge it regardless of what > tj thinks. > > We need some movement here. Peter, I don't think the proposed patch is helpful at this point. While movement is necessary, it's not like moving towards any direction is helpful. They might just become another cruft which needs to be maintained. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Peter Zijlstra Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Date: Wed, 05 Sep 2012 11:26:49 +0200
Message-ID: <1346837209.2600.14.camel@twins>
References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com>
Mime-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Return-path:
In-Reply-To: <50471782.6060800@parallels.com>
Sender: owner-linux-mm@kvack.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
To: Glauber Costa
Cc: Tejun Heo , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org

On Wed, 2012-09-05 at 13:12 +0400, Glauber Costa wrote:
> On 09/05/2012 01:11 PM, Tejun Heo wrote:
> > Hello, Peter.
> >
> > On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote:
> >> *confused* I always thought that was exactly what you meant with unified
> >> hierarchy.
> >
> > No, I never counted out differing granularity.
> >
>
> Can you elaborate on which interface do you envision to make it work?
> They will clearly be mounted in the same hierarchy, or as said
> alternatively, comounted.
>
> If you can turn them on/off on a per-subtree basis, which interface
> exactly do you propose for that?

I wouldn't, screw that. That would result in the exact same problem
we're trying to fix. I want a single hierarchy walk, that's expensive
enough.

> Would a pair of cgroup core files like available_controllers and
> current_controllers are a lot of drivers do, suffice?

No..
its not a 'feature' I care to support for 'my' controllers. I simply don't want to have to do two (or more) hierarchy walks for accounting on every schedule event, all that pointer chasing is stupidly expensive. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Wed, 5 Sep 2012 02:32:04 -0700 Message-ID: <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=uJ9y/WYlqz74FFpQlSv5tnQ+MrHWEhF+9lHDeFOwp5Q=; b=uHlW0n6KDR4dbknp6RRZlzYNod46/WpG04AfvYBQf+IiU2csQMYCU4uWfsau3AS5IN X0r1JwgZrxSOlieVlOXpn+bpis7kBM761SzSkbx90UNxqd7T/1jW05C9CXz3zkRlznJZ Xgrn4cOvlk/UkhdHhdTzRmHkVYoaMy66v1a5tAXkEmacFqAbRJYkWtz+s3OXKvZH7aeE OlSddnd2+uuGA7zL5HIUvbENJrln/zKNIaMS/LlyPVQY77ifh+WRP6Rupdj+1SqkqsCB rpBDR7KbmkLq3ozlKF/xWd3nABvDoFCS/oUHU14lipBKpXUVPkXv2ntbDcakqkmEZaQZ wjmA== Content-Disposition: inline In-Reply-To: <1346835993.2600.9.camel@twins> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Peter Zijlstra Cc: Glauber Costa , 
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org

Hey, again.

On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote:
> Doing all this runtime is just going to make the mess even bigger,
> because now we have to deal with even more stupid cases.
>
> So either we go and try to contain this mess as proposed by Glauber or
> we go delete controllers.. I've had it with this crap.

cpuacct is rather unique tho. I think it's gonna be silly whether the
hierarchy is unified or not.

1. If they always can live on the exact same hierarchy, there's no
   point in having the two separate. Just merge them.

2. If they need differing levels of granularity, they either need to
   do it completely separately as they do now or have some form of
   dynamic optimization if absolutely necessary.

So, I think that choice is rather separate from other issues. If
cpuacct is gonna be kept, I'd just keep it separate and warn that it
incurs extra overhead for the current users if for nothing else.
Otherwise, kill it or merge it into cpu.

Thanks.

--
tejun

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Glauber Costa
Subject: Re: [RFC 0/5] forced comounts for cgroups.
Date: Wed, 5 Sep 2012 13:30:23 +0400
Message-ID: <50471BAF.2060708@parallels.com>
References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <20120905091925.GJ3195@dhcp-172-17-108-109.mtv.corp.google.com>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20120905091925.GJ3195@dhcp-172-17-108-109.mtv.corp.google.com>
Sender: owner-linux-mm@kvack.org
List-ID:
Content-Type: text/plain; charset="us-ascii"
To: Tejun Heo
Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org

On 09/05/2012 01:19 PM, Tejun Heo wrote:
> On Wed, Sep 05, 2012 at 01:12:34PM +0400, Glauber Costa wrote:
>>> No, I never counted out differing granularity.
>>
>> Can you elaborate on which interface do you envision to make it work?
>> They will clearly be mounted in the same hierarchy, or as said
>> alternatively, comounted.
>
> I'm not sure yet. At the simplest, mask of controllers which should
> honor (or ignore) nesting beyond the node. That should be
> understandable enough. Not sure whether that would be flexible enough
> yet tho. In the end, they should be comounted but again I don't think
> enforcing comounting at the moment is a step towards that. It's more
> like a step sideways.
>

Tejun,

>From the code PoV, guaranteed comounting is what allows us to make
optimizations. "Maybe comounting" will maybe simplify the interface, but
will buy us nothing in the performance level.
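
To make the cost argument concrete, here is a minimal userspace sketch
(not kernel code). It assumes a toy `struct node` with a parent pointer
and two counters; `charge_separate` and `charge_comounted` are
hypothetical names chosen for illustration, and the real scheduler and
cpuacct paths differ in detail. With independently mounted hierarchies
each controller walks its own parent chain, while a guaranteed comount
lets a single walk update both counters:

```c
#include <stdint.h>

/* Hypothetical toy model: one node per group, linked to its parent. */
struct node {
	struct node *parent;
	uint64_t cpu_runtime;	/* what a "cpu"-like controller would track */
	uint64_t acct_usage;	/* what a "cpuacct"-like controller would track */
};

/*
 * Separate hierarchies: the task may sit at unrelated places in the two
 * trees, so each controller has to chase its own parent pointers.
 * Returns the number of nodes visited.
 */
static unsigned int charge_separate(struct node *cpu, struct node *acct,
				    uint64_t delta)
{
	unsigned int hops = 0;

	for (; cpu; cpu = cpu->parent, hops++)
		cpu->cpu_runtime += delta;
	for (; acct; acct = acct->parent, hops++)
		acct->acct_usage += delta;
	return hops;
}

/*
 * Guaranteed comount: both controllers share one tree, so a single walk
 * can update both counters at every level.
 */
static unsigned int charge_comounted(struct node *grp, uint64_t delta)
{
	unsigned int hops = 0;

	for (; grp; grp = grp->parent, hops++) {
		grp->cpu_runtime += delta;
		grp->acct_usage += delta;
	}
	return hops;
}
```

For a task three levels deep in one tree and two levels deep in a
separate accounting tree, charge_separate() visits five nodes; with both
controllers on the same three-level tree, charge_comounted() visits
three.
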
I am more than happy to respin it with an added interface for masking cgroups, if you believe this is a requirement. But hinting me about what you would like to see on that front would be really helpful. Re-asking my question: cpufreq, clocksources, ftrace, etc, they all use an interface that at this point can be considered quite standard. Applying the same logic, each cgroup would have a pair of files: available_controllers, current_controllers, that you can just control by writing to. This can get slightly funny when we consider the right semantics for the hierarchy, but really, everything will. And it is not like we'll have anything crazy, we just need to tailor it with care. If you think there is any chance of this getting us somewhere, I'll code it. But that would be something to be sent *together* with what I've just done. As I've said, if we can't guarantee the comounting, we would still lose all the optimization opportunities. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Date: Wed, 5 Sep 2012 13:31:56 +0400 Message-ID: <50471C0C.7050600@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <1346837209.2600.14.camel@twins> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Peter Zijlstra Cc: Tejun Heo , linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org On 09/05/2012 01:26 PM, Peter Zijlstra wrote: > On Wed, 2012-09-05 at 13:12 +0400, Glauber Costa wrote: >> On 09/05/2012 01:11 PM, Tejun Heo wrote: >>> Hello, Peter. >>> >>> On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: >>>> *confused* I always thought that was exactly what you meant with unified >>>> hierarchy. >>> >>> No, I never counted out differing granularity. >>> >> >> Can you elaborate on which interface do you envision to make it work? >> They will clearly be mounted in the same hierarchy, or as said >> alternatively, comounted. >> >> If you can turn them on/off on a per-subtree basis, which interface >> exactly do you propose for that? 
> > I wouldn't, screw that. That would result in the exact same problem > we're trying to fix. I want a single hierarchy walk, that's expensive > enough. > >> Would a pair of cgroup core files like available_controllers and >> current_controllers are a lot of drivers do, suffice? > > No.. its not a 'feature' I care to support for 'my' controllers. > > I simply don't want to have to do two (or more) hierarchy walks for > accounting on every schedule event, all that pointer chasing is stupidly > expensive. > You wouldn't have to do more than one hierarchy walks for that. What Tejun seems to want, is the ability to not have a particular controller at some point in the tree. But if they exist, they are always together. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Wed, 5 Sep 2012 02:45:20 -0700 Message-ID: <20120905094520.GM3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=NOXQWR2ZAljSZJAUzeMtlvYH4JRMUDl8zfnylhDA+Sw=; b=pKc9Mdmr7YMDxXqao+VVHEqtjttcZ6sjijdCR1Kt/AmPLmXrwjbpT9KCbZVW7ZkQ+I 42nBrUQ8/DAtezvJTfZ5X4KMIW226u6Jk3zDmHrCMWlA+X3AK9FcoYaWdHsy1J6gnNRU Owglc2FaRtlDYPyQ0hifAKnFgAcc1tQLAnCPkFfqY2KEJlTRkHgdZH6WYj+puGscuWkP DsfjQeGXSbx7PnCsbhLh2d562HGOBTFXvu2MaVep7ocfxUCbPuNgoL9fYO4g1jrMjTV8 
frI1ma0dSIzWHCYrXGbOioER1nOipj3PHlAYQDUPUZGYD1O5vOcutgUQ4InObWumDQQm kBQw== Content-Disposition: inline In-Reply-To: <50471C0C.7050600@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Hello, On Wed, Sep 05, 2012 at 01:31:56PM +0400, Glauber Costa wrote: > > I simply don't want to have to do two (or more) hierarchy walks for > > accounting on every schedule event, all that pointer chasing is stupidly > > expensive. > > You wouldn't have to do more than one hierarchy walks for that. What > Tejun seems to want, is the ability to not have a particular controller > at some point in the tree. But if they exist, they are always together. Nope, as I wrote in the other reply, for cpu and cpuacct, either just merge them or kill cpuacct if you want to avoid silliness from walking multiple times. Does cpuset cause problem in this regard too? Or can it be handled similarly to other controllers? I think the confusion here is that we're talking about two different issues. As for cpuacct, I can see why strict co-mounting can be attractive but then again if that's gonna be required, there's no point in having them separate, right? If that's the way you want it, just trigger WARN_ON() if cpu and cpuacct aren't co-mounted and later on kill cpuacct. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Date: Wed, 5 Sep 2012 13:48:30 +0400 Message-ID: <50471FEE.8060408@parallels.com> References: <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> <20120905094520.GM3195@dhcp-172-17-108-109.mtv.corp.google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120905094520.GM3195@dhcp-172-17-108-109.mtv.corp.google.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On 09/05/2012 01:45 PM, Tejun Heo wrote: > Hello, > > On Wed, Sep 05, 2012 at 01:31:56PM +0400, Glauber Costa wrote: >>> > > I simply don't want to have to do two (or more) hierarchy walks for >>> > > accounting on every schedule event, all that pointer chasing is stupidly >>> > > expensive. >> > >> > You wouldn't have to do more than one hierarchy walks for that. What >> > Tejun seems to want, is the ability to not have a particular controller >> > at some point in the tree. But if they exist, they are always together. > Nope, as I wrote in the other reply, Would you mind, then, stopping for a moment and telling us what it is, then, that you envision? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Wed, 5 Sep 2012 02:56:06 -0700 Message-ID: <20120905095606.GN3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> <20120905094520.GM3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471FEE.8060408@parallels.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=2TSOsKhRWAGGUDwoAQJnsZ3uLJ3dbvTdWvYCOvZsOg4=; b=tDRgl5IXQ3/AutYUQ3KjD7aiIwdb16tF2PbDLJpG3Jl35n8EnRSnpWotBpH2W1ui+9 VF9O+1WNFDBIvBwCJQ6HcKyuIIZLmI3QLpTWr8QjhpLdTBrppLiL1CkdRajIVzYpPNbp m7Fcv2ORiX7Obp+a6/HBXmOQcMKMeXcaLgf6AYKgYZwgERyMeHouejy8mP/MWaCPSt+1 CVqZ4IN0Fykd94kMNwTIwSHDA2Av3HoOGoShV7e0M7iFMLuzowKqIhcSiVaPMqfWazZf GwLDRsT7clNCEcnyw4bo5XsGizpPyzBsB4/uTsiEzRwnVEB/7TPzZs5h+j/W8YEpSxRi 4Njg== Content-Disposition: inline In-Reply-To: <50471FEE.8060408@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On Wed, Sep 05, 2012 at 01:48:30PM +0400, Glauber Costa wrote: > > Nope, as I wrote in the other reply, > > Would you mind, then, stopping for a moment and telling us what it is, > then, that you envision? 
I thought I already explained it a couple times in this thread (also in the big thread from several months ago). It's nearing three in the morning here. I'll try to explain it better tomorrow. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Peter Zijlstra Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Wed, 05 Sep 2012 12:04:47 +0200 Message-ID: <1346839487.2600.24.camel@twins> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On Wed, 2012-09-05 at 02:32 -0700, Tejun Heo wrote: > Hey, again. > > On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: > > Doing all this runtime is just going to make the mess even bigger, > > because now we have to deal with even more stupid cases. > > > > So either we go and try to contain this mess as proposed by Glauber or > > we go delete controllers.. I've had it with this crap.
> > cpuacct is rather unique tho. I think it's gonna be silly whether the > hierarchy is unified or not. > > 1. If they always can live on the exact same hierarchy, there's no > point in having the two separate. Just merge them. > > 2. If they need differing levels of granularity, they either need to > do it completely separately as they do now or have some form of > dynamic optimization if absolutely necesary. > > So, I think that choice is rather separate from other issues. If > cpuacct is gonna be kept, I'd just keep it separate and warn that it > incurs extra overhead for the current users if for nothing else. > Otherwise, kill it or merge it into cpu. Quite, hence my 'proposal' to remove cpuacct. There was some whining last time Glauber proposed this, but the one whining never convinced and has gone away from Linux, so lets just do this. Lets make cpuacct print a deprecated msg to dmesg for a few releases and make cpu do all this. The co-mounting stuff would have been nice for cpusets as well, knowing all your tasks are affine to a subset of cpus allows for a few optimizations (smaller cpumask iterations), but I guess we'll have to do that dynamically, we'll just have to see how ugly that is. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Peter Zijlstra Subject: Re: [RFC 0/5] forced comounts for cgroups.
Date: Wed, 05 Sep 2012 12:20:53 +0200 Message-ID: <1346840453.2461.6.camel@laptop> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <50471C0C.7050600-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Glauber Costa Cc: Tejun Heo , linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org On Wed, 2012-09-05 at 13:31 +0400, Glauber Costa wrote: > > You wouldn't have to do more than one hierarchy walks for that. What > Tejun seems to want, is the ability to not have a particular controller > at some point in the tree. But if they exist, they are always together. Right, but the accounting is very much tied to the control structures. I suppose we could change that, but my jet-lag-addled brain isn't seeing anything particularly nice atm.
But I don't really see the point though, this kind of interface would only ever work for the non-controlling and controlling controller combination (confused yet ;-), and I don't think we have many of those. I would really rather see a simplification of the entire cgroup interface space as opposed to making it more complex. And adding this subtree 'feature' only makes it more complex. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Thu, 6 Sep 2012 13:38:39 -0700 Message-ID: <20120906203839.GM29092@google.com> References: <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> <1346840453.2461.6.camel@laptop> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=g82pX2aIRII4m851L40HmXzRFx+UqRKK/MKwmY1iDHU=; b=l4nXmtYasIAovsbSABRgxBXPxXp8q5FgCZTthCeTfYG4ihuhTwGnXHjGLgZXKJmykD AvHLXBb92qDfGMnS2DD1wWvQktBefR+AxNwwL513freljAP/mptu8UBlT5gRWCOlwF+6 IL/1j9b/exG7gD307upW4a6xYzfSP5eb7PLBXxGr+4nc9FL0yqGS7hZtSrWhVKoTsunl rOe2TcnpYjjcBnGxccdU1XKf//m6m7bojcM+F2tzkffpa7L4gtDppfzAXfkAVGRdLgu9 iRvBf9tS8IS5sv+gxeG+kv2qxTmcZForO2cejpyLoa5Q+NAdRrMQGrDZiObKkMzY3/Cd 1oKQ== Content-Disposition: inline In-Reply-To: <1346840453.2461.6.camel@laptop> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Peter Zijlstra Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, 
pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Hello, Peter, Glauber. (I'm gonna write up cgroup core todos which should explain / address this issue too. ATM I'm a bit overwhelmed with stuff accumulated while traveling.) On Wed, Sep 05, 2012 at 12:20:53PM +0200, Peter Zijlstra wrote: > But I don't really see the point though, this kind of interface would > only ever work for the non-controlling and controlling controller > combination (confused yet ;-), and I don't think we have many of those. It's more than that. One may not want to apply the same level of granularity to different resources. e.g. depending on the setup, IOs may need to be further categorized and controlled than memory or vice versa. > I would really rather see a simplification of the entire cgroup > interface space as opposed to making it more complex. And adding this > subtree 'feature' only makes it more complex. It does in the meantime but I think most of it can piggyback on the existing css_set mechanism. No matter what we do, this isn't gonna be a short and easy transition. More than half of the controllers don't even support proper hierarchy yet. We can't move to any kind of unified hierarchy without getting that settled first. I *think* I have a plan which can mostly work now. I'll write more later. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Date: Thu, 6 Sep 2012 13:46:42 -0700 Message-ID: <20120906204642.GN29092@google.com> References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346839487.2600.24.camel@twins> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=SnKDjsc57H+KEoGDIYH1vgCyPSzX4lqgS6GVZ5jddWs=; b=Jn8dvmmuBcClsHUISmyuY8ICvkv6EQq4y3xOZOcZhdz+cMkd4uS5AFgZa3b/qfKhCo SDfF0doEpFxQOD7hS/wHyDcQFXvTRnZfpISEkOmNuYYwOKbKwceV7PsgGlA8Re0ueaQ6 t8hL3gyPNOQZ/QQNhTllLDH9YWX/8wZArqDx7hj17VTpp+M97FXbDIiqkwTXQLgTtskG C15QisuZv6Ijlf0TD+A3rNaEiPA641/eSN0jp8NJ7vYpP9orolsX9Iw5xGirUsSqT+Wy 6dvUZy9RbJ+b9gsf1aMGtOSymu8RHArAJemzy9wlcQMKV8unn41cW5W9QW2AQxU7UJAe BHAg== Content-Disposition: inline In-Reply-To: <1346839487.2600.24.camel@twins> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Peter Zijlstra Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, Dhaval Giani , Frederic Weisbecker Hello, cc'ing Dhaval and Frederic. They were interested in the subject before and Dhaval was pretty vocal about cpuacct having a separate hierarchy (or at least granularity). On Wed, Sep 05, 2012 at 12:04:47PM +0200, Peter Zijlstra wrote: > > cpuacct is rather unique tho. 
I think it's gonna be silly whether the > > hierarchy is unified or not. > > > > 1. If they always can live on the exact same hierarchy, there's no > > point in having the two separate. Just merge them. > > > > 2. If they need differing levels of granularity, they either need to > > do it completely separately as they do now or have some form of > > dynamic optimization if absolutely necesary. > > > > So, I think that choice is rather separate from other issues. If > > cpuacct is gonna be kept, I'd just keep it separate and warn that it > > incurs extra overhead for the current users if for nothing else. > > Otherwise, kill it or merge it into cpu. > > Quite, hence my 'proposal' to remove cpuacct. > > There was some whining last time Glauber proposed this, but the one > whining never convinced and has gone away from Linux, so lets just do > this. > > Lets make cpuacct print a deprecated msg to dmesg for a few releases and > make cpu do all this. I like it. Currently cpuacct is the only problematic one in this regard (cpuset to a much lesser extent) and it would be great to make it go away. Dhaval, Frederic, Paul, if you guys object, please voice your opinions. > The co-mounting stuff would have been nice for cpusets as well, knowing > all your tasks are affine to a subset of cpus allows for a few > optimizations (smaller cpumask iterations), but I guess we'll have to do > that dynamically, we'll just have to see how ugly that is. Forced co-mounting sounds rather silly to me. If the two are always gonna be co-mounted, why not just merge them and switch the functionality depending on configuration? I'm fairly sure the code would be simpler that way. If cpuset and cpu being separate is important enough && the overhead of doing things separately for cpuset isn't too high, I wouldn't bother too much with dynamic optimization but that's your call. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. 
For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Paul Turner Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Thu, 6 Sep 2012 14:11:00 -0700 Message-ID: References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346839487.2600.24.camel@twins> <20120906204642.GN29092@google.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:x-system-of-record; bh=5BB9PFFym5pK29EsDdJN/jkH583vb0Byk3zeGTeGm0U=; b=Wqyg3fO6EGBog3WLd+f40ppJw9yWeg+TXIsYoMZBLsvJV6qKSJJ3yijLwdWjAuUQKg usA5DGcQh0Gp7zx2xqxCliY/ULOwdxubzZsGoDyFC0hX3y9or+qr1nJkeDyiGnWQSUmP YBbrUJ6demadOP3pSep2Xxq1040FfnbOurrJWSVe1fxHGPjUW0rAkUCn58LacotpHPVL EFmDMc3q5eYn3CfO7Nurx1M1bqTlGKGWAdb0+lFAMzw8147K00JAOfsG7Pgku+wuqPYw Zr8sfE5mrrXQn+ELjFCT6PyuDdq+E76YAVbCRAaTtdY+uYF8bo3JF2M15HqcxsJ6ZPk6 sxew== In-Reply-To: <20120906204642.GN29092@google.com> Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Tejun Heo Cc: Peter Zijlstra , Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, lennart@poettering.net, kay.sievers@vrfy.org, Dhaval Giani , Frederic Weisbecker On Thu, Sep 6, 2012 at 1:46 PM, Tejun Heo wrote: > Hello, > > cc'ing Dhaval and Frederic. 
They were interested in the subject > before and Dhaval was pretty vocal about cpuacct having a separate > hierarchy (or at least granularity). Really? Time just has _not_ borne out this use-case. I'll let Dhaval make a case for this but he should expect violent objection. > > On Wed, Sep 05, 2012 at 12:04:47PM +0200, Peter Zijlstra wrote: >> > cpuacct is rather unique tho. I think it's gonna be silly whether the >> > hierarchy is unified or not. >> > >> > 1. If they always can live on the exact same hierarchy, there's no >> > point in having the two separate. Just merge them. >> > >> > 2. If they need differing levels of granularity, they either need to >> > do it completely separately as they do now or have some form of >> > dynamic optimization if absolutely necesary. >> > >> > So, I think that choice is rather separate from other issues. If >> > cpuacct is gonna be kept, I'd just keep it separate and warn that it >> > incurs extra overhead for the current users if for nothing else. >> > Otherwise, kill it or merge it into cpu. >> >> Quite, hence my 'proposal' to remove cpuacct. >> >> There was some whining last time Glauber proposed this, but the one >> whining never convinced and has gone away from Linux, so lets just do >> this. >> >> Lets make cpuacct print a deprecated msg to dmesg for a few releases and >> make cpu do all this. > > I like it. Currently cpuacct is the only problematic one in this > regard (cpuset to a much lesser extent) and it would be great to make > it go away. > > Dhaval, Frederic, Paul, if you guys object, please voice your > opinions. > >> The co-mounting stuff would have been nice for cpusets as well, knowing >> all your tasks are affine to a subset of cpus allows for a few >> optimizations (smaller cpumask iterations), but I guess we'll have to do >> that dynamically, we'll just have to see how ugly that is. > > Forced co-mounting sounds rather silly to me. 
If the two are always > gonna be co-mounted, why not just merge them and switch the > functionality depending on configuration? I'm fairly sure the code > would be simpler that way. It would be simpler, but the problem is that we'd break any userspace that was just doing mount cpuacct. Further, even if it were mounting both, userspace code still has to be changed to read from "cpu.export" instead of "cpuacct.export". I think a sane path on this front is: Immediately: Don't allow cpuacct and cpu to be co-mounted on separate hierarchies simultaneously. That is: mount none /dev/cgroup/cpuacct -t cgroupfs -o cpuacct : still works mount none /dev/cgroup/cpu -t cgroupfs -o cpu : still works mount none /dev/cgroup/cpux -t cgroupfs -o cpuacct,cpu : still works But the combination: mount none /dev/cgroup/cpu -t cgroupfs -o cpu : still works mount none /dev/cgroup/cpuacct -t cgroupfs -o cpu : EINVAL [or vice versa]. Also: WARN_ON when mounting cpuacct without cpu, strongly explaining that ANY such configuration is deprecated. Glauber's patchset goes most of the way towards enabling this. In a release or two: Make the restriction strict; don't allow individual mounting of cpuacct, force it to be mounted ONLY with cpu. Glauber's patchset gives us this. Finally: Mirror the interfaces to cpu, print nasty syslog messages about ANY mounts of cpuacct. Follow that up by eventually removing cpuacct completely. -- In general I think this sets a hard precedent of never allowing an accounting controller to exist with a control one for a given area, e.g. cpu, networking, mm, etc. In the cases where one of these exists already, any attempts to extend (accounting or control) must extend the existing one. > > If cpuset and cpu being separate is important enough && the overhead > of doing things separately for cpuset isn't too high, I wouldn't > bother too much with dynamic optimization but that's your call. > Given the choice, we would have just ripped it out long ago.
Breaking the user-space ABI is the problem. > Thanks. > > -- > tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Fri, 7 Sep 2012 02:36:36 +0400 Message-ID: <50492574.6030308@parallels.com> References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346839487.2600.24.camel@twins> <20120906204642.GN29092@google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Paul Turner Cc: Tejun Heo , Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, lennart@poettering.net, kay.sievers@vrfy.org, Dhaval Giani , Frederic Weisbecker On 09/07/2012 01:11 AM, Paul Turner wrote: > On Thu, Sep 6, 2012 at 1:46 PM, Tejun Heo wrote: >> Hello, >> >> cc'ing Dhaval and Frederic. They were interested in the subject >> before and Dhaval was pretty vocal about cpuacct having a separate >> hierarchy (or at least granularity). > > Really? Time just has _not_ borne out this use-case. I'll let Dhaval > make a case for this but he should expect violent objection. > I strongly advise against physical violence. In case it is really necessary, please break his legs only. 
>> On Wed, Sep 05, 2012 at 12:04:47PM +0200, Peter Zijlstra wrote: >>>> cpuacct is rather unique tho. I think it's gonna be silly whether the >>>> hierarchy is unified or not. >>>> >>>> 1. If they always can live on the exact same hierarchy, there's no >>>> point in having the two separate. Just merge them. >>>> >>>> 2. If they need differing levels of granularity, they either need to >>>> do it completely separately as they do now or have some form of >>>> dynamic optimization if absolutely necesary. >>>> >>>> So, I think that choice is rather separate from other issues. If >>>> cpuacct is gonna be kept, I'd just keep it separate and warn that it >>>> incurs extra overhead for the current users if for nothing else. >>>> Otherwise, kill it or merge it into cpu. >>> >>> Quite, hence my 'proposal' to remove cpuacct. >>> >>> There was some whining last time Glauber proposed this, but the one >>> whining never convinced and has gone away from Linux, so lets just do >>> this. >>> >>> Lets make cpuacct print a deprecated msg to dmesg for a few releases and >>> make cpu do all this. >> >> I like it. Currently cpuacct is the only problematic one in this >> regard (cpuset to a much lesser extent) and it would be great to make >> it go away. >> >> Dhaval, Frederic, Paul, if you guys object, please voice your >> opinions. >> >>> The co-mounting stuff would have been nice for cpusets as well, knowing >>> all your tasks are affine to a subset of cpus allows for a few >>> optimizations (smaller cpumask iterations), but I guess we'll have to do >>> that dynamically, we'll just have to see how ugly that is. >> >> Forced co-mounting sounds rather silly to me. If the two are always >> gonna be co-mounted, why not just merge them and switch the >> functionality depending on configuration? I'm fairly sure the code >> would be simpler that way. > > It would be simpler but the problem is we'd break any userspace that > was just doing mount cpuacct? 
> > Further, even if it were mounting both, userspace code still has to be > changed to read from "cpu.export" instead of "cpuacct.export". > Only if we remove cpuacct. What we can do, and I thought about doing, is just merge cpuacct functionality into cpu. Then we move cpuacct to default no. It will be there for userspace if they absolutely want to use it. > I think a sane path on this front is: > > Immediately: > Don't allow cpuacct and cpu to be co-mounted on separate hierarchies > simultaneously. > That is precisely what my patch does, except it is a bit more generic. > That is: > mount none /dev/cgroup/cpuacct -t cgroupfs -o cpuacct : still works > mount none /dev/cgroup/cpu -t cgroupfs -o cpu : still works > mount none /dev/cgroup/cpux -t cgroupfs -o cpuacct,cpu : still works > > But the combination: > mount none /dev/cgroup/cpu -t cgroupfs -o cpu : still works > mount none /dev/cgroup/cpuacct -t cgroupfs -o cpu : EINVAL [or vice versa]. > > Also: > WARN_ON when mounting cpuacct without cpu, strongly explaining that > ANY such configuration is deprecated. > > Glauber's patchset goes most of the way towards enabling this. > Yes. > In a release or two: > Make the restriction strict; don't allow individual mounting of > cpuacct, force it to be mounted ONLY with cpu. > > Glauber's patchset gives us this. > > Finally: > Mirror the interfaces to cpu, print nasty syslog messages about ANY > mounts of cpuacct > Follow that up by eventually removing cpuacct completely > Why not start with mirroring? It gives more time for people to start switching to it. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 From: Glauber Costa Subject: Re: [RFC 0/5] forced comounts for cgroups.
Date: Fri, 7 Sep 2012 02:39:19 +0400 Message-ID: <50492617.8030609@parallels.com> References: <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> <1346840453.2461.6.camel@laptop> <20120906203839.GM29092@google.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20120906203839.GM29092-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" To: Tejun Heo Cc: Peter Zijlstra , linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org On 09/07/2012 12:38 AM, Tejun Heo wrote: > Hello, Peter, Glauber. > > (I'm gonna write up cgroup core todos which should explain / address > this issue too. ATM I'm a bit overwhelmed with stuff accumulated > while traveling.) > Yes, please. While you rightfully claim that you explained it a couple of times, it all seems to be quite fuzzy. I don't blame it on you: the current state of the interface leads to this. So another detailed explanation of what you envision at this point, considering the discussions we had over the previous days, would be really helpful. From mboxrd@z Thu Jan 1 00:00:00 1970 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups.
Date: Thu, 6 Sep 2012 15:45:47 -0700 Message-ID: References: <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> <1346840453.2461.6.camel@laptop> <20120906203839.GM29092@google.com> <50492617.8030609@parallels.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=6rNFIeD3lK6ZdrUbPEYj9C8BcynYTpJLnbTTMSl6eMc=; b=ViqwvjdOQLIqqdeXKXsOX0uvyM8h/kiv3z92yBFnGsi5Bddwqmjaj7uevHdGLlTWEr mPX3VZBLVvLLP64HGQUuaVdQA/1Ninf4MzN7/XHnBbqv8fx9tIvdoHiMxBB5L8+BFLN3 XGU1nFPEubtmJk9Htp9HSJBaXhboTruSfuhB2JS3fZQPVTT+7m0dbbq7ebqinm/aOM09 t/viuTiV1xnbMNmCc0ikejQhF6QeLPh6yrgJXyD7870dNCVR/wlWV1IRJRi681MMnfBE OQ38+4EO+D3SwXNkjjbNg2lNw5VL5y80LPKAULgpCEn6qhItohRo9fxJwYwbtu4P+RP7 A+8w== In-Reply-To: <50492617.8030609-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> Sender: cgroups-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org List-ID: Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Glauber Costa Cc: Peter Zijlstra , linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org, ben-/+tVBieCtBitmTQ+vhA3Yw@public.gmane.org, pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, lennart-mdGvqq1h2p+GdvJs77BJ7Q@public.gmane.org, kay.sievers-tD+1rO4QERM@public.gmane.org Hello, Glauber. On Thu, Sep 6, 2012 at 3:39 PM, Glauber Costa wrote: > Yes, please. 
> > While you rightfully claim that you explained it a couple of times, it > all seems to be quite fuzzy. I don't blame it on you: the current state > of the interface leads to this. Heh, I drank two cups of coffee and two glasses of wine that evening. Coffee won and I couldn't sleep till around 4am with splitting headache. I'm not too confident about what I wrote that night. :) > So another detailed explanation of what you envision at this point, > considering the discussions we had in the previous days, would be really > helpful, Definitely, will do. Please give me a few days to sort through immediately pending stuff. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dhaval Giani Subject: Re: [RFC 0/5] forced comounts for cgroups. Date: Sat, 8 Sep 2012 09:36:29 -0400 Message-ID: References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346839487.2600.24.camel@twins> <20120906204642.GN29092@google.com> Mime-Version: 1.0 Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=hmINwVe0a40apzBjXrb4Uu5gKvCIhpX1iWlcQ0FZZgQ=; b=gJcHKRuT/k9aZgeCXRhyqcKoP/jo7n7TC8nVr7XWOd6aiyYbJgtmgV/lEVLltlBgdl 3nbT3X+o7TqT/J+gM4cPqYxNGhsPRPcYR4basiuVa1C1qQf2DQk+0OqCTYVu1dZhhxdt fxHzgICl5ghVj67zV+M9C/Mkk54niGUa2eZpNuRePQuAHn5pw1Rp8Tf/HvmS0ilHgrG4 Rus5o/nH/eu7t70nE/OKo8/4CYBn3qvBY0Fz2mbNJ8c96aRHP9ZcK3yURqhY7tg12CPi CYdGWqg8/L9A2Ik8jysRDvF9lRcaFKI5Shuv3iuqL0NbpCgChgr3zbFjny0vVtQzC4Oc bGQw== In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: 
Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: Paul Turner Cc: Tejun Heo , Peter Zijlstra , Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, lennart@poettering.net, kay.sievers@vrfy.org, Frederic Weisbecker , Balbir Singh , Bharata B Rao On Thu, Sep 6, 2012 at 5:11 PM, Paul Turner wrote: > On Thu, Sep 6, 2012 at 1:46 PM, Tejun Heo wrote: >> Hello, >> >> cc'ing Dhaval and Frederic. They were interested in the subject >> before and Dhaval was pretty vocal about cpuacct having a separate >> hierarchy (or at least granularity). > > Really? Time just has _not_ borne out this use-case. I'll let Dhaval > make a case for this but he should expect violent objection. > I am not objecting directly! I am aware of a few users who are (or at least were) using cpu and cpuacct separately because they want to be able to account without control. Having said that, there are tons of flaws in the current approach, because the accounting without control is just plain wrong. I have copied a few other folks who might be able to shed light on those users and if we should still consider them. [And the lesser number of controllers, the better it is!] Thanks! Dhaval -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx204.postini.com [74.125.245.204]) by kanga.kvack.org (Postfix) with SMTP id C793D6B0068 for ; Tue, 4 Sep 2012 17:46:07 -0400 (EDT) Received: by dadi14 with SMTP id i14so4994209dad.14 for ; Tue, 04 Sep 2012 14:46:07 -0700 (PDT) Date: Tue, 4 Sep 2012 14:46:02 -0700 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Message-ID: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Hello, Glauber. On Tue, Sep 04, 2012 at 06:18:15PM +0400, Glauber Costa wrote: > As we have been extensively discussing, the cost and pain points for cgroups > come from many places. But at least one of those is the arbitrary nature of > hierarchies. Many people, including at least Tejun and me would like this to go > away altogether. Problem so far, is breaking compatiblity with existing setups > > I am proposing here a default-n Kconfig option that will guarantee that the cpu > cgroups (for now) will be comounted. I started with them because the > cpu/cpuacct division is clearly the worst offender. Also, the default-n is here > so distributions will have time to adapt: Forcing this flag to be on without > userspace changes will just lead to cgroups failing to mount, which we don't > want. > > Although I've tested it and it works, I haven't compile-tested all possible > config combinations. So this is mostly for your eyes. If this gets traction, > I'll submit it properly, along with any changes that you might require. As I said during the discussion, I'm skeptical about how useful this is. This can't nudge existing users in any meaningfully gradual way. Kconfig doesn't make it any better. It's still an abrupt behavior change when seen from userland. Also, I really don't see much point in enforcing this almost arbitrary grouping of controllers. 
It doesn't simplify anything and using cpuacct in more granular way than cpu actually is one of the better justified use of multiple hierarchies. Also, what about memcg and blkcg? Do they *really* coincide? Note that both blkcg and memcg involve non-trivial overhead and blkcg is essentially broken hierarchy-wise. Currently, from userland visible behavior POV, the crazy parts are 1. The flat hierarchy thing. This just should go away. 2. Orthogonal multiple hierarchies. I think we agree that #1 should go away one way or the other. I *really* wanna get rid of #2 but am not sure how. I'll give it another stab once the writeback thing is resolved. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx175.postini.com [74.125.245.175]) by kanga.kvack.org (Postfix) with SMTP id 5A6186B005D for ; Wed, 5 Sep 2012 04:06:46 -0400 (EDT) Message-ID: <5047074D.1030104@parallels.com> Date: Wed, 5 Sep 2012 12:03:25 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [RFC 0/5] forced comounts for cgroups. References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Tejun Heo Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On 09/05/2012 01:46 AM, Tejun Heo wrote: > Hello, Glauber. 
> > On Tue, Sep 04, 2012 at 06:18:15PM +0400, Glauber Costa wrote: >> As we have been extensively discussing, the cost and pain points for cgroups >> come from many places. But at least one of those is the arbitrary nature of >> hierarchies. Many people, including at least Tejun and me would like this to go >> away altogether. Problem so far, is breaking compatiblity with existing setups >> >> I am proposing here a default-n Kconfig option that will guarantee that the cpu >> cgroups (for now) will be comounted. I started with them because the >> cpu/cpuacct division is clearly the worst offender. Also, the default-n is here >> so distributions will have time to adapt: Forcing this flag to be on without >> userspace changes will just lead to cgroups failing to mount, which we don't >> want. >> >> Although I've tested it and it works, I haven't compile-tested all possible >> config combinations. So this is mostly for your eyes. If this gets traction, >> I'll submit it properly, along with any changes that you might require. > > As I said during the discussion, I'm skeptical about how useful this > is. This can't nudge existing users in any meaningfully gradual way. > Kconfig doesn't make it any better. It's still an abrupt behavior > change when seen from userland. > The goal here is to have distributions to do it, because they tend to have a well defined lifecycle management, much more than upstream. Whoever sets this option, can coordinate with upstream. Aside from enforcing it, we can pretty much warn() as well, to direct people towards flipping the switch. > Also, I really don't see much point in enforcing this almost arbitrary > grouping of controllers. It doesn't simplify anything and using > cpuacct in more granular way than cpu actually is one of the better > justified use of multiple hierarchies. Also, what about memcg and > blkcg? Do they *really* coincide? 
Note that both blkcg and memcg > involve non-trivial overhead and blkcg is essentially broken > hierarchy-wise. > Where did I mention memcg or blkcg in this patch ? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx189.postini.com [74.125.245.189]) by kanga.kvack.org (Postfix) with SMTP id 687C16B0068 for ; Wed, 5 Sep 2012 04:38:37 -0400 (EDT) Message-ID: <50470EBF.9070109@parallels.com> Date: Wed, 5 Sep 2012 12:35:11 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [RFC 0/5] forced comounts for cgroups. References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Tejun Heo Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On 09/05/2012 12:29 PM, Tejun Heo wrote: > Hello, Glauber. > > On Wed, Sep 05, 2012 at 12:17:11PM +0400, Glauber Costa wrote: >>> Distros can just co-mount them during boot. What's the point of the >>> config options? >> >> Pretty simple. The kernel can't assume the distro did. And then we still >> need to pay a stupid big price in the scheduler. >> >> After this patchset, We can assume this. And cpuusage can totally be >> derived from the cpu cgroup. 
Because much more than "they can comount", >> we can assume they did. > > As long as cpuacct and cpu are separate, I think it makes sense to > assume that they at least could be at different granularity. If they are comounted, and more: forceably comounted, I don't see how to call them separate. At the very best, they are this way for compatibility purposes only, to lay a path that would allow us to get rid of the separation eventually. > As for > optimization for co-mounted case, if that is *really* necessary, > couldn't it be done dynamically? It's not like CONFIG_XXX blocks are > pretty things and they're worse for runtime code path coverage. > I've done it dynamically, as you know. But if you think that complicated the code less than this, we're operating by very different standards... CONFIG options can make the code uglier, but they are a lot more predictable. They also guarantee that no state changes will happen during the lifecycle of the machine. Doing it dynamically makes the code prettier, but still extensively large, and prone to subtle bugs, as we've already seen in practice. >>> Differing hierarchies in memcg and blkcg currently is the most >>> prominent case where the intersection in writeback is problematic and >>> your proposed solution doesn't help one way or the other. What's the >>> point? >> >> The point is that I am focusing at one problem at a time. But FWIW, I >> don't see why memcg/blkcg can't use a step just like this one in a >> separate pass. >> >> If the goal is comounting them eventually, at some point when the issues >> are sorted out, just do it. Get a switch like this one, and then you >> will start being able to assume a lot of things in the code. Miracles >> can happen. > > The problem is that I really don't see how this leads to where we > eventually wanna be. Orthogonal hierarchies are bad because, > > * It complicates the code. This doesn't really help there much. > Way I see it, it is the price we pay for having screwed up before.
And Kconfig options don't necessarily complicate the code. They make it bigger, and possibly slightly harder to follow. But I myself > * Intersections between controllers are cumbersome to handle. Again, > this doesn't help much. > They are only cumbersome because we can't assume anything. The cpuacct is the perfect example. Once we can start assuming, they become a lot less so. > And this restricts the only valid use case for multiple hierarchies > which is applying differing level of granularity depending on > controllers. So, I don't know. Doesn't seem like a good idea to me. > > Thanks. > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx153.postini.com [74.125.245.153]) by kanga.kvack.org (Postfix) with SMTP id 9E4936B005D for ; Wed, 5 Sep 2012 04:58:46 -0400 (EDT) Message-ID: <50471379.3060603@parallels.com> Date: Wed, 5 Sep 2012 12:55:21 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [RFC 0/5] forced comounts for cgroups.
References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Tejun Heo Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On 09/05/2012 12:47 PM, Tejun Heo wrote: > Hello, Glauber. > > On Wed, Sep 05, 2012 at 12:35:11PM +0400, Glauber Costa wrote: >>> As long as cpuacct and cpu are separate, I think it makes sense to >>> assume that they at least could be at different granularity. >> >> If they are comounted, and more: forceably comounted, I don't see how to >> call them separate. At the very best, they are this way for >> compatibility purposes only, to lay a path that would allow us to get >> rid of the separation eventually. > > I think this is where we disagree. I didn't mean that all controllers > should be using exactly the same hierarchy when I was talking about > unified hierarchy. I do think it's useful and maybe even essential to > allow differing levels of granularity. cpu and cpuacct could be a > valid example for this. Likely blkcg and memcg too. 
> > So, I think it's desirable for all controllers to be able to handle > hierarchies the same way and to have the ability to tag something as > belonging to certain group in the hierarchy for all controllers but I > don't think it's desirable or feasible to require all of them to > follow exactly the same grouping at all levels. > By "different levels of granularity" do you mean having just a subset of them turned on at a particular place? If yes, having them guaranteed to be comounted is still perceived by me as a good first step. A natural following would be to turn them on/off on a per-group basis. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx103.postini.com [74.125.245.103]) by kanga.kvack.org (Postfix) with SMTP id 0B5226B0069 for ; Wed, 5 Sep 2012 05:10:01 -0400 (EDT) Message-ID: <5047161F.60503@parallels.com> Date: Wed, 5 Sep 2012 13:06:39 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [RFC 0/5] forced comounts for cgroups. 
References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471379.3060603@parallels.com> <20120905090744.GG3195@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120905090744.GG3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Tejun Heo Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On 09/05/2012 01:07 PM, Tejun Heo wrote: > Hello, Glauber. > > On Wed, Sep 05, 2012 at 12:55:21PM +0400, Glauber Costa wrote: >>> So, I think it's desirable for all controllers to be able to handle >>> hierarchies the same way and to have the ability to tag something as >>> belonging to certain group in the hierarchy for all controllers but I >>> don't think it's desirable or feasible to require all of them to >>> follow exactly the same grouping at all levels. >> >> By "different levels of granularity" do you mean having just a subset of >> them turned on at a particular place? > > Heh, this is tricky to describe and I'm not really following what you > mean. Do we really want to start cleaning up all this by changing the interface to something that is described as "tricky" ? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx191.postini.com [74.125.245.191]) by kanga.kvack.org (Postfix) with SMTP id 4581D6B006E for ; Wed, 5 Sep 2012 05:11:46 -0400 (EDT) Received: by pbbro12 with SMTP id ro12so661159pbb.14 for ; Wed, 05 Sep 2012 02:11:45 -0700 (PDT) Date: Wed, 5 Sep 2012 02:11:40 -0700 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. Message-ID: <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1346835993.2600.9.camel@twins> Sender: owner-linux-mm@kvack.org List-ID: To: Peter Zijlstra Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Hello, Peter. On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: > *confused* I always thought that was exactly what you meant with unified > hierarchy. No, I never counted out differing granularity. > Doing all this runtime is just going to make the mess even bigger, > because now we have to deal with even more stupid cases. > > So either we go and try to contain this mess as proposed by Glauber or > we go delete controllers.. I've had it with this crap. If cpuacct can really go away, that's great, but I don't think the problem at hand is unsolvable, so let's not jump it. 
cpuacct and cpu aren't the only problem cases after all. We need to solve it for other controllers too. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx189.postini.com [74.125.245.189]) by kanga.kvack.org (Postfix) with SMTP id E11956B0062 for ; Wed, 5 Sep 2012 05:15:02 -0400 (EDT) Received: by pbbro12 with SMTP id ro12so666635pbb.14 for ; Wed, 05 Sep 2012 02:15:02 -0700 (PDT) Date: Wed, 5 Sep 2012 02:14:56 -0700 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. Message-ID: <20120905091456.GI3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471379.3060603@parallels.com> <20120905090744.GG3195@dhcp-172-17-108-109.mtv.corp.google.com> <5047161F.60503@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5047161F.60503@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On Wed, Sep 05, 2012 at 01:06:39PM +0400, Glauber Costa wrote: > > Heh, this is tricky to describe and I'm not really following what you > > mean. > > Do we really want to start cleaning up all this by changing the > interface to something that is described as "tricky" ? The concept is not tricky.
I just can't find the appropriate words. I *suspect* this can mostly re-use the existing css_set thing. It mostly becomes that css_set belongs to the unified hierarchy rather than each task. The user interface part isn't trivial and maybe "don't nest beyond this level" is the only thing reasonable. Not sure yet whether that would be enough tho. Need to think more about it. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx194.postini.com [74.125.245.194]) by kanga.kvack.org (Postfix) with SMTP id E55056B0068 for ; Wed, 5 Sep 2012 05:19:31 -0400 (EDT) Received: by pbbro12 with SMTP id ro12so673751pbb.14 for ; Wed, 05 Sep 2012 02:19:31 -0700 (PDT) Date: Wed, 5 Sep 2012 02:19:25 -0700 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. Message-ID: <20120905091925.GJ3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50471782.6060800@parallels.com> Sender: owner-linux-mm@kvack.org List-ID: To: Glauber Costa Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On Wed, Sep 05, 2012 at 01:12:34PM 
+0400, Glauber Costa wrote: > > No, I never counted out differing granularity. > > Can you elaborate on which interface do you envision to make it work? > They will clearly be mounted in the same hierarchy, or as said > alternatively, comounted. I'm not sure yet. At the simplest, mask of controllers which should honor (or ignore) nesting beyond the node. That should be understandable enough. Not sure whether that would be flexible enough yet tho. In the end, they should be comounted but again I don't think enforcing comounting at the moment is a step towards that. It's more like a step sideways. Thanks. -- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx148.postini.com [74.125.245.148]) by kanga.kvack.org (Postfix) with SMTP id 89A136B005D for ; Wed, 5 Sep 2012 05:32:10 -0400 (EDT) Received: by dadi14 with SMTP id i14so258851dad.14 for ; Wed, 05 Sep 2012 02:32:09 -0700 (PDT) Date: Wed, 5 Sep 2012 02:32:04 -0700 From: Tejun Heo Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Message-ID: <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1346835993.2600.9.camel@twins> Sender: owner-linux-mm@kvack.org List-ID: To: Peter Zijlstra Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Hey, again. On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: > Doing all this runtime is just going to make the mess even bigger, > because now we have to deal with even more stupid cases. > > So either we go and try to contain this mess as proposed by Glauber or > we go delete controllers.. I've had it with this crap. cpuacct is rather unique tho. I think it's gonna be silly whether the hierarchy is unified or not. 1. If they always can live on the exact same hierarchy, there's no point in having the two separate. Just merge them. 2. If they need differing levels of granularity, they either need to do it completely separately as they do now or have some form of dynamic optimization if absolutely necessary. So, I think that choice is rather separate from other issues. If cpuacct is gonna be kept, I'd just keep it separate and warn that it incurs extra overhead for the current users if for nothing else. Otherwise, kill it or merge it into cpu. Thanks.
-- tejun -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from psmtp.com (na3sys010amx125.postini.com [74.125.245.125]) by kanga.kvack.org (Postfix) with SMTP id 0EA626B0069 for ; Wed, 5 Sep 2012 05:33:43 -0400 (EDT) Message-ID: <50471BAF.2060708@parallels.com> Date: Wed, 5 Sep 2012 13:30:23 +0400 From: Glauber Costa MIME-Version: 1.0 Subject: Re: [RFC 0/5] forced comounts for cgroups. References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <20120905091925.GJ3195@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120905091925.GJ3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Tejun Heo Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On 09/05/2012 01:19 PM, Tejun Heo wrote: > On Wed, Sep 05, 2012 at 01:12:34PM +0400, Glauber Costa wrote: >>> No, I never counted out differing granularity. >> >> Can you elaborate on which interface do you envision to make it work? >> They will clearly be mounted in the same hierarchy, or as said >> alternatively, comounted. > > I'm not sure yet. At the simplest, mask of controllers which should > honor (or ignore) nesting beyond the node. 
That should be
> understandable enough.  Not sure whether that would be flexible enough
> yet tho.  In the end, they should be comounted but again I don't think
> enforcing comounting at the moment is a step towards that.  It's more
> like a step sideways.
>

Tejun,

From the code PoV, guaranteed comounting is what allows us to make
optimizations. "Maybe comounting" may simplify the interface, but it
buys us nothing at the performance level.

I am more than happy to respin it with an added interface for masking
cgroups, if you believe this is a requirement. But a hint about what you
would like to see on that front would be really helpful.

Re-asking my question: cpufreq, clocksources, ftrace, etc. all use an
interface that at this point can be considered quite standard. Applying
the same logic, each cgroup would have a pair of files,
available_controllers and current_controllers, that you control simply
by writing to them.

This can get slightly funny when we consider the right semantics for the
hierarchy, but really, everything will. And it is not like we'll have
anything crazy; we just need to tailor it with care.

If you think there is any chance of this getting us somewhere, I'll code
it. But that would be something to be sent *together* with what I've
just done. As I've said, if we can't guarantee the comounting, we would
still lose all the optimization opportunities.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx172.postini.com [74.125.245.172]) by kanga.kvack.org (Postfix) with SMTP id 6F0776B0070 for ; Wed, 5 Sep 2012 05:35:15 -0400 (EDT)
Message-ID: <50471C0C.7050600@parallels.com>
Date: Wed, 5 Sep 2012 13:31:56 +0400
From: Glauber Costa
MIME-Version: 1.0
Subject: Re: [RFC 0/5] forced comounts for cgroups.
References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> In-Reply-To: <1346837209.2600.14.camel@twins> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Peter Zijlstra Cc: Tejun Heo , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org On 09/05/2012 01:26 PM, Peter Zijlstra wrote: > On Wed, 2012-09-05 at 13:12 +0400, Glauber Costa wrote: >> On 09/05/2012 01:11 PM, Tejun Heo wrote: >>> Hello, Peter. >>> >>> On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: >>>> *confused* I always thought that was exactly what you meant with unified >>>> hierarchy. >>> >>> No, I never counted out differing granularity. >>> >> >> Can you elaborate on which interface do you envision to make it work? >> They will clearly be mounted in the same hierarchy, or as said >> alternatively, comounted. >> >> If you can turn them on/off on a per-subtree basis, which interface >> exactly do you propose for that? > > I wouldn't, screw that. That would result in the exact same problem > we're trying to fix. I want a single hierarchy walk, that's expensive > enough. > >> Would a pair of cgroup core files like available_controllers and >> current_controllers are a lot of drivers do, suffice? > > No.. its not a 'feature' I care to support for 'my' controllers. 
>
> I simply don't want to have to do two (or more) hierarchy walks for
> accounting on every schedule event, all that pointer chasing is stupidly
> expensive.
>

You wouldn't have to do more than one hierarchy walk for that. What
Tejun seems to want is the ability to not have a particular controller
at some point in the tree. But if they exist, they are always together.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx159.postini.com [74.125.245.159]) by kanga.kvack.org (Postfix) with SMTP id DCC066B0083 for ; Wed, 5 Sep 2012 06:21:00 -0400 (EDT)
Received: from canuck.infradead.org ([2001:4978:20e::1]) by merlin.infradead.org with esmtps (Exim 4.76 #1 (Red Hat Linux)) id 1T9CjD-0006Te-5d for linux-mm@kvack.org; Wed, 05 Sep 2012 10:20:59 +0000
Received: from dhcp-089-099-019-018.chello.nl ([89.99.19.18] helo=dyad.programming.kicks-ass.net) by canuck.infradead.org with esmtpsa (Exim 4.76 #1 (Red Hat Linux)) id 1T9CjC-0000wa-DR for linux-mm@kvack.org; Wed, 05 Sep 2012 10:20:58 +0000
Subject: Re: [RFC 0/5] forced comounts for cgroups.
From: Peter Zijlstra
In-Reply-To: <50471C0C.7050600@parallels.com>
References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com>
Content-Type: text/plain; charset="UTF-8"
Date: Wed, 05 Sep 2012 12:20:53 +0200
Message-ID: <1346840453.2461.6.camel@laptop>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Glauber Costa
Cc: Tejun Heo , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org

On Wed, 2012-09-05 at 13:31 +0400, Glauber Costa wrote:
>
> You wouldn't have to do more than one hierarchy walks for that. What
> Tejun seems to want, is the ability to not have a particular controller
> at some point in the tree. But if they exist, they are always together.

Right, but the accounting is very much tied to the control structures, I
suppose we could change that, but my jet-lag addled brain isn't seeing
anything particularly nice atm.

But I don't really see the point though, this kind of interface would
only ever work for the non-controlling and controlling controller
combination (confused yet ;-), and I don't think we have many of those.

I would really rather see a simplification of the entire cgroup
interface space as opposed to making it more complex. And adding this
subtree 'feature' only makes it more complex.
From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx108.postini.com [74.125.245.108]) by kanga.kvack.org (Postfix) with SMTP id 897A76B005A for ; Thu, 6 Sep 2012 18:42:38 -0400 (EDT)
Message-ID: <50492617.8030609@parallels.com>
Date: Fri, 7 Sep 2012 02:39:19 +0400
From: Glauber Costa
MIME-Version: 1.0
Subject: Re: [RFC 0/5] forced comounts for cgroups.
References: <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> <1346840453.2461.6.camel@laptop> <20120906203839.GM29092@google.com>
In-Reply-To: <20120906203839.GM29092@google.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Tejun Heo
Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org

On 09/07/2012 12:38 AM, Tejun Heo wrote:
> Hello, Peter, Glauber.
>
> (I'm gonna write up cgroup core todos which should explain / address
> this issue too.  ATM I'm a bit overwhelmed with stuff accumulated
> while traveling.)
>

Yes, please.

While you rightfully claim that you explained it a couple of times, it
all seems to be quite fuzzy. I don't blame it on you: the current state
of the interface leads to this.

So another detailed explanation of what you envision at this point,
considering the discussions we had in the previous days, would be really
helpful.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx171.postini.com [74.125.245.171]) by kanga.kvack.org (Postfix) with SMTP id 317DF6B005A for ; Thu, 6 Sep 2012 18:45:49 -0400 (EDT)
Received: by lahd3 with SMTP id d3so1836822lah.14 for ; Thu, 06 Sep 2012 15:45:47 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <50492617.8030609@parallels.com>
References: <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> <1346840453.2461.6.camel@laptop> <20120906203839.GM29092@google.com> <50492617.8030609@parallels.com>
Date: Thu, 6 Sep 2012 15:45:47 -0700
Message-ID:
Subject: Re: [RFC 0/5] forced comounts for cgroups.
From: Tejun Heo
Content-Type: text/plain; charset=ISO-8859-1
Sender: owner-linux-mm@kvack.org
List-ID:
To: Glauber Costa
Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org

Hello, Glauber.

On Thu, Sep 6, 2012 at 3:39 PM, Glauber Costa wrote:
> Yes, please.
>
> While you rightfully claim that you explained it a couple of times, it
> all seems to be quite fuzzy. I don't blame it on you: the current state
> of the interface leads to this.

Heh, I drank two cups of coffee and two glasses of wine that evening.
Coffee won and I couldn't sleep till around 4am with a splitting
headache. I'm not too confident about what I wrote that night. :)

> So another detailed explanation of what you envision at this point,
> considering the discussions we had in the previous days, would be really
> helpful,

Definitely, will do.  Please give me a few days to sort through
immediately pending stuff.

Thanks.

--
tejun

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757238Ab2IDOVu (ORCPT ); Tue, 4 Sep 2012 10:21:50 -0400
Received: from mailhub.sw.ru ([195.214.232.25]:19542 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757157Ab2IDOVt (ORCPT ); Tue, 4 Sep 2012 10:21:49 -0400
From: Glauber Costa
To:
Cc: , , davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org
Subject: [RFC 0/5] forced comounts for cgroups.
Date: Tue, 4 Sep 2012 18:18:15 +0400
Message-Id: <1346768300-10282-1-git-send-email-glommer@parallels.com>
X-Mailer: git-send-email 1.7.11.4
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

As we have been extensively discussing, the cost and pain points for
cgroups come from many places. But at least one of those is the
arbitrary nature of hierarchies. Many people, including at least Tejun
and me, would like this to go away altogether. The problem so far is
breaking compatibility with existing setups.

I am proposing here a default-n Kconfig option that will guarantee that
the cpu cgroups (for now) will be comounted. I started with them because
the cpu/cpuacct division is clearly the worst offender.
Also, the default-n is here so distributions will have time to adapt:
Forcing this flag to be on without userspace changes will just lead to
cgroups failing to mount, which we don't want.

Although I've tested it and it works, I haven't compile-tested all
possible config combinations. So this is mostly for your eyes. If this
gets traction, I'll submit it properly, along with any changes that you
might require.

Thanks.

Glauber Costa (5):
  cgroup: allow some comounts to be forced.
  sched: adjust exec_clock to use it as cpu usage metric
  sched: do not call cpuacct_charge when cpu and cpuacct are comounted
  cpuacct: do not gather cpuacct statistics when not mounted
  sched: add cpusets to comounts list

 include/linux/cgroup.h |   6 ++
 init/Kconfig           |  23 ++++++++
 kernel/cgroup.c        |  29 +++++++++-
 kernel/cpuset.c        |   4 ++
 kernel/sched/core.c    | 149 +++++++++++++++++++++++++++++++++++++++++++++----
 kernel/sched/rt.c      |   1 +
 kernel/sched/sched.h   |  20 ++++++-
 7 files changed, 220 insertions(+), 12 deletions(-)

--
1.7.11.4

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757283Ab2IDOWH (ORCPT ); Tue, 4 Sep 2012 10:22:07 -0400
Received: from mailhub.sw.ru ([195.214.232.25]:6675 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757157Ab2IDOWB (ORCPT ); Tue, 4 Sep 2012 10:22:01 -0400
From: Glauber Costa
To:
Cc: , , davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org, Glauber Costa
Subject: [RFC 2/5] sched: adjust exec_clock to use it as cpu usage metric
Date: Tue, 4 Sep 2012 18:18:17 +0400
Message-Id: <1346768300-10282-3-git-send-email-glommer@parallels.com>
X-Mailer: git-send-email 1.7.11.4
In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com>
References: <1346768300-10282-1-git-send-email-glommer@parallels.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

exec_clock already provides per-group cpu usage metrics, and can be
reused by cpuacct in case cpu and cpuacct are comounted.

However, it is only provided by tasks in fair class. Doing the same for
rt is easy, and can be done in an already existing hierarchy loop. This
is an improvement over the independent hierarchy walk executed by
cpuacct.

Signed-off-by: Glauber Costa
CC: Dave Jones
CC: Ben Hutchings
CC: Peter Zijlstra
CC: Paul Turner
CC: Lennart Poettering
CC: Kay Sievers
CC: Tejun Heo
---
 kernel/sched/rt.c    | 1 +
 kernel/sched/sched.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 573e1ca..40ef6af 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -930,6 +930,7 @@ static void update_curr_rt(struct rq *rq)
 
 	for_each_sched_rt_entity(rt_se) {
 		rt_rq = rt_rq_of_se(rt_se);
+		schedstat_add(rt_rq, exec_clock, delta_exec);
 
 		if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
 			raw_spin_lock(&rt_rq->rt_runtime_lock);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 55844f2..8da579d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -204,6 +204,7 @@ struct cfs_rq {
 	unsigned int nr_running, h_nr_running;
 
 	u64 exec_clock;
+	u64 prev_exec_clock;
 	u64 min_vruntime;
 #ifndef CONFIG_64BIT
 	u64 min_vruntime_copy;
@@ -295,6 +296,8 @@ struct rt_rq {
 	struct plist_head pushable_tasks;
 #endif
 	int rt_throttled;
+	u64 exec_clock;
+	u64 prev_exec_clock;
 	u64 rt_time;
 	u64 rt_runtime;
 	/* Nests inside the rq lock: */
--
1.7.11.4

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757295Ab2IDOWK (ORCPT ); Tue, 4 Sep 2012 10:22:10 -0400
Received: from mailhub.sw.ru ([195.214.232.25]:3376 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757236Ab2IDOWH (ORCPT ); Tue, 4 Sep 2012 10:22:07 -0400
From: Glauber Costa
To:
Cc: , , davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl,
pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org, Glauber Costa Subject: [RFC 3/5] sched: do not call cpuacct_charge when cpu and cpuacct are comounted Date: Tue, 4 Sep 2012 18:18:18 +0400 Message-Id: <1346768300-10282-4-git-send-email-glommer@parallels.com> X-Mailer: git-send-email 1.7.11.4 In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org cpuacct_charge() incurs in some quite expensive operations to achieve its measurement goal. To make matters worse, this cost is not constant, but grows with the depth of the cgroup hierarchy tree. Also, all this data is already available anyway in the scheduler core. The fact that the cpuacct cgroup cannot be guaranteed to be mounted in the same hierarchy as the scheduler core cgroup (cpu), forces us to go gather them all again. With the introduction of CONFIG_CGROUP_FORCE_COMOUNT_CPU, we will be able to be absolutely sure that such a coupling exists. After that, the hierarchy walks can be completely abandoned. 
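
To make the cost argument above concrete, here is a minimal userspace
sketch. It is not code from this series. It models a charge that walks
parent pointers (one step per hierarchy level) and a flag that lets the
charge path skip that walk when the scheduler's own per-group clock can
be read instead. All names (struct acct_group, acct_from_sched, charge,
read_usage) are hypothetical, and the skip condition follows the intent
stated in the changelog rather than any particular line of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one accounting group; not the kernel's struct cpuacct. */
struct acct_group {
	struct acct_group *parent;
	uint64_t usage;      /* filled in by the separate accounting walk */
	uint64_t exec_clock; /* assumed to be maintained by the scheduler */
};

/* Hypothetical flag: true once both controllers share one hierarchy. */
static bool acct_from_sched;

/* Slow path: one extra walk per charge, cost grows with tree depth. */
static unsigned int charge_walk(struct acct_group *g, uint64_t delta)
{
	unsigned int steps = 0;

	assert(g != NULL);
	for (; g; g = g->parent) {
		g->usage += delta;
		steps++;
	}
	return steps;
}

/* Charge entry point: returns how many groups had to be touched. */
static unsigned int charge(struct acct_group *g, uint64_t delta)
{
	if (acct_from_sched)
		return 0; /* scheduler already accounted this time */
	return charge_walk(g, delta);
}

/* Read side: pick whichever counter is authoritative. */
static uint64_t read_usage(const struct acct_group *g)
{
	return acct_from_sched ? g->exec_clock : g->usage;
}
```

With the flag clear, a charge on a group two levels deep touches two
groups; with the flag set, it touches none and the read side falls back
to the scheduler-maintained counter.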
Signed-off-by: Glauber Costa CC: Dave Jones CC: Ben Hutchings CC: Peter Zijlstra CC: Paul Turner CC: Lennart Poettering CC: Kay Sievers CC: Tejun Heo --- init/Kconfig | 19 +++++++ kernel/sched/core.c | 141 +++++++++++++++++++++++++++++++++++++++++++++++---- kernel/sched/sched.h | 14 ++++- 3 files changed, 163 insertions(+), 11 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index d7d693d..694944e 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -684,6 +684,25 @@ config CGROUP_FORCE_COMOUNT bool default n +config CGROUP_FORCE_COMOUNT_CPU + bool "Enforce single hierarchy for the cpu related cgroups" + depends on CGROUP_SCHED || CPUSETS || CGROUP_CPUACCT + select SCHEDSTATS + select CGROUP_FORCE_COMOUNT + default n + help + Throughout cgroup's life, it was always possible to mount the + controllers in completely independent hierarchies. However, the + costs incurred by allowing are considerably big. Hotpaths in the + scheduler needs to call expensive hierarchy walks more than once in + the same place just to account for the fact that multiple controllers + can be mounted in different places. + + Setting this option will disallow cpu, cpuacct and cpuset to be + mounted in different hierarchies. Distributions are highly encouraged + to set this option and comount those groups. 
+ + config RESOURCE_COUNTERS bool "Resource counters" help diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 468bdd4..e46871d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -8282,6 +8282,15 @@ static struct cftype cpu_files[] = { { } /* terminate */ }; +bool cpuacct_from_cpu; + +#ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU +void cpu_cgroup_bind(struct cgroup *root) +{ + cpuacct_from_cpu = root->root == root_task_group.css.cgroup->root; +#endif +} + struct cgroup_subsys cpu_cgroup_subsys = { .name = "cpu", .create = cpu_cgroup_create, @@ -8291,6 +8300,11 @@ struct cgroup_subsys cpu_cgroup_subsys = { .exit = cpu_cgroup_exit, .subsys_id = cpu_cgroup_subsys_id, .base_cftypes = cpu_files, +#ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU + .comounts = 1, + .must_comount = { cpuacct_subsys_id, }, + .bind = cpu_cgroup_bind, +#endif .early_init = 1, }; @@ -8345,8 +8359,102 @@ static void cpuacct_destroy(struct cgroup *cgrp) kfree(ca); } -static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu) +#ifdef CONFIG_CGROUP_SCHED +#ifdef CONFIG_FAIR_GROUP_SCHED +static struct cfs_rq * +cpu_cgroup_cfs_rq(struct cgroup *cgrp, int cpu) +{ + struct task_group *tg = cgroup_tg(cgrp); + + if (tg == &root_task_group) + return &cpu_rq(cpu)->cfs; + + return tg->cfs_rq[cpu]; +} + +static void cpu_cgroup_update_cpuusage_cfs(struct cgroup *cgrp, int cpu) +{ + struct cfs_rq *cfs = cpu_cgroup_cfs_rq(cgrp, cpu); + cfs->prev_exec_clock = cfs->exec_clock; +} +static u64 cpu_cgroup_cpuusage_cfs(struct cgroup *cgrp, int cpu) +{ + struct cfs_rq *cfs = cpu_cgroup_cfs_rq(cgrp, cpu); + return cfs->exec_clock - cfs->prev_exec_clock; +} +#else +static void cpu_cgroup_update_cpuusage_cfs(struct cgroup *cgrp, int cpu) { +} + +static u64 cpu_cgroup_cpuusage_cfs(struct cgroup *cgrp, int cpu) +{ + return 0; +} +#endif + +#ifdef CONFIG_RT_GROUP_SCHED +static struct rt_rq * +cpu_cgroup_rt_rq(struct cgroup *cgrp, int cpu) +{ + struct task_group *tg = cgroup_tg(cgrp); + if (tg == &root_task_group) + 
return &cpu_rq(cpu)->rt; + + return tg->rt_rq[cpu]; + +} +static void cpu_cgroup_update_cpuusage_rt(struct cgroup *cgrp, int cpu) +{ + struct rt_rq *rt = cpu_cgroup_rt_rq(cgrp, cpu); + rt->prev_exec_clock = rt->exec_clock; +} + +static u64 cpu_cgroup_cpuusage_rt(struct cgroup *cgrp, int cpu) +{ + struct rt_rq *rt = cpu_cgroup_rt_rq(cgrp, cpu); + return rt->exec_clock - rt->prev_exec_clock; +} +#else +static void cpu_cgroup_update_cpuusage_rt(struct cgroup *cgrp, int cpu) +{ +} +static u64 cpu_cgroup_cpuusage_rt(struct cgroup *cgrp, int cpu) +{ + return 0; +} +#endif + +static int cpu_cgroup_cpuusage_write(struct cgroup *cgrp, int cpu, u64 val) +{ + cpu_cgroup_update_cpuusage_cfs(cgrp, cpu); + cpu_cgroup_update_cpuusage_rt(cgrp, cpu); + return 0; +} + +static u64 cpu_cgroup_cpuusage_read(struct cgroup *cgrp, int cpu) +{ + return cpu_cgroup_cpuusage_cfs(cgrp, cpu) + + cpu_cgroup_cpuusage_rt(cgrp, cpu); +} + +#else +static u64 cpu_cgroup_cpuusage_read(struct cgroup *cgrp, int i) +{ + BUG(); + return 0; +} + +static int cpu_cgroup_cpuusage_write(struct cgroup *cgrp, int cpu, u64 val) +{ + BUG(); + return 0; +} +#endif /* CONFIG_CGROUP_SCHED */ + +static u64 cpuacct_cpuusage_read(struct cgroup *cgrp, int cpu) +{ + struct cpuacct *ca = cgroup_ca(cgrp); u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); u64 data; @@ -8364,8 +8472,9 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu) return data; } -static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) +static void cpuacct_cpuusage_write(struct cgroup *cgrp, int cpu, u64 val) { + struct cpuacct *ca = cgroup_ca(cgrp); u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); #ifndef CONFIG_64BIT @@ -8380,15 +8489,21 @@ static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) #endif } +static u64 cpuusage_read_percpu(struct cgroup *cgrp, int cpu) +{ + if (cpuacct_from_cpu) + return cpu_cgroup_cpuusage_read(cgrp, cpu); + return cpuacct_cpuusage_read(cgrp, cpu); +} + /* return total cpu usage 
(in nanoseconds) of a group */ static u64 cpuusage_read(struct cgroup *cgrp, struct cftype *cft) { - struct cpuacct *ca = cgroup_ca(cgrp); u64 totalcpuusage = 0; int i; for_each_present_cpu(i) - totalcpuusage += cpuacct_cpuusage_read(ca, i); + totalcpuusage += cpuusage_read_percpu(cgrp, i); return totalcpuusage; } @@ -8396,7 +8511,6 @@ static u64 cpuusage_read(struct cgroup *cgrp, struct cftype *cft) static int cpuusage_write(struct cgroup *cgrp, struct cftype *cftype, u64 reset) { - struct cpuacct *ca = cgroup_ca(cgrp); int err = 0; int i; @@ -8405,8 +8519,12 @@ static int cpuusage_write(struct cgroup *cgrp, struct cftype *cftype, goto out; } - for_each_present_cpu(i) - cpuacct_cpuusage_write(ca, i, 0); + for_each_present_cpu(i) { + if (cpuacct_from_cpu) + cpu_cgroup_cpuusage_write(cgrp, i, 0); + else + cpuacct_cpuusage_write(cgrp, i, 0); + } out: return err; @@ -8415,12 +8533,11 @@ out: static int cpuacct_percpu_seq_read(struct cgroup *cgroup, struct cftype *cft, struct seq_file *m) { - struct cpuacct *ca = cgroup_ca(cgroup); u64 percpu; int i; for_each_present_cpu(i) { - percpu = cpuacct_cpuusage_read(ca, i); + percpu = cpuusage_read_percpu(cgroup, i); seq_printf(m, "%llu ", (unsigned long long) percpu); } seq_printf(m, "\n"); @@ -8483,7 +8600,7 @@ static struct cftype files[] = { * * called with rq->lock held. 
*/ -void cpuacct_charge(struct task_struct *tsk, u64 cputime) +void __cpuacct_charge(struct task_struct *tsk, u64 cputime) { struct cpuacct *ca; int cpu; @@ -8511,5 +8628,9 @@ struct cgroup_subsys cpuacct_subsys = { .destroy = cpuacct_destroy, .subsys_id = cpuacct_subsys_id, .base_cftypes = files, +#ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU + .comounts = 1, + .must_comount = { cpu_cgroup_subsys_id, }, +#endif }; #endif /* CONFIG_CGROUP_CPUACCT */ diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 8da579d..1da9fa8 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -885,6 +885,9 @@ extern void update_idle_cpu_load(struct rq *this_rq); #ifdef CONFIG_CGROUP_CPUACCT #include + +extern bool cpuacct_from_cpu; + /* track cpu usage of a group of tasks and its child groups */ struct cpuacct { struct cgroup_subsys_state css; @@ -914,7 +917,16 @@ static inline struct cpuacct *parent_ca(struct cpuacct *ca) return cgroup_ca(ca->css.cgroup->parent); } -extern void cpuacct_charge(struct task_struct *tsk, u64 cputime); +extern void __cpuacct_charge(struct task_struct *tsk, u64 cputime); + +static inline void cpuacct_charge(struct task_struct *tsk, u64 cputime) +{ +#ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU + if (likely(!cpuacct_from_cpu)) + return; +#endif + __cpuacct_charge(tsk, cputime); +} #else static inline void cpuacct_charge(struct task_struct *tsk, u64 cputime) {} #endif -- 1.7.11.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757269Ab2IDOWF (ORCPT ); Tue, 4 Sep 2012 10:22:05 -0400 Received: from mailhub.sw.ru ([195.214.232.25]:44641 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757250Ab2IDOWB (ORCPT ); Tue, 4 Sep 2012 10:22:01 -0400 From: Glauber Costa To: Cc: , , davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org, Glauber Costa Subject: [RFC 5/5] sched: 
add cpusets to comounts list Date: Tue, 4 Sep 2012 18:18:20 +0400 Message-Id: <1346768300-10282-6-git-send-email-glommer@parallels.com> X-Mailer: git-send-email 1.7.11.4 In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Although we have not yet identified any place where cpusets could be improved performance-wise by guaranteeing comounts with the other two cpu cgroups, it is a sane choice to mount them together. We can preemptively benefit from it and avoid a growing mess, by guaranteeing that subsystems that mostly contraint the same kind of resource will live together. With cgroups is never that simple, and things crosses boundaries quite often. But I hope this can be seen as a potential improvement. Signed-off-by: Glauber Costa CC: Dave Jones CC: Ben Hutchings CC: Peter Zijlstra CC: Paul Turner CC: Lennart Poettering CC: Kay Sievers CC: Tejun Heo --- kernel/cpuset.c | 4 ++++ kernel/sched/core.c | 8 ++++---- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/kernel/cpuset.c b/kernel/cpuset.c index 8c8bd65..f8e1c49 100644 --- a/kernel/cpuset.c +++ b/kernel/cpuset.c @@ -1879,6 +1879,10 @@ struct cgroup_subsys cpuset_subsys = { .post_clone = cpuset_post_clone, .subsys_id = cpuset_subsys_id, .base_cftypes = files, +#ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU + .comounts = 2, + .must_comount = { cpu_cgroup_subsys_id, cpuacct_subsys_id, }, +#endif .early_init = 1, }; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index d654bd1..aeff02c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -8301,8 +8301,8 @@ struct cgroup_subsys cpu_cgroup_subsys = { .subsys_id = cpu_cgroup_subsys_id, .base_cftypes = cpu_files, #ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU - .comounts = 1, - .must_comount = { cpuacct_subsys_id, }, + .comounts = 2, + .must_comount = { cpuacct_subsys_id, 
cpuset_subsys_id, }, .bind = cpu_cgroup_bind, #endif .early_init = 1, @@ -8637,8 +8637,8 @@ struct cgroup_subsys cpuacct_subsys = { .base_cftypes = files, .bind = cpuacct_bind, #ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU - .comounts = 1, - .must_comount = { cpu_cgroup_subsys_id, }, + .comounts = 2, + .must_comount = { cpu_cgroup_subsys_id, cpuset_subsys_id, }, #endif }; #endif /* CONFIG_CGROUP_CPUACCT */ -- 1.7.11.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757304Ab2IDOWi (ORCPT ); Tue, 4 Sep 2012 10:22:38 -0400 Received: from mailhub.sw.ru ([195.214.232.25]:4810 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757262Ab2IDOWF (ORCPT ); Tue, 4 Sep 2012 10:22:05 -0400 From: Glauber Costa To: Cc: , , davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org, Glauber Costa Subject: [RFC 4/5] cpuacct: do not gather cpuacct statistics when not mounted Date: Tue, 4 Sep 2012 18:18:19 +0400 Message-Id: <1346768300-10282-5-git-send-email-glommer@parallels.com> X-Mailer: git-send-email 1.7.11.4 In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Currently, the only test that prevents us from running the expensive cpuacct_charge() is cpuacct_subsys.active == true. This will hold at all times after the subsystem is activated, even if it is not mounted. IOW, use it or not, you pay it. By hooking with the bind() callback, we can detect when cpuacct is mounted or umounted, and stop collecting statistics when this cgroup is not in use. 
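
The idea of gating the charge path on a bind notification can be
sketched in plain C as below. This is an illustration only, under the
assumption that an unmounted controller sits on a distinguished default
hierarchy; struct hierarchy, unmounted_root, acct_bind and acct_charge
are hypothetical names and do not correspond to the cgroup core API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy hierarchy handle; the id only helps tell instances apart. */
struct hierarchy {
	int id;
};

/* Assumption: controllers nobody mounted live on this default root. */
static struct hierarchy unmounted_root = { 0 };

static bool acct_mounted;   /* updated only from the bind hook */
static uint64_t acct_total; /* statistics gathered while mounted */

/* Hypothetical bind hook: called whenever the controller is re-rooted. */
static void acct_bind(const struct hierarchy *new_root)
{
	assert(new_root != NULL);
	acct_mounted = (new_root != &unmounted_root);
}

/* Hot path: a single flag test when nobody is using the controller. */
static void acct_charge(uint64_t delta)
{
	if (!acct_mounted)
		return;
	acct_total += delta;
}
```

The point of the design is that the test in the hot path is a plain
boolean that changes only on mount and umount, so an unused controller
costs one predictable branch per charge.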
Signed-off-by: Glauber Costa CC: Dave Jones CC: Ben Hutchings CC: Peter Zijlstra CC: Paul Turner CC: Lennart Poettering CC: Kay Sievers CC: Tejun Heo --- kernel/sched/core.c | 8 ++++++++ kernel/sched/sched.h | 3 +++ 2 files changed, 11 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e46871d..d654bd1 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -8595,6 +8595,13 @@ static struct cftype files[] = { { } /* terminate */ }; +bool cpuacct_mounted; + +void cpuacct_bind(struct cgroup *root) +{ + cpuacct_mounted = root->root == root_cpuacct.css.cgroup->root; +} + /* * charge this task's execution time to its accounting group. * @@ -8628,6 +8635,7 @@ struct cgroup_subsys cpuacct_subsys = { .destroy = cpuacct_destroy, .subsys_id = cpuacct_subsys_id, .base_cftypes = files, + .bind = cpuacct_bind, #ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU .comounts = 1, .must_comount = { cpu_cgroup_subsys_id, }, diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 1da9fa8..d33f777 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -887,6 +887,7 @@ extern void update_idle_cpu_load(struct rq *this_rq); #include extern bool cpuacct_from_cpu; +extern bool cpuacct_mounted; /* track cpu usage of a group of tasks and its child groups */ struct cpuacct { @@ -921,6 +922,8 @@ extern void __cpuacct_charge(struct task_struct *tsk, u64 cputime); static inline void cpuacct_charge(struct task_struct *tsk, u64 cputime) { + if (unlikely(!cpuacct_mounted)) + return; #ifdef CONFIG_CGROUP_FORCE_COMOUNT_CPU if (likely(!cpuacct_from_cpu)) return; -- 1.7.11.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757254Ab2IDOWA (ORCPT ); Tue, 4 Sep 2012 10:22:00 -0400 Received: from mailhub.sw.ru ([195.214.232.25]:6515 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757157Ab2IDOV5 (ORCPT ); Tue, 4 Sep 2012 10:21:57 -0400 From: Glauber Costa To: Cc: , , 
davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, tj@kernel.org, Glauber Costa Subject: [RFC 1/5] cgroup: allow some comounts to be forced. Date: Tue, 4 Sep 2012 18:18:16 +0400 Message-Id: <1346768300-10282-2-git-send-email-glommer@parallels.com> X-Mailer: git-send-email 1.7.11.4 In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org One of the pain points we have today with cgroups, is the excessive flexibility coming from the fact that controllers can be mounted at will, without any relationship with each other. Although this is nice in principle, this comes with a cost that is not always welcome in practice. The very fact of this being possible is already enough to trigger those costs. We cannot assume a common hierarchy between controllers, and then hierarchy walks have to be done more than once. This happens in hotpaths as well. This patch introduces a Kconfig option, default n, that will force some controllers to be comounted. After some time, we may be able to deprecate this mode of operation. 
Signed-off-by: Glauber Costa CC: Dave Jones CC: Ben Hutchings CC: Peter Zijlstra CC: Paul Turner CC: Lennart Poettering CC: Kay Sievers CC: Tejun Heo --- include/linux/cgroup.h | 6 ++++++ init/Kconfig | 4 ++++ kernel/cgroup.c | 29 ++++++++++++++++++++++++++++- 3 files changed, 38 insertions(+), 1 deletion(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index d3f5fba..f986ad1 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -531,6 +531,12 @@ struct cgroup_subsys { /* should be defined only by modular subsystems */ struct module *module; + +#ifdef CONFIG_CGROUP_FORCE_COMOUNT + /* List of groups that we must be comounted with */ + int comounts; + int must_comount[3]; +#endif }; #define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys; diff --git a/init/Kconfig b/init/Kconfig index f64f888..d7d693d 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -680,6 +680,10 @@ config CGROUP_CPUACCT Provides a simple Resource Controller for monitoring the total CPU consumed by the tasks in a cgroup. +config CGROUP_FORCE_COMOUNT + bool + default n + config RESOURCE_COUNTERS bool "Resource counters" help diff --git a/kernel/cgroup.c b/kernel/cgroup.c index b303dfc..137ac62 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1058,6 +1058,33 @@ static int rebind_subsystems(struct cgroupfs_root *root, if (root->number_of_cgroups > 1) return -EBUSY; +#ifdef CONFIG_CGROUP_FORCE_COMOUNT + /* + * Some subsystems should not be allowed to be freely mounted in + * separate hierarchies. They may not be present, but if they are, they + * should be together. For compatibility with older kernels, we'll allow + * this to live inside a separate Kconfig option. Each subsys will be + * able to tell us which other subsys it expects to be mounted with. + * + * We do a separate path for this, to avoid unwinding our modifications + * in case of an error. 
+ */ + for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { + unsigned long bit = 1UL << i; + int j; + + if (!(bit & added_bits)) + continue; + + for (j = 0; j < subsys[i]->comounts; j++) { + int comount_id = subsys[i]->must_comount[j]; + struct cgroup_subsys *ss = subsys[comount_id]; + if ((ss->root != &rootnode) && (ss->root != root)) + return -EINVAL; + } + } +#endif + /* Process each subsystem */ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { struct cgroup_subsys *ss = subsys[i]; @@ -1634,7 +1661,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, goto unlock_drop; ret = rebind_subsystems(root, root->subsys_bits); - if (ret == -EBUSY) { + if ((ret == -EBUSY) || (ret == -EINVAL)) { free_cg_links(&tmp_cg_links); goto unlock_drop; } -- 1.7.11.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932764Ab2IDVqK (ORCPT ); Tue, 4 Sep 2012 17:46:10 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:38563 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753769Ab2IDVqH (ORCPT ); Tue, 4 Sep 2012 17:46:07 -0400 Date: Tue, 4 Sep 2012 14:46:02 -0700 From: Tejun Heo To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups. Message-ID: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1346768300-10282-1-git-send-email-glommer@parallels.com> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, Glauber. 
On Tue, Sep 04, 2012 at 06:18:15PM +0400, Glauber Costa wrote: > As we have been extensively discussing, the cost and pain points for cgroups > come from many places. But at least one of those is the arbitrary nature of > hierarchies. Many people, including at least Tejun and me would like this to go > away altogether. Problem so far, is breaking compatibility with existing setups > > I am proposing here a default-n Kconfig option that will guarantee that the cpu > cgroups (for now) will be comounted. I started with them because the > cpu/cpuacct division is clearly the worst offender. Also, the default-n is here > so distributions will have time to adapt: Forcing this flag to be on without > userspace changes will just lead to cgroups failing to mount, which we don't > want. > > Although I've tested it and it works, I haven't compile-tested all possible > config combinations. So this is mostly for your eyes. If this gets traction, > I'll submit it properly, along with any changes that you might require. As I said during the discussion, I'm skeptical about how useful this is. This can't nudge existing users in any meaningfully gradual way. Kconfig doesn't make it any better. It's still an abrupt behavior change when seen from userland. Also, I really don't see much point in enforcing this almost arbitrary grouping of controllers. It doesn't simplify anything and using cpuacct in a more granular way than cpu actually is one of the better justified uses of multiple hierarchies. Also, what about memcg and blkcg? Do they *really* coincide? Note that both blkcg and memcg involve non-trivial overhead and blkcg is essentially broken hierarchy-wise. Currently, from a userland-visible behavior POV, the crazy parts are 1. The flat hierarchy thing. This just should go away. 2. Orthogonal multiple hierarchies. I think we agree that #1 should go away one way or the other. I *really* wanna get rid of #2 but am not sure how.
I'll give it another stab once the writeback thing is resolved. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755657Ab2IEIOs (ORCPT ); Wed, 5 Sep 2012 04:14:48 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:49178 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751164Ab2IEIOp (ORCPT ); Wed, 5 Sep 2012 04:14:45 -0400 Date: Wed, 5 Sep 2012 01:14:39 -0700 From: Tejun Heo To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups. Message-ID: <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5047074D.1030104@parallels.com> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, Glauber. On Wed, Sep 05, 2012 at 12:03:25PM +0400, Glauber Costa wrote: > The goal here is to have distributions to do it, because they tend to > have a well defined lifecycle management, much more than upstream. Whoever > sets this option, can coordinate with upstream. Distros can just co-mount them during boot. What's the point of the config options? > > Also, I really don't see much point in enforcing this almost arbitrary > > grouping of controllers. It doesn't simplify anything and using > > cpuacct in more granular way than cpu actually is one of the better > > justified use of multiple hierarchies. Also, what about memcg and > > blkcg? Do they *really* coincide? 
Note that both blkcg and memcg > > involve non-trivial overhead and blkcg is essentially broken > > hierarchy-wise. > > Where did I mention memcg or blkcg in this patch ? Differing hierarchies in memcg and blkcg currently is the most prominent case where the intersection in writeback is problematic and your proposed solution doesn't help one way or the other. What's the point? Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753404Ab2IEIGq (ORCPT ); Wed, 5 Sep 2012 04:06:46 -0400 Received: from mx2.parallels.com ([64.131.90.16]:56954 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750765Ab2IEIGn (ORCPT ); Wed, 5 Sep 2012 04:06:43 -0400 Message-ID: <5047074D.1030104@parallels.com> Date: Wed, 5 Sep 2012 12:03:25 +0400 From: Glauber Costa User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0 MIME-Version: 1.0 To: Tejun Heo CC: , , , , , , , , Subject: Re: [RFC 0/5] forced comounts for cgroups. References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/05/2012 01:46 AM, Tejun Heo wrote: > Hello, Glauber. > > On Tue, Sep 04, 2012 at 06:18:15PM +0400, Glauber Costa wrote: >> As we have been extensively discussing, the cost and pain points for cgroups >> come from many places. But at least one of those is the arbitrary nature of >> hierarchies. Many people, including at least Tejun and me would like this to go >> away altogether. 
Problem so far, is breaking compatiblity with existing setups >> >> I am proposing here a default-n Kconfig option that will guarantee that the cpu >> cgroups (for now) will be comounted. I started with them because the >> cpu/cpuacct division is clearly the worst offender. Also, the default-n is here >> so distributions will have time to adapt: Forcing this flag to be on without >> userspace changes will just lead to cgroups failing to mount, which we don't >> want. >> >> Although I've tested it and it works, I haven't compile-tested all possible >> config combinations. So this is mostly for your eyes. If this gets traction, >> I'll submit it properly, along with any changes that you might require. > > As I said during the discussion, I'm skeptical about how useful this > is. This can't nudge existing users in any meaningfully gradual way. > Kconfig doesn't make it any better. It's still an abrupt behavior > change when seen from userland. > The goal here is to have distributions to do it, because they tend to have a well defined lifecycle management, much more than upstream. Whoever sets this option, can coordinate with upstream. Aside from enforcing it, we can pretty much warn() as well, to direct people towards flipping the switch. > Also, I really don't see much point in enforcing this almost arbitrary > grouping of controllers. It doesn't simplify anything and using > cpuacct in more granular way than cpu actually is one of the better > justified use of multiple hierarchies. Also, what about memcg and > blkcg? Do they *really* coincide? Note that both blkcg and memcg > involve non-trivial overhead and blkcg is essentially broken > hierarchy-wise. > Where did I mention memcg or blkcg in this patch ? 
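For readers following the thread, the enforcement being argued about (a mount that would split cpu and cpuacct failing instead of succeeding) can be sketched as a small userspace model of the check that patch 1/5 adds to rebind_subsystems(). This is only a sketch under stated assumptions, not the kernel code: hierarchies are plain integers, UNMOUNTED stands in for the kernel's dummy rootnode, try_mount() and the three-entry table are hypothetical names made up for illustration, and the cpu entry listing cpuacct as a required comount is an assumption (the hunks quoted here only show cpuacct listing cpu).

```c
#include <assert.h>
#include <errno.h>

#define SUBSYS_COUNT 3
#define MAX_COMOUNTS 3
#define UNMOUNTED    0	/* hypothetical stand-in for the kernel's dummy rootnode */

enum subsys_id { CPU_ID, CPUACCT_ID, MEMORY_ID };

/* Simplified model of the fields patch 1/5 adds to struct cgroup_subsys. */
struct subsys {
	const char *name;
	int root;			/* id of the hierarchy we are bound to */
	int comounts;			/* number of valid entries in must_comount */
	int must_comount[MAX_COMOUNTS];	/* subsystems that must share our hierarchy */
};

static struct subsys subsys[SUBSYS_COUNT] = {
	/* Assumption: cpu names cpuacct as well as the other way around. */
	[CPU_ID]     = { "cpu",     UNMOUNTED, 1, { CPUACCT_ID } },
	[CPUACCT_ID] = { "cpuacct", UNMOUNTED, 1, { CPU_ID } },
	[MEMORY_ID]  = { "memory",  UNMOUNTED, 0, { 0 } },
};

/*
 * Bind every subsystem whose bit is set in added_bits to hierarchy `root`
 * (any non-zero id). Returns 0 on success, or -EINVAL if a required
 * comount partner is already bound to a different hierarchy.
 */
static int try_mount(int root, unsigned long added_bits)
{
	int i, j;

	/* First pass: validate only, so a failure needs no unwinding. */
	for (i = 0; i < SUBSYS_COUNT; i++) {
		if (!(added_bits & (1UL << i)))
			continue;

		for (j = 0; j < subsys[i].comounts; j++) {
			const struct subsys *ss = &subsys[subsys[i].must_comount[j]];

			/* The partner may be absent, but not mounted elsewhere. */
			if (ss->root != UNMOUNTED && ss->root != root)
				return -EINVAL;
		}
	}

	/* Second pass: all checks passed, bind the requested subsystems. */
	for (i = 0; i < SUBSYS_COUNT; i++)
		if (added_bits & (1UL << i))
			subsys[i].root = root;

	return 0;
}
```

In this model, mounting cpu alone on hierarchy 1 succeeds, a later attempt to mount cpuacct on hierarchy 2 returns -EINVAL, and mounting cpuacct on hierarchy 1 succeeds. That mirrors the behavior described in the cover letter: a userspace that still mounts the two controllers separately would see the second mount fail.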
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757594Ab2IEIUe (ORCPT ); Wed, 5 Sep 2012 04:20:34 -0400 Received: from mx2.parallels.com ([64.131.90.16]:49296 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753117Ab2IEIUb (ORCPT ); Wed, 5 Sep 2012 04:20:31 -0400 Message-ID: <50470A87.1040701@parallels.com> Date: Wed, 5 Sep 2012 12:17:11 +0400 From: Glauber Costa User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0 MIME-Version: 1.0 To: Tejun Heo CC: , , , , , , , , Subject: Re: [RFC 0/5] forced comounts for cgroups. References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/05/2012 12:14 PM, Tejun Heo wrote: > Hello, Glauber. > > On Wed, Sep 05, 2012 at 12:03:25PM +0400, Glauber Costa wrote: >> The goal here is to have distributions to do it, because they tend to >> have a well defined lifecycle management, much more than upstream. Whoever >> sets this option, can coordinate with upstream. > > Distros can just co-mount them during boot. What's the point of the > config options? > Pretty simple. The kernel can't assume the distro did. And then we still need to pay a stupid big price in the scheduler. After this patchset, We can assume this. And cpuusage can totally be derived from the cpu cgroup. Because much more than "they can comount", we can assume they did. >>> Also, I really don't see much point in enforcing this almost arbitrary >>> grouping of controllers. 
It doesn't simplify anything and using >>> cpuacct in more granular way than cpu actually is one of the better >>> justified use of multiple hierarchies. Also, what about memcg and >>> blkcg? Do they *really* coincide? Note that both blkcg and memcg >>> involve non-trivial overhead and blkcg is essentially broken >>> hierarchy-wise. >> >> Where did I mention memcg or blkcg in this patch ? > > Differing hierarchies in memcg and blkcg currently is the most > prominent case where the intersection in writeback is problematic and > your proposed solution doesn't help one way or the other. What's the > point? > The point is that I am focusing at one problem at a time. But FWIW, I don't see why memcg/blkcg can't use a step just like this one in a separate pass. If the goal is comounting them eventually, at some point when the issues are sorted out, just do it. Get a switch like this one, and then you will start being able to assume a lot of things in the code. Miracles can happen. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757487Ab2IEIaR (ORCPT ); Wed, 5 Sep 2012 04:30:17 -0400 Received: from mail-pz0-f46.google.com ([209.85.210.46]:41521 "EHLO mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750999Ab2IEI3w (ORCPT ); Wed, 5 Sep 2012 04:29:52 -0400 Date: Wed, 5 Sep 2012 01:29:47 -0700 From: Tejun Heo To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Message-ID: <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50470A87.1040701@parallels.com> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, Glauber. On Wed, Sep 05, 2012 at 12:17:11PM +0400, Glauber Costa wrote: > > Distros can just co-mount them during boot. What's the point of the > > config options? > > Pretty simple. The kernel can't assume the distro did. And then we still > need to pay a stupid big price in the scheduler. > > After this patchset, We can assume this. And cpuusage can totally be > derived from the cpu cgroup. Because much more than "they can comount", > we can assume they did. As long as cpuacct and cpu are separate, I think it makes sense to assume that they at least could be at different granularity. As for optimization for co-mounted case, if that is *really* necessary, couldn't it be done dynamically? It's not like CONFIG_XXX blocks are pretty things and they're worse for runtime code path coverage. > > Differing hierarchies in memcg and blkcg currently is the most > > prominent case where the intersection in writeback is problematic and > > your proposed solution doesn't help one way or the other. What's the > > point? > > The point is that I am focusing at one problem at a time. But FWIW, I > don't see why memcg/blkcg can't use a step just like this one in a > separate pass. > > If the goal is comounting them eventually, at some point when the issues > are sorted out, just do it. 
Get a switch like this one, and then you > will start being able to assume a lot of things in the code. Miracles > can happen. The problem is that I really don't see how this leads to where we eventually wanna be. Orthogonal hierarchies are bad because, * It complicates the code. This doesn't really help there much. * Intersections between controllers are cumbersome to handle. Again, this doesn't help much. And this restricts the only valid use case for multiple hierarchies which is applying differing level of granularity depending on controllers. So, I don't know. Doesn't seem like a good idea to me. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757303Ab2IEIif (ORCPT ); Wed, 5 Sep 2012 04:38:35 -0400 Received: from mx2.parallels.com ([64.131.90.16]:44125 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751156Ab2IEIic (ORCPT ); Wed, 5 Sep 2012 04:38:32 -0400 Message-ID: <50470EBF.9070109@parallels.com> Date: Wed, 5 Sep 2012 12:35:11 +0400 From: Glauber Costa User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0 MIME-Version: 1.0 To: Tejun Heo CC: , , , , , , , , Subject: Re: [RFC 0/5] forced comounts for cgroups. References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/05/2012 12:29 PM, Tejun Heo wrote: > Hello, Glauber. 
> > On Wed, Sep 05, 2012 at 12:17:11PM +0400, Glauber Costa wrote: >>> Distros can just co-mount them during boot. What's the point of the >>> config options? >> >> Pretty simple. The kernel can't assume the distro did. And then we still >> need to pay a stupid big price in the scheduler. >> >> After this patchset, We can assume this. And cpuusage can totally be >> derived from the cpu cgroup. Because much more than "they can comount", >> we can assume they did. > > As long as cpuacct and cpu are separate, I think it makes sense to > assume that they at least could be at different granularity. If they are comounted, and more: forcibly comounted, I don't see how to call them separate. At the very best, they are this way for compatibility purposes only, to lay a path that would allow us to get rid of the separation eventually. > As for > optimization for co-mounted case, if that is *really* necessary, > couldn't it be done dynamically? It's not like CONFIG_XXX blocks are > pretty things and they're worse for runtime code path coverage. > I've done it dynamically, as you know. But if you think that complicated the code less than this, we're operating by very different standards... CONFIG options can make the code uglier, but the result is a lot more predictable. It also guarantees that no state changes will happen during the lifecycle of the machine. Doing it dynamically makes the code prettier, but still very large, and prone to subtle bugs, as we've already seen in practice. >>> Differing hierarchies in memcg and blkcg currently is the most >>> prominent case where the intersection in writeback is problematic and >>> your proposed solution doesn't help one way or the other. What's the >>> point? >> >> The point is that I am focusing at one problem at a time. But FWIW, I >> don't see why memcg/blkcg can't use a step just like this one in a >> separate pass. >> >> If the goal is comounting them eventually, at some point when the issues >> are sorted out, just do it.
Get a switch like this one, and then you >> will start being able to assume a lot of things in the code. Miracles >> can happen. > > The problem is that I really don't see how this leads to where we > eventually wanna be. Orthogonal hierarchies are bad because, > > * It complicates the code. This doesn't really help there much. > Way I see it, it is the price we pay for having screwed up before. And Kconfig options don't necessarily complicate the code. They make it bigger, and possibly slightly harder to follow. But I myself > * Intersections between controllers are cumbersome to handle. Again, > this doesn't help much. > They are only cumbersome because we can't assume anything. The cpuacct is the perfect example. Once we can start assuming, they become a lot less so. > And this restricts the only valid use case for multiple hierarchies > which is applying differing level of granularity depending on > controllers. So, I don't know. Doesn't seem like a good idea to me. > > Thanks. > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758133Ab2IEIrv (ORCPT ); Wed, 5 Sep 2012 04:47:51 -0400 Received: from mail-pz0-f46.google.com ([209.85.210.46]:52663 "EHLO mail-pz0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756792Ab2IEIrp (ORCPT ); Wed, 5 Sep 2012 04:47:45 -0400 Date: Wed, 5 Sep 2012 01:47:40 -0700 From: Tejun Heo To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups.
Message-ID: <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50470EBF.9070109@parallels.com> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, Glauber. On Wed, Sep 05, 2012 at 12:35:11PM +0400, Glauber Costa wrote: > > As long as cpuacct and cpu are separate, I think it makes sense to > > assume that they at least could be at different granularity. > > If they are comounted, and more: forceably comounted, I don't see how to > call them separate. At the very best, they are this way for > compatibility purposes only, to lay a path that would allow us to get > rid of the separation eventually. I think this is where we disagree. I didn't mean that all controllers should be using exactly the same hierarchy when I was talking about unified hierarchy. I do think it's useful and maybe even essential to allow differing levels of granularity. cpu and cpuacct could be a valid example for this. Likely blkcg and memcg too. So, I think it's desirable for all controllers to be able to handle hierarchies the same way and to have the ability to tag something as belonging to certain group in the hierarchy for all controllers but I don't think it's desirable or feasible to require all of them to follow exactly the same grouping at all levels. Thanks. 
-- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757952Ab2IEI6n (ORCPT ); Wed, 5 Sep 2012 04:58:43 -0400 Received: from mx2.parallels.com ([64.131.90.16]:60065 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751328Ab2IEI6m (ORCPT ); Wed, 5 Sep 2012 04:58:42 -0400 Message-ID: <50471379.3060603@parallels.com> Date: Wed, 5 Sep 2012 12:55:21 +0400 From: Glauber Costa User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0 MIME-Version: 1.0 To: Tejun Heo CC: , , , , , , , , Subject: Re: [RFC 0/5] forced comounts for cgroups. References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/05/2012 12:47 PM, Tejun Heo wrote: > Hello, Glauber. > > On Wed, Sep 05, 2012 at 12:35:11PM +0400, Glauber Costa wrote: >>> As long as cpuacct and cpu are separate, I think it makes sense to >>> assume that they at least could be at different granularity. >> >> If they are comounted, and more: forceably comounted, I don't see how to >> call them separate. At the very best, they are this way for >> compatibility purposes only, to lay a path that would allow us to get >> rid of the separation eventually. > > I think this is where we disagree. 
I didn't mean that all controllers > should be using exactly the same hierarchy when I was talking about > unified hierarchy. I do think it's useful and maybe even essential to > allow differing levels of granularity. cpu and cpuacct could be a > valid example for this. Likely blkcg and memcg too. > > So, I think it's desirable for all controllers to be able to handle > hierarchies the same way and to have the ability to tag something as > belonging to certain group in the hierarchy for all controllers but I > don't think it's desirable or feasible to require all of them to > follow exactly the same grouping at all levels. > By "different levels of granularity" do you mean having just a subset of them turned on at a particular place? If yes, having them guaranteed to be comounted still looks to me like a good first step. A natural follow-up would be to turn them on/off on a per-group basis. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758182Ab2IEJG6 (ORCPT ); Wed, 5 Sep 2012 05:06:58 -0400 Received: from merlin.infradead.org ([205.233.59.134]:55075 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750765Ab2IEJGz convert rfc822-to-8bit (ORCPT ); Wed, 5 Sep 2012 05:06:55 -0400 Message-ID: <1346835993.2600.9.camel@twins> Subject: Re: [RFC 0/5] forced comounts for cgroups.
From: Peter Zijlstra To: Tejun Heo Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Date: Wed, 05 Sep 2012 11:06:33 +0200 In-Reply-To: <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7BIT X-Mailer: Evolution 3.2.2- Mime-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2012-09-05 at 01:47 -0700, Tejun Heo wrote: > I think this is where we disagree. I didn't mean that all controllers > should be using exactly the same hierarchy when I was talking about > unified hierarchy. I do think it's useful and maybe even essential to > allow differing levels of granularity. cpu and cpuacct could be a > valid example for this. Likely blkcg and memcg too. > > So, I think it's desirable for all controllers to be able to handle > hierarchies the same way and to have the ability to tag something as > belonging to certain group in the hierarchy for all controllers but I > don't think it's desirable or feasible to require all of them to > follow exactly the same grouping at all levels. *confused* I always thought that was exactly what you meant with unified hierarchy. Doing all this runtime is just going to make the mess even bigger, because now we have to deal with even more stupid cases. 
So either we go and try to contain this mess as proposed by Glauber or we go delete controllers.. I've had it with this crap. --- Documentation/cgroups/00-INDEX | 2 - Documentation/cgroups/cpuacct.txt | 49 -------- include/linux/cgroup_subsys.h | 6 - init/Kconfig | 6 - kernel/sched/core.c | 247 -------------------------------------- kernel/sched/fair.c | 1 - kernel/sched/rt.c | 1 - kernel/sched/sched.h | 45 ------- kernel/sched/stop_task.c | 1 - 9 files changed, 358 deletions(-) diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX index 3f58fa3..9f100cc 100644 --- a/Documentation/cgroups/00-INDEX +++ b/Documentation/cgroups/00-INDEX @@ -2,8 +2,6 @@ - this file cgroups.txt - Control Groups definition, implementation details, examples and API. -cpuacct.txt - - CPU Accounting Controller; account CPU usage for groups of tasks. cpusets.txt - documents the cpusets feature; assign CPUs and Mem to a set of tasks. devices.txt diff --git a/Documentation/cgroups/cpuacct.txt b/Documentation/cgroups/cpuacct.txt deleted file mode 100644 index 9d73cc0..0000000 --- a/Documentation/cgroups/cpuacct.txt +++ /dev/null @@ -1,49 +0,0 @@ -CPU Accounting Controller -------------------------- - -The CPU accounting controller is used to group tasks using cgroups and -account the CPU usage of these groups of tasks. - -The CPU accounting controller supports multi-hierarchy groups. An accounting -group accumulates the CPU usage of all of its child groups and the tasks -directly present in its group. - -Accounting groups can be created by first mounting the cgroup filesystem. - -# mount -t cgroup -ocpuacct none /sys/fs/cgroup - -With the above step, the initial or the parent accounting group becomes -visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in -the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup. 
-/sys/fs/cgroup/cpuacct.usage gives the CPU time (in nanoseconds) obtained -by this group which is essentially the CPU time obtained by all the tasks -in the system. - -New accounting groups can be created under the parent group /sys/fs/cgroup. - -# cd /sys/fs/cgroup -# mkdir g1 -# echo $$ > g1/tasks - -The above steps create a new group g1 and move the current shell -process (bash) into it. CPU time consumed by this bash and its children -can be obtained from g1/cpuacct.usage and the same is accumulated in -/sys/fs/cgroup/cpuacct.usage also. - -cpuacct.stat file lists a few statistics which further divide the -CPU time obtained by the cgroup into user and system times. Currently -the following statistics are supported: - -user: Time spent by tasks of the cgroup in user mode. -system: Time spent by tasks of the cgroup in kernel mode. - -user and system are in USER_HZ unit. - -cpuacct controller uses percpu_counter interface to collect user and -system times. This has two side effects: - -- It is theoretically possible to see wrong values for user and system times. - This is because percpu_counter_read() on 32bit systems isn't safe - against concurrent writes. -- It is possible to see slightly outdated values for user and system times - due to the batch processing nature of percpu_counter. 
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h index dfae957..73b7cc1 100644 --- a/include/linux/cgroup_subsys.h +++ b/include/linux/cgroup_subsys.h @@ -25,12 +25,6 @@ SUBSYS(cpu_cgroup) /* */ -#ifdef CONFIG_CGROUP_CPUACCT -SUBSYS(cpuacct) -#endif - -/* */ - #ifdef CONFIG_MEMCG SUBSYS(mem_cgroup) #endif diff --git a/init/Kconfig b/init/Kconfig index af6c7f8..3ac9e1c 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -674,12 +674,6 @@ config PROC_PID_CPUSET depends on CPUSETS default y -config CGROUP_CPUACCT - bool "Simple CPU accounting cgroup subsystem" - help - Provides a simple Resource Controller for monitoring the - total CPU consumed by the tasks in a cgroup. - config RESOURCE_COUNTERS bool "Resource counters" help diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 4376c9f..47c7cdb 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2809,18 +2809,9 @@ unsigned long long task_sched_runtime(struct task_struct *p) return ns; } -#ifdef CONFIG_CGROUP_CPUACCT -struct cgroup_subsys cpuacct_subsys; -struct cpuacct root_cpuacct; -#endif - static inline void task_group_account_field(struct task_struct *p, int index, u64 tmp) { -#ifdef CONFIG_CGROUP_CPUACCT - struct kernel_cpustat *kcpustat; - struct cpuacct *ca; -#endif /* * Since all updates are sure to touch the root cgroup, we * get ourselves ahead and touch it first. 
If the root cgroup @@ -2828,20 +2819,6 @@ static inline void task_group_account_field(struct task_struct *p, int index, * */ __get_cpu_var(kernel_cpustat).cpustat[index] += tmp; - -#ifdef CONFIG_CGROUP_CPUACCT - if (unlikely(!cpuacct_subsys.active)) - return; - - rcu_read_lock(); - ca = task_ca(p); - while (ca && (ca != &root_cpuacct)) { - kcpustat = this_cpu_ptr(ca->cpustat); - kcpustat->cpustat[index] += tmp; - ca = parent_ca(ca); - } - rcu_read_unlock(); -#endif } @@ -7351,12 +7328,6 @@ void __init sched_init(void) #endif /* CONFIG_CGROUP_SCHED */ -#ifdef CONFIG_CGROUP_CPUACCT - root_cpuacct.cpustat = &kernel_cpustat; - root_cpuacct.cpuusage = alloc_percpu(u64); - /* Too early, not expected to fail */ - BUG_ON(!root_cpuacct.cpuusage); -#endif for_each_possible_cpu(i) { struct rq *rq; @@ -8409,221 +8380,3 @@ struct cgroup_subsys cpu_cgroup_subsys = { }; #endif /* CONFIG_CGROUP_SCHED */ - -#ifdef CONFIG_CGROUP_CPUACCT - -/* - * CPU accounting code for task groups. - * - * Based on the work by Paul Menage (menage@google.com) and Balbir Singh - * (balbir@in.ibm.com). 
- */ - -/* create a new cpu accounting group */ -static struct cgroup_subsys_state *cpuacct_create(struct cgroup *cgrp) -{ - struct cpuacct *ca; - - if (!cgrp->parent) - return &root_cpuacct.css; - - ca = kzalloc(sizeof(*ca), GFP_KERNEL); - if (!ca) - goto out; - - ca->cpuusage = alloc_percpu(u64); - if (!ca->cpuusage) - goto out_free_ca; - - ca->cpustat = alloc_percpu(struct kernel_cpustat); - if (!ca->cpustat) - goto out_free_cpuusage; - - return &ca->css; - -out_free_cpuusage: - free_percpu(ca->cpuusage); -out_free_ca: - kfree(ca); -out: - return ERR_PTR(-ENOMEM); -} - -/* destroy an existing cpu accounting group */ -static void cpuacct_destroy(struct cgroup *cgrp) -{ - struct cpuacct *ca = cgroup_ca(cgrp); - - free_percpu(ca->cpustat); - free_percpu(ca->cpuusage); - kfree(ca); -} - -static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu) -{ - u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - u64 data; - -#ifndef CONFIG_64BIT - /* - * Take rq->lock to make 64-bit read safe on 32-bit platforms. - */ - raw_spin_lock_irq(&cpu_rq(cpu)->lock); - data = *cpuusage; - raw_spin_unlock_irq(&cpu_rq(cpu)->lock); -#else - data = *cpuusage; -#endif - - return data; -} - -static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) -{ - u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - -#ifndef CONFIG_64BIT - /* - * Take rq->lock to make 64-bit write safe on 32-bit platforms. 
- */ - raw_spin_lock_irq(&cpu_rq(cpu)->lock); - *cpuusage = val; - raw_spin_unlock_irq(&cpu_rq(cpu)->lock); -#else - *cpuusage = val; -#endif -} - -/* return total cpu usage (in nanoseconds) of a group */ -static u64 cpuusage_read(struct cgroup *cgrp, struct cftype *cft) -{ - struct cpuacct *ca = cgroup_ca(cgrp); - u64 totalcpuusage = 0; - int i; - - for_each_present_cpu(i) - totalcpuusage += cpuacct_cpuusage_read(ca, i); - - return totalcpuusage; -} - -static int cpuusage_write(struct cgroup *cgrp, struct cftype *cftype, - u64 reset) -{ - struct cpuacct *ca = cgroup_ca(cgrp); - int err = 0; - int i; - - if (reset) { - err = -EINVAL; - goto out; - } - - for_each_present_cpu(i) - cpuacct_cpuusage_write(ca, i, 0); - -out: - return err; -} - -static int cpuacct_percpu_seq_read(struct cgroup *cgroup, struct cftype *cft, - struct seq_file *m) -{ - struct cpuacct *ca = cgroup_ca(cgroup); - u64 percpu; - int i; - - for_each_present_cpu(i) { - percpu = cpuacct_cpuusage_read(ca, i); - seq_printf(m, "%llu ", (unsigned long long) percpu); - } - seq_printf(m, "\n"); - return 0; -} - -static const char *cpuacct_stat_desc[] = { - [CPUACCT_STAT_USER] = "user", - [CPUACCT_STAT_SYSTEM] = "system", -}; - -static int cpuacct_stats_show(struct cgroup *cgrp, struct cftype *cft, - struct cgroup_map_cb *cb) -{ - struct cpuacct *ca = cgroup_ca(cgrp); - int cpu; - s64 val = 0; - - for_each_online_cpu(cpu) { - struct kernel_cpustat *kcpustat = per_cpu_ptr(ca->cpustat, cpu); - val += kcpustat->cpustat[CPUTIME_USER]; - val += kcpustat->cpustat[CPUTIME_NICE]; - } - val = cputime64_to_clock_t(val); - cb->fill(cb, cpuacct_stat_desc[CPUACCT_STAT_USER], val); - - val = 0; - for_each_online_cpu(cpu) { - struct kernel_cpustat *kcpustat = per_cpu_ptr(ca->cpustat, cpu); - val += kcpustat->cpustat[CPUTIME_SYSTEM]; - val += kcpustat->cpustat[CPUTIME_IRQ]; - val += kcpustat->cpustat[CPUTIME_SOFTIRQ]; - } - - val = cputime64_to_clock_t(val); - cb->fill(cb, cpuacct_stat_desc[CPUACCT_STAT_SYSTEM], val); - - 
return 0; -} - -static struct cftype files[] = { - { - .name = "usage", - .read_u64 = cpuusage_read, - .write_u64 = cpuusage_write, - }, - { - .name = "usage_percpu", - .read_seq_string = cpuacct_percpu_seq_read, - }, - { - .name = "stat", - .read_map = cpuacct_stats_show, - }, - { } /* terminate */ -}; - -/* - * charge this task's execution time to its accounting group. - * - * called with rq->lock held. - */ -void cpuacct_charge(struct task_struct *tsk, u64 cputime) -{ - struct cpuacct *ca; - int cpu; - - if (unlikely(!cpuacct_subsys.active)) - return; - - cpu = task_cpu(tsk); - - rcu_read_lock(); - - ca = task_ca(tsk); - - for (; ca; ca = parent_ca(ca)) { - u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - *cpuusage += cputime; - } - - rcu_read_unlock(); -} - -struct cgroup_subsys cpuacct_subsys = { - .name = "cpuacct", - .create = cpuacct_create, - .destroy = cpuacct_destroy, - .subsys_id = cpuacct_subsys_id, - .base_cftypes = files, -}; -#endif /* CONFIG_CGROUP_CPUACCT */ diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 01d3eda..bff5b6e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -706,7 +706,6 @@ static void update_curr(struct cfs_rq *cfs_rq) struct task_struct *curtask = task_of(curr); trace_sched_stat_runtime(curtask, delta_exec, curr->vruntime); - cpuacct_charge(curtask, delta_exec); account_group_exec_runtime(curtask, delta_exec); } diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 944cb68..8e5805e 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -934,7 +934,6 @@ static void update_curr_rt(struct rq *rq) account_group_exec_runtime(curr, delta_exec); curr->se.exec_start = rq->clock_task; - cpuacct_charge(curr, delta_exec); sched_rt_avg_update(rq, delta_exec); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index f6714d0..00ca3f6 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -833,15 +833,6 @@ static const u32 prio_to_wmult[40] = { /* 15 */ 119304647, 148102320, 186737708, 238609294, 
286331153, }; -/* Time spent by the tasks of the cpu accounting group executing in ... */ -enum cpuacct_stat_index { - CPUACCT_STAT_USER, /* ... user mode */ - CPUACCT_STAT_SYSTEM, /* ... kernel mode */ - - CPUACCT_STAT_NSTATS, -}; - - #define sched_class_highest (&stop_sched_class) #define for_each_class(class) \ for (class = sched_class_highest; class; class = class->next) @@ -881,42 +872,6 @@ extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime extern void update_idle_cpu_load(struct rq *this_rq); -#ifdef CONFIG_CGROUP_CPUACCT -#include -/* track cpu usage of a group of tasks and its child groups */ -struct cpuacct { - struct cgroup_subsys_state css; - /* cpuusage holds pointer to a u64-type object on every cpu */ - u64 __percpu *cpuusage; - struct kernel_cpustat __percpu *cpustat; -}; - -/* return cpu accounting group corresponding to this container */ -static inline struct cpuacct *cgroup_ca(struct cgroup *cgrp) -{ - return container_of(cgroup_subsys_state(cgrp, cpuacct_subsys_id), - struct cpuacct, css); -} - -/* return cpu accounting group to which this task belongs */ -static inline struct cpuacct *task_ca(struct task_struct *tsk) -{ - return container_of(task_subsys_state(tsk, cpuacct_subsys_id), - struct cpuacct, css); -} - -static inline struct cpuacct *parent_ca(struct cpuacct *ca) -{ - if (!ca || !ca->css.cgroup->parent) - return NULL; - return cgroup_ca(ca->css.cgroup->parent); -} - -extern void cpuacct_charge(struct task_struct *tsk, u64 cputime); -#else -static inline void cpuacct_charge(struct task_struct *tsk, u64 cputime) {} -#endif - static inline void inc_nr_running(struct rq *rq) { rq->nr_running++; diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c index da5eb5b..fda1cbe 100644 --- a/kernel/sched/stop_task.c +++ b/kernel/sched/stop_task.c @@ -68,7 +68,6 @@ static void put_prev_task_stop(struct rq *rq, struct task_struct *prev) account_group_exec_runtime(curr, delta_exec); curr->se.exec_start = 
rq->clock_task; - cpuacct_charge(curr, delta_exec); } static void task_tick_stop(struct rq *rq, struct task_struct *curr, int queued) From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758245Ab2IEJHg (ORCPT ); Wed, 5 Sep 2012 05:07:36 -0400 Received: from merlin.infradead.org ([205.233.59.134]:56036 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751031Ab2IEJHe convert rfc822-to-8bit (ORCPT ); Wed, 5 Sep 2012 05:07:34 -0400 Message-ID: <1346836041.2600.10.camel@twins> Subject: Re: [RFC 0/5] forced comounts for cgroups. From: Peter Zijlstra To: Tejun Heo Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Date: Wed, 05 Sep 2012 11:07:21 +0200 In-Reply-To: <1346835993.2600.9.camel@twins> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7BIT X-Mailer: Evolution 3.2.2- Mime-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2012-09-05 at 11:06 +0200, Peter Zijlstra wrote: > > So either we go and try to contain this mess as proposed by Glauber or > we go delete controllers.. I've had it with this crap. > > Glauber, the other approach is sending a patch that doesn't touch cgroup.c but only the controllers and I'll merge it regardless of what tj thinks. We need some movement here. 
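[Editorial note, not part of the original thread.] The cost being argued about here can be modelled in userspace. The sketch below is only an illustration under stated assumptions: `struct group`, `charge_walk()`, `account_separate()` and `account_comounted()` are hypothetical names, and the per-cpu counters and RCU protection of the real `cpuacct_charge()` are left out. It shows why two independently mounted hierarchies imply two parent-pointer walks per accounting event, while a shared (comounted) tree needs one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for one level of a cgroup hierarchy. */
struct group {
	struct group *parent;	/* NULL at the root */
	uint64_t usage;		/* accumulated runtime, in nanoseconds */
};

/*
 * Charge delta to grp and to every ancestor, and return the number of
 * nodes visited. The loop has the same shape as the one in the removed
 * cpuacct_charge(), minus per-cpu counters and RCU.
 */
static unsigned int charge_walk(struct group *grp, uint64_t delta)
{
	unsigned int visited = 0;

	for (; grp; grp = grp->parent) {
		grp->usage += delta;
		visited++;
	}
	return visited;
}

/* Separately mounted controllers: one walk per hierarchy per event. */
static unsigned int account_separate(struct group *cpu_grp,
				     struct group *acct_grp, uint64_t delta)
{
	return charge_walk(cpu_grp, delta) + charge_walk(acct_grp, delta);
}

/* Comounted controllers: both share one tree, so one walk suffices. */
static unsigned int account_comounted(struct group *grp, uint64_t delta)
{
	return charge_walk(grp, delta);
}
```

The returned node count is only a proxy for the pointer chasing Peter complains about; real overhead also depends on cache behaviour, which this model does not capture.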
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758276Ab2IEJHv (ORCPT ); Wed, 5 Sep 2012 05:07:51 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:58695 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751031Ab2IEJHt (ORCPT ); Wed, 5 Sep 2012 05:07:49 -0400 Date: Wed, 5 Sep 2012 02:07:44 -0700 From: Tejun Heo To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups. Message-ID: <20120905090744.GG3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471379.3060603@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50471379.3060603@parallels.com> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, Glauber. On Wed, Sep 05, 2012 at 12:55:21PM +0400, Glauber Costa wrote: > > So, I think it's desirable for all controllers to be able to handle > > hierarchies the same way and to have the ability to tag something as > > belonging to certain group in the hierarchy for all controllers but I > > don't think it's desirable or feasible to require all of them to > > follow exactly the same grouping at all levels. 
> > By "different levels of granularity" do you mean having just a subset of > them turned on at a particular place? Heh, this is tricky to describe and I'm not really following what you mean. They're all on the same tree but a controller should be able to handle a given subtree as a single group. e.g. if you draw the tree, different controllers should be able to draw different enclosing circles and operate on the simplified tree. How flexible that should be, I don't know. Maybe it would be enough to be able to say "treat all children of this node as belonging to this node for controllers X and Y". > If yes, having them guaranteed to be comounted is still perceived by me > as a good first step. A natural following would be to turn them on/off > on a per-group basis. I don't agree with that. If we do it that way, we would lose differing granularity from forcing co-mounting and then restore it later when the subtree handling is implemented. If we can do away with differing granularity, that's fine; otherwise, it doesn't make much sense to remove and then restore it. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758263Ab2IEJKC (ORCPT ); Wed, 5 Sep 2012 05:10:02 -0400 Received: from mx2.parallels.com ([64.131.90.16]:54374 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751263Ab2IEJJ7 (ORCPT ); Wed, 5 Sep 2012 05:09:59 -0400 Message-ID: <5047161F.60503@parallels.com> Date: Wed, 5 Sep 2012 13:06:39 +0400 From: Glauber Costa User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0 MIME-Version: 1.0 To: Tejun Heo CC: , , , , , , , , Subject: Re: [RFC 0/5] forced comounts for cgroups.
References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471379.3060603@parallels.com> <20120905090744.GG3195@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120905090744.GG3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/05/2012 01:07 PM, Tejun Heo wrote: > Hello, Glauber. > > On Wed, Sep 05, 2012 at 12:55:21PM +0400, Glauber Costa wrote: >>> So, I think it's desirable for all controllers to be able to handle >>> hierarchies the same way and to have the ability to tag something as >>> belonging to certain group in the hierarchy for all controllers but I >>> don't think it's desirable or feasible to require all of them to >>> follow exactly the same grouping at all levels. >> >> By "different levels of granularity" do you mean having just a subset of >> them turned on at a particular place? > > Heh, this is tricky to describe and I'm not really following what you > mean. Do we really want to start cleaning up all this by changing the interface to something that is described as "tricky" ? 
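[Editorial note, not part of the original thread.] One possible reading of the "treat all children of this node as belonging to this node for controllers X and Y" idea raised in this exchange is a per-node mask of controllers that stop honoring nesting below that node. The sketch below is an assumption-laden userspace model, not a proposed kernel interface: `struct node`, `flatten_mask`, the `CTRL_*` bits and `effective_group()` are all hypothetical names that appear nowhere in the thread.

```c
#include <stddef.h>

/* Hypothetical controller bits; the thread does not define any. */
enum {
	CTRL_CPU	= 1u << 0,
	CTRL_CPUACCT	= 1u << 1,
};

/* Hypothetical node of a single, shared (comounted) hierarchy. */
struct node {
	struct node *parent;		/* NULL at the root */
	unsigned int flatten_mask;	/* controllers that ignore nesting below this node */
};

/*
 * Return the group that controller ctrl would see for a task sitting in n:
 * the topmost node on the path from n to the root whose mask flattens its
 * subtree for ctrl, or n itself when no such node exists.
 */
static struct node *effective_group(struct node *n, unsigned int ctrl)
{
	struct node *eff = n;
	struct node *p;

	for (p = n; p; p = p->parent)
		if (p->flatten_mask & ctrl)
			eff = p;
	return eff;
}
```

Note that this lookup is itself a walk up the tree; a real design would presumably resolve it once, when a task is attached, rather than on every accounting event, but that is speculation beyond what the thread states.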
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758299Ab2IEJPy (ORCPT ); Wed, 5 Sep 2012 05:15:54 -0400 Received: from mx2.parallels.com ([64.131.90.16]:34437 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750990Ab2IEJPw (ORCPT ); Wed, 5 Sep 2012 05:15:52 -0400 Message-ID: <50471782.6060800@parallels.com> Date: Wed, 5 Sep 2012 13:12:34 +0400 From: Glauber Costa User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0 MIME-Version: 1.0 To: Tejun Heo CC: Peter Zijlstra , , , , , , , , Subject: Re: [RFC 0/5] forced comounts for cgroups. References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/05/2012 01:11 PM, Tejun Heo wrote: > Hello, Peter. > > On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: >> *confused* I always thought that was exactly what you meant with unified >> hierarchy. > > No, I never counted out differing granularity. > Can you elaborate on which interface do you envision to make it work? They will clearly be mounted in the same hierarchy, or as said alternatively, comounted. If you can turn them on/off on a per-subtree basis, which interface exactly do you propose for that? 
Would a pair of cgroup core files like available_controllers and current_controllers, as a lot of drivers do, suffice? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758266Ab2IEJLr (ORCPT ); Wed, 5 Sep 2012 05:11:47 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:63413 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753211Ab2IEJLq (ORCPT ); Wed, 5 Sep 2012 05:11:46 -0400 Date: Wed, 5 Sep 2012 02:11:40 -0700 From: Tejun Heo To: Peter Zijlstra Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups. Message-ID: <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1346835993.2600.9.camel@twins> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, Peter. On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: > *confused* I always thought that was exactly what you meant with unified > hierarchy. No, I never counted out differing granularity. > Doing all this runtime is just going to make the mess even bigger, > because now we have to deal with even more stupid cases.
> > So either we go and try to contain this mess as proposed by Glauber or > we go delete controllers.. I've had it with this crap. If cpuacct can really go away, that's great, but I don't think the problem at hand is unsolvable, so let's not jump it. cpuacct and cpu aren't the only problem cases after all. We need to solve it for other controllers too. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758278Ab2IEJPG (ORCPT ); Wed, 5 Sep 2012 05:15:06 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:59179 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751143Ab2IEJPD (ORCPT ); Wed, 5 Sep 2012 05:15:03 -0400 Date: Wed, 5 Sep 2012 02:14:56 -0700 From: Tejun Heo To: Glauber Costa Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, a.p.zijlstra@chello.nl, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups.
Message-ID: <20120905091456.GI3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471379.3060603@parallels.com> <20120905090744.GG3195@dhcp-172-17-108-109.mtv.corp.google.com> <5047161F.60503@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <5047161F.60503@parallels.com> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 05, 2012 at 01:06:39PM +0400, Glauber Costa wrote: > > Heh, this is tricky to describe and I'm not really following what you > > mean. > > Do we really want to start cleaning up all this by changing the > interface to something that is described as "tricky" ? The concept is not tricky. I just can't find the appropriate words. I *suspect* this can mostly re-use the existing css_set thing. It mostly becomes that css_set belongs to the unified hierarchy rather than each task. The user interface part isn't trivial and maybe "don't nest beyond this level" is the only thing reasonable. Not sure yet whether that would be enough tho. Need to think more about it. Thanks. 
-- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757215Ab2IEJTd (ORCPT ); Wed, 5 Sep 2012 05:19:33 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:54008 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750765Ab2IEJTb (ORCPT ); Wed, 5 Sep 2012 05:19:31 -0400 Date: Wed, 5 Sep 2012 02:19:25 -0700 From: Tejun Heo To: Glauber Costa Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups. Message-ID: <20120905091925.GJ3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50471782.6060800@parallels.com> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 05, 2012 at 01:12:34PM +0400, Glauber Costa wrote: > > No, I never counted out differing granularity. > > Can you elaborate on which interface do you envision to make it work? > They will clearly be mounted in the same hierarchy, or as said > alternatively, comounted. I'm not sure yet. At the simplest, mask of controllers which should honor (or ignore) nesting beyond the node. That should be understandable enough. 
Not sure whether that would be flexible enough yet tho. In the end, they should be comounted but again I don't think enforcing comounting at the moment is a step towards that. It's more like a step sideways. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758527Ab2IEJWZ (ORCPT ); Wed, 5 Sep 2012 05:22:25 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:60002 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758371Ab2IEJWV (ORCPT ); Wed, 5 Sep 2012 05:22:21 -0400 Date: Wed, 5 Sep 2012 02:22:16 -0700 From: Tejun Heo To: Peter Zijlstra Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups. Message-ID: <20120905092216.GK3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <1346836041.2600.10.camel@twins> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1346836041.2600.10.camel@twins> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hey, On Wed, Sep 05, 2012 at 11:07:21AM +0200, Peter Zijlstra wrote: > Glauber, the other approach is sending a patch that doesn't touch > cgroup.c but only the controllers and I'll merge it regardless of what > tj thinks. 
> > We need some movement here. Peter, I don't think the proposed patch is helpful at this point. While movement is necessary, it's not like moving towards any direction is helpful. They might just become another cruft which needs to be maintained. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758324Ab2IEJ1G (ORCPT ); Wed, 5 Sep 2012 05:27:06 -0400 Received: from casper.infradead.org ([85.118.1.10]:50549 "EHLO casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751257Ab2IEJ1E convert rfc822-to-8bit (ORCPT ); Wed, 5 Sep 2012 05:27:04 -0400 Message-ID: <1346837209.2600.14.camel@twins> Subject: Re: [RFC 0/5] forced comounts for cgroups. From: Peter Zijlstra To: Glauber Costa Cc: Tejun Heo , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Date: Wed, 05 Sep 2012 11:26:49 +0200 In-Reply-To: <50471782.6060800@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7BIT X-Mailer: Evolution 3.2.2- Mime-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2012-09-05 at 13:12 +0400, Glauber Costa wrote: > On 09/05/2012 01:11 PM, Tejun Heo wrote: > > Hello, Peter. 
> > > > On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: > >> *confused* I always thought that was exactly what you meant with unified > >> hierarchy. > > > > No, I never counted out differing granularity. > > > > Can you elaborate on which interface do you envision to make it work? > They will clearly be mounted in the same hierarchy, or as said > alternatively, comounted. > > If you can turn them on/off on a per-subtree basis, which interface > exactly do you propose for that? I wouldn't, screw that. That would result in the exact same problem we're trying to fix. I want a single hierarchy walk, that's expensive enough. > Would a pair of cgroup core files like available_controllers and > current_controllers are a lot of drivers do, suffice? No.. its not a 'feature' I care to support for 'my' controllers. I simply don't want to have to do two (or more) hierarchy walks for accounting on every schedule event, all that pointer chasing is stupidly expensive. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758288Ab2IEJcM (ORCPT ); Wed, 5 Sep 2012 05:32:12 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:56278 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757066Ab2IEJcK (ORCPT ); Wed, 5 Sep 2012 05:32:10 -0400 Date: Wed, 5 Sep 2012 02:32:04 -0700 From: Tejun Heo To: Peter Zijlstra Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Message-ID: <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1346835993.2600.9.camel@twins> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hey, again. On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: > Doing all this runtime is just going to make the mess even bigger, > because now we have to deal with even more stupid cases. > > So either we go and try to contain this mess as proposed by Glauber or > we go delete controllers.. I've had it with this crap. cpuacct is rather unique tho. I think it's gonna be silly whether the hierarchy is unified or not. 1. If they always can live on the exact same hierarchy, there's no point in having the two separate. Just merge them. 2. If they need differing levels of granularity, they either need to do it completely separately as they do now or have some form of dynamic optimization if absolutely necesary. So, I think that choice is rather separate from other issues. If cpuacct is gonna be kept, I'd just keep it separate and warn that it incurs extra overhead for the current users if for nothing else. Otherwise, kill it or merge it into cpu. Thanks. 
-- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758241Ab2IEJdn (ORCPT ); Wed, 5 Sep 2012 05:33:43 -0400 Received: from mx2.parallels.com ([64.131.90.16]:53466 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757896Ab2IEJdl (ORCPT ); Wed, 5 Sep 2012 05:33:41 -0400 Message-ID: <50471BAF.2060708@parallels.com> Date: Wed, 5 Sep 2012 13:30:23 +0400 From: Glauber Costa User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0 MIME-Version: 1.0 To: Tejun Heo CC: Peter Zijlstra , , , , , , , , Subject: Re: [RFC 0/5] forced comounts for cgroups. References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <20120905091925.GJ3195@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120905091925.GJ3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/05/2012 01:19 PM, Tejun Heo wrote: > On Wed, Sep 05, 2012 at 01:12:34PM +0400, Glauber Costa wrote: >>> No, I never counted out differing granularity. >> >> Can you elaborate on which interface do you envision to make it work? >> They will clearly be mounted in the same hierarchy, or as said >> alternatively, comounted. > > I'm not sure yet. At the simplest, mask of controllers which should > honor (or ignore) nesting beyond the node. That should be > understandable enough. 
Not sure whether that would be flexible enough > yet tho. In the end, they should be comounted but again I don't think > enforcing comounting at the moment is a step towards that. It's more > like a step sideways. > Tejun, From the code PoV, guaranteed comounting is what allows us to make optimizations. "Maybe comounting" will maybe simplify the interface, but will buy us nothing in the performance level. I am more than happy to respin it with an added interface for masking cgroups, if you believe this is a requirement. But hinting me about what you would like to see on that front would be really helpful. Re-asking my question: cpufreq, clocksources, ftrace, etc, they all use an interface that at this point can be considered quite standard. Applying the same logic, each cgroup would have a pair of files: available_controllers, current_controllers, that you can just control by writing to. This can get slightly funny when we consider the right semantics for the hierarchy, but really, everything will. And it is not like we'll have anything crazy, we just need to tailor it with care. If you think there is any chance of this getting us somewhere, I'll code it. But that would be something to be sent *together* with what I've just done. As I've said, if we can't guarantee the comounting, we would still lose all the optimization opportunities.
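To make the available_controllers / current_controllers pair a bit more concrete, here is a minimal toy model of one possible semantics. It is only a sketch and not taken from the posted patches: the Cgroup class, the rule that a child can only enable what its parent currently has enabled, the rule that a controller cannot be dropped while a child still uses it, and the space-separated write format are all assumptions made for illustration.

```python
class Cgroup:
    """Toy model of one cgroup directory (hypothetical, not kernel code)."""

    def __init__(self, name, parent=None, root_controllers=()):
        self.name = name
        self.parent = parent
        self.children = []
        # Only meaningful on the root: everything that is comounted.
        self.root_controllers = frozenset(root_controllers)
        self.current = frozenset()
        if parent is not None:
            parent.children.append(self)

    @property
    def available(self):
        # Assumption: a child may only use what its parent has enabled.
        if self.parent is None:
            return self.root_controllers
        return self.parent.current

    def read_available_controllers(self):
        return " ".join(sorted(self.available))

    def read_current_controllers(self):
        return " ".join(sorted(self.current))

    def write_current_controllers(self, text):
        wanted = frozenset(text.split())
        missing = wanted - self.available
        if missing:
            raise ValueError("not available: " + " ".join(sorted(missing)))
        # Assumption: refuse to drop a controller a child still has enabled.
        for child in self.children:
            stranded = child.current - wanted
            if stranded:
                raise ValueError(
                    child.name + " still uses: " + " ".join(sorted(stranded)))
        self.current = wanted


root = Cgroup("root", root_controllers=("cpu", "cpuacct", "cpuset"))
root.write_current_controllers("cpu cpuacct")

batch = Cgroup("batch", parent=root)
print(batch.read_available_controllers())  # cpu cpuacct
batch.write_current_controllers("cpu")
print(batch.read_current_controllers())    # cpu
```

In this model every controller lives in the same tree, so a single walk from a cgroup to the root visits all enabled controllers. A subtree can still opt out of one of them, which is the "masking" under discussion.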
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758303Ab2IEJfQ (ORCPT ); Wed, 5 Sep 2012 05:35:16 -0400 Received: from mx2.parallels.com ([64.131.90.16]:53594 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1758099Ab2IEJfO (ORCPT ); Wed, 5 Sep 2012 05:35:14 -0400 Message-ID: <50471C0C.7050600@parallels.com> Date: Wed, 5 Sep 2012 13:31:56 +0400 From: Glauber Costa User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0 MIME-Version: 1.0 To: Peter Zijlstra CC: Tejun Heo , , , , , , , , Subject: Re: [RFC 0/5] forced comounts for cgroups. References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> In-Reply-To: <1346837209.2600.14.camel@twins> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/05/2012 01:26 PM, Peter Zijlstra wrote: > On Wed, 2012-09-05 at 13:12 +0400, Glauber Costa wrote: >> On 09/05/2012 01:11 PM, Tejun Heo wrote: >>> Hello, Peter. >>> >>> On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: >>>> *confused* I always thought that was exactly what you meant with unified >>>> hierarchy. >>> >>> No, I never counted out differing granularity. >>> >> >> Can you elaborate on which interface do you envision to make it work? 
>> They will clearly be mounted in the same hierarchy, or as said >> alternatively, comounted. >> >> If you can turn them on/off on a per-subtree basis, which interface >> exactly do you propose for that? > > I wouldn't, screw that. That would result in the exact same problem > we're trying to fix. I want a single hierarchy walk, that's expensive > enough. > >> Would a pair of cgroup core files like available_controllers and >> current_controllers are a lot of drivers do, suffice? > > No.. its not a 'feature' I care to support for 'my' controllers. > > I simply don't want to have to do two (or more) hierarchy walks for > accounting on every schedule event, all that pointer chasing is stupidly > expensive. > You wouldn't have to do more than one hierarchy walks for that. What Tejun seems to want, is the ability to not have a particular controller at some point in the tree. But if they exist, they are always together. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758447Ab2IEJpa (ORCPT ); Wed, 5 Sep 2012 05:45:30 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:56057 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751358Ab2IEJp0 (ORCPT ); Wed, 5 Sep 2012 05:45:26 -0400 Date: Wed, 5 Sep 2012 02:45:20 -0700 From: Tejun Heo To: Glauber Costa Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Message-ID: <20120905094520.GM3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50471C0C.7050600@parallels.com> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, On Wed, Sep 05, 2012 at 01:31:56PM +0400, Glauber Costa wrote: > > I simply don't want to have to do two (or more) hierarchy walks for > > accounting on every schedule event, all that pointer chasing is stupidly > > expensive. > > You wouldn't have to do more than one hierarchy walks for that. What > Tejun seems to want, is the ability to not have a particular controller > at some point in the tree. But if they exist, they are always together. Nope, as I wrote in the other reply, for cpu and cpuacct, either just merge them or kill cpuacct if you want to avoid silliness from walking multiple times. Does cpuset cause problem in this regard too? Or can it be handled similarly to other controllers? I think the confusion here is that we're talking about two different issues. As for cpuacct, I can see why strict co-mounting can be attractive but then again if that's gonna be required, there's no point in having them separate, right? If that's the way you want it, just trigger WARN_ON() if cpu and cpuacct aren't co-mounted and later on kill cpuacct. Thanks. 
-- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932260Ab2IEJvy (ORCPT ); Wed, 5 Sep 2012 05:51:54 -0400 Received: from mx2.parallels.com ([64.131.90.16]:40947 "EHLO mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753211Ab2IEJvw (ORCPT ); Wed, 5 Sep 2012 05:51:52 -0400 Message-ID: <50471FEE.8060408@parallels.com> Date: Wed, 5 Sep 2012 13:48:30 +0400 From: Glauber Costa User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0 MIME-Version: 1.0 To: Tejun Heo CC: Peter Zijlstra , , , , , , , , Subject: Re: [RFC 0/5] forced comounts for cgroups. References: <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> <20120905094520.GM3195@dhcp-172-17-108-109.mtv.corp.google.com> In-Reply-To: <20120905094520.GM3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/05/2012 01:45 PM, Tejun Heo wrote: > Hello, > > On Wed, Sep 05, 2012 at 01:31:56PM +0400, Glauber Costa wrote: >>> > > I simply don't want to have to do two (or more) hierarchy walks for >>> > > accounting on every schedule event, all that pointer chasing is stupidly >>> > > expensive. >> > >> > You wouldn't have to do more than one hierarchy walks for that. What >> > Tejun seems to want, is the ability to not have a particular controller >> > at some point in the tree. 
But if they exist, they are always together. > Nope, as I wrote in the other reply, Would you mind, then, stopping for a moment and telling us what it is, then, that you envision? From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758325Ab2IEJ4O (ORCPT ); Wed, 5 Sep 2012 05:56:14 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:41489 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751493Ab2IEJ4M (ORCPT ); Wed, 5 Sep 2012 05:56:12 -0400 Date: Wed, 5 Sep 2012 02:56:06 -0700 From: Tejun Heo To: Glauber Costa Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups. Message-ID: <20120905095606.GN3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> <20120905094520.GM3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471FEE.8060408@parallels.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <50471FEE.8060408@parallels.com> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 05, 2012 at 01:48:30PM +0400, Glauber Costa wrote: > > Nope, as I wrote in the other reply, > > Would you mind, then, stopping for a moment and telling us what it is, > then, that you envision? 
I thought I already explained it a couple times in this thread (also in the big thread from several months ago). It's nearing three in the morning here. I'll try to explain it better tomorrow. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758489Ab2IEKFK (ORCPT ); Wed, 5 Sep 2012 06:05:10 -0400 Received: from merlin.infradead.org ([205.233.59.134]:41753 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753245Ab2IEKFH convert rfc822-to-8bit (ORCPT ); Wed, 5 Sep 2012 06:05:07 -0400 Message-ID: <1346839487.2600.24.camel@twins> Subject: Re: [RFC 0/5] forced comounts for cgroups. From: Peter Zijlstra To: Tejun Heo Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Date: Wed, 05 Sep 2012 12:04:47 +0200 In-Reply-To: <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7BIT X-Mailer: Evolution 3.2.2- Mime-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2012-09-05 at 02:32 -0700, Tejun Heo wrote: > Hey, again. 
> > On Wed, Sep 05, 2012 at 11:06:33AM +0200, Peter Zijlstra wrote: > > Doing all this runtime is just going to make the mess even bigger, > > because now we have to deal with even more stupid cases. > > > > So either we go and try to contain this mess as proposed by Glauber or > > we go delete controllers.. I've had it with this crap. > > cpuacct is rather unique tho. I think it's gonna be silly whether the > hierarchy is unified or not. > > 1. If they always can live on the exact same hierarchy, there's no > point in having the two separate. Just merge them. > > 2. If they need differing levels of granularity, they either need to > do it completely separately as they do now or have some form of > dynamic optimization if absolutely necesary. > > So, I think that choice is rather separate from other issues. If > cpuacct is gonna be kept, I'd just keep it separate and warn that it > incurs extra overhead for the current users if for nothing else. > Otherwise, kill it or merge it into cpu. Quite, hence my 'proposal' to remove cpuacct. There was some whining last time Glauber proposed this, but the one whining never convinced and has gone away from Linux, so lets just do this. Lets make cpuacct print a deprecated msg to dmesg for a few releases and make cpu do all this. The co-mounting stuff would have been nice for cpusets as well, knowing all your tasks are affine to a subset of cpus allows for a few optimizations (smaller cpumask iterations), but I guess we'll have to do that dynamically, we'll just have to see how ugly that is. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758236Ab2IEKVC (ORCPT ); Wed, 5 Sep 2012 06:21:02 -0400 Received: from merlin.infradead.org ([205.233.59.134]:36744 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751298Ab2IEKVA (ORCPT ); Wed, 5 Sep 2012 06:21:00 -0400 Subject: Re: [RFC 0/5] forced comounts for cgroups. 
From: Peter Zijlstra To: Glauber Costa Cc: Tejun Heo , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org In-Reply-To: <50471C0C.7050600@parallels.com> References: <1346768300-10282-1-git-send-email-glommer@parallels.com> <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> Content-Type: text/plain; charset="UTF-8" Date: Wed, 05 Sep 2012 12:20:53 +0200 Message-ID: <1346840453.2461.6.camel@laptop> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2012-09-05 at 13:31 +0400, Glauber Costa wrote: > > You wouldn't have to do more than one hierarchy walks for that. What > Tejun seems to want, is the ability to not have a particular controller > at some point in the tree. But if they exist, they are always together. Right, but the accounting is very much tied to the control structures, I suppose we could change that, but my jet-lag addled brain isn't seeing anything particularly nice atm. But I don't really see the point though, this kind of interface would only ever work for the non-controlling and controlling controller combination (confused yet ;-), and I don't think we have many of those. I would really rather see a simplification of the entire cgroup interface space as opposed to making it more complex.
And adding this subtree 'feature' only makes it more complex. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759922Ab2IFUir (ORCPT ); Thu, 6 Sep 2012 16:38:47 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:45392 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759795Ab2IFUio (ORCPT ); Thu, 6 Sep 2012 16:38:44 -0400 Date: Thu, 6 Sep 2012 13:38:39 -0700 From: Tejun Heo To: Peter Zijlstra Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org Subject: Re: [RFC 0/5] forced comounts for cgroups. Message-ID: <20120906203839.GM29092@google.com> References: <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com> <50471782.6060800@parallels.com> <1346837209.2600.14.camel@twins> <50471C0C.7050600@parallels.com> <1346840453.2461.6.camel@laptop> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1346840453.2461.6.camel@laptop> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, Peter, Glauber. (I'm gonna write up cgroup core todos which should explain / address this issue too. ATM I'm a bit overwhelmed with stuff accumulated while traveling.) On Wed, Sep 05, 2012 at 12:20:53PM +0200, Peter Zijlstra wrote: > But I don't really see the point though, this kind of interface would > only ever work for the non-controlling and controlling controller > combination (confused yet ;-), and I don't think we have many of those. It's more than that. 
One may not want to apply the same level of granularity to different resources. e.g. depending on the setup, IOs may need to be further categorized and controlled than memory or vice versa. > I would really rather see a simplification of the entire cgroup > interface space as opposed to making it more complex. And adding this > subtree 'feature' only makes it more complex. It does in the meantime but I think most of it can piggyback on the existing css_set mechanism. No matter what we do, this isn't gonna be a short and easy transition. More than half of the controllers don't even support proper hierarchy yet. We can't move to any kind of unified hierarchy without getting that settled first. I *think* I have a plan which can mostly work now. I'll write more later. Thanks. -- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759977Ab2IFUqu (ORCPT ); Thu, 6 Sep 2012 16:46:50 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:36779 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759858Ab2IFUqs (ORCPT ); Thu, 6 Sep 2012 16:46:48 -0400 Date: Thu, 6 Sep 2012 13:46:42 -0700 From: Tejun Heo To: Peter Zijlstra Cc: Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com, lennart@poettering.net, kay.sievers@vrfy.org, Dhaval Giani , Frederic Weisbecker Subject: Re: [RFC 0/5] forced comounts for cgroups. 
Message-ID: <20120906204642.GN29092@google.com> References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346839487.2600.24.camel@twins> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1346839487.2600.24.camel@twins> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hello, cc'ing Dhaval and Frederic. They were interested in the subject before and Dhaval was pretty vocal about cpuacct having a separate hierarchy (or at least granularity). On Wed, Sep 05, 2012 at 12:04:47PM +0200, Peter Zijlstra wrote: > > cpuacct is rather unique tho. I think it's gonna be silly whether the > > hierarchy is unified or not. > > > > 1. If they always can live on the exact same hierarchy, there's no > > point in having the two separate. Just merge them. > > > > 2. If they need differing levels of granularity, they either need to > > do it completely separately as they do now or have some form of > > dynamic optimization if absolutely necesary. > > > > So, I think that choice is rather separate from other issues. If > > cpuacct is gonna be kept, I'd just keep it separate and warn that it > > incurs extra overhead for the current users if for nothing else. > > Otherwise, kill it or merge it into cpu. > > Quite, hence my 'proposal' to remove cpuacct. > > There was some whining last time Glauber proposed this, but the one > whining never convinced and has gone away from Linux, so lets just do > this. 
> > Lets make cpuacct print a deprecated msg to dmesg for a few releases and > make cpu do all this. I like it. Currently cpuacct is the only problematic one in this regard (cpuset to a much lesser extent) and it would be great to make it go away. Dhaval, Frederic, Paul, if you guys object, please voice your opinions. > The co-mounting stuff would have been nice for cpusets as well, knowing > all your tasks are affine to a subset of cpus allows for a few > optimizations (smaller cpumask iterations), but I guess we'll have to do > that dynamically, we'll just have to see how ugly that is. Forced co-mounting sounds rather silly to me. If the two are always gonna be co-mounted, why not just merge them and switch the functionality depending on configuration? I'm fairly sure the code would be simpler that way. If cpuset and cpu being separate is important enough && the overhead of doing things separately for cpuset isn't too high, I wouldn't bother too much with dynamic optimization but that's your call. Thanks. 
-- tejun From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933192Ab2IFVLi (ORCPT ); Thu, 6 Sep 2012 17:11:38 -0400 Received: from mail-vc0-f174.google.com ([209.85.220.174]:55938 "EHLO mail-vc0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933183Ab2IFVLd (ORCPT ); Thu, 6 Sep 2012 17:11:33 -0400 MIME-Version: 1.0 In-Reply-To: <20120906204642.GN29092@google.com> References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com> <5047074D.1030104@parallels.com> <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470A87.1040701@parallels.com> <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com> <50470EBF.9070109@parallels.com> <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346835993.2600.9.camel@twins> <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com> <1346839487.2600.24.camel@twins> <20120906204642.GN29092@google.com> From: Paul Turner Date: Thu, 6 Sep 2012 14:11:00 -0700 Message-ID: Subject: Re: [RFC 0/5] forced comounts for cgroups. To: Tejun Heo Cc: Peter Zijlstra , Glauber Costa , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, lennart@poettering.net, kay.sievers@vrfy.org, Dhaval Giani , Frederic Weisbecker Content-Type: text/plain; charset=ISO-8859-1 X-System-Of-Record: true Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Sep 6, 2012 at 1:46 PM, Tejun Heo wrote: > Hello, > > cc'ing Dhaval and Frederic. They were interested in the subject > before and Dhaval was pretty vocal about cpuacct having a separate > hierarchy (or at least granularity). Really? Time just has _not_ borne out this use-case. I'll let Dhaval make a case for this but he should expect violent objection. 
> > On Wed, Sep 05, 2012 at 12:04:47PM +0200, Peter Zijlstra wrote: >> > cpuacct is rather unique tho. I think it's gonna be silly whether the >> > hierarchy is unified or not. >> > >> > 1. If they always can live on the exact same hierarchy, there's no >> > point in having the two separate. Just merge them. >> > >> > 2. If they need differing levels of granularity, they either need to >> > do it completely separately as they do now or have some form of >> > dynamic optimization if absolutely necesary. >> > >> > So, I think that choice is rather separate from other issues. If >> > cpuacct is gonna be kept, I'd just keep it separate and warn that it >> > incurs extra overhead for the current users if for nothing else. >> > Otherwise, kill it or merge it into cpu. >> >> Quite, hence my 'proposal' to remove cpuacct. >> >> There was some whining last time Glauber proposed this, but the one >> whining never convinced and has gone away from Linux, so lets just do >> this. >> >> Lets make cpuacct print a deprecated msg to dmesg for a few releases and >> make cpu do all this. > > I like it. Currently cpuacct is the only problematic one in this > regard (cpuset to a much lesser extent) and it would be great to make > it go away. > > Dhaval, Frederic, Paul, if you guys object, please voice your > opinions. > >> The co-mounting stuff would have been nice for cpusets as well, knowing >> all your tasks are affine to a subset of cpus allows for a few >> optimizations (smaller cpumask iterations), but I guess we'll have to do >> that dynamically, we'll just have to see how ugly that is. > > Forced co-mounting sounds rather silly to me. If the two are always > gonna be co-mounted, why not just merge them and switch the > functionality depending on configuration? I'm fairly sure the code > would be simpler that way. It would be simpler but the problem is we'd break any userspace that was just doing mount cpuacct? 
Further, even if it were mounting both, userspace code still has to be changed to read from "cpu.export" instead of "cpuacct.export".

I think a sane path on this front is:

Immediately:
Don't allow cpuacct and cpu to be mounted on separate hierarchies simultaneously.

That is:
  mount none /dev/cgroup/cpuacct -t cgroupfs -o cpuacct : still works
  mount none /dev/cgroup/cpu -t cgroupfs -o cpu : still works
  mount none /dev/cgroup/cpux -t cgroupfs -o cpuacct,cpu : still works

But the combination:
  mount none /dev/cgroup/cpu -t cgroupfs -o cpu : still works
  mount none /dev/cgroup/cpuacct -t cgroupfs -o cpuacct : EINVAL [or vice versa].

Also:
WARN_ON when mounting cpuacct without cpu, strongly explaining that ANY such configuration is deprecated.

Glauber's patchset goes most of the way towards enabling this.

In a release or two:
Make the restriction strict; don't allow individual mounting of cpuacct, force it to be mounted ONLY with cpu.

Glauber's patchset gives us this.

Finally:
Mirror the interfaces to cpu, print nasty syslog messages about ANY mounts of cpuacct.
Follow that up by eventually removing cpuacct completely.

--

In general I think this sets a hard precedent of never allowing an accounting controller to exist with a control one for a given area, e.g. cpu, networking, mm, etc. In the cases where one of these exists already, any attempts to extend (accounting or control) must extend the existing one.

> > If cpuset and cpu being separate is important enough && the overhead
> > of doing things separately for cpuset isn't too high, I wouldn't
> > bother too much with dynamic optimization but that's your call.
>

Given the choice we would have just straight out ripped it out long ago. Breaking the user-space ABI is the problem.

> Thanks.
>
> --
> tejun

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1753840Ab2IFWkF (ORCPT ); Thu, 6 Sep 2012 18:40:05 -0400
Received: from mx2.parallels.com ([64.131.90.16]:38454 "EHLO
 mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1752734Ab2IFWkB (ORCPT ); Thu, 6 Sep 2012 18:40:01 -0400
Message-ID: <50492574.6030308@parallels.com>
Date: Fri, 7 Sep 2012 02:36:36 +0400
From: Glauber Costa
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0
MIME-Version: 1.0
To: Paul Turner
CC: Tejun Heo , Peter Zijlstra , , , , , , , , Dhaval Giani , Frederic Weisbecker
Subject: Re: [RFC 0/5] forced comounts for cgroups.
References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com>
 <5047074D.1030104@parallels.com>
 <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <50470A87.1040701@parallels.com>
 <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <50470EBF.9070109@parallels.com>
 <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <1346835993.2600.9.camel@twins>
 <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <1346839487.2600.24.camel@twins>
 <20120906204642.GN29092@google.com>
In-Reply-To:
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
X-Originating-IP: [109.173.3.27]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/07/2012 01:11 AM, Paul Turner wrote:
> On Thu, Sep 6, 2012 at 1:46 PM, Tejun Heo wrote:
>> Hello,
>>
>> cc'ing Dhaval and Frederic. They were interested in the subject
>> before and Dhaval was pretty vocal about cpuacct having a separate
>> hierarchy (or at least granularity).
>
> Really? Time just has _not_ borne out this use-case. I'll let Dhaval
> make a case for this but he should expect violent objection.
>
I strongly advise against physical violence.
In case it is really necessary, please break his legs only.

>> On Wed, Sep 05, 2012 at 12:04:47PM +0200, Peter Zijlstra wrote:
>>>> cpuacct is rather unique tho. I think it's gonna be silly whether the
>>>> hierarchy is unified or not.
>>>>
>>>> 1. If they always can live on the exact same hierarchy, there's no
>>>> point in having the two separate. Just merge them.
>>>>
>>>> 2. If they need differing levels of granularity, they either need to
>>>> do it completely separately as they do now or have some form of
>>>> dynamic optimization if absolutely necessary.
>>>>
>>>> So, I think that choice is rather separate from other issues. If
>>>> cpuacct is gonna be kept, I'd just keep it separate and warn that it
>>>> incurs extra overhead for the current users if for nothing else.
>>>> Otherwise, kill it or merge it into cpu.
>>>
>>> Quite, hence my 'proposal' to remove cpuacct.
>>>
>>> There was some whining last time Glauber proposed this, but the one
>>> whining never convinced and has gone away from Linux, so let's just do
>>> this.
>>>
>>> Let's make cpuacct print a deprecated msg to dmesg for a few releases and
>>> make cpu do all this.
>>
>> I like it. Currently cpuacct is the only problematic one in this
>> regard (cpuset to a much lesser extent) and it would be great to make
>> it go away.
>>
>> Dhaval, Frederic, Paul, if you guys object, please voice your
>> opinions.
>>
>>> The co-mounting stuff would have been nice for cpusets as well, knowing
>>> all your tasks are affine to a subset of cpus allows for a few
>>> optimizations (smaller cpumask iterations), but I guess we'll have to do
>>> that dynamically, we'll just have to see how ugly that is.
>>
>> Forced co-mounting sounds rather silly to me. If the two are always
>> gonna be co-mounted, why not just merge them and switch the
>> functionality depending on configuration? I'm fairly sure the code
>> would be simpler that way.
>
> It would be simpler but the problem is we'd break any userspace that
> was just doing mount cpuacct?
>
> Further, even if it were mounting both, userspace code still has to be
> changed to read from "cpu.export" instead of "cpuacct.export".
>

Only if we remove cpuacct. What we can do, and I thought about doing, is
just merging the cpuacct functionality into cpu. Then we move cpuacct to
default no. It will be there for userspace if they absolutely want to
use it.

> I think a sane path on this front is:
>
> Immediately:
> Don't allow cpuacct and cpu to be co-mounted on separate hierarchies
> simultaneously.
>

That is precisely what my patch does, except it is a bit more generic.

> That is:
> mount none /dev/cgroup/cpuacct -t cgroupfs -o cpuacct : still works
> mount none /dev/cgroup/cpu -t cgroupfs -o cpu : still works
> mount none /dev/cgroup/cpux -t cgroupfs -o cpuacct,cpu : still works
>
> But the combination:
> mount none /dev/cgroup/cpu -t cgroupfs -o cpu : still works
> mount none /dev/cgroup/cpuacct -t cgroupfs -o cpuacct : EINVAL [or vice versa].
>
> Also:
> WARN_ON when mounting cpuacct without cpu, strongly explaining that
> ANY such configuration is deprecated.
>
> Glauber's patchset goes most of the way towards enabling this.
>

Yes.

> In a release or two:
> Make the restriction strict; don't allow individual mounting of
> cpuacct, force it to be mounted ONLY with cpu.
>
> Glauber's patchset gives us this.
>
> Finally:
> Mirror the interfaces to cpu, print nasty syslog messages about ANY
> mounts of cpuacct
> Follow that up by eventually removing cpuacct completely
>

Why not start with mirroring? It gives people more time to start
switching to it.
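
The staged mount policy discussed above can be sketched as a small decision function. This is only a userspace illustration under stated assumptions, not code from the patchset: comount_check, CTRL_CPU, CTRL_CPUACCT and the stage enum are hypothetical names, and returning -EBUSY for a controller that is already bound to another hierarchy is a simplification of what cgroup mounting really does.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical controller bits; these are not the kernel's subsystem ids. */
enum { CTRL_CPU = 1u << 0, CTRL_CPUACCT = 1u << 1 };

/* Hypothetical rollout stages, one per step of the proposal in the thread. */
enum stage {
	STAGE_IMMEDIATE,	/* split hierarchies refused, cpuacct alone warns */
	STAGE_STRICT,		/* cpuacct may only be mounted together with cpu */
	STAGE_MIRRORED		/* cpu mirrors cpuacct; any cpuacct mount warns */
};

/*
 * Decide whether a new hierarchy carrying the controllers in `requested`
 * may be mounted, given `mounted`, the controllers already bound to other
 * hierarchies. Returns 0 or a negative errno, and sets *warn when a
 * deprecation message should be printed.
 */
int comount_check(unsigned int requested, unsigned int mounted,
		  enum stage stage, bool *warn)
{
	const unsigned int pair = CTRL_CPU | CTRL_CPUACCT;
	unsigned int want = requested & pair;

	*warn = false;

	/* Simplification: a controller bound elsewhere cannot be requested. */
	if (requested & mounted)
		return -EBUSY;

	/* Neither cpu nor cpuacct is involved, nothing to enforce. */
	if (want == 0)
		return 0;

	/* Only one of the pair requested while the other lives elsewhere. */
	if (want != pair && (mounted & pair))
		return -EINVAL;

	/* cpuacct on its own: deprecated at first, refused later. */
	if (want == CTRL_CPUACCT) {
		if (stage != STAGE_IMMEDIATE)
			return -EINVAL;
		*warn = true;
	}

	/* Last stage: every mount that includes cpuacct is flagged. */
	if (stage == STAGE_MIRRORED && (want & CTRL_CPUACCT))
		*warn = true;

	return 0;
}
```

With this sketch, mounting cpuacct after cpu already sits on its own hierarchy yields -EINVAL at every stage, while a combined cpu,cpuacct mount keeps working and only starts to warn once the interfaces are mirrored. Starting with mirroring, as suggested in the reply above, would amount to reordering the stages rather than changing the checks.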

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1755185Ab2IFWmj (ORCPT ); Thu, 6 Sep 2012 18:42:39 -0400
Received: from mx2.parallels.com ([64.131.90.16]:37172 "EHLO
 mx2.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1753545Ab2IFWmh (ORCPT ); Thu, 6 Sep 2012 18:42:37 -0400
Message-ID: <50492617.8030609@parallels.com>
Date: Fri, 7 Sep 2012 02:39:19 +0400
From: Glauber Costa
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0
MIME-Version: 1.0
To: Tejun Heo
CC: Peter Zijlstra , , , , , , , ,
Subject: Re: [RFC 0/5] forced comounts for cgroups.
References: <50470A87.1040701@parallels.com>
 <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <50470EBF.9070109@parallels.com>
 <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <1346835993.2600.9.camel@twins>
 <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <50471782.6060800@parallels.com>
 <1346837209.2600.14.camel@twins>
 <50471C0C.7050600@parallels.com>
 <1346840453.2461.6.camel@laptop>
 <20120906203839.GM29092@google.com>
In-Reply-To: <20120906203839.GM29092@google.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
X-Originating-IP: [109.173.3.27]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 09/07/2012 12:38 AM, Tejun Heo wrote:
> Hello, Peter, Glauber.
>
> (I'm gonna write up cgroup core todos which should explain / address
> this issue too. ATM I'm a bit overwhelmed with stuff accumulated
> while traveling.)
>
Yes, please.

While you rightfully claim that you explained it a couple of times, it
all seems to be quite fuzzy. I don't blame it on you: the current state
of the interface leads to this.

So another detailed explanation of what you envision at this point,
considering the discussions we had in the previous days, would be really
helpful.

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1755285Ab2IFWpu (ORCPT ); Thu, 6 Sep 2012 18:45:50 -0400
Received: from mail-lb0-f174.google.com ([209.85.217.174]:50117 "EHLO
 mail-lb0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1754318Ab2IFWpt (ORCPT ); Thu, 6 Sep 2012 18:45:49 -0400
MIME-Version: 1.0
In-Reply-To: <50492617.8030609@parallels.com>
References: <50470A87.1040701@parallels.com>
 <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <50470EBF.9070109@parallels.com>
 <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <1346835993.2600.9.camel@twins>
 <20120905091140.GH3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <50471782.6060800@parallels.com>
 <1346837209.2600.14.camel@twins>
 <50471C0C.7050600@parallels.com>
 <1346840453.2461.6.camel@laptop>
 <20120906203839.GM29092@google.com>
 <50492617.8030609@parallels.com>
Date: Thu, 6 Sep 2012 15:45:47 -0700
X-Google-Sender-Auth: FuJJIO4uGRklWbGj2FwcVNJ0gVs
Message-ID:
Subject: Re: [RFC 0/5] forced comounts for cgroups.
From: Tejun Heo
To: Glauber Costa
Cc: Peter Zijlstra , linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-mm@kvack.org, davej@redhat.com, ben@decadent.org.uk, pjt@google.com,
 lennart@poettering.net, kay.sievers@vrfy.org
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello, Glauber.

On Thu, Sep 6, 2012 at 3:39 PM, Glauber Costa wrote:
> Yes, please.
>
> While you rightfully claim that you explained it a couple of times, it
> all seems to be quite fuzzy. I don't blame it on you: the current state
> of the interface leads to this.

Heh, I drank two cups of coffee and two glasses of wine that evening.
Coffee won and I couldn't sleep till around 4am with a splitting headache.
I'm not too confident about what I wrote that night. :)

> So another detailed explanation of what you envision at this point,
> considering the discussions we had in the previous days, would be really
> helpful,

Definitely, will do. Please give me a few days to sort through
immediately pending stuff.

Thanks.

--
tejun

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1753979Ab2IHNgf (ORCPT ); Sat, 8 Sep 2012 09:36:35 -0400
Received: from mail-pb0-f46.google.com ([209.85.160.46]:38550 "EHLO
 mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751013Ab2IHNga (ORCPT ); Sat, 8 Sep 2012 09:36:30 -0400
MIME-Version: 1.0
In-Reply-To:
References: <20120904214602.GA9092@dhcp-172-17-108-109.mtv.corp.google.com>
 <5047074D.1030104@parallels.com>
 <20120905081439.GC3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <50470A87.1040701@parallels.com>
 <20120905082947.GD3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <50470EBF.9070109@parallels.com>
 <20120905084740.GE3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <1346835993.2600.9.camel@twins>
 <20120905093204.GL3195@dhcp-172-17-108-109.mtv.corp.google.com>
 <1346839487.2600.24.camel@twins>
 <20120906204642.GN29092@google.com>
Date: Sat, 8 Sep 2012 09:36:29 -0400
Message-ID:
Subject: Re: [RFC 0/5] forced comounts for cgroups.
From: Dhaval Giani
To: Paul Turner
Cc: Tejun Heo , Peter Zijlstra , Glauber Costa ,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
 davej@redhat.com, ben@decadent.org.uk, lennart@poettering.net,
 kay.sievers@vrfy.org, Frederic Weisbecker , Balbir Singh , Bharata B Rao
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 6, 2012 at 5:11 PM, Paul Turner wrote:
> On Thu, Sep 6, 2012 at 1:46 PM, Tejun Heo wrote:
>> Hello,
>>
>> cc'ing Dhaval and Frederic. They were interested in the subject
>> before and Dhaval was pretty vocal about cpuacct having a separate
>> hierarchy (or at least granularity).
>
> Really? Time just has _not_ borne out this use-case. I'll let Dhaval
> make a case for this but he should expect violent objection.
>

I am not objecting directly! I am aware of a few users who are (or at
least were) using cpu and cpuacct separately because they want to be
able to account without control. Having said that, there are tons of
flaws in the current approach, because the accounting without control
is just plain wrong. I have copied a few other folks who might be able
to shed light on those users and on whether we should still consider
them.

[And the fewer controllers there are, the better!]

Thanks!
Dhaval