* [PATCH 1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat @ 2021-02-17 12:00 Andrey Ryabinin 2021-02-17 12:00 ` [PATCH 3/4] sched/cpuacct: fix user/system in shown cpuacct.usage* Andrey Ryabinin ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Andrey Ryabinin @ 2021-02-17 12:00 UTC (permalink / raw) To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Tejun Heo, Zefan Li, Johannes Weiner, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira Cc: Boris Burkov, Bharata B Rao, cgroups, linux-kernel, Andrey Ryabinin, stable cpuacct.stat in no-root cgroups shows user time without guest time included in it. This doesn't match the user time shown in root cpuacct.stat and /proc/<pid>/stat. Make account_guest_time() add user time to the cgroup's cpustat to fix this. Fixes: ef12fefabf94 ("cpuacct: add per-cgroup utime/stime statistics") Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> Cc: <stable@vger.kernel.org> --- kernel/sched/cputime.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 5f611658eeab..95a9c5603d29 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -139,8 +139,6 @@ void account_user_time(struct task_struct *p, u64 cputime) */ void account_guest_time(struct task_struct *p, u64 cputime) { - u64 *cpustat = kcpustat_this_cpu->cpustat; - /* Add guest time to process. */ p->utime += cputime; account_group_user_time(p, cputime); @@ -148,11 +146,11 @@ void account_guest_time(struct task_struct *p, u64 cputime) /* Add guest time to cpustat. 
*/ if (task_nice(p) > 0) { - cpustat[CPUTIME_NICE] += cputime; - cpustat[CPUTIME_GUEST_NICE] += cputime; + task_group_account_field(p, CPUTIME_NICE, cputime); + task_group_account_field(p, CPUTIME_GUEST_NICE, cputime); } else { - cpustat[CPUTIME_USER] += cputime; - cpustat[CPUTIME_GUEST] += cputime; + task_group_account_field(p, CPUTIME_USER, cputime); + task_group_account_field(p, CPUTIME_GUEST, cputime); } } -- 2.26.2 ^ permalink raw reply related [flat|nested] 17+ messages in thread
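The effect of the patch above can be modeled outside the kernel. The following is an illustrative Python sketch, not kernel code: in the kernel, task_group_account_field() charges a cpustat field for the task's cgroup and every ancestor up to the root, so after the change guest time lands in each level's CPUTIME_USER (or CPUTIME_NICE) bucket as well as the CPUTIME_GUEST* ones. The class and field names here are hypothetical stand-ins for the kernel structures.

```python
# Illustrative model of the accounting change; index names mirror the
# kernel's CPUTIME_* constants, but this is plain Python, not kernel code.
CPUTIME_USER, CPUTIME_NICE, CPUTIME_GUEST, CPUTIME_GUEST_NICE = range(4)

class Cgroup:
    def __init__(self, parent=None):
        self.parent = parent
        self.cpustat = [0, 0, 0, 0]

def task_group_account_field(cg, index, cputime):
    # Walk from the task's cgroup up to the root, like the kernel helper.
    while cg is not None:
        cg.cpustat[index] += cputime
        cg = cg.parent

def account_guest_time(cg, nice, cputime):
    # After the patch: guest time is charged to the whole hierarchy's
    # user/nice buckets, not only to the root per-cpu cpustat.
    if nice > 0:
        task_group_account_field(cg, CPUTIME_NICE, cputime)
        task_group_account_field(cg, CPUTIME_GUEST_NICE, cputime)
    else:
        task_group_account_field(cg, CPUTIME_USER, cputime)
        task_group_account_field(cg, CPUTIME_GUEST, cputime)

root = Cgroup()
child = Cgroup(parent=root)
account_guest_time(child, nice=0, cputime=100)

# Guest time now shows up as user time at every level, matching what the
# root cpuacct.stat and /proc/<pid>/stat already reported.
assert child.cpustat[CPUTIME_USER] == 100
assert root.cpustat[CPUTIME_USER] == 100
```

Before the patch, only the root per-cpu cpustat got the USER/GUEST increments, which is exactly the inconsistency the changelog describes.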
* [PATCH 3/4] sched/cpuacct: fix user/system in shown cpuacct.usage* 2021-02-17 12:00 [PATCH 1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat Andrey Ryabinin @ 2021-02-17 12:00 ` Andrey Ryabinin 2021-03-17 22:22 ` Daniel Jordan [not found] ` <20210217120004.7984-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> 2021-08-20 9:40 ` [PATCH v2 1/5] cputime, cpuacct: " Andrey Ryabinin 2 siblings, 1 reply; 17+ messages in thread From: Andrey Ryabinin @ 2021-02-17 12:00 UTC (permalink / raw) To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Tejun Heo, Zefan Li, Johannes Weiner, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira Cc: Boris Burkov, Bharata B Rao, cgroups, linux-kernel, Andrey Ryabinin, stable cpuacct has two different ways of accounting and showing user and system times. The first one uses cpuacct_account_field() to account times and the cpuacct.stat file to expose them, and this one seems to work ok. The second one uses the cpuacct_charge() function for accounting and the set of cpuacct.usage* files to show times. Despite some attempts to fix it in the past, it still doesn't work. E.g. while running a KVM guest, cpuacct_charge() accounts most of the guest time as system time. This doesn't match the user/system times shown in cpuacct.stat or /proc/<pid>/stat. Use the cpustats accounted in cpuacct_account_field() as the source of user/sys times for the cpuacct.usage* files. Make cpuacct_charge() account only the total execution time. 
Fixes: d740037fac70 ("sched/cpuacct: Split usage accounting into user_usage and sys_usage") Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> Cc: <stable@vger.kernel.org> --- kernel/sched/cpuacct.c | 77 +++++++++++++++++++----------------------- 1 file changed, 34 insertions(+), 43 deletions(-) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 941c28cf9738..7eff79faab0d 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -29,7 +29,7 @@ struct cpuacct_usage { struct cpuacct { struct cgroup_subsys_state css; /* cpuusage holds pointer to a u64-type object on every CPU */ - struct cpuacct_usage __percpu *cpuusage; + u64 __percpu *cpuusage; struct kernel_cpustat __percpu *cpustat; }; @@ -49,7 +49,7 @@ static inline struct cpuacct *parent_ca(struct cpuacct *ca) return css_ca(ca->css.parent); } -static DEFINE_PER_CPU(struct cpuacct_usage, root_cpuacct_cpuusage); +static DEFINE_PER_CPU(u64, root_cpuacct_cpuusage); static struct cpuacct root_cpuacct = { .cpustat = &kernel_cpustat, .cpuusage = &root_cpuacct_cpuusage, @@ -68,7 +68,7 @@ cpuacct_css_alloc(struct cgroup_subsys_state *parent_css) if (!ca) goto out; - ca->cpuusage = alloc_percpu(struct cpuacct_usage); + ca->cpuusage = alloc_percpu(u64); if (!ca->cpuusage) goto out_free_ca; @@ -99,7 +99,8 @@ static void cpuacct_css_free(struct cgroup_subsys_state *css) static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, enum cpuacct_stat_index index) { - struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); + u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); + u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; u64 data; /* @@ -115,14 +116,17 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, raw_spin_lock_irq(&cpu_rq(cpu)->lock); #endif - if (index == CPUACCT_STAT_NSTATS) { - int i = 0; - - data = 0; - for (i = 0; i < CPUACCT_STAT_NSTATS; i++) - data += cpuusage->usages[i]; - } else { - data = cpuusage->usages[index]; + switch (index) { + case 
CPUACCT_STAT_USER: + data = cpustat[CPUTIME_USER] + cpustat[CPUTIME_NICE]; + break; + case CPUACCT_STAT_SYSTEM: + data = cpustat[CPUTIME_SYSTEM] + cpustat[CPUTIME_IRQ] + + cpustat[CPUTIME_SOFTIRQ]; + break; + case CPUACCT_STAT_NSTATS: + data = *cpuusage; + break; } #ifndef CONFIG_64BIT @@ -132,10 +136,14 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, return data; } -static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) +static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu) { - struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - int i; + u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); + u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; + + /* Don't allow to reset global kernel_cpustat */ + if (ca == &root_cpuacct) + return; #ifndef CONFIG_64BIT /* @@ -143,9 +151,10 @@ static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val) */ raw_spin_lock_irq(&cpu_rq(cpu)->lock); #endif - - for (i = 0; i < CPUACCT_STAT_NSTATS; i++) - cpuusage->usages[i] = val; + *cpuusage = 0; + cpustat[CPUTIME_USER] = cpustat[CPUTIME_NICE] = 0; + cpustat[CPUTIME_SYSTEM] = cpustat[CPUTIME_IRQ] = 0; + cpustat[CPUTIME_SOFTIRQ] = 0; #ifndef CONFIG_64BIT raw_spin_unlock_irq(&cpu_rq(cpu)->lock); @@ -196,7 +205,7 @@ static int cpuusage_write(struct cgroup_subsys_state *css, struct cftype *cft, return -EINVAL; for_each_possible_cpu(cpu) - cpuacct_cpuusage_write(ca, cpu, 0); + cpuacct_cpuusage_write(ca, cpu); return 0; } @@ -243,25 +252,12 @@ static int cpuacct_all_seq_show(struct seq_file *m, void *V) seq_puts(m, "\n"); for_each_possible_cpu(cpu) { - struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); - seq_printf(m, "%d", cpu); - for (index = 0; index < CPUACCT_STAT_NSTATS; index++) { -#ifndef CONFIG_64BIT - /* - * Take rq->lock to make 64-bit read safe on 32-bit - * platforms. 
- */ - raw_spin_lock_irq(&cpu_rq(cpu)->lock); -#endif - - seq_printf(m, " %llu", cpuusage->usages[index]); + for (index = 0; index < CPUACCT_STAT_NSTATS; index++) + seq_printf(m, " %llu", + cpuacct_cpuusage_read(ca, cpu, index)); -#ifndef CONFIG_64BIT - raw_spin_unlock_irq(&cpu_rq(cpu)->lock); -#endif - } seq_puts(m, "\n"); } return 0; @@ -278,8 +274,8 @@ static int cpuacct_stats_show(struct seq_file *sf, void *v) for_each_possible_cpu(cpu) { u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; - val[CPUACCT_STAT_USER] += cpustat[CPUTIME_USER]; - val[CPUACCT_STAT_USER] += cpustat[CPUTIME_NICE]; + val[CPUACCT_STAT_USER] += cpustat[CPUTIME_USER]; + val[CPUACCT_STAT_USER] += cpustat[CPUTIME_NICE]; val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SYSTEM]; val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_IRQ]; val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SOFTIRQ]; @@ -339,16 +335,11 @@ static struct cftype files[] = { void cpuacct_charge(struct task_struct *tsk, u64 cputime) { struct cpuacct *ca; - int index = CPUACCT_STAT_SYSTEM; - struct pt_regs *regs = get_irq_regs() ? : task_pt_regs(tsk); - - if (regs && user_mode(regs)) - index = CPUACCT_STAT_USER; rcu_read_lock(); for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) - __this_cpu_add(ca->cpuusage->usages[index], cputime); + __this_cpu_add(*ca->cpuusage, cputime); rcu_read_unlock(); } -- 2.26.2 ^ permalink raw reply related [flat|nested] 17+ messages in thread
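The switch statement the patch introduces in cpuacct_cpuusage_read() can be summarized with a small model (illustrative Python, not the kernel source): the user and system usages are derived from the same cpustat fields that already back cpuacct.stat, while the summary index reads the single per-cpu counter that cpuacct_charge() now increments unconditionally. The sample values below are hypothetical.

```python
# Illustrative model of the read path after the patch; the index and
# field names mirror the kernel's enums but this is plain Python.
CPUTIME_USER, CPUTIME_NICE, CPUTIME_SYSTEM, CPUTIME_IRQ, CPUTIME_SOFTIRQ = range(5)
CPUACCT_STAT_USER, CPUACCT_STAT_SYSTEM, CPUACCT_STAT_NSTATS = range(3)

def cpuusage_read(cpustat, cpuusage, index):
    # usage_user comes from USER + NICE, usage_sys from
    # SYSTEM + IRQ + SOFTIRQ, and the summary (index == NSTATS)
    # reads the raw execution-time counter kept by cpuacct_charge().
    if index == CPUACCT_STAT_USER:
        return cpustat[CPUTIME_USER] + cpustat[CPUTIME_NICE]
    if index == CPUACCT_STAT_SYSTEM:
        return (cpustat[CPUTIME_SYSTEM] + cpustat[CPUTIME_IRQ]
                + cpustat[CPUTIME_SOFTIRQ])
    return cpuusage  # CPUACCT_STAT_NSTATS: total execution time

cpustat = [70, 10, 15, 3, 2]  # hypothetical per-cpu values, in ns
assert cpuusage_read(cpustat, 100, CPUACCT_STAT_USER) == 80
assert cpuusage_read(cpustat, 100, CPUACCT_STAT_SYSTEM) == 20
assert cpuusage_read(cpustat, 100, CPUACCT_STAT_NSTATS) == 100
```

Note that the user/sys split no longer depends on sampling user_mode(regs) at charge time, which is what misclassified guest time as system time.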
* Re: [PATCH 3/4] sched/cpuacct: fix user/system in shown cpuacct.usage* 2021-02-17 12:00 ` [PATCH 3/4] sched/cpuacct: fix user/system in shown cpuacct.usage* Andrey Ryabinin @ 2021-03-17 22:22 ` Daniel Jordan [not found] ` <87r1kdl8se.fsf-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 17+ messages in thread From: Daniel Jordan @ 2021-03-17 22:22 UTC (permalink / raw) To: Andrey Ryabinin, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Tejun Heo, Zefan Li, Johannes Weiner, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira Cc: Boris Burkov, Bharata B Rao, cgroups, linux-kernel, Andrey Ryabinin, stable Andrey Ryabinin <arbn@yandex-team.com> writes: > cpuacct has 2 different ways of accounting and showing user > and system times. > > The first one uses cpuacct_account_field() to account times > and cpuacct.stat file to expose them. And this one seems to work ok. > > The second one is uses cpuacct_charge() function for accounting and > set of cpuacct.usage* files to show times. Despite some attempts to > fix it in the past it still doesn't work. E.g. while running KVM > guest the cpuacct_charge() accounts most of the guest time as > system time. This doesn't match with user&system times shown in > cpuacct.stat or proc/<pid>/stat. I couldn't reproduce this running a cpu bound load in a kvm guest on a nohz_full cpu on 5.11. The time is almost entirely in cpuacct.usage and _user, while _sys stays low. Could you say more about how you're seeing this? Don't really doubt there's a problem, just wondering what you're doing. 
> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c > index 941c28cf9738..7eff79faab0d 100644 > --- a/kernel/sched/cpuacct.c > +++ b/kernel/sched/cpuacct.c > @@ -29,7 +29,7 @@ struct cpuacct_usage { > struct cpuacct { > struct cgroup_subsys_state css; > /* cpuusage holds pointer to a u64-type object on every CPU */ > - struct cpuacct_usage __percpu *cpuusage; Definition of struct cpuacct_usage can go away now. > @@ -99,7 +99,8 @@ static void cpuacct_css_free(struct cgroup_subsys_state *css) > static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, > enum cpuacct_stat_index index) > { > - struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); > + u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); > + u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; > u64 data; There's a BUG_ON below this that could probably be WARN_ON_ONCE while you're here > @@ -278,8 +274,8 @@ static int cpuacct_stats_show(struct seq_file *sf, void *v) > for_each_possible_cpu(cpu) { > u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; > > - val[CPUACCT_STAT_USER] += cpustat[CPUTIME_USER]; > - val[CPUACCT_STAT_USER] += cpustat[CPUTIME_NICE]; > + val[CPUACCT_STAT_USER] += cpustat[CPUTIME_USER]; > + val[CPUACCT_STAT_USER] += cpustat[CPUTIME_NICE]; unnecessary whitespace change? ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <87r1kdl8se.fsf-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH 3/4] sched/cpuacct: fix user/system in shown cpuacct.usage* [not found] ` <87r1kdl8se.fsf-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2021-08-20 9:37 ` Andrey Ryabinin 0 siblings, 0 replies; 17+ messages in thread From: Andrey Ryabinin @ 2021-08-20 9:37 UTC (permalink / raw) To: Daniel Jordan, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Tejun Heo, Zefan Li, Johannes Weiner, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira Cc: Boris Burkov, Bharata B Rao, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, stable-u79uwXL29TY76Z2rM5mHXA On 3/18/21 1:22 AM, Daniel Jordan wrote: > Andrey Ryabinin <arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> writes: > >> cpuacct has 2 different ways of accounting and showing user >> and system times. >> >> The first one uses cpuacct_account_field() to account times >> and cpuacct.stat file to expose them. And this one seems to work ok. >> >> The second one is uses cpuacct_charge() function for accounting and >> set of cpuacct.usage* files to show times. Despite some attempts to >> fix it in the past it still doesn't work. E.g. while running KVM >> guest the cpuacct_charge() accounts most of the guest time as >> system time. This doesn't match with user&system times shown in >> cpuacct.stat or proc/<pid>/stat. > > I couldn't reproduce this running a cpu bound load in a kvm guest on a > nohz_full cpu on 5.11. The time is almost entirely in cpuacct.usage and > _user, while _sys stays low. > > Could you say more about how you're seeing this? Don't really doubt > there's a problem, just wondering what you're doing. > Yeah, it's almost unnoticeable if you run some load in a guest like qemu. 
But a simpler case with a busy loop in KVM_RUN triggers it: # git clone https://github.com/aryabinin/kvmsample # make # mkdir /sys/fs/cgroup/cpuacct/test # echo $$ > /sys/fs/cgroup/cpuacct/test/tasks # ./kvmsample & # for i in {1..5}; do cat /sys/fs/cgroup/cpuacct/test/cpuacct.usage_sys; sleep 1; done 1976535645 2979839428 3979832704 4983603153 5983604157 >> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c >> index 941c28cf9738..7eff79faab0d 100644 >> --- a/kernel/sched/cpuacct.c >> +++ b/kernel/sched/cpuacct.c >> @@ -29,7 +29,7 @@ struct cpuacct_usage { >> struct cpuacct { >> struct cgroup_subsys_state css; >> /* cpuusage holds pointer to a u64-type object on every CPU */ >> - struct cpuacct_usage __percpu *cpuusage; > > Definition of struct cpuacct_usage can go away now. > Done. >> @@ -99,7 +99,8 @@ static void cpuacct_css_free(struct cgroup_subsys_state *css) >> static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, >> enum cpuacct_stat_index index) >> { >> - struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); >> + u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu); >> + u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; >> u64 data; > > There's a BUG_ON below this that could probably be WARN_ON_ONCE while > you're here > Sure. >> @@ -278,8 +274,8 @@ static int cpuacct_stats_show(struct seq_file *sf, void *v) >> for_each_possible_cpu(cpu) { >> u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; >> >> - val[CPUACCT_STAT_USER] += cpustat[CPUTIME_USER]; >> - val[CPUACCT_STAT_USER] += cpustat[CPUTIME_NICE]; >> + val[CPUACCT_STAT_USER] += cpustat[CPUTIME_USER]; >> + val[CPUACCT_STAT_USER] += cpustat[CPUTIME_NICE]; > > unnecessary whitespace change? > yup ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <20210217120004.7984-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>]
* [PATCH 2/4] cgroup: Fix 'usage_usec' time in root's cpu.stat [not found] ` <20210217120004.7984-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> @ 2021-02-17 12:00 ` Andrey Ryabinin [not found] ` <20210217120004.7984-2-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> 2021-02-17 12:00 ` [PATCH 4/4] sched/cpuacct: Make user/system times in cpuacct.stat more precise Andrey Ryabinin 2021-03-17 22:09 ` [PATCH 1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat Daniel Jordan 2 siblings, 1 reply; 17+ messages in thread From: Andrey Ryabinin @ 2021-02-17 12:00 UTC (permalink / raw) To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Tejun Heo, Zefan Li, Johannes Weiner, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira Cc: Boris Burkov, Bharata B Rao, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrey Ryabinin, stable-u79uwXL29TY76Z2rM5mHXA Global CPUTIME_USER counter already includes CPUTIME_GUEST Also CPUTIME_NICE already includes CPUTIME_GUEST_NICE. Remove additions of CPUTIME_GUEST[_NICE] to total ->sum_exec_runtime to not account them twice. Fixes: 936f2a70f207 ("cgroup: add cpu.stat file to root cgroup") Signed-off-by: Andrey Ryabinin <arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> --- kernel/cgroup/rstat.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index d51175cedfca..89ca9b61aa0d 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -421,8 +421,6 @@ static void root_cgroup_cputime(struct task_cputime *cputime) cputime->sum_exec_runtime += user; cputime->sum_exec_runtime += sys; cputime->sum_exec_runtime += cpustat[CPUTIME_STEAL]; - cputime->sum_exec_runtime += cpustat[CPUTIME_GUEST]; - cputime->sum_exec_runtime += cpustat[CPUTIME_GUEST_NICE]; } } -- 2.26.2 ^ permalink raw reply related [flat|nested] 17+ messages in thread
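The double counting removed by the patch above can be shown with a toy calculation (illustrative Python with hypothetical numbers, not kernel code): since CPUTIME_USER already includes CPUTIME_GUEST and CPUTIME_NICE already includes CPUTIME_GUEST_NICE, adding the guest fields to sum_exec_runtime again inflates the root cgroup's usage_usec by exactly the guest time.

```python
# Hypothetical cpustat snapshot in nanoseconds; per the changelog,
# USER already contains GUEST, and NICE already contains GUEST_NICE.
cpustat = {
    "USER": 600, "NICE": 100, "SYSTEM": 200, "STEAL": 50,
    "GUEST": 400, "GUEST_NICE": 80,
}

user = cpustat["USER"] + cpustat["NICE"]   # 700, guest time included
sys = cpustat["SYSTEM"]                    # 200

# Before the fix: the two removed lines added guest time a second time.
buggy = user + sys + cpustat["STEAL"] + cpustat["GUEST"] + cpustat["GUEST_NICE"]
# After the fix: guest time is counted exactly once, via USER/NICE.
fixed = user + sys + cpustat["STEAL"]

assert fixed == 950
assert buggy - fixed == cpustat["GUEST"] + cpustat["GUEST_NICE"]  # 480 ns extra
```

So the more guest time a workload accumulates, the further the old usage_usec drifted above the true total.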
[parent not found: <20210217120004.7984-2-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>]
* Re: [PATCH 2/4] cgroup: Fix 'usage_usec' time in root's cpu.stat [not found] ` <20210217120004.7984-2-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> @ 2021-03-17 22:13 ` Daniel Jordan 0 siblings, 0 replies; 17+ messages in thread From: Daniel Jordan @ 2021-03-17 22:13 UTC (permalink / raw) To: Andrey Ryabinin, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Tejun Heo, Zefan Li, Johannes Weiner, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira Cc: Boris Burkov, Bharata B Rao, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrey Ryabinin, stable-u79uwXL29TY76Z2rM5mHXA Andrey Ryabinin <arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> writes: > Global CPUTIME_USER counter already includes CPUTIME_GUEST > Also CPUTIME_NICE already includes CPUTIME_GUEST_NICE. > > Remove additions of CPUTIME_GUEST[_NICE] to total ->sum_exec_runtime > to not account them twice. Yes, that's just wrong. usage_usec looks ok now. Reviewed-by: Daniel Jordan <daniel.m.jordan-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> Tested-by: Daniel Jordan <daniel.m.jordan-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 4/4] sched/cpuacct: Make user/system times in cpuacct.stat more precise [not found] ` <20210217120004.7984-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> 2021-02-17 12:00 ` [PATCH 2/4] cgroup: Fix 'usage_usec' time in root's cpu.stat Andrey Ryabinin @ 2021-02-17 12:00 ` Andrey Ryabinin [not found] ` <20210217120004.7984-4-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> 2021-03-17 22:09 ` [PATCH 1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat Daniel Jordan 2 siblings, 1 reply; 17+ messages in thread From: Andrey Ryabinin @ 2021-02-17 12:00 UTC (permalink / raw) To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Tejun Heo, Zefan Li, Johannes Weiner, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira Cc: Boris Burkov, Bharata B Rao, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrey Ryabinin cpuacct.stat shows user and system times based on raw, tick-based counters of limited precision. Use cputime_adjust() to scale these values against the total runtime accounted by the scheduler, like we already do for the user/system times in /proc/<pid>/stat. 
Signed-off-by: Andrey Ryabinin <arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> --- kernel/sched/cpuacct.c | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 7eff79faab0d..36347f2810c0 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -266,25 +266,30 @@ static int cpuacct_all_seq_show(struct seq_file *m, void *V) static int cpuacct_stats_show(struct seq_file *sf, void *v) { struct cpuacct *ca = css_ca(seq_css(sf)); - s64 val[CPUACCT_STAT_NSTATS]; + struct task_cputime cputime; + u64 val[CPUACCT_STAT_NSTATS]; int cpu; int stat; - memset(val, 0, sizeof(val)); + memset(&cputime, 0, sizeof(cputime)); for_each_possible_cpu(cpu) { u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; - val[CPUACCT_STAT_USER] += cpustat[CPUTIME_USER]; - val[CPUACCT_STAT_USER] += cpustat[CPUTIME_NICE]; - val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SYSTEM]; - val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_IRQ]; - val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SOFTIRQ]; + cputime.utime += cpustat[CPUTIME_USER]; + cputime.utime += cpustat[CPUTIME_NICE]; + cputime.stime += cpustat[CPUTIME_SYSTEM]; + cputime.stime += cpustat[CPUTIME_IRQ]; + cputime.stime += cpustat[CPUTIME_SOFTIRQ]; + + cputime.sum_exec_runtime += this_cpu_read(*ca->cpuusage); } + cputime_adjust(&cputime, &seq_css(sf)->cgroup->prev_cputime, + &val[CPUACCT_STAT_USER], &val[CPUACCT_STAT_SYSTEM]); + for (stat = 0; stat < CPUACCT_STAT_NSTATS; stat++) { - seq_printf(sf, "%s %lld\n", - cpuacct_stat_desc[stat], - (long long)nsec_to_clock_t(val[stat])); + seq_printf(sf, "%s %llu\n", cpuacct_stat_desc[stat], + nsec_to_clock_t(val[stat])); } return 0; -- 2.26.2 ^ permalink raw reply related [flat|nested] 17+ messages in thread
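For context, cputime_adjust() lives in kernel/sched/cputime.c; very roughly, it rescales the tick-sampled utime/stime so that their sum matches the precisely accounted sum_exec_runtime. The following is a loose Python sketch of the scaling idea only, not the kernel's exact algorithm (which also handles 64-bit overflow and keeps the reported values monotonic via prev_cputime state):

```python
def adjust(utime, stime, runtime):
    # Split the precise total runtime in the same ratio as the
    # tick-sampled utime/stime. The degenerate cases mirror the idea
    # that all-user or all-system samples get the whole runtime.
    total = utime + stime
    if total == 0:
        return 0, runtime
    if stime == 0:
        return runtime, 0
    if utime == 0:
        return 0, runtime
    scaled_stime = stime * runtime // total
    return runtime - scaled_stime, scaled_stime

u, s = adjust(utime=300, stime=100, runtime=1000)
assert u + s == 1000         # adjusted times always sum to sum_exec_runtime
assert (u, s) == (750, 250)  # same 3:1 split as the tick samples
```

This is why the patch also feeds ca->cpuusage (the scheduler-accounted runtime) into the computation: without a sum_exec_runtime to scale against, the adjustment has nothing precise to anchor to.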
[parent not found: <20210217120004.7984-4-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>]
* Re: [PATCH 4/4] sched/cpuacct: Make user/system times in cpuacct.stat more precise [not found] ` <20210217120004.7984-4-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> @ 2021-03-17 22:25 ` Daniel Jordan 0 siblings, 0 replies; 17+ messages in thread From: Daniel Jordan @ 2021-03-17 22:25 UTC (permalink / raw) To: Andrey Ryabinin, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Tejun Heo, Zefan Li, Johannes Weiner, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira Cc: Boris Burkov, Bharata B Rao, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrey Ryabinin Andrey Ryabinin <arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> writes: > static int cpuacct_stats_show(struct seq_file *sf, void *v) > { ... > for_each_possible_cpu(cpu) { > u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat; > > - val[CPUACCT_STAT_USER] += cpustat[CPUTIME_USER]; > - val[CPUACCT_STAT_USER] += cpustat[CPUTIME_NICE]; > - val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SYSTEM]; > - val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_IRQ]; > - val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SOFTIRQ]; > + cputime.utime += cpustat[CPUTIME_USER]; > + cputime.utime += cpustat[CPUTIME_NICE]; > + cputime.stime += cpustat[CPUTIME_SYSTEM]; > + cputime.stime += cpustat[CPUTIME_IRQ]; > + cputime.stime += cpustat[CPUTIME_SOFTIRQ]; > + > + cputime.sum_exec_runtime += this_cpu_read(*ca->cpuusage); > } cputime.sum_exec_runtime += *per_cpu_ptr(ca->cpuusage, cpu); Or the stats can all be 0... ^ permalink raw reply [flat|nested] 17+ messages in thread
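Daniel's point above can be modeled abstractly (illustrative Python, not kernel code): this_cpu_read() always dereferences the counter of the CPU the code happens to be running on, so inside a for_each_possible_cpu() loop it adds that one value once per CPU instead of summing every CPU's counter. The dictionary stands in for a hypothetical per-cpu variable.

```python
cpuusage = {0: 100, 1: 200, 2: 300}  # hypothetical per-cpu counters
current_cpu = 0                      # CPU the reader happens to run on

# Buggy pattern: this_cpu_read() ignores the loop variable entirely.
buggy = sum(cpuusage[current_cpu] for _cpu in cpuusage)

# Suggested pattern: per_cpu_ptr(ca->cpuusage, cpu) indexes by the loop cpu.
fixed = sum(cpuusage[cpu] for cpu in cpuusage)

assert buggy == 300  # current cpu's value added once per possible cpu
assert fixed == 600  # the true sum across all cpus
```

And, as the review notes, if the reader runs on a CPU whose counter is zero, the buggy sum_exec_runtime is zero and every adjusted stat collapses to zero with it.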
* Re: [PATCH 1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat [not found] ` <20210217120004.7984-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> 2021-02-17 12:00 ` [PATCH 2/4] cgroup: Fix 'usage_usec' time in root's cpu.stat Andrey Ryabinin 2021-02-17 12:00 ` [PATCH 4/4] sched/cpuacct: Make user/system times in cpuacct.stat more precise Andrey Ryabinin @ 2021-03-17 22:09 ` Daniel Jordan 2021-08-20 9:37 ` Andrey Ryabinin 2 siblings, 1 reply; 17+ messages in thread From: Daniel Jordan @ 2021-03-17 22:09 UTC (permalink / raw) To: Andrey Ryabinin, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Tejun Heo, Zefan Li, Johannes Weiner, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira Cc: Boris Burkov, Bharata B Rao, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andrey Ryabinin, stable-u79uwXL29TY76Z2rM5mHXA Andrey Ryabinin <arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> writes: > cpuacct.stat in no-root cgroups shows user time without guest time > included int it. This doesn't match with user time shown in root > cpuacct.stat and /proc/<pid>/stat. Yeah, that's inconsistent. > Make account_guest_time() to add user time to cgroup's cpustat to > fix this. Yep. cgroup2's cpu.stat is broken the same way for child cgroups, and this happily fixes it. Probably deserves a mention in the changelog. The problem with cgroup2 was, if the workload was mostly guest time, cpu.stat's user and system together reflected it, but it was split unevenly across the two. I think guest time wasn't actually included in either bucket, it was just that the little user and system time there was got scaled up in cgroup_base_stat_cputime_show -> cputime_adjust to match sum_exec_runtime, which did have it. The stats look ok now for both cgroup1 and 2. Just slightly unsure whether we want to change the way both interfaces expose the accounting in case something out there depends on it. 
Seems like we should, but it'd be good to hear more opinions. > @@ -148,11 +146,11 @@ void account_guest_time(struct task_struct *p, u64 cputime) > > /* Add guest time to cpustat. */ > if (task_nice(p) > 0) { > - cpustat[CPUTIME_NICE] += cputime; > - cpustat[CPUTIME_GUEST_NICE] += cputime; > + task_group_account_field(p, CPUTIME_NICE, cputime); > + task_group_account_field(p, CPUTIME_GUEST_NICE, cputime); > } else { > - cpustat[CPUTIME_USER] += cputime; > - cpustat[CPUTIME_GUEST] += cputime; > + task_group_account_field(p, CPUTIME_USER, cputime); > + task_group_account_field(p, CPUTIME_GUEST, cputime); > } Makes sense for _USER and _NICE, but it doesn't seem cgroup1 or 2 actually use _GUEST and _GUEST_NICE. Could go either way. Consistency is nice, but I probably wouldn't change the GUEST ones so people aren't confused about why they're accounted. It's also extra cycles for nothing, even though most of the data is probably in the cache. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat 2021-03-17 22:09 ` [PATCH 1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat Daniel Jordan @ 2021-08-20 9:37 ` Andrey Ryabinin 0 siblings, 0 replies; 17+ messages in thread From: Andrey Ryabinin @ 2021-08-20 9:37 UTC (permalink / raw) To: Daniel Jordan, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Tejun Heo, Zefan Li, Johannes Weiner, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira Cc: Boris Burkov, Bharata B Rao, cgroups, linux-kernel, stable Sorry for abandoning this, got distracted by lots of other stuff. On 3/18/21 1:09 AM, Daniel Jordan wrote: > Andrey Ryabinin <arbn@yandex-team.com> writes: > >> cpuacct.stat in no-root cgroups shows user time without guest time >> included int it. This doesn't match with user time shown in root >> cpuacct.stat and /proc/<pid>/stat. > > Yeah, that's inconsistent. > >> Make account_guest_time() to add user time to cgroup's cpustat to >> fix this. > > Yep. > > cgroup2's cpu.stat is broken the same way for child cgroups, and this > happily fixes it. Probably deserves a mention in the changelog. > Sure. > The problem with cgroup2 was, if the workload was mostly guest time, > cpu.stat's user and system together reflected it, but it was split > unevenly across the two. I think guest time wasn't actually included in > either bucket, it was just that the little user and system time there > was got scaled up in cgroup_base_stat_cputime_show -> cputime_adjust to > match sum_exec_runtime, which did have it. > > The stats look ok now for both cgroup1 and 2. Just slightly unsure > whether we want to change the way both interfaces expose the accounting > in case something out there depends on it. Seems like we should, but > it'd be good to hear more opinions. > >> @@ -148,11 +146,11 @@ void account_guest_time(struct task_struct *p, u64 cputime) >> >> /* Add guest time to cpustat. 
*/ >> if (task_nice(p) > 0) { >> - cpustat[CPUTIME_NICE] += cputime; >> - cpustat[CPUTIME_GUEST_NICE] += cputime; >> + task_group_account_field(p, CPUTIME_NICE, cputime); >> + task_group_account_field(p, CPUTIME_GUEST_NICE, cputime); >> } else { >> - cpustat[CPUTIME_USER] += cputime; >> - cpustat[CPUTIME_GUEST] += cputime; >> + task_group_account_field(p, CPUTIME_USER, cputime); >> + task_group_account_field(p, CPUTIME_GUEST, cputime); >> } > > Makes sense for _USER and _NICE, but it doesn't seem cgroup1 or 2 > actually use _GUEST and _GUEST_NICE. > > Could go either way. Consistency is nice, but I probably wouldn't > change the GUEST ones so people aren't confused about why they're > accounted. It's also extra cycles for nothing, even though most of the > data is probably in the cache. > Agreed, will leave the _GUEST* as is. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v2 1/5] cputime, cpuacct: Include guest time in user time in cpuacct.stat 2021-02-17 12:00 [PATCH 1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat Andrey Ryabinin 2021-02-17 12:00 ` [PATCH 3/4] sched/cpuacct: fix user/system in shown cpuacct.usage* Andrey Ryabinin [not found] ` <20210217120004.7984-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> @ 2021-08-20 9:40 ` Andrey Ryabinin [not found] ` <20210820094005.20596-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> ` (2 more replies) 2 siblings, 3 replies; 17+ messages in thread From: Andrey Ryabinin @ 2021-08-20 9:40 UTC (permalink / raw) To: Tejun Heo, Zefan Li, Johannes Weiner, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot Cc: Daniel Jordan, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, cgroups, linux-kernel, bharata, boris, Andrey Ryabinin, stable cpuacct.stat in no-root cgroups shows user time without guest time included in it. This doesn't match the user time shown in root cpuacct.stat and /proc/<pid>/stat. This also affects cgroup2's cpu.stat in the same way. Make account_guest_time() add user time to the cgroup's cpustat to fix this. Fixes: ef12fefabf94 ("cpuacct: add per-cgroup utime/stime statistics") Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com> Cc: <stable@vger.kernel.org> --- Changes since v1: - Don't account CPUTIME_GUEST* per cgroup since they aren't used by cgroups --- kernel/sched/cputime.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 872e481d5098..042a6dbce8f3 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -148,10 +148,10 @@ void account_guest_time(struct task_struct *p, u64 cputime) /* Add guest time to cpustat. 
*/ if (task_nice(p) > 0) { - cpustat[CPUTIME_NICE] += cputime; + task_group_account_field(p, CPUTIME_NICE, cputime); cpustat[CPUTIME_GUEST_NICE] += cputime; } else { - cpustat[CPUTIME_USER] += cputime; + task_group_account_field(p, CPUTIME_USER, cputime); cpustat[CPUTIME_GUEST] += cputime; } } -- 2.31.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
[parent not found: <20210820094005.20596-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>]
* [PATCH v2 2/5] cpuacct: convert BUG_ON() to WARN_ON_ONCE() [not found] ` <20210820094005.20596-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> @ 2021-08-20 9:40 ` Andrey Ryabinin 2021-08-20 9:40 ` [PATCH v2 3/5] cgroup: Fix 'usage_usec' time in root's cpu.stat Andrey Ryabinin 2021-08-20 9:40 ` [PATCH v2 5/5] sched/cpuacct: Make user/system times in cpuacct.stat more precise Andrey Ryabinin 2 siblings, 0 replies; 17+ messages in thread From: Andrey Ryabinin @ 2021-08-20 9:40 UTC (permalink / raw) To: Tejun Heo, Zefan Li, Johannes Weiner, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot Cc: Daniel Jordan, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, bharata-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, boris-UrQYPotlwNs, Andrey Ryabinin Replace fatal BUG_ON() with more safe WARN_ON_ONCE() in cpuacct_cpuusage_read(). Signed-off-by: Andrey Ryabinin <arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org> --- kernel/sched/cpuacct.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 893eece65bfd..f347cf9e4634 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -106,7 +106,8 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu, * We allow index == CPUACCT_STAT_NSTATS here to read * the sum of usages. */ - BUG_ON(index > CPUACCT_STAT_NSTATS); + if (WARN_ON_ONCE(index > CPUACCT_STAT_NSTATS)) + return 0; #ifndef CONFIG_64BIT /* -- 2.31.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v2 3/5] cgroup: Fix 'usage_usec' time in root's cpu.stat
  [not found] ` <20210820094005.20596-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>
  2021-08-20  9:40 ` [PATCH v2 2/5] cpuacct: convert BUG_ON() to WARN_ON_ONCE() Andrey Ryabinin
@ 2021-08-20  9:40 ` Andrey Ryabinin
  2021-08-20  9:40 ` [PATCH v2 5/5] sched/cpuacct: Make user/system times in cpuacct.stat more precise Andrey Ryabinin
  2 siblings, 0 replies; 17+ messages in thread
From: Andrey Ryabinin @ 2021-08-20  9:40 UTC (permalink / raw)
To: Tejun Heo, Zefan Li, Johannes Weiner, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot
Cc: Daniel Jordan, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bharata-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, boris-UrQYPotlwNs,
	Andrey Ryabinin, stable-u79uwXL29TY76Z2rM5mHXA

The global CPUTIME_USER counter already includes CPUTIME_GUEST, and
CPUTIME_NICE already includes CPUTIME_GUEST_NICE.

Remove the additions of CPUTIME_GUEST[_NICE] to the total
->sum_exec_runtime so that guest time is not accounted twice.

Fixes: 936f2a70f207 ("cgroup: add cpu.stat file to root cgroup")
Signed-off-by: Andrey Ryabinin <arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>
Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Tested-by: Daniel Jordan <daniel.m.jordan-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
Changes since v1:
 - Add review/tested tags in changelog
---
 kernel/cgroup/rstat.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index b264ab5652ba..1486768f2318 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -433,8 +433,6 @@ static void root_cgroup_cputime(struct task_cputime *cputime)
 		cputime->sum_exec_runtime += user;
 		cputime->sum_exec_runtime += sys;
 		cputime->sum_exec_runtime += cpustat[CPUTIME_STEAL];
-		cputime->sum_exec_runtime += cpustat[CPUTIME_GUEST];
-		cputime->sum_exec_runtime += cpustat[CPUTIME_GUEST_NICE];
 	}
 }
-- 
2.31.1
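The double-accounting being removed here follows directly from how account_guest_time() works: each guest tick already bumps both the USER and GUEST buckets, so a total that sums both counts guest time twice. A minimal C model of the arithmetic (enum values and helper names are illustrative, not the kernel's):

```c
#include <assert.h>

enum { CPUTIME_USER, CPUTIME_SYSTEM, CPUTIME_STEAL, CPUTIME_GUEST, NR };

/* Model of account_guest_time(): a guest tick bumps USER *and* GUEST. */
static void account_guest_tick(unsigned long long *cpustat, unsigned long long d)
{
	cpustat[CPUTIME_USER]  += d;
	cpustat[CPUTIME_GUEST] += d;
}

/* Buggy total (pre-patch): adds GUEST on top of USER, counting guest
 * time twice in sum_exec_runtime. */
static unsigned long long total_buggy(const unsigned long long *s)
{
	return s[CPUTIME_USER] + s[CPUTIME_SYSTEM] +
	       s[CPUTIME_STEAL] + s[CPUTIME_GUEST];
}

/* Fixed total, as in the patch: GUEST is already inside USER. */
static unsigned long long total_fixed(const unsigned long long *s)
{
	return s[CPUTIME_USER] + s[CPUTIME_SYSTEM] + s[CPUTIME_STEAL];
}
```

With 100 ns of guest time and nothing else, the buggy total reports 200 ns of runtime while the fixed one reports the correct 100 ns.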
* [PATCH v2 5/5] sched/cpuacct: Make user/system times in cpuacct.stat more precise
  [not found] ` <20210820094005.20596-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>
  2021-08-20  9:40 ` [PATCH v2 2/5] cpuacct: convert BUG_ON() to WARN_ON_ONCE() Andrey Ryabinin
  2021-08-20  9:40 ` [PATCH v2 3/5] cgroup: Fix 'usage_usec' time in root's cpu.stat Andrey Ryabinin
@ 2021-08-20  9:40 ` Andrey Ryabinin
  2 siblings, 0 replies; 17+ messages in thread
From: Andrey Ryabinin @ 2021-08-20  9:40 UTC (permalink / raw)
To: Tejun Heo, Zefan Li, Johannes Weiner, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot
Cc: Daniel Jordan, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira,
	cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	bharata-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, boris-UrQYPotlwNs,
	Andrey Ryabinin

From: Andrey Ryabinin <arb-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>

cpuacct.stat shows user and system times based on raw, tick-based
counters of random precision. Use cputime_adjust() to scale these
values against the total runtime accounted by the scheduler, like we
already do for user/system times in /proc/<pid>/stat.

Signed-off-by: Andrey Ryabinin <arb-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>
---
Changes since v1:
 - fix cputime.sum_exec_runtime calculation
---
 kernel/sched/cpuacct.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 9de7dd51beb0..3d06c5e4220d 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -261,25 +261,30 @@ static int cpuacct_all_seq_show(struct seq_file *m, void *V)
 static int cpuacct_stats_show(struct seq_file *sf, void *v)
 {
 	struct cpuacct *ca = css_ca(seq_css(sf));
-	s64 val[CPUACCT_STAT_NSTATS];
+	struct task_cputime cputime;
+	u64 val[CPUACCT_STAT_NSTATS];
 	int cpu;
 	int stat;
 
-	memset(val, 0, sizeof(val));
+	memset(&cputime, 0, sizeof(cputime));
 	for_each_possible_cpu(cpu) {
 		u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat;
 
-		val[CPUACCT_STAT_USER]   += cpustat[CPUTIME_USER];
-		val[CPUACCT_STAT_USER]   += cpustat[CPUTIME_NICE];
-		val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SYSTEM];
-		val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_IRQ];
-		val[CPUACCT_STAT_SYSTEM] += cpustat[CPUTIME_SOFTIRQ];
+		cputime.utime += cpustat[CPUTIME_USER];
+		cputime.utime += cpustat[CPUTIME_NICE];
+		cputime.stime += cpustat[CPUTIME_SYSTEM];
+		cputime.stime += cpustat[CPUTIME_IRQ];
+		cputime.stime += cpustat[CPUTIME_SOFTIRQ];
+
+		cputime.sum_exec_runtime += *per_cpu_ptr(ca->cpuusage, cpu);
 	}
 
+	cputime_adjust(&cputime, &seq_css(sf)->cgroup->prev_cputime,
+		       &val[CPUACCT_STAT_USER], &val[CPUACCT_STAT_SYSTEM]);
+
 	for (stat = 0; stat < CPUACCT_STAT_NSTATS; stat++) {
-		seq_printf(sf, "%s %lld\n",
-			   cpuacct_stat_desc[stat],
-			   (long long)nsec_to_clock_t(val[stat]));
+		seq_printf(sf, "%s %llu\n", cpuacct_stat_desc[stat],
+			   nsec_to_clock_t(val[stat]));
 	}
 
 	return 0;
-- 
2.31.1
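The core of what cputime_adjust() does can be sketched as follows: keep the utime:stime ratio of the raw tick counters, but rescale so the sum equals the scheduler's precisely accounted sum_exec_runtime. This is a simplified model — the real kernel function also consults prev_cputime so reported values never go backwards, and handles the all-utime/all-stime edge cases differently; the raw == 0 fallback below is an assumption of this sketch:

```c
#include <assert.h>

struct task_cputime_model {
	unsigned long long utime, stime, sum_exec_runtime;	/* nanoseconds */
};

/* Simplified model of cputime_adjust(): split the precise total runtime
 * in the utime:stime ratio of the raw tick-based counters. */
static void cputime_adjust_model(const struct task_cputime_model *c,
				 unsigned long long *ut, unsigned long long *st)
{
	unsigned long long raw = c->utime + c->stime;

	if (raw == 0) {
		/* No ticks observed at all: attribute everything to stime. */
		*ut = 0;
		*st = c->sum_exec_runtime;
		return;
	}
	/* Note: the multiply can overflow for very large values; the kernel
	 * uses mul_u64_u64_div_u64()-style helpers to avoid that. */
	*ut = c->sum_exec_runtime * c->utime / raw;
	*st = c->sum_exec_runtime - *ut;
}
```

For example, raw counters of 300 ns user / 100 ns system with a precise total of 1000 ns are reported as 750 ns user and 250 ns system — the ratio survives, the sum is exact.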
* [PATCH v2 4/5] sched/cpuacct: fix user/system in shown cpuacct.usage*
  2021-08-20  9:40 ` [PATCH v2 1/5] cputime, cpuacct: " Andrey Ryabinin
  [not found] ` <20210820094005.20596-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>
@ 2021-08-20  9:40 ` Andrey Ryabinin
  2021-09-10 19:03   ` Daniel Jordan
  2021-09-07 17:19 ` [PATCH v2 1/5] cputime, cpuacct: Include guest time in user time in cpuacct.stat Tejun Heo
  2 siblings, 1 reply; 17+ messages in thread
From: Andrey Ryabinin @ 2021-08-20  9:40 UTC (permalink / raw)
To: Tejun Heo, Zefan Li, Johannes Weiner, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot
Cc: Daniel Jordan, Dietmar Eggemann, Steven Rostedt, Ben Segall,
	Mel Gorman, Daniel Bristot de Oliveira, cgroups, linux-kernel,
	bharata, boris, Andrey Ryabinin, stable

cpuacct has 2 different ways of accounting and showing user
and system times.

The first one uses cpuacct_account_field() to account times
and the cpuacct.stat file to expose them. This one seems to work ok.

The second one uses the cpuacct_charge() function for accounting and
the set of cpuacct.usage* files to show times. Despite some attempts to
fix it in the past, it still doesn't work: sometimes, while running a
KVM guest, cpuacct_charge() accounts most of the guest time as system
time. This doesn't match the user/system times shown in cpuacct.stat
or /proc/<pid>/stat.

Demonstration:
 # git clone https://github.com/aryabinin/kvmsample
 # make
 # mkdir /sys/fs/cgroup/cpuacct/test
 # echo $$ > /sys/fs/cgroup/cpuacct/test/tasks
 # ./kvmsample &
 # for i in {1..5}; do cat /sys/fs/cgroup/cpuacct/test/cpuacct.usage_sys; sleep 1; done
 1976535645
 2979839428
 3979832704
 4983603153
 5983604157

Use the cpustats accounted in cpuacct_account_field() as the source of
user/sys times for the cpuacct.usage* files. Make cpuacct_charge()
account only the total execution time.

Fixes: d740037fac70 ("sched/cpuacct: Split usage accounting into user_usage and sys_usage")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
Cc: <stable@vger.kernel.org>
---
Changes since v1:
 - remove struct cpuacct_usage;
---
 kernel/sched/cpuacct.c | 79 +++++++++++++++++-------------------------
 1 file changed, 32 insertions(+), 47 deletions(-)

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index f347cf9e4634..9de7dd51beb0 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -21,15 +21,11 @@ static const char * const cpuacct_stat_desc[] = {
 	[CPUACCT_STAT_SYSTEM] = "system",
 };
 
-struct cpuacct_usage {
-	u64	usages[CPUACCT_STAT_NSTATS];
-};
-
 /* track CPU usage of a group of tasks and its child groups */
 struct cpuacct {
 	struct cgroup_subsys_state	css;
 	/* cpuusage holds pointer to a u64-type object on every CPU */
-	struct cpuacct_usage __percpu	*cpuusage;
+	u64 __percpu	*cpuusage;
 	struct kernel_cpustat __percpu	*cpustat;
 };
 
@@ -49,7 +45,7 @@ static inline struct cpuacct *parent_ca(struct cpuacct *ca)
 	return css_ca(ca->css.parent);
 }
 
-static DEFINE_PER_CPU(struct cpuacct_usage, root_cpuacct_cpuusage);
+static DEFINE_PER_CPU(u64, root_cpuacct_cpuusage);
 static struct cpuacct root_cpuacct = {
 	.cpustat	= &kernel_cpustat,
 	.cpuusage	= &root_cpuacct_cpuusage,
@@ -68,7 +64,7 @@ cpuacct_css_alloc(struct cgroup_subsys_state *parent_css)
 	if (!ca)
 		goto out;
 
-	ca->cpuusage = alloc_percpu(struct cpuacct_usage);
+	ca->cpuusage = alloc_percpu(u64);
 	if (!ca->cpuusage)
 		goto out_free_ca;
 
@@ -99,7 +95,8 @@ static void cpuacct_css_free(struct cgroup_subsys_state *css)
 static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu,
 				 enum cpuacct_stat_index index)
 {
-	struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
+	u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
+	u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat;
 	u64 data;
 
 	/*
@@ -116,14 +113,17 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu,
 	raw_spin_rq_lock_irq(cpu_rq(cpu));
 #endif
 
-	if (index == CPUACCT_STAT_NSTATS) {
-		int i = 0;
-
-		data = 0;
-		for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
-			data += cpuusage->usages[i];
-	} else {
-		data = cpuusage->usages[index];
+	switch (index) {
+	case CPUACCT_STAT_USER:
+		data = cpustat[CPUTIME_USER] + cpustat[CPUTIME_NICE];
+		break;
+	case CPUACCT_STAT_SYSTEM:
+		data = cpustat[CPUTIME_SYSTEM] + cpustat[CPUTIME_IRQ] +
+			cpustat[CPUTIME_SOFTIRQ];
+		break;
+	case CPUACCT_STAT_NSTATS:
+		data = *cpuusage;
+		break;
 	}
 
 #ifndef CONFIG_64BIT
@@ -133,10 +133,14 @@ static u64 cpuacct_cpuusage_read(struct cpuacct *ca, int cpu,
 	return data;
 }
 
-static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val)
+static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu)
 {
-	struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
-	int i;
+	u64 *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
+	u64 *cpustat = per_cpu_ptr(ca->cpustat, cpu)->cpustat;
+
+	/* Don't allow to reset global kernel_cpustat */
+	if (ca == &root_cpuacct)
+		return;
 
 #ifndef CONFIG_64BIT
 	/*
@@ -144,9 +148,10 @@ static void cpuacct_cpuusage_write(struct cpuacct *ca, int cpu, u64 val)
 	 */
 	raw_spin_rq_lock_irq(cpu_rq(cpu));
 #endif
-
-	for (i = 0; i < CPUACCT_STAT_NSTATS; i++)
-		cpuusage->usages[i] = val;
+	*cpuusage = 0;
+	cpustat[CPUTIME_USER] = cpustat[CPUTIME_NICE] = 0;
+	cpustat[CPUTIME_SYSTEM] = cpustat[CPUTIME_IRQ] = 0;
+	cpustat[CPUTIME_SOFTIRQ] = 0;
 
 #ifndef CONFIG_64BIT
 	raw_spin_rq_unlock_irq(cpu_rq(cpu));
@@ -197,7 +202,7 @@ static int cpuusage_write(struct cgroup_subsys_state *css, struct cftype *cft,
 		return -EINVAL;
 
 	for_each_possible_cpu(cpu)
-		cpuacct_cpuusage_write(ca, cpu, 0);
+		cpuacct_cpuusage_write(ca, cpu);
 
 	return 0;
 }
@@ -244,25 +249,10 @@ static int cpuacct_all_seq_show(struct seq_file *m, void *V)
 	seq_puts(m, "\n");
 
 	for_each_possible_cpu(cpu) {
-		struct cpuacct_usage *cpuusage = per_cpu_ptr(ca->cpuusage, cpu);
-
 		seq_printf(m, "%d", cpu);
-
-		for (index = 0; index < CPUACCT_STAT_NSTATS; index++) {
-#ifndef CONFIG_64BIT
-			/*
-			 * Take rq->lock to make 64-bit read safe on 32-bit
-			 * platforms.
-			 */
-			raw_spin_rq_lock_irq(cpu_rq(cpu));
-#endif
-
-			seq_printf(m, " %llu", cpuusage->usages[index]);
-
-#ifndef CONFIG_64BIT
-			raw_spin_rq_unlock_irq(cpu_rq(cpu));
-#endif
-		}
+		for (index = 0; index < CPUACCT_STAT_NSTATS; index++)
+			seq_printf(m, " %llu",
+				   cpuacct_cpuusage_read(ca, cpu, index));
 		seq_puts(m, "\n");
 	}
 	return 0;
@@ -340,16 +330,11 @@ static struct cftype files[] = {
 void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 {
 	struct cpuacct *ca;
-	int index = CPUACCT_STAT_SYSTEM;
-	struct pt_regs *regs = get_irq_regs() ? : task_pt_regs(tsk);
-
-	if (regs && user_mode(regs))
-		index = CPUACCT_STAT_USER;
 
 	rcu_read_lock();
 
 	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
-		__this_cpu_add(*ca->cpuusage, cputime);
+		__this_cpu_add(*ca->cpuusage, cputime);
 
 	rcu_read_unlock();
 }
-- 
2.31.1
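After this patch, cpuacct.usage_user and cpuacct.usage_sys are derived from the same cpustat buckets that back cpuacct.stat, rather than from the separate per-mode counters that cpuacct_charge() used to bump based on an unreliable get_irq_regs()/user_mode() check. The mapping itself is simple and can be sketched in plain C (the enum and helpers are illustrative stand-ins for the kernel's switch in cpuacct_cpuusage_read()):

```c
#include <assert.h>

enum { CPUTIME_USER, CPUTIME_NICE, CPUTIME_SYSTEM, CPUTIME_SOFTIRQ,
       CPUTIME_IRQ, NR_STATS };

/* usage_user = time in user mode, including niced tasks. */
static unsigned long long usage_user(const unsigned long long *cpustat)
{
	return cpustat[CPUTIME_USER] + cpustat[CPUTIME_NICE];
}

/* usage_sys = kernel time plus hard/soft interrupt time. */
static unsigned long long usage_sys(const unsigned long long *cpustat)
{
	return cpustat[CPUTIME_SYSTEM] + cpustat[CPUTIME_IRQ] +
	       cpustat[CPUTIME_SOFTIRQ];
}
```

This is why the user/sys split now agrees with cpuacct.stat by construction: both files read the same buckets, and cpuacct_charge() is left charging only the undifferentiated total.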
* Re: [PATCH v2 4/5] sched/cpuacct: fix user/system in shown cpuacct.usage*
  2021-08-20  9:40 ` [PATCH v2 4/5] sched/cpuacct: fix user/system in shown cpuacct.usage* Andrey Ryabinin
@ 2021-09-10 19:03 ` Daniel Jordan
  0 siblings, 0 replies; 17+ messages in thread
From: Daniel Jordan @ 2021-09-10 19:03 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, cgroups,
	linux-kernel, bharata, boris, stable

On Fri, Aug 20, 2021 at 12:40:04PM +0300, Andrey Ryabinin wrote:
> cpuacct has 2 different ways of accounting and showing user
> and system times.
>
> The first one uses cpuacct_account_field() to account times
> and the cpuacct.stat file to expose them. This one seems to work ok.
>
> The second one uses the cpuacct_charge() function for accounting and
> the set of cpuacct.usage* files to show times. Despite some attempts
> to fix it in the past, it still doesn't work: sometimes, while running
> a KVM guest, cpuacct_charge() accounts most of the guest time as
> system time. This doesn't match the user/system times shown in
> cpuacct.stat or /proc/<pid>/stat.
>
> Demonstration:
>  # git clone https://github.com/aryabinin/kvmsample
>  # make
>  # mkdir /sys/fs/cgroup/cpuacct/test
>  # echo $$ > /sys/fs/cgroup/cpuacct/test/tasks
>  # ./kvmsample &
>  # for i in {1..5}; do cat /sys/fs/cgroup/cpuacct/test/cpuacct.usage_sys; sleep 1; done
>  1976535645
>  2979839428
>  3979832704
>  4983603153
>  5983604157

Thanks for expanding on this, and fixing the broken cpuacct_charge().

For the series,

Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
* Re: [PATCH v2 1/5] cputime, cpuacct: Include guest time in user time in cpuacct.stat
  2021-08-20  9:40 ` [PATCH v2 1/5] cputime, cpuacct: " Andrey Ryabinin
  [not found] ` <20210820094005.20596-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>
  2021-08-20  9:40 ` [PATCH v2 4/5] sched/cpuacct: fix user/system in shown cpuacct.usage* Andrey Ryabinin
@ 2021-09-07 17:19 ` Tejun Heo
  2 siblings, 0 replies; 17+ messages in thread
From: Tejun Heo @ 2021-09-07 17:19 UTC (permalink / raw)
To: Andrey Ryabinin
Cc: Zefan Li, Johannes Weiner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Daniel Jordan, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, cgroups,
	linux-kernel, bharata, boris, stable

On Fri, Aug 20, 2021 at 12:40:01PM +0300, Andrey Ryabinin wrote:
> cpuacct.stat in no-root cgroups shows user time without guest time
> included in it. This doesn't match the user time shown in root
> cpuacct.stat and /proc/<pid>/stat. This also affects cgroup2's cpu.stat
> in the same way.
>
> Make account_guest_time() add user time to the cgroup's cpustat to
> fix this.
>
> Fixes: ef12fefabf94 ("cpuacct: add per-cgroup utime/stime statistics")
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> Cc: <stable@vger.kernel.org>

The fact that this has been broken for so long, prolly from the
beginning, gives me some pause, but the patches look fine to me.

For the series,

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun
end of thread, other threads: [~2021-09-10 19:03 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-17 12:00 [PATCH 1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat Andrey Ryabinin
2021-02-17 12:00 ` [PATCH 3/4] sched/cpuacct: fix user/system in shown cpuacct.usage* Andrey Ryabinin
2021-03-17 22:22   ` Daniel Jordan
[not found]     ` <87r1kdl8se.fsf-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2021-08-20  9:37       ` Andrey Ryabinin
[not found] ` <20210217120004.7984-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>
2021-02-17 12:00   ` [PATCH 2/4] cgroup: Fix 'usage_usec' time in root's cpu.stat Andrey Ryabinin
[not found]     ` <20210217120004.7984-2-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>
2021-03-17 22:13       ` Daniel Jordan
2021-02-17 12:00   ` [PATCH 4/4] sched/cpuacct: Make user/system times in cpuacct.stat more precise Andrey Ryabinin
[not found]     ` <20210217120004.7984-4-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>
2021-03-17 22:25       ` Daniel Jordan
2021-03-17 22:09 ` [PATCH 1/4] cputime,cpuacct: Include guest time in user time in cpuacct.stat Daniel Jordan
2021-08-20  9:37   ` Andrey Ryabinin
2021-08-20  9:40 ` [PATCH v2 1/5] cputime, cpuacct: " Andrey Ryabinin
[not found]   ` <20210820094005.20596-1-arbn-XoJtRXgx1JtgGdypvdXH1w@public.gmane.org>
2021-08-20  9:40     ` [PATCH v2 2/5] cpuacct: convert BUG_ON() to WARN_ON_ONCE() Andrey Ryabinin
2021-08-20  9:40     ` [PATCH v2 3/5] cgroup: Fix 'usage_usec' time in root's cpu.stat Andrey Ryabinin
2021-08-20  9:40     ` [PATCH v2 5/5] sched/cpuacct: Make user/system times in cpuacct.stat more precise Andrey Ryabinin
2021-08-20  9:40   ` [PATCH v2 4/5] sched/cpuacct: fix user/system in shown cpuacct.usage* Andrey Ryabinin
2021-09-10 19:03     ` Daniel Jordan
2021-09-07 17:19   ` [PATCH v2 1/5] cputime, cpuacct: Include guest time in user time in cpuacct.stat Tejun Heo