From: Glauber Costa <glommer@parallels.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: <linux-kernel@vger.kernel.org>, <paul@paulmenage.org>,
	<lizf@cn.fujitsu.com>, <daniel.lezcano@free.fr>,
	<a.p.zijlstra@chello.nl>, <jbottomley@parallels.com>
Subject: Re: [RFD 3/9] Display /proc/stat information per cgroup
Date: Tue, 27 Sep 2011 15:42:40 -0300
Message-ID: <4E821920.3000509@parallels.com>
In-Reply-To: <CAKTCnz=5wLq2OAKLfCOh5bX+t2T1exhhCdHUMkQchPzHNyU5Hw@mail.gmail.com>

On 09/27/2011 02:01 PM, Balbir Singh wrote:
> On Sat, Sep 24, 2011 at 3:50 AM, Glauber Costa <glommer@parallels.com> wrote:
>> Each cgroup has its own file, cpu.proc.stat that will
>> display the exact same format as /proc/stat. Users
>> that want to have access to a per-cgroup version of
>> this information, can query it for that purpose.
>>
>> Signed-off-by: Glauber Costa <glommer@parallels.com>
> ...

Hi Balbir, thanks for reviewing this.

>> +static inline void task_cgroup_account_field(struct task_struct *p,
>> +                                            cputime64_t tmp, int index)
>> +{
>> +       struct kernel_stat *kstat;
>> +       struct task_group *tg = task_group(p);
>> +
>> +       do {
>> +               kstat = this_cpu_ptr(tg->cpustat);
>> +               kstat->cpustat[index] = cputime64_add(kstat->cpustat[index],
>> +                                                     tmp);
>> +               tg = tg->parent;
>> +       } while (tg);
>> +}
>
> What protects the walk (tg = tg->parent)? Could you please document it?
I think it is the fact that the hierarchy only grows downwards, so a 
group's parent never changes once it is set (or am I wrong?)

And since we run all of this with preemption disabled and the runqueue 
locked, we should have no problems.

Do you agree?
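
For instance, spelling the assumption out in the helper itself could look
something like this (just a sketch on top of the patch above, to show
where I would put the comment; the exact wording is of course up for
debate):

static inline void task_cgroup_account_field(struct task_struct *p,
					     cputime64_t tmp, int index)
{
	struct kernel_stat *kstat;
	struct task_group *tg = task_group(p);

	/*
	 * No extra protection is needed for the tg->parent walk: groups
	 * are only ever created below existing ones, so the parent
	 * pointer of a live group never changes, and this runs with
	 * preemption disabled and the runqueue locked, which also keeps
	 * this_cpu_ptr() stable.
	 */
	do {
		kstat = this_cpu_ptr(tg->cpustat);
		kstat->cpustat[index] = cputime64_add(kstat->cpustat[index],
						      tmp);
		tg = tg->parent;
	} while (tg);
}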

>> +
>>   /*
>>   * Account user cpu time to a process.
>>   * @p: the process that the cpu time gets accounted to
>> @@ -3763,9 +3777,8 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
>>   * @cputime_scaled: cputime scaled by cpu frequency
>>   */
>>   void account_user_time(struct task_struct *p, cputime_t cputime,
>> -                      cputime_t cputime_scaled)
>> +               cputime_t cputime_scaled)
>>   {
>> -       cputime64_t *cpustat = kstat_this_cpu->cpustat;
>>         cputime64_t tmp;
>>
>>         /* Add user time to process. */
>> @@ -3775,10 +3788,11 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
>>
>>         /* Add user time to cpustat. */
>>         tmp = cputime_to_cputime64(cputime);
>> +
>>         if (TASK_NICE(p) > 0)
>> -               cpustat[NICE] = cputime64_add(cpustat[NICE], tmp);
>> +               task_cgroup_account_field(p, tmp, NICE);
>>         else
>> -               cpustat[USER] = cputime64_add(cpustat[USER], tmp);
>> +               task_cgroup_account_field(p, tmp, USER);
>>
>>         cpuacct_update_stats(p, CPUACCT_STAT_USER, cputime);
>>         /* Account for user time used */
>> @@ -3824,7 +3838,7 @@ static void account_guest_time(struct task_struct *p, cputime_t cputime,
>>   */
>>   static inline
>>   void __account_system_time(struct task_struct *p, cputime_t cputime,
>> -                       cputime_t cputime_scaled, cputime64_t *target_cputime64)
>> +                       cputime_t cputime_scaled, int index)
>>   {
>>         cputime64_t tmp = cputime_to_cputime64(cputime);
>>
>> @@ -3834,7 +3848,7 @@ void __account_system_time(struct task_struct *p, cputime_t cputime,
>>         account_group_system_time(p, cputime);
>>
>>         /* Add system time to cpustat. */
>> -       *target_cputime64 = cputime64_add(*target_cputime64, tmp);
>> +       task_cgroup_account_field(p, tmp, index);
>>         cpuacct_update_stats(p, CPUACCT_STAT_SYSTEM, cputime);
>>
>>         /* Account for system time used */
>> @@ -3851,8 +3865,7 @@ void __account_system_time(struct task_struct *p, cputime_t cputime,
>>   void account_system_time(struct task_struct *p, int hardirq_offset,
>>                          cputime_t cputime, cputime_t cputime_scaled)
>>   {
>> -       cputime64_t *cpustat = kstat_this_cpu->cpustat;
>> -       cputime64_t *target_cputime64;
>> +       int index;
>>
>>         if ((p->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0)) {
>>                 account_guest_time(p, cputime, cputime_scaled);
>> @@ -3860,13 +3873,13 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
>>         }
>>
>>         if (hardirq_count() - hardirq_offset)
>> -               target_cputime64 = &cpustat[IRQ];
>> +               index = IRQ;
>>         else if (in_serving_softirq())
>> -               target_cputime64 = &cpustat[SOFTIRQ];
>> +               index = SOFTIRQ;
>>         else
>> -               target_cputime64 = &cpustat[SYSTEM];
>> +               index = SYSTEM;
>>
>> -       __account_system_time(p, cputime, cputime_scaled, target_cputime64);
>> +       __account_system_time(p, cputime, cputime_scaled, index);
>>   }
>>
>>   /*
>> @@ -3941,27 +3954,29 @@ static __always_inline bool steal_account_process_tick(void)
>>   * softirq as those do not count in task exec_runtime any more.
>>   */
>>   static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
>> -                                               struct rq *rq)
>> +                                        struct rq *rq)
>>   {
>>         cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
>>         cputime64_t tmp = cputime_to_cputime64(cputime_one_jiffy);
>> -       cputime64_t *cpustat = kstat_this_cpu->cpustat;
>> +       struct task_group *tg;
>>
>>         if (steal_account_process_tick())
>>                 return;
>>
>> +       tg = task_group(p);
>> +
>>         if (irqtime_account_hi_update()) {
>> -               cpustat[IRQ] = cputime64_add(cpustat[IRQ], tmp);
>> +               task_cgroup_account_field(p, tmp, IRQ);
>>         } else if (irqtime_account_si_update()) {
>> -               cpustat[SOFTIRQ] = cputime64_add(cpustat[SOFTIRQ], tmp);
>> +               task_cgroup_account_field(p, tmp, SOFTIRQ);
>>         } else if (this_cpu_ksoftirqd() == p) {
>>                 /*
>>                  * ksoftirqd time do not get accounted in cpu_softirq_time.
>>                  * So, we have to handle it separately here.
>>                  * Also, p->stime needs to be updated for ksoftirqd.
>>                  */
>> -               __account_system_time(p, cputime_one_jiffy, one_jiffy_scaled,
>> -                                     &cpustat[SOFTIRQ]);
>> +               __account_system_time(p, cputime_one_jiffy,
>> +                                     one_jiffy_scaled, SOFTIRQ);
>>         } else if (user_tick) {
>>                 account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
>>         } else if (p == rq->idle) {
>> @@ -3969,8 +3984,8 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
>>         } else if (p->flags & PF_VCPU) { /* System time or guest time */
>>                 account_guest_time(p, cputime_one_jiffy, one_jiffy_scaled);
>>         } else {
>> -               __account_system_time(p, cputime_one_jiffy, one_jiffy_scaled,
>> -                                     &cpustat[SYSTEM]);
>> +               __account_system_time(p, cputime_one_jiffy,
>> +                                     one_jiffy_scaled, SYSTEM);
>>         }
>>   }
>>
>> @@ -8038,6 +8053,7 @@ void __init sched_init(void)
>>   {
>>         int i, j;
>>         unsigned long alloc_size = 0, ptr;
>> +       struct kernel_stat *kstat;
>>
>>   #ifdef CONFIG_FAIR_GROUP_SCHED
>>         alloc_size += 2 * nr_cpu_ids * sizeof(void **);
>> @@ -8092,6 +8108,18 @@ void __init sched_init(void)
>>         INIT_LIST_HEAD(&root_task_group.children);
>>         INIT_LIST_HEAD(&root_task_group.siblings);
>>         autogroup_init(&init_task);
>> +
>> +       root_task_group.cpustat = alloc_percpu(struct kernel_stat);
>> +       /* Failing that early an allocation means we're screwed anyway */
>> +       BUG_ON(!root_task_group.cpustat);
>> +       for_each_possible_cpu(i) {
>
> for_each_possible_cpu might be an overkill, no?
>
>> +               kstat = per_cpu_ptr(root_task_group.cpustat, i);
>> +               kstat->cpustat[IDLE] = 0;
>> +               kstat->cpustat[IDLE_BASE] = 0;
>> +               kstat->cpustat[IOWAIT_BASE] = 0;
>> +               kstat->cpustat[IOWAIT] = 0;
>> +       }
>> +
>>   #endif /* CONFIG_CGROUP_SCHED */
>>
>>         for_each_possible_cpu(i) {
>> @@ -8526,6 +8554,7 @@ static void free_sched_group(struct task_group *tg)
>>         free_fair_sched_group(tg);
>>         free_rt_sched_group(tg);
>>         autogroup_free(tg);
>> +       free_percpu(tg->cpustat);
>>         kfree(tg);
>>   }
>>
>> @@ -8534,6 +8563,7 @@ struct task_group *sched_create_group(struct task_group *parent)
>>   {
>>         struct task_group *tg;
>>         unsigned long flags;
>> +       int i;
>>
>>         tg = kzalloc(sizeof(*tg), GFP_KERNEL);
>>         if (!tg)
>> @@ -8545,6 +8575,19 @@ struct task_group *sched_create_group(struct task_group *parent)
>>         if (!alloc_rt_sched_group(tg, parent))
>>                 goto err;
>>
>> +       tg->cpustat = alloc_percpu(struct kernel_stat);
>> +       if (!tg->cpustat)
>> +               goto err;
>> +
>> +       for_each_possible_cpu(i) {
>> +               struct kernel_stat *kstat, *root_kstat;
>> +
>> +               kstat = per_cpu_ptr(tg->cpustat, i);
>> +               root_kstat = per_cpu_ptr(root_task_group.cpustat, i);
>> +               kstat->cpustat[IDLE_BASE]  = root_kstat->cpustat[IDLE];
>> +               kstat->cpustat[IOWAIT_BASE] = root_kstat->cpustat[IOWAIT];
>> +       }
>> +
>>         spin_lock_irqsave(&task_group_lock, flags);
>>         list_add_rcu(&tg->list, &task_groups);
>>
>> @@ -9092,7 +9135,9 @@ int cpu_cgroup_proc_stat(struct cgroup *cgrp, struct cftype *cft, struct seq_fil
>>                 system = cputime64_add(system, kstat->cpustat[SYSTEM]);
>>                 idle = cputime64_add(idle, root_kstat->cpustat[IDLE]);
>>                 idle = cputime64_add(idle, arch_idle_time(i));
>> +               idle = cputime64_sub(idle, kstat->cpustat[IDLE_BASE]);
>>                 iowait = cputime64_add(iowait, root_kstat->cpustat[IOWAIT]);
>> +               iowait = cputime64_sub(iowait, kstat->cpustat[IOWAIT_BASE]);
>>                 irq = cputime64_add(irq, kstat->cpustat[IRQ]);
>>                 softirq = cputime64_add(softirq, kstat->cpustat[SOFTIRQ]);
>>                 steal = cputime64_add(steal, kstat->cpustat[STEAL]);
>> @@ -9134,7 +9179,9 @@ int cpu_cgroup_proc_stat(struct cgroup *cgrp, struct cftype *cft, struct seq_fil
>>                 system = kstat->cpustat[SYSTEM];
>>                 idle = root_kstat->cpustat[IDLE];
>>                 idle = cputime64_add(idle, arch_idle_time(i));
>> +               idle = cputime64_sub(idle, kstat->cpustat[IDLE_BASE]);
>>                 iowait = root_kstat->cpustat[IOWAIT];
>> +               iowait = cputime64_sub(iowait, kstat->cpustat[IOWAIT_BASE]);
>>                 irq = kstat->cpustat[IRQ];
>>                 softirq = kstat->cpustat[SOFTIRQ];
>>                 steal = kstat->cpustat[STEAL];
>> @@ -9205,6 +9252,10 @@ static struct cftype cpu_files[] = {
>>                 .write_u64 = cpu_rt_period_write_uint,
>>         },
>>   #endif
>> +       {
>> +               .name = "proc.stat",
>> +               .read_seq_string = cpu_cgroup_proc_stat,
>
> Looks fine to me
>
> Balbir Singh
Awesome.

Thanks.

Thread overview: 44+ messages
2011-09-23 22:20 [RFD 0/9] per-cgroup /proc/stat statistics Glauber Costa
2011-09-23 22:20 ` [RFD 1/9] Change cpustat fields to an array Glauber Costa
2011-09-27 21:00   ` Peter Zijlstra
2011-09-28 15:13     ` Glauber Costa
2011-09-28 15:23       ` Peter Zijlstra
2011-09-28 18:19     ` Glauber Costa
2011-09-28 19:09       ` Peter Zijlstra
2011-09-28 20:04         ` Glauber Costa
2011-10-01 17:47         ` Glauber Costa
2011-09-27 21:03   ` Peter Zijlstra
2011-09-28 15:14     ` Glauber Costa
2011-09-23 22:20 ` [RFD 2/9] Move /proc/stat logic inside sched.c Glauber Costa
2011-09-23 22:20 ` [RFD 3/9] Display /proc/stat information per cgroup Glauber Costa
2011-09-27 17:01   ` Balbir Singh
2011-09-27 18:42     ` Glauber Costa [this message]
2011-09-27 22:21       ` Peter Zijlstra
2011-09-28 15:22         ` Glauber Costa
2011-09-28 15:23         ` Glauber Costa
2011-09-27 21:48   ` Peter Zijlstra
2011-09-28 15:14     ` Glauber Costa
2011-09-27 21:52   ` Peter Zijlstra
2011-09-28 15:15     ` Glauber Costa
2011-09-23 22:20 ` [RFD 4/9] Make total_forks per-cgroup Glauber Costa
2011-09-27 22:00   ` Peter Zijlstra
2011-09-28  8:13     ` Martin Schwidefsky
2011-09-28 10:35       ` Peter Zijlstra
2011-09-28 12:42         ` Martin Schwidefsky
2011-09-28 12:53           ` Peter Zijlstra
2011-09-28 15:29             ` Glauber Costa
2011-09-28 15:33               ` Peter Zijlstra
2011-09-28 15:35                 ` Glauber Costa
2011-09-28 15:37                   ` Peter Zijlstra
2011-09-28 15:39                     ` Glauber Costa
2011-09-28 15:28           ` Glauber Costa
2011-09-28 15:27         ` Glauber Costa
2011-09-28 15:26       ` Glauber Costa
2011-09-23 22:20 ` [RFD 5/9] per-cgroup boot time Glauber Costa
2011-09-23 22:20 ` [RFD 6/9] Report steal time for cgroup Glauber Costa
2011-09-23 22:20 ` [RFD 7/9] provide a version of cpuacct statistics inside cpu cgroup Glauber Costa
2011-09-23 22:20 ` [RFD 8/9] provide a version of cpuusage " Glauber Costa
2011-09-23 22:20 ` [RFD 9/9] Change CPUACCT to default n Glauber Costa
2011-09-27 22:11 ` [RFD 0/9] per-cgroup /proc/stat statistics Peter Zijlstra
2011-09-28 15:21   ` Glauber Costa
2011-09-28 15:27     ` Peter Zijlstra
