From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
To: Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Peter Zijlstra
<a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
"Eric W. Biederman"
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>,
Serge Hallyn
<serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>,
devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org,
handai.szj-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org,
Andrew.Phillips-xheW4WVAX9Y@public.gmane.org
Subject: Re: [PATCH v2 2/5] account guest time per-cgroup as well.
Date: Mon, 28 May 2012 13:03:19 +0400 [thread overview]
Message-ID: <4FC33F57.3010702@parallels.com> (raw)
In-Reply-To: <CAPM31RLwY4d-Ng3-T+-1eLxuZxr8wbdC_+sDQbJQXuqEfe9tfg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On 05/26/2012 08:44 AM, Paul Turner wrote:
> On 04/09/2012 03:25 PM, Glauber Costa wrote:
>> In the interest of providing a per-cgroup figure of common statistics,
>> this patch adds a nr_switches counter to each group runqueue (both cfs
>> and rt).
>>
>> To avoid impact on schedule(), we don't walk the tree at stat gather
>> time. This is because schedule() is called much more frequently than
>> the tick functions, in which we do walk the tree.
>>
>> When this figure needs to be read (different patch), we will
>> aggregate them at read time.
>>
>> Signed-off-by: Glauber Costa<glommer-bzQdu9zFT3WakBO8gow8eQ-XMD5yJDbdMReXY1tMh2IBg@public.gmane.org>
>> ---
>> kernel/sched/core.c | 32 ++++++++++++++++++++++++++++++++
>> kernel/sched/sched.h | 3 +++
>> 2 files changed, 35 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 1cfb7f0..1ee3772 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -3168,6 +3168,37 @@ pick_next_task(struct rq *rq)
>> }
>>
>> /*
>> + * For all other data, we do a tree walk at the time of
>> + * gathering. We want, however, to minimize the impact over schedule(),
>> + * because... well... it's schedule().
>> + *
>> + * Therefore we only gather for the current cgroup, and walk the tree
>> + * at read time
>> + */
>> +static void update_switches_task_group(struct rq *rq,
>> + struct task_struct *prev,
>> + struct task_struct *next)
>> +{
>> +#ifdef CONFIG_CGROUP_SCHED
>> + int cpu = cpu_of(rq);
>> +
>> + if (rq->curr_tg ==&root_task_group)
>> + goto out;
>> +
>> +#ifdef CONFIG_FAIR_GROUP_SCHED
>> + if (prev->sched_class ==&fair_sched_class)
>> + rq->curr_tg->cfs_rq[cpu]->nr_switches++;
>> +#endif
>> +#ifdef CONFIG_RT_GROUP_SCHED
>> + if (prev->sched_class ==&rt_sched_class)
>> + rq->curr_tg->rt_rq[cpu]->nr_switches++;
>> +#endif
>
> With this approach why differentiate cfs vs rt? These could both just
> be on the task_group.
>
> This could then just be
> if (prev != root_task_group)
> task_group(prev)->nr_switches++;
well, no. Then it needs to be an atomic update, or something like it.
The runqueue is a percpu data, the task_group is not. That's why I
choosed to use a rq (and the runqueues are separated between classes).
It all boils down to the fact that I wanted to avoid an atomic update in
this path.
But if you think that would be okay, I could change it. Alternatively, I
could come up with another percpu storage as well, since we're
ultimately just reading it later (and for that we need to iterate on all
cpus anyway).
> Which you could wrap in a nice static inline that disappears when
> CONFIG_CGROUP_PROC_STAT isn't there
That I can do.
>
> Another way to go about this would be to promote (demote?) nr_switches
> to the sched_entity. At which point you know you only need to update
> yours, and conditionally update your parents.
You mean the global one ?
Not sure it will work, because that always refer to the root cgroup...
> But.. that's still gross.. Hmm..
>
>> +out:
>> + rq->curr_tg = task_group(next);
>
> If you're going to task_group every time anyway you might as well just
> take it against prev -- then you don't have to cache rq->curr_tg?
>
> Another way to do this would be:
>
> On cfs_rq, rt_rq add:
> int prev_rq_nr_switches, nr_switches
>
> On put_prev_prev_task_fair (against a task)
> cfs_rq_of(prev->se)->prev_rq_nr_switches = rq->nr_switches
>
> On pick_next_task_fair:
> if (cfs_rq_of(prev->se)->prev_rq_nr_switches != rq->nr_switches)
> cfs_rq->nr_switches++;
>
> On aggregating the value for read: +1 if prev_rq_nr_running != rq->nr_running
> [And equivalent for sched_rt]
>
> While this is no nicer (and fractionally more expensive but this is
> never something we'd enable by default), it at least gets the goop out
> of schedule().
At first look this sounds a bit weird to me, but OTOH, this is how a lot
of the stuff is done... All the other statistics in the patch set are
collected this exact same way - because it draws from schedstats, that
always touch the specific rqs, so maybe this gain points for consistency.
I'll give it a shot.
BTW, this means that your first comment about merging cfq and rt is
basically to be disconsidered should I take this route, right ?
next prev parent reply other threads:[~2012-05-28 9:03 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-04-09 22:25 [PATCH v2 0/5] per-cgroup /proc/stat statistics Glauber Costa
[not found] ` <1334010315-4453-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-09 22:25 ` [PATCH v2 1/5] measure exec_clock for rt sched entities Glauber Costa
2012-04-09 22:25 ` [PATCH v2 2/5] account guest time per-cgroup as well Glauber Costa
[not found] ` <1334010315-4453-3-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-05-26 4:44 ` Paul Turner
[not found] ` <CAPM31RLwY4d-Ng3-T+-1eLxuZxr8wbdC_+sDQbJQXuqEfe9tfg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-05-28 9:03 ` Glauber Costa [this message]
2012-05-28 13:26 ` Glauber Costa
[not found] ` <4FC37D01.7080704-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-05-29 10:34 ` Glauber Costa
2012-04-09 22:25 ` [PATCH v2 3/5] record nr_switches per task_group Glauber Costa
2012-04-09 22:25 ` [PATCH v2 4/5] expose fine-grained per-cpu data for cpuacct stats Glauber Costa
[not found] ` <1334010315-4453-5-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-18 12:30 ` Sha Zhengju
[not found] ` <CAFj3OHUzKDdS_3LrnTk+XaRVt+fGxWkvmh9cjv88Dt4n8Q39MA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-18 16:14 ` Glauber Costa
2012-04-09 22:25 ` [PATCH v2 5/5] expose per-taskgroup schedstats in cgroup Glauber Costa
[not found] ` <1334010315-4453-6-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-18 14:44 ` Sha Zhengju
[not found] ` <CAFj3OHUwF2My5c-+ZCwLNynNTokYwioXP2jTJ4FtKg_=jPed0Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-18 14:57 ` Sha Zhengju
2012-04-18 16:24 ` Glauber Costa
[not found] ` <4F8EEAC5.7060703-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-04-19 13:30 ` Sha Zhengju
[not found] ` <4F90135C.20203-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2012-04-19 15:00 ` Glauber Costa
2012-05-24 9:10 ` [PATCH v2 0/5] per-cgroup /proc/stat statistics Glauber Costa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FC33F57.3010702@parallels.com \
--to=glommer-bzqdu9zft3wakbo8gow8eq@public.gmane.org \
--cc=Andrew.Phillips-xheW4WVAX9Y@public.gmane.org \
--cc=a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw@public.gmane.org \
--cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org \
--cc=ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org \
--cc=handai.szj-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=lxc-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
--cc=pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org \
--cc=serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org \
--cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).