From: Tejun Heo <tj@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: lizefan@huawei.com, hannes@cmpxchg.org, mingo@redhat.com,
longman@redhat.com, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, kernel-team@fb.com, pjt@google.com,
luto@amacapital.net, efault@gmx.de,
torvalds@linux-foundation.org, guro@fb.com
Subject: Re: [PATCH 3/3] cgroup: Implement cgroup2 basic CPU usage accounting
Date: Tue, 29 Aug 2017 08:24:27 -0700 [thread overview]
Message-ID: <20170829152426.GL491396@devbig577.frc2.facebook.com> (raw)
In-Reply-To: <20170829143252.6zoes63bwfflukjy@hirez.programming.kicks-ass.net>
Hello, Peter.
On Tue, Aug 29, 2017 at 04:32:52PM +0200, Peter Zijlstra wrote:
> So I mostly like. On accounting it only adds to the immediate cgroup (if
> it has a parent, aka !root).
>
> On update it does a DFS of all sub-groups and propagates the deltas up
> to the requested group.
...
> What I don't get is why you need cgroup_cpu_stat_updated(). That is, I
> see you use it to keep the keep the DFS 'stack' up-to-date, but what I
> don't see is why you'd need that.
That is to make reading stats O(number of descendants which have been
active since last read) instad of O(number of all descendants) as
there can be a lot of not-too-active cgroups in a system. Stat
reading can be frequent, so the combination can get really bad. By
keeping the updated list separate, increasing read frequency decreases
the cost of each read.
Also, please note that a system may end up with a lot of cgroups
without the user intending to. memcg drains removed cgroups lazily
and the number of draining cgroups can reach very high numbers if the
system isn't under memory pressure. The plan is to add basic stats
for other resources too and keeping it scalable w.r.t. idle cgroups
allows using the same mechanism for all resources.
> Have a look at walk_tg_tree_from(), I think we can do something like
> that on struct cgroup_subsys_state, it has that children list and the
> parent pointer.
>
> And yes, walk_tg_tree_from() is tricky, it always takes a fair while to
> remember how it works.
We can propagate "updated" flag up the tree (we need to, otherwise we
can't tell which subtree to descend into) and prune the iteration on
subtrees which haven't been updated; however, this can still become
very costly depending on the topology as it can't jump over the
siblings which haven't been updated.
Thanks.
--
tejun
next prev parent reply other threads:[~2017-08-29 15:24 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-11 16:37 [PATCHSET for-4.14] cgroup, sched: cgroup2 basic resource accounting Tejun Heo
2017-08-11 16:37 ` [PATCH 1/3] sched/cputime: Expose cputime_adjust() Tejun Heo
2017-08-11 16:37 ` [PATCH 2/3] cpuacct: Introduce cgroup_account_cputime[_field]() Tejun Heo
2017-08-11 17:28 ` [PATCH v2 " Tejun Heo
2017-08-11 16:37 ` [PATCH 3/3] cgroup: Implement cgroup2 basic CPU usage accounting Tejun Heo
2017-08-11 20:19 ` Waiman Long
2017-08-13 19:44 ` Tejun Heo
2017-08-13 23:20 ` Waiman Long
2017-08-17 12:07 ` Roman Gushchin
2017-08-17 13:13 ` Tejun Heo
2017-08-29 14:32 ` Peter Zijlstra
2017-08-29 15:24 ` Tejun Heo [this message]
2017-08-29 15:32 ` Waiman Long
2017-08-29 15:42 ` Tejun Heo
2017-08-29 15:59 ` Peter Zijlstra
2017-08-29 17:43 ` [PATCH v2 " Tejun Heo
2017-08-29 18:06 ` Waiman Long
2017-08-29 18:10 ` Tejun Heo
2017-08-16 18:52 ` [PATCHSET for-4.14] cgroup, sched: cgroup2 basic resource accounting Tejun Heo
2017-08-17 8:13 ` Ingo Molnar
2017-08-17 13:01 ` Tejun Heo
2017-08-17 15:03 ` Ingo Molnar
2017-08-24 17:20 ` Tejun Heo
2017-09-22 18:05 ` Tejun Heo
2017-09-23 13:37 ` Peter Zijlstra
2017-09-23 13:44 ` Tejun Heo
2017-09-25 7:27 ` Peter Zijlstra
2017-09-25 15:07 ` Tejun Heo
2017-09-25 21:10 ` [PATCH cgroup/for-4.15] cgroup: statically initialize init_css_set->dfl_cgrp Tejun Heo
2017-09-25 21:34 ` [PATCH cgroup/for-4.15] sched/cputime: Add dummy cputime_adjust() implementation for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE Tejun Heo
2017-09-26 9:10 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170829152426.GL491396@devbig577.frc2.facebook.com \
--to=tj@kernel.org \
--cc=cgroups@vger.kernel.org \
--cc=efault@gmx.de \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan@huawei.com \
--cc=longman@redhat.com \
--cc=luto@amacapital.net \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox