From: Tejun Heo <tj@kernel.org>
To: Li Zefan <lizf@cn.fujitsu.com>,
	containers@lists.linux-foundation.org, cgroups@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Kay Sievers <kay.sievers@vrfy.org>,
	Lennart Poettering <lennart@poettering.net>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	linux-kernel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [RFD] cgroup: about multiple hierarchies
Date: Tue, 21 Feb 2012 13:21:06 -0800
Message-ID: <20120221212106.GF12236@google.com>
In-Reply-To: <20120221211938.GE12236@google.com>

Sorry, forgot to cc hch. Cc'ing him and quoting the whole message.

On Tue, Feb 21, 2012 at 01:19:38PM -0800, Tejun Heo wrote:
> Hello, guys.
>
> I've been thinking about multiple hierarchy support in cgroup for a
> while, especially after Frederic's pending task counter patchset.
> This is a write up of what I've been thinking. I don't know what to
> do yet and simply continuing the current situation definitely is an
> option, so please read on and throw in your 20 Won (or whatever amount
> in whatever currency you want).
>
> * The problems.
>
> The support for multiple process hierarchies always struck me as
> rather strange. If you forget about the current cgroup controllers
> and their implementations, the *only* reason to support multiple
> hierarchies is if you want to apply resource limits based on different
> orthogonal categorizations.
>
> Documentation/cgroups.txt seems to be written with this consideration
> in mind. It gives an example of applying limits according to two
> orthogonal categorizations - user groups (professors, students...)
> and applications (WWW, NFS...). While it may sound like a valid use
> case, I'm very skeptical about how useful or common mixing such
> orthogonal categorizations in a single setup would be.
>
> If support for multiple hierarchies came for free, at least in terms
> of features, it might be easier to justify, but of course it doesn't. Any
> given cgroup subsystem (or controller) can only be applied to a single
> hierarchy, which makes sense for a lot of things - what would two
> different limits on the same resource from different hierarchies mean?
> But, there also are things which can be used and useful in all
> hierarchies - e.g. cgroup freezer and task counter.
>
> While the current cgroup implementation and conventions can probably
> allow admins and engineers to tailor cgroup configuration for a
> specific setup, it is very difficult to use in a generic and automated
> way. I mean, who owns the freezer or task counter? If they're
> mounted on their own hierarchies, how should they be structured?
> Should the different hierarchies be structured such that they are
> projections of one unified hierarchy so that those generic mechanisms
> can be applied uniformly? If so, why do we need multiple hierarchies
> at all?
>
> A related limitation is that as different subsystems don't know which
> hierarchies they'll end up on, they can't cooperate. Wouldn't it make
> more sense if the task counter were a separate thing that watches the
> resources and triggers different actions as configured - be it failing
> forks or freezing?
>
> And yet another oddity is how cgroup handles nested cgroups - some
> controllers care about nesting but others just treat internal and
> leaf nodes equally. They don't care about the topology at all. This,
> too, can be fine if you approach things subsys by subsys and use them
> in different ways but if you try to combine them in a generic way you
> get sucked into the lala land of whatevers.
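
A minimal Python sketch of the nesting difference described above,
assuming a toy tree in which each group carries a count of its own
tasks. The Group type and both function names are hypothetical and do
not correspond to any kernel interface; the point is only that the two
conventions report different numbers for the same internal node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Group:
    """Toy stand-in for a cgroup: a name, its own tasks, and child groups."""
    name: str
    tasks: int = 0
    children: List["Group"] = field(default_factory=list)


def flat_usage(group: Group) -> int:
    # Topology-blind view: internal and leaf nodes are treated equally,
    # so only the tasks attached directly to this group are counted.
    return group.tasks


def hierarchical_usage(group: Group) -> int:
    # Nesting-aware view: a group's usage includes all of its descendants.
    return group.tasks + sum(hierarchical_usage(c) for c in group.children)


leaf_a = Group("a", tasks=3)
leaf_b = Group("b", tasks=4)
parent = Group("parent", tasks=1, children=[leaf_a, leaf_b])

print(flat_usage(parent), hierarchical_usage(parent))
```

A limit placed on "parent" constrains very different things under the
two views, which is why mixing such controllers on one tree is awkward.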
>
> The following is a "best practices" document on using cgroups.
>
> http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups
>
> To me, it seems to demonstrate the rather ugly situation that the
> current cgroup is providing. Everyone should tip-toe around cgroup
> hierarchies and nobody has full knowledge or control over them.
> e.g. base system management (such as systemd) can't use freezer or
> task counter as someone else might want to use it for a different
> hierarchy layout.
>
> It seems to me that the cgroup interface is too complicated and
> inflexible at the same time to be useful in a generic manner. Sure,
> it can be useful for setups individually crafted by engineers and
> admins to match specific sites or applications but as soon as you try
> to do something automatic and generic with it, there just are too
> many different scenarios and limitations to consider.
>
>
> * So, what to do?
>
> Heh, I don't know. IIRC, last year at LinuxCon Japan, I heard
> Christoph saying that the biggest problem w/ cgroup was that it was
> building completely separate hierarchies outside of the traditional
> process hierarchy. After thinking about this stuff for a while, I
> fully agree with him. I think this whole thing should have been a
> layer over the process tree like sessions or process groups.
>
> Unfortunately, that ship sailed long ago and we gotta make do with
> what we have on our collective hands. Here are some paths that we can
> take.
>
> 1. We're screwed anyway. Just don't worry about it and continue down
> this path. Can't get much worse, right?
>
> This approach has the apparent advantage of not having to do
> anything and is probably the most likely to be taken. This isn't
> ideal but hey nothing is. :P
>
> 2. Make it more flexible (and likely more complex, unfortunately).
> Allow the utility type subsystems to be used in multiple
> hierarchies. The easiest and probably dirtiest way to achieve that
> would be embedding them into cgroup core.
>
> Thinking about doing this depresses me and it's not like I have a
> cheerful personality to begin with. :(
>
> 3. Head towards single hierarchy with the pie-in-the-sky goal of
> merging things into process hierarchy in some distant future.
>
> The first step would be herding people to use a unified hierarchy
> (ie. all subsystems mounted on a single cgroup tree) which is
> controlled by a single entity in userland (be it systemd or cgroupd,
> cgroup-kit or whatever); however, even if we exclude supporting
> orthogonal categorizations, there are a good number of non-trivial
> hurdles to clear before this can be realized.
>
> Most importantly, we would need to clean up how nesting is handled
> across different subsystems. Handling internal and leaf nodes as
> equals simply can't work. Membership should be recursive, and for
> subsystems which can't support proper nesting, the right thing to
> do would be somehow ensuring that only a single node in the path
> from root to leaf is active for the controller. We may even have to
> introduce an alternative mode of operation to support this (yuck).
>
> This path would require the most work and we would be excluding a
> feature - support for multiple orthogonal categorizations - which
> has been available till now, probably through a deprecation process
> spanning years; however, this at least gives us hope that we may
> reach sanity in the end, however distant that end may be. Oh,
> hope. :)
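
The "only a single node in the path from root to leaf is active"
constraint in option 3 can be sketched as a simple tree check. This is
a minimal Python illustration under stated assumptions: the tree is a
plain dict from node name to child names, "active" is just a set of
node names, and the function name at_most_one_active is hypothetical.
It reads the constraint as "at most one active node per path"; nothing
here reflects an actual kernel data structure or API.

```python
from typing import Dict, List, Set


def at_most_one_active(children: Dict[str, List[str]],
                       active: Set[str],
                       root: str) -> bool:
    """Return True if no root-to-leaf path has more than one active node."""

    def walk(node: str, seen_active: bool) -> bool:
        is_active = node in active
        if is_active and seen_active:
            # An ancestor on this path is already active for the controller.
            return False
        seen = seen_active or is_active
        return all(walk(child, seen) for child in children.get(node, []))

    return walk(root, False)


# Hypothetical layout: root has children a and b; a has a child a1.
tree = {"root": ["a", "b"], "a": ["a1"]}

print(at_most_one_active(tree, {"a", "b"}, "root"))   # siblings, separate paths
print(at_most_one_active(tree, {"a", "a1"}, "root"))  # parent and child together
```

Siblings can both be active because they sit on different paths, while
a parent and its descendant cannot, which is the restriction a
non-nesting controller would need on a unified tree.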
>
> So, I mean, I don't know. What do other people think? Is this an
> unnecessary worry? Are people generally happy with the way things
> are? Lennart, Kay, what do you guys think?
>
> Thanks.
>
> --
> tejun
--
tejun