From: Serge Hallyn <serge.hallyn@ubuntu.com>
To: Tim Hockin <thockin@hockin.org>
Cc: Mike Galbraith <bitbucket@online.de>, Tejun Heo <tj@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Containers <containers@lists.linux-foundation.org>,
	Kay Sievers <kay.sievers@vrfy.org>,
	lpoetter <lpoetter@redhat.com>,
	workman-devel <workman-devel@redhat.com>,
	jpoimboe <jpoimboe@redhat.com>,
	"dhaval.giani" <dhaval.giani@gmail.com>,
	Cgroups <cgroups@vger.kernel.org>
Subject: Re: cgroup: status-quo and userland efforts
Date: Thu, 27 Jun 2013 11:18:09 -0500
Message-ID: <20130627161809.GA4957@sergelap>
In-Reply-To: <CAAAKZwt9QdddFrEjvdBsi3sbQXScKyzY=vZpYXqTwjGUebH1Ag@mail.gmail.com>

Quoting Tim Hockin (thockin@hockin.org):
> On Thu, Jun 27, 2013 at 6:22 AM, Serge Hallyn <serge.hallyn@ubuntu.com> wrote:
> > Quoting Mike Galbraith (bitbucket@online.de):
> >> On Wed, 2013-06-26 at 14:20 -0700, Tejun Heo wrote:
> >> > Hello, Tim.
> >> >
> >> > On Mon, Jun 24, 2013 at 09:07:47PM -0700, Tim Hockin wrote:
> >> > > I really want to understand why this is SO IMPORTANT that you have to
> >> > > break userspace compatibility?  I mean, isn't Linux supposed to be the
> >> > > OS with the stable kernel interface?  I've seen Linus rant time and
> >> > > time again about this - why is it OK now?
> >> >
> >> > What the hell are you talking about?  Nobody is breaking userland
> >> > interface.  A new version of interface is being phased in and the old
> >> > one will stay there for the foreseeable future.  It will be phased out
> >> > eventually but that's gonna take a long time and it will have to be
> >> > something hardly noticeable.  Of course new features will only be
> >> > available with the new interface and there will be efforts to nudge
> >> > people away from the old one but the existing interface will keep
> >> > working as it does.
> >>
> >> I can understand some alarm.  When I saw the below I started frothing at
> >> the face and howling at the moon, and I don't even use the things much.
> >>
> >> http://lists.freedesktop.org/archives/systemd-devel/2013-June/011521.html
> >>
> >> Hierarchy layout aside, that "private property" bit says that the folks
> >> who currently own and use the cgroups interface will lose direct access
> >> to it.  I can imagine folks who have become dependent upon on-the-fly
> >> management agents of their own design becoming a tad alarmed.
> >
> > FWIW, the code is too embarrassing yet to see daylight, but I'm playing
> > with a very lowlevel cgroup manager which supports nesting itself.
> > Access in this POC is low-level ("set freezer.state to THAWED for cgroup
> > /c1/c2", "Create /c3"), but the key feature is that it can run in two
> > modes - native mode in which it uses cgroupfs, and child mode where it
> > talks to a parent manager to make the changes.
> 
> In this world, are users able to read cgroup files, or do they have to
> go through a central agent, too?

The agent won't itself do anything to stop access through cgroupfs, but
the idea would be that cgroupfs would only be mounted in the agent's
mntns.  My hope would be that the libcgroup commands (like cgexec,
cgcreate, etc) would know to talk to the agent when possible, and users
would use those.

> > So then the idea would be that userspace (like libvirt and lxc) would
> > talk over /dev/cgroup to its manager.  Userspace inside a container
> > (which can't actually mount cgroups itself) would talk to its own
> > manager which is talking over a passed-in socket to the host manager,
> > which in turn runs natively (uses cgroupfs, and nests "create /c1" under
> > the requestor's cgroup).
> 
> How do you handle updates of this agent?  Suppose I have hundreds of
> running containers, and I want to release a new version of the cgroupd?

This may change (which is part of what I want to investigate with some
POC), but right now I'm not building any controller-aware smarts into it.
I think that's what you're asking about?  The agent doesn't do "slices"
etc.  This may turn out to be insufficient, we'll see.

So the only state which the agent stores is a list of cgroup mounts (if
in native mode) or an open socket to the parent (if in child mode), and a
list of connected child sockets.
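
A rough sketch of that state, to make the point concrete.  This is not
the POC code; the class and field names are made up for illustration,
and the only thing taken from the description above is which pieces of
state exist in which mode.

```python
import socket
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AgentState:
    """Hypothetical container for everything the agent keeps in memory."""

    # Native mode: mount points of the cgroupfs hierarchies the agent drives.
    cgroup_mounts: List[str] = field(default_factory=list)
    # Child mode: the open socket to the parent manager (None in native mode).
    parent: Optional[socket.socket] = None
    # Either mode: sockets of the child managers connected to this agent.
    children: List[socket.socket] = field(default_factory=list)

    @property
    def mode(self) -> str:
        # The agent is in child mode exactly when it has a parent to talk to.
        return "child" if self.parent is not None else "native"
```

Since nothing else is kept, a freshly started agent can rebuild all of
it from the mount table and from whoever reconnects.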

HUPping the agent will cause it to reload the cgroupfs mounts (in case
you've mounted a new controller, living in "the old world" :).  If you
just kill it and start a new one, it shouldn't matter.
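
For illustration, the reload-on-HUP behaviour could look roughly like
the sketch below.  It assumes the agent finds its hierarchies by reading
/proc/self/mounts and keeping entries whose filesystem type is "cgroup";
the Agent class and the function names are hypothetical.

```python
import signal


def parse_cgroup_mounts(mounts_text):
    """Return the mount point of every cgroup filesystem in a mounts table."""
    mounts = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line reads: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[2] == "cgroup":
            mounts.append(fields[1])
    return mounts


def load_cgroup_mounts(path="/proc/self/mounts"):
    """Read the current mount table and extract the cgroup mount points."""
    with open(path) as f:
        return parse_cgroup_mounts(f.read())


class Agent:
    """Hypothetical agent holding only the list of cgroupfs mounts."""

    def __init__(self):
        self.cgroup_mounts = []

    def reload(self, signum=None, frame=None):
        # Throw away the old list and rescan; there is nothing else to migrate.
        self.cgroup_mounts = load_cgroup_mounts()

    def install_hup_handler(self):
        # SIGHUP triggers a rescan, e.g. after a new controller was mounted.
        signal.signal(signal.SIGHUP, self.reload)
```

Killing the agent and starting a new one ends up in the same place,
because the new process runs the same scan at startup.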

> (note: inquiries about the implementation do not denote acceptance of
> the model :)

To put it another way, the problem I'm solving (for now) is not the "I
want a daemon to ensure that requested guarantees are correctly
implemented" one.  In that sense I'm maintaining the status quo, i.e. the
admin needs to architect the layout correctly.

The problem I'm solving is really that I want containers to be able to
handle cgroups even if they can't mount cgroupfs, and I want all
userspace to be able to behave the same whether they are in a container
or not.
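
A minimal sketch of how the two modes could fit together, under loud
assumptions: the Manager class, its parameters, and nest_under() are
invented here, a plain method call stands in for the socket between
child and parent, and cgroup_in_parent stands in for whatever the parent
would learn about the requestor from the connection.  Only the overall
shape (native mode touches cgroupfs, child mode forwards, requests are
nested under the requestor's cgroup) comes from the description above.

```python
import os


def nest_under(requestor_cgroup, requested):
    """Resolve a requested cgroup relative to the requestor's own cgroup."""
    base = os.path.normpath("/" + requestor_cgroup.strip("/"))
    target = os.path.normpath(os.path.join(base, requested.lstrip("/")))
    prefix = base if base.endswith("/") else base + "/"
    # Refuse anything that would climb out of the requestor's subtree.
    if target != base and not target.startswith(prefix):
        raise PermissionError("%r escapes %r" % (requested, base))
    return target


class Manager:
    """Hypothetical manager that runs in native mode or in child mode."""

    def __init__(self, mount_root=None, parent=None, cgroup_in_parent="/"):
        if (mount_root is None) == (parent is None):
            raise ValueError("need exactly one of mount_root or parent")
        self.mount_root = mount_root            # native mode: cgroupfs mount
        self.parent = parent                    # child mode: parent manager
        self.cgroup_in_parent = cgroup_in_parent

    def create(self, requestor_cgroup, name):
        path = nest_under(requestor_cgroup, name)
        if self.parent is not None:
            # Child mode: ask the parent, which nests the request again
            # under the cgroup this manager itself lives in.
            self.parent.create(self.cgroup_in_parent, path)
        else:
            # Native mode: creating a cgroup is a mkdir in cgroupfs.
            os.makedirs(os.path.join(self.mount_root, path.lstrip("/")),
                        exist_ok=True)
        return path
```

With a host manager in native mode and a container manager created with
cgroup_in_parent="/lxc/c1", a "create c3" issued inside the container
becomes a mkdir of lxc/c1/c3 under the host's mount, while the caller
in the container only ever sees "/c3".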

This isn't meant as a poke in the eye of anyone who wants to address the
other problem.  If it turns out that we (meaning "the community of
cgroup users") really want such an agent, then we can add that.  I'm not
convinced.

What would probably be a better design, then, would be that the agent
I'm working on can plug into a resource allocation agent.  Or, I
suppose, the other way around.

> > At some point (probably soon) we might want to talk about a standard API
> > for these things.  However I think it will have to come in the form of
> > a standard library, which knows to either send requests over dbus to
> > systemd, or over /dev/cgroup sock to the manager.
> >
> > -serge

