From: "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Aristeu Rozanski <aris-moeOTchvdi7YtjvyW6yDsg@public.gmane.org>,
	Neil Horman <nhorman-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Paul Mackerras <paulus-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>,
	"Aneesh Kumar K.V"
	<aneesh.kumar-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>,
	Arnaldo Carvalho de Melo
	<acme-f8uhVLnGfZaxAyOMLChx1axOck334EZe@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>,
	"Serge E. Hallyn" <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>,
	Paul Turner <pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Subject: Re: Controlling devices and device namespaces
Date: Sat, 15 Sep 2012 22:05:20 +0000
Message-ID: <20120915220520.GA11364@mail.hallyn.com>
In-Reply-To: <87wqzv7i08.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > Quoting Aristeu Rozanski (aris-moeOTchvdi7YtjvyW6yDsg@public.gmane.org):
> >> Tejun,
> >> On Thu, Sep 13, 2012 at 01:58:27PM -0700, Tejun Heo wrote:
> >> >   memcg can be handled by memcg people and I can handle cgroup_freezer
> >> >   and others with help from the authors.  The problematic one is
> >> >   blkio.  If anyone is interested in working on blkio, please be my
> >> >   guest.  Vivek?  Glauber?
> >> 
> >> if Serge is not planning to do it already, I can take a look in device_cgroup.
> >
> > That's fine with me, thanks.
> >
> >> also, heard about the desire of having a device namespace instead with
> >> support for translation ("sda" -> "sdf"). If anyone sees an immediate use
> >> for this, please let me know.
> >
> > Before going down this road, I'd like to discuss this with at least you,
> > me, and Eric Biederman (cc:d) as to how it relates to a device
> > namespace.
> 
> 
> The problem with devices.
> 
> - An unrestricted mknod gives you access to effectively any device in
>   the system.
> 
> - During process migration, if the device number changes, using
>   stat to compare file descriptors can fail on the same file descriptor.
> 
> - Devices coming from preexisting filesystems that we mount
>   as unprivileged users are as dangerous as mknod but show
>   that the problem is not limited to mknod.
> 
> - udev thinks mknod is a system call we can remove from the kernel.

Also,

 - udevadm trigger --action=add

causes all the devices known on the host to be re-sent to
everyone (all namespaces).  Which floods everyone and causes the
host to reset some devices.

> ---
> 
> The use cases seem comparatively simple to enumerate.
> 
> - Giving unfiltered access to a device to someone not root.
> 
> - Virtual devices that everyone uses and have no real privilege
>   requirements: /dev/null /dev/tty /dev/zero etc.
> 
> - Dynamically created devices /dev/loopN /dev/tun /dev/macvtapN,
>   nbd, iscsi, /dev/ptsN, etc

and

 - per-namespace uevent filtering.
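
To make that last item concrete, here is a rough userspace sketch of what
such filtering could look like.  It assumes the usual NUL-separated uevent
payload ("action@devpath" followed by KEY=VALUE fields such as MAJOR and
MINOR); the helper names and the per-container "allowed" set are made up
for illustration and are not an existing interface.

```python
def parse_uevent(payload: bytes) -> dict:
    # A uevent payload is a header ("action@devpath") followed by
    # NUL-separated KEY=VALUE fields.
    fields = [f for f in payload.split(b"\x00") if f]
    event = {}
    for field in fields[1:]:  # skip the "action@devpath" header
        key, sep, value = field.decode().partition("=")
        if sep:
            event[key] = value
    return event


def should_forward(event: dict, allowed: set) -> bool:
    # Hypothetical policy: forward only events for device numbers the
    # container is allowed to see; events without MAJOR/MINOR are dropped.
    if "MAJOR" not in event or "MINOR" not in event:
        return False
    return (int(event["MAJOR"]), int(event["MINOR"])) in allowed


payload = (b"add@/devices/virtual/mem/null\x00ACTION=add\x00"
           b"DEVPATH=/devices/virtual/mem/null\x00SUBSYSTEM=mem\x00"
           b"MAJOR=1\x00MINOR=3\x00DEVNAME=null\x00")
event = parse_uevent(payload)
forwarded = should_forward(event, {(1, 3)})
```

With something like this in the relay path, a "udevadm trigger" on the host
would only reach a container for devices in its own set.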

> ---
> 
> There are a couple of solutions to these problems.
> 
> - The classic solution of creating a /dev for a container
>   before starting it.
> 
> - The devpts filesystem.  This works well for unprivileged access
>   to ptys.  Except for the /dev/ptmx silliness I very much like how
>   things are handled today with devpts.
> 
> - Device control groups.  I am not quite certain what to make
>   of them.  The only case I see where they are better than
>   a prebuilt static dev is if there is a hotplugged device
>   that I want to push into my container.
> 
>   I think the only problem with device control groups and
>   hierarchies is that removing a device from a whitelist
>   does not recurse down the hierarchy.

That's going to be fixed soon thanks to Aristeu  :)

>   Can a process inside of a device control group create
>   a child group that has access to a subset of its
>   devices?  The actual checks don't need to be hierarchical
>   but the presence of device nodes should be.

If I understand your question right, yes.
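
For the sake of discussion, the behaviour I have in mind (child whitelists
are subsets of the parent's, and a removal recurses down) can be modelled
with the toy below.  This is only a sketch, not the device_cgroup code: the
class and method names are invented, and entries are plain
(type, major, minor) tuples with no access bits or wildcards.

```python
class DevCgroup:
    """Toy model of a device whitelist hierarchy (names are hypothetical)."""

    def __init__(self, parent=None, allowed=()):
        self.parent = parent
        self.children = []
        self.allowed = set()
        if parent is not None:
            parent.children.append(self)
        for entry in allowed:
            self.allow(entry)

    def allow(self, entry):
        # A child may only be granted what its parent already has.
        if self.parent is not None and entry not in self.parent.allowed:
            raise PermissionError("parent does not allow %r" % (entry,))
        self.allowed.add(entry)

    def deny(self, entry):
        # Removing an entry recurses down so no descendant keeps it.
        self.allowed.discard(entry)
        for child in self.children:
            child.deny(entry)


root = DevCgroup(allowed=[("c", 1, 3), ("b", 8, 0)])
child = DevCgroup(root, [("c", 1, 3), ("b", 8, 0)])
grandchild = DevCgroup(child, [("b", 8, 0)])
root.deny(("b", 8, 0))
```

After the deny on root, neither child nor grandchild still lists
("b", 8, 0), while ("c", 1, 3) stays in root and child.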

> ---
> 
> I see a couple of holes in the device control picture.
> 
> - How do we handle hotplug events?
> 
>   I think we can do this by relaying events through userspace,
>   updating the device control groups etc.
> 
> - Unprivileged processes interacting with all of this.
>   (possibly with privilege in their user namespace)
>   What I don't know how to do is how to create a couple of different
>   subhierarchies each for different child processes.
> 
> - Dynamically created devices.
> 
>   My gut feel is that we should replicate the success of devpts
>   and give each type of dynamically created device its own
>   filesystem and mount point under /dev, and just bend
>   the handful of userspace users into that model.

Phew.  Maybe.  Had not considered that.  But seems daunting.

> - Sysfs
> 
>   My gut says for the container use case we should aim to
>   simply not have dynamically created devices in sysfs
>   and then we can simply not care.
> 
> - Migration
> 
>   Either we need block device numbers that can migrate with us,
>   (possibly a subset of the entire range ala devpts) or we need to send
>   hotplug events to userspace right after a migration so userspace
>   processes that care can invalidate their caches of stat data.
> 
> ---
> 
> With the code in my userns development tree I can create a user
> namespace, create a new mount namespace, and then if I have
> access to any block devices mount filesystems, all without
> needing to have any special privileges.  What I haven't
> figured out is what it would take to get the device
> control group into the middle of that.

I'm really not sure that's a question we want to ask.  The
device control group, like the ns cgroup, was meant as a
temporary workaround to not having user and device namespaces.

If we can come up with a device cgroup model that works to
fill all the requirements we would have for a devices ns, then
great.  But I don't want us to be constrained by that.

> It feels like it should be possible to get the checks straight
> and use the device control group hooks to control which devices
> are usable in a user namespace.  Unfortunately, when I try to work
> it out, the independence of the user namespace and the device
> control group seems to make that impossible.
> 
> Shrug, there is most definitely something missing from our
> model on how to handle devices well.  I am hoping we can
> sprinkle some devpts-derived pixie dust on the problem,
> migrate userspace to some new interfaces, and have life
> be good.
> 
> Eric

Me too!

I'm torn between suggesting that we have a session at UDS to
discuss this, and not wanting to, so that we can focus on the
remaining questions with the user namespace.

thanks,
-serge

