public inbox for containers@lists.linux.dev
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Linux Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
	Kay Sievers <kay.sievers-tD+1rO4QERM@public.gmane.org>,
	Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>,
	devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org,
	lxc-devel
	<lxc-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org>,
	mhw-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org,
	Stephane Graber
	<stgraber-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org>
Subject: Re: Device Namespaces
Date: Wed, 25 Sep 2013 22:33:20 -0700	[thread overview]
Message-ID: <20130926053320.GB3725@kroah.com> (raw)
In-Reply-To: <87bo3gshz5.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

On Wed, Sep 25, 2013 at 02:34:54PM -0700, Eric W. Biederman wrote:
> So the big issues for a device namespace to solve are filtering which
> devices a container has access to and being able to dynamically change
> which devices those are at run time (aka hotplug).

As _all_ devices are hotpluggable now (look, there's no CONFIG_HOTPLUG
anymore, because it was redundant), I think you need to really think
this through better (pci, memory, cpus, etc.) before you do anything in
the kernel.

> After having thought about this for a bit I don't know if a pure
> userspace solution is sufficient or actually a good idea.
> 
> - We can manually manage a tmpfs with device nodes in userspace.
>   (But that is deprecated functionality in the mainstream kernel).

Yes, but I'm not going to namespace devtmpfs, as that is going to be an
impossible task, right?

And remember, udev doesn't create device nodes anymore...

> - We can manually export a subset of sysfs with bind mounts.
>   (But that feels hacky, and is essentially incompatible with hotplug).

True.

> - We can relay a call of /sbin/hotplug from outside of a container
>   to inside of a container based on policy.
>   (But no one uses /sbin/hotplug anymore).

That's right, they should be listening to libudev events, so why can't
your daemon shuffle them off to the proper container, all in userspace?

> - There is no way to fake netlink uevents for a container to see them.
>   (The best we could do is replace udev everywhere with something that
>    listens on a unix domain socket).

You shouldn't need to do this.

> - It would be nice to replace the device cgroup with a comprehensive
>   solution that really works. (Among other things the device cgroup
>   does not work in terms of struct device the underlying kernel
>   abstraction for devices).

I didn't even know there was a device cgroup.

Which means that if there is one, odds are it's useless.

> We must manage sysfs entries as well device nodes because:
> - Seeing more than we should has the real potential to confuse
>   userspace, especially a userspace that replays uevents.

You should never replay uevents.  If you don't do that, why can't you
see all of sysfs?

> - Some device control must happens through writing to sysfs files and
>   if we don't remove all root privileges from a container only by
>   exporting a subset of sysfs to that container can we limit which
>   sysfs nodes can be written to.

But you have the issue of controlling devices in a "shared" way, which
isn't going to be usable for almost all devices.

> The current kernel tagged sysfs entry support does not look like a good
> match for the impelementing device filtering.   The common case will
> be allowing devices like /dev/zero, and /dev/null that live in
> /sys/devices/virtual and are the devices we are most likely to care
> about.  Those devices need to live in multiple device namespaces so
> everyone can use them.  Perhaps exclusive assignment will be the more
> common paradigm for device namespaces like it is for network devices in
> the network namespace but from what little I can of this problem right now I
> don't think so.
> 
> I definitely think we should hold off on a kernel level implementation
> until we really understand the issues and are ready to implement device
> namespaces correctly.

I agree, especially as I don't think this will ever work.

> A userspace implementation looks like it can only do about 95% of what
> is really needed, but at the same time looks like an easy way to
> experiment until the problem is sufficiently well understood.

95% is probably way better than what you have today, and will fit the
needs of almost everyone today, so why not do it?

I'd argue that those last 5% either are custom solutions that never get
merged, or candidates for true virtulization.

> In summary the situation with device hoptlug and containers sucks today,
> and we need to do something.  Running a linux desktop in a container is
> a reasonably good example use case.

No it isn't.  I'd argue that this is a horrible use case, one that you
shouldn't do.  Why not just use multi-head machines like people do who
really want to do this, relying on user separation?  That's a workable
solution that is quite common and works very well today.

> Having one standard common maintainable implementation would be very
> useful and the most logical place for that would be in the kernel.
> For now we should focus on simple device filtering and hotplug.

Just listen for libudev stuff, don't try to filter them, or ever
"replay" them, that way lies madness, and lots of nasty race conditions
that is guaranteed to break things.

good luck,

greg k-h

  parent reply	other threads:[~2013-09-26  5:33 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-22 17:43 RFC: Device Namespaces Oren Laadan
     [not found] ` <CAA4jN2aw4zEW=UfKCyqaOvXnbiRb_J9srfCn4OXTFzc6vWBM4A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-08-22 18:21   ` Serge Hallyn
2013-08-26 10:11     ` Oren Laadan
     [not found]       ` <CAA4jN2YL7Lfu2+DW-i+MovFxWEhJfT4aBBKREU_vy7JX9TKGHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-09-06 17:50         ` Eric W. Biederman
     [not found]           ` <8761udlu0d.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-09-08 12:28             ` Amir Goldstein
     [not found]               ` <CAA2m6vexArJ+6jFbK80Amstk=LK30=XDNHdBHSswP=LgpSP-6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-09-09  0:51                 ` Eric W. Biederman
     [not found]                   ` <871u4yddg4.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-09-10  7:09                     ` Amir Goldstein
     [not found]                       ` <CAA2m6vc_kWWGDWcdjk26N3YvTqZySLFxPQRjOD9_ypBOka2+GQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-09-25 11:05                         ` Janne Karhunen
     [not found]                           ` <CAE=NcrbyFFoMn2nfBA_=ZtwD=eGLvqK=L-U9MuGrtJFLZfZppw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-09-25 20:23                             ` Eric W. Biederman
     [not found]                               ` <87ioxo4pm5.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-09-25 21:17                                 ` [lxc-devel] " Jeremy Andrus
     [not found]                                   ` <AD5F7BD2-0166-46BD-AB14-463C0E88BC92-eQaUEPhvms7ENvBUuze7eA@public.gmane.org>
2013-09-25 21:47                                     ` Eric W. Biederman
     [not found]                                       ` <8738osr2ue.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-09-29 17:56                                         ` Amir Goldstein
2013-09-25 21:34                             ` Eric W. Biederman
     [not found]                               ` <87bo3gshz5.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-09-26  5:33                                 ` Greg Kroah-Hartman [this message]
     [not found]                                   ` <20130926053320.GB3725-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
2013-09-26  8:25                                     ` Janne Karhunen
     [not found]                                       ` <CAE=NcrbPXGWU8FUgwchXyL5HjXf+4AKbgUWGe1ZO=Xcq=iV-Lg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-09-26 13:56                                         ` Greg Kroah-Hartman
     [not found]                                           ` <20130926135604.GA16624-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
2013-09-26 17:01                                             ` Janne Karhunen
     [not found]                                               ` <CAE=NcrY3xC1AF_GV2b1KsF7AwYZTuGBuKLS5yBUWoWcmKU4YBg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-09-26 17:07                                                 ` Greg Kroah-Hartman
     [not found]                                                   ` <20130926170757.GA9345-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
2013-09-26 17:56                                                     ` Janne Karhunen
2013-09-30 15:37                                                     ` James Bottomley
     [not found]                                                       ` <1380555439.2161.5.camel-sFMDBYUN5F8GjUHQrlYNx2Wm91YjaHnnhRte9Li2A+AAvxtiuMwx3w@public.gmane.org>
2013-09-30 16:11                                                         ` Greg Kroah-Hartman
     [not found]                                                           ` <20130930161117.GA26459-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
2013-09-30 16:33                                                             ` James Bottomley
2013-10-01  6:19                                     ` Janne Karhunen
     [not found]                                       ` <CAE=NcrYV2RiMV7PcwEjFGFRBrz9XdZGs86Wau2a+6xpYN2aEHA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-10-01 17:27                                         ` Andy Lutomirski
     [not found]                                           ` <CALCETrWWoHzuJcnfEUY+cFpOgT5gnG8U1cVbCW0_8V7Z_v6DJw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-10-01 17:53                                             ` Serge E. Hallyn
     [not found]                                               ` <20131001175345.GA4145-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2013-10-01 19:51                                                 ` Eric W. Biederman
     [not found]                                                   ` <87had0wz07.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-10-01 20:46                                                     ` Serge Hallyn
2013-10-01 22:59                                                       ` [lxc-devel] " Michael H. Warfield
2013-10-02 22:55                                                       ` Eric W. Biederman
2013-10-01 20:57                                                     ` Greg Kroah-Hartman
     [not found]                                                       ` <20131001205718.GA17036-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
2013-10-02 22:45                                                         ` Eric W. Biederman
2013-10-01 22:19                                                     ` Michael H. Warfield
2013-10-01 18:36                                             ` Janne Karhunen
2013-10-01 17:33                                         ` Greg Kroah-Hartman
     [not found]                                           ` <20131001173342.GA19267-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
2013-10-01 18:23                                             ` Janne Karhunen
2013-10-28 23:31                                 ` Andrey Wagin
2013-08-29 19:06   ` RFC: " Andy Lutomirski
     [not found]     ` <521F9BBE.2070505-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
2013-09-03 19:35       ` [lxc-devel] " Stéphane Graber
  -- strict thread matches above, loose matches on Subject: below --
2013-09-29 19:28 Amir Goldstein
     [not found] ` <CAA2m6veny-7_ONMA973Wu36U4kz4gAuw0dpodkb8+GZDv6VNBQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-09-29 20:06   ` Greg Kroah-Hartman
     [not found]     ` <20130929200620.GA31304-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
2013-09-30 15:36       ` Michael H. Warfield
2013-10-03  0:44   ` Eric W. Biederman
     [not found]     ` <87a9iri3ot.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-10-03  0:59       ` Eric W. Biederman
2013-10-03  8:58       ` Amir Goldstein
     [not found]         ` <CAA2m6vc3OFmS9VwiTavRzPqhn+qoe6vDCO2sitXpEQ8a1JVyfg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-10-03  9:17           ` Eric W. Biederman
2021-06-08  9:38 device namespaces Enrico Weigelt, metux IT consult
2021-06-08 12:30 ` Christian Brauner
2021-06-08 12:41   ` Greg Kroah-Hartman
2021-06-08 14:10     ` Hannes Reinecke
2021-06-08 14:29       ` Christian Brauner
2021-06-08 15:54         ` Hannes Reinecke
2021-06-08 17:16           ` Eric W. Biederman
2021-06-09  6:38             ` Christian Brauner
2021-06-09  7:02               ` Hannes Reinecke
2021-06-09  7:21                 ` Christian Brauner
2021-06-09  7:54                   ` Hannes Reinecke
2021-06-09  8:09                     ` Christian Brauner
2021-06-11 18:14                       ` Eric W. Biederman
2021-06-14  7:49                         ` Enrico Weigelt, metux IT consult
2021-06-14  8:22                           ` Greg KH
2021-06-14 17:36                           ` Eric W. Biederman
2021-06-15 11:24                             ` Enrico Weigelt, metux IT consult
2021-06-15 11:33                               ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130926053320.GB3725@kroah.com \
    --to=gregkh-hqyy1w1ycw8ekmwlsbkhg0b+6bgklq7r@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=devel-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org \
    --cc=ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org \
    --cc=kay.sievers-tD+1rO4QERM@public.gmane.org \
    --cc=luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org \
    --cc=lxc-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
    --cc=mhw-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org \
    --cc=stgraber-GeWIH/nMZzLQT0dZR+AlfA@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox