[LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping

* [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
@ 2014-12-02 23:47 Andy Lutomirski
  2014-12-03  3:37 ` Eric W. Biederman
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Andy Lutomirski @ 2014-12-02 23:47 UTC (permalink / raw)
  To: Linux FS Devel, lsf-pc
  Cc: Eric W. Biederman, Seth Forshee, Lukasz Pawelczyk,
	Richard Weinberger

This should hopefully be a short topic, and it's possible that it'll
be settled by the time LSF/MM comes around, but:

There's a fair amount of interest from different directions for
allowing filesystems with a backing store to be mounted (in the
mount-from-scratch sense, not the bind-mount sense) in a user
namespace.  For example, Seth has patches to allow unprivileged FUSE
mounts.  There are a few issues here, for example:

 - What happens to device nodes in those filesystems?

 - If a FUSE backend is in a user namespace, how should UIDs be
translated to/from that backend?

 - How should LSM security labels be translated?

 - Should a struct super_block be associated with a user namespace?
(Answer: probably, I think.)  If so, what should the semantics be?
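
As a point of reference for the UID-translation question, here is a minimal Python sketch of extent-based ID mapping in the style of /proc/<pid>/uid_map (three columns: first ID inside the namespace, first ID outside it, length). The function names and the example map are illustrative assumptions, not kernel code, and returning None for an unmapped ID is only one possible policy.

```python
from typing import List, Optional, Tuple

# One extent: (first id inside the namespace, first id outside, length).
Extent = Tuple[int, int, int]


def parse_id_map(text: str) -> List[Extent]:
    """Parse text in the three-column format used by /proc/<pid>/uid_map."""
    extents = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(field) for field in line.split())
            extents.append((inside, outside, count))
    return extents


def to_outside(extents: List[Extent], inside_id: int) -> Optional[int]:
    """Translate an id as seen inside the namespace to the id outside it."""
    for inside, outside, count in extents:
        if inside <= inside_id < inside + count:
            return outside + (inside_id - inside)
    return None  # no extent covers this id


def to_inside(extents: List[Extent], outside_id: int) -> Optional[int]:
    """Translate an outside id to the id seen inside the namespace."""
    for inside, outside, count in extents:
        if outside <= outside_id < outside + count:
            return inside + (outside_id - outside)
    return None  # no extent covers this id


# Hypothetical map: ids 0..65535 in the namespace are 100000..165535 outside.
example_map = parse_id_map("0 100000 65536\n")
```

With this map, uid 0 inside the namespace corresponds to uid 100000 outside, while outside uid 0 has no representation inside. That unmapped case is the one a filesystem or FUSE backend would need an explicit rule for.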

There are also some remapping cases that aren't directly user
namespace-related.  For example, I'd like to be able to insert
removable media and create files owned by uid 0 (or any other uid)
without actually being root.

--Andy

* Re: [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2014-12-02 23:47 [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping Andy Lutomirski
@ 2014-12-03  3:37 ` Eric W. Biederman
  2015-02-22 17:12   ` [Lsf-pc] " James Bottomley
  2014-12-03 14:48 ` Seth Forshee
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Eric W. Biederman @ 2014-12-03  3:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux FS Devel, lsf-pc, Seth Forshee, Lukasz Pawelczyk,
	Richard Weinberger

Andy Lutomirski <luto@amacapital.net> writes:

> This should hopefully be a short topic, and it's possible that it'll
> be settled by the time LSF/MM comes around, but:
>
> There's a fair amount of interest from different directions for
> allowing filesystems with a backing store to be mounted (in the
> mount-from-scratch sense, not the bind-mount sense) in a user
> namespace.  For example, Seth has patches to allow unprivileged FUSE
> mounts.  There are a few issues here, for example:
>
>  - What happens to device nodes in those filesystems?
>
>  - If a FUSE backend is in a user namespace, how should UIDs be
> translated to/from that backend?
>
>  - How should LSM security labels be translated?
>
>  - Should a struct super_block be associated with a user namespace?
> (Answer: probably, I think.)  If so, what should the semantics be?
>
> There are also some remapping cases that aren't directly user
> namespace-related.  For example, I'd like to be able to insert
> removable media and create files owned by uid 0 (or any other uid)
> without actually being root.

And there is the longer term question that may be more appropriate when
we get all of the id problems settled, about what kind of
testing, auditing, review we want in place before we believe an
unprivileged mount is actually safe to perform, when we can assume
hostile intent by the mounter.

Eric


* Re: [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2014-12-02 23:47 [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping Andy Lutomirski
  2014-12-03  3:37 ` Eric W. Biederman
@ 2014-12-03 14:48 ` Seth Forshee
  2014-12-05 18:01 ` David Howells
  2015-02-22 17:01 ` James Bottomley
  3 siblings, 0 replies; 14+ messages in thread
From: Seth Forshee @ 2014-12-03 14:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux FS Devel, lsf-pc, Eric W. Biederman, Lukasz Pawelczyk,
	Richard Weinberger

On Tue, Dec 02, 2014 at 03:47:04PM -0800, Andy Lutomirski wrote:
> This should hopefully be a short topic, and it's possible that it'll
> be settled by the time LSF/MM comes around, but:
> 
> There's a fair amount of interest from different directions for
> allowing filesystems with a backing store to be mounted (in the
> mount-from-scratch sense, not the bind-mount sense) in a user
> namespace.  For example, Seth has patches to allow unprivileged FUSE
> mounts.  There are a few issues here, for example:
> 
>  - What happens to device nodes in those filesystems?
> 
>  - If a FUSE backend is in a user namespace, how should UIDs be
> translated to/from that backend?
> 
>  - How should LSM security labels be translated?
> 
>  - Should a struct super_block be associated with a user namespace?
> (Answer: probably, I think.)  If so, what should the semantics be?

Another issue is how to deal with ids in the filesystem which don't map
into the user namespace from which the filesystem was mounted, e.g. in
inodes and ACLs.

I was also experimenting with mounting ext4 from a user namespace a
month or so ago, and I'm pretty sure a few other issues popped up there.
I'll have to go back and refresh my memory.

Seth

* Re: [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2014-12-02 23:47 [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping Andy Lutomirski
  2014-12-03  3:37 ` Eric W. Biederman
  2014-12-03 14:48 ` Seth Forshee
@ 2014-12-05 18:01 ` David Howells
  2014-12-08 21:59   ` Eric W. Biederman
  2015-02-22 17:01 ` James Bottomley
  3 siblings, 1 reply; 14+ messages in thread
From: David Howells @ 2014-12-05 18:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: dhowells, Linux FS Devel, lsf-pc, Eric W. Biederman, Seth Forshee,
	Lukasz Pawelczyk, Richard Weinberger

Andy Lutomirski <luto@amacapital.net> wrote:

>  - How should LSM security labels be translated?

I'm definitely interested in that.  Especially with respect to how to deal
with SELinux + overlay{fs,}/unionmount.

Also, I'm interested in how keyrings should interact with namespaces.  Should
keys be namespaced?

And I'm also interested in how upcalls, including to /sbin/request-key, should
be dealt with.

David

* Re: [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2014-12-05 18:01 ` David Howells
@ 2014-12-08 21:59   ` Eric W. Biederman
  2014-12-09 18:51     ` [Lsf-pc] " Jeff Layton
  2015-02-22 16:52     ` James Bottomley
  0 siblings, 2 replies; 14+ messages in thread
From: Eric W. Biederman @ 2014-12-08 21:59 UTC (permalink / raw)
  To: David Howells
  Cc: Andy Lutomirski, Linux FS Devel, lsf-pc, Seth Forshee,
	Lukasz Pawelczyk, Richard Weinberger

David Howells <dhowells@redhat.com> writes:

> Andy Lutomirski <luto@amacapital.net> wrote:
>
>>  - How should LSM security labels be translated?
>
> I'm definitely interested in that.  Especially with respect to how to deal
> with SELinux + overlay{fs,}/unionmount.
>
> Also, I'm interested in how keyrings should interact with namespaces.  Should
> keys be namespaced?

Key lookups are already per user namespace, so I would call that
namespaced.  We do have the question with keys: should we allow
duplicate key values so that checkpoint/restart can carry keys between
different kernels?

> And I'm also interested in how upcalls, including to /sbin/request-key, should
> be dealt with.

Good question.  There is some ongoing discussion on that right now.

Eric


* Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2014-12-08 21:59   ` Eric W. Biederman
@ 2014-12-09 18:51     ` Jeff Layton
  2015-02-22 16:52     ` James Bottomley
  1 sibling, 0 replies; 14+ messages in thread
From: Jeff Layton @ 2014-12-09 18:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Howells, Lukasz Pawelczyk, Richard Weinberger,
	Andy Lutomirski, Seth Forshee, Linux FS Devel, lsf-pc

On Mon, 08 Dec 2014 15:59:12 -0600
ebiederm@xmission.com (Eric W. Biederman) wrote:

> David Howells <dhowells@redhat.com> writes:
> 
> > Andy Lutomirski <luto@amacapital.net> wrote:
> >
> >>  - How should LSM security labels be translated?
> >
> > I'm definitely interested in that.  Especially with respect to how to deal
> > with SELinux + overlay{fs,}/unionmount.
> >
> > Also, I'm interested in how keyrings should interact with namespaces.  Should
> > keys be namespaced?
> 
> Key lookups are already per user namespace, so I would call that
> namespaced.  We do have the question with keys: should we allow
> duplicate key values so that checkpoint/restart can carry keys between
> different kernels?
> 
> > And I'm also interested in how upcalls, including to /sbin/request-key, should
> > be dealt with.
> 
> Good question.  There is some ongoing discussion on that right now.
> 

Agreed. It would be nice to figure out what the end game is for all
call_usermodehelper type upcalls within namespaces (including the ones
for keyrings). What can we do to make that work as expected and be safe?

-- 
Jeff Layton <jlayton@primarydata.com>

* Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2014-12-08 21:59   ` Eric W. Biederman
  2014-12-09 18:51     ` [Lsf-pc] " Jeff Layton
@ 2015-02-22 16:52     ` James Bottomley
  2015-02-22 23:51       ` Jeff Layton
  1 sibling, 1 reply; 14+ messages in thread
From: James Bottomley @ 2015-02-22 16:52 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Howells, Lukasz Pawelczyk, Richard Weinberger,
	Andy Lutomirski, Seth Forshee, Linux FS Devel, lsf-pc

On Mon, 2014-12-08 at 15:59 -0600, Eric W. Biederman wrote:
> David Howells <dhowells@redhat.com> writes:
> 
> > Andy Lutomirski <luto@amacapital.net> wrote:
> >
> >>  - How should LSM security labels be translated?
> >
> > I'm definitely interested in that.  Especially with respect to how to deal
> > with SELinux + overlay{fs,}/unionmount.
> >
> > Also, I'm interested in how keyrings should interact with namespaces.  Should
> > keys be namespaced?
> 
> Key lookups are already per user namespace, so I would call that
> namespaced.  We do have the question with keys: should we allow
> duplicate key values so that checkpoint/restart can carry keys between
> different kernels?
> 
> > And I'm also interested in how upcalls, including to /sbin/request-key, should
> > be dealt with.
> 
> Good question.  There is some ongoing discussion on that right now.

Aren't the upcalls exactly the same problem as NFS in a container (which
uses daemon upcalls)?  Can the existing solution for that be
generalised?

James



* Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2014-12-02 23:47 [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping Andy Lutomirski
                   ` (2 preceding siblings ...)
  2014-12-05 18:01 ` David Howells
@ 2015-02-22 17:01 ` James Bottomley
  2015-02-23 15:54   ` Andy Lutomirski
  3 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2015-02-22 17:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux FS Devel, lsf-pc, Seth Forshee, Lukasz Pawelczyk,
	Eric W. Biederman, Richard Weinberger

On Tue, 2014-12-02 at 15:47 -0800, Andy Lutomirski wrote:
> This should hopefully be a short topic, and it's possible that it'll
> be settled by the time LSF/MM comes around, but:
> 
> There's a fair amount of interest from different directions for
> allowing filesystems with a backing store to be mounted (in the
> mount-from-scratch sense, not the bind-mount sense) in a user
> namespace.  For example, Seth has patches to allow unprivileged FUSE
> mounts.  There are a few issues here, for example:
> 
>  - What happens to device nodes in those filesystems?

You have to allow device nodes in mount namespaces.  However, not all
devices should be present, only the ones the owner of the namespace is
allowed to either see (read only) or control (read/write).

The specific problem for container security is allowing the user who can
write to the device also to mount it ... because that lets them inject
data known to cause a kernel crash and bring down the entire system or
worse.  The current solution is simply not to allow the owner both to
write and mount, but this is becoming increasingly untenable using
loopback images with containers for cascading overlays like docker does.

James

* Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2014-12-03  3:37 ` Eric W. Biederman
@ 2015-02-22 17:12   ` James Bottomley
  2015-02-23 12:38     ` Jan Kara
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2015-02-22 17:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Lutomirski, Linux FS Devel, Seth Forshee, lsf-pc,
	Lukasz Pawelczyk, Richard Weinberger

On Tue, 2014-12-02 at 21:37 -0600, Eric W. Biederman wrote:
> Andy Lutomirski <luto@amacapital.net> writes:
> 
> > This should hopefully be a short topic, and it's possible that it'll
> > be settled by the time LSF/MM comes around, but:
> >
> > There's a fair amount of interest from different directions for
> > allowing filesystems with a backing store to be mounted (in the
> > mount-from-scratch sense, not the bind-mount sense) in a user
> > namespace.  For example, Seth has patches to allow unprivileged FUSE
> > mounts.  There are a few issues here, for example:
> >
> >  - What happens to device nodes in those filesystems?
> >
> >  - If a FUSE backend is in a user namespace, how should UIDs be
> > translated to/from that backend?
> >
> >  - How should LSM security labels be translated?
> >
> >  - Should a struct super_block be associated with a user namespace?
> > (Answer: probably, I think.)  If so, what should the semantics be?
> >
> > There are also some remapping cases that aren't directly user
> > namespace-related.  For example, I'd like to be able to insert
> > removable media and create files owned by uid 0 (or any other uid)
> > without actually being root.
> 
> And there is the longer term question that may be more appropriate when
> we get all of the id problems settled, about what kind of
> testing, auditing, review we want in place before we believe an
> unprivileged mount is actually safe to perform, when we can assume
> hostile intent by the mounter.

Realistically, we can't rely on auditing the data: a hostile user will
be injecting a specific data pattern to exploit a bug in the filesystem
code.  We can't audit for this if we don't know the bug (which we mostly
don't, otherwise they'd be fixed).

What we can do is audit for specific operations.  Looking at what the
use cases are, users mostly either want to create a pristine filesystem
or use an existing template.  Mkfs is particularly nasty because it's
all in userspace and sprays data down on to the device making it really
hard to audit.  One of the approaches we've experimented with in
Parallels is the bit bucket one, where we create a device that looks
read/write in the container, but really it throws away the writes from
the user and performs in the host the operation we believe the user is
trying to do.  It protects against most injection attacks, but trips up
when the user tries to do some operation we haven't anticipated.

James

* Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2015-02-22 16:52     ` James Bottomley
@ 2015-02-22 23:51       ` Jeff Layton
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff Layton @ 2015-02-22 23:51 UTC (permalink / raw)
  To: James Bottomley
  Cc: Eric W. Biederman, Lukasz Pawelczyk, Richard Weinberger,
	Andy Lutomirski, David Howells, Seth Forshee, Linux FS Devel,
	lsf-pc

On Sun, 22 Feb 2015 08:52:48 -0800
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Mon, 2014-12-08 at 15:59 -0600, Eric W. Biederman wrote:
> > David Howells <dhowells@redhat.com> writes:
> > 
> > > Andy Lutomirski <luto@amacapital.net> wrote:
> > >
> > >>  - How should LSM security labels be translated?
> > >
> > > I'm definitely interested in that.  Especially with respect to how to deal
> > > with SELinux + overlay{fs,}/unionmount.
> > >
> > > Also, I'm interested in how keyrings should interact with namespaces.  Should
> > > keys be namespaced?
> > 
> > Key lookups are already per user namespace, so I would call that
> > namespaced.  We do have the question with keys: should we allow
> > duplicate key values so that checkpoint/restart can carry keys between
> > different kernels?
> > 
> > > And I'm also interested in how upcalls, including to /sbin/request-key, should
> > > be dealt with.
> > 
> > Good question.  There is some ongoing discussion on that right now.
> 
> Aren't the upcalls exactly the same problem as NFS in a container (which
> uses daemon upcalls)?  Can the existing solution for that be
> generalised?
> 

Not really, no...

NFS (and nfsd) namespaceification (is that a word?) is designed around
the network namespace. Start your daemons in a container and do your
mount in the same container and everything "just works" due to the fact
that the network namespace is the same.

When you do something like a call_usermodehelper upcall, then things
become more tricky. You have a network namespace, but nothing else, so
you still have to know (for instance) what mount namespace to spawn the
usermode helper in. Sorting that out is not at all straightforward and
if we get it wrong it could be a giant security hole.

Ian Kent is working on a patchset for this. The current idea is to ID
the init process in the container where the mount occurred (or where
nfsd was started) and use that to get an nsproxy to use for the UMH
upcall.
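
To make "use the container's init process to find the right namespaces" more concrete, here is a userspace-only Python sketch. It reads the symbolic links under /proc/<pid>/ns, which identify the namespaces a process belongs to. It does not show the in-kernel nsproxy lookup such a patchset would need; the function names and the proc_root parameter are assumptions made for illustration.

```python
import os
from typing import Dict, List


def namespaces_of(pid: int, proc_root: str = "/proc") -> Dict[str, str]:
    """Return {namespace name: link target} for one process.

    On Linux each entry of /proc/<pid>/ns is a symbolic link whose target
    looks like "mnt:[4026531840]"; two processes share a namespace when
    the targets are equal.
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        result[name] = os.readlink(os.path.join(ns_dir, name))
    return result


def shared_namespaces(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Names of the namespaces that two processes have in common."""
    return sorted(name for name in a if a[name] == b.get(name))
```

Comparing namespaces_of() for a container's init process with the result for a candidate helper process shows which namespaces the helper would still need to enter, subject to the usual /proc permission checks.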

-- 
Jeff Layton <jlayton@primarydata.com>

* Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2015-02-22 17:12   ` [Lsf-pc] " James Bottomley
@ 2015-02-23 12:38     ` Jan Kara
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2015-02-23 12:38 UTC (permalink / raw)
  To: James Bottomley
  Cc: Eric W. Biederman, Lukasz Pawelczyk, Richard Weinberger,
	Andy Lutomirski, Seth Forshee, Linux FS Devel, lsf-pc

On Sun 22-02-15 09:12:35, James Bottomley wrote:
> On Tue, 2014-12-02 at 21:37 -0600, Eric W. Biederman wrote:
> > Andy Lutomirski <luto@amacapital.net> writes:
> > 
> > > This should hopefully be a short topic, and it's possible that it'll
> > > be settled by the time LSF/MM comes around, but:
> > >
> > > There's a fair amount of interest from different directions for
> > > allowing filesystems with a backing store to be mounted (in the
> > > mount-from-scratch sense, not the bind-mount sense) in a user
> > > namespace.  For example, Seth has patches to allow unprivileged FUSE
> > > mounts.  There are a few issues here, for example:
> > >
> > >  - What happens to device nodes in those filesystems?
> > >
> > >  - If a FUSE backend is in a user namespace, how should UIDs be
> > > translated to/from that backend?
> > >
> > >  - How should LSM security labels be translated?
> > >
> > >  - Should a struct super_block be associated with a user namespace?
> > > (Answer: probably, I think.)  If so, what should the semantics be?
> > >
> > > There are also some remapping cases that aren't directly user
> > > namespace-related.  For example, I'd like to be able to insert
> > > removable media and create files owned by uid 0 (or any other uid)
> > > without actually being root.
> > 
> > And there is the longer term question that may be more appropriate when
> > we get all of the id problems settled, about what kind of
> > testing, auditing, review we want in place before we believe an
> > unprivileged mount is actually safe to perform, when we can assume
> > hostile intent by the mounter.
> 
> Realistically, we can't rely on auditing the data: a hostile user will
> be injecting a specific data pattern to exploit a bug in the filesystem
> code.  We can't audit for this if we don't know the bug (which we mostly
> don't, otherwise they'd be fixed).
> 
> What we can do is audit for specific operations.  Looking at what the
> use cases are, users mostly either want to create a pristine filesystem
> or use an existing template.  Mkfs is particularly nasty because it's
  Well, what if you also had templates for pristine filesystems? There
aren't that many sensible configs and a compressed empty fs image is pretty
small... Sure, users won't be able to fine-tune their fs configuration, but
is that so important? Most users don't do that anyway.

Alternatively you could just forbid writing from the container, and if a
user wants to create an fs image, they'd just pass options for mkfs to some
service which will run mkfs outside of the container. It isn't neat, but when
I see the hacks you are describing below, it doesn't seem like such a bad
option :)
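
A rough Python sketch of the "pass mkfs options to a service outside the container" idea follows. The allowlist, the function names and the choice of which options to permit are assumptions made for illustration; the flags themselves (-b, -L, -m, -N) are standard mke2fs options. The function only builds an argument vector and never runs it.

```python
from typing import Callable, Dict, List


def _is_uint(value: str, low: int, high: int) -> bool:
    """True if value is a plain decimal integer within [low, high]."""
    return value.isascii() and value.isdigit() and low <= int(value) <= high


# Hypothetical policy: the mke2fs options a container may request,
# each with a validator for its value.
ALLOWED_OPTIONS: Dict[str, Callable[[str], bool]] = {
    "-b": lambda v: v in ("1024", "2048", "4096"),               # block size
    "-L": lambda v: v.isascii() and v.isalnum() and len(v) <= 16,  # label
    "-m": lambda v: _is_uint(v, 0, 50),           # reserved blocks percentage
    "-N": lambda v: _is_uint(v, 1, 10_000_000),   # number of inodes
}


def build_mkfs_argv(device: str, requested: Dict[str, str]) -> List[str]:
    """Validate requested options and return an argv for the host to run.

    `device` is chosen by the host-side service, never by the container.
    """
    argv = ["mkfs.ext4"]
    for flag, value in sorted(requested.items()):
        check = ALLOWED_OPTIONS.get(flag)
        if check is None or not check(value):
            raise ValueError(f"option not permitted: {flag} {value!r}")
        argv += [flag, value]
    argv.append(device)
    return argv
```

Rejecting everything that is not explicitly listed keeps the set of filesystem layouts the host will ever create small, which is the same trade-off as the template approach: less tuning for the user, fewer surprises for the host.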

> all in userspace and sprays data down on to the device making it really
> hard to audit.  One of the approaches we've experimented with in
> Parallels is the bit bucket one, where we create a device that looks
> read/write in the container, but really it throws away the writes from
> the user and performs in the host the operation we believe the user is
> trying to do.  It protects against most injection attacks, but trips up
> when the user tries to do some operation we haven't anticipated.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

* Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2015-02-22 17:01 ` James Bottomley
@ 2015-02-23 15:54   ` Andy Lutomirski
  2015-02-23 16:16     ` James Bottomley
  0 siblings, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2015-02-23 15:54 UTC (permalink / raw)
  To: James Bottomley
  Cc: Linux FS Devel, lsf-pc, Seth Forshee, Lukasz Pawelczyk,
	Eric W. Biederman, Richard Weinberger

On Sun, Feb 22, 2015 at 9:01 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Tue, 2014-12-02 at 15:47 -0800, Andy Lutomirski wrote:
>> This should hopefully be a short topic, and it's possible that it'll
>> be settled by the time LSF/MM comes around, but:
>>
>> There's a fair amount of interest from different directions for
>> allowing filesystems with a backing store to be mounted (in the
>> mount-from-scratch sense, not the bind-mount sense) in a user
>> namespace.  For example, Seth has patches to allow unprivileged FUSE
>> mounts.  There are a few issues here, for example:
>>
>>  - What happens to device nodes in those filesystems?
>
> You have to allow device nodes in mount namespaces.  However, not all
> devices should be present, only the ones the owner of the namespace is
> allowed to either see (read only) or control (read/write).

I agree that you need to allow device nodes, but I'm not sure that you
need to allow device nodes on filesystems with backing store.  Every
recent distro should work with devtmpfs (admittedly, we don't know how
devtmpfs should work in a container), but tmpfs is a decent
alternative.  In any event, sticking device nodes on ext4 is asking
for trouble with dynamic minors and such.

>
> The specific problem for container security is allowing the user who can
> write to the device also to mount it ... because that lets them inject
> data known to cause a kernel crash and bring down the entire system or
> worse.  The current solution is simply not to allow the owner both to
> write and mount, but this is becoming increasingly untenable using
> loopback images with containers for cascading overlays like docker does.

I see this as a separate issue.  If the kernel has no implementation
bugs, this would be a nonissue :)

--Andy

* Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2015-02-23 15:54   ` Andy Lutomirski
@ 2015-02-23 16:16     ` James Bottomley
  2015-03-02 22:34       ` Andy Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: James Bottomley @ 2015-02-23 16:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux FS Devel, lsf-pc, Seth Forshee, Lukasz Pawelczyk,
	Eric W. Biederman, Richard Weinberger

On Mon, 2015-02-23 at 07:54 -0800, Andy Lutomirski wrote:
> On Sun, Feb 22, 2015 at 9:01 AM, James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
> > On Tue, 2014-12-02 at 15:47 -0800, Andy Lutomirski wrote:
> >> This should hopefully be a short topic, and it's possible that it'll
> >> be settled by the time LSF/MM comes around, but:
> >>
> >> There's a fair amount of interest from different directions for
> >> allowing filesystems with a backing store to be mounted (in the
> >> mount-from-scratch sense, not the bind-mount sense) in a user
> >> namespace.  For example, Seth has patches to allow unprivileged FUSE
> >> mounts.  There are a few issues here, for example:
> >>
> >>  - What happens to device nodes in those filesystems?
> >
> > You have to allow device nodes in mount namespaces.  However, not all
> > devices should be present, only the ones the owner of the namespace is
> > allowed to either see (read only) or control (read/write).
> 
> I agree that you need to allow device nodes, but I'm not sure that you
> need to allow device nodes on filesystems with backing store.  Every
> recent distro should work with devtmpfs (admittedly, we don't know how
> devtmpfs should work in a container), but tmpfs is a decent
> alternative.  In any event, sticking device nodes on ext4 is asking
> for trouble with dynamic minors and such.

OK, so this one is a bit off topic from your original proposal.  Because
now we're moving on to device handling inside containers (which is also
a big can of worms).

We tend to want a strictly controlled /dev for a container, because the
host has to make decisions about hotplug devices and pass them on to
containers (or not) based on its policy.  This makes devtmpfs (to us)
unfit for purpose because all that policy would have to be coded per
container inside the kernel to make it work.  We also need to control
access more strictly because of the disallow write and mount problem.

Device nodes we pass through to the container tend to be done via bind
mount from the host, so most of the policy logic can be in the host
userspace.

In fact, mknod is intercepted from the container and so the host enforces
policy from that end as well ... so it doesn't really matter *where* the
device is being created ... that's not to say it couldn't be a tmpfs,
just saying that the actual location isn't that important.  What is
important is policing the node create action.
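
Policing the node create action can be sketched as an allowlist check applied to each intercepted mknod request. This Python sketch is an assumption about what such a policy might look like, not a description of any existing implementation; the table contents and the function name are made up, though the device numbers shown are the conventional Linux ones for /dev/null, /dev/zero and /dev/urandom.

```python
from typing import Dict, Tuple

# (type, major, minor), where type is "c" (character) or "b" (block).
DevKey = Tuple[str, int, int]

# Hypothetical per-container policy: permitted access is "r" or "rw".
POLICY: Dict[DevKey, str] = {
    ("c", 1, 3): "rw",  # /dev/null
    ("c", 1, 5): "rw",  # /dev/zero
    ("c", 1, 9): "r",   # /dev/urandom, read-only in this example policy
}


def mknod_allowed(policy: Dict[DevKey, str], dev_type: str,
                  major: int, minor: int, want_write: bool) -> bool:
    """Decide whether an intercepted mknod request should be honoured."""
    access = policy.get((dev_type, major, minor))
    if access is None:
        return False  # device is not visible to this container at all
    return "w" in access or not want_write
```

Because the decision depends only on the device identity and the requested access, it is the same whether the node would land on a tmpfs or anywhere else.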

However, other container people need to chime in here.  I tend to think
that hotplug handling inside the container is unnecessary (certainly in
a hosting/VPS environment), but I believe there are other potential
users of it who have different ideas.

> > The specific problem for container security is allowing the user who can
> > write to the device also to mount it ... because that lets them inject
> > data known to cause a kernel crash and bring down the entire system or
> > worse.  The current solution is simply not to allow the owner both to
> > write and mount, but this is becoming increasingly untenable using
> > loopback images with containers for cascading overlays like docker does.
> 
> I see this as a separate issue.  If the kernel has no implementation
> bugs, this would be a nonissue :)

Right, I started another thread on this.

James

* Re: [Lsf-pc] [LSF/MM TOPIC] Filesystem namespaces and uid/gid/lsm remapping
  2015-02-23 16:16     ` James Bottomley
@ 2015-03-02 22:34       ` Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2015-03-02 22:34 UTC (permalink / raw)
  To: James Bottomley
  Cc: Linux FS Devel, lsf-pc, Seth Forshee, Lukasz Pawelczyk,
	Eric W. Biederman, Richard Weinberger

On Mon, Feb 23, 2015 at 8:16 AM, James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
> On Mon, 2015-02-23 at 07:54 -0800, Andy Lutomirski wrote:
>> On Sun, Feb 22, 2015 at 9:01 AM, James Bottomley
>> <James.Bottomley@hansenpartnership.com> wrote:
>> > On Tue, 2014-12-02 at 15:47 -0800, Andy Lutomirski wrote:
>> >> This should hopefully be a short topic, and it's possible that it'll
>> >> be settled by the time LSF/MM comes around, but:
>> >>
>> >> There's a fair amount of interest from different directions for
>> >> allowing filesystems with a backing store to be mounted (in the
>> >> mount-from-scratch sense, not the bind-mount sense) in a user
>> >> namespace.  For example, Seth has patches to allow unprivileged FUSE
>> >> mounts.  There are a few issues here, for example:
>> >>
>> >>  - What happens to device nodes in those filesystems?
>> >
>> > You have to allow device nodes in mount namespaces.  However, not all
>> > devices should be present, only the ones the owner of the namespace is
>> > allowed to either see (read only) or control (read/write).
>>
>> I agree that you need to allow device nodes, but I'm not sure that you
>> need to allow device nodes on filesystems with backing store.  Every
>> recent distro should work with devtmpfs (admittedly, we don't know how
>> devtmpfs should work in a container), but tmpfs is a decent
>> alternative.  In any event, sticking device nodes on ext4 is asking
>> for trouble with dynamic minors and such.
>
> OK, so this one is a bit off topic from your original proposal.  Because
> now we're moving on to device handling inside containers (which is also
> a big can of worms).
>
> We tend to want a strictly controlled /dev for a container, because the
> host has to make decisions about hotplug devices and pass them on to
> containers (or not) based on its policy.  This makes devtmpfs (to us)
> unfit for purpose because all that policy would have to be coded per
> container inside the kernel to make it work.  We also need to control
> access more strictly because of the disallow write and mount problem.
>
> Device nodes we pass through to the container tend to be done via bind
> mount from the host, so most of the policy logic can be in the host
> userspace.
>
> In fact, mknod is intercepted from the container and so the host enforces
> policy from that end as well ... so it doesn't really matter *where* the
> device is being created ... that's not to say it couldn't be a tmpfs,
> just saying that the actual location isn't that important.  What is
> important is policing the node create action.

Agreed, as long as the fs with the device nodes isn't ext4 or some
other real fs backed by storage owned by the container (obviously).

--Andy
