userns idea: preventing SCM_CREDENTIALS from leaking out


* userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27  1:02 Andy Lutomirski
       [not found] ` <CALCETrWWSVnwg6Sb=bZz0xuAj_ASjZmsLYy=ELoR_uSqKJJaWg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2013-11-27  1:02 UTC (permalink / raw)
  To: Eric W. Biederman, Linux Containers, Serge Hallyn

IIUC there are multiple ways to end up with a socket pair for which
one end is in a user namespace and the other is outside of it.  That
means that SCM_CREDENTIALS can be used by a process in a userns to
authenticate to a process outside.
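
For readers unfamiliar with the mechanism, here is a minimal sketch of passing SCM_CREDENTIALS over a socket pair, assuming Linux and Python's standard socket module.  Both ends live in the same process and the same user namespace here, so it only shows the plumbing, not the cross-namespace case.  The helper names (send_with_creds, recv_with_creds, demo) are made up for illustration.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid, uid_t uid, gid_t gid.
UCRED = struct.Struct("iII")


def send_with_creds(sock: socket.socket, payload: bytes) -> None:
    """Send payload together with an explicit SCM_CREDENTIALS control message."""
    creds = UCRED.pack(os.getpid(), os.getuid(), os.getgid())
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_CREDENTIALS, creds)])


def recv_with_creds(sock: socket.socket):
    """Receive one message; return (payload, (pid, uid, gid)) or (payload, None)."""
    payload, ancdata, _flags, _addr = sock.recvmsg(
        4096, socket.CMSG_SPACE(UCRED.size)
    )
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            return payload, UCRED.unpack(data[:UCRED.size])
    return payload, None


def demo():
    """Hypothetical helper: send one message across a local socket pair."""
    client, server = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with client, server:
        # The receiving end has to opt in to credential passing.
        server.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        send_with_creds(client, b"hello")
        return recv_with_creds(server)
```

The kernel checks the pid, uid and gid that the sender claims, so the receiver can treat the triple as authentication.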

This is all well and good (and, as far as I know, correct), but I'm
not sure this is always the desired behavior.  In the context of a
tool like Docker, it might be useful to have several user namespaces
that have the *same* uids mapped.  Nonetheless, if one of those
namespaces is compromised, it probably shouldn't be permitted to
attack things outside the user namespace (or in the host, if any
interesting uids are mapped).

Would it make sense to have an option to allow a user namespace to opt
into different behavior so that its users show up as the invalid uid
as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?
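
To make the proposal concrete, here is a toy model of uid translation, not kernel code.  It assumes that the "invalid uid" would be reported to userspace as the overflow uid (65534 by default), the way unmapped ids generally are.  The names Extent, to_outside, uid_seen_outside and the opaque flag are hypothetical.

```python
from typing import List, NamedTuple, Optional

# Default value of /proc/sys/kernel/overflowuid.  Using it as the
# "invalid uid" seen from outside is an assumption of this sketch.
OVERFLOW_UID = 65534


class Extent(NamedTuple):
    """One line of a uid_map: <inside> <outside> <count>."""
    inside: int
    outside: int
    count: int


def to_outside(uid_map: List[Extent], uid: int) -> Optional[int]:
    """Translate a uid inside the namespace to the parent's view."""
    for ext in uid_map:
        if ext.inside <= uid < ext.inside + ext.count:
            return ext.outside + (uid - ext.inside)
    return None


def uid_seen_outside(uid_map: List[Extent], uid: int, opaque: bool = False) -> int:
    """Uid a receiver outside the namespace would see for a sender inside it.

    opaque=True models the hypothetical opt-in: the sender is never
    translated and always shows up as the overflow uid.
    """
    if opaque:
        return OVERFLOW_UID
    mapped = to_outside(uid_map, uid)
    return OVERFLOW_UID if mapped is None else mapped
```

With two containers that both use the mapping "0 100000 65536", uid 1000 in either one is seen outside as 101000, so a receiver cannot tell them apart.  With opaque=True both would show up as 65534 instead.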

Implementing this might be awkward (ok, it might actively suck due to
a possible need for reference counting), but I'm wondering if it's a
good idea even in principle.

--Andy


[parent not found: <CALCETrWWSVnwg6Sb=bZz0xuAj_ASjZmsLYy=ELoR_uSqKJJaWg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
       [not found] ` <CALCETrWWSVnwg6Sb=bZz0xuAj_ASjZmsLYy=ELoR_uSqKJJaWg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-11-27  1:33   ` Eric W. Biederman
  2013-11-27  1:49   ` Serge E. Hallyn
  1 sibling, 0 replies; 11+ messages in thread
From: Eric W. Biederman @ 2013-11-27  1:33 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Containers, Serge Hallyn

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> IIUC there are multiple ways to end up with a socket pair for which
> one end is in a user namespace and the other is outside of it.  That
> means that SCM_CREDENTIALS can be used by a process in a userns to
> authenticate to a process outside.
>
> This is all well and good (and, as far as I know, correct), but I'm
> not sure this is always the desired behavior. 

Preventing a socket pair in contexts where it is not desired is
straightforward, so I don't think this is a problem in general.

> In the context of a
> tool like Docker, it might be useful to have several user namespaces
> that have the *same* uids mapped.  Nonetheless, if one of those
> namespaces is compromised, it probably shouldn't be permitted to
> attack things outside the user namespace (or in the host, if any
> interesting uids are mapped).
>
> Would it make sense to have an option to allow a user namespace to opt
> into different behavior so that its users show up as the invalid uid
> as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?
>
> Implementing this might be awkward (ok, it might actively suck due to
> a possible need for reference counting), but I'm wondering if it's a
> good idea even in principle.

For an idea like this I would really need to see a motivating example,
especially as adding complexity simply makes the analysis of security
properties worse.

As for uid mappings, my expectation is that an ordinary user will get
about 10,000 uids that are all reserved for that user.  Which means
that in general there is no excuse for using the same uids in different
containers for different users.
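
As a small illustration of non-overlapping allocation, here is a sketch that hands each container its own block of uids.  The block size of 10,000 comes from the paragraph above; the starting uid of 100000 and all function names are assumptions made for the example.

```python
from itertools import combinations
from typing import List, Tuple

UIDS_PER_BLOCK = 10_000  # figure from the paragraph above
FIRST_UID = 100_000      # hypothetical start of the delegated range

Range = Tuple[int, int]  # (first host uid, count)


def allocate(n_containers: int) -> List[Range]:
    """Give every container its own consecutive block of host uids."""
    return [
        (FIRST_UID + i * UIDS_PER_BLOCK, UIDS_PER_BLOCK)
        for i in range(n_containers)
    ]


def overlap(a: Range, b: Range) -> bool:
    """True if the two half-open uid ranges share at least one uid."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def uid_map_line(rng: Range) -> str:
    """Format a range as a uid_map line mapping container uid 0 upward."""
    start, count = rng
    return f"0 {start} {count}"
```

For example, any(overlap(a, b) for a, b in combinations(allocate(3), 2)) is False, so no uid in one container's credentials can be mistaken for a uid from another.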

The only case I can think of for having the same uids mapped is when
we want to share code between containers.  In that case I would suggest
making the directories read-only and ensuring that no executables
are suid or sgid.  That prevents the problem you are worrying about.
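
A rough way to audit a shared tree for that second condition is sketched below, assuming plain Python on a POSIX system.  The function names are made up, and the demo builds a throwaway directory rather than touching a real container image.

```python
import os
import stat
import tempfile
from typing import List


def find_setid_files(root: str) -> List[str]:
    """Return paths (relative to root) of regular files with suid or sgid set."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            if stat.S_ISREG(mode) and mode & (stat.S_ISUID | stat.S_ISGID):
                hits.append(os.path.relpath(path, root))
    return sorted(hits)


def demo() -> List[str]:
    """Hypothetical check on a temporary tree with one suid file."""
    with tempfile.TemporaryDirectory() as root:
        for name, mode in (("plain", 0o755), ("tool", 0o4755)):
            path = os.path.join(root, name)
            with open(path, "w") as fh:
                fh.write("#!/bin/sh\n")
            os.chmod(path, mode)  # chmod after creation so umask does not interfere
        return find_setid_files(root)
```

An empty result from find_setid_files on the shared directory is what the suggestion above asks for.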

Eric


* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
       [not found] ` <CALCETrWWSVnwg6Sb=bZz0xuAj_ASjZmsLYy=ELoR_uSqKJJaWg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-11-27  1:33   ` Eric W. Biederman
@ 2013-11-27  1:49   ` Serge E. Hallyn
       [not found]     ` <20131127014920.GA31364-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Serge E. Hallyn @ 2013-11-27  1:49 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Containers, Serge Hallyn, Eric W. Biederman

Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> IIUC there are multiple ways to end up with a socket pair for which
> one end is in a user namespace and the other is outside of it.  That
> means that SCM_CREDENTIALS can be used by a process in a userns to
> authenticate to a process outside.
> 
> This is all well and good (and, as far as I know, correct), but I'm

And the cgroup manager I'm starting on depends on this.

> not sure this is always the desired behavior.  In the context of a
> tool like Docker, it might be useful to have several user namespaces
> that have the *same* uids mapped.  Nonetheless, if one of those
> namespaces is compromised, it probably shouldn't be permitted to
> attack things outside the user namespace (or in the host, if any
> interesting uids are mapped).
> 
> Would it make sense to have an option to allow a user namespace to opt
> into different behavior so that its users show up as the invalid uid
> as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?
> 
> Implementing this might be awkward (ok, it might actively suck due to
> a possible need for reference counting), but I'm wondering if it's a
> good idea even in principle.

Well, I'll grant you, if I have a single directory with a socket in
it, and I make that the aufs or overlayfs underlay for two separate
mounts, which each are in different containers, then you might have
a problem here.

Now maybe the answer to that is that the sockets should be created
in tmpfss (/run, /tmp, etc) anyway.  But the more I think about it
the more I, unfortunately, agree that this could be a problem.

If we were to do something like this, I'd like it to at least have
an exception to always translate the uids when talking to the
host uid.

-serge


[parent not found: <20131127014920.GA31364-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
       [not found]     ` <20131127014920.GA31364-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2013-11-27  3:17       ` Eric W. Biederman
       [not found]         ` <87eh62v8hc.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Eric W. Biederman @ 2013-11-27  3:17 UTC (permalink / raw)
  To: Serge E. Hallyn; +Cc: Linux Containers, Serge Hallyn, Andy Lutomirski

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
>> IIUC there are multiple ways to end up with a socket pair for which
>> one end is in a user namespace and the other is outside of it.  That
>> means that SCM_CREDENTIALS can be used by a process in a userns to
>> authenticate to a process outside.
>> 
>> This is all well and good (and, as far as I know, correct), but I'm
>
> And the cgroup manager I'm starting on depends on this.
>
>> not sure this is always the desired behavior.  In the context of a
>> tool like Docker, it might be useful to have several user namespaces
>> that have the *same* uids mapped.  Nonetheless, if one of those
>> namespaces is compromised, it probably shouldn't be permitted to
>> attack things outside the user namespace (or in the host, if any
>> interesting uids are mapped).
>> 
>> Would it make sense to have an option to allow a user namespace to opt
>> into different behavior so that its users show up as the invalid uid
>> as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?
>> 
>> Implementing this might be awkward (ok, it might actively suck due to
>> a possible need for reference counting), but I'm wondering if it's a
>> good idea even in principle.
>
> Well, I'll grant you, if I have a single directory with a socket in
> it, and I make that the aufs or overlayfs underlay for two separate
> mounts, which each are in different containers, then you might have
> a problem here.
>
> Now maybe the answer to that is that the sockets should be created
> in tmpfss (/run, /tmp, etc) anyway.  But the more I think about it
> the more I, unfortunately, agree that this could be a problem.

I really hate the concept of mapping a uid in some contexts and not
others.  That seems very prone to go wrong. Given all of the possible
kinds of permutations I can't imagine how we would get it correct.

MS_NOSUID and MS_RDONLY will help with some of the worst offenders.
But it will still be possible for the user namespace root to call
setuid(NNN); and create a process with that uid.  And if a unix domain
socket isn't the only means of interacting there will still be problems.

I would suggest that writing a uid mapping filesystem like overlayfs, or
perhaps adding it as a mount option of overlayfs, is likely to be a more robust
solution in general.  Certainly that is what I originally had on the
drawing board to solve this class of problem.

Eric


[parent not found: <87eh62v8hc.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>]

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
       [not found]         ` <87eh62v8hc.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-11-27 14:44           ` Serge E. Hallyn
       [not found]             ` <20131127144431.GA6122-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Serge E. Hallyn @ 2013-11-27 14:44 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Serge Hallyn, Linux Containers, Andy Lutomirski

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> 
> > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> >> IIUC there are multiple ways to end up with a socket pair for which
> >> one end is in a user namespace and the other is outside of it.  That
> >> means that SCM_CREDENTIALS can be used by a process in a userns to
> >> authenticate to a process outside.
> >> 
> >> This is all well and good (and, as far as I know, correct), but I'm
> >
> > And the cgroup manager I'm starting on depends on this.
> >
> >> not sure this is always the desired behavior.  In the context of a
> >> tool like Docker, it might be useful to have several user namespaces
> >> that have the *same* uids mapped.  Nonetheless, if one of those
> >> namespaces is compromised, it probably shouldn't be permitted to
> >> attack things outside the user namespace (or in the host, if any
> >> interesting uids are mapped).
> >> 
> >> Would it make sense to have an option to allow a user namespace to opt
> >> into different behavior so that its users show up as the invalid uid
> >> as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?
> >> 
> >> Implementing this might be awkward (ok, it might actively suck due to
> >> a possible need for reference counting), but I'm wondering if it's a
> >> good idea even in principle.
> >
> > Well, I'll grant you, if I have a single directory with a socket in
> > it, and I make that the aufs or overlayfs underlay for two separate
> > mounts, which each are in different containers, then you might have
> > a problem here.
> >
> > Now maybe the answer to that is that the sockets should be created
> > in tmpfss (/run, /tmp, etc) anyway.  But the more I think about it
> > the more I, unfortunately, agree that this could be a problem.
> 
> I really hate the concept of mapping a uid in some contexts and not
> others.  That seems very prone to go wrong. Given all of the possible
> kinds of permutations I can't imagine how we would get it correct.
> 
> MS_NOSUID and MS_RDONLY will help with some of the worst offenders.
> But it will still be possible for the user namespace root to call
> setuid(NNN); and create a process with that uid.  And if a unix domain
> socket isn't the only means of interacting there will still be problems.
> 
> I will suggest that writing a uid mapping filesystem like overlayfs or
> perhaps as a mount option of overlayfs is likely to be a more robust
> solution in general.  Certainly that is what I originally had on the
> drawing board to solve this class of problem.

Actually an option to aufs and overlayfs to say "any unix domain socket
which is opened must first be copied to the writeable layer" would
solve the issue (at least for all reasonable cases, iiuc)
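
A toy model of that policy is sketched below.  It is not overlayfs or aufs code, and the function name and the copy_up_sockets flag are hypothetical.  It only encodes the decision being proposed: with the option set, a socket that still lives on the shared lower layer gets a private copy before use.

```python
import stat


def needs_copy_up(mode: int, on_lower: bool, wants_write: bool,
                  copy_up_sockets: bool = False) -> bool:
    """Decide whether a file must be copied to the writable layer first.

    mode            -- st_mode of the file being opened
    on_lower        -- True if the file only exists on the shared lower layer
    wants_write     -- True if the open requests write access
    copy_up_sockets -- the hypothetical mount option discussed above
    """
    if not on_lower:
        return False  # already private to this mount
    if stat.S_ISSOCK(mode):
        return copy_up_sockets  # proposed: sockets never stay shared
    return wants_write  # everything else: copy up only when modified
```

With the option enabled, two containers stacked on the same lower directory would each end up with their own socket inode instead of rendezvousing on the shared one.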

-serge


[parent not found: <20131127144431.GA6122-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
       [not found]             ` <20131127144431.GA6122-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2013-11-27 16:24               ` Andy Lutomirski
       [not found]                 ` <CALCETrVXKHO4=Q+0szERmte+5HYJMwVXnXJxLTdBThmoQMMPcg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Lutomirski @ 2013-11-27 16:24 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, Serge Hallyn, Eric W. Biederman, Miklos Szeredi

On Wed, Nov 27, 2013 at 6:44 AM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
>>
>> > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
>> >> IIUC there are multiple ways to end up with a socket pair for which
>> >> one end is in a user namespace and the other is outside of it.  That
>> >> means that SCM_CREDENTIALS can be used by a process in a userns to
>> >> authenticate to a process outside.
>> >>
>> >> This is all well and good (and, as far as I know, correct), but I'm
>> >
>> > And the cgroup manager I'm starting on depends on this.
>> >
>> >> not sure this is always the desired behavior.  In the context of a
>> >> tool like Docker, it might be useful to have several user namespaces
>> >> that have the *same* uids mapped.  Nonetheless, if one of those
>> >> namespaces is compromised, it probably shouldn't be permitted to
>> >> attack things outside the user namespace (or in the host, if any
>> >> interesting uids are mapped).
>> >>
>> >> Would it make sense to have an option to allow a user namespace to opt
>> >> into different behavior so that its users show up as the invalid uid
>> >> as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?
>> >>
>> >> Implementing this might be awkward (ok, it might actively suck due to
>> >> a possible need for reference counting), but I'm wondering if it's a
>> >> good idea even in principle.
>> >
>> > Well, I'll grant you, if I have a single directory with a socket in
>> > it, and I make that the aufs or overlayfs underlay for two separate
>> > mounts, which each are in different containers, then you might have
>> > a problem here.
>> >
>> > Now maybe the answer to that is that the sockets should be created
>> > in tmpfss (/run, /tmp, etc) anyway.  But the more I think about it
>> > the more I, unfortunately, agree that this could be a problem.
>>
>> I really hate the concept of mapping a uid in some contexts and not
>> others.  That seems very prone to go wrong. Given all of the possible
>> kinds of permutations I can't imagine how we would get it correct.
>>
>> MS_NOSUID and MS_RDONLY will help with some of the worst offenders.
>> But it will still be possible for the user namespace root to call
>> setuid(NNN); and create a process with that uid.  And if a unix domain
>> socket isn't the only means of interacting there will still be problems.
>>
>> I will suggest that writing a uid mapping filesystem like overlayfs or
>> perhaps as a mount option of overlayfs is likely to be a more robust
>> solution in general.  Certainly that is what I originally had on the
>> drawing board to solve this class of problem.
>
> Actually an option to aufs and overlayfs to say "any unix domain socket
> which is opened must first be copied to the writeable layer" would
> solve the issue (at least for all reasonable cases, iiuc)

I guess I'm reasonably convinced that overlayfs is the right place to
fix this.  (Containers using lvm will be left in the cold -- oh,
well.)

cc: Miklos, who is the most likely to implement one or both of these features.

(In cases where containers share a (non-overlay) directory that one of
them can write, would it make sense to have an option MS_NOSOCKET that
works on bind mounts?)

--Andy


[parent not found: <CALCETrVXKHO4=Q+0szERmte+5HYJMwVXnXJxLTdBThmoQMMPcg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
       [not found]                 ` <CALCETrVXKHO4=Q+0szERmte+5HYJMwVXnXJxLTdBThmoQMMPcg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-11-27 16:26                   ` Serge E. Hallyn
       [not found]                     ` <20131127162626.GA7358-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
  2013-11-27 16:56                   ` Miklos Szeredi
  1 sibling, 1 reply; 11+ messages in thread
From: Serge E. Hallyn @ 2013-11-27 16:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Miklos Szeredi, Linux Containers, Serge Hallyn, Eric W. Biederman

Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> On Wed, Nov 27, 2013 at 6:44 AM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> >> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
> >>
> >> > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> >> >> IIUC there are multiple ways to end up with a socket pair for which
> >> >> one end is in a user namespace and the other is outside of it.  That
> >> >> means that SCM_CREDENTIALS can be used by a process in a userns to
> >> >> authenticate to a process outside.
> >> >>
> >> >> This is all well and good (and, as far as I know, correct), but I'm
> >> >
> >> > And the cgroup manager I'm starting on depends on this.
> >> >
> >> >> not sure this is always the desired behavior.  In the context of a
> >> >> tool like Docker, it might be useful to have several user namespaces
> >> >> that have the *same* uids mapped.  Nonetheless, if one of those
> >> >> namespaces is compromised, it probably shouldn't be permitted to
> >> >> attack things outside the user namespace (or in the host, if any
> >> >> interesting uids are mapped).
> >> >>
> >> >> Would it make sense to have an option to allow a user namespace to opt
> >> >> into different behavior so that its users show up as the invalid uid
> >> >> as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?
> >> >>
> >> >> Implementing this might be awkward (ok, it might actively suck due to
> >> >> a possible need for reference counting), but I'm wondering if it's a
> >> >> good idea even in principle.
> >> >
> >> > Well, I'll grant you, if I have a single directory with a socket in
> >> > it, and I make that the aufs or overlayfs underlay for two separate
> >> > mounts, which each are in different containers, then you might have
> >> > a problem here.
> >> >
> >> > Now maybe the answer to that is that the sockets should be created
> >> > in tmpfss (/run, /tmp, etc) anyway.  But the more I think about it
> >> > the more I, unfortunately, agree that this could be a problem.
> >>
> >> I really hate the concept of mapping a uid in some contexts and not
> >> others.  That seems very prone to go wrong. Given all of the possible
> >> kinds of permutations I can't imagine how we would get it correct.
> >>
> >> MS_NOSUID and MS_RDONLY will help with some of the worst offenders.
> >> But it will still be possible for the user namespace root to call
> >> setuid(NNN); and create a process with that uid.  And if a unix domain
> >> socket isn't the only means of interacting there will still be problems.
> >>
> >> I will suggest that writing a uid mapping filesystem like overlayfs or
> >> perhaps as a mount option of overlayfs is likely to be a more robust
> >> solution in general.  Certainly that is what I originally had on the
> >> drawing board to solve this class of problem.
> >
> > Actually an option to aufs and overlayfs to say "any unix domain socket
> > which is opened must first be copied to the writeable layer" would
> > solve the issue (at least for all reasonable cases, iiuc)
> 
> I guess I'm reasonably convinced that overlayfs is the right place to
> fix this.  (Containers using lvm will be left in the cold -- oh,
> well.)

Have you tested that?  If I create two LVM snapshots of a logical
volume, with a unix socket on the original, and run containers on both
snapshots, does the socket connect the two containers?

> cc: Miklos, who is the most likely to implement one or both of these features.
> 
> (In cases where containers share a (non-overlay) directory that one of
> them can write, would it make sense to have an option MS_NOSOCKET that
> works on bind mounts?)
> 
> --Andy


[parent not found: <20131127162626.GA7358-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
       [not found]                     ` <20131127162626.GA7358-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2013-11-27 16:37                       ` Andy Lutomirski
  0 siblings, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2013-11-27 16:37 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Linux Containers, Serge Hallyn, Eric W. Biederman, Miklos Szeredi

On Wed, Nov 27, 2013 at 8:26 AM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
>> On Wed, Nov 27, 2013 at 6:44 AM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
>> > Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
>> >> "Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:
>> >>
>> >> > Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
>> >> >> IIUC there are multiple ways to end up with a socket pair for which
>> >> >> one end is in a user namespace and the other is outside of it.  That
>> >> >> means that SCM_CREDENTIALS can be used by a process in a userns to
>> >> >> authenticate to a process outside.
>> >> >>
>> >> >> This is all well and good (and, as far as I know, correct), but I'm
>> >> >
>> >> > And the cgroup manager I'm starting on depends on this.
>> >> >
>> >> >> not sure this is always the desired behavior.  In the context of a
>> >> >> tool like Docker, it might be useful to have several user namespaces
>> >> >> that have the *same* uids mapped.  Nonetheless, if one of those
>> >> >> namespaces is compromised, it probably shouldn't be permitted to
>> >> >> attack things outside the user namespace (or in the host, if any
>> >> >> interesting uids are mapped).
>> >> >>
>> >> >> Would it make sense to have an option to allow a user namespace to opt
>> >> >> into different behavior so that its users show up as the invalid uid
>> >> >> as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?
>> >> >>
>> >> >> Implementing this might be awkward (ok, it might actively suck due to
>> >> >> a possible need for reference counting), but I'm wondering if it's a
>> >> >> good idea even in principle.
>> >> >
>> >> > Well, I'll grant you, if I have a single directory with a socket in
>> >> > it, and I make that the aufs or overlayfs underlay for two separate
>> >> > mounts, which each are in different containers, then you might have
>> >> > a problem here.
>> >> >
>> >> > Now maybe the answer to that is that the sockets should be created
>> >> > in tmpfss (/run, /tmp, etc) anyway.  But the more I think about it
>> >> > the more I, unfortunately, agree that this could be a problem.
>> >>
>> >> I really hate the concept of mapping a uid in some contexts and not
>> >> others.  That seems very prone to go wrong. Given all of the possible
>> >> kinds of permutations I can't imagine how we would get it correct.
>> >>
>> >> MS_NOSUID and MS_RDONLY will help with some of the worst offenders.
>> >> But it will still be possible for the user namespace root to call
>> >> setuid(NNN); and create a process with that uid.  And if a unix domain
>> >> socket isn't the only means of interacting there will still be problems.
>> >>
>> >> I will suggest that writing a uid mapping filesystem like overlayfs or
>> >> perhaps as a mount option of overlayfs is likely to be a more robust
>> >> solution in general.  Certainly that is what I originally had on the
>> >> drawing board to solve this class of problem.
>> >
>> > Actually an option to aufs and overlayfs to say "any unix domain socket
>> > which is opened must first be copied to the writeable layer" would
>> > solve the issue (at least for all reasonable cases, iiuc)
>>
>> I guess I'm reasonably convinced that overlayfs is the right place to
>> fix this.  (Containers using lvm will be left in the cold -- oh,
>> well.)
>
> Have you tested that?  If I create two LVM snapshots of an LVM, with a
> unix sock on the original, and run containers on both snapshots, does
> the socket connect the two containers?

That won't work, of course.  I meant that lvm containers won't be able
to remap filesystem uids, which would be an even better fix.

--Andy


* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
       [not found]                 ` <CALCETrVXKHO4=Q+0szERmte+5HYJMwVXnXJxLTdBThmoQMMPcg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-11-27 16:26                   ` Serge E. Hallyn
@ 2013-11-27 16:56                   ` Miklos Szeredi
       [not found]                     ` <CAJfpeguHPFcX07bM=+3JJrV1kanDxp5wZWj4jBo-+1EMceonqg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Miklos Szeredi @ 2013-11-27 16:56 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Containers, Serge Hallyn, Eric W. Biederman

On Wed, Nov 27, 2013 at 5:24 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>> Actually an option to aufs and overlayfs to say "any unix domain socket
>> which is opened must first be copied to the writeable layer" would
>> solve the issue (at least for all reasonable cases, iiuc)
>
> I guess I'm reasonably convinced that overlayfs is the right place to
> fix this.  (Containers using lvm will be left in the cold -- oh,
> well.)
>
> cc: Miklos, who is the most likely to implement one or both of these features.

AFAICS implementing the option to copy up a unix domain socket on open
is trivial:  just need to tweak ovl_open_need_copy_up().

Is that what you were thinking?

> (In cases where containers share a (non-overlay) directory that one of
> them can write, would it make sense to have an option MS_NOSOCKET that
> works on bind mounts?)

Isn't it "you can't send SCM_CREDENTIALS", rather than "you can't open
unix domain socket"?

Thanks,
Miklos


[parent not found: <CAJfpeguHPFcX07bM=+3JJrV1kanDxp5wZWj4jBo-+1EMceonqg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
       [not found]                     ` <CAJfpeguHPFcX07bM=+3JJrV1kanDxp5wZWj4jBo-+1EMceonqg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-11-27 17:54                       ` Andy Lutomirski
  2013-11-27 18:47                       ` Serge Hallyn
  1 sibling, 0 replies; 11+ messages in thread
From: Andy Lutomirski @ 2013-11-27 17:54 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Linux Containers, Serge Hallyn, Eric W. Biederman

On Wed, Nov 27, 2013 at 8:56 AM, Miklos Szeredi <miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> wrote:
> On Wed, Nov 27, 2013 at 5:24 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>>> Actually an option to aufs and overlayfs to say "any unix domain socket
>>> which is opened must first be copied to the writeable layer" would
>>> solve the issue (at least for all reasonable cases, iiuc)
>>
>> I guess I'm reasonably convinced that overlayfs is the right place to
>> fix this.  (Containers using lvm will be left in the cold -- oh,
>> well.)
>>
>> cc: Miklos, who is the most likely to implement one or both of these features.
>
> AFAICS implementing the option to copy up a unix domain socket on open
> is trivial:  just need to tweak ovl_open_need_copy_up().
>
> Is that what you were thinking?

I'm not familiar enough w/ overlayfs.  I think the desired semantics
would be that a socket in the overlay mount would be a different inode
than the socket in the bottom underlying fs (or whatever it's called).

>
>> (In cases where containers share a (non-overlay) directory that one of
>> them can write, would it make sense to have an option MS_NOSOCKET that
>> works on bind mounts?)
>
> Isn't it "you can't send SCM_CREDENTIALS", rather than "you can't open
> unix domain socket"?
>

The latter may be considerably harder to implement.  (There's
SO_PEERCRED, too, and I don't know if there's a good place to stick a
flag for this in an open socket.)
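
To show why blocking the explicit send is not sufficient on its own, here is a minimal sketch of SO_PEERCRED, assuming Linux and Python's socket module.  The peer never sends anything: the credentials are recorded when the socket pair is created (or at connect time) and can be read back at any point afterwards.  The function names are hypothetical.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid, uid_t uid, gid_t gid.
UCRED = struct.Struct("iII")


def peer_credentials(sock: socket.socket):
    """Return (pid, uid, gid) recorded for the other end of a unix socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, UCRED.size)
    return UCRED.unpack(raw)


def demo():
    """Hypothetical helper: query the peer of a freshly made socket pair."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        # No message is exchanged; the kernel already knows who the peer is.
        return peer_credentials(a)
```

For a socket pair the recorded values are the effective uid and gid of the process that created it, so demo() returns this process's own pid, euid and egid.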

I think the ideal solution here is to have non-overlapping uid ranges,
and an option in overlayfs to remap uids and gids would make this
possible, at least if overlayfs is in use.

--Andy


* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
       [not found]                     ` <CAJfpeguHPFcX07bM=+3JJrV1kanDxp5wZWj4jBo-+1EMceonqg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-11-27 17:54                       ` Andy Lutomirski
@ 2013-11-27 18:47                       ` Serge Hallyn
  1 sibling, 0 replies; 11+ messages in thread
From: Serge Hallyn @ 2013-11-27 18:47 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Linux Containers, Eric W. Biederman, Andy Lutomirski

Quoting Miklos Szeredi (miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org):
> On Wed, Nov 27, 2013 at 5:24 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> >> Actually an option to aufs and overlayfs to say "any unix domain socket
> >> which is opened must first be copied to the writeable layer" would
> >> solve the issue (at least for all reasonable cases, iiuc)
> >
> > I guess I'm reasonably convinced that overlayfs is the right place to
> > fix this.  (Containers using lvm will be left in the cold -- oh,
> > well.)
> >
> > cc: Miklos, who is the most likely to implement one or both of these features.
> 
> AFAICS implementing the option to copy up a unix domain socket on open
> is trivial:  just need to tweak ovl_open_need_copy_up().
> 
> Is that what you were thinking?

Exactly.

thanks,
-serge


