* userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27 1:02 Andy Lutomirski
From: Andy Lutomirski @ 2013-11-27 1:02 UTC
To: Eric W. Biederman, Linux Containers, Serge Hallyn

IIUC there are multiple ways to end up with a socket pair for which
one end is in a user namespace and the other is outside of it. That
means that SCM_CREDENTIALS can be used by a process in a userns to
authenticate to a process outside.

This is all well and good (and, as far as I know, correct), but I'm
not sure this is always the desired behavior. In the context of a
tool like Docker, it might be useful to have several user namespaces
that have the *same* uids mapped. Nonetheless, if one of those
namespaces is compromised, it probably shouldn't be permitted to
attack things outside the user namespace (or in the host, if any
interesting uids are mapped).

Would it make sense to have an option to allow a user namespace to opt
into different behavior so that its users show up as the invalid uid
as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?

Implementing this might be awkward (ok, it might actively suck due to
a possible need for reference counting), but I'm wondering if it's a
good idea even in principle.

--Andy
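
For readers less familiar with the mechanism Andy describes, here is a minimal user-space sketch of SCM_CREDENTIALS. It assumes Linux and Python 3.3+ (for `socket.sendmsg`/`recvmsg` and the `SO_PASSCRED`/`SCM_CREDENTIALS` constants); the helper names are illustrative. The sender attaches its own pid/uid/gid, which the kernel rejects if they do not match the sender (unless it is privileged), and the receiver must enable `SO_PASSCRED` to see them. On delivery the kernel translates uid and gid into the receiver's user namespace, so a receiver outside a container sees the mapped host uid, or the overflow uid (usually 65534) when there is no mapping.

```python
import os
import socket
import struct

# struct ucred { pid_t pid; uid_t uid; gid_t gid; } on Linux
UCRED = struct.Struct("iII")


def send_with_creds(sock: socket.socket, payload: bytes) -> None:
    """Send payload with our own pid/uid/gid as an SCM_CREDENTIALS message."""
    creds = UCRED.pack(os.getpid(), os.getuid(), os.getgid())
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_CREDENTIALS, creds)])


def recv_with_creds(sock: socket.socket, bufsize: int = 1024):
    """Receive one message; return (payload, (pid, uid, gid) or None)."""
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_SPACE(UCRED.size))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            return data, UCRED.unpack(cdata[:UCRED.size])
    return data, None


def demo():
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        # The receiving end must opt in, or no credentials are delivered.
        b.setsockopt(socket.SOL_SOCKET, socket.SO_PASSCRED, 1)
        send_with_creds(a, b"hello")
        return recv_with_creds(b)
```

Both ends share one namespace here, so the received values equal the sender's own; across a namespace boundary only the uid/gid translation changes.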

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27 1:33 Eric W. Biederman
From: Eric W. Biederman @ 2013-11-27 1:33 UTC
To: Andy Lutomirski; +Cc: Linux Containers, Serge Hallyn

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> IIUC there are multiple ways to end up with a socket pair for which
> one end is in a user namespace and the other is outside of it. That
> means that SCM_CREDENTIALS can be used by a process in a userns to
> authenticate to a process outside.
>
> This is all well and good (and, as far as I know, correct), but I'm
> not sure this is always the desired behavior.

Preventing a socket pair in contexts where it is not desired is
straightforward, so I don't think in general this is a problem.

> In the context of a
> tool like Docker, it might be useful to have several user namespaces
> that have the *same* uids mapped. Nonetheless, if one of those
> namespaces is compromised, it probably shouldn't be permitted to
> attack things outside the user namespace (or in the host, if any
> interesting uids are mapped).
>
> Would it make sense to have an option to allow a user namespace to opt
> into different behavior so that its users show up as the invalid uid
> as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?
>
> Implementing this might be awkward (ok, it might actively suck due to
> a possible need for reference counting), but I'm wondering if it's a
> good idea even in principle.

For an idea like this I would really need to see a motivating example,
especially as adding complexity simply makes the analysis of security
properties worse.

As for uid mappings, my expectation is that an ordinary user will get
about 10,000 uids that are all reserved for that user. That means that
in general there is no excuse for using the same uids in different
containers for different users.

The only case I can think of for having the same uids mapped is when
we want to share code between containers. In that case I would suggest
making the directories read-only and ensuring that no executables are
suid or sgid, which prevents the problem you are worrying about.

Eric
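
Eric's suggestion of disjoint per-user uid ranges can be checked mechanically. The sketch below uses hypothetical helper names; it parses the three-column `ID-inside-ns ID-outside-ns length` format of `/proc/<pid>/uid_map` and reports whether two containers map any host uid in common. The 100000/110000 starting points are illustrative values, not something from the thread; the 10,000-uid block size follows Eric's estimate.

```python
from typing import List, Tuple

# One uid_map line: (first id inside the ns, first id outside the ns, count)
Extent = Tuple[int, int, int]


def parse_uid_map(text: str) -> List[Extent]:
    """Parse the whitespace-separated format of /proc/<pid>/uid_map."""
    extents = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(field) for field in line.split())
            extents.append((inside, outside, count))
    return extents


def outside_ranges_overlap(a: List[Extent], b: List[Extent]) -> bool:
    """True if at least one host (outside) uid is mapped by both maps."""
    for _, out_a, cnt_a in a:
        for _, out_b, cnt_b in b:
            # Half-open ranges [out, out + cnt) intersect.
            if out_a < out_b + cnt_b and out_b < out_a + cnt_a:
                return True
    return False


container_a = parse_uid_map("0 100000 10000\n")
container_b = parse_uid_map("0 110000 10000\n")
container_c = parse_uid_map("0 100000 10000\n")  # reuses container_a's range
```

Here `container_a` and `container_b` do not overlap, while `container_c` shares every host uid with `container_a`, which is exactly the configuration Andy worries about.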

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27 1:49 Serge E. Hallyn
From: Serge E. Hallyn @ 2013-11-27 1:49 UTC
To: Andy Lutomirski; +Cc: Linux Containers, Serge Hallyn, Eric W. Biederman

Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> IIUC there are multiple ways to end up with a socket pair for which
> one end is in a user namespace and the other is outside of it. That
> means that SCM_CREDENTIALS can be used by a process in a userns to
> authenticate to a process outside.
>
> This is all well and good (and, as far as I know, correct), but I'm

And the cgroup manager I'm starting on depends on this.

> not sure this is always the desired behavior. In the context of a
> tool like Docker, it might be useful to have several user namespaces
> that have the *same* uids mapped. Nonetheless, if one of those
> namespaces is compromised, it probably shouldn't be permitted to
> attack things outside the user namespace (or in the host, if any
> interesting uids are mapped).
>
> Would it make sense to have an option to allow a user namespace to opt
> into different behavior so that its users show up as the invalid uid
> as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?
>
> Implementing this might be awkward (ok, it might actively suck due to
> a possible need for reference counting), but I'm wondering if it's a
> good idea even in principle.

Well, I'll grant you, if I have a single directory with a socket in
it, and I make that the aufs or overlayfs underlay for two separate
mounts, which each are in different containers, then you might have
a problem here.

Now maybe the answer to that is that the sockets should be created
on tmpfs (/run, /tmp, etc.) anyway. But the more I think about it
the more I, unfortunately, agree that this could be a problem. If we
were to do something like this, I'd like it to at least have an
exception to always translate the uids when talking to the host uid.

-serge
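
Serge's shared-underlay scenario works because a pathname unix socket is found through the inode behind the path, not through the path string itself. As a rough stand-in for two mounts exposing one lower-layer socket (an assumption: no overlayfs or containers are involved here), the sketch below hard-links a bound socket file to a second name and shows that a client connecting through the link reaches the original listener. The function and file names are hypothetical.

```python
import os
import socket
import tempfile


def connect_via_hardlink() -> bytes:
    """Bind a listener at one path, connect through a hard link to the
    same socket inode, and return what the listener receives."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "shared.sock")
        alias = os.path.join(d, "alias.sock")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(original)
            server.listen(1)
            # A second directory entry for the same socket inode, standing
            # in for "the same lower-layer file seen through two mounts".
            os.link(original, alias)
            with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
                client.connect(alias)
                conn, _ = server.accept()
                with conn:
                    client.sendall(b"ping")
                    return conn.recv(4)
```

If each overlay instead had its own copied-up inode, the second name would refer to a different (unbound) socket file and the connection would fail.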

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27 3:17 Eric W. Biederman
From: Eric W. Biederman @ 2013-11-27 3:17 UTC
To: Serge E. Hallyn; +Cc: Linux Containers, Serge Hallyn, Andy Lutomirski

"Serge E. Hallyn" <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> writes:

> [...]
> Well, I'll grant you, if I have a single directory with a socket in
> it, and I make that the aufs or overlayfs underlay for two separate
> mounts, which each are in different containers, then you might have
> a problem here.
>
> Now maybe the answer to that is that the sockets should be created
> on tmpfs (/run, /tmp, etc.) anyway. But the more I think about it
> the more I, unfortunately, agree that this could be a problem.

I really hate the concept of mapping a uid in some contexts and not
others. That seems very prone to go wrong. Given all of the possible
kinds of permutations, I can't imagine how we would get it correct.

MS_NOSUID and MS_RDONLY will help with some of the worst offenders.
But it will still be possible for the user namespace root to call
setuid(NNN) and create a process with that uid. And if a unix domain
socket isn't the only means of interacting, there will still be problems.

I will suggest that writing a uid mapping filesystem like overlayfs,
or perhaps as a mount option of overlayfs, is likely to be a more
robust solution in general. Certainly that is what I originally had on
the drawing board to solve this class of problem.

Eric
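
To make Eric's alternative concrete: rather than hiding uids on sockets, a uid-shifting filesystem would let each container keep a disjoint host uid range while still sharing one image. The sketch below models, in user space, the extent lookup that uid_map entries define, plus the overflow-uid fallback for unmapped ids (65534 is assumed, the usual default of `/proc/sys/kernel/overflowuid`). The function names and ranges are hypothetical; this is not how any existing filesystem implements it.

```python
from typing import Optional, Sequence, Tuple

# Assumed default of /proc/sys/kernel/overflowuid.
OVERFLOW_UID = 65534

# One uid_map line: (first id inside the namespace, first id outside, count)
Extent = Tuple[int, int, int]


def inside_to_host(extents: Sequence[Extent], uid: int) -> Optional[int]:
    """Translate a namespace-relative uid (e.g. one stored in a shared
    image) to the host uid this container's range assigns it."""
    for first_in, first_out, count in extents:
        if first_in <= uid < first_in + count:
            return first_out + (uid - first_in)
    return None  # no mapping: the uid cannot exist in this container


def host_to_inside(extents: Sequence[Extent], host_uid: int) -> int:
    """What a process inside the namespace sees for a host uid;
    unmapped uids are reported as the overflow uid."""
    for first_in, first_out, count in extents:
        if first_out <= host_uid < first_out + count:
            return first_in + (host_uid - first_out)
    return OVERFLOW_UID


# Two containers share one image but get disjoint host ranges.
CONTAINER_A = [(0, 100000, 65536)]
CONTAINER_B = [(0, 200000, 65536)]
```

Under this model a file owned by uid 1000 in the shared image belongs to host uid 101000 in container A and 201000 in container B, so root in B calling `setuid(1000)` still cannot pass as A's user 1000; A sees B's user as 65534.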

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27 14:44 Serge E. Hallyn
From: Serge E. Hallyn @ 2013-11-27 14:44 UTC
To: Eric W. Biederman; +Cc: Serge Hallyn, Linux Containers, Andy Lutomirski

Quoting Eric W. Biederman (ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org):
> [...]
> I will suggest that writing a uid mapping filesystem like overlayfs,
> or perhaps as a mount option of overlayfs, is likely to be a more
> robust solution in general. Certainly that is what I originally had on
> the drawing board to solve this class of problem.

Actually, an option to aufs and overlayfs to say "any unix domain socket
which is opened must first be copied to the writeable layer" would
solve the issue (at least for all reasonable cases, iiuc).

-serge
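
Serge's proposed option amounts to one extra rule in the overlay's copy-up decision. The sketch below is a user-space model only, with hypothetical names (`needs_copy_up`, `copy_up_sockets`); the real decision lives in overlayfs's `ovl_open_need_copy_up()`, as Miklos notes later in the thread, and this code does not reproduce it. It also shows how a socket file is recognized, via `stat.S_ISSOCK` on a freshly bound socket.

```python
import os
import socket
import stat
import tempfile


def needs_copy_up(mode: int, open_flags: int, copy_up_sockets: bool) -> bool:
    """Hypothetical copy-up policy for a file found only in the lower layer.

    Opening for write (or with O_TRUNC) forces a copy-up as usual; the
    proposed option additionally copies up unix domain sockets, so each
    overlay mount ends up with its own socket inode.
    """
    if open_flags & (os.O_WRONLY | os.O_RDWR | os.O_TRUNC):
        return True
    return copy_up_sockets and stat.S_ISSOCK(mode)


def sample_modes():
    """Return the st_mode of a freshly bound unix socket and a regular file."""
    with tempfile.TemporaryDirectory() as d:
        sock_path = os.path.join(d, "s.sock")
        file_path = os.path.join(d, "f.txt")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.bind(sock_path)  # creates the socket file on disk
        with open(file_path, "w") as f:
            f.write("x")
        return os.stat(sock_path).st_mode, os.stat(file_path).st_mode
```

With the option off, a read-only lookup of a lower-layer socket leaves it shared; with it on, the socket is copied up even though nothing writes to it.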

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27 16:24 Andy Lutomirski
From: Andy Lutomirski @ 2013-11-27 16:24 UTC
To: Serge E. Hallyn
Cc: Linux Containers, Serge Hallyn, Eric W. Biederman, Miklos Szeredi

On Wed, Nov 27, 2013 at 6:44 AM, Serge E. Hallyn
<serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> [...]
> Actually an option to aufs and overlayfs to say "any unix domain socket
> which is opened must first be copied to the writeable layer" would
> solve the issue (at least for all reasonable cases, iiuc)

I guess I'm reasonably convinced that overlayfs is the right place to
fix this. (Containers using lvm will be left in the cold -- oh,
well.)

cc: Miklos, who is the most likely to implement one or both of these features.

(In cases where containers share a (non-overlay) directory that one of
them can write, would it make sense to have an option MS_NOSOCKET that
works on bind mounts?)

--Andy

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27 16:26 Serge E. Hallyn
From: Serge E. Hallyn @ 2013-11-27 16:26 UTC
To: Andy Lutomirski
Cc: Miklos Szeredi, Linux Containers, Serge Hallyn, Eric W. Biederman

Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
> [...]
> I guess I'm reasonably convinced that overlayfs is the right place to
> fix this. (Containers using lvm will be left in the cold -- oh,
> well.)

Have you tested that? If I create two LVM snapshots of an LVM volume,
with a unix socket on the original, and run containers on both
snapshots, does the socket connect the two containers?

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27 16:37 Andy Lutomirski
From: Andy Lutomirski @ 2013-11-27 16:37 UTC
To: Serge E. Hallyn
Cc: Linux Containers, Serge Hallyn, Eric W. Biederman, Miklos Szeredi

On Wed, Nov 27, 2013 at 8:26 AM, Serge E. Hallyn
<serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> Quoting Andy Lutomirski (luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org):
>> [...]
>> I guess I'm reasonably convinced that overlayfs is the right place to
>> fix this. (Containers using lvm will be left in the cold -- oh,
>> well.)
>
> Have you tested that? If I create two LVM snapshots of an LVM volume,
> with a unix socket on the original, and run containers on both
> snapshots, does the socket connect the two containers?

That won't work, of course. I meant that lvm containers won't be able
to remap filesystem uids, which would be an even better fix.

--Andy

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27 16:56 Miklos Szeredi
From: Miklos Szeredi @ 2013-11-27 16:56 UTC
To: Andy Lutomirski; +Cc: Linux Containers, Serge Hallyn, Eric W. Biederman

On Wed, Nov 27, 2013 at 5:24 PM, Andy Lutomirski
<luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
>> Actually an option to aufs and overlayfs to say "any unix domain socket
>> which is opened must first be copied to the writeable layer" would
>> solve the issue (at least for all reasonable cases, iiuc)
>
> I guess I'm reasonably convinced that overlayfs is the right place to
> fix this. (Containers using lvm will be left in the cold -- oh,
> well.)
>
> cc: Miklos, who is the most likely to implement one or both of these features.

AFAICS implementing the option to copy up a unix domain socket on open
is trivial: just need to tweak ovl_open_need_copy_up().

Is that what you were thinking?

> (In cases where containers share a (non-overlay) directory that one of
> them can write, would it make sense to have an option MS_NOSOCKET that
> works on bind mounts?)

Isn't it "you can't send SCM_CREDENTIALS", rather than "you can't open
unix domain socket"?

Thanks,
Miklos

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27 17:54 Andy Lutomirski
From: Andy Lutomirski @ 2013-11-27 17:54 UTC
To: Miklos Szeredi; +Cc: Linux Containers, Serge Hallyn, Eric W. Biederman

On Wed, Nov 27, 2013 at 8:56 AM, Miklos Szeredi
<miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org> wrote:
> [...]
> AFAICS implementing the option to copy up a unix domain socket on open
> is trivial: just need to tweak ovl_open_need_copy_up().
>
> Is that what you were thinking?

I'm not familiar enough with overlayfs. I think the desired semantics
would be that a socket in the overlay mount would be a different inode
than the socket in the bottom underlying fs (or whatever it's called).

>
>> (In cases where containers share a (non-overlay) directory that one of
>> them can write, would it make sense to have an option MS_NOSOCKET that
>> works on bind mounts?)
>
> Isn't it "you can't send SCM_CREDENTIALS", rather than "you can't open
> unix domain socket"?
>

The latter may be considerably harder to implement. (There's
SO_PEERCRED, too, and I don't know if there's a good place to stick a
flag for this in an open socket.)

I think the ideal solution here is to have non-overlapping uid ranges,
and an option in overlayfs to remap uids and gids would make this
possible, at least if overlayfs is in use.

--Andy
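
Andy's point about SO_PEERCRED is that, unlike SCM_CREDENTIALS, the peer's credentials are recorded on the connection itself when it is established (at `connect()` or `socketpair()` time), so there is no per-message control data where a "don't send credentials" rule could be enforced. A minimal Python sketch, assuming Linux, reads them with `getsockopt()`; the helper names are illustrative. The kernel reports the peer's effective uid/gid, translated into the reading process's user namespace.

```python
import os
import socket
import struct

# struct ucred { pid_t pid; uid_t uid; gid_t gid; } on Linux
UCRED = struct.Struct("iII")


def peer_credentials(sock: socket.socket):
    """Return (pid, uid, gid) of the peer as recorded by the kernel."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, UCRED.size)
    return UCRED.unpack(raw)


def demo():
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with a, b:
        # No data or control message is exchanged; the credentials are
        # already attached to the connection.
        return peer_credentials(b)
```

Because nothing is sent, an "invalid uid as seen from outside" option would have to be applied when these stored credentials are read, not when a message is transmitted.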

* Re: userns idea: preventing SCM_CREDENTIALS from leaking out
@ 2013-11-27 18:47 Serge Hallyn
From: Serge Hallyn @ 2013-11-27 18:47 UTC
To: Miklos Szeredi; +Cc: Linux Containers, Eric W. Biederman, Andy Lutomirski

Quoting Miklos Szeredi (miklos-sUDqSbJrdHQHWmgEVkV9KA@public.gmane.org):
> On Wed, Nov 27, 2013 at 5:24 PM, Andy Lutomirski
> <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> >> Actually an option to aufs and overlayfs to say "any unix domain socket
> >> which is opened must first be copied to the writeable layer" would
> >> solve the issue (at least for all reasonable cases, iiuc)
> >
> > I guess I'm reasonably convinced that overlayfs is the right place to
> > fix this. (Containers using lvm will be left in the cold -- oh,
> > well.)
> >
> > cc: Miklos, who is the most likely to implement one or both of these features.
>
> AFAICS implementing the option to copy up a unix domain socket on open
> is trivial: just need to tweak ovl_open_need_copy_up().
>
> Is that what you were thinking?

Exactly.

thanks,
-serge