* [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets @ 2008-10-01 10:54 Denis V. Lunev 2008-10-01 11:13 ` Daniel Lezcano 0 siblings, 1 reply; 22+ messages in thread From: Denis V. Lunev @ 2008-10-01 10:54 UTC (permalink / raw) To: containers; +Cc: xemul, netdev, dlezcano, benjamin.thery, ebiederm, den This patch opens a way to connect via Unix socket from one namespace to another if these sockets are opened via conventional filesystem interface. Such approach allows to share important services between namespaces in efficient way. This breach is controlled by the means of shared filesystem, i.e. if somebody really wants to isolate containers, he should start from filesystem separation. Signed-off-by: Denis V. Lunev <den@openvz.org> --- net/unix/af_unix.c | 3 --- 1 files changed, 0 insertions(+), 3 deletions(-) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 39d2173..0e1eccd 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -297,9 +297,6 @@ static struct sock *unix_find_socket_byinode(struct net *net, struct inode *i) &unix_socket_table[i->i_ino & (UNIX_HASH_SIZE - 1)]) { struct dentry *dentry = unix_sk(s)->dentry; - if (!net_eq(sock_net(s), net)) - continue; - if(dentry && dentry->d_inode == i) { sock_hold(s); -- 1.5.3.rc5 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 10:54 [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets Denis V. Lunev @ 2008-10-01 11:13 ` Daniel Lezcano 2008-10-01 11:32 ` Denis V. Lunev 0 siblings, 1 reply; 22+ messages in thread From: Daniel Lezcano @ 2008-10-01 11:13 UTC (permalink / raw) To: Denis V. Lunev; +Cc: containers, xemul, netdev, benjamin.thery, ebiederm Denis V. Lunev wrote: > This patch opens a way to connect via Unix socket from one namespace > to another if these sockets are opened via conventional filesystem > interface. Such approach allows to share important services between > namespaces in efficient way. > > This breach is controlled by the means of shared filesystem, i.e. if > somebody really wants to isolate containers, he should start from > filesystem separation. > > Signed-off-by: Denis V. Lunev <den@openvz.org> > --- > net/unix/af_unix.c | 3 --- > 1 files changed, 0 insertions(+), 3 deletions(-) > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > index 39d2173..0e1eccd 100644 > --- a/net/unix/af_unix.c > +++ b/net/unix/af_unix.c > @@ -297,9 +297,6 @@ static struct sock *unix_find_socket_byinode(struct net *net, struct inode *i) > &unix_socket_table[i->i_ino & (UNIX_HASH_SIZE - 1)]) { > struct dentry *dentry = unix_sk(s)->dentry; > > - if (!net_eq(sock_net(s), net)) > - continue; > - > if(dentry && dentry->d_inode == i) > { > sock_hold(s); Hi Denis, Do you have a list of the important services this isolation forbids ? (I suppose there is syslog). Without this isolation, we can not ensure the container will not connect to an application outside of the namespace. How can we handle this case for the checkpoint-restart/migration ? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 11:13 ` Daniel Lezcano @ 2008-10-01 11:32 ` Denis V. Lunev 2008-10-01 11:55 ` Daniel Lezcano 0 siblings, 1 reply; 22+ messages in thread From: Denis V. Lunev @ 2008-10-01 11:32 UTC (permalink / raw) To: Daniel Lezcano; +Cc: containers, xemul, netdev, benjamin.thery, ebiederm On Wed, 2008-10-01 at 13:13 +0200, Daniel Lezcano wrote: > Denis V. Lunev wrote: > > This patch opens a way to connect via Unix socket from one namespace > > to another if these sockets are opened via conventional filesystem > > interface. Such approach allows to share important services between > > namespaces in efficient way. > > > > This breach is controlled by the means of shared filesystem, i.e. if > > somebody really wants to isolate containers, he should start from > > filesystem separation. > > > > Signed-off-by: Denis V. Lunev <den@openvz.org> > > --- > > net/unix/af_unix.c | 3 --- > > 1 files changed, 0 insertions(+), 3 deletions(-) > > > > diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > > index 39d2173..0e1eccd 100644 > > --- a/net/unix/af_unix.c > > +++ b/net/unix/af_unix.c > > @@ -297,9 +297,6 @@ static struct sock *unix_find_socket_byinode(struct net *net, struct inode *i) > > &unix_socket_table[i->i_ino & (UNIX_HASH_SIZE - 1)]) { > > struct dentry *dentry = unix_sk(s)->dentry; > > > > - if (!net_eq(sock_net(s), net)) > > - continue; > > - > > if(dentry && dentry->d_inode == i) > > { > > sock_hold(s); > > Hi Denis, > > Do you have a list of the important services this isolation forbids ? (I > suppose there is syslog). we have asked from our customers for a shared MySQL server The full story is here :) http://bugzilla.openvz.org/show_bug.cgi?id=985 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 11:32 ` Denis V. Lunev @ 2008-10-01 11:55 ` Daniel Lezcano [not found] ` <48E3653C.1070701-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 22+ messages in thread From: Daniel Lezcano @ 2008-10-01 11:55 UTC (permalink / raw) To: Denis V. Lunev; +Cc: netdev, containers, benjamin.thery, ebiederm, xemul Denis V. Lunev wrote: > On Wed, 2008-10-01 at 13:13 +0200, Daniel Lezcano wrote: >> Denis V. Lunev wrote: >>> This patch opens a way to connect via Unix socket from one namespace >>> to another if these sockets are opened via conventional filesystem >>> interface. Such approach allows to share important services between >>> namespaces in efficient way. >>> >>> This breach is controlled by the means of shared filesystem, i.e. if >>> somebody really wants to isolate containers, he should start from >>> filesystem separation. >>> >>> Signed-off-by: Denis V. Lunev <den@openvz.org> >>> --- >>> net/unix/af_unix.c | 3 --- >>> 1 files changed, 0 insertions(+), 3 deletions(-) >>> >>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c >>> index 39d2173..0e1eccd 100644 >>> --- a/net/unix/af_unix.c >>> +++ b/net/unix/af_unix.c >>> @@ -297,9 +297,6 @@ static struct sock *unix_find_socket_byinode(struct net *net, struct inode *i) >>> &unix_socket_table[i->i_ino & (UNIX_HASH_SIZE - 1)]) { >>> struct dentry *dentry = unix_sk(s)->dentry; >>> >>> - if (!net_eq(sock_net(s), net)) >>> - continue; >>> - >>> if(dentry && dentry->d_inode == i) >>> { >>> sock_hold(s); >> Hi Denis, >> >> Do you have a list of the important services this isolation forbids ? (I >> suppose there is syslog). > > we have asked from our customers for a shared MySQL server > > The full story is here :) > http://bugzilla.openvz.org/show_bug.cgi?id=985 Ok, thanks. My question remains :) How do you handle migration in this case ? ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <48E3653C.1070701-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets [not found] ` <48E3653C.1070701-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org> @ 2008-10-01 12:03 ` Denis V. Lunev 2008-10-01 12:19 ` Daniel Lezcano 0 siblings, 1 reply; 22+ messages in thread From: Denis V. Lunev @ 2008-10-01 12:03 UTC (permalink / raw) To: Daniel Lezcano Cc: netdev-u79uwXL29TY76Z2rM5mHXA, containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, xemul-GEFAQzZX7r8dnm+yROfE0A, ebiederm-aS9lmoZGLiVWk0Htik3J/w, benjamin.thery-6ktuUTfB/bM On Wed, 2008-10-01 at 13:55 +0200, Daniel Lezcano wrote: > Denis V. Lunev wrote: > > On Wed, 2008-10-01 at 13:13 +0200, Daniel Lezcano wrote: > >> Denis V. Lunev wrote: > >>> This patch opens a way to connect via Unix socket from one namespace > >>> to another if these sockets are opened via conventional filesystem > >>> interface. Such approach allows to share important services between > >>> namespaces in efficient way. > >>> > >>> This breach is controlled by the means of shared filesystem, i.e. if > >>> somebody really wants to isolate containers, he should start from > >>> filesystem separation. > >>> > >>> Signed-off-by: Denis V. Lunev <den-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> > >>> --- > >>> net/unix/af_unix.c | 3 --- > >>> 1 files changed, 0 insertions(+), 3 deletions(-) > >>> > >>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c > >>> index 39d2173..0e1eccd 100644 > >>> --- a/net/unix/af_unix.c > >>> +++ b/net/unix/af_unix.c > >>> @@ -297,9 +297,6 @@ static struct sock *unix_find_socket_byinode(struct net *net, struct inode *i) > >>> &unix_socket_table[i->i_ino & (UNIX_HASH_SIZE - 1)]) { > >>> struct dentry *dentry = unix_sk(s)->dentry; > >>> > >>> - if (!net_eq(sock_net(s), net)) > >>> - continue; > >>> - > >>> if(dentry && dentry->d_inode == i) > >>> { > >>> sock_hold(s); > >> Hi Denis, > >> > >> Do you have a list of the important services this isolation forbids ? (I > >> suppose there is syslog). > > > > we have asked from our customers for a shared MySQL server > > > > The full story is here :) > > http://bugzilla.openvz.org/show_bug.cgi?id=985 > > Ok, thanks. > > My question remains :) > > How do you handle migration in this case ? There is no problem until you really have listeners from different namespaces on both ends. This is checked after the freeze stage and migration is forbidden if such a situation is detected. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 12:03 ` Denis V. Lunev @ 2008-10-01 12:19 ` Daniel Lezcano 2008-10-01 12:24 ` Pavel Emelyanov 0 siblings, 1 reply; 22+ messages in thread From: Daniel Lezcano @ 2008-10-01 12:19 UTC (permalink / raw) To: Denis V. Lunev; +Cc: netdev, containers, benjamin.thery, ebiederm, xemul Denis V. Lunev wrote: > On Wed, 2008-10-01 at 13:55 +0200, Daniel Lezcano wrote: >> Denis V. Lunev wrote: >>> On Wed, 2008-10-01 at 13:13 +0200, Daniel Lezcano wrote: >>>> Denis V. Lunev wrote: >>>>> This patch opens a way to connect via Unix socket from one namespace >>>>> to another if these sockets are opened via conventional filesystem >>>>> interface. Such approach allows to share important services between >>>>> namespaces in efficient way. >>>>> >>>>> This breach is controlled by the means of shared filesystem, i.e. if >>>>> somebody really wants to isolate containers, he should start from >>>>> filesystem separation. >>>>> >>>>> Signed-off-by: Denis V. Lunev <den@openvz.org> >>>>> --- >>>>> net/unix/af_unix.c | 3 --- >>>>> 1 files changed, 0 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c >>>>> index 39d2173..0e1eccd 100644 >>>>> --- a/net/unix/af_unix.c >>>>> +++ b/net/unix/af_unix.c >>>>> @@ -297,9 +297,6 @@ static struct sock *unix_find_socket_byinode(struct net *net, struct inode *i) >>>>> &unix_socket_table[i->i_ino & (UNIX_HASH_SIZE - 1)]) { >>>>> struct dentry *dentry = unix_sk(s)->dentry; >>>>> >>>>> - if (!net_eq(sock_net(s), net)) >>>>> - continue; >>>>> - >>>>> if(dentry && dentry->d_inode == i) >>>>> { >>>>> sock_hold(s); >>>> Hi Denis, >>>> >>>> Do you have a list of the important services this isolation forbids ? (I >>>> suppose there is syslog). >>> we have asked from our customers for a shared MySQL server >>> >>> The full story is here :) >>> http://bugzilla.openvz.org/show_bug.cgi?id=985 >> Ok, thanks. >> >> My question remains :) >> >> How do you handle migration in this case ? > > There is no problem until you really have listeners from different > namespaces on both ends. This is checked after the freeze stage and > migration is forbidden if such a situation is detected. So there are 2 cases: * full isolation : restriction on VPS * partial isolation : no restriction but *perhaps* problem when migrating Looks like we need an option per namespace to reduce the isolation for af_unix sockets :) - on (default): current behaviour => full isolation - off : partial isolation ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 12:19 ` Daniel Lezcano @ 2008-10-01 12:24 ` Pavel Emelyanov 2008-10-01 12:31 ` Daniel Lezcano 0 siblings, 1 reply; 22+ messages in thread From: Pavel Emelyanov @ 2008-10-01 12:24 UTC (permalink / raw) To: Daniel Lezcano; +Cc: netdev, containers, benjamin.thery, ebiederm, Denis Lunev > So there are 2 cases: > * full isolation : restriction on VPS > * partial isolation : no restriction but *perhaps* problem when migrating > > Looks like we need an option per namespace to reduce the isolation for > af_unix sockets :) > - on (default): current behaviour => full isolation > - off : partial isolation You mean some sysctl, that enables/disables this check in unix_find_socket_byinode? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 12:24 ` Pavel Emelyanov @ 2008-10-01 12:31 ` Daniel Lezcano 2008-10-01 12:40 ` Pavel Emelyanov 2008-10-01 13:11 ` Denis V. Lunev 0 siblings, 2 replies; 22+ messages in thread From: Daniel Lezcano @ 2008-10-01 12:31 UTC (permalink / raw) To: Pavel Emelyanov; +Cc: netdev, containers, benjamin.thery, ebiederm, Denis Lunev Pavel Emelyanov wrote: >> So there are 2 cases: >> * full isolation : restriction on VPS >> * partial isolation : no restriction but *perhaps* problem when migrating >> >> Looks like we need an option per namespace to reduce the isolation for >> af_unix sockets :) >> - on (default): current behaviour => full isolation >> - off : partial isolation > > You mean some sysctl, that enables/disables this check in unix_find_socket_byinode? Yes. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 12:31 ` Daniel Lezcano @ 2008-10-01 12:40 ` Pavel Emelyanov 2008-10-01 13:08 ` Cedric Le Goater 2008-10-01 13:11 ` Denis V. Lunev 1 sibling, 1 reply; 22+ messages in thread From: Pavel Emelyanov @ 2008-10-01 12:40 UTC (permalink / raw) To: Daniel Lezcano; +Cc: netdev, containers, benjamin.thery, ebiederm, Denis Lunev Daniel Lezcano wrote: > Pavel Emelyanov wrote: >>> So there are 2 cases: >>> * full isolation : restriction on VPS >>> * partial isolation : no restriction but *perhaps* problem when migrating >>> >>> Looks like we need an option per namespace to reduce the isolation for >>> af_unix sockets :) >>> - on (default): current behaviour => full isolation >>> - off : partial isolation >> You mean some sysctl, that enables/disables this check in unix_find_socket_byinode? > > Yes. OK. Den, please, do :) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 12:40 ` Pavel Emelyanov @ 2008-10-01 13:08 ` Cedric Le Goater 2008-10-01 13:50 ` Daniel Lezcano 0 siblings, 1 reply; 22+ messages in thread From: Cedric Le Goater @ 2008-10-01 13:08 UTC (permalink / raw) To: Pavel Emelyanov Cc: Daniel Lezcano, Denis Lunev, netdev, containers, ebiederm, benjamin.thery Pavel Emelyanov wrote: > Daniel Lezcano wrote: >> Pavel Emelyanov wrote: >>>> So there are 2 cases: >>>> * full isolation : restriction on VPS >>>> * partial isolation : no restriction but *perhaps* problem when migrating >>>> >>>> Looks like we need an option per namespace to reduce the isolation for >>>> af_unix sockets :) >>>> - on (default): current behaviour => full isolation >>>> - off : partial isolation >>> You mean some sysctl, that enables/disables this check in unix_find_socket_byinode? >> Yes. > > OK. Den, please, do :) hmm, would that allow sibling namespaces to connect to each other ? If so, I'm not in favor of such a solution. I understand the need. we had a similar issue with the command line tool pgsl. Could we work something out with the capabilities ? or make an exception if your ->nsproxy->net_ns == init_net ? Thanks, C. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 13:08 ` Cedric Le Goater @ 2008-10-01 13:50 ` Daniel Lezcano 2008-10-01 15:07 ` Cedric Le Goater 0 siblings, 1 reply; 22+ messages in thread From: Daniel Lezcano @ 2008-10-01 13:50 UTC (permalink / raw) To: Cedric Le Goater Cc: Pavel Emelyanov, Denis Lunev, netdev, containers, ebiederm, benjamin.thery Cedric Le Goater wrote: > Pavel Emelyanov wrote: >> Daniel Lezcano wrote: >>> Pavel Emelyanov wrote: >>>>> So there are 2 cases: >>>>> * full isolation : restriction on VPS >>>>> * partial isolation : no restriction but *perhaps* problem when migrating >>>>> >>>>> Looks like we need an option per namespace to reduce the isolation for >>>>> af_unix sockets :) >>>>> - on (default): current behaviour => full isolation >>>>> - off : partial isolation >>>> You mean some sysctl, that enables/disables this check in unix_find_socket_byinode? >>> Yes. >> OK. Den, please, do :) > > hmm, would that allow sibling namespaces to connect to each other ? If so, > I'm not in favor of such a solution. > > I understand the need. we had a similar issue with the command line tool > pgsl. Could we work something out with the capabilities ? or make an > exception if your ->nsproxy->net_ns == init_net ? Why capabilities is better than a simple sysctl ? Making an exception for init_net will break the nested containers no ? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 13:50 ` Daniel Lezcano @ 2008-10-01 15:07 ` Cedric Le Goater 0 siblings, 0 replies; 22+ messages in thread From: Cedric Le Goater @ 2008-10-01 15:07 UTC (permalink / raw) To: Daniel Lezcano Cc: netdev, containers, ebiederm, Denis Lunev, Pavel Emelyanov, benjamin.thery Daniel Lezcano wrote: > Cedric Le Goater wrote: >> Pavel Emelyanov wrote: >>> Daniel Lezcano wrote: >>>> Pavel Emelyanov wrote: >>>>>> So there are 2 cases: >>>>>> * full isolation : restriction on VPS >>>>>> * partial isolation : no restriction but *perhaps* problem when migrating >>>>>> >>>>>> Looks like we need an option per namespace to reduce the isolation for >>>>>> af_unix sockets :) >>>>>> - on (default): current behaviour => full isolation >>>>>> - off : partial isolation >>>>> You mean some sysctl, that enables/disables this check in unix_find_socket_byinode? >>>> Yes. >>> OK. Den, please, do :) >> hmm, would that allow sibling namespaces to connect to each other ? If so, >> I'm not in favor of such a solution. >> >> I understand the need. we had a similar issue with the command line tool >> pgsl. Could we work something out with the capabilities ? or make an >> exception if your ->nsproxy->net_ns == init_net ? > > Why capabilities is better than a simple sysctl ? because it depends on the current process privilege and not just some random process. > Making an exception for init_net will break the nested containers no ? may be. I don't know how this is implemented. if we break isolation, my feeling is that we should only do it for a parent namespace. It just feel wrong to allow sibling namespaces to connect to each other. C. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 12:31 ` Daniel Lezcano 2008-10-01 12:40 ` Pavel Emelyanov @ 2008-10-01 13:11 ` Denis V. Lunev 2008-10-01 13:46 ` Daniel Lezcano 1 sibling, 1 reply; 22+ messages in thread From: Denis V. Lunev @ 2008-10-01 13:11 UTC (permalink / raw) To: Daniel Lezcano Cc: Pavel Emelyanov, netdev, containers, benjamin.thery, ebiederm On Wed, 2008-10-01 at 14:31 +0200, Daniel Lezcano wrote: > Pavel Emelyanov wrote: > >> So there are 2 cases: > >> * full isolation : restriction on VPS > >> * partial isolation : no restriction but *perhaps* problem when migrating > >> > >> Looks like we need an option per namespace to reduce the isolation for > >> af_unix sockets :) > >> - on (default): current behaviour => full isolation > >> - off : partial isolation > > > > You mean some sysctl, that enables/disables this check in unix_find_socket_byinode? > > Yes. I do not see much sense with sysctl as: - check (cross-connected sockets) is required as we can start namespace with already opened socket - this kind of sharing is not implicit but explicit as normal isolated containers _must_ have separate filesystems. In this case this sharing requires explicit host administrator action to link socket between containers ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 13:11 ` Denis V. Lunev @ 2008-10-01 13:46 ` Daniel Lezcano 2008-10-01 14:54 ` Denis V. Lunev 0 siblings, 1 reply; 22+ messages in thread From: Daniel Lezcano @ 2008-10-01 13:46 UTC (permalink / raw) To: Denis V. Lunev Cc: Pavel Emelyanov, netdev, containers, benjamin.thery, ebiederm Denis V. Lunev wrote: > On Wed, 2008-10-01 at 14:31 +0200, Daniel Lezcano wrote: >> Pavel Emelyanov wrote: >>>> So there are 2 cases: >>>> * full isolation : restriction on VPS >>>> * partial isolation : no restriction but *perhaps* problem when migrating >>>> >>>> Looks like we need an option per namespace to reduce the isolation for >>>> af_unix sockets :) >>>> - on (default): current behaviour => full isolation >>>> - off : partial isolation >>> You mean some sysctl, that enables/disables this check in unix_find_socket_byinode? >> Yes. > > I do not see much sense with sysctl as: > - check (cross-connected sockets) is required as we can start namespace > with already opened socket Check when checkpointing ? If you inherit a socket from your parent namespace, this socket belongs to your parent and you should not checkpoint it, no ? In case you allow cross-connected sockets, this check is mandatory I agree. > - this kind of sharing is not implicit but explicit as normal isolated > containers _must_ have separate filesystems. In this case this > sharing requires explicit host administrator action to link socket > between containers What are "normal isolated containers" ? Are they OpenVZ containers ? These containers belong to the system containers family. What happens with application containers, if I want to share the filesystem without breaking the isolation of the afunix sockets ? The current code provides full isolation and this is in mainline. I don't think it is reasonable to change that. What I propose is to keep the current behaviour. When you create a network namespace, you can change the behaviour inside this namespace via /proc/sys/net/unix/isolated (for example). This option allows: 1 - to connect to af_unix not belonging to the container 2 - to accept af_unix connection from outside the container (avoid a container to forbid the checkpoint of another container); ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 13:46 ` Daniel Lezcano @ 2008-10-01 14:54 ` Denis V. Lunev 2008-10-01 15:18 ` Daniel Lezcano 0 siblings, 1 reply; 22+ messages in thread From: Denis V. Lunev @ 2008-10-01 14:54 UTC (permalink / raw) To: Daniel Lezcano Cc: Pavel Emelyanov, netdev, containers, benjamin.thery, ebiederm On Wed, 2008-10-01 at 15:46 +0200, Daniel Lezcano wrote: > Denis V. Lunev wrote: > > On Wed, 2008-10-01 at 14:31 +0200, Daniel Lezcano wrote: > >> Pavel Emelyanov wrote: > >>>> So there are 2 cases: > >>>> * full isolation : restriction on VPS > >>>> * partial isolation : no restriction but *perhaps* problem when migrating > >>>> > >>>> Looks like we need an option per namespace to reduce the isolation for > >>>> af_unix sockets :) > >>>> - on (default): current behaviour => full isolation > >>>> - off : partial isolation > >>> You mean some sysctl, that enables/disables this check in unix_find_socket_byinode? > >> Yes. > > > > I do not see much sense with sysctl as: > > - check (cross-connected sockets) is required as we can start namespace > > with already opened socket > > Check when checkpointing ? If you inherit a socket from your parent > namespace, this socket belongs to your parent and you should not > checkpoint it, no ? > > In case you allow cross-connected sockets, this check is mandatory I agree. > > > - this kind of sharing is not implicit but explicit as normal isolated > > containers _must_ have separate filesystems. In this case this > > sharing requires explicit host administrator action to link socket > > between containers > > What are "normal isolated containers" ? Are they OpenVZ containers ? > These containers belong to the system containers family. What happens > with application containers, if I want to share the filesystem without > breaking the isolation of the afunix sockets ? then you are doomed as you will have a FIFO opened from 2 namespaces and checking the absences of external references is still mandatory > The current code provides full isolation and this is in mainline. I > don't think it is reasonable to change that. What I propose is to keep > the current behaviour. > > When you create a network namespace, you can change the behaviour inside > this namespace via /proc/sys/net/unix/isolated (for example). > > This option allows: > 1 - to connect to af_unix not belonging to the container > 2 - to accept af_unix connection from outside the container (avoid a > container to forbid the checkpoint of another container); this should be at least per/namespace option controlled from parent container from my POW ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 14:54 ` Denis V. Lunev @ 2008-10-01 15:18 ` Daniel Lezcano 2008-10-01 15:31 ` Pavel Emelyanov 0 siblings, 1 reply; 22+ messages in thread From: Daniel Lezcano @ 2008-10-01 15:18 UTC (permalink / raw) To: Denis V. Lunev Cc: Pavel Emelyanov, netdev, containers, benjamin.thery, ebiederm Denis V. Lunev wrote: > On Wed, 2008-10-01 at 15:46 +0200, Daniel Lezcano wrote: >> Denis V. Lunev wrote: >>> On Wed, 2008-10-01 at 14:31 +0200, Daniel Lezcano wrote: >>>> Pavel Emelyanov wrote: >>>>>> So there are 2 cases: >>>>>> * full isolation : restriction on VPS >>>>>> * partial isolation : no restriction but *perhaps* problem when migrating >>>>>> >>>>>> Looks like we need an option per namespace to reduce the isolation for >>>>>> af_unix sockets :) >>>>>> - on (default): current behaviour => full isolation >>>>>> - off : partial isolation >>>>> You mean some sysctl, that enables/disables this check in unix_find_socket_byinode? >>>> Yes. >>> I do not see much sense with sysctl as: >>> - check (cross-connected sockets) is required as we can start namespace >>> with already opened socket >> Check when checkpointing ? If you inherit a socket from your parent >> namespace, this socket belongs to your parent and you should not >> checkpoint it, no ? >> >> In case you allow cross-connected sockets, this check is mandatory I agree. >> >>> - this kind of sharing is not implicit but explicit as normal isolated >>> containers _must_ have separate filesystems. In this case this >>> sharing requires explicit host administrator action to link socket >>> between containers >> What are "normal isolated containers" ? Are they OpenVZ containers ? >> These containers belong to the system containers family. What happens >> with application containers, if I want to share the filesystem without >> breaking the isolation of the afunix sockets ? > > then you are doomed as you will have a FIFO opened from 2 namespaces and > checking the absences of external references is still mandatory >> The current code provides full isolation and this is in mainline. I >> don't think it is reasonable to change that. What I propose is to keep >> the current behaviour. >> >> When you create a network namespace, you can change the behaviour inside >> this namespace via /proc/sys/net/unix/isolated (for example). >> >> This option allows: >> 1 - to connect to af_unix not belonging to the container >> 2 - to accept af_unix connection from outside the container (avoid a >> container to forbid the checkpoint of another container); > > this should be at least per/namespace option controlled from parent > container from my POW Yes per namespace, I agree. If the option is controlled by the parent and it is done by sysctl, you will have to make proc/sys per namespace like Pavel did with /proc/net, no ? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 15:18 ` Daniel Lezcano @ 2008-10-01 15:31 ` Pavel Emelyanov 2008-10-01 15:38 ` Daniel Lezcano 0 siblings, 1 reply; 22+ messages in thread From: Pavel Emelyanov @ 2008-10-01 15:31 UTC (permalink / raw) To: Daniel Lezcano Cc: Denis V. Lunev, netdev, containers, benjamin.thery, ebiederm > Yes per namespace, I agree. > > If the option is controlled by the parent and it is done by sysctl, you > will have to make proc/sys per namespace like Pavel did with /proc/net, no ? /proc/sys is already per namespace actually ;) Or what did you mean by that? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 15:31 ` Pavel Emelyanov @ 2008-10-01 15:38 ` Daniel Lezcano 2008-10-01 15:42 ` Pavel Emelyanov 0 siblings, 1 reply; 22+ messages in thread From: Daniel Lezcano @ 2008-10-01 15:38 UTC (permalink / raw) To: Pavel Emelyanov Cc: Denis V. Lunev, netdev, containers, benjamin.thery, ebiederm Pavel Emelyanov wrote: >> Yes per namespace, I agree. >> >> If the option is controlled by the parent and it is done by sysctl, you >> will have to make proc/sys per namespace like Pavel did with /proc/net, no ? > > /proc/sys is already per namespace actually ;) Or what did you mean by that? Effectively I was not clear :) I meant, you can not access /proc/sys from outside the namespace like /proc/net which can be followed up by /proc/<pid>/net outside the namespace. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 15:38 ` Daniel Lezcano @ 2008-10-01 15:42 ` Pavel Emelyanov 2008-10-01 16:15 ` Daniel Lezcano 0 siblings, 1 reply; 22+ messages in thread From: Pavel Emelyanov @ 2008-10-01 15:42 UTC (permalink / raw) To: Daniel Lezcano Cc: Denis V. Lunev, netdev, containers, benjamin.thery, ebiederm Daniel Lezcano wrote: > Pavel Emelyanov wrote: >>> Yes per namespace, I agree. >>> >>> If the option is controlled by the parent and it is done by sysctl, you >>> will have to make proc/sys per namespace like Pavel did with /proc/net, no ? >> /proc/sys is already per namespace actually ;) Or what did you mean by that? > > > Effectively I was not clear :) > > I meant, you can not access /proc/sys from outside the namespace like > /proc/net which can be followed up by /proc/<pid>/net outside the namespace. Ah! I've got it. Well, I think after Al Viro finishes with sysctl rework this possibility will appear, but Denis actually persuaded me in his POV - if we do want to disable shared sockets we *can* do this by putting containers in proper mount namespaces of chroot environments. > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 15:42 ` Pavel Emelyanov @ 2008-10-01 16:15 ` Daniel Lezcano 2008-10-02 10:21 ` Denis V. Lunev 0 siblings, 1 reply; 22+ messages in thread From: Daniel Lezcano @ 2008-10-01 16:15 UTC (permalink / raw) To: Pavel Emelyanov Cc: Denis V. Lunev, netdev, containers, benjamin.thery, ebiederm Pavel Emelyanov wrote: > Daniel Lezcano wrote: >> Pavel Emelyanov wrote: >>>> Yes per namespace, I agree. >>>> >>>> If the option is controlled by the parent and it is done by sysctl, you >>>> will have to make proc/sys per namespace like Pavel did with /proc/net, no ? >>> /proc/sys is already per namespace actually ;) Or what did you mean by that? >> >> Effectively I was not clear :) >> >> I meant, you can not access /proc/sys from outside the namespace like >> /proc/net which can be followed up by /proc/<pid>/net outside the namespace. > > Ah! I've got it. Well, I think after Al Viro finishes with sysctl > rework this possibility will appear, but Denis actually persuaded me > in his POV - if we do want to disable shared sockets we *can* do this > by putting containers in proper mount namespaces of chroot environments. And I agree with this point. But :) 1 - the current behaviour is full isolation. Shall we/can we change that without taking into account there are perhaps some people using this today ? I don't know. 2 - I wish to launch a non chrooted application inside a namespace, sharing the file system without sharing the af_unix sockets, because I don't want the application running inside the container overlap with the socket af_unix of another container. I prefer to detect a collision with a strong isolation and handle it manually (remount some part of the fs for example). 3 - I would like to be able to reduce this isolation (your point) to share the af_unix socket for example to use /dev/klog or something else. I don't know how much we can consider the point 1, 2 pertinent, but disabling 3 lines of code via a sysctl with strong isolation as default and having a process unsharing the namespace in userspace and changing this value to less isolation is not a big challenge IMHO :) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-01 16:15 ` Daniel Lezcano @ 2008-10-02 10:21 ` Denis V. Lunev 2008-10-02 20:03 ` Eric W. Biederman 0 siblings, 1 reply; 22+ messages in thread From: Denis V. Lunev @ 2008-10-02 10:21 UTC (permalink / raw) To: Daniel Lezcano Cc: Pavel Emelyanov, netdev, containers, benjamin.thery, ebiederm On Wed, 2008-10-01 at 18:15 +0200, Daniel Lezcano wrote: > Pavel Emelyanov wrote: > > Daniel Lezcano wrote: > >> Pavel Emelyanov wrote: > >>>> Yes per namespace, I agree. > >>>> > >>>> If the option is controlled by the parent and it is done by sysctl, you > >>>> will have to make proc/sys per namespace like Pavel did with /proc/net, no ? > >>> /proc/sys is already per namespace actually ;) Or what did you mean by that? > >> > >> Effectively I was not clear :) > >> > >> I meant, you can not access /proc/sys from outside the namespace like > >> /proc/net which can be followed up by /proc/<pid>/net outside the namespace. > > > > Ah! I've got it. Well, I think after Al Viro finishes with sysctl > > rework this possibility will appear, but Denis actually persuaded me > > in his POV - if we do want to disable shared sockets we *can* do this > > by putting containers in proper mount namespaces of chroot environments. > > And I agree with this point. But :) > > 1 - the current behaviour is full isolation. Shall we/can we change > that without taking into account there are perhaps some people using > this today ? I don't know. We have a direct request from people using to remove this state of isolation. > 2 - I wish to launch a non chrooted application inside a namespace, > sharing the file system without sharing the af_unix sockets, because I > don't want the application running inside the container overlap with the > socket af_unix of another container. I prefer to detect a collision with > a strong isolation and handle it manually (remount some part of the fs > for example). with common filesystem you have to detect collisions at least for FIFOs. This situation is the same. Basically, if we'll treat named Unix sockets as an improved FIFO - it's better to use the same approach > 3 - I would like to be able to reduce this isolation (your point) to > share the af_unix socket for example to use /dev/klog or something else. > > I don't know how much we can consider the point 1, 2 pertinent, but > disabling 3 lines of code via a sysctl with strong isolation as default > and having a process unsharing the namespace in userspace and changing > this value to less isolation is not a big challenge IMHO :) the real questions is _who_ is responsible for this kind of staff -> node (parent container) administrator or container administrator. I strongly vote for first. Also if we are talking about such kind of staff, I dislike global kludge. This should be a property of two concrete VEs and better two concrete sockets. Unfortunately, setsockopt is not an option :( ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets 2008-10-02 10:21 ` Denis V. Lunev @ 2008-10-02 20:03 ` Eric W. Biederman 0 siblings, 0 replies; 22+ messages in thread From: Eric W. Biederman @ 2008-10-02 20:03 UTC (permalink / raw) To: Denis V. Lunev Cc: Daniel Lezcano, Pavel Emelyanov, netdev, containers, benjamin.thery I do not believe we have yet met the burden of proof necessary to make the proposed semantic change. "Denis V. Lunev" <den@openvz.org> writes: > On Wed, 2008-10-01 at 18:15 +0200, Daniel Lezcano wrote: >> >> 1 - the current behaviour is full isolation. Shall we/can we change >> that without taking into account there are perhaps some people using >> this today ? I don't know. > We have a direct request from people using to remove this state of > isolation. Users of a brittle but convenient hack not application developers. Which means there is demand for this class of communication but it doesn't mean that the suggested way is the right way. >> 2 - I wish to launch a non chrooted application inside a namespace, >> sharing the file system without sharing the af_unix sockets, because I >> don't want the application running inside the container overlap with the >> socket af_unix of another container. I prefer to detect a collision with >> a strong isolation and handle it manually (remount some part of the fs >> for example). > with common filesystem you have to detect collisions at least for FIFOs. > This situation is the same. Basically, if we'll treat named Unix sockets > as an improved FIFO - it's better to use the same approach There are two aspects to this. 1) Looking up the proxy object in the filesystem. With that I agree that the file system access rules are sufficient. If we don't want the possibility of proxy objects you simply configure the system so that there are no shared files between namespaces. 2) How we interpret the proxy object. Currently the shared filesystem aka NFS precedent is that you can see the unix domain socket but not use it. Just what we have implemented now. As for FIFO you are be right that there is a potential bug there. Likely those fall in the yet to be implemented device namespace. It is certainly an area of the code we have not audited and thought about, so there is no precedent set. >> 3 - I would like to be able to reduce this isolation (your point) to >> share the af_unix socket for example to use /dev/klog or something else. >> >> I don't know how much we can consider the point 1, 2 pertinent, but >> disabling 3 lines of code via a sysctl with strong isolation as default >> and having a process unsharing the namespace in userspace and changing >> this value to less isolation is not a big challenge IMHO :) > the real questions is _who_ is responsible for this kind of staff -> > node (parent container) administrator or container administrator. I > strongly vote for first. > > Also if we are talking about such kind of staff, I dislike global > kludge. This should be a property of two concrete VEs and better two > concrete sockets. Unfortunately, setsockopt is not an option :( We support sockets from different namespaces in a single process. I agree that to keep the hack working we can not use setsockopt. I don't agree that we want to keep the hack working. - The code needs an audit to think about what it means to exchange packets between unix domain sockets in different network namespaces. There is surprising a lot of code in veth to accommodate that. If we have too many places where we need to do something strange it is going to make code maintenance difficult. - The semantics get stranger with respect to interpreting unix domain socket proxy objects. Sometimes we can connect to someplace non-local and sometimes we can't. - We can do this without changing the semantics of how socket proxy objects are interpreted, as it is possible for a process to use sockets in two different network namespaces. That requires application level changes. - It is prohibitively difficult to implement unix domain sockets that talk between different kernels (file descriptors sharing offsets and garbage collection of in flight sockets ouch!). This means that encouraging the use of unix domain sockets to transparently connect applications in different containers is probably a bad idea. This is completely different from FIFO's that are simple enough it is not hard to relay data between machines and get it right. - I don't know how much using unix domain sockets to talk between different domains transparently will confuse applications, and increase the risk of security exploits. I don't think it will be much though as we already have unix permissions checking from the proxy object, and we should be handling the namespace transitions of passed objects in the code that sends credentials and file descriptors. Eric ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2008-10-02 20:05 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-10-01 10:54 [PATCH net-next] [RFC] netns: enable cross-ve Unix sockets Denis V. Lunev
2008-10-01 11:13 ` Daniel Lezcano
2008-10-01 11:32 ` Denis V. Lunev
2008-10-01 11:55 ` Daniel Lezcano
[not found] ` <48E3653C.1070701-NmTC/0ZBporQT0dZR+AlfA@public.gmane.org>
2008-10-01 12:03 ` Denis V. Lunev
2008-10-01 12:19 ` Daniel Lezcano
2008-10-01 12:24 ` Pavel Emelyanov
2008-10-01 12:31 ` Daniel Lezcano
2008-10-01 12:40 ` Pavel Emelyanov
2008-10-01 13:08 ` Cedric Le Goater
2008-10-01 13:50 ` Daniel Lezcano
2008-10-01 15:07 ` Cedric Le Goater
2008-10-01 13:11 ` Denis V. Lunev
2008-10-01 13:46 ` Daniel Lezcano
2008-10-01 14:54 ` Denis V. Lunev
2008-10-01 15:18 ` Daniel Lezcano
2008-10-01 15:31 ` Pavel Emelyanov
2008-10-01 15:38 ` Daniel Lezcano
2008-10-01 15:42 ` Pavel Emelyanov
2008-10-01 16:15 ` Daniel Lezcano
2008-10-02 10:21 ` Denis V. Lunev
2008-10-02 20:03 ` Eric W. Biederman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).