* [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer
@ 2014-03-12 20:46 Vivek Goyal
From: Vivek Goyal @ 2014-03-12 20:46 UTC (permalink / raw)
To: linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay, Vivek Goyal
Hi,
This is V2 of the patches. It fixes the function formatting issue; I was also
using CONFIG_CGROUP instead of CONFIG_CGROUPS, which led to a crash at boot.
That is fixed now.
Some applications, sssd among them, want to know the cgroup of the connected
peer on a unix stream socket. They want to use this information to map the
peer to the container the client belongs to, and then decide which policies
apply to that container.
Why not just use SO_PEERCRED, extract the pid from it, and look up
/proc/<pid>/cgroup to figure out the client's cgroup? The problem is that this
is racy. By the time we look in /proc, the client may already have exited
(possibly after handing the socket fd over to a child), and its pid may have
been reassigned to another process. That is why people are looking for a more
reliable mechanism.
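To make the race concrete, here is a minimal sketch of the lookup being
rejected, assuming a connected AF_UNIX stream socket fd; the function name is
illustrative, not from any existing code:

#define _GNU_SOURCE		/* for struct ucred */
#include <stdio.h>
#include <sys/socket.h>

/* Racy: resolve the peer's cgroup via the pid from SO_PEERCRED. */
static int peer_cgroup_racy(int fd, char *buf, size_t buflen)
{
	struct ucred cred;
	socklen_t optlen = sizeof(cred);
	char path[64];
	FILE *f;

	if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &optlen) < 0)
		return -1;

	/* Window here: the peer may exit and cred.pid be recycled, so
	 * the file below can describe an unrelated process. */
	snprintf(path, sizeof(path), "/proc/%d/cgroup", cred.pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, buflen, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}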
Others, like journald, want similar information over unix datagram sockets.
A patch set providing that functionality was posted here:
https://lkml.org/lkml/2014/1/13/43
But it was rejected because of the overhead it would impose on all the other
cases:
https://lkml.org/lkml/2014/1/15/480
This patch series implements SO_PEERCGROUP, which returns the cgroup the
client was in at the time it opened the connection. Overhead is therefore
incurred only during connection setup, and there should be none after that.
It does not solve every use case out there, but it does cover sssd's needs,
hence this posting.
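For reference, the intended consumer side is a single getsockopt() after
accept(); the kernel captured the path when the client connected, so there is
no pid to chase afterwards. A hypothetical sketch against the interface added
by this series (the SO_PEERCGROUP value is the one proposed in patch 2/2):

#include <sys/socket.h>

#ifndef SO_PEERCGROUP
#define SO_PEERCGROUP 49	/* value proposed by this series */
#endif

/* Server side: ask for the cgroup the peer was in when it connected. */
static int peer_cgroup(int conn_fd, char *buf, socklen_t buflen)
{
	socklen_t len = buflen;

	if (getsockopt(conn_fd, SOL_SOCKET, SO_PEERCGROUP, buf, &len) < 0)
		return -1;
	return (int)len;	/* 0 => no cgroup path was recorded */
}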
Please consider it for inclusion.
Thanks
Vivek
Vivek Goyal (2):
cgroup: Provide empty definition of task_cgroup_path()
net: Implement SO_PEERCGROUP
arch/alpha/include/uapi/asm/socket.h | 1 +
arch/avr32/include/uapi/asm/socket.h | 1 +
arch/cris/include/uapi/asm/socket.h | 2 ++
arch/frv/include/uapi/asm/socket.h | 1 +
arch/ia64/include/uapi/asm/socket.h | 2 ++
arch/m32r/include/uapi/asm/socket.h | 1 +
arch/mips/include/uapi/asm/socket.h | 1 +
arch/mn10300/include/uapi/asm/socket.h | 1 +
arch/parisc/include/uapi/asm/socket.h | 1 +
arch/powerpc/include/uapi/asm/socket.h | 1 +
arch/s390/include/uapi/asm/socket.h | 1 +
arch/sparc/include/uapi/asm/socket.h | 2 ++
arch/xtensa/include/uapi/asm/socket.h | 1 +
include/linux/cgroup.h | 6 ++++++
include/net/sock.h | 1 +
include/uapi/asm-generic/socket.h | 2 ++
net/core/sock.c | 19 ++++++++++++++
net/unix/af_unix.c | 48 ++++++++++++++++++++++++++++++++++
18 files changed, 92 insertions(+)
--
1.8.5.3
* [PATCH 1/2] cgroup: Provide empty definition of task_cgroup_path()

From: Vivek Goyal @ 2014-03-12 20:46 UTC (permalink / raw)
To: linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay, Vivek Goyal

Compilation fails for users of task_cgroup_path() when !CONFIG_CGROUPS.
So provide an empty definition.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 include/linux/cgroup.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9450f02..727728c 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -869,6 +869,12 @@ static inline int cgroup_attach_task_all(struct task_struct *from,
 	return 0;
 }
 
+static inline int
+task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
+{
+	return 0;
+}
+
 #endif /* !CONFIG_CGROUPS */
 
 #endif /* _LINUX_CGROUP_H */
-- 
1.8.5.3
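The stub follows the usual kernel configuration pattern; a minimal,
self-contained illustration of why it is needed (FEATURE stands in for
CONFIG_CGROUPS, and all names here are illustrative):

#include <stddef.h>

/* header: real declaration when the feature is built in,
 * static inline no-op when it is configured out. */
#ifdef FEATURE
int feature_path(char *buf, size_t buflen);	/* implemented elsewhere */
#else
static inline int feature_path(char *buf, size_t buflen)
{
	return 0;	/* pretend success; callers need no #ifdef */
}
#endif

/* caller: compiles and links the same way in both configurations. */
static int init_path(char *buf, size_t buflen)
{
	return feature_path(buf, buflen);
}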
* [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Vivek Goyal @ 2014-03-12 20:46 UTC (permalink / raw)
To: linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay, Vivek Goyal

Implement SO_PEERCGROUP along the lines of SO_PEERCRED. It returns the
cgroup, in the first mounted hierarchy, of the task. For a client, it
represents the cgroup of the client at the time it opened the connection;
the client's cgroup may change afterwards.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/alpha/include/uapi/asm/socket.h   |  1 +
 arch/avr32/include/uapi/asm/socket.h   |  1 +
 arch/cris/include/uapi/asm/socket.h    |  2 ++
 arch/frv/include/uapi/asm/socket.h     |  1 +
 arch/ia64/include/uapi/asm/socket.h    |  2 ++
 arch/m32r/include/uapi/asm/socket.h    |  1 +
 arch/mips/include/uapi/asm/socket.h    |  1 +
 arch/mn10300/include/uapi/asm/socket.h |  1 +
 arch/parisc/include/uapi/asm/socket.h  |  1 +
 arch/powerpc/include/uapi/asm/socket.h |  1 +
 arch/s390/include/uapi/asm/socket.h    |  1 +
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  1 +
 include/net/sock.h                     |  1 +
 include/uapi/asm-generic/socket.h      |  2 ++
 net/core/sock.c                        | 19 ++++++++++++++
 net/unix/af_unix.c                     | 48 ++++++++++++++++++++++++++++++++++
 17 files changed, 86 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 3de1394..7178353 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -87,4 +87,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 6e6cd15..486212b 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/uapi/asm/socket.h b/arch/cris/include/uapi/asm/socket.h
index ed94e5e..89a09e3 100644
--- a/arch/cris/include/uapi/asm/socket.h
+++ b/arch/cris/include/uapi/asm/socket.h
@@ -82,6 +82,8 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index ca2c6e6..c4d90bc 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -80,5 +80,6 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index a1b49ba..62c196d 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 6c9a24b..6e04a7d 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index a14baa2..cfbd84b 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -98,4 +98,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 6aa3ce1..73467fe 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index fe35cea..24d8913 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -79,4 +79,5 @@
 #define SO_BPF_EXTENSIONS	0x4029
 
+#define SO_PEERCGROUP		0x402a
 
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index a9c3e2e..50106be 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -87,4 +87,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index e031332..4ae2f3c 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -86,4 +86,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 54d9608..1056168 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -76,6 +76,8 @@
 #define SO_BPF_EXTENSIONS	0x0032
 
+#define SO_PEERCGROUP		0x0033
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 39acec0..947bc6e 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -91,4 +91,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _XTENSA_SOCKET_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 5c3f7c3..d594575 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -424,6 +424,7 @@ struct sock {
 	int			(*sk_backlog_rcv)(struct sock *sk,
 						  struct sk_buff *skb);
 	void			(*sk_destruct)(struct sock *sk);
+	char			*cgroup_path;
 };
 
 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index ea0796b..e86be5b 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -82,4 +82,6 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 5b6a943..0827a3c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1185,6 +1185,24 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_max_pacing_rate;
 		break;
 
+	case SO_PEERCGROUP:
+	{
+		int cgroup_path_len;
+
+		if (!sk->cgroup_path) {
+			len = 0;
+			goto lenout;
+		}
+
+		cgroup_path_len = strlen(sk->cgroup_path) + 1;
+
+		if (len > cgroup_path_len)
+			len = cgroup_path_len;
+		if (copy_to_user(optval, sk->cgroup_path, len))
+			return -EFAULT;
+		goto lenout;
+	}
+
 	default:
 		return -ENOPROTOOPT;
 	}
@@ -1378,6 +1396,7 @@ static void __sk_free(struct sock *sk)
 		put_cred(sk->sk_peer_cred);
 	put_pid(sk->sk_peer_pid);
 	put_net(sock_net(sk));
+	kfree(sk->cgroup_path);
 	sk_prot_free(sk->sk_prot_creator, sk);
 }
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 29fc8be..6921ae6 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -474,6 +474,37 @@ static void copy_peercred(struct sock *sk, struct sock *peersk)
 	sk->sk_peer_cred = get_cred(peersk->sk_peer_cred);
 }
 
+static int alloc_cgroup_path(struct sock *sk)
+{
+#ifdef CONFIG_CGROUPS
+	if (sk->cgroup_path)
+		return 0;
+
+	sk->cgroup_path = kzalloc(PATH_MAX, GFP_KERNEL);
+	if (!sk->cgroup_path)
+		return -ENOMEM;
+
+#endif
+	return 0;
+}
+
+static int init_peercgroup(struct sock *sk)
+{
+	int ret;
+
+	ret = alloc_cgroup_path(sk);
+	if (ret)
+		return ret;
+
+	return task_cgroup_path(current, sk->cgroup_path, PATH_MAX);
+}
+
+static void copy_peercgroup(struct sock *sk, struct sock *peersk)
+{
+	if (sk->cgroup_path)
+		strncpy(sk->cgroup_path, peersk->cgroup_path, PATH_MAX);
+}
+
 static int unix_listen(struct socket *sock, int backlog)
 {
 	int err;
@@ -487,6 +518,12 @@ static int unix_listen(struct socket *sock, int backlog)
 	err = -EINVAL;
 	if (!u->addr)
 		goto out;	/* No listens on an unbound socket */
+
+	err = init_peercgroup(sk);
+	if (err)
+		goto out;
+
+	err = -EINVAL;
 	unix_state_lock(sk);
 	if (sk->sk_state != TCP_CLOSE && sk->sk_state != TCP_LISTEN)
 		goto out_unlock;
@@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	if (newsk == NULL)
 		goto out;
 
+	err = init_peercgroup(newsk);
+	if (err)
+		goto out;
+
+	err = alloc_cgroup_path(sk);
+	if (err)
+		goto out;
+
+	err = -ENOMEM;
+
 	/* Allocate skb for sending to listening sock */
 	skb = sock_wmalloc(newsk, 1, 0, GFP_KERNEL);
 	if (skb == NULL)
@@ -1203,6 +1250,7 @@ restart:
 
 	/* Set credentials */
 	copy_peercred(sk, other);
+	copy_peercgroup(sk, other);
 
 	sock->state	= SS_CONNECTED;
 	sk->sk_state	= TCP_ESTABLISHED;
-- 
1.8.5.3
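One behavioral detail in the sock_getsockopt() hunk above is worth spelling
out: the reported length is clamped to strlen(cgroup_path) + 1, so an
oversized buffer comes back with the actual length (including the NUL), a
short buffer gets a silent truncation, and a socket that never went through
init_peercgroup() reports a length of zero. A hypothetical caller that
distinguishes these cases:

#include <stdio.h>
#include <limits.h>	/* PATH_MAX, matching the kernel-side allocation */
#include <sys/socket.h>

#ifndef SO_PEERCGROUP
#define SO_PEERCGROUP 49	/* value proposed by this series */
#endif

static void show_peer_cgroup(int fd)
{
	char buf[PATH_MAX];	/* full size: avoids silent truncation */
	socklen_t len = sizeof(buf);

	if (getsockopt(fd, SOL_SOCKET, SO_PEERCGROUP, buf, &len) < 0) {
		perror("SO_PEERCGROUP");
		return;
	}
	if (len == 0)
		printf("no peer cgroup recorded for this socket\n");
	else
		printf("peer cgroup: %.*s\n", (int)len, buf);
}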
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Cong Wang @ 2014-03-12 20:58 UTC (permalink / raw)
To: Vivek Goyal
Cc: linux-kernel@vger.kernel.org, cgroups, netdev, David Miller, tj, ssorce, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 1:46 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> @@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
>  	if (newsk == NULL)
>  		goto out;
>
> +	err = init_peercgroup(newsk);
> +	if (err)
> +		goto out;
> +
> +	err = alloc_cgroup_path(sk);
> +	if (err)
> +		goto out;
> +
> +	err = -ENOMEM;
> +

Don't we need to free the cgroup_path on the error path in this function?
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Vivek Goyal @ 2014-03-13 13:48 UTC (permalink / raw)
To: Cong Wang
Cc: linux-kernel@vger.kernel.org, cgroups, netdev, David Miller, tj, ssorce, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 01:58:57PM -0700, Cong Wang wrote:
> On Wed, Mar 12, 2014 at 1:46 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > +	err = init_peercgroup(newsk);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = alloc_cgroup_path(sk);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = -ENOMEM;
> > +
>
> Don't we need to free the cgroup_path on the error path in this function?

The previously allocated cgroup_path is now in newsk->cgroup_path, and I
was relying on __sk_free() freeing that memory if an error happens.

unix_release_sock(sk)
  sock_put()
    sk_free()
      __sk_free()
        kfree(sk->cgroup_path)

Do you see a problem with that?

Thanks
Vivek
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-12 21:00 UTC (permalink / raw)
To: Vivek Goyal, linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay

On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. It returns the
> cgroup, in the first mounted hierarchy, of the task. For a client, it
> represents the cgroup of the client at the time it opened the connection;
> the client's cgroup may change afterwards.

Even if people decide that sending cgroups over a unix socket is a good
idea, this API has my NAK in the strongest possible sense, for whatever
my NAK is worth.

IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
*never* imply the use of a credential.  A program should always have to
*explicitly* request use of a credential.  What you want is SCM_CGROUP.

(I've found privilege escalations before based on this observation, and
I suspect I'll find them again.)

Note that I think that you really want SCM_SOMETHING_ELSE and not
SCM_CGROUP, but I don't know what the use case is yet.

--Andy
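For contrast, the explicit model Andy is pointing at already exists for
credentials: with SCM_CREDENTIALS the sender attaches a ucred to one specific
message, the kernel validates it, and a bare write(2) implies nothing. A
minimal sender-side sketch, assuming a connected AF_UNIX socket whose receiver
has enabled SO_PASSCRED; a hypothetical SCM_CGROUP would presumably follow the
same shape:

#define _GNU_SOURCE		/* for struct ucred */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Explicitly attach our credentials to one message; the kernel
 * verifies them, and the receiver sees them only if it opted in. */
static ssize_t send_with_creds(int fd, const void *buf, size_t len)
{
	struct ucred creds = {
		.pid = getpid(), .uid = getuid(), .gid = getgid(),
	};
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	union {
		char control[CMSG_SPACE(sizeof(struct ucred))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.control;
	msg.msg_controllen = sizeof(u.control);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_CREDENTIALS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(creds));
	memcpy(CMSG_DATA(cmsg), &creds, sizeof(creds));

	return sendmsg(fd, &msg, 0);
}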
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-12 21:12 UTC (permalink / raw)
To: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo
Cc: ssorce, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> *never* imply the use of a credential.  A program should always have to
> *explicitly* request use of a credential.  What you want is SCM_CGROUP.
>
> Note that I think that you really want SCM_SOMETHING_ELSE and not
> SCM_CGROUP, but I don't know what the use case is yet.

This might not be quite as awful as I thought.  At least you're
looking up the cgroup at connection time instead of at send time.

OTOH, this is still racy -- the socket could easily outlive the cgroup
that created it.

--Andy
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-12 21:16 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, 2014-03-12 at 14:12 -0700, Andy Lutomirski wrote:
> This might not be quite as awful as I thought.  At least you're
> looking up the cgroup at connection time instead of at send time.
>
> OTOH, this is still racy -- the socket could easily outlive the cgroup
> that created it.

I think you do not understand how this whole problem space works.

The problem is exactly the same as with SO_PEERCRED, so we are taking
the same proven solution.

Connection time is all we do and can care about.

Simo.
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-12 21:19 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce <ssorce@redhat.com> wrote:
> I think you do not understand how this whole problem space works.
>
> The problem is exactly the same as with SO_PEERCRED, so we are taking
> the same proven solution.

You mean the same proven crappy solution?

> Connection time is all we do and can care about.

You have not answered why.

--
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-13 1:17 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce <ssorce@redhat.com> wrote:
> > Connection time is all we do and can care about.
>
> You have not answered why.

We are going to disclose information to the peer based on policy that
depends on the cgroup the peer is part of. All we care about is who
opened the connection; if the peer wants to pass on that information
after it has obtained it, there is nothing we can do, so connection time
is all we really care about.

Simo.
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-13 1:21 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce <ssorce@redhat.com> wrote:
> We are going to disclose information to the peer based on policy that
> depends on the cgroup the peer is part of. All we care about is who
> opened the connection; if the peer wants to pass on that information
> after it has obtained it, there is nothing we can do, so connection time
> is all we really care about.

Can you give a realistic example?

I could say that I'd like to disclose information to processes based
on their rlimits at the time they connected, but I don't think that
would carry much weight.

--Andy
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-13 1:43 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
> Can you give a realistic example?
>
> I could say that I'd like to disclose information to processes based
> on their rlimits at the time they connected, but I don't think that
> would carry much weight.

We want to be able to show a different user list from SSSD based on the
docker container that is asking for it.

This works by having libnss_sss.so from the containerized application
connect to an SSSD daemon running on the host or in another container.

The only way to distinguish between containers "from the outside" is to
look up the cgroup of the requesting process. It has a unique container
ID, and can therefore be mapped to the appropriate policy that will let
us decide which 'user domain' to serve to the container.

Simo.
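As an illustration of the mapping Simo describes: once the server has the
peer's cgroup path, extracting the container identity is a string operation.
A hypothetical sketch; the "docker-<id>.scope" naming is an assumption about
the container manager's cgroup layout, not something this patch defines:

#include <string.h>

/* Extract a container ID from a cgroup path such as
 * "/system.slice/docker-3f2a9c....scope" (assumed layout).
 * Returns 0 on success, -1 if no ID is found or it does not fit. */
static int container_id_from_cgroup(const char *cgroup_path,
				    char *id, size_t idlen)
{
	const char *p = strstr(cgroup_path, "docker-");
	size_t n;

	if (!p)
		return -1;
	p += strlen("docker-");
	n = strcspn(p, ".");		/* stop at ".scope" */
	if (n == 0 || n >= idlen)
		return -1;
	memcpy(id, p, n);
	id[n] = '\0';
	return 0;
}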
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-13 2:12 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce <ssorce@redhat.com> wrote:
> The only way to distinguish between containers "from the outside" is to
> look up the cgroup of the requesting process. It has a unique container
> ID, and can therefore be mapped to the appropriate policy that will let
> us decide which 'user domain' to serve to the container.

I can think of at least three other ways to do this.

1. Fix Docker to use user namespaces and use the uid of the requesting
process via SCM_CREDENTIALS.

2. Docker is a container system, so use the "container" (aka
namespace) APIs.  There are probably several clever things that could
be done with /proc/<pid>/ns.

3. Given that Docker uses network namespaces, I assume that the socket
connection between the two sssd instances either comes from Docker
itself or uses socket inodes.  In either case, the same mechanism
should be usable for authentication.

On an unrelated note, since you seem to have found a way to get unix
sockets to connect the inside and outside of a Docker container, it
would be awesome if Docker could use the same mechanism to pass TCP
sockets around rather than playing awful games with virtual networks.

--Andy
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Vivek Goyal @ 2014-03-13 14:27 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Simo Sorce, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 07:12:25PM -0700, Andy Lutomirski wrote:

[..]
> I can think of at least three other ways to do this.
>
> 1. Fix Docker to use user namespaces and use the uid of the requesting
> process via SCM_CREDENTIALS.

Using user namespaces sounds like the right way to do it (at least
conceptually). But I think the hurdle here is that people are not yet
convinced that user namespaces are secure and work well. IOW, some
people don't seem to think that user namespaces are ready yet.

I guess that's the reason people are looking for other ways to achieve
their goal.

Thanks
Vivek
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Eric W. Biederman @ 2014-03-14 23:54 UTC (permalink / raw)
To: Vivek Goyal
Cc: Andy Lutomirski, Simo Sorce, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

Vivek Goyal <vgoyal@redhat.com> writes:
> On Wed, Mar 12, 2014 at 07:12:25PM -0700, Andy Lutomirski wrote:
>> I can think of at least three other ways to do this.
>>
>> 1. Fix Docker to use user namespaces and use the uid of the requesting
>> process via SCM_CREDENTIALS.
>
> Using user namespaces sounds like the right way to do it (at least
> conceptually). But I think the hurdle here is that people are not yet
> convinced that user namespaces are secure and work well. IOW, some
> people don't seem to think that user namespaces are ready yet.

If the problem is user namespace immaturity, patches or bug reports
need to be sent for user namespaces.

Containers with user namespaces (however immature they are) are much
more secure than running containers with uid == 0 processes inside of
them.  User namespaces considerably reduce the attack surface of what
uid == 0 can do.

> I guess that's the reason people are looking for other ways to
> achieve their goal.

It seems strange to work around a feature that is 99% of the way to
solving their problem with more kernel patches.

Eric
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-13 17:51 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
> I can think of at least three other ways to do this.
>
> 1. Fix Docker to use user namespaces and use the uid of the requesting
> process via SCM_CREDENTIALS.

This is not practical. I have no control over what UIDs will be used
within a container, and IIRC user namespaces have severe limitations
that may make them unusable in some situations. Forcing the use of user
namespaces on docker to satisfy my use case is not in my power.

> 2. Docker is a container system, so use the "container" (aka
> namespace) APIs.  There are probably several clever things that could
> be done with /proc/<pid>/ns.

pid is racy; if it weren't, I would simply go straight to
/proc/<pid>/cgroups ...

> 3. Given that Docker uses network namespaces, I assume that the socket
> connection between the two sssd instances either comes from Docker
> itself or uses socket inodes.  In either case, the same mechanism
> should be usable for authentication.

It is a unix socket, i.e. bind-mounted on the container filesystem, so
I am not sure network namespaces really come into the picture, and I do
not know of a race-free way of learning the namespace of the peer at
connect time.

Is there a SO_PEER_NAMESPACE option ?

Simo.
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-13 17:55 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 10:51 AM, Simo Sorce <ssorce@redhat.com> wrote:
> This is not practical. I have no control over what UIDs will be used
> within a container, and IIRC user namespaces have severe limitations
> that may make them unusable in some situations. Forcing the use of user
> namespaces on docker to satisfy my use case is not in my power.

Except that Docker w/o userns is basically completely insecure unless
selinux or apparmor is in use, so this may not matter.

> pid is racy; if it weren't, I would simply go straight to
> /proc/<pid>/cgroups ...

How about:

open("/proc/self/ns/ipc", O_RDONLY);
send the result over SCM_RIGHTS?

> It is a unix socket, i.e. bind-mounted on the container filesystem, so
> I am not sure network namespaces really come into the picture, and I do
> not know of a race-free way of learning the namespace of the peer at
> connect time.
>
> Is there a SO_PEER_NAMESPACE option ?

So give each container its own unix socket.  Problem solved, no?

--Andy
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-13 17:57 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
> How about:
>
> open("/proc/self/ns/ipc", O_RDONLY);
> send the result over SCM_RIGHTS?

This needs to work with existing clients; existing clients don't do
this.
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-13 18:03 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 10:57 AM, Simo Sorce <ssorce@redhat.com> wrote:
> This needs to work with existing clients; existing clients don't do
> this.

Wait... you want completely unmodified clients in a container to talk
to a service that they don't even realize is outside the container and
for that server to magically behave differently because the container
is there?  And there's no per-container proxy involved?  And every
container is connecting to *the very same socket*?

I just can't imagine this working well regardless of what magic socket
options you add, especially if user namespaces aren't in use.

--Andy
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-13 17:58 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
> So give each container its own unix socket.  Problem solved, no?

Not really practical if you have hundreds of containers.

Simo.
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-13 18:01 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 10:58 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
>> So give each container its own unix socket.  Problem solved, no?
>
> Not really practical if you have hundreds of containers.

I don't see the problem.  Sockets are cheap.

--
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 17:58 ` Simo Sorce
2014-03-13 18:01 ` Andy Lutomirski
@ 2014-03-13 18:05 ` Tim Hockin
1 sibling, 0 replies; 41+ messages in thread
From: Tim Hockin @ 2014-03-13 18:05 UTC (permalink / raw)
To: Simo Sorce
Cc: Andy Lutomirski, Vivek Goyal, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

I don't buy that it is not practical.  Not convenient, maybe.  Not
clean, sure.  But it is practical - it uses mechanisms that exist on
all kernels today.  That is a win, to me.

On Thu, Mar 13, 2014 at 10:58 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
>>
>> So give each container its own unix socket.  Problem solved, no?
>
> Not really practical if you have hundreds of containers.
>
> Simo.

^ permalink raw reply	[flat|nested] 41+ messages in thread
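For what it's worth, the per-container-socket arrangement being argued
about here would look roughly like this on the server side. The socket
path is invented for illustration; the point is that the container
identity falls out of which listener the connection arrived on, with
no peer introspection at all:

/* Sketch, with hypothetical paths: one listening unix socket per
 * container.  Whatever accept()s on this fd knows the peer's
 * container without asking the kernel about the peer. */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int listen_for_container(const char *container_id)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	snprintf(addr.sun_path, sizeof(addr.sun_path),
		 "/run/sssd/pipes/%s.sock", container_id);
	/* Only this container gets this socket bind-mounted into its
	 * filesystem, so any connection accepted here is known to
	 * come from that container. */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 16) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}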
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 17:55 ` Andy Lutomirski
[not found] ` <CALCETrXWzja6W6y=p9MrtynGZMsrD5KQu0KBK6Bs1dQxnesv2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-13 19:53 ` Vivek Goyal
2014-03-13 19:58 ` Andy Lutomirski
1 sibling, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 19:53 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Simo Sorce, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:

[..]
> >> 2. Docker is a container system, so use the "container" (aka
> >> namespace) APIs.  There are probably several clever things that could
> >> be done with /proc/<pid>/ns.
> >
> > pid is racy, if it weren't I would simply go straight
> > to /proc/<pid>/cgroups ...
>
> How about:
>
> open("/proc/self/ns/ipc", O_RDONLY);
> send the result over SCM_RIGHTS?

As I don't know, I will ask.  So what will the server now do with this
file descriptor of the client's ipc namespace?

IOW, what information/identifier does it contain which can be used to
map to pre-configured per-container/per-namespace policies?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 19:53 ` Vivek Goyal
@ 2014-03-13 19:58 ` Andy Lutomirski
2014-03-13 20:06 ` Vivek Goyal
0 siblings, 1 reply; 41+ messages in thread
From: Andy Lutomirski @ 2014-03-13 19:58 UTC (permalink / raw)
To: Vivek Goyal
Cc: Simo Sorce, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>
> [..]
>> >> 2. Docker is a container system, so use the "container" (aka
>> >> namespace) APIs.  There are probably several clever things that could
>> >> be done with /proc/<pid>/ns.
>> >
>> > pid is racy, if it weren't I would simply go straight
>> > to /proc/<pid>/cgroups ...
>>
>> How about:
>>
>> open("/proc/self/ns/ipc", O_RDONLY);
>> send the result over SCM_RIGHTS?
>
> As I don't know, I will ask.  So what will the server now do with this
> file descriptor of the client's ipc namespace?
>
> IOW, what information/identifier does it contain which can be used to
> map to pre-configured per-container/per-namespace policies?

Inode number, which will match that assigned to the container at runtime.

(I'm not sure this is a great idea -- there's no convention that "I
have an fd for a namespace" means "I'm a daemon in that namespace".)

--Andy

>
> Thanks
> Vivek

--
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 19:58 ` Andy Lutomirski
@ 2014-03-13 20:06 ` Vivek Goyal
[not found] ` <20140313200649.GN18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 20:06 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Simo Sorce, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> >
> > [..]
> >> >> 2. Docker is a container system, so use the "container" (aka
> >> >> namespace) APIs.  There are probably several clever things that could
> >> >> be done with /proc/<pid>/ns.
> >> >
> >> > pid is racy, if it weren't I would simply go straight
> >> > to /proc/<pid>/cgroups ...
> >>
> >> How about:
> >>
> >> open("/proc/self/ns/ipc", O_RDONLY);
> >> send the result over SCM_RIGHTS?
> >
> > As I don't know, I will ask.  So what will the server now do with this
> > file descriptor of the client's ipc namespace?
> >
> > IOW, what information/identifier does it contain which can be used to
> > map to pre-configured per-container/per-namespace policies?
>
> Inode number, which will match that assigned to the container at runtime.

But what would I do with this inode number?  I am assuming this is
generated dynamically when the respective namespace was created.  To me
this is like assigning a pid dynamically, and one does not create
policies in user space based on pid.  Similarly I will not be able
to create policies based on an inode number which is generated
dynamically.

For it to be useful, it should map to something more static which
user space understands.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313200649.GN18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-13 20:17 ` Vivek Goyal
[not found] ` <20140313201755.GO18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-13 21:21 ` Andy Lutomirski
2014-03-14 23:49 ` Eric W. Biederman
2 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 20:17 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Simo Sorce, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, Mar 13, 2014 at 04:06:49PM -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> > On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> > >
> > > [..]
> > >> >> 2. Docker is a container system, so use the "container" (aka
> > >> >> namespace) APIs.  There are probably several clever things that could
> > >> >> be done with /proc/<pid>/ns.
> > >> >
> > >> > pid is racy, if it weren't I would simply go straight
> > >> > to /proc/<pid>/cgroups ...
> > >>
> > >> How about:
> > >>
> > >> open("/proc/self/ns/ipc", O_RDONLY);
> > >> send the result over SCM_RIGHTS?
> > >
> > > As I don't know, I will ask.  So what will the server now do with this
> > > file descriptor of the client's ipc namespace?
> > >
> > > IOW, what information/identifier does it contain which can be
> > > used to map to pre-configured per-container/per-namespace policies?
> >
> > Inode number, which will match that assigned to the container at runtime.
> >
>
> But what would I do with this inode number?  I am assuming this is
> generated dynamically when the respective namespace was created.  To me
> this is like assigning a pid dynamically, and one does not create
> policies in user space based on pid.  Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
>
> For it to be useful, it should map to something more static which
> user space understands.

Or could we do the following:

open("/proc/self/cgroup", O_RDONLY);
send the result over SCM_RIGHTS

But this requires client modification.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313201755.GO18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-13 20:19 ` Vivek Goyal
0 siblings, 0 replies; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 20:19 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Simo Sorce, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, Mar 13, 2014 at 04:17:55PM -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 04:06:49PM -0400, Vivek Goyal wrote:
> > On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> > > On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> > > >
> > > > [..]
> > > >> >> 2. Docker is a container system, so use the "container" (aka
> > > >> >> namespace) APIs.  There are probably several clever things that could
> > > >> >> be done with /proc/<pid>/ns.
> > > >> >
> > > >> > pid is racy, if it weren't I would simply go straight
> > > >> > to /proc/<pid>/cgroups ...
> > > >>
> > > >> How about:
> > > >>
> > > >> open("/proc/self/ns/ipc", O_RDONLY);
> > > >> send the result over SCM_RIGHTS?
> > > >
> > > > As I don't know, I will ask.  So what will the server now do with this
> > > > file descriptor of the client's ipc namespace?
> > > >
> > > > IOW, what information/identifier does it contain which can be
> > > > used to map to pre-configured per-container/per-namespace policies?
> > >
> > > Inode number, which will match that assigned to the container at runtime.
> > >
> >
> > But what would I do with this inode number?  I am assuming this is
> > generated dynamically when the respective namespace was created.  To me
> > this is like assigning a pid dynamically, and one does not create
> > policies in user space based on pid.  Similarly I will not be able
> > to create policies based on an inode number which is generated
> > dynamically.
> >
> > For it to be useful, it should map to something more static which
> > user space understands.
>
> Or could we do the following:
>
> open("/proc/self/cgroup", O_RDONLY);
> send the result over SCM_RIGHTS

I guess that would not work.  A client could create a file, fake the
cgroup information, and send that fd.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313200649.GN18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-13 20:17 ` Vivek Goyal
@ 2014-03-13 21:21 ` Andy Lutomirski
2014-03-14 23:49 ` Eric W. Biederman
2 siblings, 0 replies; 41+ messages in thread
From: Andy Lutomirski @ 2014-03-13 21:21 UTC (permalink / raw)
To: Vivek Goyal
Cc: Simo Sorce, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, Mar 13, 2014 at 1:06 PM, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
>> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>> >
>> > [..]
>> >> >> 2. Docker is a container system, so use the "container" (aka
>> >> >> namespace) APIs.  There are probably several clever things that could
>> >> >> be done with /proc/<pid>/ns.
>> >> >
>> >> > pid is racy, if it weren't I would simply go straight
>> >> > to /proc/<pid>/cgroups ...
>> >>
>> >> How about:
>> >>
>> >> open("/proc/self/ns/ipc", O_RDONLY);
>> >> send the result over SCM_RIGHTS?
>> >
>> > As I don't know, I will ask.  So what will the server now do with this
>> > file descriptor of the client's ipc namespace?
>> >
>> > IOW, what information/identifier does it contain which can be
>> > used to map to pre-configured per-container/per-namespace policies?
>>
>> Inode number, which will match that assigned to the container at runtime.
>>
>
> But what would I do with this inode number?  I am assuming this is
> generated dynamically when the respective namespace was created.  To me
> this is like assigning a pid dynamically, and one does not create
> policies in user space based on pid.  Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
>
> For it to be useful, it should map to something more static which
> user space understands.

Like what?  I imagine that, at best, sssd will be hardcoding some
understanding of Docker's cgroup names.  As an alternative, it could
ask Docker for a uid or an inode number of something else -- it's
hardcoding an understanding of Docker anyway.  And Docker needs to
cooperate regardless, since otherwise it could change its cgroup
naming or stop using cgroups entirely.

--Andy

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313200649.GN18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-13 20:17 ` Vivek Goyal
2014-03-13 21:21 ` Andy Lutomirski
@ 2014-03-14 23:49 ` Eric W. Biederman
2 siblings, 0 replies; 41+ messages in thread
From: Eric W. Biederman @ 2014-03-14 23:49 UTC (permalink / raw)
To: Vivek Goyal
Cc: Andy Lutomirski, Simo Sorce, linux-kernel@vger.kernel.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
>> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>> >
>> > [..]
>> >> >> 2. Docker is a container system, so use the "container" (aka
>> >> >> namespace) APIs.  There are probably several clever things that could
>> >> >> be done with /proc/<pid>/ns.
>> >> >
>> >> > pid is racy, if it weren't I would simply go straight
>> >> > to /proc/<pid>/cgroups ...
>> >>
>> >> How about:
>> >>
>> >> open("/proc/self/ns/ipc", O_RDONLY);
>> >> send the result over SCM_RIGHTS?
>> >
>> > As I don't know, I will ask.  So what will the server now do with this
>> > file descriptor of the client's ipc namespace?
>> >
>> > IOW, what information/identifier does it contain which can be
>> > used to map to pre-configured per-container/per-namespace policies?
>>
>> Inode number, which will match that assigned to the container at runtime.
>>
>
> But what would I do with this inode number?  I am assuming this is
> generated dynamically when the respective namespace was created.  To me
> this is like assigning a pid dynamically, and one does not create
> policies in user space based on pid.  Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
>
> For it to be useful, it should map to something more static which
> user space understands.

But the mapping can be done in userspace.  stat all of the namespaces
you care about, get their inode numbers, and then do a lookup.

Hard-coding string-based names in the kernel the way cgroups does is
really pretty terrible; it seriously limits the flexibility of the API,
and so far breaks nested containers.

Eric

^ permalink raw reply	[flat|nested] 41+ messages in thread
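A sketch of the userspace lookup Eric describes, assuming the server
received a namespace fd over SCM_RIGHTS as in the earlier sketch.
find_container_by_ns_ino() is a hypothetical table the container
manager would fill in when it launches each container:

#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical: maps a namespace inode recorded at container launch
 * (e.g. from a stat of /proc/<init-pid>/ns/ipc) to a container name. */
const char *find_container_by_ns_ino(ino_t ino);

static const char *container_for_ns_fd(int nsfd)
{
	struct stat st;

	if (fstat(nsfd, &st) < 0)
		return NULL;
	/* st_ino identifies the namespace instance for as long as the
	 * fd is held open, so this does not race with the client
	 * exiting afterwards. */
	return find_container_by_ns_ino(st.st_ino);
}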
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 17:51 ` Simo Sorce
2014-03-13 17:55 ` Andy Lutomirski
@ 2014-03-13 18:02 ` Vivek Goyal
1 sibling, 0 replies; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 18:02 UTC (permalink / raw)
To: Simo Sorce
Cc: Andy Lutomirski, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 01:51:17PM -0400, Simo Sorce wrote:

[..]
> > 1. Fix Docker to use user namespaces and use the uid of the requesting
> > process via SCM_CREDENTIALS.
>
> This is not practical, I have no control over what UIDs will be used
> within a container,

I guess the uid-to-container mapping has to be managed by somebody, say
systemd.  Then systemd should export an API to query the container a
uid is mapped into.  So that should not be the real problem.

> and IIRC user namespaces have severe limitations
> that may make them unusable in some situations. Forcing the use of user
> namespaces on docker to satisfy my use case is not in my power.

I think that's the real practical problem: adoption of user namespaces.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-12 21:12 ` Andy Lutomirski
[not found] ` <CALCETrUTjN=XKwnO62P9roZtLLuk7_9Oi17Mj_Aa2ZFtZc1gVA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-13 14:14 ` Vivek Goyal
[not found] ` <20140313141422.GB18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
1 sibling, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 14:14 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel@vger.kernel.org, cgroups, Network Development,
David S. Miller, Tejun Heo, ssorce, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 02:12:33PM -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> >> cgroup of first mounted hierarchy of the task. For the case of client,
> >> it represents the cgroup of client at the time of opening the connection.
> >> After that client cgroup might change.
> >
> > Even if people decide that sending cgroups over a unix socket is a good
> > idea, this API has my NAK in the strongest possible sense, for whatever
> > my NAK is worth.
> >
> > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> > *never* imply the use of a credential.  A program should always have to
> > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> >
> > (I've found privilege escalations before based on this observation, and
> > I suspect I'll find them again.)
> >
> >
> > Note that I think that you really want SCM_SOMETHING_ELSE and not
> > SCM_CGROUP, but I don't know what the use case is yet.
>
> This might not be quite as awful as I thought.  At least you're
> looking up the cgroup at connection time instead of at send time.
>
> OTOH, this is still racy -- the socket could easily outlive the cgroup
> that created it.

That's a good point. What guarantees that previous cgroup was not
reassigned to a different container.

What if a process A opens the connection with sssd. Process A passes the
file descriptor to a different process B in a different container.

Process A exits. Container gets removed from system and new one gets
launched which uses same cgroup as old one. Now process B sends a new
request and SSSD will serve it based on policy of newly launched
container.

This sounds very similar to pid race where socket/connection will outlive
the pid.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313141422.GB18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-13 14:55 ` Simo Sorce
[not found] ` <1394722534.32465.227.camel-Hs+ccMQdwurzDu64bZtGtWD2FQJk+8+b@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Simo Sorce @ 2014-03-13 14:55 UTC (permalink / raw)
To: Vivek Goyal
Cc: Andy Lutomirski, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, 2014-03-13 at 10:14 -0400, Vivek Goyal wrote:
> On Wed, Mar 12, 2014 at 02:12:33PM -0700, Andy Lutomirski wrote:
> > On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> > > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> > >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> > >> cgroup of first mounted hierarchy of the task. For the case of client,
> > >> it represents the cgroup of client at the time of opening the connection.
> > >> After that client cgroup might change.
> > >
> > > Even if people decide that sending cgroups over a unix socket is a good
> > > idea, this API has my NAK in the strongest possible sense, for whatever
> > > my NAK is worth.
> > >
> > > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> > > *never* imply the use of a credential.  A program should always have to
> > > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> > >
> > > (I've found privilege escalations before based on this observation, and
> > > I suspect I'll find them again.)
> > >
> > >
> > > Note that I think that you really want SCM_SOMETHING_ELSE and not
> > > SCM_CGROUP, but I don't know what the use case is yet.
> >
> > This might not be quite as awful as I thought.  At least you're
> > looking up the cgroup at connection time instead of at send time.
> >
> > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > that created it.
>
> That's a good point. What guarantees that previous cgroup was not
> reassigned to a different container.
>
> What if a process A opens the connection with sssd. Process A passes the
> file descriptor to a different process B in a different container.

Stop right here.
If the process passes the fd it is not my problem anymore.
The process can as well just 'proxy' all the information to another
process.

We just care to properly identify the 'original' container, we are not
in the business of detecting malicious behavior. That's something other
mechanisms need to protect against (SELinux or other LSMs, normal
permissions, capabilities, etc...).

> Process A exits. Container gets removed from system and new one gets
> launched which uses same cgroup as old one. Now process B sends a new
> request and SSSD will serve it based on policy of newly launched
> container.
>
> This sounds very similar to pid race where socket/connection will outlive
> the pid.

Nope, completely different.

Simo.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <1394722534.32465.227.camel-Hs+ccMQdwurzDu64bZtGtWD2FQJk+8+b@public.gmane.org>
@ 2014-03-13 15:00 ` Vivek Goyal
[not found] ` <20140313150034.GG18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 15:00 UTC (permalink / raw)
To: Simo Sorce
Cc: Andy Lutomirski, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:

[..]
> > > This might not be quite as awful as I thought.  At least you're
> > > looking up the cgroup at connection time instead of at send time.
> > >
> > > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > > that created it.
> >
> > That's a good point. What guarantees that previous cgroup was not
> > reassigned to a different container.
> >
> > What if a process A opens the connection with sssd. Process A passes the
> > file descriptor to a different process B in a different container.
>
> Stop right here.
> If the process passes the fd it is not my problem anymore.
> The process can as well just 'proxy' all the information to another
> process.
>
> We just care to properly identify the 'original' container, we are not
> in the business of detecting malicious behavior. That's something other
> mechanisms need to protect against (SELinux or other LSMs, normal
> permissions, capabilities, etc...).
>
> > Process A exits. Container gets removed from system and new one gets
> > launched which uses same cgroup as old one. Now process B sends a new
> > request and SSSD will serve it based on policy of newly launched
> > container.
> >
> > This sounds very similar to pid race where socket/connection will outlive
> > the pid.
>
> Nope, completely different.
>

I think you missed my point.  Passing the file descriptor is not the
problem.  The problem is reuse of the same cgroup name for a different
container while the socket lives on.  And it is the same race as reuse
of a pid for a different process.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313150034.GG18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-13 16:33 ` Simo Sorce
2014-03-13 17:25 ` Andy Lutomirski
0 siblings, 1 reply; 41+ messages in thread
From: Simo Sorce @ 2014-03-13 16:33 UTC (permalink / raw)
To: Vivek Goyal
Cc: Andy Lutomirski, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
>
> [..]
> > > > This might not be quite as awful as I thought.  At least you're
> > > > looking up the cgroup at connection time instead of at send time.
> > > >
> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > > > that created it.
> > >
> > > That's a good point. What guarantees that previous cgroup was not
> > > reassigned to a different container.
> > >
> > > What if a process A opens the connection with sssd. Process A passes the
> > > file descriptor to a different process B in a different container.
> >
> > Stop right here.
> > If the process passes the fd it is not my problem anymore.
> > The process can as well just 'proxy' all the information to another
> > process.
> >
> > We just care to properly identify the 'original' container, we are not
> > in the business of detecting malicious behavior. That's something other
> > mechanisms need to protect against (SELinux or other LSMs, normal
> > permissions, capabilities, etc...).
> >
> > > Process A exits. Container gets removed from system and new one gets
> > > launched which uses same cgroup as old one. Now process B sends a new
> > > request and SSSD will serve it based on policy of newly launched
> > > container.
> > >
> > > This sounds very similar to pid race where socket/connection will outlive
> > > the pid.
> >
> > Nope, completely different.
> >
>
> I think you missed my point.  Passing the file descriptor is not the
> problem.  The problem is reuse of the same cgroup name for a different
> container while the socket lives on.  And it is the same race as reuse
> of a pid for a different process.

The cgroup name should not be reused, of course; if userspace does
that, it is userspace's issue.  cgroup names are not a constrained
namespace like pids, which force the kernel to reuse them for processes
of a different nature.

Simo.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 16:33 ` Simo Sorce
@ 2014-03-13 17:25 ` Andy Lutomirski
2014-03-13 17:55 ` Simo Sorce
2014-03-13 17:56 ` Tim Hockin
0 siblings, 2 replies; 41+ messages in thread
From: Andy Lutomirski @ 2014-03-13 17:25 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
>> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
>>
>> [..]
>> > > > This might not be quite as awful as I thought.  At least you're
>> > > > looking up the cgroup at connection time instead of at send time.
>> > > >
>> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
>> > > > that created it.
>> > >
>> > > That's a good point. What guarantees that previous cgroup was not
>> > > reassigned to a different container.
>> > >
>> > > What if a process A opens the connection with sssd. Process A passes the
>> > > file descriptor to a different process B in a different container.
>> >
>> > Stop right here.
>> > If the process passes the fd it is not my problem anymore.
>> > The process can as well just 'proxy' all the information to another
>> > process.
>> >
>> > We just care to properly identify the 'original' container, we are not
>> > in the business of detecting malicious behavior. That's something other
>> > mechanisms need to protect against (SELinux or other LSMs, normal
>> > permissions, capabilities, etc...).
>> >
>> > > Process A exits. Container gets removed from system and new one gets
>> > > launched which uses same cgroup as old one. Now process B sends a new
>> > > request and SSSD will serve it based on policy of newly launched
>> > > container.
>> > >
>> > > This sounds very similar to pid race where socket/connection will outlive
>> > > the pid.
>> >
>> > Nope, completely different.
>> >
>>
>> I think you missed my point.  Passing the file descriptor is not the
>> problem.  The problem is reuse of the same cgroup name for a different
>> container while the socket lives on.  And it is the same race as reuse
>> of a pid for a different process.
>
> The cgroup name should not be reused, of course; if userspace does
> that, it is userspace's issue.  cgroup names are not a constrained
> namespace like pids, which force the kernel to reuse them for processes
> of a different nature.
>

You're proposing a feature that will enshrine cgroups into the API used
by non-cgroup-controlling applications.  I don't think that anyone
thinks that cgroups are pretty, so this is an unfortunate thing to
have to do.

I've suggested three different ways that your goal could be achieved
without using cgroups at all.  You haven't really addressed any of
them.

In order for something like this to go into the kernel, I would expect
a real use case and a justification for why this is the right way to
do it.

"Docker containers can be identified by cgroup path" is completely
unconvincing to me.

--Andy

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 17:25 ` Andy Lutomirski
@ 2014-03-13 17:55 ` Simo Sorce
2014-03-13 17:56 ` Tim Hockin
1 sibling, 0 replies; 41+ messages in thread
From: Simo Sorce @ 2014-03-13 17:55 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, 2014-03-13 at 10:25 -0700, Andy Lutomirski wrote:
> On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
> >> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
> >>
> >> [..]
> >> > > > This might not be quite as awful as I thought.  At least you're
> >> > > > looking up the cgroup at connection time instead of at send time.
> >> > > >
> >> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
> >> > > > that created it.
> >> > >
> >> > > That's a good point. What guarantees that previous cgroup was not
> >> > > reassigned to a different container.
> >> > >
> >> > > What if a process A opens the connection with sssd. Process A passes the
> >> > > file descriptor to a different process B in a different container.
> >> >
> >> > Stop right here.
> >> > If the process passes the fd it is not my problem anymore.
> >> > The process can as well just 'proxy' all the information to another
> >> > process.
> >> >
> >> > We just care to properly identify the 'original' container, we are not
> >> > in the business of detecting malicious behavior. That's something other
> >> > mechanisms need to protect against (SELinux or other LSMs, normal
> >> > permissions, capabilities, etc...).
> >> >
> >> > > Process A exits. Container gets removed from system and new one gets
> >> > > launched which uses same cgroup as old one. Now process B sends a new
> >> > > request and SSSD will serve it based on policy of newly launched
> >> > > container.
> >> > >
> >> > > This sounds very similar to pid race where socket/connection will outlive
> >> > > the pid.
> >> >
> >> > Nope, completely different.
> >> >
> >>
> >> I think you missed my point.  Passing the file descriptor is not the
> >> problem.  The problem is reuse of the same cgroup name for a different
> >> container while the socket lives on.  And it is the same race as reuse
> >> of a pid for a different process.
> >
> > The cgroup name should not be reused, of course; if userspace does
> > that, it is userspace's issue.  cgroup names are not a constrained
> > namespace like pids, which force the kernel to reuse them for processes
> > of a different nature.
> >
>
> You're proposing a feature that will enshrine cgroups into the API used
> by non-cgroup-controlling applications.  I don't think that anyone
> thinks that cgroups are pretty, so this is an unfortunate thing to
> have to do.
>
> I've suggested three different ways that your goal could be achieved
> without using cgroups at all.  You haven't really addressed any of
> them.

I replied now; none of them strike me as practical or something that
can be enforced.

> In order for something like this to go into the kernel, I would expect
> a real use case and a justification for why this is the right way to
> do it.

I think my justification is quite real, the fact you do not like it
does not really make it any less real.  I am open to suggestions on
alternative methods of course, I do not care which way as long as it is
practical and does not cause unreasonable restrictions on the
containerization.

As far as I could see all of the container stuff uses cgroups already
for various reasons, so using cgroups seems natural.

> "Docker containers can be identified by cgroup path" is completely
> unconvincing to me.

Provide an alternative; so far there is a cgroup with a unique name
associated with every container.  I haven't found any other way to
derive that information in a race-free way so far.

Simo.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 17:25 ` Andy Lutomirski
2014-03-13 17:55 ` Simo Sorce
@ 2014-03-13 17:56 ` Tim Hockin
1 sibling, 0 replies; 41+ messages in thread
From: Tim Hockin @ 2014-03-13 17:56 UTC (permalink / raw)
To: Simo Sorce
Cc: Andy Lutomirski, Vivek Goyal, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

In some sense a cgroup is a pgrp that mere mortals can't escape.  Why
not just do something like that?  root can set this "container id" or
"job id" on your process when it first starts (e.g. docker sets it on
your container process), or even make a cgroup that sets this for all
processes in that cgroup.  ints are better than strings anyway.

On Thu, Mar 13, 2014 at 10:25 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
>>> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
>>>
>>> [..]
>>> > > > This might not be quite as awful as I thought.  At least you're
>>> > > > looking up the cgroup at connection time instead of at send time.
>>> > > >
>>> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
>>> > > > that created it.
>>> > >
>>> > > That's a good point. What guarantees that previous cgroup was not
>>> > > reassigned to a different container.
>>> > >
>>> > > What if a process A opens the connection with sssd. Process A passes the
>>> > > file descriptor to a different process B in a different container.
>>> >
>>> > Stop right here.
>>> > If the process passes the fd it is not my problem anymore.
>>> > The process can as well just 'proxy' all the information to another
>>> > process.
>>> >
>>> > We just care to properly identify the 'original' container, we are not
>>> > in the business of detecting malicious behavior. That's something other
>>> > mechanisms need to protect against (SELinux or other LSMs, normal
>>> > permissions, capabilities, etc...).
>>> >
>>> > > Process A exits. Container gets removed from system and new one gets
>>> > > launched which uses same cgroup as old one. Now process B sends a new
>>> > > request and SSSD will serve it based on policy of newly launched
>>> > > container.
>>> > >
>>> > > This sounds very similar to pid race where socket/connection will outlive
>>> > > the pid.
>>> >
>>> > Nope, completely different.
>>> >
>>>
>>> I think you missed my point.  Passing the file descriptor is not the
>>> problem.  The problem is reuse of the same cgroup name for a different
>>> container while the socket lives on.  And it is the same race as reuse
>>> of a pid for a different process.
>>
>> The cgroup name should not be reused, of course; if userspace does
>> that, it is userspace's issue.  cgroup names are not a constrained
>> namespace like pids, which force the kernel to reuse them for processes
>> of a different nature.
>>
>
> You're proposing a feature that will enshrine cgroups into the API used
> by non-cgroup-controlling applications.  I don't think that anyone
> thinks that cgroups are pretty, so this is an unfortunate thing to
> have to do.
>
> I've suggested three different ways that your goal could be achieved
> without using cgroups at all.  You haven't really addressed any of
> them.
>
> In order for something like this to go into the kernel, I would expect
> a real use case and a justification for why this is the right way to
> do it.
>
> "Docker containers can be identified by cgroup path" is completely
> unconvincing to me.
>
> --Andy

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer
[not found] ` <1394657163-7472-1-git-send-email-vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-12 20:56 ` Andy Lutomirski
2014-03-12 20:59 ` Simo Sorce
0 siblings, 1 reply; 41+ messages in thread
From: Andy Lutomirski @ 2014-03-12 20:56 UTC (permalink / raw)
To: Vivek Goyal, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
davem-fT/PcQaiUtIeIZ0/mPfg9Q, tj-DgEjT+Ai2ygdnm+yROfE0A
Cc: ssorce-H+wXaHxf7aLQT0dZR+AlfA, jkaluza-H+wXaHxf7aLQT0dZR+AlfA,
lpoetter-H+wXaHxf7aLQT0dZR+AlfA, kay-H+wXaHxf7aLQT0dZR+AlfA

On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> Hi,
>
> This is V2 of patches. Fixed the function format issue and also I was using
> CONFIG_CGROUP instead of CONFIG_CGROUPS. That led to crash at boot. Fixed that.
>
> Some applications like sssd want to know the cgroup of connected peer over
> unix stream socket. They want to use this information to map the cgroup to
> the container client belongs to and then decide what kind of policies apply
> on the container.
>

Can you explain what the use case is?

My a priori opinion is that this is a terrible idea.  cgroups are a
nasty interface, and letting knowledge of cgroups leak into the programs
that live in the groups (as opposed to the cgroup manager) seems like a
huge mistake to me.

If you want to know where in the process hierarchy a message sender is,
add *that* and figure out how to fix the races (it shouldn't be that
hard).

--Andy

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer
2014-03-12 20:56 ` [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer Andy Lutomirski
@ 2014-03-12 20:59 ` Simo Sorce
[not found] ` <1394657970.32465.200.camel-Hs+ccMQdwurzDu64bZtGtWD2FQJk+8+b@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Simo Sorce @ 2014-03-12 20:59 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel, cgroups, netdev, davem, tj,
jkaluza, lpoetter, kay

On Wed, 2014-03-12 at 13:56 -0700, Andy Lutomirski wrote:
> On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> > Hi,
> >
> > This is V2 of patches. Fixed the function format issue and also I was using
> > CONFIG_CGROUP instead of CONFIG_CGROUPS. That led to crash at boot. Fixed that.
> >
> > Some applications like sssd want to know the cgroup of connected peer over
> > unix stream socket. They want to use this information to map the cgroup to
> > the container client belongs to and then decide what kind of policies apply
> > on the container.
> >
>
> Can you explain what the use case is?

External programs contacted from inside a container want to know 'who'
is contacting them.  Where 'who' is determined by the cgroup they're
put in.  This way these external programs can apply appropriate policy
associated with the specific 'marking' cgroup.

> My a priori opinion is that this is a terrible idea.  cgroups are a
> nasty interface, and letting knowledge of cgroups leak into the programs
> that live in the groups (as opposed to the cgroup manager) seems like a
> huge mistake to me.

I am not sure where you are going; the program that wants to know about
the cgroup is outside the group.

> If you want to know where in the process hierarchy a message sender is,
> add *that* and figure out how to fix the races (it shouldn't be that hard).

What is *that* here?

Simo.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer
[not found] ` <1394657970.32465.200.camel-Hs+ccMQdwurzDu64bZtGtWD2FQJk+8+b@public.gmane.org>
@ 2014-03-12 21:09 ` Andy Lutomirski
0 siblings, 0 replies; 41+ messages in thread
From: Andy Lutomirski @ 2014-03-12 21:09 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Wed, Mar 12, 2014 at 1:59 PM, Simo Sorce <ssorce-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Wed, 2014-03-12 at 13:56 -0700, Andy Lutomirski wrote:
>> On 03/12/2014 01:46 PM, Vivek Goyal wrote:
>> > Hi,
>> >
>> > This is V2 of patches. Fixed the function format issue and also I was using
>> > CONFIG_CGROUP instead of CONFIG_CGROUPS. That led to crash at boot. Fixed that.
>> >
>> > Some applications like sssd want to know the cgroup of connected peer over
>> > unix stream socket. They want to use this information to map the cgroup to
>> > the container client belongs to and then decide what kind of policies apply
>> > on the container.
>> >
>>
>> Can you explain what the use case is?
>
> External programs contacted from inside a container want to know 'who'
> is contacting them.  Where 'who' is determined by the cgroup they're
> put in.  This way these external programs can apply appropriate policy
> associated with the specific 'marking' cgroup.
>
>> My a priori opinion is that this is a terrible idea.  cgroups are a
>> nasty interface, and letting knowledge of cgroups leak into the programs
>> that live in the groups (as opposed to the cgroup manager) seems like a
>> huge mistake to me.
>
> I am not sure where you are going; the program that wants to know about
> the cgroup is outside the group.
>
>> If you want to know where in the process hierarchy a message sender is,
>> add *that* and figure out how to fix the races (it shouldn't be that hard).
>
> What is *that* here?

It sounds like your use case is: systemd shoves a service in a cgroup.
Its children stay in that cgroup.  One of those children sends a
message back to systemd or something that knows about systemd's use of
cgroups and wants to identify which service it is.

Now imagine that you're using a non-systemd cgroup controller, or you
have more than one cgroup hierarchy, or you have two services that want
to share a cgroup.  Or imagine that you're totally happy with systemd
but that you want to use this same facility from something
unprivileged.

So let's rethink this.  There's already SCM_CREDENTIALS for sending
pid, but using pid there is inherently racy.  If that race were fixed
and there were a clean way to look up which process subtree or service
a pid lives in, then I think this would solve your problem.  No cgroups
needed.

--Andy

>
> Simo.
>

--
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 41+ messages in thread
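For reference, the racy pattern the thread keeps coming back to,
sketched in userspace: fetch the peer pid with SO_PEERCRED, then read
/proc/<pid>/cgroup.  Between the two steps the peer can exit and the
pid can be recycled, which is exactly the window SO_PEERCGROUP is
meant to close.  Sketch only, error handling abbreviated:

#define _GNU_SOURCE	/* for struct ucred */
#include <stdio.h>
#include <sys/socket.h>

static int peer_cgroup_racy(int sock, char *buf, size_t buflen)
{
	struct ucred cred;
	socklen_t clen = sizeof(cred);
	char path[64];
	FILE *f;

	if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &cred, &clen) < 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/%d/cgroup", (int)cred.pid);
	f = fopen(path, "r");	/* racy: cred.pid may already be reused */
	if (!f)
		return -1;
	if (!fgets(buf, buflen, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}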
* [PATCH 0/2] net: Implement SO_PEERCGROUP to get cgroup of peer
@ 2014-03-12 18:45 Vivek Goyal
2014-03-12 18:45 ` [PATCH 2/2] net: Implement SO_PEERCGROUP Vivek Goyal
0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-12 18:45 UTC (permalink / raw)
To: linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay, Vivek Goyal

Some applications like sssd want to know the cgroup of connected peer over
unix stream socket. They want to use this information to map the cgroup to
the container client belongs to and then decide what kind of policies apply
on the container.

Well why not use SO_PEERCRED, extract pid from it and lookup in
/proc/pid/cgroup to figure out cgroup of client. Problem there is that it
is racy. By the time we look up in /proc, it might happen that client
exited (possibly after handing over socket fd to a child), and client pid
can possibly be assigned to another process. That's the reason people are
looking for more reliable mechanism.

There are others like journald who want similar information over unix
datagram sockets. A patchset to provide that functionality was posted
here.

https://lkml.org/lkml/2014/1/13/43

But this was rejected because of overhead it will cause for rest of the
cases.

https://lkml.org/lkml/2014/1/15/480

This patch series implements SO_PEERCGROUP, which is more connection
based and gives the cgroup of client at the time of opening the
connection. So overhead is involved only during connection setup and
there should not be any overhead after that.

So it does not solve all the use cases out there but can solve the needs
of sssd. Hence I am posting this patch.

Please consider it for inclusion.

Thanks
Vivek

Vivek Goyal (2):
  cgroup: Provide empty definition of task_cgroup_path()
  net: Implement SO_PEERCGROUP

 arch/alpha/include/uapi/asm/socket.h   |  1 +
 arch/avr32/include/uapi/asm/socket.h   |  1 +
 arch/cris/include/uapi/asm/socket.h    |  2 ++
 arch/frv/include/uapi/asm/socket.h     |  1 +
 arch/ia64/include/uapi/asm/socket.h    |  2 ++
 arch/m32r/include/uapi/asm/socket.h    |  1 +
 arch/mips/include/uapi/asm/socket.h    |  1 +
 arch/mn10300/include/uapi/asm/socket.h |  1 +
 arch/parisc/include/uapi/asm/socket.h  |  1 +
 arch/powerpc/include/uapi/asm/socket.h |  1 +
 arch/s390/include/uapi/asm/socket.h    |  1 +
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  1 +
 include/linux/cgroup.h                 |  2 ++
 include/net/sock.h                     |  1 +
 include/uapi/asm-generic/socket.h      |  2 ++
 net/core/sock.c                        | 19 ++++++++++++++
 net/unix/af_unix.c                     | 48 ++++++++++++++++++++++++++++++++++
 18 files changed, 88 insertions(+)

--
1.8.5.3

^ permalink raw reply	[flat|nested] 41+ messages in thread
* [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-12 18:45 [PATCH 0/2] " Vivek Goyal
@ 2014-03-12 18:45 ` Vivek Goyal
0 siblings, 0 replies; 41+ messages in thread
From: Vivek Goyal @ 2014-03-12 18:45 UTC (permalink / raw)
To: linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay, Vivek Goyal

Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
cgroup of first mounted hierarchy of the task. For the case of client,
it represents the cgroup of client at the time of opening the connection.
After that client cgroup might change.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/alpha/include/uapi/asm/socket.h   |  1 +
 arch/avr32/include/uapi/asm/socket.h   |  1 +
 arch/cris/include/uapi/asm/socket.h    |  2 ++
 arch/frv/include/uapi/asm/socket.h     |  1 +
 arch/ia64/include/uapi/asm/socket.h    |  2 ++
 arch/m32r/include/uapi/asm/socket.h    |  1 +
 arch/mips/include/uapi/asm/socket.h    |  1 +
 arch/mn10300/include/uapi/asm/socket.h |  1 +
 arch/parisc/include/uapi/asm/socket.h  |  1 +
 arch/powerpc/include/uapi/asm/socket.h |  1 +
 arch/s390/include/uapi/asm/socket.h    |  1 +
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  1 +
 include/net/sock.h                     |  1 +
 include/uapi/asm-generic/socket.h      |  2 ++
 net/core/sock.c                        | 19 ++++++++++++++
 net/unix/af_unix.c                     | 48 ++++++++++++++++++++++++++++++++++
 17 files changed, 86 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 3de1394..7178353 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -87,4 +87,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 6e6cd15..486212b 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/uapi/asm/socket.h b/arch/cris/include/uapi/asm/socket.h
index ed94e5e..89a09e3 100644
--- a/arch/cris/include/uapi/asm/socket.h
+++ b/arch/cris/include/uapi/asm/socket.h
@@ -82,6 +82,8 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* _ASM_SOCKET_H */
 
 
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index ca2c6e6..c4d90bc 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -80,5 +80,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index a1b49ba..62c196d 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 6c9a24b..6e04a7d 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index a14baa2..cfbd84b 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -98,4 +98,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 6aa3ce1..73467fe 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index fe35cea..24d8913 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -79,4 +79,5 @@
 
 #define SO_BPF_EXTENSIONS	0x4029
 
+#define SO_PEERCGROUP		0x402a
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index a9c3e2e..50106be 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -87,4 +87,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index e031332..4ae2f3c 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -86,4 +86,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 54d9608..1056168 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -76,6 +76,8 @@
 
 #define SO_BPF_EXTENSIONS	0x0032
 
+#define SO_PEERCGROUP		0x0033
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 39acec0..947bc6e 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -91,4 +91,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _XTENSA_SOCKET_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 5c3f7c3..d594575 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -424,6 +424,7 @@ struct sock {
 	int			(*sk_backlog_rcv)(struct sock *sk,
 						  struct sk_buff *skb);
 	void			(*sk_destruct)(struct sock *sk);
+	char			*cgroup_path;
 };
 
 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index ea0796b..e86be5b 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -82,4 +82,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 5b6a943..0827a3c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1185,6 +1185,24 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_max_pacing_rate;
 		break;
 
+	case SO_PEERCGROUP:
+	{
+		int cgroup_path_len;
+
+		if (!sk->cgroup_path) {
+			len = 0;
+			goto lenout;
+		}
+
+		cgroup_path_len = strlen(sk->cgroup_path) + 1;
+
+		if (len > cgroup_path_len)
+			len = cgroup_path_len;
+		if (copy_to_user(optval, sk->cgroup_path, len))
+			return -EFAULT;
+		goto lenout;
+	}
+
 	default:
 		return -ENOPROTOOPT;
 	}
@@ -1378,6 +1396,7 @@ static void __sk_free(struct sock *sk)
 		put_cred(sk->sk_peer_cred);
 	put_pid(sk->sk_peer_pid);
 	put_net(sock_net(sk));
+	kfree(sk->cgroup_path);
 	sk_prot_free(sk->sk_prot_creator, sk);
 }
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 29fc8be..e35105f 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -474,6 +474,37 @@ static void copy_peercred(struct sock *sk, struct sock *peersk)
 	sk->sk_peer_cred = get_cred(peersk->sk_peer_cred);
 }
 
+static int alloc_cgroup_path(struct sock *sk)
+{
+#ifdef CONFIG_CGROUP
+	if (sk->cgroup_path)
+		return 0;
+
+	sk->cgroup_path = kzalloc(PATH_MAX, GFP_KERNEL);
+	if (!sk->cgroup_path)
+		return -ENOMEM;
+
+#endif
+	return 0;
+}
+
+static int init_peercgroup(struct sock *sk)
+{
+	int ret;
+
+	ret = alloc_cgroup_path(sk);
+	if (ret)
+		return ret;
+
+	return task_cgroup_path(current, sk->cgroup_path, PATH_MAX);
+}
+
+static void copy_peercgroup(struct sock *sk, struct sock *peersk)
+{
+	if (sk->cgroup_path)
+		strncpy(sk->cgroup_path, peersk->cgroup_path, PATH_MAX);
+}
+
 static int unix_listen(struct socket *sock, int backlog)
 {
 	int err;
@@ -487,6 +518,12 @@ static int unix_listen(struct socket *sock, int backlog)
 	err = -EINVAL;
 	if (!u->addr)
 		goto out;	/* No listens on an unbound socket */
+
+	err = init_peercgroup(sk);
+	if (err)
+		goto out;
+
+	err = -EINVAL;
 	unix_state_lock(sk);
 	if (sk->sk_state != TCP_CLOSE && sk->sk_state != TCP_LISTEN)
 		goto out_unlock;
@@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	if (newsk == NULL)
 		goto out;
 
+	err = init_peercgroup(newsk);
+	if (err)
+		goto out;
+
+	err = alloc_cgroup_path(sk);
+	if (err)
+		goto out;
+
+	err = -ENOMEM;
+
 	/* Allocate skb for sending to listening sock */
 	skb = sock_wmalloc(newsk, 1, 0, GFP_KERNEL);
 	if (skb == NULL)
@@ -1203,6 +1250,7 @@ restart:
 
 	/* Set credentials */
 	copy_peercred(sk, other);
+	copy_peercgroup(sk, other);
 
 	sock->state = SS_CONNECTED;
 	sk->sk_state = TCP_ESTABLISHED;
--
1.8.5.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread
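Assuming the patch above were applied, a server such as sssd would
consume the new option roughly as follows. Sketch only; the constant is
taken from the asm-generic header in the patch in case libc headers do
not define it:

#include <limits.h>
#include <sys/socket.h>

#ifndef SO_PEERCGROUP
#define SO_PEERCGROUP 49	/* value from the asm-generic header above */
#endif

/* buf must have room for PATH_MAX bytes; returns the length filled in,
 * or -1.  The kernel reports the cgroup recorded at connect() time,
 * so this does not race with the client exiting afterwards. */
static int peer_cgroup(int sock, char *buf)
{
	socklen_t len = PATH_MAX;

	if (getsockopt(sock, SOL_SOCKET, SO_PEERCGROUP, buf, &len) < 0)
		return -1;
	return (int)len;
}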
Thread overview: 41+ messages
2014-03-12 20:46 [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer Vivek Goyal
2014-03-12 20:46 ` [PATCH 1/2] cgroup: Provide empty definition of task_cgroup_path() Vivek Goyal
2014-03-12 20:46 ` [PATCH 2/2] net: Implement SO_PEERCGROUP Vivek Goyal
2014-03-12 20:58 ` Cong Wang
2014-03-13 13:48 ` Vivek Goyal
2014-03-12 21:00 ` Andy Lutomirski
2014-03-12 21:12 ` Andy Lutomirski
2014-03-12 21:16 ` Simo Sorce
2014-03-12 21:19 ` Andy Lutomirski
2014-03-13 1:17 ` Simo Sorce
2014-03-13 1:21 ` Andy Lutomirski
2014-03-13 1:43 ` Simo Sorce
2014-03-13 2:12 ` Andy Lutomirski
2014-03-13 14:27 ` Vivek Goyal
2014-03-14 23:54 ` Eric W. Biederman
2014-03-13 17:51 ` Simo Sorce
2014-03-13 17:55 ` Andy Lutomirski
2014-03-13 17:57 ` Simo Sorce
2014-03-13 18:03 ` Andy Lutomirski
2014-03-13 17:58 ` Simo Sorce
2014-03-13 18:01 ` Andy Lutomirski
2014-03-13 18:05 ` Tim Hockin
2014-03-13 19:53 ` Vivek Goyal
2014-03-13 19:58 ` Andy Lutomirski
2014-03-13 20:06 ` Vivek Goyal
2014-03-13 20:17 ` Vivek Goyal
2014-03-13 20:19 ` Vivek Goyal
2014-03-13 21:21 ` Andy Lutomirski
2014-03-14 23:49 ` Eric W. Biederman
2014-03-13 18:02 ` Vivek Goyal
2014-03-13 14:14 ` Vivek Goyal
2014-03-13 14:55 ` Simo Sorce
2014-03-13 15:00 ` Vivek Goyal
2014-03-13 16:33 ` Simo Sorce
2014-03-13 17:25 ` Andy Lutomirski
2014-03-13 17:55 ` Simo Sorce
2014-03-13 17:56 ` Tim Hockin
2014-03-12 20:56 ` [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer Andy Lutomirski
2014-03-12 20:59 ` Simo Sorce
2014-03-12 21:09 ` Andy Lutomirski