* [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer
@ 2014-03-12 20:46 Vivek Goyal
From: Vivek Goyal @ 2014-03-12 20:46 UTC (permalink / raw)
To: linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay, Vivek Goyal
Hi,
This is V2 of the patches. It fixes the function formatting issue; I was also
using CONFIG_CGROUP instead of CONFIG_CGROUPS, which led to a crash at boot.
That is fixed now.
Some applications, sssd among them, want to know the cgroup of the connected
peer on a unix stream socket. They want to use this information to map the
peer to the container the client belongs to, and then decide which policies
apply to that container.
Why not just use SO_PEERCRED, extract the pid from it, and look up
/proc/<pid>/cgroup to figure out the client's cgroup? The problem is that this
is racy. By the time we look in /proc, the client may already have exited
(possibly after handing the socket fd over to a child), and its pid may have
been reassigned to another process. That is why people are looking for a more
reliable mechanism.
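To make the race concrete, here is a minimal sketch of the lookup being
rejected, assuming a connected AF_UNIX stream socket fd; the function name is
illustrative, not from any existing code:

#define _GNU_SOURCE		/* for struct ucred */
#include <stdio.h>
#include <sys/socket.h>

/* Racy: resolve the peer's cgroup via the pid from SO_PEERCRED. */
static int peer_cgroup_racy(int fd, char *buf, size_t buflen)
{
	struct ucred cred;
	socklen_t optlen = sizeof(cred);
	char path[64];
	FILE *f;

	if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &optlen) < 0)
		return -1;

	/* Window here: the peer may exit and cred.pid be recycled, so
	 * the file below can describe an unrelated process. */
	snprintf(path, sizeof(path), "/proc/%d/cgroup", cred.pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, buflen, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}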
Others, like journald, want similar information over unix datagram sockets.
A patch set providing that functionality was posted here:
https://lkml.org/lkml/2014/1/13/43
But it was rejected because of the overhead it would impose on all the other
cases:
https://lkml.org/lkml/2014/1/15/480
This patch series implements SO_PEERCGROUP, which returns the cgroup the
client was in at the time it opened the connection. Overhead is therefore
incurred only during connection setup, and there should be none after that.
It does not solve every use case out there, but it does cover sssd's needs,
hence this posting.
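For reference, the intended consumer side is a single getsockopt() after
accept(); the kernel captured the path when the client connected, so there is
no pid to chase afterwards. A hypothetical sketch against the interface added
by this series (the SO_PEERCGROUP value is the one proposed in patch 2/2):

#include <sys/socket.h>

#ifndef SO_PEERCGROUP
#define SO_PEERCGROUP 49	/* value proposed by this series */
#endif

/* Server side: ask for the cgroup the peer was in when it connected. */
static int peer_cgroup(int conn_fd, char *buf, socklen_t buflen)
{
	socklen_t len = buflen;

	if (getsockopt(conn_fd, SOL_SOCKET, SO_PEERCGROUP, buf, &len) < 0)
		return -1;
	return (int)len;	/* 0 => no cgroup path was recorded */
}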
Please consider it for inclusion.
Thanks
Vivek
Vivek Goyal (2):
cgroup: Provide empty definition of task_cgroup_path()
net: Implement SO_PEERCGROUP
arch/alpha/include/uapi/asm/socket.h | 1 +
arch/avr32/include/uapi/asm/socket.h | 1 +
arch/cris/include/uapi/asm/socket.h | 2 ++
arch/frv/include/uapi/asm/socket.h | 1 +
arch/ia64/include/uapi/asm/socket.h | 2 ++
arch/m32r/include/uapi/asm/socket.h | 1 +
arch/mips/include/uapi/asm/socket.h | 1 +
arch/mn10300/include/uapi/asm/socket.h | 1 +
arch/parisc/include/uapi/asm/socket.h | 1 +
arch/powerpc/include/uapi/asm/socket.h | 1 +
arch/s390/include/uapi/asm/socket.h | 1 +
arch/sparc/include/uapi/asm/socket.h | 2 ++
arch/xtensa/include/uapi/asm/socket.h | 1 +
include/linux/cgroup.h | 6 ++++++
include/net/sock.h | 1 +
include/uapi/asm-generic/socket.h | 2 ++
net/core/sock.c | 19 ++++++++++++++
net/unix/af_unix.c | 48 ++++++++++++++++++++++++++++++++++
18 files changed, 92 insertions(+)
--
1.8.5.3
* [PATCH 1/2] cgroup: Provide empty definition of task_cgroup_path()

From: Vivek Goyal @ 2014-03-12 20:46 UTC (permalink / raw)
To: linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay, Vivek Goyal

Compilation fails for users of task_cgroup_path() when !CONFIG_CGROUPS.
So provide an empty definition.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 include/linux/cgroup.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9450f02..727728c 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -869,6 +869,12 @@ static inline int cgroup_attach_task_all(struct task_struct *from,
 	return 0;
 }
 
+static inline int
+task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
+{
+	return 0;
+}
+
 #endif /* !CONFIG_CGROUPS */
 
 #endif /* _LINUX_CGROUP_H */
-- 
1.8.5.3
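The stub follows the usual kernel configuration pattern; a minimal,
self-contained illustration of why it is needed (FEATURE stands in for
CONFIG_CGROUPS, and all names here are illustrative):

#include <stddef.h>

/* header: real declaration when the feature is built in,
 * static inline no-op when it is configured out. */
#ifdef FEATURE
int feature_path(char *buf, size_t buflen);	/* implemented elsewhere */
#else
static inline int feature_path(char *buf, size_t buflen)
{
	return 0;	/* pretend success; callers need no #ifdef */
}
#endif

/* caller: compiles and links the same way in both configurations. */
static int init_path(char *buf, size_t buflen)
{
	return feature_path(buf, buflen);
}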
* [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Vivek Goyal @ 2014-03-12 20:46 UTC (permalink / raw)
To: linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay, Vivek Goyal

Implement SO_PEERCGROUP along the lines of SO_PEERCRED. It returns the
cgroup, in the first mounted hierarchy, of the task. For a client, it
represents the cgroup of the client at the time it opened the connection;
the client's cgroup may change afterwards.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/alpha/include/uapi/asm/socket.h   |  1 +
 arch/avr32/include/uapi/asm/socket.h   |  1 +
 arch/cris/include/uapi/asm/socket.h    |  2 ++
 arch/frv/include/uapi/asm/socket.h     |  1 +
 arch/ia64/include/uapi/asm/socket.h    |  2 ++
 arch/m32r/include/uapi/asm/socket.h    |  1 +
 arch/mips/include/uapi/asm/socket.h    |  1 +
 arch/mn10300/include/uapi/asm/socket.h |  1 +
 arch/parisc/include/uapi/asm/socket.h  |  1 +
 arch/powerpc/include/uapi/asm/socket.h |  1 +
 arch/s390/include/uapi/asm/socket.h    |  1 +
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  1 +
 include/net/sock.h                     |  1 +
 include/uapi/asm-generic/socket.h      |  2 ++
 net/core/sock.c                        | 19 ++++++++++++++
 net/unix/af_unix.c                     | 48 ++++++++++++++++++++++++++++++++++
 17 files changed, 86 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 3de1394..7178353 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -87,4 +87,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 6e6cd15..486212b 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/uapi/asm/socket.h b/arch/cris/include/uapi/asm/socket.h
index ed94e5e..89a09e3 100644
--- a/arch/cris/include/uapi/asm/socket.h
+++ b/arch/cris/include/uapi/asm/socket.h
@@ -82,6 +82,8 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index ca2c6e6..c4d90bc 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -80,5 +80,6 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index a1b49ba..62c196d 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 6c9a24b..6e04a7d 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index a14baa2..cfbd84b 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -98,4 +98,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 6aa3ce1..73467fe 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index fe35cea..24d8913 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -79,4 +79,5 @@
 #define SO_BPF_EXTENSIONS	0x4029
 
+#define SO_PEERCGROUP		0x402a
 
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index a9c3e2e..50106be 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -87,4 +87,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index e031332..4ae2f3c 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -86,4 +86,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 54d9608..1056168 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -76,6 +76,8 @@
 #define SO_BPF_EXTENSIONS	0x0032
 
+#define SO_PEERCGROUP		0x0033
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 39acec0..947bc6e 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -91,4 +91,5 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 
 #endif /* _XTENSA_SOCKET_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 5c3f7c3..d594575 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -424,6 +424,7 @@ struct sock {
 	int			(*sk_backlog_rcv)(struct sock *sk,
 						  struct sk_buff *skb);
 	void			(*sk_destruct)(struct sock *sk);
+	char			*cgroup_path;
 };
 
 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index ea0796b..e86be5b 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -82,4 +82,6 @@
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 5b6a943..0827a3c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1185,6 +1185,24 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_max_pacing_rate;
 		break;
 
+	case SO_PEERCGROUP:
+	{
+		int cgroup_path_len;
+
+		if (!sk->cgroup_path) {
+			len = 0;
+			goto lenout;
+		}
+
+		cgroup_path_len = strlen(sk->cgroup_path) + 1;
+
+		if (len > cgroup_path_len)
+			len = cgroup_path_len;
+		if (copy_to_user(optval, sk->cgroup_path, len))
+			return -EFAULT;
+		goto lenout;
+	}
+
 	default:
 		return -ENOPROTOOPT;
 	}
@@ -1378,6 +1396,7 @@ static void __sk_free(struct sock *sk)
 		put_cred(sk->sk_peer_cred);
 	put_pid(sk->sk_peer_pid);
 	put_net(sock_net(sk));
+	kfree(sk->cgroup_path);
 	sk_prot_free(sk->sk_prot_creator, sk);
 }
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 29fc8be..6921ae6 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -474,6 +474,37 @@ static void copy_peercred(struct sock *sk, struct sock *peersk)
 	sk->sk_peer_cred = get_cred(peersk->sk_peer_cred);
 }
 
+static int alloc_cgroup_path(struct sock *sk)
+{
+#ifdef CONFIG_CGROUPS
+	if (sk->cgroup_path)
+		return 0;
+
+	sk->cgroup_path = kzalloc(PATH_MAX, GFP_KERNEL);
+	if (!sk->cgroup_path)
+		return -ENOMEM;
+
+#endif
+	return 0;
+}
+
+static int init_peercgroup(struct sock *sk)
+{
+	int ret;
+
+	ret = alloc_cgroup_path(sk);
+	if (ret)
+		return ret;
+
+	return task_cgroup_path(current, sk->cgroup_path, PATH_MAX);
+}
+
+static void copy_peercgroup(struct sock *sk, struct sock *peersk)
+{
+	if (sk->cgroup_path)
+		strncpy(sk->cgroup_path, peersk->cgroup_path, PATH_MAX);
+}
+
 static int unix_listen(struct socket *sock, int backlog)
 {
 	int err;
@@ -487,6 +518,12 @@ static int unix_listen(struct socket *sock, int backlog)
 	err = -EINVAL;
 	if (!u->addr)
 		goto out;	/* No listens on an unbound socket */
+
+	err = init_peercgroup(sk);
+	if (err)
+		goto out;
+
+	err = -EINVAL;
 	unix_state_lock(sk);
 	if (sk->sk_state != TCP_CLOSE && sk->sk_state != TCP_LISTEN)
 		goto out_unlock;
@@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	if (newsk == NULL)
 		goto out;
 
+	err = init_peercgroup(newsk);
+	if (err)
+		goto out;
+
+	err = alloc_cgroup_path(sk);
+	if (err)
+		goto out;
+
+	err = -ENOMEM;
+
 	/* Allocate skb for sending to listening sock */
 	skb = sock_wmalloc(newsk, 1, 0, GFP_KERNEL);
 	if (skb == NULL)
@@ -1203,6 +1250,7 @@ restart:
 
 	/* Set credentials */
 	copy_peercred(sk, other);
+	copy_peercgroup(sk, other);
 
 	sock->state	= SS_CONNECTED;
 	sk->sk_state	= TCP_ESTABLISHED;
-- 
1.8.5.3
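One behavioral detail in the sock_getsockopt() hunk above is worth spelling
out: the reported length is clamped to strlen(cgroup_path) + 1, so an
oversized buffer comes back with the actual length (including the NUL), a
short buffer gets a silent truncation, and a socket that never went through
init_peercgroup() reports a length of zero. A hypothetical caller that
distinguishes these cases:

#include <stdio.h>
#include <limits.h>	/* PATH_MAX, matching the kernel-side allocation */
#include <sys/socket.h>

#ifndef SO_PEERCGROUP
#define SO_PEERCGROUP 49	/* value proposed by this series */
#endif

static void show_peer_cgroup(int fd)
{
	char buf[PATH_MAX];	/* full size: avoids silent truncation */
	socklen_t len = sizeof(buf);

	if (getsockopt(fd, SOL_SOCKET, SO_PEERCGROUP, buf, &len) < 0) {
		perror("SO_PEERCGROUP");
		return;
	}
	if (len == 0)
		printf("no peer cgroup recorded for this socket\n");
	else
		printf("peer cgroup: %.*s\n", (int)len, buf);
}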
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Cong Wang @ 2014-03-12 20:58 UTC (permalink / raw)
To: Vivek Goyal
Cc: linux-kernel@vger.kernel.org, cgroups, netdev, David Miller, tj, ssorce, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 1:46 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> @@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
>  	if (newsk == NULL)
>  		goto out;
>
> +	err = init_peercgroup(newsk);
> +	if (err)
> +		goto out;
> +
> +	err = alloc_cgroup_path(sk);
> +	if (err)
> +		goto out;
> +
> +	err = -ENOMEM;
> +

Don't we need to free the cgroup_path on the error path in this function?
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Vivek Goyal @ 2014-03-13 13:48 UTC (permalink / raw)
To: Cong Wang
Cc: linux-kernel@vger.kernel.org, cgroups, netdev, David Miller, tj, ssorce, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 01:58:57PM -0700, Cong Wang wrote:
> On Wed, Mar 12, 2014 at 1:46 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > +	err = init_peercgroup(newsk);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = alloc_cgroup_path(sk);
> > +	if (err)
> > +		goto out;
> > +
> > +	err = -ENOMEM;
> > +
>
> Don't we need to free the cgroup_path on the error path in this function?

The previously allocated cgroup_path is now in newsk->cgroup_path, and I
was relying on __sk_free() freeing that memory if an error happens.

unix_release_sock(sk)
  sock_put()
    sk_free()
      __sk_free()
        kfree(sk->cgroup_path)

Do you see a problem with that?

Thanks
Vivek
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-12 21:00 UTC (permalink / raw)
To: Vivek Goyal, linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay

On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. It returns the
> cgroup, in the first mounted hierarchy, of the task. For a client, it
> represents the cgroup of the client at the time it opened the connection;
> the client's cgroup may change afterwards.

Even if people decide that sending cgroups over a unix socket is a good
idea, this API has my NAK in the strongest possible sense, for whatever
my NAK is worth.

IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
*never* imply the use of a credential.  A program should always have to
*explicitly* request use of a credential.  What you want is SCM_CGROUP.

(I've found privilege escalations before based on this observation, and
I suspect I'll find them again.)

Note that I think that you really want SCM_SOMETHING_ELSE and not
SCM_CGROUP, but I don't know what the use case is yet.

--Andy
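For contrast, the explicit model Andy is pointing at already exists for
credentials: with SCM_CREDENTIALS the sender attaches a ucred to one specific
message, the kernel validates it, and a bare write(2) implies nothing. A
minimal sender-side sketch, assuming a connected AF_UNIX socket whose receiver
has enabled SO_PASSCRED; a hypothetical SCM_CGROUP would presumably follow the
same shape:

#define _GNU_SOURCE		/* for struct ucred */
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Explicitly attach our credentials to one message; the kernel
 * verifies them, and the receiver sees them only if it opted in. */
static ssize_t send_with_creds(int fd, const void *buf, size_t len)
{
	struct ucred creds = {
		.pid = getpid(), .uid = getuid(), .gid = getgid(),
	};
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	union {
		char control[CMSG_SPACE(sizeof(struct ucred))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.control;
	msg.msg_controllen = sizeof(u.control);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_CREDENTIALS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(creds));
	memcpy(CMSG_DATA(cmsg), &creds, sizeof(creds));

	return sendmsg(fd, &msg, 0);
}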
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-12 21:12 UTC (permalink / raw)
To: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo
Cc: ssorce, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> *never* imply the use of a credential.  A program should always have to
> *explicitly* request use of a credential.  What you want is SCM_CGROUP.
>
> Note that I think that you really want SCM_SOMETHING_ELSE and not
> SCM_CGROUP, but I don't know what the use case is yet.

This might not be quite as awful as I thought.  At least you're
looking up the cgroup at connection time instead of at send time.

OTOH, this is still racy -- the socket could easily outlive the cgroup
that created it.

--Andy
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-12 21:16 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, 2014-03-12 at 14:12 -0700, Andy Lutomirski wrote:
> This might not be quite as awful as I thought.  At least you're
> looking up the cgroup at connection time instead of at send time.
>
> OTOH, this is still racy -- the socket could easily outlive the cgroup
> that created it.

I think you do not understand how this whole problem space works.

The problem is exactly the same as with SO_PEERCRED, so we are taking
the same proven solution.

Connection time is all we do and can care about.

Simo.
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-12 21:19 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce <ssorce@redhat.com> wrote:
> I think you do not understand how this whole problem space works.
>
> The problem is exactly the same as with SO_PEERCRED, so we are taking
> the same proven solution.

You mean the same proven crappy solution?

> Connection time is all we do and can care about.

You have not answered why.

--
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-13 1:17 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, 2014-03-12 at 14:19 -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 2:16 PM, Simo Sorce <ssorce@redhat.com> wrote:
> > Connection time is all we do and can care about.
>
> You have not answered why.

We are going to disclose information to the peer based on policy that
depends on the cgroup the peer is part of. All we care about is who
opened the connection; if the peer wants to pass on that information
after it has obtained it, there is nothing we can do, so connection time
is all we really care about.

Simo.
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-13 1:21 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 6:17 PM, Simo Sorce <ssorce@redhat.com> wrote:
> We are going to disclose information to the peer based on policy that
> depends on the cgroup the peer is part of. All we care about is who
> opened the connection; if the peer wants to pass on that information
> after it has obtained it, there is nothing we can do, so connection time
> is all we really care about.

Can you give a realistic example?

I could say that I'd like to disclose information to processes based
on their rlimits at the time they connected, but I don't think that
would carry much weight.

--Andy
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-13 1:43 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, 2014-03-12 at 18:21 -0700, Andy Lutomirski wrote:
> Can you give a realistic example?
>
> I could say that I'd like to disclose information to processes based
> on their rlimits at the time they connected, but I don't think that
> would carry much weight.

We want to be able to show a different user list from SSSD based on the
docker container that is asking for it.

This works by having libnss_sss.so from the containerized application
connect to an SSSD daemon running on the host or in another container.

The only way to distinguish between containers "from the outside" is to
look up the cgroup of the requesting process. It has a unique container
ID, and can therefore be mapped to the appropriate policy that will let
us decide which 'user domain' to serve to the container.

Simo.
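As an illustration of the mapping Simo describes: once the server has the
peer's cgroup path, extracting the container identity is a string operation.
A hypothetical sketch; the "docker-<id>.scope" naming is an assumption about
the container manager's cgroup layout, not something this patch defines:

#include <string.h>

/* Extract a container ID from a cgroup path such as
 * "/system.slice/docker-3f2a9c....scope" (assumed layout).
 * Returns 0 on success, -1 if no ID is found or it does not fit. */
static int container_id_from_cgroup(const char *cgroup_path,
				    char *id, size_t idlen)
{
	const char *p = strstr(cgroup_path, "docker-");
	size_t n;

	if (!p)
		return -1;
	p += strlen("docker-");
	n = strcspn(p, ".");		/* stop at ".scope" */
	if (n == 0 || n >= idlen)
		return -1;
	memcpy(id, p, n);
	id[n] = '\0';
	return 0;
}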
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-13 2:12 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 6:43 PM, Simo Sorce <ssorce@redhat.com> wrote:
> The only way to distinguish between containers "from the outside" is to
> look up the cgroup of the requesting process. It has a unique container
> ID, and can therefore be mapped to the appropriate policy that will let
> us decide which 'user domain' to serve to the container.

I can think of at least three other ways to do this.

1. Fix Docker to use user namespaces and use the uid of the requesting
process via SCM_CREDENTIALS.

2. Docker is a container system, so use the "container" (aka
namespace) APIs.  There are probably several clever things that could
be done with /proc/<pid>/ns.

3. Given that Docker uses network namespaces, I assume that the socket
connection between the two sssd instances either comes from Docker
itself or uses socket inodes.  In either case, the same mechanism
should be usable for authentication.

On an unrelated note, since you seem to have found a way to get unix
sockets to connect the inside and outside of a Docker container, it
would be awesome if Docker could use the same mechanism to pass TCP
sockets around rather than playing awful games with virtual networks.

--Andy
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Vivek Goyal @ 2014-03-13 14:27 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Simo Sorce, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 07:12:25PM -0700, Andy Lutomirski wrote:

[..]
> I can think of at least three other ways to do this.
>
> 1. Fix Docker to use user namespaces and use the uid of the requesting
> process via SCM_CREDENTIALS.

Using user namespaces sounds like the right way to do it (at least
conceptually). But I think the hurdle here is that people are not yet
convinced that user namespaces are secure and work well. IOW, some
people don't seem to think that user namespaces are ready yet.

I guess that's the reason people are looking for other ways to achieve
their goal.

Thanks
Vivek
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Eric W. Biederman @ 2014-03-14 23:54 UTC (permalink / raw)
To: Vivek Goyal
Cc: Andy Lutomirski, Simo Sorce, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

Vivek Goyal <vgoyal@redhat.com> writes:
> On Wed, Mar 12, 2014 at 07:12:25PM -0700, Andy Lutomirski wrote:
>> I can think of at least three other ways to do this.
>>
>> 1. Fix Docker to use user namespaces and use the uid of the requesting
>> process via SCM_CREDENTIALS.
>
> Using user namespaces sounds like the right way to do it (at least
> conceptually). But I think the hurdle here is that people are not yet
> convinced that user namespaces are secure and work well. IOW, some
> people don't seem to think that user namespaces are ready yet.

If the problem is user namespace immaturity, patches or bug reports
need to be sent for user namespaces.

Containers with user namespaces (however immature they are) are much
more secure than running containers with uid == 0 processes inside of
them.  User namespaces considerably reduce the attack surface of what
uid == 0 can do.

> I guess that's the reason people are looking for other ways to
> achieve their goal.

It seems strange to work around a feature that is 99% of the way to
solving their problem with more kernel patches.

Eric
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-13 17:51 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Wed, 2014-03-12 at 19:12 -0700, Andy Lutomirski wrote:
> I can think of at least three other ways to do this.
>
> 1. Fix Docker to use user namespaces and use the uid of the requesting
> process via SCM_CREDENTIALS.

This is not practical. I have no control over what UIDs will be used
within a container, and IIRC user namespaces have severe limitations
that may make them unusable in some situations. Forcing the use of user
namespaces on docker to satisfy my use case is not in my power.

> 2. Docker is a container system, so use the "container" (aka
> namespace) APIs.  There are probably several clever things that could
> be done with /proc/<pid>/ns.

pid is racy; if it weren't, I would simply go straight to
/proc/<pid>/cgroups ...

> 3. Given that Docker uses network namespaces, I assume that the socket
> connection between the two sssd instances either comes from Docker
> itself or uses socket inodes.  In either case, the same mechanism
> should be usable for authentication.

It is a unix socket, i.e. bind-mounted on the container filesystem, so
I am not sure network namespaces really come into the picture, and I do
not know of a race-free way of learning the namespace of the peer at
connect time.

Is there a SO_PEER_NAMESPACE option ?

Simo.
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-13 17:55 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 10:51 AM, Simo Sorce <ssorce@redhat.com> wrote:
> This is not practical. I have no control over what UIDs will be used
> within a container, and IIRC user namespaces have severe limitations
> that may make them unusable in some situations. Forcing the use of user
> namespaces on docker to satisfy my use case is not in my power.

Except that Docker w/o userns is basically completely insecure unless
selinux or apparmor is in use, so this may not matter.

> pid is racy; if it weren't, I would simply go straight to
> /proc/<pid>/cgroups ...

How about:

open("/proc/self/ns/ipc", O_RDONLY);
send the result over SCM_RIGHTS?

> It is a unix socket, i.e. bind-mounted on the container filesystem, so
> I am not sure network namespaces really come into the picture, and I do
> not know of a race-free way of learning the namespace of the peer at
> connect time.
>
> Is there a SO_PEER_NAMESPACE option ?

So give each container its own unix socket.  Problem solved, no?

--Andy
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-13 17:57 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
> How about:
>
> open("/proc/self/ns/ipc", O_RDONLY);
> send the result over SCM_RIGHTS?

This needs to work with existing clients; existing clients don't do
this.
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-13 18:03 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 10:57 AM, Simo Sorce <ssorce@redhat.com> wrote:
> This needs to work with existing clients; existing clients don't do
> this.

Wait... you want completely unmodified clients in a container to talk
to a service that they don't even realize is outside the container and
for that server to magically behave differently because the container
is there?  And there's no per-container proxy involved?  And every
container is connecting to *the very same socket*?

I just can't imagine this working well regardless of what magic socket
options you add, especially if user namespaces aren't in use.

--Andy
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Simo Sorce @ 2014-03-13 17:58 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
> So give each container its own unix socket.  Problem solved, no?

Not really practical if you have hundreds of containers.

Simo.
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP

From: Andy Lutomirski @ 2014-03-13 18:01 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups, Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 10:58 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
>> So give each container its own unix socket.  Problem solved, no?
>
> Not really practical if you have hundreds of containers.

I don't see the problem.  Sockets are cheap.

--
Andy Lutomirski
AMA Capital Management, LLC
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 17:58 ` Simo Sorce
2014-03-13 18:01 ` Andy Lutomirski
@ 2014-03-13 18:05 ` Tim Hockin
1 sibling, 0 replies; 41+ messages in thread
From: Tim Hockin @ 2014-03-13 18:05 UTC (permalink / raw)
To: Simo Sorce
Cc: Andy Lutomirski, Vivek Goyal, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

I don't buy that it is not practical.  Not convenient, maybe.  Not
clean, sure.  But it is practical - it uses mechanisms that exist on
all kernels today.  That is a win, to me.

On Thu, Mar 13, 2014 at 10:58 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-03-13 at 10:55 -0700, Andy Lutomirski wrote:
>>
>> So give each container its own unix socket.  Problem solved, no?
>
> Not really practical if you have hundreds of containers.
>
> Simo.

^ permalink raw reply	[flat|nested] 41+ messages in thread
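For what it's worth, the per-container-socket arrangement being argued
about here would look roughly like this on the server side. The socket
path is invented for illustration; the point is that the container
identity falls out of which listener the connection arrived on, with
no peer introspection at all:

/* Sketch, with hypothetical paths: one listening unix socket per
 * container.  Whatever accept()s on this fd knows the peer's
 * container without asking the kernel about the peer. */
#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int listen_for_container(const char *container_id)
{
	struct sockaddr_un addr = { .sun_family = AF_UNIX };
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	snprintf(addr.sun_path, sizeof(addr.sun_path),
		 "/run/sssd/pipes/%s.sock", container_id);
	/* Only this container gets this socket bind-mounted into its
	 * filesystem, so any connection accepted here is known to
	 * come from that container. */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, 16) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}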
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 17:55 ` Andy Lutomirski
[not found] ` <CALCETrXWzja6W6y=p9MrtynGZMsrD5KQu0KBK6Bs1dQxnesv2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-13 19:53 ` Vivek Goyal
2014-03-13 19:58 ` Andy Lutomirski
1 sibling, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 19:53 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Simo Sorce, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:

[..]
> >> 2. Docker is a container system, so use the "container" (aka
> >> namespace) APIs.  There are probably several clever things that could
> >> be done with /proc/<pid>/ns.
> >
> > pid is racy, if it weren't I would simply go straight
> > to /proc/<pid>/cgroups ...
>
> How about:
>
> open("/proc/self/ns/ipc", O_RDONLY);
> send the result over SCM_RIGHTS?

As I don't know, I will ask.  So what will the server now do with this
file descriptor of the client's ipc namespace?

IOW, what information/identifier does it contain which can be used to
map to pre-configured per-container/per-namespace policies?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 19:53 ` Vivek Goyal
@ 2014-03-13 19:58 ` Andy Lutomirski
2014-03-13 20:06 ` Vivek Goyal
0 siblings, 1 reply; 41+ messages in thread
From: Andy Lutomirski @ 2014-03-13 19:58 UTC (permalink / raw)
To: Vivek Goyal
Cc: Simo Sorce, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>
> [..]
>> >> 2. Docker is a container system, so use the "container" (aka
>> >> namespace) APIs.  There are probably several clever things that could
>> >> be done with /proc/<pid>/ns.
>> >
>> > pid is racy, if it weren't I would simply go straight
>> > to /proc/<pid>/cgroups ...
>>
>> How about:
>>
>> open("/proc/self/ns/ipc", O_RDONLY);
>> send the result over SCM_RIGHTS?
>
> As I don't know, I will ask.  So what will the server now do with this
> file descriptor of the client's ipc namespace?
>
> IOW, what information/identifier does it contain which can be used to
> map to pre-configured per-container/per-namespace policies?

Inode number, which will match that assigned to the container at runtime.

(I'm not sure this is a great idea -- there's no convention that "I
have an fd for a namespace" means "I'm a daemon in that namespace".)

--Andy

>
> Thanks
> Vivek

--
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 19:58 ` Andy Lutomirski
@ 2014-03-13 20:06 ` Vivek Goyal
[not found] ` <20140313200649.GN18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 20:06 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Simo Sorce, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> >
> > [..]
> >> >> 2. Docker is a container system, so use the "container" (aka
> >> >> namespace) APIs.  There are probably several clever things that could
> >> >> be done with /proc/<pid>/ns.
> >> >
> >> > pid is racy, if it weren't I would simply go straight
> >> > to /proc/<pid>/cgroups ...
> >>
> >> How about:
> >>
> >> open("/proc/self/ns/ipc", O_RDONLY);
> >> send the result over SCM_RIGHTS?
> >
> > As I don't know, I will ask.  So what will the server now do with this
> > file descriptor of the client's ipc namespace?
> >
> > IOW, what information/identifier does it contain which can be used to
> > map to pre-configured per-container/per-namespace policies?
>
> Inode number, which will match that assigned to the container at runtime.

But what would I do with this inode number?  I am assuming this is
generated dynamically when the respective namespace was created.  To me
this is like assigning a pid dynamically, and one does not create
policies in user space based on pid.  Similarly I will not be able
to create policies based on an inode number which is generated
dynamically.

For it to be useful, it should map to something more static which
user space understands.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313200649.GN18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-13 20:17 ` Vivek Goyal
[not found] ` <20140313201755.GO18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-13 21:21 ` Andy Lutomirski
2014-03-14 23:49 ` Eric W. Biederman
2 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 20:17 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Simo Sorce, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, Mar 13, 2014 at 04:06:49PM -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> > On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> > >
> > > [..]
> > >> >> 2. Docker is a container system, so use the "container" (aka
> > >> >> namespace) APIs.  There are probably several clever things that could
> > >> >> be done with /proc/<pid>/ns.
> > >> >
> > >> > pid is racy, if it weren't I would simply go straight
> > >> > to /proc/<pid>/cgroups ...
> > >>
> > >> How about:
> > >>
> > >> open("/proc/self/ns/ipc", O_RDONLY);
> > >> send the result over SCM_RIGHTS?
> > >
> > > As I don't know, I will ask.  So what will the server now do with this
> > > file descriptor of the client's ipc namespace?
> > >
> > > IOW, what information/identifier does it contain which can be
> > > used to map to pre-configured per-container/per-namespace policies?
> >
> > Inode number, which will match that assigned to the container at runtime.
> >
>
> But what would I do with this inode number?  I am assuming this is
> generated dynamically when the respective namespace was created.  To me
> this is like assigning a pid dynamically, and one does not create
> policies in user space based on pid.  Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
>
> For it to be useful, it should map to something more static which
> user space understands.

Or could we do the following:

open("/proc/self/cgroup", O_RDONLY);
send the result over SCM_RIGHTS

But this requires client modification.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313201755.GO18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-13 20:19 ` Vivek Goyal
0 siblings, 0 replies; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 20:19 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Simo Sorce, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, Mar 13, 2014 at 04:17:55PM -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 04:06:49PM -0400, Vivek Goyal wrote:
> > On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
> > > On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > > > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
> > > >
> > > > [..]
> > > >> >> 2. Docker is a container system, so use the "container" (aka
> > > >> >> namespace) APIs.  There are probably several clever things that could
> > > >> >> be done with /proc/<pid>/ns.
> > > >> >
> > > >> > pid is racy, if it weren't I would simply go straight
> > > >> > to /proc/<pid>/cgroups ...
> > > >>
> > > >> How about:
> > > >>
> > > >> open("/proc/self/ns/ipc", O_RDONLY);
> > > >> send the result over SCM_RIGHTS?
> > > >
> > > > As I don't know, I will ask.  So what will the server now do with this
> > > > file descriptor of the client's ipc namespace?
> > > >
> > > > IOW, what information/identifier does it contain which can be
> > > > used to map to pre-configured per-container/per-namespace policies?
> > >
> > > Inode number, which will match that assigned to the container at runtime.
> > >
> >
> > But what would I do with this inode number?  I am assuming this is
> > generated dynamically when the respective namespace was created.  To me
> > this is like assigning a pid dynamically, and one does not create
> > policies in user space based on pid.  Similarly I will not be able
> > to create policies based on an inode number which is generated
> > dynamically.
> >
> > For it to be useful, it should map to something more static which
> > user space understands.
>
> Or could we do the following:
>
> open("/proc/self/cgroup", O_RDONLY);
> send the result over SCM_RIGHTS

I guess that would not work.  A client could create a file, fake the
cgroup information, and send that fd.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313200649.GN18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-13 20:17 ` Vivek Goyal
@ 2014-03-13 21:21 ` Andy Lutomirski
2014-03-14 23:49 ` Eric W. Biederman
2 siblings, 0 replies; 41+ messages in thread
From: Andy Lutomirski @ 2014-03-13 21:21 UTC (permalink / raw)
To: Vivek Goyal
Cc: Simo Sorce, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, Mar 13, 2014 at 1:06 PM, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
>> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>> >
>> > [..]
>> >> >> 2. Docker is a container system, so use the "container" (aka
>> >> >> namespace) APIs.  There are probably several clever things that could
>> >> >> be done with /proc/<pid>/ns.
>> >> >
>> >> > pid is racy, if it weren't I would simply go straight
>> >> > to /proc/<pid>/cgroups ...
>> >>
>> >> How about:
>> >>
>> >> open("/proc/self/ns/ipc", O_RDONLY);
>> >> send the result over SCM_RIGHTS?
>> >
>> > As I don't know, I will ask.  So what will the server now do with this
>> > file descriptor of the client's ipc namespace?
>> >
>> > IOW, what information/identifier does it contain which can be
>> > used to map to pre-configured per-container/per-namespace policies?
>>
>> Inode number, which will match that assigned to the container at runtime.
>>
>
> But what would I do with this inode number?  I am assuming this is
> generated dynamically when the respective namespace was created.  To me
> this is like assigning a pid dynamically, and one does not create
> policies in user space based on pid.  Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
>
> For it to be useful, it should map to something more static which
> user space understands.

Like what?  I imagine that, at best, sssd will be hardcoding some
understanding of Docker's cgroup names.  As an alternative, it could
ask Docker for a uid or an inode number of something else -- it's
hardcoding an understanding of Docker anyway.  And Docker needs to
cooperate regardless, since otherwise it could change its cgroup
naming or stop using cgroups entirely.

--Andy

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313200649.GN18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2014-03-13 20:17 ` Vivek Goyal
2014-03-13 21:21 ` Andy Lutomirski
@ 2014-03-14 23:49 ` Eric W. Biederman
2 siblings, 0 replies; 41+ messages in thread
From: Eric W. Biederman @ 2014-03-14 23:49 UTC (permalink / raw)
To: Vivek Goyal
Cc: Andy Lutomirski, Simo Sorce, linux-kernel@vger.kernel.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> writes:

> On Thu, Mar 13, 2014 at 12:58:14PM -0700, Andy Lutomirski wrote:
>> On Thu, Mar 13, 2014 at 12:53 PM, Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> > On Thu, Mar 13, 2014 at 10:55:16AM -0700, Andy Lutomirski wrote:
>> >
>> > [..]
>> >> >> 2. Docker is a container system, so use the "container" (aka
>> >> >> namespace) APIs.  There are probably several clever things that could
>> >> >> be done with /proc/<pid>/ns.
>> >> >
>> >> > pid is racy, if it weren't I would simply go straight
>> >> > to /proc/<pid>/cgroups ...
>> >>
>> >> How about:
>> >>
>> >> open("/proc/self/ns/ipc", O_RDONLY);
>> >> send the result over SCM_RIGHTS?
>> >
>> > As I don't know, I will ask.  So what will the server now do with this
>> > file descriptor of the client's ipc namespace?
>> >
>> > IOW, what information/identifier does it contain which can be
>> > used to map to pre-configured per-container/per-namespace policies?
>>
>> Inode number, which will match that assigned to the container at runtime.
>>
>
> But what would I do with this inode number?  I am assuming this is
> generated dynamically when the respective namespace was created.  To me
> this is like assigning a pid dynamically, and one does not create
> policies in user space based on pid.  Similarly I will not be able
> to create policies based on an inode number which is generated
> dynamically.
>
> For it to be useful, it should map to something more static which
> user space understands.

But the mapping can be done in userspace.  stat all of the namespaces
you care about, get their inode numbers, and then do a lookup.

Hard-coding string-based names in the kernel the way cgroups does is
really pretty terrible; it seriously limits the flexibility of the API,
and so far breaks nested containers.

Eric

^ permalink raw reply	[flat|nested] 41+ messages in thread
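A sketch of the userspace lookup Eric describes, assuming the server
received a namespace fd over SCM_RIGHTS as in the earlier sketch.
find_container_by_ns_ino() is a hypothetical table the container
manager would fill in when it launches each container:

#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical: maps a namespace inode recorded at container launch
 * (e.g. from a stat of /proc/<init-pid>/ns/ipc) to a container name. */
const char *find_container_by_ns_ino(ino_t ino);

static const char *container_for_ns_fd(int nsfd)
{
	struct stat st;

	if (fstat(nsfd, &st) < 0)
		return NULL;
	/* st_ino identifies the namespace instance for as long as the
	 * fd is held open, so this does not race with the client
	 * exiting afterwards. */
	return find_container_by_ns_ino(st.st_ino);
}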
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 17:51 ` Simo Sorce
2014-03-13 17:55 ` Andy Lutomirski
@ 2014-03-13 18:02 ` Vivek Goyal
1 sibling, 0 replies; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 18:02 UTC (permalink / raw)
To: Simo Sorce
Cc: Andy Lutomirski, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 01:51:17PM -0400, Simo Sorce wrote:

[..]
> > 1. Fix Docker to use user namespaces and use the uid of the requesting
> > process via SCM_CREDENTIALS.
>
> This is not practical, I have no control over what UIDs will be used
> within a container,

I guess the uid-to-container mapping has to be managed by somebody, say
systemd.  Then systemd should export an API to query the container a
uid is mapped into.  So that should not be the real problem.

> and IIRC user namespaces have severe limitations
> that may make them unusable in some situations. Forcing the use of user
> namespaces on docker to satisfy my use case is not in my power.

I think that's the real practical problem: adoption of user namespaces.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-12 21:12 ` Andy Lutomirski
[not found] ` <CALCETrUTjN=XKwnO62P9roZtLLuk7_9Oi17Mj_Aa2ZFtZc1gVA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-03-13 14:14 ` Vivek Goyal
[not found] ` <20140313141422.GB18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
1 sibling, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 14:14 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel@vger.kernel.org, cgroups, Network Development,
David S. Miller, Tejun Heo, ssorce, jkaluza, lpoetter, kay

On Wed, Mar 12, 2014 at 02:12:33PM -0700, Andy Lutomirski wrote:
> On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> >> cgroup of first mounted hierarchy of the task. For the case of client,
> >> it represents the cgroup of client at the time of opening the connection.
> >> After that client cgroup might change.
> >
> > Even if people decide that sending cgroups over a unix socket is a good
> > idea, this API has my NAK in the strongest possible sense, for whatever
> > my NAK is worth.
> >
> > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> > *never* imply the use of a credential.  A program should always have to
> > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> >
> > (I've found privilege escalations before based on this observation, and
> > I suspect I'll find them again.)
> >
> >
> > Note that I think that you really want SCM_SOMETHING_ELSE and not
> > SCM_CGROUP, but I don't know what the use case is yet.
>
> This might not be quite as awful as I thought.  At least you're
> looking up the cgroup at connection time instead of at send time.
>
> OTOH, this is still racy -- the socket could easily outlive the cgroup
> that created it.

That's a good point. What guarantees that previous cgroup was not
reassigned to a different container.

What if a process A opens the connection with sssd. Process A passes the
file descriptor to a different process B in a different container.

Process A exits. Container gets removed from system and new one gets
launched which uses same cgroup as old one. Now process B sends a new
request and SSSD will serve it based on policy of newly launched
container.

This sounds very similar to pid race where socket/connection will outlive
the pid.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313141422.GB18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-13 14:55 ` Simo Sorce
[not found] ` <1394722534.32465.227.camel-Hs+ccMQdwurzDu64bZtGtWD2FQJk+8+b@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Simo Sorce @ 2014-03-13 14:55 UTC (permalink / raw)
To: Vivek Goyal
Cc: Andy Lutomirski, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, 2014-03-13 at 10:14 -0400, Vivek Goyal wrote:
> On Wed, Mar 12, 2014 at 02:12:33PM -0700, Andy Lutomirski wrote:
> > On Wed, Mar 12, 2014 at 2:00 PM, Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> wrote:
> > > On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> > >> Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
> > >> cgroup of first mounted hierarchy of the task. For the case of client,
> > >> it represents the cgroup of client at the time of opening the connection.
> > >> After that client cgroup might change.
> > >
> > > Even if people decide that sending cgroups over a unix socket is a good
> > > idea, this API has my NAK in the strongest possible sense, for whatever
> > > my NAK is worth.
> > >
> > > IMO SO_PEERCRED is a disaster.  Calling send(2) or write(2) should
> > > *never* imply the use of a credential.  A program should always have to
> > > *explicitly* request use of a credential.  What you want is SCM_CGROUP.
> > >
> > > (I've found privilege escalations before based on this observation, and
> > > I suspect I'll find them again.)
> > >
> > >
> > > Note that I think that you really want SCM_SOMETHING_ELSE and not
> > > SCM_CGROUP, but I don't know what the use case is yet.
> >
> > This might not be quite as awful as I thought.  At least you're
> > looking up the cgroup at connection time instead of at send time.
> >
> > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > that created it.
>
> That's a good point. What guarantees that previous cgroup was not
> reassigned to a different container.
>
> What if a process A opens the connection with sssd. Process A passes the
> file descriptor to a different process B in a different container.

Stop right here.
If the process passes the fd it is not my problem anymore.
The process can as well just 'proxy' all the information to another
process.

We just care to properly identify the 'original' container, we are not
in the business of detecting malicious behavior. That's something other
mechanisms need to protect against (SELinux or other LSMs, normal
permissions, capabilities, etc...).

> Process A exits. Container gets removed from system and new one gets
> launched which uses same cgroup as old one. Now process B sends a new
> request and SSSD will serve it based on policy of newly launched
> container.
>
> This sounds very similar to pid race where socket/connection will outlive
> the pid.

Nope, completely different.

Simo.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <1394722534.32465.227.camel-Hs+ccMQdwurzDu64bZtGtWD2FQJk+8+b@public.gmane.org>
@ 2014-03-13 15:00 ` Vivek Goyal
[not found] ` <20140313150034.GG18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-13 15:00 UTC (permalink / raw)
To: Simo Sorce
Cc: Andy Lutomirski, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:

[..]
> > > This might not be quite as awful as I thought.  At least you're
> > > looking up the cgroup at connection time instead of at send time.
> > >
> > > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > > that created it.
> >
> > That's a good point. What guarantees that previous cgroup was not
> > reassigned to a different container.
> >
> > What if a process A opens the connection with sssd. Process A passes the
> > file descriptor to a different process B in a different container.
>
> Stop right here.
> If the process passes the fd it is not my problem anymore.
> The process can as well just 'proxy' all the information to another
> process.
>
> We just care to properly identify the 'original' container, we are not
> in the business of detecting malicious behavior. That's something other
> mechanisms need to protect against (SELinux or other LSMs, normal
> permissions, capabilities, etc...).
>
> > Process A exits. Container gets removed from system and new one gets
> > launched which uses same cgroup as old one. Now process B sends a new
> > request and SSSD will serve it based on policy of newly launched
> > container.
> >
> > This sounds very similar to pid race where socket/connection will outlive
> > the pid.
>
> Nope, completely different.
>

I think you missed my point.  Passing the file descriptor is not the
problem.  The problem is reuse of the same cgroup name for a different
container while the socket lives on.  And it is the same race as reuse
of a pid for a different process.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
[not found] ` <20140313150034.GG18914-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-13 16:33 ` Simo Sorce
2014-03-13 17:25 ` Andy Lutomirski
0 siblings, 1 reply; 41+ messages in thread
From: Simo Sorce @ 2014-03-13 16:33 UTC (permalink / raw)
To: Vivek Goyal
Cc: Andy Lutomirski, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
>
> [..]
> > > > This might not be quite as awful as I thought.  At least you're
> > > > looking up the cgroup at connection time instead of at send time.
> > > >
> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
> > > > that created it.
> > >
> > > That's a good point. What guarantees that previous cgroup was not
> > > reassigned to a different container.
> > >
> > > What if a process A opens the connection with sssd. Process A passes the
> > > file descriptor to a different process B in a different container.
> >
> > Stop right here.
> > If the process passes the fd it is not my problem anymore.
> > The process can as well just 'proxy' all the information to another
> > process.
> >
> > We just care to properly identify the 'original' container, we are not
> > in the business of detecting malicious behavior. That's something other
> > mechanisms need to protect against (SELinux or other LSMs, normal
> > permissions, capabilities, etc...).
> >
> > > Process A exits. Container gets removed from system and new one gets
> > > launched which uses same cgroup as old one. Now process B sends a new
> > > request and SSSD will serve it based on policy of newly launched
> > > container.
> > >
> > > This sounds very similar to pid race where socket/connection will outlive
> > > the pid.
> >
> > Nope, completely different.
> >
>
> I think you missed my point.  Passing the file descriptor is not the
> problem.  The problem is reuse of the same cgroup name for a different
> container while the socket lives on.  And it is the same race as reuse
> of a pid for a different process.

The cgroup name should not be reused, of course; if userspace does
that, it is userspace's issue.  cgroup names are not a constrained
namespace like pids, which force the kernel to reuse them for processes
of a different nature.

Simo.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 16:33 ` Simo Sorce
@ 2014-03-13 17:25 ` Andy Lutomirski
2014-03-13 17:55 ` Simo Sorce
2014-03-13 17:56 ` Tim Hockin
0 siblings, 2 replies; 41+ messages in thread
From: Andy Lutomirski @ 2014-03-13 17:25 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
> On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
>> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
>>
>> [..]
>> > > > This might not be quite as awful as I thought.  At least you're
>> > > > looking up the cgroup at connection time instead of at send time.
>> > > >
>> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
>> > > > that created it.
>> > >
>> > > That's a good point. What guarantees that previous cgroup was not
>> > > reassigned to a different container.
>> > >
>> > > What if a process A opens the connection with sssd. Process A passes the
>> > > file descriptor to a different process B in a different container.
>> >
>> > Stop right here.
>> > If the process passes the fd it is not my problem anymore.
>> > The process can as well just 'proxy' all the information to another
>> > process.
>> >
>> > We just care to properly identify the 'original' container, we are not
>> > in the business of detecting malicious behavior. That's something other
>> > mechanisms need to protect against (SELinux or other LSMs, normal
>> > permissions, capabilities, etc...).
>> >
>> > > Process A exits. Container gets removed from system and new one gets
>> > > launched which uses same cgroup as old one. Now process B sends a new
>> > > request and SSSD will serve it based on policy of newly launched
>> > > container.
>> > >
>> > > This sounds very similar to pid race where socket/connection will outlive
>> > > the pid.
>> >
>> > Nope, completely different.
>> >
>>
>> I think you missed my point.  Passing the file descriptor is not the
>> problem.  The problem is reuse of the same cgroup name for a different
>> container while the socket lives on.  And it is the same race as reuse
>> of a pid for a different process.
>
> The cgroup name should not be reused, of course; if userspace does
> that, it is userspace's issue.  cgroup names are not a constrained
> namespace like pids, which force the kernel to reuse them for processes
> of a different nature.
>

You're proposing a feature that will enshrine cgroups into the API used
by non-cgroup-controlling applications.  I don't think that anyone
thinks that cgroups are pretty, so this is an unfortunate thing to
have to do.

I've suggested three different ways that your goal could be achieved
without using cgroups at all.  You haven't really addressed any of
them.

In order for something like this to go into the kernel, I would expect
a real use case and a justification for why this is the right way to
do it.

"Docker containers can be identified by cgroup path" is completely
unconvincing to me.

--Andy

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 17:25 ` Andy Lutomirski
@ 2014-03-13 17:55 ` Simo Sorce
2014-03-13 17:56 ` Tim Hockin
1 sibling, 0 replies; 41+ messages in thread
From: Simo Sorce @ 2014-03-13 17:55 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

On Thu, 2014-03-13 at 10:25 -0700, Andy Lutomirski wrote:
> On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
> > On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
> >> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
> >>
> >> [..]
> >> > > > This might not be quite as awful as I thought.  At least you're
> >> > > > looking up the cgroup at connection time instead of at send time.
> >> > > >
> >> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
> >> > > > that created it.
> >> > >
> >> > > That's a good point. What guarantees that previous cgroup was not
> >> > > reassigned to a different container.
> >> > >
> >> > > What if a process A opens the connection with sssd. Process A passes the
> >> > > file descriptor to a different process B in a different container.
> >> >
> >> > Stop right here.
> >> > If the process passes the fd it is not my problem anymore.
> >> > The process can as well just 'proxy' all the information to another
> >> > process.
> >> >
> >> > We just care to properly identify the 'original' container, we are not
> >> > in the business of detecting malicious behavior. That's something other
> >> > mechanisms need to protect against (SELinux or other LSMs, normal
> >> > permissions, capabilities, etc...).
> >> >
> >> > > Process A exits. Container gets removed from system and new one gets
> >> > > launched which uses same cgroup as old one. Now process B sends a new
> >> > > request and SSSD will serve it based on policy of newly launched
> >> > > container.
> >> > >
> >> > > This sounds very similar to pid race where socket/connection will outlive
> >> > > the pid.
> >> >
> >> > Nope, completely different.
> >> >
> >>
> >> I think you missed my point.  Passing the file descriptor is not the
> >> problem.  The problem is reuse of the same cgroup name for a different
> >> container while the socket lives on.  And it is the same race as reuse
> >> of a pid for a different process.
> >
> > The cgroup name should not be reused, of course; if userspace does
> > that, it is userspace's issue.  cgroup names are not a constrained
> > namespace like pids, which force the kernel to reuse them for processes
> > of a different nature.
> >
>
> You're proposing a feature that will enshrine cgroups into the API used
> by non-cgroup-controlling applications.  I don't think that anyone
> thinks that cgroups are pretty, so this is an unfortunate thing to
> have to do.
>
> I've suggested three different ways that your goal could be achieved
> without using cgroups at all.  You haven't really addressed any of
> them.

I replied now; none of them strike me as practical or something that
can be enforced.

> In order for something like this to go into the kernel, I would expect
> a real use case and a justification for why this is the right way to
> do it.

I think my justification is quite real, the fact you do not like it
does not really make it any less real.  I am open to suggestions on
alternative methods of course, I do not care which way as long as it is
practical and does not cause unreasonable restrictions on the
containerization.

As far as I could see all of the container stuff uses cgroups already
for various reasons, so using cgroups seems natural.

> "Docker containers can be identified by cgroup path" is completely
> unconvincing to me.

Provide an alternative; so far there is a cgroup with a unique name
associated with every container.  I haven't found any other way to
derive that information in a race-free way so far.

Simo.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-13 17:25 ` Andy Lutomirski
2014-03-13 17:55 ` Simo Sorce
@ 2014-03-13 17:56 ` Tim Hockin
1 sibling, 0 replies; 41+ messages in thread
From: Tim Hockin @ 2014-03-13 17:56 UTC (permalink / raw)
To: Simo Sorce
Cc: Andy Lutomirski, Vivek Goyal, linux-kernel@vger.kernel.org, cgroups,
Network Development, David S. Miller, Tejun Heo, jkaluza, lpoetter, kay

In some sense a cgroup is a pgrp that mere mortals can't escape.  Why
not just do something like that?  root can set this "container id" or
"job id" on your process when it first starts (e.g. docker sets it on
your container process), or even make a cgroup that sets this for all
processes in that cgroup.  ints are better than strings anyway.

On Thu, Mar 13, 2014 at 10:25 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Thu, Mar 13, 2014 at 9:33 AM, Simo Sorce <ssorce@redhat.com> wrote:
>> On Thu, 2014-03-13 at 11:00 -0400, Vivek Goyal wrote:
>>> On Thu, Mar 13, 2014 at 10:55:34AM -0400, Simo Sorce wrote:
>>>
>>> [..]
>>> > > > This might not be quite as awful as I thought.  At least you're
>>> > > > looking up the cgroup at connection time instead of at send time.
>>> > > >
>>> > > > OTOH, this is still racy -- the socket could easily outlive the cgroup
>>> > > > that created it.
>>> > >
>>> > > That's a good point. What guarantees that previous cgroup was not
>>> > > reassigned to a different container.
>>> > >
>>> > > What if a process A opens the connection with sssd. Process A passes the
>>> > > file descriptor to a different process B in a different container.
>>> >
>>> > Stop right here.
>>> > If the process passes the fd it is not my problem anymore.
>>> > The process can as well just 'proxy' all the information to another
>>> > process.
>>> >
>>> > We just care to properly identify the 'original' container, we are not
>>> > in the business of detecting malicious behavior. That's something other
>>> > mechanisms need to protect against (SELinux or other LSMs, normal
>>> > permissions, capabilities, etc...).
>>> >
>>> > > Process A exits. Container gets removed from system and new one gets
>>> > > launched which uses same cgroup as old one. Now process B sends a new
>>> > > request and SSSD will serve it based on policy of newly launched
>>> > > container.
>>> > >
>>> > > This sounds very similar to pid race where socket/connection will outlive
>>> > > the pid.
>>> >
>>> > Nope, completely different.
>>> >
>>>
>>> I think you missed my point.  Passing the file descriptor is not the
>>> problem.  The problem is reuse of the same cgroup name for a different
>>> container while the socket lives on.  And it is the same race as reuse
>>> of a pid for a different process.
>>
>> The cgroup name should not be reused, of course; if userspace does
>> that, it is userspace's issue.  cgroup names are not a constrained
>> namespace like pids, which force the kernel to reuse them for processes
>> of a different nature.
>>
>
> You're proposing a feature that will enshrine cgroups into the API used
> by non-cgroup-controlling applications.  I don't think that anyone
> thinks that cgroups are pretty, so this is an unfortunate thing to
> have to do.
>
> I've suggested three different ways that your goal could be achieved
> without using cgroups at all.  You haven't really addressed any of
> them.
>
> In order for something like this to go into the kernel, I would expect
> a real use case and a justification for why this is the right way to
> do it.
>
> "Docker containers can be identified by cgroup path" is completely
> unconvincing to me.
>
> --Andy

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer
[not found] ` <1394657163-7472-1-git-send-email-vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2014-03-12 20:56 ` Andy Lutomirski
2014-03-12 20:59 ` Simo Sorce
0 siblings, 1 reply; 41+ messages in thread
From: Andy Lutomirski @ 2014-03-12 20:56 UTC (permalink / raw)
To: Vivek Goyal, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
davem-fT/PcQaiUtIeIZ0/mPfg9Q, tj-DgEjT+Ai2ygdnm+yROfE0A
Cc: ssorce-H+wXaHxf7aLQT0dZR+AlfA, jkaluza-H+wXaHxf7aLQT0dZR+AlfA,
lpoetter-H+wXaHxf7aLQT0dZR+AlfA, kay-H+wXaHxf7aLQT0dZR+AlfA

On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> Hi,
>
> This is V2 of patches. Fixed the function format issue and also I was using
> CONFIG_CGROUP instead of CONFIG_CGROUPS. That led to crash at boot. Fixed that.
>
> Some applications like sssd want to know the cgroup of connected peer over
> unix stream socket. They want to use this information to map the cgroup to
> the container client belongs to and then decide what kind of policies apply
> on the container.
>

Can you explain what the use case is?

My a priori opinion is that this is a terrible idea.  cgroups are a
nasty interface, and letting knowledge of cgroups leak into the programs
that live in the groups (as opposed to the cgroup manager) seems like a
huge mistake to me.

If you want to know where in the process hierarchy a message sender is,
add *that* and figure out how to fix the races (it shouldn't be that
hard).

--Andy

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer
2014-03-12 20:56 ` [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer Andy Lutomirski
@ 2014-03-12 20:59 ` Simo Sorce
[not found] ` <1394657970.32465.200.camel-Hs+ccMQdwurzDu64bZtGtWD2FQJk+8+b@public.gmane.org>
0 siblings, 1 reply; 41+ messages in thread
From: Simo Sorce @ 2014-03-12 20:59 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Vivek Goyal, linux-kernel, cgroups, netdev, davem, tj,
jkaluza, lpoetter, kay

On Wed, 2014-03-12 at 13:56 -0700, Andy Lutomirski wrote:
> On 03/12/2014 01:46 PM, Vivek Goyal wrote:
> > Hi,
> >
> > This is V2 of patches. Fixed the function format issue and also I was using
> > CONFIG_CGROUP instead of CONFIG_CGROUPS. That led to crash at boot. Fixed that.
> >
> > Some applications like sssd want to know the cgroup of connected peer over
> > unix stream socket. They want to use this information to map the cgroup to
> > the container client belongs to and then decide what kind of policies apply
> > on the container.
> >
>
> Can you explain what the use case is?

External programs contacted from inside a container want to know 'who'
is contacting them.  Where 'who' is determined by the cgroup they're
put in.  This way these external programs can apply appropriate policy
associated with the specific 'marking' cgroup.

> My a priori opinion is that this is a terrible idea.  cgroups are a
> nasty interface, and letting knowledge of cgroups leak into the programs
> that live in the groups (as opposed to the cgroup manager) seems like a
> huge mistake to me.

I am not sure where you are going; the program that wants to know about
the cgroup is outside the group.

> If you want to know where in the process hierarchy a message sender is,
> add *that* and figure out how to fix the races (it shouldn't be that hard).

What is *that* here?

Simo.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer
[not found] ` <1394657970.32465.200.camel-Hs+ccMQdwurzDu64bZtGtWD2FQJk+8+b@public.gmane.org>
@ 2014-03-12 21:09 ` Andy Lutomirski
0 siblings, 0 replies; 41+ messages in thread
From: Andy Lutomirski @ 2014-03-12 21:09 UTC (permalink / raw)
To: Simo Sorce
Cc: Vivek Goyal, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA, Network Development, David S. Miller,
Tejun Heo, jkaluza-H+wXaHxf7aLQT0dZR+AlfA, lpoetter-H+wXaHxf7aLQT0dZR+AlfA,
kay-H+wXaHxf7aLQT0dZR+AlfA

On Wed, Mar 12, 2014 at 1:59 PM, Simo Sorce <ssorce-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Wed, 2014-03-12 at 13:56 -0700, Andy Lutomirski wrote:
>> On 03/12/2014 01:46 PM, Vivek Goyal wrote:
>> > Hi,
>> >
>> > This is V2 of patches. Fixed the function format issue and also I was using
>> > CONFIG_CGROUP instead of CONFIG_CGROUPS. That led to crash at boot. Fixed that.
>> >
>> > Some applications like sssd want to know the cgroup of connected peer over
>> > unix stream socket. They want to use this information to map the cgroup to
>> > the container client belongs to and then decide what kind of policies apply
>> > on the container.
>> >
>>
>> Can you explain what the use case is?
>
> External programs contacted from inside a container want to know 'who'
> is contacting them.  Where 'who' is determined by the cgroup they're
> put in.  This way these external programs can apply appropriate policy
> associated with the specific 'marking' cgroup.
>
>> My a priori opinion is that this is a terrible idea.  cgroups are a
>> nasty interface, and letting knowledge of cgroups leak into the programs
>> that live in the groups (as opposed to the cgroup manager) seems like a
>> huge mistake to me.
>
> I am not sure where you are going; the program that wants to know about
> the cgroup is outside the group.
>
>> If you want to know where in the process hierarchy a message sender is,
>> add *that* and figure out how to fix the races (it shouldn't be that hard).
>
> What is *that* here?

It sounds like your use case is: systemd shoves a service in a cgroup.
Its children stay in that cgroup.  One of those children sends a
message back to systemd or something that knows about systemd's use of
cgroups and wants to identify which service it is.

Now imagine that you're using a non-systemd cgroup controller, or you
have more than one cgroup hierarchy, or you have two services that want
to share a cgroup.  Or imagine that you're totally happy with systemd
but that you want to use this same facility from something
unprivileged.

So let's rethink this.  There's already SCM_CREDENTIALS for sending
pid, but using pid there is inherently racy.  If that race were fixed
and there were a clean way to look up which process subtree or service
a pid lives in, then I think this would solve your problem.  No cgroups
needed.

--Andy

>
> Simo.
>

--
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 41+ messages in thread
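For reference, the racy pattern the thread keeps coming back to,
sketched in userspace: fetch the peer pid with SO_PEERCRED, then read
/proc/<pid>/cgroup.  Between the two steps the peer can exit and the
pid can be recycled, which is exactly the window SO_PEERCGROUP is
meant to close.  Sketch only, error handling abbreviated:

#define _GNU_SOURCE	/* for struct ucred */
#include <stdio.h>
#include <sys/socket.h>

static int peer_cgroup_racy(int sock, char *buf, size_t buflen)
{
	struct ucred cred;
	socklen_t clen = sizeof(cred);
	char path[64];
	FILE *f;

	if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &cred, &clen) < 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/%d/cgroup", (int)cred.pid);
	f = fopen(path, "r");	/* racy: cred.pid may already be reused */
	if (!f)
		return -1;
	if (!fgets(buf, buflen, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}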
* [PATCH 0/2] net: Implement SO_PEERCGROUP to get cgroup of peer
@ 2014-03-12 18:45 Vivek Goyal
2014-03-12 18:45 ` [PATCH 2/2] net: Implement SO_PEERCGROUP Vivek Goyal
0 siblings, 1 reply; 41+ messages in thread
From: Vivek Goyal @ 2014-03-12 18:45 UTC (permalink / raw)
To: linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay, Vivek Goyal

Some applications like sssd want to know the cgroup of connected peer over
unix stream socket. They want to use this information to map the cgroup to
the container client belongs to and then decide what kind of policies apply
on the container.

Well why not use SO_PEERCRED, extract pid from it and lookup in
/proc/pid/cgroup to figure out cgroup of client. Problem there is that it
is racy. By the time we look up in /proc, it might happen that client
exited (possibly after handing over socket fd to a child), and client pid
can possibly be assigned to another process. That's the reason people are
looking for more reliable mechanism.

There are others like journald who want similar information over unix
datagram sockets. A patchset to provide that functionality was posted
here.

https://lkml.org/lkml/2014/1/13/43

But this was rejected because of overhead it will cause for rest of the
cases.

https://lkml.org/lkml/2014/1/15/480

This patch series implements SO_PEERCGROUP, which is more connection
based and gives the cgroup of client at the time of opening the
connection. So overhead is involved only during connection setup and
there should not be any overhead after that.

So it does not solve all the use cases out there but can solve the needs
of sssd. Hence I am posting this patch.

Please consider it for inclusion.

Thanks
Vivek

Vivek Goyal (2):
  cgroup: Provide empty definition of task_cgroup_path()
  net: Implement SO_PEERCGROUP

 arch/alpha/include/uapi/asm/socket.h   |  1 +
 arch/avr32/include/uapi/asm/socket.h   |  1 +
 arch/cris/include/uapi/asm/socket.h    |  2 ++
 arch/frv/include/uapi/asm/socket.h     |  1 +
 arch/ia64/include/uapi/asm/socket.h    |  2 ++
 arch/m32r/include/uapi/asm/socket.h    |  1 +
 arch/mips/include/uapi/asm/socket.h    |  1 +
 arch/mn10300/include/uapi/asm/socket.h |  1 +
 arch/parisc/include/uapi/asm/socket.h  |  1 +
 arch/powerpc/include/uapi/asm/socket.h |  1 +
 arch/s390/include/uapi/asm/socket.h    |  1 +
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  1 +
 include/linux/cgroup.h                 |  2 ++
 include/net/sock.h                     |  1 +
 include/uapi/asm-generic/socket.h      |  2 ++
 net/core/sock.c                        | 19 ++++++++++++++
 net/unix/af_unix.c                     | 48 ++++++++++++++++++++++++++++++++++
 18 files changed, 88 insertions(+)

--
1.8.5.3

^ permalink raw reply	[flat|nested] 41+ messages in thread
* [PATCH 2/2] net: Implement SO_PEERCGROUP
2014-03-12 18:45 [PATCH 0/2] " Vivek Goyal
@ 2014-03-12 18:45 ` Vivek Goyal
0 siblings, 0 replies; 41+ messages in thread
From: Vivek Goyal @ 2014-03-12 18:45 UTC (permalink / raw)
To: linux-kernel, cgroups, netdev, davem, tj
Cc: ssorce, jkaluza, lpoetter, kay, Vivek Goyal

Implement SO_PEERCGROUP along the lines of SO_PEERCRED. This returns the
cgroup of first mounted hierarchy of the task. For the case of client,
it represents the cgroup of client at the time of opening the connection.
After that client cgroup might change.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 arch/alpha/include/uapi/asm/socket.h   |  1 +
 arch/avr32/include/uapi/asm/socket.h   |  1 +
 arch/cris/include/uapi/asm/socket.h    |  2 ++
 arch/frv/include/uapi/asm/socket.h     |  1 +
 arch/ia64/include/uapi/asm/socket.h    |  2 ++
 arch/m32r/include/uapi/asm/socket.h    |  1 +
 arch/mips/include/uapi/asm/socket.h    |  1 +
 arch/mn10300/include/uapi/asm/socket.h |  1 +
 arch/parisc/include/uapi/asm/socket.h  |  1 +
 arch/powerpc/include/uapi/asm/socket.h |  1 +
 arch/s390/include/uapi/asm/socket.h    |  1 +
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  1 +
 include/net/sock.h                     |  1 +
 include/uapi/asm-generic/socket.h      |  2 ++
 net/core/sock.c                        | 19 ++++++++++++++
 net/unix/af_unix.c                     | 48 ++++++++++++++++++++++++++++++++++
 17 files changed, 86 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 3de1394..7178353 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -87,4 +87,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 6e6cd15..486212b 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/cris/include/uapi/asm/socket.h b/arch/cris/include/uapi/asm/socket.h
index ed94e5e..89a09e3 100644
--- a/arch/cris/include/uapi/asm/socket.h
+++ b/arch/cris/include/uapi/asm/socket.h
@@ -82,6 +82,8 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* _ASM_SOCKET_H */
 
 
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index ca2c6e6..c4d90bc 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -80,5 +80,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index a1b49ba..62c196d 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 6c9a24b..6e04a7d 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index a14baa2..cfbd84b 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -98,4 +98,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 6aa3ce1..73467fe 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -80,4 +80,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index fe35cea..24d8913 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -79,4 +79,5 @@
 
 #define SO_BPF_EXTENSIONS	0x4029
 
+#define SO_PEERCGROUP		0x402a
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index a9c3e2e..50106be 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -87,4 +87,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index e031332..4ae2f3c 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -86,4 +86,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 54d9608..1056168 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -76,6 +76,8 @@
 
 #define SO_BPF_EXTENSIONS	0x0032
 
+#define SO_PEERCGROUP		0x0033
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 39acec0..947bc6e 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -91,4 +91,5 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
 #endif /* _XTENSA_SOCKET_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 5c3f7c3..d594575 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -424,6 +424,7 @@ struct sock {
 	int			(*sk_backlog_rcv)(struct sock *sk,
 						  struct sk_buff *skb);
 	void			(*sk_destruct)(struct sock *sk);
+	char			*cgroup_path;
 };
 
 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index ea0796b..e86be5b 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -82,4 +82,6 @@
 
 #define SO_BPF_EXTENSIONS	48
 
+#define SO_PEERCGROUP		49
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 5b6a943..0827a3c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1185,6 +1185,24 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_max_pacing_rate;
 		break;
 
+	case SO_PEERCGROUP:
+	{
+		int cgroup_path_len;
+
+		if (!sk->cgroup_path) {
+			len = 0;
+			goto lenout;
+		}
+
+		cgroup_path_len = strlen(sk->cgroup_path) + 1;
+
+		if (len > cgroup_path_len)
+			len = cgroup_path_len;
+		if (copy_to_user(optval, sk->cgroup_path, len))
+			return -EFAULT;
+		goto lenout;
+	}
+
 	default:
 		return -ENOPROTOOPT;
 	}
@@ -1378,6 +1396,7 @@ static void __sk_free(struct sock *sk)
 		put_cred(sk->sk_peer_cred);
 	put_pid(sk->sk_peer_pid);
 	put_net(sock_net(sk));
+	kfree(sk->cgroup_path);
 	sk_prot_free(sk->sk_prot_creator, sk);
 }
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 29fc8be..e35105f 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -474,6 +474,37 @@ static void copy_peercred(struct sock *sk, struct sock *peersk)
 	sk->sk_peer_cred = get_cred(peersk->sk_peer_cred);
 }
 
+static int alloc_cgroup_path(struct sock *sk)
+{
+#ifdef CONFIG_CGROUP
+	if (sk->cgroup_path)
+		return 0;
+
+	sk->cgroup_path = kzalloc(PATH_MAX, GFP_KERNEL);
+	if (!sk->cgroup_path)
+		return -ENOMEM;
+
+#endif
+	return 0;
+}
+
+static int init_peercgroup(struct sock *sk)
+{
+	int ret;
+
+	ret = alloc_cgroup_path(sk);
+	if (ret)
+		return ret;
+
+	return task_cgroup_path(current, sk->cgroup_path, PATH_MAX);
+}
+
+static void copy_peercgroup(struct sock *sk, struct sock *peersk)
+{
+	if (sk->cgroup_path)
+		strncpy(sk->cgroup_path, peersk->cgroup_path, PATH_MAX);
+}
+
 static int unix_listen(struct socket *sock, int backlog)
 {
 	int err;
@@ -487,6 +518,12 @@ static int unix_listen(struct socket *sock, int backlog)
 	err = -EINVAL;
 	if (!u->addr)
 		goto out;	/* No listens on an unbound socket */
+
+	err = init_peercgroup(sk);
+	if (err)
+		goto out;
+
+	err = -EINVAL;
 	unix_state_lock(sk);
 	if (sk->sk_state != TCP_CLOSE && sk->sk_state != TCP_LISTEN)
 		goto out_unlock;
@@ -1098,6 +1135,16 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	if (newsk == NULL)
 		goto out;
 
+	err = init_peercgroup(newsk);
+	if (err)
+		goto out;
+
+	err = alloc_cgroup_path(sk);
+	if (err)
+		goto out;
+
+	err = -ENOMEM;
+
 	/* Allocate skb for sending to listening sock */
 	skb = sock_wmalloc(newsk, 1, 0, GFP_KERNEL);
 	if (skb == NULL)
@@ -1203,6 +1250,7 @@ restart:
 
 	/* Set credentials */
 	copy_peercred(sk, other);
+	copy_peercgroup(sk, other);
 
 	sock->state = SS_CONNECTED;
 	sk->sk_state = TCP_ESTABLISHED;
--
1.8.5.3

^ permalink raw reply related	[flat|nested] 41+ messages in thread
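Assuming the patch above were applied, a server such as sssd would
consume the new option roughly as follows. Sketch only; the constant is
taken from the asm-generic header in the patch in case libc headers do
not define it:

#include <limits.h>
#include <sys/socket.h>

#ifndef SO_PEERCGROUP
#define SO_PEERCGROUP 49	/* value from the asm-generic header above */
#endif

/* buf must have room for PATH_MAX bytes; returns the length filled in,
 * or -1.  The kernel reports the cgroup recorded at connect() time,
 * so this does not race with the client exiting afterwards. */
static int peer_cgroup(int sock, char *buf)
{
	socklen_t len = PATH_MAX;

	if (getsockopt(sock, SOL_SOCKET, SO_PEERCGROUP, buf, &len) < 0)
		return -1;
	return (int)len;
}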
Thread overview: 41+ messages
2014-03-12 20:46 [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer Vivek Goyal
2014-03-12 20:46 ` [PATCH 1/2] cgroup: Provide empty definition of task_cgroup_path() Vivek Goyal
2014-03-12 20:46 ` [PATCH 2/2] net: Implement SO_PEERCGROUP Vivek Goyal
2014-03-12 20:58 ` Cong Wang
2014-03-13 13:48 ` Vivek Goyal
2014-03-12 21:00 ` Andy Lutomirski
2014-03-12 21:12 ` Andy Lutomirski
2014-03-12 21:16 ` Simo Sorce
2014-03-12 21:19 ` Andy Lutomirski
2014-03-13 1:17 ` Simo Sorce
2014-03-13 1:21 ` Andy Lutomirski
2014-03-13 1:43 ` Simo Sorce
2014-03-13 2:12 ` Andy Lutomirski
2014-03-13 14:27 ` Vivek Goyal
2014-03-14 23:54 ` Eric W. Biederman
2014-03-13 17:51 ` Simo Sorce
2014-03-13 17:55 ` Andy Lutomirski
2014-03-13 17:57 ` Simo Sorce
2014-03-13 18:03 ` Andy Lutomirski
2014-03-13 17:58 ` Simo Sorce
2014-03-13 18:01 ` Andy Lutomirski
2014-03-13 18:05 ` Tim Hockin
2014-03-13 19:53 ` Vivek Goyal
2014-03-13 19:58 ` Andy Lutomirski
2014-03-13 20:06 ` Vivek Goyal
2014-03-13 20:17 ` Vivek Goyal
2014-03-13 20:19 ` Vivek Goyal
2014-03-13 21:21 ` Andy Lutomirski
2014-03-14 23:49 ` Eric W. Biederman
2014-03-13 18:02 ` Vivek Goyal
2014-03-13 14:14 ` Vivek Goyal
2014-03-13 14:55 ` Simo Sorce
2014-03-13 15:00 ` Vivek Goyal
2014-03-13 16:33 ` Simo Sorce
2014-03-13 17:25 ` Andy Lutomirski
2014-03-13 17:55 ` Simo Sorce
2014-03-13 17:56 ` Tim Hockin
2014-03-12 20:56 ` [PATCH 0/2][V2] net: Implement SO_PEERCGROUP to get cgroup of peer Andy Lutomirski
2014-03-12 20:59 ` Simo Sorce
2014-03-12 21:09 ` Andy Lutomirski