netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id
@ 2025-03-09 13:28 Alexander Mikhalitsyn
  2025-03-09 13:28 ` [PATCH net-next 1/4] net: unix: print cgroup_id and peer_cgroup_id in fdinfo Alexander Mikhalitsyn
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Alexander Mikhalitsyn @ 2025-03-09 13:28 UTC (permalink / raw)
  To: kuniyu
  Cc: Alexander Mikhalitsyn, linux-kernel, netdev, cgroups,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Leon Romanovsky, Arnd Bergmann,
	Christian Brauner, Lennart Poettering, Luca Boccassi, Tejun Heo,
	Johannes Weiner, Michal Koutný, Shuah Khan

1. Add socket cgroup id and socket's peer cgroup id in socket's fdinfo
2. Add SO_PEERCGROUPID which allows to retrieve socket's peer cgroup id
3. Add SO_PEERCGROUPID kselftest

Generally speaking, this API allows race-free resolution of socket's peer cgroup id.
Currently, to do that SCM_CREDENTIALS/SCM_PIDFD -> pid -> /proc/<pid>/cgroup sequence
is used which is racy.

As we don't add any new state to the socket itself there is no potential locking issues
or performance problems. We use already existing sk->sk_cgrp_data.

We already have analogical interfaces to retrieve this
information:
- inet_diag: INET_DIAG_CGROUP_ID
- eBPF: bpf_sk_cgroup_id

Having getsockopt() interface makes sense for many applications, because using eBPF is
not always an option, while inet_diag has obvious complexety and performance drawbacks
if we only want to get this specific info for one specific socket.

Idea comes from UAPI kernel group:
https://uapi-group.org/kernel-features/

Huge thanks to Christian Brauner, Lennart Poettering and Luca Boccassi for proposing
and exchanging ideas about this.

Git tree:
https://github.com/mihalicyn/linux/tree/so_peercgroupid

Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Shuah Khan <shuah@kernel.org>

Alexander Mikhalitsyn (4):
  net: unix: print cgroup_id and peer_cgroup_id in fdinfo
  net: core: add getsockopt SO_PEERCGROUPID
  tools/testing/selftests/cgroup/cgroup_util: add cg_get_id helper
  tools/testing/selftests/cgroup: add test for SO_PEERCGROUPID

 arch/alpha/include/uapi/asm/socket.h          |   2 +
 arch/mips/include/uapi/asm/socket.h           |   2 +
 arch/parisc/include/uapi/asm/socket.h         |   2 +
 arch/sparc/include/uapi/asm/socket.h          |   2 +
 include/uapi/asm-generic/socket.h             |   2 +
 net/core/sock.c                               |  17 +
 net/unix/af_unix.c                            |  84 +++++
 tools/include/uapi/asm-generic/socket.h       |   2 +
 tools/testing/selftests/cgroup/Makefile       |   2 +
 tools/testing/selftests/cgroup/cgroup_util.c  |  15 +
 tools/testing/selftests/cgroup/cgroup_util.h  |   2 +
 .../selftests/cgroup/test_so_peercgroupid.c   | 308 ++++++++++++++++++
 12 files changed, 440 insertions(+)
 create mode 100644 tools/testing/selftests/cgroup/test_so_peercgroupid.c

-- 
2.43.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH net-next 1/4] net: unix: print cgroup_id and peer_cgroup_id in fdinfo
  2025-03-09 13:28 [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Alexander Mikhalitsyn
@ 2025-03-09 13:28 ` Alexander Mikhalitsyn
  2025-03-11  7:41   ` Kuniyuki Iwashima
  2025-03-09 13:28 ` [PATCH net-next 2/4] net: core: add getsockopt SO_PEERCGROUPID Alexander Mikhalitsyn
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Alexander Mikhalitsyn @ 2025-03-09 13:28 UTC (permalink / raw)
  To: kuniyu
  Cc: Alexander Mikhalitsyn, linux-kernel, netdev, cgroups,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Leon Romanovsky, Arnd Bergmann,
	Christian Brauner, Lennart Poettering, Luca Boccassi, Tejun Heo,
	Johannes Weiner, Michal Koutný

Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 net/unix/af_unix.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 7f8f3859cdb3..2b2c0036efc9 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -117,6 +117,7 @@
 #include <linux/file.h>
 #include <linux/btf_ids.h>
 #include <linux/bpf-cgroup.h>
+#include <linux/cgroup.h>
 
 static atomic_long_t unix_nr_socks;
 static struct hlist_head bsd_socket_buckets[UNIX_HASH_SIZE / 2];
@@ -861,6 +862,11 @@ static void unix_show_fdinfo(struct seq_file *m, struct socket *sock)
 	int nr_fds = 0;
 
 	if (sk) {
+#ifdef CONFIG_SOCK_CGROUP_DATA
+		struct sock *peer;
+		u64 sk_cgroup_id = 0;
+#endif
+
 		s_state = READ_ONCE(sk->sk_state);
 		u = unix_sk(sk);
 
@@ -874,6 +880,21 @@ static void unix_show_fdinfo(struct seq_file *m, struct socket *sock)
 			nr_fds = unix_count_nr_fds(sk);
 
 		seq_printf(m, "scm_fds: %u\n", nr_fds);
+
+#ifdef CONFIG_SOCK_CGROUP_DATA
+		sk_cgroup_id = cgroup_id(sock_cgroup_ptr(&sk->sk_cgrp_data));
+		seq_printf(m, "cgroup_id: %llu\n", sk_cgroup_id);
+
+		peer = unix_peer_get(sk);
+		if (peer) {
+			u64 peer_cgroup_id = 0;
+
+			peer_cgroup_id = cgroup_id(sock_cgroup_ptr(&peer->sk_cgrp_data));
+			sock_put(peer);
+
+			seq_printf(m, "peer_cgroup_id: %llu\n", peer_cgroup_id);
+		}
+#endif
 	}
 }
 #else
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 2/4] net: core: add getsockopt SO_PEERCGROUPID
  2025-03-09 13:28 [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Alexander Mikhalitsyn
  2025-03-09 13:28 ` [PATCH net-next 1/4] net: unix: print cgroup_id and peer_cgroup_id in fdinfo Alexander Mikhalitsyn
@ 2025-03-09 13:28 ` Alexander Mikhalitsyn
  2025-03-10 14:00   ` Willem de Bruijn
  2025-03-11  7:52   ` Kuniyuki Iwashima
  2025-03-09 13:28 ` [PATCH net-next 3/4] tools/testing/selftests/cgroup/cgroup_util: add cg_get_id helper Alexander Mikhalitsyn
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 16+ messages in thread
From: Alexander Mikhalitsyn @ 2025-03-09 13:28 UTC (permalink / raw)
  To: kuniyu
  Cc: Alexander Mikhalitsyn, linux-kernel, netdev, cgroups,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Leon Romanovsky, Arnd Bergmann,
	Christian Brauner, Lennart Poettering, Luca Boccassi, Tejun Heo,
	Johannes Weiner, Michal Koutný

Add SO_PEERCGROUPID which allows to get cgroup_id
for a socket.

We already have analogical interfaces to retrieve this
information:
- inet_diag: INET_DIAG_CGROUP_ID
- eBPF: bpf_sk_cgroup_id

Having getsockopt() interface makes sense for many
applications, because using eBPF is not always an option,
while inet_diag has obvious complexety and performance drawbacks
if we only want to get this specific info for one specific socket.

Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 arch/alpha/include/uapi/asm/socket.h    |  2 +
 arch/mips/include/uapi/asm/socket.h     |  2 +
 arch/parisc/include/uapi/asm/socket.h   |  2 +
 arch/sparc/include/uapi/asm/socket.h    |  2 +
 include/uapi/asm-generic/socket.h       |  2 +
 net/core/sock.c                         | 17 +++++++
 net/unix/af_unix.c                      | 63 +++++++++++++++++++++++++
 tools/include/uapi/asm-generic/socket.h |  2 +
 8 files changed, 92 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 3df5f2dd4c0f..58ce457b2c09 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -150,6 +150,8 @@
 
 #define SO_RCVPRIORITY		82
 
+#define SO_PEERCGROUPID		83
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 22fa8f19924a..823fa67f7d79 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -161,6 +161,8 @@
 
 #define SO_RCVPRIORITY		82
 
+#define SO_PEERCGROUPID		83
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index aa9cd4b951fe..1ee2e858d177 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -142,6 +142,8 @@
 
 #define SO_RCVPRIORITY		0x404D
 
+#define SO_PEERCGROUPID		0x404E
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 5b464a568664..2fe7d0c48a63 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -143,6 +143,8 @@
 
 #define SO_RCVPRIORITY           0x005b
 
+#define SO_PEERCGROUPID          0x005c
+
 #if !defined(__KERNEL__)
 
 
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index aa5016ff3d91..903904bb537c 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -145,6 +145,8 @@
 
 #define SO_RCVPRIORITY		82
 
+#define SO_PEERCGROUPID		83
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/sock.c b/net/core/sock.c
index a0598518ce89..6dc0b1a8367b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1946,6 +1946,23 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 		goto lenout;
 	}
 
+#ifdef CONFIG_SOCK_CGROUP_DATA
+	case SO_PEERCGROUPID:
+	{
+		const struct proto_ops *ops;
+
+		if (sk->sk_family != AF_UNIX)
+			return -EOPNOTSUPP;
+
+		ops = READ_ONCE(sock->ops);
+		if (!ops->getsockopt)
+			return -EOPNOTSUPP;
+
+		return ops->getsockopt(sock, SOL_SOCKET, optname, optval.user,
+				       optlen.user);
+	}
+#endif
+
 	/* Dubious BSD thing... Probably nobody even uses it, but
 	 * the UNIX standard wants it for whatever reason... -DaveM
 	 */
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2b2c0036efc9..3455f38f033d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -901,6 +901,66 @@ static void unix_show_fdinfo(struct seq_file *m, struct socket *sock)
 #define unix_show_fdinfo NULL
 #endif
 
+static int unix_getsockopt(struct socket *sock, int level, int optname,
+			   char __user *optval, int __user *optlen)
+{
+	struct sock *sk = sock->sk;
+
+	union {
+		int val;
+		u64 val64;
+	} v;
+
+	int lv = sizeof(int);
+	int len;
+
+	if (level != SOL_SOCKET)
+		return -ENOPROTOOPT;
+
+	if (get_user(len, optlen))
+		return -EFAULT;
+
+	if (len < 0)
+		return -EINVAL;
+
+	memset(&v, 0, sizeof(v));
+
+	switch (optname) {
+#ifdef CONFIG_SOCK_CGROUP_DATA
+	case SO_PEERCGROUPID:
+	{
+		struct sock *peer;
+		u64 peer_cgroup_id = 0;
+
+		lv = sizeof(u64);
+		if (len < lv)
+			return -EINVAL;
+
+		peer = unix_peer_get(sk);
+		if (!peer)
+			return -ENODATA;
+
+		peer_cgroup_id = cgroup_id(sock_cgroup_ptr(&peer->sk_cgrp_data));
+		sock_put(peer);
+
+		v.val64 = peer_cgroup_id;
+		break;
+	}
+#endif
+	default:
+		return -ENOPROTOOPT;
+	}
+
+	if (len > lv)
+		len = lv;
+	if (copy_to_user(optval, &v, len))
+		return -EFAULT;
+	if (put_user(len, optlen))
+		return -EFAULT;
+
+	return 0;
+}
+
 static const struct proto_ops unix_stream_ops = {
 	.family =	PF_UNIX,
 	.owner =	THIS_MODULE,
@@ -910,6 +970,7 @@ static const struct proto_ops unix_stream_ops = {
 	.socketpair =	unix_socketpair,
 	.accept =	unix_accept,
 	.getname =	unix_getname,
+	.getsockopt =	unix_getsockopt,
 	.poll =		unix_poll,
 	.ioctl =	unix_ioctl,
 #ifdef CONFIG_COMPAT
@@ -935,6 +996,7 @@ static const struct proto_ops unix_dgram_ops = {
 	.socketpair =	unix_socketpair,
 	.accept =	sock_no_accept,
 	.getname =	unix_getname,
+	.getsockopt =	unix_getsockopt,
 	.poll =		unix_dgram_poll,
 	.ioctl =	unix_ioctl,
 #ifdef CONFIG_COMPAT
@@ -959,6 +1021,7 @@ static const struct proto_ops unix_seqpacket_ops = {
 	.socketpair =	unix_socketpair,
 	.accept =	unix_accept,
 	.getname =	unix_getname,
+	.getsockopt =	unix_getsockopt,
 	.poll =		unix_dgram_poll,
 	.ioctl =	unix_ioctl,
 #ifdef CONFIG_COMPAT
diff --git a/tools/include/uapi/asm-generic/socket.h b/tools/include/uapi/asm-generic/socket.h
index aa5016ff3d91..903904bb537c 100644
--- a/tools/include/uapi/asm-generic/socket.h
+++ b/tools/include/uapi/asm-generic/socket.h
@@ -145,6 +145,8 @@
 
 #define SO_RCVPRIORITY		82
 
+#define SO_PEERCGROUPID		83
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 3/4] tools/testing/selftests/cgroup/cgroup_util: add cg_get_id helper
  2025-03-09 13:28 [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Alexander Mikhalitsyn
  2025-03-09 13:28 ` [PATCH net-next 1/4] net: unix: print cgroup_id and peer_cgroup_id in fdinfo Alexander Mikhalitsyn
  2025-03-09 13:28 ` [PATCH net-next 2/4] net: core: add getsockopt SO_PEERCGROUPID Alexander Mikhalitsyn
@ 2025-03-09 13:28 ` Alexander Mikhalitsyn
  2025-03-09 13:28 ` [PATCH net-next 4/4] tools/testing/selftests/cgroup: add test for SO_PEERCGROUPID Alexander Mikhalitsyn
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Alexander Mikhalitsyn @ 2025-03-09 13:28 UTC (permalink / raw)
  To: kuniyu
  Cc: Alexander Mikhalitsyn, linux-kselftest, linux-kernel, netdev,
	cgroups, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, Leon Romanovsky, Arnd Bergmann,
	Christian Brauner, Lennart Poettering, Luca Boccassi, Tejun Heo,
	Johannes Weiner, Michal Koutný, Shuah Khan

Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 tools/testing/selftests/cgroup/cgroup_util.c | 15 +++++++++++++++
 tools/testing/selftests/cgroup/cgroup_util.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c
index 1e2d46636a0c..b60e0e1433f4 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -205,6 +205,21 @@ int cg_open(const char *cgroup, const char *control, int flags)
 	return open(path, flags);
 }
 
+/*
+ * Returns cgroup id on success, or -1 on failure.
+ */
+uint64_t cg_get_id(const char *cgroup)
+{
+	struct stat st;
+	int ret;
+
+	ret = stat(cgroup, &st);
+	if (ret)
+		return -1;
+
+	return st.st_ino;
+}
+
 int cg_write_numeric(const char *cgroup, const char *control, long value)
 {
 	char buf[64];
diff --git a/tools/testing/selftests/cgroup/cgroup_util.h b/tools/testing/selftests/cgroup/cgroup_util.h
index 19b131ee7707..3f2d9676ceda 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.h
+++ b/tools/testing/selftests/cgroup/cgroup_util.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <stdbool.h>
+#include <stdint.h>
 #include <stdlib.h>
 
 #include "../kselftest.h"
@@ -39,6 +40,7 @@ long cg_read_key_long(const char *cgroup, const char *control, const char *key);
 extern long cg_read_lc(const char *cgroup, const char *control);
 extern int cg_write(const char *cgroup, const char *control, char *buf);
 extern int cg_open(const char *cgroup, const char *control, int flags);
+extern uint64_t cg_get_id(const char *cgroup);
 int cg_write_numeric(const char *cgroup, const char *control, long value);
 extern int cg_run(const char *cgroup,
 		  int (*fn)(const char *cgroup, void *arg),
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH net-next 4/4] tools/testing/selftests/cgroup: add test for SO_PEERCGROUPID
  2025-03-09 13:28 [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Alexander Mikhalitsyn
                   ` (2 preceding siblings ...)
  2025-03-09 13:28 ` [PATCH net-next 3/4] tools/testing/selftests/cgroup/cgroup_util: add cg_get_id helper Alexander Mikhalitsyn
@ 2025-03-09 13:28 ` Alexander Mikhalitsyn
  2025-03-10 13:59   ` Willem de Bruijn
  2025-03-11  8:02   ` Kuniyuki Iwashima
  2025-03-09 14:36 ` [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Tejun Heo
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 16+ messages in thread
From: Alexander Mikhalitsyn @ 2025-03-09 13:28 UTC (permalink / raw)
  To: kuniyu
  Cc: Alexander Mikhalitsyn, linux-kselftest, linux-kernel, netdev,
	cgroups, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, Leon Romanovsky, Arnd Bergmann,
	Christian Brauner, Lennart Poettering, Luca Boccassi, Tejun Heo,
	Johannes Weiner, Michal Koutný, Shuah Khan

Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Cc: Luca Boccassi <bluca@debian.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
---
 tools/testing/selftests/cgroup/Makefile       |   2 +
 .../selftests/cgroup/test_so_peercgroupid.c   | 308 ++++++++++++++++++
 2 files changed, 310 insertions(+)
 create mode 100644 tools/testing/selftests/cgroup/test_so_peercgroupid.c

diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
index 1b897152bab6..a932ff068081 100644
--- a/tools/testing/selftests/cgroup/Makefile
+++ b/tools/testing/selftests/cgroup/Makefile
@@ -16,6 +16,7 @@ TEST_GEN_PROGS += test_kill
 TEST_GEN_PROGS += test_kmem
 TEST_GEN_PROGS += test_memcontrol
 TEST_GEN_PROGS += test_pids
+TEST_GEN_PROGS += test_so_peercgroupid
 TEST_GEN_PROGS += test_zswap
 
 LOCAL_HDRS += $(selfdir)/clone3/clone3_selftests.h $(selfdir)/pidfd/pidfd.h
@@ -31,4 +32,5 @@ $(OUTPUT)/test_kill: cgroup_util.c
 $(OUTPUT)/test_kmem: cgroup_util.c
 $(OUTPUT)/test_memcontrol: cgroup_util.c
 $(OUTPUT)/test_pids: cgroup_util.c
+$(OUTPUT)/test_so_peercgroupid: cgroup_util.c
 $(OUTPUT)/test_zswap: cgroup_util.c
diff --git a/tools/testing/selftests/cgroup/test_so_peercgroupid.c b/tools/testing/selftests/cgroup/test_so_peercgroupid.c
new file mode 100644
index 000000000000..2bf1f00a45c7
--- /dev/null
+++ b/tools/testing/selftests/cgroup/test_so_peercgroupid.c
@@ -0,0 +1,308 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+#define _GNU_SOURCE
+#include <error.h>
+#include <inttypes.h>
+#include <limits.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/socket.h>
+#include <linux/socket.h>
+#include <unistd.h>
+#include <string.h>
+#include <errno.h>
+#include <sys/un.h>
+#include <sys/signal.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+#include "../kselftest_harness.h"
+#include "cgroup_util.h"
+
+#define clean_errno() (errno == 0 ? "None" : strerror(errno))
+#define log_err(MSG, ...)                                                   \
+	fprintf(stderr, "(%s:%d: errno: %s) " MSG "\n", __FILE__, __LINE__, \
+		clean_errno(), ##__VA_ARGS__)
+
+#ifndef SO_PEERCGROUPID
+#define SO_PEERCGROUPID 83
+#endif
+
+static void child_die()
+{
+	exit(1);
+}
+
+struct sock_addr {
+	char sock_name[32];
+	struct sockaddr_un listen_addr;
+	socklen_t addrlen;
+};
+
+FIXTURE(so_peercgroupid)
+{
+	int server;
+	pid_t client_pid;
+	int sync_sk[2];
+	struct sock_addr server_addr;
+	struct sock_addr *client_addr;
+	char cgroup_root[PATH_MAX];
+	char *test_cgroup1;
+	char *test_cgroup2;
+};
+
+FIXTURE_VARIANT(so_peercgroupid)
+{
+	int type;
+	bool abstract;
+};
+
+FIXTURE_VARIANT_ADD(so_peercgroupid, stream_pathname)
+{
+	.type = SOCK_STREAM,
+	.abstract = 0,
+};
+
+FIXTURE_VARIANT_ADD(so_peercgroupid, stream_abstract)
+{
+	.type = SOCK_STREAM,
+	.abstract = 1,
+};
+
+FIXTURE_VARIANT_ADD(so_peercgroupid, seqpacket_pathname)
+{
+	.type = SOCK_SEQPACKET,
+	.abstract = 0,
+};
+
+FIXTURE_VARIANT_ADD(so_peercgroupid, seqpacket_abstract)
+{
+	.type = SOCK_SEQPACKET,
+	.abstract = 1,
+};
+
+FIXTURE_VARIANT_ADD(so_peercgroupid, dgram_pathname)
+{
+	.type = SOCK_DGRAM,
+	.abstract = 0,
+};
+
+FIXTURE_VARIANT_ADD(so_peercgroupid, dgram_abstract)
+{
+	.type = SOCK_DGRAM,
+	.abstract = 1,
+};
+
+FIXTURE_SETUP(so_peercgroupid)
+{
+	self->client_addr = mmap(NULL, sizeof(*self->client_addr), PROT_READ | PROT_WRITE,
+				 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+	ASSERT_NE(MAP_FAILED, self->client_addr);
+
+	self->cgroup_root[0] = '\0';
+}
+
+FIXTURE_TEARDOWN(so_peercgroupid)
+{
+	close(self->server);
+
+	kill(self->client_pid, SIGKILL);
+	waitpid(self->client_pid, NULL, 0);
+
+	if (!variant->abstract) {
+		unlink(self->server_addr.sock_name);
+		unlink(self->client_addr->sock_name);
+	}
+
+	if (strlen(self->cgroup_root) > 0) {
+		cg_enter_current(self->cgroup_root);
+
+		if (self->test_cgroup1)
+			cg_destroy(self->test_cgroup1);
+		free(self->test_cgroup1);
+
+		if (self->test_cgroup2)
+			cg_destroy(self->test_cgroup2);
+		free(self->test_cgroup2);
+	}
+}
+
+static void fill_sockaddr(struct sock_addr *addr, bool abstract)
+{
+	char *sun_path_buf = (char *)&addr->listen_addr.sun_path;
+
+	addr->listen_addr.sun_family = AF_UNIX;
+	addr->addrlen = offsetof(struct sockaddr_un, sun_path);
+	snprintf(addr->sock_name, sizeof(addr->sock_name), "so_peercgroupid_%d", getpid());
+	addr->addrlen += strlen(addr->sock_name);
+	if (abstract) {
+		*sun_path_buf = '\0';
+		addr->addrlen++;
+		sun_path_buf++;
+	} else {
+		unlink(addr->sock_name);
+	}
+	memcpy(sun_path_buf, addr->sock_name, strlen(addr->sock_name));
+}
+
+static void client(FIXTURE_DATA(so_peercgroupid) *self,
+		   const FIXTURE_VARIANT(so_peercgroupid) *variant)
+{
+	int cfd, err;
+	socklen_t len;
+	uint64_t peer_cgroup_id = 0, test_cgroup1_id = 0, test_cgroup2_id = 0;
+	char state;
+
+	cfd = socket(AF_UNIX, variant->type, 0);
+	if (cfd < 0) {
+		log_err("socket");
+		child_die();
+	}
+
+	if (variant->type == SOCK_DGRAM) {
+		fill_sockaddr(self->client_addr, variant->abstract);
+
+		if (bind(cfd, (struct sockaddr *)&self->client_addr->listen_addr, self->client_addr->addrlen)) {
+			log_err("bind");
+			child_die();
+		}
+	}
+
+	/* negative testcase: no peer for socket yet */
+	len = sizeof(peer_cgroup_id);
+	err = getsockopt(cfd, SOL_SOCKET, SO_PEERCGROUPID, &peer_cgroup_id, &len);
+	if (!err || (errno != ENODATA)) {
+		log_err("getsockopt must fail with errno == ENODATA when socket has no peer");
+		child_die();
+	}
+
+	if (connect(cfd, (struct sockaddr *)&self->server_addr.listen_addr,
+		    self->server_addr.addrlen) != 0) {
+		log_err("connect");
+		child_die();
+	}
+
+	state = 'R';
+	write(self->sync_sk[1], &state, sizeof(state));
+
+	read(self->sync_sk[1], &test_cgroup1_id, sizeof(uint64_t));
+	read(self->sync_sk[1], &test_cgroup2_id, sizeof(uint64_t));
+
+	len = sizeof(peer_cgroup_id);
+	if (getsockopt(cfd, SOL_SOCKET, SO_PEERCGROUPID, &peer_cgroup_id, &len)) {
+		log_err("Failed to get SO_PEERCGROUPID");
+		child_die();
+	}
+
+	/*
+	 * There is a difference between connection-oriented sockets
+	 * and connectionless ones from the perspective of SO_PEERCGROUPID.
+	 *
+	 * sk->sk_cgrp_data is getting filled when we allocate struct sock (see call to cgroup_sk_alloc()).
+	 * For DGRAM socket, self->server socket is our peer and by the time when we allocate it,
+	 * parent process sits in a test_cgroup1. Then it changes cgroup to test_cgroup2, but it does not
+	 * affect anything.
+	 * For STREAM/SEQPACKET socket, self->server is not our peer, but that one we get from accept()
+	 * syscall. And by the time when we call accept(), parent process sits in test_cgroup2.
+	 *
+	 * Let's ensure that it works like that and if it get changed then we should detect it
+	 * as it's a clear UAPI change.
+	 */
+	if (variant->type == SOCK_DGRAM) {
+		/* cgroup id from SO_PEERCGROUPID should be equal to the test_cgroup1_id */
+		if (peer_cgroup_id != test_cgroup1_id) {
+			log_err("peer_cgroup_id != test_cgroup1_id: %" PRId64 " != %" PRId64, peer_cgroup_id, test_cgroup1_id);
+			child_die();
+		}
+	} else {
+		/* cgroup id from SO_PEERCGROUPID should be equal to the test_cgroup2_id */
+		if (peer_cgroup_id != test_cgroup2_id) {
+			log_err("peer_cgroup_id != test_cgroup2_id: %" PRId64 " != %" PRId64, peer_cgroup_id, test_cgroup2_id);
+			child_die();
+		}
+	}
+}
+
+TEST_F(so_peercgroupid, test)
+{
+	uint64_t test_cgroup1_id, test_cgroup2_id;
+	int err;
+	int pfd;
+	char state;
+	int child_status = 0;
+
+	if (cg_find_unified_root(self->cgroup_root, sizeof(self->cgroup_root), NULL))
+		ksft_exit_skip("cgroup v2 isn't mounted\n");
+
+	self->test_cgroup1 = cg_name(self->cgroup_root, "so_peercgroupid_cg1");
+	ASSERT_NE(NULL, self->test_cgroup1);
+
+	self->test_cgroup2 = cg_name(self->cgroup_root, "so_peercgroupid_cg2");
+	ASSERT_NE(NULL, self->test_cgroup2);
+
+	err = cg_create(self->test_cgroup1);
+	ASSERT_EQ(0, err);
+
+	err = cg_create(self->test_cgroup2);
+	ASSERT_EQ(0, err);
+
+	test_cgroup1_id = cg_get_id(self->test_cgroup1);
+	ASSERT_LT(0, test_cgroup1_id);
+
+	test_cgroup2_id = cg_get_id(self->test_cgroup2);
+	ASSERT_LT(0, test_cgroup2_id);
+
+	/* enter test_cgroup1 before allocating a socket */
+	err = cg_enter_current(self->test_cgroup1);
+	ASSERT_EQ(0, err);
+
+	self->server = socket(AF_UNIX, variant->type, 0);
+	ASSERT_NE(-1, self->server);
+
+	/* enter test_cgroup2 after allocating a socket */
+	err = cg_enter_current(self->test_cgroup2);
+	ASSERT_EQ(0, err);
+
+	fill_sockaddr(&self->server_addr, variant->abstract);
+
+	err = bind(self->server, (struct sockaddr *)&self->server_addr.listen_addr, self->server_addr.addrlen);
+	ASSERT_EQ(0, err);
+
+	if (variant->type != SOCK_DGRAM) {
+		err = listen(self->server, 1);
+		ASSERT_EQ(0, err);
+	}
+
+	err = socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0, self->sync_sk);
+	EXPECT_EQ(err, 0);
+
+	self->client_pid = fork();
+	ASSERT_NE(-1, self->client_pid);
+	if (self->client_pid == 0) {
+		close(self->server);
+		close(self->sync_sk[0]);
+		client(self, variant);
+		exit(0);
+	}
+	close(self->sync_sk[1]);
+
+	if (variant->type != SOCK_DGRAM) {
+		pfd = accept(self->server, NULL, NULL);
+		ASSERT_NE(-1, pfd);
+	} else {
+		pfd = self->server;
+	}
+
+	/* wait until the child arrives at checkpoint */
+	read(self->sync_sk[0], &state, sizeof(state));
+	ASSERT_EQ(state, 'R');
+
+	write(self->sync_sk[0], &test_cgroup1_id, sizeof(uint64_t));
+	write(self->sync_sk[0], &test_cgroup2_id, sizeof(uint64_t));
+
+	close(pfd);
+	waitpid(self->client_pid, &child_status, 0);
+	ASSERT_EQ(0, WIFEXITED(child_status) ? WEXITSTATUS(child_status) : 1);
+}
+
+TEST_HARNESS_MAIN
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id
  2025-03-09 13:28 [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Alexander Mikhalitsyn
                   ` (3 preceding siblings ...)
  2025-03-09 13:28 ` [PATCH net-next 4/4] tools/testing/selftests/cgroup: add test for SO_PEERCGROUPID Alexander Mikhalitsyn
@ 2025-03-09 14:36 ` Tejun Heo
  2025-03-10  8:52 ` Christian Brauner
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Tejun Heo @ 2025-03-09 14:36 UTC (permalink / raw)
  Cc: kuniyu, linux-kernel, netdev, cgroups, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Willem de Bruijn,
	Leon Romanovsky, Arnd Bergmann, Christian Brauner,
	Lennart Poettering, Luca Boccassi, Johannes Weiner,
	Michal Koutný, Shuah Khan

On Sun, Mar 09, 2025 at 02:28:11PM +0100, Alexander Mikhalitsyn wrote:
> 1. Add socket cgroup id and socket's peer cgroup id in socket's fdinfo
> 2. Add SO_PEERCGROUPID which allows to retrieve socket's peer cgroup id
> 3. Add SO_PEERCGROUPID kselftest
> 
> Generally speaking, this API allows race-free resolution of socket's peer cgroup id.
> Currently, to do that SCM_CREDENTIALS/SCM_PIDFD -> pid -> /proc/<pid>/cgroup sequence
> is used which is racy.
> 
> As we don't add any new state to the socket itself there is no potential locking issues
> or performance problems. We use already existing sk->sk_cgrp_data.
> 
> We already have analogical interfaces to retrieve this
> information:
> - inet_diag: INET_DIAG_CGROUP_ID
> - eBPF: bpf_sk_cgroup_id
> 
> Having getsockopt() interface makes sense for many applications, because using eBPF is
> not always an option, while inet_diag has obvious complexety and performance drawbacks
> if we only want to get this specific info for one specific socket.
> 
> Idea comes from UAPI kernel group:
> https://uapi-group.org/kernel-features/
> 
> Huge thanks to Christian Brauner, Lennart Poettering and Luca Boccassi for proposing
> and exchanging ideas about this.
> 
> Git tree:
> https://github.com/mihalicyn/linux/tree/so_peercgroupid

From cgroup POV:

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id
  2025-03-09 13:28 [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Alexander Mikhalitsyn
                   ` (4 preceding siblings ...)
  2025-03-09 14:36 ` [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Tejun Heo
@ 2025-03-10  8:52 ` Christian Brauner
  2025-03-10 11:27   ` Christian Brauner
  2025-03-11  7:33 ` Kuniyuki Iwashima
  2025-03-13 10:31 ` Michal Koutný
  7 siblings, 1 reply; 16+ messages in thread
From: Christian Brauner @ 2025-03-10  8:52 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: kuniyu, linux-kernel, netdev, cgroups, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Willem de Bruijn,
	Leon Romanovsky, Arnd Bergmann, Lennart Poettering, Luca Boccassi,
	Tejun Heo, Johannes Weiner, Michal Koutný, Shuah Khan

On Sun, Mar 09, 2025 at 02:28:11PM +0100, Alexander Mikhalitsyn wrote:
> 1. Add socket cgroup id and socket's peer cgroup id in socket's fdinfo
> 2. Add SO_PEERCGROUPID which allows to retrieve socket's peer cgroup id
> 3. Add SO_PEERCGROUPID kselftest
> 
> Generally speaking, this API allows race-free resolution of socket's peer cgroup id.
> Currently, to do that SCM_CREDENTIALS/SCM_PIDFD -> pid -> /proc/<pid>/cgroup sequence
> is used which is racy.
> 
> As we don't add any new state to the socket itself there is no potential locking issues
> or performance problems. We use already existing sk->sk_cgrp_data.
> 
> We already have analogical interfaces to retrieve this
> information:
> - inet_diag: INET_DIAG_CGROUP_ID
> - eBPF: bpf_sk_cgroup_id
> 
> Having getsockopt() interface makes sense for many applications, because using eBPF is
> not always an option, while inet_diag has obvious complexety and performance drawbacks
> if we only want to get this specific info for one specific socket.
> 
> Idea comes from UAPI kernel group:
> https://uapi-group.org/kernel-features/
> 
> Huge thanks to Christian Brauner, Lennart Poettering and Luca Boccassi for proposing
> and exchanging ideas about this.

Seems fine to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id
  2025-03-10  8:52 ` Christian Brauner
@ 2025-03-10 11:27   ` Christian Brauner
  0 siblings, 0 replies; 16+ messages in thread
From: Christian Brauner @ 2025-03-10 11:27 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: kuniyu, linux-kernel, netdev, cgroups, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Willem de Bruijn,
	Leon Romanovsky, Arnd Bergmann, Lennart Poettering, Luca Boccassi,
	Tejun Heo, Johannes Weiner, Michal Koutný, Shuah Khan

On Mon, Mar 10, 2025 at 09:52:31AM +0100, Christian Brauner wrote:
> On Sun, Mar 09, 2025 at 02:28:11PM +0100, Alexander Mikhalitsyn wrote:
> > 1. Add socket cgroup id and socket's peer cgroup id in socket's fdinfo
> > 2. Add SO_PEERCGROUPID which allows to retrieve socket's peer cgroup id
> > 3. Add SO_PEERCGROUPID kselftest
> > 
> > Generally speaking, this API allows race-free resolution of socket's peer cgroup id.
> > Currently, to do that SCM_CREDENTIALS/SCM_PIDFD -> pid -> /proc/<pid>/cgroup sequence
> > is used which is racy.
> > 
> > As we don't add any new state to the socket itself there is no potential locking issues
> > or performance problems. We use already existing sk->sk_cgrp_data.
> > 
> > We already have analogical interfaces to retrieve this
> > information:
> > - inet_diag: INET_DIAG_CGROUP_ID
> > - eBPF: bpf_sk_cgroup_id
> > 
> > Having getsockopt() interface makes sense for many applications, because using eBPF is
> > not always an option, while inet_diag has obvious complexety and performance drawbacks
> > if we only want to get this specific info for one specific socket.
> > 
> > Idea comes from UAPI kernel group:
> > https://uapi-group.org/kernel-features/
> > 
> > Huge thanks to Christian Brauner, Lennart Poettering and Luca Boccassi for proposing
> > and exchanging ideas about this.
> 
> Seems fine to me,
> Reviewed-by: Christian Brauner <brauner@kernel.org>

One wider conceptual comment.

Starting with v6.15 it is possible to retrieve exit information from
pidfds even after the task has been reaped. So if someone opens a pidfd
via pidfd_open() and that task gets reaped by the parent it is possible
to call PIDFD_INFO_EXIT and you can retrieve the exit status and the
cgroupid of the task that was reaped. That works even after all task
linkage has been removed from struct pid.

The system call api doesn't allow the creation of pidfds for reaped
processes. It wouldn't be possible as the pid number will have already
been released.

Both SO_PEERPIDFD and SO_PASSPIDFD also don't allow the creation of
pidfds for already reaped peers or senders.

But that doesn't have to be the case since we always have the struct pid
available. So it's entirely possible to hand out a pidfd to a reaped
process if it's guaranteed that exit information is available. If it's
not then this would be a bug.

The trick is that when a struct pid is stashed it needs to also allocate
a pidfd inode. That could simply be done by a helper get_pidfs_pid()
which takes a reference to the struct pid and ensures that space for
recording exit information is available.

With that done SO_PEERCGROUPID isn't needed per se as it will be
possible to get the cgroupid and exit status from the pidfd.

From a cursory look that should be possible to do without too much work.
I'm just pointing this out as an alternative.

There's one restriction that this would be subject to that
SO_PEERCGROUPID isn't. The SO_PEERCGROUPID is exposed for any process
whereas PIDFD_GET_INFO ioctls (that includes the PIDFD_INFO_EXIT) option
is only available for processes within the receivers pid namespace
hierarchy.

But in any case, enabling pidfds for such reaped processes might still
be useful since it would mean receivers could get exit information for
pidfds within their pid namespace hierarchy.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 4/4] tools/testing/selftests/cgroup: add test for SO_PEERCGROUPID
  2025-03-09 13:28 ` [PATCH net-next 4/4] tools/testing/selftests/cgroup: add test for SO_PEERCGROUPID Alexander Mikhalitsyn
@ 2025-03-10 13:59   ` Willem de Bruijn
  2025-03-11  8:02   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 16+ messages in thread
From: Willem de Bruijn @ 2025-03-10 13:59 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, kuniyu
  Cc: Alexander Mikhalitsyn, linux-kselftest, linux-kernel, netdev,
	cgroups, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, Leon Romanovsky, Arnd Bergmann,
	Christian Brauner, Lennart Poettering, Luca Boccassi, Tejun Heo,
	Johannes Weiner, Michal Koutný, Shuah Khan

Alexander Mikhalitsyn wrote:
> Cc: linux-kselftest@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: cgroups@vger.kernel.org
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Lennart Poettering <mzxreary@0pointer.de>
> Cc: Luca Boccassi <bluca@debian.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: "Michal Koutný" <mkoutny@suse.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
>  tools/testing/selftests/cgroup/Makefile       |   2 +
>  .../selftests/cgroup/test_so_peercgroupid.c   | 308 ++++++++++++++++++
>  2 files changed, 310 insertions(+)
>  create mode 100644 tools/testing/selftests/cgroup/test_so_peercgroupid.c
> 
> diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
> index 1b897152bab6..a932ff068081 100644
> --- a/tools/testing/selftests/cgroup/Makefile
> +++ b/tools/testing/selftests/cgroup/Makefile
> @@ -16,6 +16,7 @@ TEST_GEN_PROGS += test_kill
>  TEST_GEN_PROGS += test_kmem
>  TEST_GEN_PROGS += test_memcontrol
>  TEST_GEN_PROGS += test_pids
> +TEST_GEN_PROGS += test_so_peercgroupid
>  TEST_GEN_PROGS += test_zswap

need to add to .gitignore

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/4] net: core: add getsockopt SO_PEERCGROUPID
  2025-03-09 13:28 ` [PATCH net-next 2/4] net: core: add getsockopt SO_PEERCGROUPID Alexander Mikhalitsyn
@ 2025-03-10 14:00   ` Willem de Bruijn
  2025-03-11  7:52   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 16+ messages in thread
From: Willem de Bruijn @ 2025-03-10 14:00 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, kuniyu
  Cc: Alexander Mikhalitsyn, linux-kernel, netdev, cgroups,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn, Leon Romanovsky, Arnd Bergmann,
	Christian Brauner, Lennart Poettering, Luca Boccassi, Tejun Heo,
	Johannes Weiner, Michal Koutný

Alexander Mikhalitsyn wrote:
> Add SO_PEERCGROUPID which allows to get cgroup_id
> for a socket.
> 
> We already have analogical interfaces to retrieve this
> information:
> - inet_diag: INET_DIAG_CGROUP_ID
> - eBPF: bpf_sk_cgroup_id
> 
> Having getsockopt() interface makes sense for many
> applications, because using eBPF is not always an option,
> while inet_diag has obvious complexety and performance drawbacks
> if we only want to get this specific info for one specific socket.
> 
> Cc: linux-kernel@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: cgroups@vger.kernel.org
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Lennart Poettering <mzxreary@0pointer.de>
> Cc: Luca Boccassi <bluca@debian.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: "Michal Koutný" <mkoutny@suse.com>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
>  arch/alpha/include/uapi/asm/socket.h    |  2 +
>  arch/mips/include/uapi/asm/socket.h     |  2 +
>  arch/parisc/include/uapi/asm/socket.h   |  2 +
>  arch/sparc/include/uapi/asm/socket.h    |  2 +
>  include/uapi/asm-generic/socket.h       |  2 +
>  net/core/sock.c                         | 17 +++++++
>  net/unix/af_unix.c                      | 63 +++++++++++++++++++++++++
>  tools/include/uapi/asm-generic/socket.h |  2 +
>  8 files changed, 92 insertions(+)
> 
> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> index 3df5f2dd4c0f..58ce457b2c09 100644
> --- a/arch/alpha/include/uapi/asm/socket.h
> +++ b/arch/alpha/include/uapi/asm/socket.h
> @@ -150,6 +150,8 @@
>  
>  #define SO_RCVPRIORITY		82
>  
> +#define SO_PEERCGROUPID		83
> +
>  #if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
> index 22fa8f19924a..823fa67f7d79 100644
> --- a/arch/mips/include/uapi/asm/socket.h
> +++ b/arch/mips/include/uapi/asm/socket.h
> @@ -161,6 +161,8 @@
>  
>  #define SO_RCVPRIORITY		82
>  
> +#define SO_PEERCGROUPID		83
> +
>  #if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
> index aa9cd4b951fe..1ee2e858d177 100644
> --- a/arch/parisc/include/uapi/asm/socket.h
> +++ b/arch/parisc/include/uapi/asm/socket.h
> @@ -142,6 +142,8 @@
>  
>  #define SO_RCVPRIORITY		0x404D
>  
> +#define SO_PEERCGROUPID		0x404E
> +
>  #if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64
> diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
> index 5b464a568664..2fe7d0c48a63 100644
> --- a/arch/sparc/include/uapi/asm/socket.h
> +++ b/arch/sparc/include/uapi/asm/socket.h
> @@ -143,6 +143,8 @@
>  
>  #define SO_RCVPRIORITY           0x005b
>  
> +#define SO_PEERCGROUPID          0x005c
> +
>  #if !defined(__KERNEL__)
>  
>  
> diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
> index aa5016ff3d91..903904bb537c 100644
> --- a/include/uapi/asm-generic/socket.h
> +++ b/include/uapi/asm-generic/socket.h
> @@ -145,6 +145,8 @@
>  
>  #define SO_RCVPRIORITY		82
>  
> +#define SO_PEERCGROUPID		83
> +
>  #if !defined(__KERNEL__)
>  
>  #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
> diff --git a/net/core/sock.c b/net/core/sock.c
> index a0598518ce89..6dc0b1a8367b 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1946,6 +1946,23 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
>  		goto lenout;
>  	}
>  
> +#ifdef CONFIG_SOCK_CGROUP_DATA
> +	case SO_PEERCGROUPID:
> +	{
> +		const struct proto_ops *ops;
> +
> +		if (sk->sk_family != AF_UNIX)
> +			return -EOPNOTSUPP;
> +
> +		ops = READ_ONCE(sock->ops);
> +		if (!ops->getsockopt)
> +			return -EOPNOTSUPP;
> +
> +		return ops->getsockopt(sock, SOL_SOCKET, optname, optval.user,
> +				       optlen.user);
> +	}
> +#endif
> +
>  	/* Dubious BSD thing... Probably nobody even uses it, but
>  	 * the UNIX standard wants it for whatever reason... -DaveM
>  	 */
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 2b2c0036efc9..3455f38f033d 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -901,6 +901,66 @@ static void unix_show_fdinfo(struct seq_file *m, struct socket *sock)
>  #define unix_show_fdinfo NULL
>  #endif
>  
> +static int unix_getsockopt(struct socket *sock, int level, int optname,
> +			   char __user *optval, int __user *optlen)
> +{
> +	struct sock *sk = sock->sk;
> +
> +	union {
> +		int val;
> +		u64 val64;
> +	} v;
> +
> +	int lv = sizeof(int);
> +	int len;

why the union if the only valid use is a u64? Is this forward looking?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id
  2025-03-09 13:28 [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Alexander Mikhalitsyn
                   ` (5 preceding siblings ...)
  2025-03-10  8:52 ` Christian Brauner
@ 2025-03-11  7:33 ` Kuniyuki Iwashima
  2025-03-11 12:02   ` Christian Brauner
  2025-03-13 10:31 ` Michal Koutný
  7 siblings, 1 reply; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-11  7:33 UTC (permalink / raw)
  To: aleksandr.mikhalitsyn
  Cc: arnd, bluca, brauner, cgroups, davem, edumazet, hannes, kuba,
	kuniyu, leon, linux-kernel, mkoutny, mzxreary, netdev, pabeni,
	shuah, tj, willemb

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Date: Sun,  9 Mar 2025 14:28:11 +0100
> 1. Add socket cgroup id and socket's peer cgroup id in socket's fdinfo

Why do you want to add yet another racy interface ?


> 2. Add SO_PEERCGROUPID which allows to retrieve socket's peer cgroup id
> 3. Add SO_PEERCGROUPID kselftest
> 
> Generally speaking, this API allows race-free resolution of socket's peer cgroup id.
> Currently, to do that SCM_CREDENTIALS/SCM_PIDFD -> pid -> /proc/<pid>/cgroup sequence
> is used which is racy.

Few more words about the race (recycling pid ?) would be appreciated.

I somewhat assumed pid is not recycled until all of its pidfd are
close()d, but sounds like no ?


> 
> As we don't add any new state to the socket itself there is no potential locking issues
> or performance problems. We use already existing sk->sk_cgrp_data.
> 
> We already have analogical interfaces to retrieve this
> information:
> - inet_diag: INET_DIAG_CGROUP_ID
> - eBPF: bpf_sk_cgroup_id
> 
> Having getsockopt() interface makes sense for many applications, because using eBPF is
> not always an option, while inet_diag has obvious complexety and performance drawbacks
> if we only want to get this specific info for one specific socket.

If it's limited to the connect()ed peer, I'd add UNIX_DIAG_CGROUP_ID
and UNIX_DIAG_PEER_CGROUP_ID instead.  Then also ss can use that easily.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 1/4] net: unix: print cgroup_id and peer_cgroup_id in fdinfo
  2025-03-09 13:28 ` [PATCH net-next 1/4] net: unix: print cgroup_id and peer_cgroup_id in fdinfo Alexander Mikhalitsyn
@ 2025-03-11  7:41   ` Kuniyuki Iwashima
  0 siblings, 0 replies; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-11  7:41 UTC (permalink / raw)
  To: aleksandr.mikhalitsyn
  Cc: arnd, bluca, brauner, cgroups, davem, edumazet, hannes, kuba,
	kuniyu, leon, linux-kernel, mkoutny, mzxreary, netdev, pabeni, tj,
	willemb

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Date: Sun,  9 Mar 2025 14:28:12 +0100

Please add few sentences here, why this interface is needed,
why accessing peer sk's sk_cgrp_data is not racy (e.g. sk_cgrp_data
never changes after creation (I'm not sure this is the case though)),
etc.

In case this interface is racy for the use case, please drop the patch.


> Cc: linux-kernel@vger.kernel.org
> Cc: netdev@vger.kernel.org
> Cc: cgroups@vger.kernel.org
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Lennart Poettering <mzxreary@0pointer.de>
> Cc: Luca Boccassi <bluca@debian.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: "Michal Koutný" <mkoutny@suse.com>
> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> ---
>  net/unix/af_unix.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 7f8f3859cdb3..2b2c0036efc9 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -117,6 +117,7 @@
>  #include <linux/file.h>
>  #include <linux/btf_ids.h>
>  #include <linux/bpf-cgroup.h>
> +#include <linux/cgroup.h>
>  
>  static atomic_long_t unix_nr_socks;
>  static struct hlist_head bsd_socket_buckets[UNIX_HASH_SIZE / 2];
> @@ -861,6 +862,11 @@ static void unix_show_fdinfo(struct seq_file *m, struct socket *sock)
>  	int nr_fds = 0;
>  
>  	if (sk) {
> +#ifdef CONFIG_SOCK_CGROUP_DATA
> +		struct sock *peer;
> +		u64 sk_cgroup_id = 0;

Please keep reverse xmas tree order for net patches.
https://docs.kernel.org/process/maintainer-netdev.html#local-variable-ordering-reverse-xmas-tree-rcs

Also, no need to initialise sk_cgroup_id, so it should be:

	struct sock *peer;
	u64 sk_cgroup_id;


> +#endif
> +
>  		s_state = READ_ONCE(sk->sk_state);
>  		u = unix_sk(sk);
>  
> @@ -874,6 +880,21 @@ static void unix_show_fdinfo(struct seq_file *m, struct socket *sock)
>  			nr_fds = unix_count_nr_fds(sk);
>  
>  		seq_printf(m, "scm_fds: %u\n", nr_fds);
> +
> +#ifdef CONFIG_SOCK_CGROUP_DATA
> +		sk_cgroup_id = cgroup_id(sock_cgroup_ptr(&sk->sk_cgrp_data));
> +		seq_printf(m, "cgroup_id: %llu\n", sk_cgroup_id);
> +
> +		peer = unix_peer_get(sk);
> +		if (peer) {
> +			u64 peer_cgroup_id = 0;

Same here, no need to initialise peer_cgroup_id.


> +
> +			peer_cgroup_id = cgroup_id(sock_cgroup_ptr(&peer->sk_cgrp_data));
> +			sock_put(peer);
> +
> +			seq_printf(m, "peer_cgroup_id: %llu\n", peer_cgroup_id);
> +		}
> +#endif
>  	}
>  }
>  #else
> -- 
> 2.43.0

Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 2/4] net: core: add getsockopt SO_PEERCGROUPID
  2025-03-09 13:28 ` [PATCH net-next 2/4] net: core: add getsockopt SO_PEERCGROUPID Alexander Mikhalitsyn
  2025-03-10 14:00   ` Willem de Bruijn
@ 2025-03-11  7:52   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-11  7:52 UTC (permalink / raw)
  To: aleksandr.mikhalitsyn
  Cc: arnd, bluca, brauner, cgroups, davem, edumazet, hannes, kuba,
	kuniyu, leon, linux-kernel, mkoutny, mzxreary, netdev, pabeni, tj,
	willemb

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Date: Sun,  9 Mar 2025 14:28:13 +0100
> Add SO_PEERCGROUPID which allows to get cgroup_id
> for a socket.

This looks useless for SOCK_DGRAM as the server can't have a
peer for clients to send() or connect() (see unix_may_send()).

Is there any reason to support only the connect()ed peer ?
It seems the uAPI group expects to retrieve data per message
as SCM_CGROUPID.


> 
> We already have analogical interfaces to retrieve this
> information:
> - inet_diag: INET_DIAG_CGROUP_ID
> - eBPF: bpf_sk_cgroup_id
> 
> Having getsockopt() interface makes sense for many
> applications, because using eBPF is not always an option,
> while inet_diag has obvious complexety and performance drawbacks
> if we only want to get this specific info for one specific socket.
[...]
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 2b2c0036efc9..3455f38f033d 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -901,6 +901,66 @@ static void unix_show_fdinfo(struct seq_file *m, struct socket *sock)
>  #define unix_show_fdinfo NULL
>  #endif
>  
> +static int unix_getsockopt(struct socket *sock, int level, int optname,
> +			   char __user *optval, int __user *optlen)
> +{
> +	struct sock *sk = sock->sk;
> +
> +	union {
> +		int val;
> +		u64 val64;

As Willem pointed out, simply use val64 only.


> +	} v;
> +
> +	int lv = sizeof(int);
> +	int len;
> +
> +	if (level != SOL_SOCKET)
> +		return -ENOPROTOOPT;
> +
> +	if (get_user(len, optlen))
> +		return -EFAULT;
> +
> +	if (len < 0)
> +		return -EINVAL;
> +
> +	memset(&v, 0, sizeof(v));
> +
> +	switch (optname) {
> +#ifdef CONFIG_SOCK_CGROUP_DATA
> +	case SO_PEERCGROUPID:
> +	{
> +		struct sock *peer;
> +		u64 peer_cgroup_id = 0;

Same remarks in patch 1, reverse xmas tree order, and no
initialisation needed.


> +
> +		lv = sizeof(u64);
> +		if (len < lv)
> +			return -EINVAL;
> +
> +		peer = unix_peer_get(sk);
> +		if (!peer)
> +			return -ENODATA;
> +
> +		peer_cgroup_id = cgroup_id(sock_cgroup_ptr(&peer->sk_cgrp_data));
> +		sock_put(peer);
> +
> +		v.val64 = peer_cgroup_id;
> +		break;
> +	}
> +#endif
> +	default:
> +		return -ENOPROTOOPT;
> +	}
> +
> +	if (len > lv)
> +		len = lv;
> +	if (copy_to_user(optval, &v, len))
> +		return -EFAULT;
> +	if (put_user(len, optlen))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 4/4] tools/testing/selftests/cgroup: add test for SO_PEERCGROUPID
  2025-03-09 13:28 ` [PATCH net-next 4/4] tools/testing/selftests/cgroup: add test for SO_PEERCGROUPID Alexander Mikhalitsyn
  2025-03-10 13:59   ` Willem de Bruijn
@ 2025-03-11  8:02   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 16+ messages in thread
From: Kuniyuki Iwashima @ 2025-03-11  8:02 UTC (permalink / raw)
  To: aleksandr.mikhalitsyn
  Cc: arnd, bluca, brauner, cgroups, davem, edumazet, hannes, kuba,
	kuniyu, leon, linux-kernel, linux-kselftest, mkoutny, mzxreary,
	netdev, pabeni, shuah, tj, willemb

From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Date: Sun,  9 Mar 2025 14:28:15 +0100
> +static void client(FIXTURE_DATA(so_peercgroupid) *self,
> +		   const FIXTURE_VARIANT(so_peercgroupid) *variant)
> +{
> +	int cfd, err;
> +	socklen_t len;
> +	uint64_t peer_cgroup_id = 0, test_cgroup1_id = 0, test_cgroup2_id = 0;
> +	char state;
> +
> +	cfd = socket(AF_UNIX, variant->type, 0);
> +	if (cfd < 0) {
> +		log_err("socket");
> +		child_die();
> +	}
> +
> +	if (variant->type == SOCK_DGRAM) {
> +		fill_sockaddr(self->client_addr, variant->abstract);
> +
> +		if (bind(cfd, (struct sockaddr *)&self->client_addr->listen_addr, self->client_addr->addrlen)) {
> +			log_err("bind");
> +			child_die();
> +		}
> +	}
> +
> +	/* negative testcase: no peer for socket yet */
> +	len = sizeof(peer_cgroup_id);
> +	err = getsockopt(cfd, SOL_SOCKET, SO_PEERCGROUPID, &peer_cgroup_id, &len);
> +	if (!err || (errno != ENODATA)) {
> +		log_err("getsockopt must fail with errno == ENODATA when socket has no peer");
> +		child_die();
> +	}
> +
> +	if (connect(cfd, (struct sockaddr *)&self->server_addr.listen_addr,
> +		    self->server_addr.addrlen) != 0) {
> +		log_err("connect");
> +		child_die();
> +	}
> +
> +	state = 'R';
> +	write(self->sync_sk[1], &state, sizeof(state));

nit: This looks unnecessary ?


> +
> +	read(self->sync_sk[1], &test_cgroup1_id, sizeof(uint64_t));
> +	read(self->sync_sk[1], &test_cgroup2_id, sizeof(uint64_t));
> +
> +	len = sizeof(peer_cgroup_id);
> +	if (getsockopt(cfd, SOL_SOCKET, SO_PEERCGROUPID, &peer_cgroup_id, &len)) {
> +		log_err("Failed to get SO_PEERCGROUPID");
> +		child_die();
> +	}
> +
> +	/*
> +	 * There is a difference between connection-oriented sockets
> +	 * and connectionless ones from the perspective of SO_PEERCGROUPID.
> +	 *
> +	 * sk->sk_cgrp_data is getting filled when we allocate struct sock (see call to cgroup_sk_alloc()).
> +	 * For DGRAM socket, self->server socket is our peer and by the time when we allocate it,
> +	 * parent process sits in a test_cgroup1. Then it changes cgroup to test_cgroup2, but it does not
> +	 * affect anything.
> +	 * For STREAM/SEQPACKET socket, self->server is not our peer, but that one we get from accept()
> +	 * syscall. And by the time when we call accept(), parent process sits in test_cgroup2.
> +	 *
> +	 * Let's ensure that it works like that and if it get changed then we should detect it
> +	 * as it's a clear UAPI change.
> +	 */
> +	if (variant->type == SOCK_DGRAM) {
> +		/* cgroup id from SO_PEERCGROUPID should be equal to the test_cgroup1_id */
> +		if (peer_cgroup_id != test_cgroup1_id) {
> +			log_err("peer_cgroup_id != test_cgroup1_id: %" PRId64 " != %" PRId64, peer_cgroup_id, test_cgroup1_id);
> +			child_die();
> +		}
> +	} else {
> +		/* cgroup id from SO_PEERCGROUPID should be equal to the test_cgroup2_id */
> +		if (peer_cgroup_id != test_cgroup2_id) {
> +			log_err("peer_cgroup_id != test_cgroup2_id: %" PRId64 " != %" PRId64, peer_cgroup_id, test_cgroup2_id);
> +			child_die();
> +		}
> +	}
> +}
> +
> +TEST_F(so_peercgroupid, test)
> +{
> +	uint64_t test_cgroup1_id, test_cgroup2_id;
> +	int err;
> +	int pfd;
> +	char state;
> +	int child_status = 0;
> +
> +	if (cg_find_unified_root(self->cgroup_root, sizeof(self->cgroup_root), NULL))
> +		ksft_exit_skip("cgroup v2 isn't mounted\n");
> +
> +	self->test_cgroup1 = cg_name(self->cgroup_root, "so_peercgroupid_cg1");
> +	ASSERT_NE(NULL, self->test_cgroup1);
> +
> +	self->test_cgroup2 = cg_name(self->cgroup_root, "so_peercgroupid_cg2");
> +	ASSERT_NE(NULL, self->test_cgroup2);
> +
> +	err = cg_create(self->test_cgroup1);
> +	ASSERT_EQ(0, err);
> +
> +	err = cg_create(self->test_cgroup2);
> +	ASSERT_EQ(0, err);
> +
> +	test_cgroup1_id = cg_get_id(self->test_cgroup1);
> +	ASSERT_LT(0, test_cgroup1_id);
> +
> +	test_cgroup2_id = cg_get_id(self->test_cgroup2);
> +	ASSERT_LT(0, test_cgroup2_id);
> +
> +	/* enter test_cgroup1 before allocating a socket */
> +	err = cg_enter_current(self->test_cgroup1);
> +	ASSERT_EQ(0, err);
> +
> +	self->server = socket(AF_UNIX, variant->type, 0);
> +	ASSERT_NE(-1, self->server);
> +
> +	/* enter test_cgroup2 after allocating a socket */
> +	err = cg_enter_current(self->test_cgroup2);
> +	ASSERT_EQ(0, err);
> +
> +	fill_sockaddr(&self->server_addr, variant->abstract);
> +
> +	err = bind(self->server, (struct sockaddr *)&self->server_addr.listen_addr, self->server_addr.addrlen);
> +	ASSERT_EQ(0, err);
> +
> +	if (variant->type != SOCK_DGRAM) {
> +		err = listen(self->server, 1);
> +		ASSERT_EQ(0, err);
> +	}
> +
> +	err = socketpair(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0, self->sync_sk);
> +	EXPECT_EQ(err, 0);
> +
> +	self->client_pid = fork();
> +	ASSERT_NE(-1, self->client_pid);
> +	if (self->client_pid == 0) {
> +		close(self->server);
> +		close(self->sync_sk[0]);
> +		client(self, variant);
> +		exit(0);
> +	}
> +	close(self->sync_sk[1]);
> +
> +	if (variant->type != SOCK_DGRAM) {
> +		pfd = accept(self->server, NULL, NULL);
> +		ASSERT_NE(-1, pfd);

nit: close(self->server) here ?

It's close()d anyway when the process exits.


> +	} else {
> +		pfd = self->server;
> +	}
> +
> +	/* wait until the child arrives at checkpoint */
> +	read(self->sync_sk[0], &state, sizeof(state));
> +	ASSERT_EQ(state, 'R');

The client will wait two write()s without this synchronisation.


> +
> +	write(self->sync_sk[0], &test_cgroup1_id, sizeof(uint64_t));
> +	write(self->sync_sk[0], &test_cgroup2_id, sizeof(uint64_t));
> +
> +	close(pfd);
> +	waitpid(self->client_pid, &child_status, 0);
> +	ASSERT_EQ(0, WIFEXITED(child_status) ? WEXITSTATUS(child_status) : 1);
> +}
> +
> +TEST_HARNESS_MAIN

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id
  2025-03-11  7:33 ` Kuniyuki Iwashima
@ 2025-03-11 12:02   ` Christian Brauner
  0 siblings, 0 replies; 16+ messages in thread
From: Christian Brauner @ 2025-03-11 12:02 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: aleksandr.mikhalitsyn, arnd, bluca, cgroups, davem, edumazet,
	hannes, kuba, leon, linux-kernel, mkoutny, mzxreary, netdev,
	pabeni, shuah, tj, willemb

On Tue, Mar 11, 2025 at 12:33:48AM -0700, Kuniyuki Iwashima wrote:
> From: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
> Date: Sun,  9 Mar 2025 14:28:11 +0100
> > 1. Add socket cgroup id and socket's peer cgroup id in socket's fdinfo
> 
> Why do you want to add yet another racy interface ?
> 
> 
> > 2. Add SO_PEERCGROUPID which allows to retrieve socket's peer cgroup id
> > 3. Add SO_PEERCGROUPID kselftest
> > 
> > Generally speaking, this API allows race-free resolution of socket's peer cgroup id.
> > Currently, to do that SCM_CREDENTIALS/SCM_PIDFD -> pid -> /proc/<pid>/cgroup sequence
> > is used which is racy.
> 
> Few more words about the race (recycling pid ?) would be appreciated.
> 
> I somewhat assumed pid is not recycled until all of its pidfd are
> close()d, but sounds like no ?

No, that would allow starving the kernel of pid numbers.
pidfds don't pin struct task_struct for a multitude of reasons similar
to how cred->peer or scm->pid don't stash a task_struct but a struct pid.

> 
> 
> > 
> > As we don't add any new state to the socket itself there is no potential locking issues
> > or performance problems. We use already existing sk->sk_cgrp_data.
> > 
> > We already have analogical interfaces to retrieve this
> > information:
> > - inet_diag: INET_DIAG_CGROUP_ID
> > - eBPF: bpf_sk_cgroup_id
> > 
> > Having getsockopt() interface makes sense for many applications, because using eBPF is
> > not always an option, while inet_diag has obvious complexety and performance drawbacks
> > if we only want to get this specific info for one specific socket.
> 
> If it's limited to the connect()ed peer, I'd add UNIX_DIAG_CGROUP_ID
> and UNIX_DIAG_PEER_CGROUP_ID instead.  Then also ss can use that easily.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id
  2025-03-09 13:28 [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Alexander Mikhalitsyn
                   ` (6 preceding siblings ...)
  2025-03-11  7:33 ` Kuniyuki Iwashima
@ 2025-03-13 10:31 ` Michal Koutný
  7 siblings, 0 replies; 16+ messages in thread
From: Michal Koutný @ 2025-03-13 10:31 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: kuniyu, linux-kernel, netdev, cgroups, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Willem de Bruijn,
	Leon Romanovsky, Arnd Bergmann, Christian Brauner,
	Lennart Poettering, Luca Boccassi, Tejun Heo, Johannes Weiner,
	Shuah Khan

[-- Attachment #1: Type: text/plain, Size: 1614 bytes --]

Hello.

On Sun, Mar 09, 2025 at 02:28:11PM +0100, Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> wrote:
> As we don't add any new state to the socket itself there is no potential locking issues
> or performance problems. We use already existing sk->sk_cgrp_data.

This is the cgroup where the socket was created in. If such a socket is
fd-passed to another cgroup, SO_PEERCGROUPID may not be what's expected.

> We already have analogical interfaces to retrieve this
> information:
> - inet_diag: INET_DIAG_CGROUP_ID
> - eBPF: bpf_sk_cgroup_id
> 
> Having getsockopt() interface makes sense for many applications, because using eBPF is
> not always an option, while inet_diag has obvious complexety and performance drawbacks
> if we only want to get this specific info for one specific socket.

I'm not that familiar with INET_DIAG_CGROUP_ID but that one sounds like
fit for the purpose of obtaining socket creator's cgroup whereas what is
desired here is slightly different thing -- cgroup of actual sender
through the socket.

> Idea comes from UAPI kernel group:
> https://uapi-group.org/kernel-features/

In theory shortlived (sending) program may reside in shortlived cgroup
and the consumer of SO_PEERCGROUPID (even if it is real sender) would
only have that number to work with.
It doesn't guarantee existence of original cgroup or stable translation
to cgroup path. I think having something like this could be useful but
consumers must still be aware of limitations (and possible switch from
path-oriented to id-oriented work with cgroups).

Thanks,
Michal

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-03-13 10:31 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-09 13:28 [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Alexander Mikhalitsyn
2025-03-09 13:28 ` [PATCH net-next 1/4] net: unix: print cgroup_id and peer_cgroup_id in fdinfo Alexander Mikhalitsyn
2025-03-11  7:41   ` Kuniyuki Iwashima
2025-03-09 13:28 ` [PATCH net-next 2/4] net: core: add getsockopt SO_PEERCGROUPID Alexander Mikhalitsyn
2025-03-10 14:00   ` Willem de Bruijn
2025-03-11  7:52   ` Kuniyuki Iwashima
2025-03-09 13:28 ` [PATCH net-next 3/4] tools/testing/selftests/cgroup/cgroup_util: add cg_get_id helper Alexander Mikhalitsyn
2025-03-09 13:28 ` [PATCH net-next 4/4] tools/testing/selftests/cgroup: add test for SO_PEERCGROUPID Alexander Mikhalitsyn
2025-03-10 13:59   ` Willem de Bruijn
2025-03-11  8:02   ` Kuniyuki Iwashima
2025-03-09 14:36 ` [PATCH net-next 0/4] Add getsockopt(SO_PEERCGROUPID) and fdinfo API to retreive socket's peer cgroup id Tejun Heo
2025-03-10  8:52 ` Christian Brauner
2025-03-10 11:27   ` Christian Brauner
2025-03-11  7:33 ` Kuniyuki Iwashima
2025-03-11 12:02   ` Christian Brauner
2025-03-13 10:31 ` Michal Koutný

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).