netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net-next 0/9] af_unix: Introduce SO_PASSRIGHTS
@ 2025-05-10  1:56 Kuniyuki Iwashima
  2025-05-10  1:56 ` [PATCH v2 net-next 1/9] af_unix: Factorise test_bit() for SOCK_PASSCRED and SOCK_PASSPIDFD Kuniyuki Iwashima
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-10  1:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

As long as recvmsg() or recvmmsg() is used with cmsg, it is not
possible to avoid receiving file descriptors via SCM_RIGHTS.

This series introduces a new socket option, SO_PASSRIGHTS, to allow
disabling SCM_RIGHTS.  The option is enabled by default.

See patch 8 for background/context.

This series is related to [0], but is split into a separate series,
as most of the patches are specific to af_unix.

The v2 of the BPF LSM extension part will be posted later, once
this series is merged into net-next and has landed in bpf-next.

[0]: https://lore.kernel.org/bpf/20250505215802.48449-1-kuniyu@amazon.com/


Changes:
  v2:
    * Added patch 4 & 5 to reuse sk_txrehash for scm_recv() flags

  v1: https://lore.kernel.org/netdev/20250508013021.79654-1-kuniyu@amazon.com/


Kuniyuki Iwashima (9):
  af_unix: Factorise test_bit() for SOCK_PASSCRED and SOCK_PASSPIDFD.
  af_unix: Don't pass struct socket to maybe_add_creds().
  scm: Move scm_recv() from scm.h to scm.c.
  tcp: Restrict SO_TXREHASH to TCP socket.
  net: Restrict SO_PASS{CRED,PIDFD,SEC} to AF_{UNIX,NETLINK,BLUETOOTH}.
  af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
  af_unix: Inherit sk_flags at connect().
  af_unix: Introduce SO_PASSRIGHTS.
  selftest: af_unix: Test SO_PASSRIGHTS.

 arch/alpha/include/uapi/asm/socket.h          |   2 +
 arch/mips/include/uapi/asm/socket.h           |   2 +
 arch/parisc/include/uapi/asm/socket.h         |   2 +
 arch/sparc/include/uapi/asm/socket.h          |   2 +
 include/linux/net.h                           |  15 +--
 include/net/scm.h                             | 121 +-----------------
 include/net/sock.h                            |  29 ++++-
 include/uapi/asm-generic/socket.h             |   2 +
 net/core/scm.c                                | 121 ++++++++++++++++++
 net/core/sock.c                               |  52 ++++++--
 net/unix/af_unix.c                            |  96 +++++++-------
 tools/include/uapi/asm-generic/socket.h       |   2 +
 .../selftests/net/af_unix/scm_rights.c        |  84 +++++++++++-
 13 files changed, 343 insertions(+), 187 deletions(-)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2 net-next 1/9] af_unix: Factorise test_bit() for SOCK_PASSCRED and SOCK_PASSPIDFD.
  2025-05-10  1:56 [PATCH v2 net-next 0/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
@ 2025-05-10  1:56 ` Kuniyuki Iwashima
  2025-05-10  1:56 ` [PATCH v2 net-next 2/9] af_unix: Don't pass struct socket to maybe_add_creds() Kuniyuki Iwashima
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-10  1:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

Currently, the same checks for SOCK_PASSCRED and SOCK_PASSPIDFD
are scattered across many places.

Let's centralise the bit tests to make the following changes cleaner.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2ab20821d6bb..464e183ffdb8 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -765,6 +765,14 @@ static void copy_peercred(struct sock *sk, struct sock *peersk)
 	spin_unlock(&sk->sk_peer_lock);
 }
 
+static bool unix_may_passcred(const struct sock *sk)
+{
+	struct socket *sock = sk->sk_socket;
+
+	return test_bit(SOCK_PASSCRED, &sock->flags) ||
+		test_bit(SOCK_PASSPIDFD, &sock->flags);
+}
+
 static int unix_listen(struct socket *sock, int backlog)
 {
 	int err;
@@ -1411,9 +1419,7 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 		if (err)
 			goto out;
 
-		if ((test_bit(SOCK_PASSCRED, &sock->flags) ||
-		     test_bit(SOCK_PASSPIDFD, &sock->flags)) &&
-		    !READ_ONCE(unix_sk(sk)->addr)) {
+		if (unix_may_passcred(sk) && !READ_ONCE(unix_sk(sk)->addr)) {
 			err = unix_autobind(sk);
 			if (err)
 				goto out;
@@ -1531,9 +1537,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	if (err)
 		goto out;
 
-	if ((test_bit(SOCK_PASSCRED, &sock->flags) ||
-	     test_bit(SOCK_PASSPIDFD, &sock->flags)) &&
-	    !READ_ONCE(u->addr)) {
+	if (unix_may_passcred(sk) && !READ_ONCE(u->addr)) {
 		err = unix_autobind(sk);
 		if (err)
 			goto out;
@@ -1877,16 +1881,6 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen
 	return err;
 }
 
-static bool unix_passcred_enabled(const struct socket *sock,
-				  const struct sock *other)
-{
-	return test_bit(SOCK_PASSCRED, &sock->flags) ||
-	       test_bit(SOCK_PASSPIDFD, &sock->flags) ||
-	       !other->sk_socket ||
-	       test_bit(SOCK_PASSCRED, &other->sk_socket->flags) ||
-	       test_bit(SOCK_PASSPIDFD, &other->sk_socket->flags);
-}
-
 /*
  * Some apps rely on write() giving SCM_CREDENTIALS
  * We include credentials if source or destination socket
@@ -1897,7 +1891,9 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock,
 {
 	if (UNIXCB(skb).pid)
 		return;
-	if (unix_passcred_enabled(sock, other)) {
+
+	if (unix_may_passcred(sock->sk) ||
+	    !other->sk_socket || unix_may_passcred(other)) {
 		UNIXCB(skb).pid  = get_pid(task_tgid(current));
 		current_uid_gid(&UNIXCB(skb).uid, &UNIXCB(skb).gid);
 	}
@@ -1974,9 +1970,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 			goto out;
 	}
 
-	if ((test_bit(SOCK_PASSCRED, &sock->flags) ||
-	     test_bit(SOCK_PASSPIDFD, &sock->flags)) &&
-	    !READ_ONCE(u->addr)) {
+	if (unix_may_passcred(sk) && !READ_ONCE(u->addr)) {
 		err = unix_autobind(sk);
 		if (err)
 			goto out;
@@ -2846,8 +2840,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
 			/* Never glue messages from different writers */
 			if (!unix_skb_scm_eq(skb, &scm))
 				break;
-		} else if (test_bit(SOCK_PASSCRED, &sock->flags) ||
-			   test_bit(SOCK_PASSPIDFD, &sock->flags)) {
+		} else if (unix_may_passcred(sk)) {
 			/* Copy credentials */
 			scm_set_cred(&scm, UNIXCB(skb).pid, UNIXCB(skb).uid, UNIXCB(skb).gid);
 			unix_set_secdata(&scm, skb);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 net-next 2/9] af_unix: Don't pass struct socket to maybe_add_creds().
  2025-05-10  1:56 [PATCH v2 net-next 0/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
  2025-05-10  1:56 ` [PATCH v2 net-next 1/9] af_unix: Factorise test_bit() for SOCK_PASSCRED and SOCK_PASSPIDFD Kuniyuki Iwashima
@ 2025-05-10  1:56 ` Kuniyuki Iwashima
  2025-05-10  1:56 ` [PATCH v2 net-next 3/9] scm: Move scm_recv() from scm.h to scm.c Kuniyuki Iwashima
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-10  1:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

We will move SOCK_PASS{CRED,PIDFD,SEC} from struct socket.flags
to struct sock for better handling with SOCK_PASSRIGHTS.

Then, we don't need to access struct socket in maybe_add_creds().

Let's pass struct sock to maybe_add_creds() and its caller
queue_oob().

While at it, we append the unix_ prefix and fix double spaces
around the pid assignment.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 464e183ffdb8..a39497fd6e98 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1869,7 +1869,7 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen
 {
 	int err = 0;
 
-	UNIXCB(skb).pid  = get_pid(scm->pid);
+	UNIXCB(skb).pid = get_pid(scm->pid);
 	UNIXCB(skb).uid = scm->creds.uid;
 	UNIXCB(skb).gid = scm->creds.gid;
 	UNIXCB(skb).fp = NULL;
@@ -1886,15 +1886,15 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen
  * We include credentials if source or destination socket
  * asserted SOCK_PASSCRED.
  */
-static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock,
-			    const struct sock *other)
+static void unix_maybe_add_creds(struct sk_buff *skb, const struct sock *sk,
+				 const struct sock *other)
 {
 	if (UNIXCB(skb).pid)
 		return;
 
-	if (unix_may_passcred(sock->sk) ||
+	if (unix_may_passcred(sk) ||
 	    !other->sk_socket || unix_may_passcred(other)) {
-		UNIXCB(skb).pid  = get_pid(task_tgid(current));
+		UNIXCB(skb).pid = get_pid(task_tgid(current));
 		current_uid_gid(&UNIXCB(skb).uid, &UNIXCB(skb).gid);
 	}
 }
@@ -2133,7 +2133,8 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 
 	if (sock_flag(other, SOCK_RCVTSTAMP))
 		__net_timestamp(skb);
-	maybe_add_creds(skb, sock, other);
+
+	unix_maybe_add_creds(skb, sk, other);
 	scm_stat_add(other, skb);
 	skb_queue_tail(&other->sk_receive_queue, skb);
 	unix_state_unlock(other);
@@ -2161,14 +2162,14 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 #define UNIX_SKB_FRAGS_SZ (PAGE_SIZE << get_order(32768))
 
 #if IS_ENABLED(CONFIG_AF_UNIX_OOB)
-static int queue_oob(struct socket *sock, struct msghdr *msg, struct sock *other,
+static int queue_oob(struct sock *sk, struct msghdr *msg, struct sock *other,
 		     struct scm_cookie *scm, bool fds_sent)
 {
 	struct unix_sock *ousk = unix_sk(other);
 	struct sk_buff *skb;
 	int err;
 
-	skb = sock_alloc_send_skb(sock->sk, 1, msg->msg_flags & MSG_DONTWAIT, &err);
+	skb = sock_alloc_send_skb(sk, 1, msg->msg_flags & MSG_DONTWAIT, &err);
 
 	if (!skb)
 		return err;
@@ -2192,7 +2193,7 @@ static int queue_oob(struct socket *sock, struct msghdr *msg, struct sock *other
 		goto out;
 	}
 
-	maybe_add_creds(skb, sock, other);
+	unix_maybe_add_creds(skb, sk, other);
 	scm_stat_add(other, skb);
 
 	spin_lock(&other->sk_receive_queue.lock);
@@ -2308,7 +2309,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		    (other->sk_shutdown & RCV_SHUTDOWN))
 			goto out_pipe_unlock;
 
-		maybe_add_creds(skb, sock, other);
+		unix_maybe_add_creds(skb, sk, other);
 		scm_stat_add(other, skb);
 		skb_queue_tail(&other->sk_receive_queue, skb);
 		unix_state_unlock(other);
@@ -2318,7 +2319,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 
 #if IS_ENABLED(CONFIG_AF_UNIX_OOB)
 	if (msg->msg_flags & MSG_OOB) {
-		err = queue_oob(sock, msg, other, &scm, fds_sent);
+		err = queue_oob(sk, msg, other, &scm, fds_sent);
 		if (err)
 			goto out_err;
 		sent++;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 net-next 3/9] scm: Move scm_recv() from scm.h to scm.c.
  2025-05-10  1:56 [PATCH v2 net-next 0/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
  2025-05-10  1:56 ` [PATCH v2 net-next 1/9] af_unix: Factorise test_bit() for SOCK_PASSCRED and SOCK_PASSPIDFD Kuniyuki Iwashima
  2025-05-10  1:56 ` [PATCH v2 net-next 2/9] af_unix: Don't pass struct socket to maybe_add_creds() Kuniyuki Iwashima
@ 2025-05-10  1:56 ` Kuniyuki Iwashima
  2025-05-10  1:56 ` [PATCH v2 net-next 4/9] tcp: Restrict SO_TXREHASH to TCP socket Kuniyuki Iwashima
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-10  1:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

scm_recv() has been placed in scm.h since the pre-git era for no
particular reason (I think), which makes the file really fragile.

For example, when you move SOCK_PASSCRED from include/linux/net.h to
enum sock_flags in include/net/sock.h, you will see weird build failure
due to terrible dependency.

To avoid the build failure in the future, let's move scm_recv(_unix())?
and its callees to scm.c.

Note that only scm_recv() needs to be exported for Bluetooth.

scm_send() should be moved to scm.c too, but I'll revisit later.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/scm.h | 121 ++-------------------------------------------
 net/core/scm.c    | 122 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 126 insertions(+), 117 deletions(-)

diff --git a/include/net/scm.h b/include/net/scm.h
index 22bb49589fde..84c4707e78a5 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -102,123 +102,10 @@ static __inline__ int scm_send(struct socket *sock, struct msghdr *msg,
 	return __scm_send(sock, msg, scm);
 }
 
-#ifdef CONFIG_SECURITY_NETWORK
-static inline void scm_passec(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm)
-{
-	struct lsm_context ctx;
-	int err;
-
-	if (test_bit(SOCK_PASSSEC, &sock->flags)) {
-		err = security_secid_to_secctx(scm->secid, &ctx);
-
-		if (err >= 0) {
-			put_cmsg(msg, SOL_SOCKET, SCM_SECURITY, ctx.len,
-				 ctx.context);
-			security_release_secctx(&ctx);
-		}
-	}
-}
-
-static inline bool scm_has_secdata(struct socket *sock)
-{
-	return test_bit(SOCK_PASSSEC, &sock->flags);
-}
-#else
-static inline void scm_passec(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm)
-{ }
-
-static inline bool scm_has_secdata(struct socket *sock)
-{
-	return false;
-}
-#endif /* CONFIG_SECURITY_NETWORK */
-
-static __inline__ void scm_pidfd_recv(struct msghdr *msg, struct scm_cookie *scm)
-{
-	struct file *pidfd_file = NULL;
-	int len, pidfd;
-
-	/* put_cmsg() doesn't return an error if CMSG is truncated,
-	 * that's why we need to opencode these checks here.
-	 */
-	if (msg->msg_flags & MSG_CMSG_COMPAT)
-		len = sizeof(struct compat_cmsghdr) + sizeof(int);
-	else
-		len = sizeof(struct cmsghdr) + sizeof(int);
-
-	if (msg->msg_controllen < len) {
-		msg->msg_flags |= MSG_CTRUNC;
-		return;
-	}
-
-	if (!scm->pid)
-		return;
-
-	pidfd = pidfd_prepare(scm->pid, 0, &pidfd_file);
-
-	if (put_cmsg(msg, SOL_SOCKET, SCM_PIDFD, sizeof(int), &pidfd)) {
-		if (pidfd_file) {
-			put_unused_fd(pidfd);
-			fput(pidfd_file);
-		}
-
-		return;
-	}
-
-	if (pidfd_file)
-		fd_install(pidfd, pidfd_file);
-}
-
-static inline bool __scm_recv_common(struct socket *sock, struct msghdr *msg,
-				     struct scm_cookie *scm, int flags)
-{
-	if (!msg->msg_control) {
-		if (test_bit(SOCK_PASSCRED, &sock->flags) ||
-		    test_bit(SOCK_PASSPIDFD, &sock->flags) ||
-		    scm->fp || scm_has_secdata(sock))
-			msg->msg_flags |= MSG_CTRUNC;
-		scm_destroy(scm);
-		return false;
-	}
-
-	if (test_bit(SOCK_PASSCRED, &sock->flags)) {
-		struct user_namespace *current_ns = current_user_ns();
-		struct ucred ucreds = {
-			.pid = scm->creds.pid,
-			.uid = from_kuid_munged(current_ns, scm->creds.uid),
-			.gid = from_kgid_munged(current_ns, scm->creds.gid),
-		};
-		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(ucreds), &ucreds);
-	}
-
-	scm_passec(sock, msg, scm);
-
-	if (scm->fp)
-		scm_detach_fds(msg, scm);
-
-	return true;
-}
-
-static inline void scm_recv(struct socket *sock, struct msghdr *msg,
-			    struct scm_cookie *scm, int flags)
-{
-	if (!__scm_recv_common(sock, msg, scm, flags))
-		return;
-
-	scm_destroy_cred(scm);
-}
-
-static inline void scm_recv_unix(struct socket *sock, struct msghdr *msg,
-				 struct scm_cookie *scm, int flags)
-{
-	if (!__scm_recv_common(sock, msg, scm, flags))
-		return;
-
-	if (test_bit(SOCK_PASSPIDFD, &sock->flags))
-		scm_pidfd_recv(msg, scm);
-
-	scm_destroy_cred(scm);
-}
+void scm_recv(struct socket *sock, struct msghdr *msg,
+	      struct scm_cookie *scm, int flags);
+void scm_recv_unix(struct socket *sock, struct msghdr *msg,
+		   struct scm_cookie *scm, int flags);
 
 static inline int scm_recv_one_fd(struct file *f, int __user *ufd,
 				  unsigned int flags)
diff --git a/net/core/scm.c b/net/core/scm.c
index 733c0cbd393d..3f756f00e41e 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -404,3 +404,125 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
 	return new_fpl;
 }
 EXPORT_SYMBOL(scm_fp_dup);
+
+#ifdef CONFIG_SECURITY_NETWORK
+static void scm_passec(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm)
+{
+	struct lsm_context ctx;
+	int err;
+
+	if (test_bit(SOCK_PASSSEC, &sock->flags)) {
+		err = security_secid_to_secctx(scm->secid, &ctx);
+
+		if (err >= 0) {
+			put_cmsg(msg, SOL_SOCKET, SCM_SECURITY, ctx.len,
+				 ctx.context);
+
+			security_release_secctx(&ctx);
+		}
+	}
+}
+
+static bool scm_has_secdata(struct socket *sock)
+{
+	return test_bit(SOCK_PASSSEC, &sock->flags);
+}
+#else
+static inline void scm_passec(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm)
+{ }
+
+static inline bool scm_has_secdata(struct socket *sock)
+{
+	return false;
+}
+#endif
+
+static void scm_pidfd_recv(struct msghdr *msg, struct scm_cookie *scm)
+{
+	struct file *pidfd_file = NULL;
+	int len, pidfd;
+
+	/* put_cmsg() doesn't return an error if CMSG is truncated,
+	 * that's why we need to opencode these checks here.
+	 */
+	if (msg->msg_flags & MSG_CMSG_COMPAT)
+		len = sizeof(struct compat_cmsghdr) + sizeof(int);
+	else
+		len = sizeof(struct cmsghdr) + sizeof(int);
+
+	if (msg->msg_controllen < len) {
+		msg->msg_flags |= MSG_CTRUNC;
+		return;
+	}
+
+	if (!scm->pid)
+		return;
+
+	pidfd = pidfd_prepare(scm->pid, 0, &pidfd_file);
+
+	if (put_cmsg(msg, SOL_SOCKET, SCM_PIDFD, sizeof(int), &pidfd)) {
+		if (pidfd_file) {
+			put_unused_fd(pidfd);
+			fput(pidfd_file);
+		}
+
+		return;
+	}
+
+	if (pidfd_file)
+		fd_install(pidfd, pidfd_file);
+}
+
+static bool __scm_recv_common(struct socket *sock, struct msghdr *msg,
+			      struct scm_cookie *scm, int flags)
+{
+	if (!msg->msg_control) {
+		if (test_bit(SOCK_PASSCRED, &sock->flags) ||
+		    test_bit(SOCK_PASSPIDFD, &sock->flags) ||
+		    scm->fp || scm_has_secdata(sock))
+			msg->msg_flags |= MSG_CTRUNC;
+
+		scm_destroy(scm);
+		return false;
+	}
+
+	if (test_bit(SOCK_PASSCRED, &sock->flags)) {
+		struct user_namespace *current_ns = current_user_ns();
+		struct ucred ucreds = {
+			.pid = scm->creds.pid,
+			.uid = from_kuid_munged(current_ns, scm->creds.uid),
+			.gid = from_kgid_munged(current_ns, scm->creds.gid),
+		};
+
+		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(ucreds), &ucreds);
+	}
+
+	scm_passec(sock, msg, scm);
+
+	if (scm->fp)
+		scm_detach_fds(msg, scm);
+
+	return true;
+}
+
+void scm_recv(struct socket *sock, struct msghdr *msg,
+	      struct scm_cookie *scm, int flags)
+{
+	if (!__scm_recv_common(sock, msg, scm, flags))
+		return;
+
+	scm_destroy_cred(scm);
+}
+EXPORT_SYMBOL(scm_recv);
+
+void scm_recv_unix(struct socket *sock, struct msghdr *msg,
+		   struct scm_cookie *scm, int flags)
+{
+	if (!__scm_recv_common(sock, msg, scm, flags))
+		return;
+
+	if (test_bit(SOCK_PASSPIDFD, &sock->flags))
+		scm_pidfd_recv(msg, scm);
+
+	scm_destroy_cred(scm);
+}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 net-next 4/9] tcp: Restrict SO_TXREHASH to TCP socket.
  2025-05-10  1:56 [PATCH v2 net-next 0/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
                   ` (2 preceding siblings ...)
  2025-05-10  1:56 ` [PATCH v2 net-next 3/9] scm: Move scm_recv() from scm.h to scm.c Kuniyuki Iwashima
@ 2025-05-10  1:56 ` Kuniyuki Iwashima
  2025-05-12 19:18   ` Willem de Bruijn
  2025-05-10  1:56 ` [PATCH v2 net-next 5/9] net: Restrict SO_PASS{CRED,PIDFD,SEC} to AF_{UNIX,NETLINK,BLUETOOTH} Kuniyuki Iwashima
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-10  1:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

sk->sk_txrehash is only used for TCP.

Let's restrict SO_TXREHASH to TCP to reflect this.

Later, we will make sk_txrehash a part of the union for other
protocol families, so we set 0 explicitly in getsockopt().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/core/sock.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index b64df2463300..5c84a608ddd7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1276,6 +1276,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
 		return 0;
 		}
 	case SO_TXREHASH:
+		if (!sk_is_tcp(sk))
+			return -EOPNOTSUPP;
 		if (val < -1 || val > 1)
 			return -EINVAL;
 		if ((u8)val == SOCK_TXREHASH_DEFAULT)
@@ -2102,8 +2104,11 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case SO_TXREHASH:
-		/* Paired with WRITE_ONCE() in sk_setsockopt() */
-		v.val = READ_ONCE(sk->sk_txrehash);
+		if (sk_is_tcp(sk))
+			/* Paired with WRITE_ONCE() in sk_setsockopt() */
+			v.val = READ_ONCE(sk->sk_txrehash);
+		else
+			v.val = 0;
 		break;
 
 	default:
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 net-next 5/9] net: Restrict SO_PASS{CRED,PIDFD,SEC} to AF_{UNIX,NETLINK,BLUETOOTH}.
  2025-05-10  1:56 [PATCH v2 net-next 0/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
                   ` (3 preceding siblings ...)
  2025-05-10  1:56 ` [PATCH v2 net-next 4/9] tcp: Restrict SO_TXREHASH to TCP socket Kuniyuki Iwashima
@ 2025-05-10  1:56 ` Kuniyuki Iwashima
  2025-05-10  1:56 ` [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock Kuniyuki Iwashima
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-10  1:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

SCM_CREDENTIALS and SCM_SECURITY can be recv()ed by calling
scm_recv() or scm_recv_unix(), and SCM_PIDFD is only used by
scm_recv_unix().

scm_recv() is called from AF_NETLINK and AF_BLUETOOTH.

scm_recv_unix() is literally called from AF_UNIX.

Let's restrict SO_PASSCRED and SO_PASSSEC to such sockets and
SO_PASSPIDFD to AF_UNIX only.

Later, SOCK_PASS{CRED,PIDFD,SEC} will be moved to struct sock
and united with another field, so we set 0 explicitly in
getsockopt().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/sock.h | 14 +++++++++++++-
 net/core/sock.c    | 24 +++++++++++++++++++++---
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index f0fabb9fd28a..af0719ba331f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2772,9 +2772,14 @@ static inline bool sk_is_udp(const struct sock *sk)
 	       sk->sk_protocol == IPPROTO_UDP;
 }
 
+static inline bool sk_is_unix(const struct sock *sk)
+{
+	return sk->sk_family == AF_UNIX;
+}
+
 static inline bool sk_is_stream_unix(const struct sock *sk)
 {
-	return sk->sk_family == AF_UNIX && sk->sk_type == SOCK_STREAM;
+	return sk_is_unix(sk) && sk->sk_type == SOCK_STREAM;
 }
 
 static inline bool sk_is_vsock(const struct sock *sk)
@@ -2782,6 +2787,13 @@ static inline bool sk_is_vsock(const struct sock *sk)
 	return sk->sk_family == AF_VSOCK;
 }
 
+static inline bool sk_may_scm_recv(const struct sock *sk)
+{
+	return (IS_ENABLED(CONFIG_UNIX) && sk->sk_family == AF_UNIX) ||
+		sk->sk_family == AF_NETLINK ||
+		(IS_ENABLED(CONFIG_BT) && sk->sk_family == AF_BLUETOOTH);
+}
+
 /**
  * sk_eat_skb - Release a skb if it is no longer needed
  * @sk: socket to eat this skb from
diff --git a/net/core/sock.c b/net/core/sock.c
index 5c84a608ddd7..1ab59efbafc5 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1221,12 +1221,21 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
 		}
 		return -EPERM;
 	case SO_PASSSEC:
+		if (!sk_may_scm_recv(sk))
+			return -EOPNOTSUPP;
+
 		assign_bit(SOCK_PASSSEC, &sock->flags, valbool);
 		return 0;
 	case SO_PASSCRED:
+		if (!sk_may_scm_recv(sk))
+			return -EOPNOTSUPP;
+
 		assign_bit(SOCK_PASSCRED, &sock->flags, valbool);
 		return 0;
 	case SO_PASSPIDFD:
+		if (!sk_is_unix(sk))
+			return -EOPNOTSUPP;
+
 		assign_bit(SOCK_PASSPIDFD, &sock->flags, valbool);
 		return 0;
 	case SO_TYPE:
@@ -1855,11 +1864,17 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case SO_PASSCRED:
-		v.val = !!test_bit(SOCK_PASSCRED, &sock->flags);
+		if (sk_may_scm_recv(sk))
+			v.val = !!test_bit(SOCK_PASSCRED, &sock->flags);
+		else
+			v.val = 0;
 		break;
 
 	case SO_PASSPIDFD:
-		v.val = !!test_bit(SOCK_PASSPIDFD, &sock->flags);
+		if (sk_is_unix(sk))
+			v.val = !!test_bit(SOCK_PASSPIDFD, &sock->flags);
+		else
+			v.val = 0;
 		break;
 
 	case SO_PEERCRED:
@@ -1956,7 +1971,10 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case SO_PASSSEC:
-		v.val = !!test_bit(SOCK_PASSSEC, &sock->flags);
+		if (sk_may_scm_recv(sk))
+			v.val = !!test_bit(SOCK_PASSSEC, &sock->flags);
+		else
+			v.val = 0;
 		break;
 
 	case SO_PEERSEC:
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
  2025-05-10  1:56 [PATCH v2 net-next 0/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
                   ` (4 preceding siblings ...)
  2025-05-10  1:56 ` [PATCH v2 net-next 5/9] net: Restrict SO_PASS{CRED,PIDFD,SEC} to AF_{UNIX,NETLINK,BLUETOOTH} Kuniyuki Iwashima
@ 2025-05-10  1:56 ` Kuniyuki Iwashima
  2025-05-12 19:20   ` Willem de Bruijn
  2025-05-13  2:03   ` kernel test robot
  2025-05-10  1:56 ` [PATCH v2 net-next 7/9] af_unix: Inherit sk_flags at connect() Kuniyuki Iwashima
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-10  1:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

As explained in the next patch, SO_PASSRIGHTS would have a problem
if we assigned a corresponding bit to socket->flags, so it must be
managed in struct sock.

Mixing socket->flags and sk->sk_flags for similar options will look
confusing, and sk->sk_flags does not have enough space on 32bit system.

Also, as mentioned in commit 16e572626961 ("af_unix: dont send
SCM_CREDENTIALS by default"), SOCK_PASSCRED and SOCK_PASSPID handling
is known to be slow, and managing the flags in struct socket cannot
avoid that for embryo sockets.

Let's move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.

While at it, other SOCK_XXX flags in net.h are grouped as enum.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/linux/net.h | 15 +++++++--------
 include/net/sock.h  | 13 ++++++++++++-
 net/core/scm.c      | 29 ++++++++++++++---------------
 net/core/sock.c     | 12 ++++++------
 net/unix/af_unix.c  | 18 ++----------------
 5 files changed, 41 insertions(+), 46 deletions(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 0ff950eecc6b..f8418d6e33e0 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -36,14 +36,13 @@ struct net;
  * in sock->flags, but moved into sk->sk_wq->flags to be RCU protected.
  * Eventually all flags will be in sk->sk_wq->flags.
  */
-#define SOCKWQ_ASYNC_NOSPACE	0
-#define SOCKWQ_ASYNC_WAITDATA	1
-#define SOCK_NOSPACE		2
-#define SOCK_PASSCRED		3
-#define SOCK_PASSSEC		4
-#define SOCK_SUPPORT_ZC		5
-#define SOCK_CUSTOM_SOCKOPT	6
-#define SOCK_PASSPIDFD		7
+enum socket_flags {
+	SOCKWQ_ASYNC_NOSPACE,
+	SOCKWQ_ASYNC_WAITDATA,
+	SOCK_NOSPACE,
+	SOCK_SUPPORT_ZC,
+	SOCK_CUSTOM_SOCKOPT,
+};
 
 #ifndef ARCH_HAS_SOCKET_TYPES
 /**
diff --git a/include/net/sock.h b/include/net/sock.h
index af0719ba331f..036ed7d394ba 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -337,6 +337,10 @@ struct sk_filter;
   *	@sk_txtime_deadline_mode: set deadline mode for SO_TXTIME
   *	@sk_txtime_report_errors: set report errors mode for SO_TXTIME
   *	@sk_txtime_unused: unused txtime flags
+  *	@sk_scm_recv_flags: All flags used by scm_recv()
+  *	@sk_scm_credentials: flagged by SO_PASSCRED to recv SCM_CREDENTIALS
+  *	@sk_scm_security: flagged by SO_PASSSEC to recv SCM_SECURITY
+  *	@sk_scm_pidfd: flagged by SO_PASSPIDFD to recv SCM_PIDFD
   *	@ns_tracker: tracker for netns reference
   *	@sk_user_frags: xarray of pages the user is holding a reference on.
   *	@sk_owner: reference to the real owner of the socket that calls
@@ -523,7 +527,14 @@ struct sock {
 #endif
 	int			sk_disconnects;
 
-	u8			sk_txrehash;
+	union {
+		u8		sk_txrehash;
+		u8		sk_scm_recv_flags;
+		u8		sk_scm_credentials : 1,
+				sk_scm_security : 1,
+				sk_scm_pidfd : 1,
+				sk_scm_unused : 5;
+	};
 	u8			sk_clockid;
 	u8			sk_txtime_deadline_mode : 1,
 				sk_txtime_report_errors : 1,
diff --git a/net/core/scm.c b/net/core/scm.c
index 3f756f00e41e..ffc0b917b4e7 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -406,12 +406,12 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
 EXPORT_SYMBOL(scm_fp_dup);
 
 #ifdef CONFIG_SECURITY_NETWORK
-static void scm_passec(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm)
+static void scm_passec(struct sock *sk, struct msghdr *msg, struct scm_cookie *scm)
 {
 	struct lsm_context ctx;
 	int err;
 
-	if (test_bit(SOCK_PASSSEC, &sock->flags)) {
+	if (sk->sk_scm_security) {
 		err = security_secid_to_secctx(scm->secid, &ctx);
 
 		if (err >= 0) {
@@ -423,15 +423,15 @@ static void scm_passec(struct socket *sock, struct msghdr *msg, struct scm_cooki
 	}
 }
 
-static bool scm_has_secdata(struct socket *sock)
+static bool scm_has_secdata(struct sock *sk)
 {
-	return test_bit(SOCK_PASSSEC, &sock->flags);
+	return sk->sk_scm_security;
 }
 #else
-static inline void scm_passec(struct socket *sock, struct msghdr *msg, struct scm_cookie *scm)
+static inline void scm_passec(struct sock *sk, struct msghdr *msg, struct scm_cookie *scm)
 { }
 
-static inline bool scm_has_secdata(struct socket *sock)
+static inline bool scm_has_secdata(struct sock *sk)
 {
 	return false;
 }
@@ -473,20 +473,19 @@ static void scm_pidfd_recv(struct msghdr *msg, struct scm_cookie *scm)
 		fd_install(pidfd, pidfd_file);
 }
 
-static bool __scm_recv_common(struct socket *sock, struct msghdr *msg,
+static bool __scm_recv_common(struct sock *sk, struct msghdr *msg,
 			      struct scm_cookie *scm, int flags)
 {
 	if (!msg->msg_control) {
-		if (test_bit(SOCK_PASSCRED, &sock->flags) ||
-		    test_bit(SOCK_PASSPIDFD, &sock->flags) ||
-		    scm->fp || scm_has_secdata(sock))
+		if (sk->sk_scm_credentials || sk->sk_scm_pidfd ||
+		    scm->fp || scm_has_secdata(sk))
 			msg->msg_flags |= MSG_CTRUNC;
 
 		scm_destroy(scm);
 		return false;
 	}
 
-	if (test_bit(SOCK_PASSCRED, &sock->flags)) {
+	if (sk->sk_scm_credentials) {
 		struct user_namespace *current_ns = current_user_ns();
 		struct ucred ucreds = {
 			.pid = scm->creds.pid,
@@ -497,7 +496,7 @@ static bool __scm_recv_common(struct socket *sock, struct msghdr *msg,
 		put_cmsg(msg, SOL_SOCKET, SCM_CREDENTIALS, sizeof(ucreds), &ucreds);
 	}
 
-	scm_passec(sock, msg, scm);
+	scm_passec(sk, msg, scm);
 
 	if (scm->fp)
 		scm_detach_fds(msg, scm);
@@ -508,7 +507,7 @@ static bool __scm_recv_common(struct socket *sock, struct msghdr *msg,
 void scm_recv(struct socket *sock, struct msghdr *msg,
 	      struct scm_cookie *scm, int flags)
 {
-	if (!__scm_recv_common(sock, msg, scm, flags))
+	if (!__scm_recv_common(sock->sk, msg, scm, flags))
 		return;
 
 	scm_destroy_cred(scm);
@@ -518,10 +517,10 @@ EXPORT_SYMBOL(scm_recv);
 void scm_recv_unix(struct socket *sock, struct msghdr *msg,
 		   struct scm_cookie *scm, int flags)
 {
-	if (!__scm_recv_common(sock, msg, scm, flags))
+	if (!__scm_recv_common(sock->sk, msg, scm, flags))
 		return;
 
-	if (test_bit(SOCK_PASSPIDFD, &sock->flags))
+	if (sock->sk->sk_scm_pidfd)
 		scm_pidfd_recv(msg, scm);
 
 	scm_destroy_cred(scm);
diff --git a/net/core/sock.c b/net/core/sock.c
index 1ab59efbafc5..9540cbe3d83e 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1224,19 +1224,19 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
 		if (!sk_may_scm_recv(sk))
 			return -EOPNOTSUPP;
 
-		assign_bit(SOCK_PASSSEC, &sock->flags, valbool);
+		sk->sk_scm_security = valbool;
 		return 0;
 	case SO_PASSCRED:
 		if (!sk_may_scm_recv(sk))
 			return -EOPNOTSUPP;
 
-		assign_bit(SOCK_PASSCRED, &sock->flags, valbool);
+		sk->sk_scm_credentials = valbool;
 		return 0;
 	case SO_PASSPIDFD:
 		if (!sk_is_unix(sk))
 			return -EOPNOTSUPP;
 
-		assign_bit(SOCK_PASSPIDFD, &sock->flags, valbool);
+		sk->sk_scm_pidfd = valbool;
 		return 0;
 	case SO_TYPE:
 	case SO_PROTOCOL:
@@ -1865,14 +1865,14 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 
 	case SO_PASSCRED:
 		if (sk_may_scm_recv(sk))
-			v.val = !!test_bit(SOCK_PASSCRED, &sock->flags);
+			v.val = sk->sk_scm_credentials;
 		else
 			v.val = 0;
 		break;
 
 	case SO_PASSPIDFD:
 		if (sk_is_unix(sk))
-			v.val = !!test_bit(SOCK_PASSPIDFD, &sock->flags);
+			v.val = sk->sk_scm_pidfd;
 		else
 			v.val = 0;
 		break;
@@ -1972,7 +1972,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 
 	case SO_PASSSEC:
 		if (sk_may_scm_recv(sk))
-			v.val = !!test_bit(SOCK_PASSSEC, &sock->flags);
+			v.val = sk->sk_scm_security;
 		else
 			v.val = 0;
 		break;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index a39497fd6e98..83436297b0b3 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -767,10 +767,7 @@ static void copy_peercred(struct sock *sk, struct sock *peersk)
 
 static bool unix_may_passcred(const struct sock *sk)
 {
-	struct socket *sock = sk->sk_socket;
-
-	return test_bit(SOCK_PASSCRED, &sock->flags) ||
-		test_bit(SOCK_PASSPIDFD, &sock->flags);
+	return sk->sk_scm_credentials || sk->sk_scm_pidfd;
 }
 
 static int unix_listen(struct socket *sock, int backlog)
@@ -1713,17 +1710,6 @@ static int unix_socketpair(struct socket *socka, struct socket *sockb)
 	return 0;
 }
 
-static void unix_sock_inherit_flags(const struct socket *old,
-				    struct socket *new)
-{
-	if (test_bit(SOCK_PASSCRED, &old->flags))
-		set_bit(SOCK_PASSCRED, &new->flags);
-	if (test_bit(SOCK_PASSPIDFD, &old->flags))
-		set_bit(SOCK_PASSPIDFD, &new->flags);
-	if (test_bit(SOCK_PASSSEC, &old->flags))
-		set_bit(SOCK_PASSSEC, &new->flags);
-}
-
 static int unix_accept(struct socket *sock, struct socket *newsock,
 		       struct proto_accept_arg *arg)
 {
@@ -1760,7 +1746,7 @@ static int unix_accept(struct socket *sock, struct socket *newsock,
 	unix_state_lock(tsk);
 	unix_update_edges(unix_sk(tsk));
 	newsock->state = SS_CONNECTED;
-	unix_sock_inherit_flags(sock, newsock);
+	tsk->sk_scm_recv_flags = sk->sk_scm_recv_flags;
 	sock_graft(tsk, newsock);
 	unix_state_unlock(tsk);
 	return 0;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 net-next 7/9] af_unix: Inherit sk_flags at connect().
  2025-05-10  1:56 [PATCH v2 net-next 0/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
                   ` (5 preceding siblings ...)
  2025-05-10  1:56 ` [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock Kuniyuki Iwashima
@ 2025-05-10  1:56 ` Kuniyuki Iwashima
  2025-05-12 19:38   ` Willem de Bruijn
  2025-05-10  1:56 ` [PATCH v2 net-next 8/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
  2025-05-10  1:56 ` [PATCH v2 net-next 9/9] selftest: af_unix: Test SO_PASSRIGHTS Kuniyuki Iwashima
  8 siblings, 1 reply; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-10  1:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

For SOCK_STREAM embryo sockets, the SO_PASS{CRED,PIDFD,SEC} options
are inherited from the parent listen()ing socket.

Currently, this inheritance happens at accept(), because these
attributes were stored in sk->sk_socket->flags and the struct socket
is not allocated until accept().

This leads to unintentional behaviour.

When a peer sends data to an embryo socket in the accept() queue,
unix_maybe_add_creds() embeds credentials into the skb, even if
neither the peer nor the listener has enabled these options.

If the option is enabled, the embryo socket receives the ancillary
data after accept().  If not, the data is silently discarded.

This conservative approach works for SO_PASS{CRED,PIDFD,SEC}, but not
for SO_PASSRIGHTS; once an SCM_RIGHTS with a hung file descriptor is
sent, it’s game over.

To avoid this, we will need to preserve SOCK_PASSRIGHTS even on embryo
sockets.

A recent change made it possible to access the parent's flags in
sendmsg() via unix_sk(other)->listener->sk->sk_socket->flags, but
this introduces an unnecessary condition that is irrelevant for
most sockets, accept()ed sockets and clients.

Therefore, we moved SOCK_PASSXXX into struct sock.

Let’s inherit sk->sk_scm_recv_flags at connect() to avoid receiving
SCM_RIGHTS on embryo sockets created from a parent with SO_PASSRIGHTS=0.

Now, we can remove !other->sk_socket check in unix_maybe_add_creds()
to avoid slow SOCK_PASS{CRED,PIDFD} handling for embryo sockets
created from a parent with SO_PASS{CRED,PIDFD}=0.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 net/unix/af_unix.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 83436297b0b3..ba52fc36f9be 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1626,10 +1626,12 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 	/* The way is open! Fastly set all the necessary fields... */
 
 	sock_hold(sk);
-	unix_peer(newsk)	= sk;
-	newsk->sk_state		= TCP_ESTABLISHED;
-	newsk->sk_type		= sk->sk_type;
+	unix_peer(newsk) =sk;
+	newsk->sk_state = TCP_ESTABLISHED;
+	newsk->sk_type = sk->sk_type;
+	newsk->sk_scm_recv_flags = other->sk_scm_recv_flags;
 	init_peercred(newsk);
+
 	newu = unix_sk(newsk);
 	newu->listener = other;
 	RCU_INIT_POINTER(newsk->sk_wq, &newu->peer_wq);
@@ -1746,7 +1748,6 @@ static int unix_accept(struct socket *sock, struct socket *newsock,
 	unix_state_lock(tsk);
 	unix_update_edges(unix_sk(tsk));
 	newsock->state = SS_CONNECTED;
-	tsk->sk_scm_recv_flags = sk->sk_scm_recv_flags;
 	sock_graft(tsk, newsock);
 	unix_state_unlock(tsk);
 	return 0;
@@ -1878,8 +1879,7 @@ static void unix_maybe_add_creds(struct sk_buff *skb, const struct sock *sk,
 	if (UNIXCB(skb).pid)
 		return;
 
-	if (unix_may_passcred(sk) ||
-	    !other->sk_socket || unix_may_passcred(other)) {
+	if (unix_may_passcred(sk) || unix_may_passcred(other)) {
 		UNIXCB(skb).pid = get_pid(task_tgid(current));
 		current_uid_gid(&UNIXCB(skb).uid, &UNIXCB(skb).gid);
 	}
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 net-next 8/9] af_unix: Introduce SO_PASSRIGHTS.
  2025-05-10  1:56 [PATCH v2 net-next 0/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
                   ` (6 preceding siblings ...)
  2025-05-10  1:56 ` [PATCH v2 net-next 7/9] af_unix: Inherit sk_flags at connect() Kuniyuki Iwashima
@ 2025-05-10  1:56 ` Kuniyuki Iwashima
  2025-05-10  1:56 ` [PATCH v2 net-next 9/9] selftest: af_unix: Test SO_PASSRIGHTS Kuniyuki Iwashima
  8 siblings, 0 replies; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-10  1:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

As long as recvmsg() or recvmmsg() is used with cmsg, it is not
possible to avoid receiving file descriptors via SCM_RIGHTS.

This behaviour has occasionally been flagged as problematic, as
it can be (ab)used to trigger DoS during close(), for example, by
passing a FUSE-controlled fd or a hung NFS fd.

For instance, as noted on the uAPI Group page [0], an untrusted peer
could send a file descriptor pointing to a hung NFS mount and then
close it.  Once the receiver calls recvmsg() with msg_control, the
descriptor is automatically installed, and then the responsibility
for the final close() now falls on the receiver, which may result
in blocking the process for a long time.

Regarding this, systemd calls cmsg_close_all() [1] after each
recvmsg() to close() unwanted file descriptors sent via SCM_RIGHTS.

However, this cannot work around the issue at all, because the final
fput() may still occur on the receiver's side once sendmsg() with
SCM_RIGHTS succeeds.  Also, even filtering by LSM at recvmsg() does
not work for the same reason.

Thus, we need a better way to refuse SCM_RIGHTS at sendmsg().

Let's introduce SO_PASSRIGHTS to disable SCM_RIGHTS.

Note that this option is enabled by default for backward
compatibility.

Link: https://uapi-group.org/kernel-features/#disabling-reception-of-scm_rights-for-af_unix-sockets #[0]
Link: https://github.com/systemd/systemd/blob/v257.5/src/basic/fd-util.c#L612-L628 #[1]
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 arch/alpha/include/uapi/asm/socket.h    |  2 ++
 arch/mips/include/uapi/asm/socket.h     |  2 ++
 arch/parisc/include/uapi/asm/socket.h   |  2 ++
 arch/sparc/include/uapi/asm/socket.h    |  2 ++
 include/net/sock.h                      |  4 +++-
 include/uapi/asm-generic/socket.h       |  2 ++
 net/core/sock.c                         | 13 +++++++++++++
 net/unix/af_unix.c                      | 22 ++++++++++++++++++++--
 tools/include/uapi/asm-generic/socket.h |  2 ++
 9 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 3df5f2dd4c0f..8f1f18adcdb5 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -150,6 +150,8 @@
 
 #define SO_RCVPRIORITY		82
 
+#define SO_PASSRIGHTS		83
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 22fa8f19924a..31ac655b7837 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -161,6 +161,8 @@
 
 #define SO_RCVPRIORITY		82
 
+#define SO_PASSRIGHTS		83
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 96831c988606..1f2d5b7a7f5d 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -142,6 +142,8 @@
 #define SCM_DEVMEM_DMABUF	SO_DEVMEM_DMABUF
 #define SO_DEVMEM_DONTNEED	0x4050
 
+#define SO_PASSRIGHTS		0x4051
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 5b464a568664..adcba7329386 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -143,6 +143,8 @@
 
 #define SO_RCVPRIORITY           0x005b
 
+#define SO_PASSRIGHTS            0x005c
+
 #if !defined(__KERNEL__)
 
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 036ed7d394ba..26c7d85df7d3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -341,6 +341,7 @@ struct sk_filter;
   *	@sk_scm_credentials: flagged by SO_PASSCRED to recv SCM_CREDENTIALS
   *	@sk_scm_security: flagged by SO_PASSSEC to recv SCM_SECURITY
   *	@sk_scm_pidfd: flagged by SO_PASSPIDFD to recv SCM_PIDFD
+  *	@sk_scm_rights: flagged by SO_PASSRIGHTS to recv SCM_RIGHTS
   *	@ns_tracker: tracker for netns reference
   *	@sk_user_frags: xarray of pages the user is holding a reference on.
   *	@sk_owner: reference to the real owner of the socket that calls
@@ -533,7 +534,8 @@ struct sock {
 		u8		sk_scm_credentials : 1,
 				sk_scm_security : 1,
 				sk_scm_pidfd : 1,
-				sk_scm_unused : 5;
+				sk_scm_rights : 1,
+				sk_scm_unused : 4;
 	};
 	u8			sk_clockid;
 	u8			sk_txtime_deadline_mode : 1,
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index aa5016ff3d91..f333a0ac4ee4 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -145,6 +145,8 @@
 
 #define SO_RCVPRIORITY		82
 
+#define SO_PASSRIGHTS		83
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/net/core/sock.c b/net/core/sock.c
index 9540cbe3d83e..c9f81019cb9d 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1238,6 +1238,12 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
 
 		sk->sk_scm_pidfd = valbool;
 		return 0;
+	case SO_PASSRIGHTS:
+		if (!sk_is_unix(sk))
+			return -EOPNOTSUPP;
+
+		sk->sk_scm_rights = valbool;
+		return 0;
 	case SO_TYPE:
 	case SO_PROTOCOL:
 	case SO_DOMAIN:
@@ -1877,6 +1883,13 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 			v.val = 0;
 		break;
 
+	case SO_PASSRIGHTS:
+		if (sk_is_unix(sk))
+			v.val = sk->sk_scm_rights;
+		else
+			v.val = 0;
+		break;
+
 	case SO_PEERCRED:
 	{
 		struct ucred peercred;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index ba52fc36f9be..941098b090ef 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1015,6 +1015,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern,
 
 	sock_init_data(sock, sk);
 
+	sk->sk_scm_rights	= 1;
 	sk->sk_hash		= unix_unbound_hash(sk);
 	sk->sk_allocation	= GFP_KERNEL_ACCOUNT;
 	sk->sk_write_space	= unix_write_space;
@@ -2073,6 +2074,11 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 		goto out_unlock;
 	}
 
+	if (UNIXCB(skb).fp && !other->sk_scm_rights) {
+		err = -EPERM;
+		goto out_unlock;
+	}
+
 	if (sk->sk_type != SOCK_SEQPACKET) {
 		err = security_unix_may_send(sk->sk_socket, other->sk_socket);
 		if (err)
@@ -2174,9 +2180,13 @@ static int queue_oob(struct sock *sk, struct msghdr *msg, struct sock *other,
 
 	if (sock_flag(other, SOCK_DEAD) ||
 	    (other->sk_shutdown & RCV_SHUTDOWN)) {
-		unix_state_unlock(other);
 		err = -EPIPE;
-		goto out;
+		goto out_unlock;
+	}
+
+	if (UNIXCB(skb).fp && !other->sk_scm_rights) {
+		err = -EPERM;
+		goto out_unlock;
 	}
 
 	unix_maybe_add_creds(skb, sk, other);
@@ -2192,6 +2202,8 @@ static int queue_oob(struct sock *sk, struct msghdr *msg, struct sock *other,
 	other->sk_data_ready(other);
 
 	return 0;
+out_unlock:
+	unix_state_unlock(other);
 out:
 	consume_skb(skb);
 	return err;
@@ -2295,6 +2307,12 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg,
 		    (other->sk_shutdown & RCV_SHUTDOWN))
 			goto out_pipe_unlock;
 
+		if (UNIXCB(skb).fp && !other->sk_scm_rights) {
+			unix_state_unlock(other);
+			err = -EPERM;
+			goto out_free;
+		}
+
 		unix_maybe_add_creds(skb, sk, other);
 		scm_stat_add(other, skb);
 		skb_queue_tail(&other->sk_receive_queue, skb);
diff --git a/tools/include/uapi/asm-generic/socket.h b/tools/include/uapi/asm-generic/socket.h
index aa5016ff3d91..f333a0ac4ee4 100644
--- a/tools/include/uapi/asm-generic/socket.h
+++ b/tools/include/uapi/asm-generic/socket.h
@@ -145,6 +145,8 @@
 
 #define SO_RCVPRIORITY		82
 
+#define SO_PASSRIGHTS		83
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v2 net-next 9/9] selftest: af_unix: Test SO_PASSRIGHTS.
  2025-05-10  1:56 [PATCH v2 net-next 0/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
                   ` (7 preceding siblings ...)
  2025-05-10  1:56 ` [PATCH v2 net-next 8/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
@ 2025-05-10  1:56 ` Kuniyuki Iwashima
  8 siblings, 0 replies; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-10  1:56 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

scm_rights.c has various patterns of tests to exercise GC.

Let's add cases where SO_PASSRIGHTS is disabled.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 .../selftests/net/af_unix/scm_rights.c        | 84 ++++++++++++++++++-
 1 file changed, 81 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/af_unix/scm_rights.c b/tools/testing/selftests/net/af_unix/scm_rights.c
index d66336256580..7589f690fe2f 100644
--- a/tools/testing/selftests/net/af_unix/scm_rights.c
+++ b/tools/testing/selftests/net/af_unix/scm_rights.c
@@ -23,6 +23,7 @@ FIXTURE_VARIANT(scm_rights)
 	int type;
 	int flags;
 	bool test_listener;
+	bool disabled;
 };
 
 FIXTURE_VARIANT_ADD(scm_rights, dgram)
@@ -31,6 +32,16 @@ FIXTURE_VARIANT_ADD(scm_rights, dgram)
 	.type = SOCK_DGRAM,
 	.flags = 0,
 	.test_listener = false,
+	.disabled = false,
+};
+
+FIXTURE_VARIANT_ADD(scm_rights, dgram_disabled)
+{
+	.name = "UNIX ",
+	.type = SOCK_DGRAM,
+	.flags = 0,
+	.test_listener = false,
+	.disabled = true,
 };
 
 FIXTURE_VARIANT_ADD(scm_rights, stream)
@@ -39,6 +50,16 @@ FIXTURE_VARIANT_ADD(scm_rights, stream)
 	.type = SOCK_STREAM,
 	.flags = 0,
 	.test_listener = false,
+	.disabled = false,
+};
+
+FIXTURE_VARIANT_ADD(scm_rights, stream_disabled)
+{
+	.name = "UNIX-STREAM ",
+	.type = SOCK_STREAM,
+	.flags = 0,
+	.test_listener = false,
+	.disabled = true,
 };
 
 FIXTURE_VARIANT_ADD(scm_rights, stream_oob)
@@ -47,6 +68,16 @@ FIXTURE_VARIANT_ADD(scm_rights, stream_oob)
 	.type = SOCK_STREAM,
 	.flags = MSG_OOB,
 	.test_listener = false,
+	.disabled = false,
+};
+
+FIXTURE_VARIANT_ADD(scm_rights, stream_oob_disabled)
+{
+	.name = "UNIX-STREAM ",
+	.type = SOCK_STREAM,
+	.flags = MSG_OOB,
+	.test_listener = false,
+	.disabled = true,
 };
 
 FIXTURE_VARIANT_ADD(scm_rights, stream_listener)
@@ -55,6 +86,16 @@ FIXTURE_VARIANT_ADD(scm_rights, stream_listener)
 	.type = SOCK_STREAM,
 	.flags = 0,
 	.test_listener = true,
+	.disabled = false,
+};
+
+FIXTURE_VARIANT_ADD(scm_rights, stream_listener_disabled)
+{
+	.name = "UNIX-STREAM ",
+	.type = SOCK_STREAM,
+	.flags = 0,
+	.test_listener = true,
+	.disabled = true,
 };
 
 FIXTURE_VARIANT_ADD(scm_rights, stream_listener_oob)
@@ -63,6 +104,16 @@ FIXTURE_VARIANT_ADD(scm_rights, stream_listener_oob)
 	.type = SOCK_STREAM,
 	.flags = MSG_OOB,
 	.test_listener = true,
+	.disabled = false,
+};
+
+FIXTURE_VARIANT_ADD(scm_rights, stream_listener_oob_disabled)
+{
+	.name = "UNIX-STREAM ",
+	.type = SOCK_STREAM,
+	.flags = MSG_OOB,
+	.test_listener = true,
+	.disabled = true,
 };
 
 static int count_sockets(struct __test_metadata *_metadata,
@@ -105,6 +156,9 @@ FIXTURE_SETUP(scm_rights)
 	ret = unshare(CLONE_NEWNET);
 	ASSERT_EQ(0, ret);
 
+	if (variant->disabled)
+		return;
+
 	ret = count_sockets(_metadata, variant);
 	ASSERT_EQ(0, ret);
 }
@@ -113,6 +167,9 @@ FIXTURE_TEARDOWN(scm_rights)
 {
 	int ret;
 
+	if (variant->disabled)
+		return;
+
 	sleep(1);
 
 	ret = count_sockets(_metadata, variant);
@@ -121,6 +178,7 @@ FIXTURE_TEARDOWN(scm_rights)
 
 static void create_listeners(struct __test_metadata *_metadata,
 			     FIXTURE_DATA(scm_rights) *self,
+			     const FIXTURE_VARIANT(scm_rights) *variant,
 			     int n)
 {
 	struct sockaddr_un addr = {
@@ -140,6 +198,12 @@ static void create_listeners(struct __test_metadata *_metadata,
 		ret = listen(self->fd[i], -1);
 		ASSERT_EQ(0, ret);
 
+		if (variant->disabled) {
+			ret = setsockopt(self->fd[i], SOL_SOCKET, SO_PASSRIGHTS,
+					 &(int){0}, sizeof(int));
+			ASSERT_EQ(0, ret);
+		}
+
 		addrlen = sizeof(addr);
 		ret = getsockname(self->fd[i], (struct sockaddr *)&addr, &addrlen);
 		ASSERT_EQ(0, ret);
@@ -164,6 +228,12 @@ static void create_socketpairs(struct __test_metadata *_metadata,
 	for (i = 0; i < n * 2; i += 2) {
 		ret = socketpair(AF_UNIX, variant->type, 0, self->fd + i);
 		ASSERT_EQ(0, ret);
+
+		if (variant->disabled) {
+			ret = setsockopt(self->fd[i], SOL_SOCKET, SO_PASSRIGHTS,
+					 &(int){0}, sizeof(int));
+			ASSERT_EQ(0, ret);
+		}
 	}
 }
 
@@ -175,7 +245,7 @@ static void __create_sockets(struct __test_metadata *_metadata,
 	ASSERT_LE(n * 2, sizeof(self->fd) / sizeof(self->fd[0]));
 
 	if (variant->test_listener)
-		create_listeners(_metadata, self, n);
+		create_listeners(_metadata, self, variant, n);
 	else
 		create_socketpairs(_metadata, self, variant, n);
 }
@@ -227,10 +297,18 @@ void __send_fd(struct __test_metadata *_metadata,
 		.msg_control = &cmsg,
 		.msg_controllen = CMSG_SPACE(sizeof(cmsg.fd)),
 	};
-	int ret;
+	int ret, saved_errno;
 
+	errno = 0;
 	ret = sendmsg(self->fd[receiver * 2 + 1], &msg, variant->flags);
-	ASSERT_EQ(MSGLEN, ret);
+	saved_errno = errno;
+
+	if (variant->disabled) {
+		ASSERT_EQ(-1, ret);
+		ASSERT_EQ(-EPERM, -saved_errno);
+	} else {
+		ASSERT_EQ(MSGLEN, ret);
+	}
 }
 
 #define create_sockets(n)					\
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 4/9] tcp: Restrict SO_TXREHASH to TCP socket.
  2025-05-10  1:56 ` [PATCH v2 net-next 4/9] tcp: Restrict SO_TXREHASH to TCP socket Kuniyuki Iwashima
@ 2025-05-12 19:18   ` Willem de Bruijn
  2025-05-12 22:14     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 23+ messages in thread
From: Willem de Bruijn @ 2025-05-12 19:18 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

Kuniyuki Iwashima wrote:
> sk->sk_txrehash is only used for TCP.
> 
> Let's restrict SO_TXREHASH to TCP to reflect this.
> 
> Later, we will make sk_txrehash a part of the union for other
> protocol families, so we set 0 explicitly in getsockopt().
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
>  net/core/sock.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b64df2463300..5c84a608ddd7 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1276,6 +1276,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
>  		return 0;
>  		}
>  	case SO_TXREHASH:
> +		if (!sk_is_tcp(sk))
> +			return -EOPNOTSUPP;
>  		if (val < -1 || val > 1)
>  			return -EINVAL;
>  		if ((u8)val == SOCK_TXREHASH_DEFAULT)
> @@ -2102,8 +2104,11 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
>  		break;
>  
>  	case SO_TXREHASH:
> -		/* Paired with WRITE_ONCE() in sk_setsockopt() */
> -		v.val = READ_ONCE(sk->sk_txrehash);
> +		if (sk_is_tcp(sk))
> +			/* Paired with WRITE_ONCE() in sk_setsockopt() */
> +			v.val = READ_ONCE(sk->sk_txrehash);
> +		else
> +			v.val = 0;

Here and in the following getsockopt calls: should the call fail with
EOPNOTSUPP rather than return a value that is legal where the option
is supported (in TCP).


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
  2025-05-10  1:56 ` [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock Kuniyuki Iwashima
@ 2025-05-12 19:20   ` Willem de Bruijn
  2025-05-12 22:20     ` Kuniyuki Iwashima
  2025-05-13  2:03   ` kernel test robot
  1 sibling, 1 reply; 23+ messages in thread
From: Willem de Bruijn @ 2025-05-12 19:20 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

Kuniyuki Iwashima wrote:
> As explained in the next patch, SO_PASSRIGHTS would have a problem
> if we assigned a corresponding bit to socket->flags, so it must be
> managed in struct sock.
> 
> Mixing socket->flags and sk->sk_flags for similar options will look
> confusing, and sk->sk_flags does not have enough space on 32bit system.
> 
> Also, as mentioned in commit 16e572626961 ("af_unix: dont send
> SCM_CREDENTIALS by default"), SOCK_PASSCRED and SOCK_PASSPID handling
> is known to be slow, and managing the flags in struct socket cannot
> avoid that for embryo sockets.
> 
> Let's move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
> 
> While at it, other SOCK_XXX flags in net.h are grouped as enum.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

> diff --git a/net/core/sock.c b/net/core/sock.c
> index 1ab59efbafc5..9540cbe3d83e 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1224,19 +1224,19 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
>  		if (!sk_may_scm_recv(sk))
>  			return -EOPNOTSUPP;
>  
> -		assign_bit(SOCK_PASSSEC, &sock->flags, valbool);
> +		sk->sk_scm_security = valbool;

Is it safe to switch from atomic to non-atomic updates?

Reads and writes can race. Especially given that these are bit stores, so RMW.

>  		return 0;
>  	case SO_PASSCRED:
>  		if (!sk_may_scm_recv(sk))
>  			return -EOPNOTSUPP;
>  
> -		assign_bit(SOCK_PASSCRED, &sock->flags, valbool);
> +		sk->sk_scm_credentials = valbool;
>  		return 0;
>  	case SO_PASSPIDFD:
>  		if (!sk_is_unix(sk))
>  			return -EOPNOTSUPP;
>  
> -		assign_bit(SOCK_PASSPIDFD, &sock->flags, valbool);
> +		sk->sk_scm_pidfd = valbool;
>  		return 0;
>  	case SO_TYPE:
>  	case SO_PROTOCOL:
> @@ -1865,14 +1865,14 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
>  
>  	case SO_PASSCRED:
>  		if (sk_may_scm_recv(sk))
> -			v.val = !!test_bit(SOCK_PASSCRED, &sock->flags);
> +			v.val = sk->sk_scm_credentials;
>  		else
>  			v.val = 0;
>  		break;
>  
>  	case SO_PASSPIDFD:
>  		if (sk_is_unix(sk))
> -			v.val = !!test_bit(SOCK_PASSPIDFD, &sock->flags);
> +			v.val = sk->sk_scm_pidfd;
>  		else
>  			v.val = 0;
>  		break;
> @@ -1972,7 +1972,7 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
>  
>  	case SO_PASSSEC:
>  		if (sk_may_scm_recv(sk))
> -			v.val = !!test_bit(SOCK_PASSSEC, &sock->flags);
> +			v.val = sk->sk_scm_security;
>  		else
>  			v.val = 0;
>  		break;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 7/9] af_unix: Inherit sk_flags at connect().
  2025-05-10  1:56 ` [PATCH v2 net-next 7/9] af_unix: Inherit sk_flags at connect() Kuniyuki Iwashima
@ 2025-05-12 19:38   ` Willem de Bruijn
  2025-05-12 22:34     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 23+ messages in thread
From: Willem de Bruijn @ 2025-05-12 19:38 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn
  Cc: Simon Horman, Christian Brauner, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev

Kuniyuki Iwashima wrote:
> For SOCK_STREAM embryo sockets, the SO_PASS{CRED,PIDFD,SEC} options
> are inherited from the parent listen()ing socket.
> 
> Currently, this inheritance happens at accept(), because these
> attributes were stored in sk->sk_socket->flags and the struct socket
> is not allocated until accept().
> 
> This leads to unintentional behaviour.
> 
> When a peer sends data to an embryo socket in the accept() queue,
> unix_maybe_add_creds() embeds credentials into the skb, even if
> neither the peer nor the listener has enabled these options.
> 
> If the option is enabled, the embryo socket receives the ancillary
> data after accept().  If not, the data is silently discarded.
> 
> This conservative approach works for SO_PASS{CRED,PIDFD,SEC}, but not
> for SO_PASSRIGHTS; once an SCM_RIGHTS with a hung file descriptor is
> sent, it’s game over.

Should this be a fix to net then?

It depends on the move of this one bit from socket to sock. So is not
a stand-alone patch. But does not need all of the previous cleanup
patches if needs to be backportable.

> 
> To avoid this, we will need to preserve SOCK_PASSRIGHTS even on embryo
> sockets.
> 
> A recent change made it possible to access the parent's flags in
> sendmsg() via unix_sk(other)->listener->sk->sk_socket->flags, but
> this introduces an unnecessary condition that is irrelevant for
> most sockets, accept()ed sockets and clients.

What is this condition and how is it irrelevant? A constraint on the
kernel having the recent change? I.e., not backportable?
 
> Therefore, we moved SOCK_PASSXXX into struct sock.
> 
> Let’s inherit sk->sk_scm_recv_flags at connect() to avoid receiving
> SCM_RIGHTS on embryo sockets created from a parent with SO_PASSRIGHTS=0.
> 
> Now, we can remove !other->sk_socket check in unix_maybe_add_creds()
> to avoid slow SOCK_PASS{CRED,PIDFD} handling for embryo sockets
> created from a parent with SO_PASS{CRED,PIDFD}=0.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 4/9] tcp: Restrict SO_TXREHASH to TCP socket.
  2025-05-12 19:18   ` Willem de Bruijn
@ 2025-05-12 22:14     ` Kuniyuki Iwashima
  2025-05-13  2:42       ` Willem de Bruijn
  0 siblings, 1 reply; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-12 22:14 UTC (permalink / raw)
  To: willemdebruijn.kernel
  Cc: brauner, davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev,
	pabeni, willemb

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Mon, 12 May 2025 15:18:12 -0400
> Kuniyuki Iwashima wrote:
> > sk->sk_txrehash is only used for TCP.
> > 
> > Let's restrict SO_TXREHASH to TCP to reflect this.
> > 
> > Later, we will make sk_txrehash a part of the union for other
> > protocol families, so we set 0 explicitly in getsockopt().
> > 
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > ---
> >  net/core/sock.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index b64df2463300..5c84a608ddd7 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1276,6 +1276,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> >  		return 0;
> >  		}
> >  	case SO_TXREHASH:
> > +		if (!sk_is_tcp(sk))
> > +			return -EOPNOTSUPP;
> >  		if (val < -1 || val > 1)
> >  			return -EINVAL;
> >  		if ((u8)val == SOCK_TXREHASH_DEFAULT)
> > @@ -2102,8 +2104,11 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
> >  		break;
> >  
> >  	case SO_TXREHASH:
> > -		/* Paired with WRITE_ONCE() in sk_setsockopt() */
> > -		v.val = READ_ONCE(sk->sk_txrehash);
> > +		if (sk_is_tcp(sk))
> > +			/* Paired with WRITE_ONCE() in sk_setsockopt() */
> > +			v.val = READ_ONCE(sk->sk_txrehash);
> > +		else
> > +			v.val = 0;
> 
> Here and in the following getsockopt calls: should the call fail with
> EOPNOTSUPP rather than return a value that is legal where the option
> is supported (in TCP).

I was wondering which is better but didn't have preference, so will
return -EOPNOTSUPP in v3.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
  2025-05-12 19:20   ` Willem de Bruijn
@ 2025-05-12 22:20     ` Kuniyuki Iwashima
  2025-05-13  2:44       ` Willem de Bruijn
  0 siblings, 1 reply; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-12 22:20 UTC (permalink / raw)
  To: willemdebruijn.kernel
  Cc: brauner, davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev,
	pabeni, willemb

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Mon, 12 May 2025 15:20:54 -0400
> Kuniyuki Iwashima wrote:
> > As explained in the next patch, SO_PASSRIGHTS would have a problem
> > if we assigned a corresponding bit to socket->flags, so it must be
> > managed in struct sock.
> > 
> > Mixing socket->flags and sk->sk_flags for similar options will look
> > confusing, and sk->sk_flags does not have enough space on 32bit system.
> > 
> > Also, as mentioned in commit 16e572626961 ("af_unix: dont send
> > SCM_CREDENTIALS by default"), SOCK_PASSCRED and SOCK_PASSPID handling
> > is known to be slow, and managing the flags in struct socket cannot
> > avoid that for embryo sockets.
> > 
> > Let's move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
> > 
> > While at it, other SOCK_XXX flags in net.h are grouped as enum.
> > 
> > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> 
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 1ab59efbafc5..9540cbe3d83e 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1224,19 +1224,19 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> >  		if (!sk_may_scm_recv(sk))
> >  			return -EOPNOTSUPP;
> >  
> > -		assign_bit(SOCK_PASSSEC, &sock->flags, valbool);
> > +		sk->sk_scm_security = valbool;
> 
> Is it safe to switch from atomic to non-atomic updates?
> 
> Reads and writes can race. Especially given that these are bit stores, so RMW.

Exactly, will move them down after sockopt_lock_sock().

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 7/9] af_unix: Inherit sk_flags at connect().
  2025-05-12 19:38   ` Willem de Bruijn
@ 2025-05-12 22:34     ` Kuniyuki Iwashima
  2025-05-13  2:48       ` Willem de Bruijn
  0 siblings, 1 reply; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-12 22:34 UTC (permalink / raw)
  To: willemdebruijn.kernel
  Cc: brauner, davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev,
	pabeni, willemb

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Mon, 12 May 2025 15:38:13 -0400
> Kuniyuki Iwashima wrote:
> > For SOCK_STREAM embryo sockets, the SO_PASS{CRED,PIDFD,SEC} options
> > are inherited from the parent listen()ing socket.
> > 
> > Currently, this inheritance happens at accept(), because these
> > attributes were stored in sk->sk_socket->flags and the struct socket
> > is not allocated until accept().
> > 
> > This leads to unintentional behaviour.
> > 
> > When a peer sends data to an embryo socket in the accept() queue,
> > unix_maybe_add_creds() embeds credentials into the skb, even if
> > neither the peer nor the listener has enabled these options.
> > 
> > If the option is enabled, the embryo socket receives the ancillary
> > data after accept().  If not, the data is silently discarded.
> > 
> > This conservative approach works for SO_PASS{CRED,PIDFD,SEC}, but not
> > for SO_PASSRIGHTS; once an SCM_RIGHTS with a hung file descriptor is
> > sent, it’s game over.
> 
> Should this be a fix to net then?

Regarding SO_PASS{CRED,PIDFD,SEC}, this patch is a small optimisation
to save unnecessary get_pid() etc, like 16e572626961

And, SO_PASSRIGHTS is not yet added here, so this is not a fix.

Maybe I should have clarified like "this works but would not for SO_PASSRIGHTS".


> 
> It depends on the move of this one bit from socket to sock. So is not
> a stand-alone patch. But does not need all of the previous cleanup
> patches if needs to be backportable.
> 
> > 
> > To avoid this, we will need to preserve SOCK_PASSRIGHTS even on embryo
> > sockets.
> > 
> > A recent change made it possible to access the parent's flags in
> > sendmsg() via unix_sk(other)->listener->sk->sk_socket->flags, but
> > this introduces an unnecessary condition that is irrelevant for
> > most sockets, accept()ed sockets and clients.
> 
> What is this condition and how is it irrelevant? A constraint on the
> kernel having the recent change? I.e., not backportable?

Commit aed6ecef55d7 ("af_unix: Save listener for embryo socket.") is
added for a new GC but is a standalone patch.

If we want to use the listener's flags, the condition will be like...

	if (UNIXCB(skb).fp &&
	    ((other->sk_socket && other->sk_socket->sk_flags & SOCK_PASSRIGHTS) ||
	     (!other->sk_socket && unix_sk(other)->listener->sk->sk_socket->sk_flags && SOCK_PASSRIGHTS)))

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
  2025-05-10  1:56 ` [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock Kuniyuki Iwashima
  2025-05-12 19:20   ` Willem de Bruijn
@ 2025-05-13  2:03   ` kernel test robot
  1 sibling, 0 replies; 23+ messages in thread
From: kernel test robot @ 2025-05-13  2:03 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: oe-lkp, lkp, netdev, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, Simon Horman,
	Christian Brauner, Kuniyuki Iwashima, Kuniyuki Iwashima,
	oliver.sang


Hello,

kernel test robot noticed "last_state.load_disk_fail" on:

commit: aaf23d0d0fb88067cb42ff899899ba235c757f97 ("[PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.")
url: https://github.com/intel-lab-lkp/linux/commits/Kuniyuki-Iwashima/af_unix-Factorise-test_bit-for-SOCK_PASSCRED-and-SOCK_PASSPIDFD/20250510-100150
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git 0b28182c73a3d013bcabbb890dc1070a8388f55a
patch link: https://lore.kernel.org/all/20250510015652.9931-7-kuniyu@amazon.com/
patch subject: [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.

in testcase: igt
version: igt-x86_64-aa9b10408-1_20250419
with following parameters:

	group: group-13



config: x86_64-rhel-9.4-func
compiler: gcc-12
test machine: 20 threads 1 sockets (Commet Lake) with 16G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202505130959.3212276d-lkp@intel.com


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250513/202505130959.3212276d-lkp@intel.com


we don't have enough knowledge how this net change causes load_disk_fail issues,
but the issue is persistent after multi runs. just report FYI.



[  188.263289][  T440] LKP: stdout: 410: can't load the disk /dev/disk/by-id/ata-INTEL_SSDSC2BA800G4_BTHV61840945800OGN-part3, skip testing...
[  188.263298][  T440] 
[  190.692731][  T199] pcieport 0000:00:1c.4: AER: Uncorrectable (Non-Fatal) error message received from 0000:03:02.0
[  190.703436][  T199] pcieport 0000:03:02.0: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Receiver ID)
[  190.715600][  T199] pcieport 0000:03:02.0:   device [8086:15ea] error status/mask=00200000/00000000
[  190.724643][  T199] pcieport 0000:03:02.0:    [21] ACSViol                (First)
[  190.732684][  T199] xhci_hcd 0000:06:00.0: AER: can't recover (no error_detected callback)
[  190.740979][  T199] pcieport 0000:03:02.0: AER: device recovery failed
[  190.785617][    T1] e1000e: EEE TX LPI TIMER: 00000011
[    0.000000][    T0] Linux version 6.15.0-rc5-01027-gaaf23d0d0fb8 (kbuild@5fa04ef0f688) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Mon May 12 09:41:23 CST 2025
[    0.000000][    T0] Command line:  ip=::::lkp-cml-d02::dhcp root=/dev/ram0 RESULT_ROOT=/result/igt/group-13/lkp-cml-d02/debian-12-x86_64-20240206.cgz/x86_64-rhel-9.4-func/gcc-12/aaf23d0d0fb88067cb42ff899899ba235c757f97/5 BOOT_IMAGE=/pkg/linux/x86_64-rhel-9.4-func/gcc-12/aaf23d0d0fb88067cb42ff899899ba235c757f97/vmlinuz-6.15.0-rc5-01027-gaaf23d0d0fb8 branch=linux-review/Kuniyuki-Iwashima/af_unix-Factorise-test_bit-for-SOCK_PASSCRED-and-SOCK_PASSPIDFD/20250510-100150 job=/lkp/jobs/scheduled/lkp-cml-d02/igt-group-13-debian-12-x86_64-20240206.cgz-aaf23d0d0fb8-20250512-42822-1bot8b9-2.yaml user=lkp ARCH=x86_64 kconfig=x86_64-rhel-9.4-func commit=aaf23d0d0fb88067cb42ff899899ba235c757f97 intremap=posted_msi acpi_rsdp=0x9b0fe014 max_uptime=6000 LKP_SERVER=internal-lkp-server nokaslr selinux=0 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_log
[    0.000000][    T0] BIOS-provided physical RAM map:
[    0.000000][    T0] BIOS-e820: [mem 0x0000000000000100-0x000000000009efff] usable

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 4/9] tcp: Restrict SO_TXREHASH to TCP socket.
  2025-05-12 22:14     ` Kuniyuki Iwashima
@ 2025-05-13  2:42       ` Willem de Bruijn
  2025-05-13  3:11         ` Kuniyuki Iwashima
  0 siblings, 1 reply; 23+ messages in thread
From: Willem de Bruijn @ 2025-05-13  2:42 UTC (permalink / raw)
  To: Kuniyuki Iwashima, willemdebruijn.kernel
  Cc: brauner, davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev,
	pabeni, willemb

Kuniyuki Iwashima wrote:
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Mon, 12 May 2025 15:18:12 -0400
> > Kuniyuki Iwashima wrote:
> > > sk->sk_txrehash is only used for TCP.
> > > 
> > > Let's restrict SO_TXREHASH to TCP to reflect this.
> > > 
> > > Later, we will make sk_txrehash a part of the union for other
> > > protocol families, so we set 0 explicitly in getsockopt().
> > > 
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > ---
> > >  net/core/sock.c | 9 +++++++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index b64df2463300..5c84a608ddd7 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -1276,6 +1276,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> > >  		return 0;
> > >  		}
> > >  	case SO_TXREHASH:
> > > +		if (!sk_is_tcp(sk))
> > > +			return -EOPNOTSUPP;
> > >  		if (val < -1 || val > 1)
> > >  			return -EINVAL;
> > >  		if ((u8)val == SOCK_TXREHASH_DEFAULT)
> > > @@ -2102,8 +2104,11 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
> > >  		break;
> > >  
> > >  	case SO_TXREHASH:
> > > -		/* Paired with WRITE_ONCE() in sk_setsockopt() */
> > > -		v.val = READ_ONCE(sk->sk_txrehash);
> > > +		if (sk_is_tcp(sk))
> > > +			/* Paired with WRITE_ONCE() in sk_setsockopt() */
> > > +			v.val = READ_ONCE(sk->sk_txrehash);
> > > +		else
> > > +			v.val = 0;
> > 
> > Here and in the following getsockopt calls: should the call fail with
> > EOPNOTSUPP rather than return a value that is legal where the option
> > is supported (in TCP).
> 
> I was wondering which is better but didn't have preference, so will
> return -EOPNOTSUPP in v3.

It's a reminder that this is breaking an existing API.

It is unlikely to affect any real users in this case, as SO_TXREHASH
never was function for Unix sockets. But for this and subsequent such
changes we have to be aware that it is in principle a user visible
change.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
  2025-05-12 22:20     ` Kuniyuki Iwashima
@ 2025-05-13  2:44       ` Willem de Bruijn
  2025-05-13  3:18         ` Kuniyuki Iwashima
  0 siblings, 1 reply; 23+ messages in thread
From: Willem de Bruijn @ 2025-05-13  2:44 UTC (permalink / raw)
  To: Kuniyuki Iwashima, willemdebruijn.kernel
  Cc: brauner, davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev,
	pabeni, willemb

Kuniyuki Iwashima wrote:
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Mon, 12 May 2025 15:20:54 -0400
> > Kuniyuki Iwashima wrote:
> > > As explained in the next patch, SO_PASSRIGHTS would have a problem
> > > if we assigned a corresponding bit to socket->flags, so it must be
> > > managed in struct sock.
> > > 
> > > Mixing socket->flags and sk->sk_flags for similar options will look
> > > confusing, and sk->sk_flags does not have enough space on 32bit system.
> > > 
> > > Also, as mentioned in commit 16e572626961 ("af_unix: dont send
> > > SCM_CREDENTIALS by default"), SOCK_PASSCRED and SOCK_PASSPID handling
> > > is known to be slow, and managing the flags in struct socket cannot
> > > avoid that for embryo sockets.
> > > 
> > > Let's move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
> > > 
> > > While at it, other SOCK_XXX flags in net.h are grouped as enum.
> > > 
> > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > 
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index 1ab59efbafc5..9540cbe3d83e 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -1224,19 +1224,19 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> > >  		if (!sk_may_scm_recv(sk))
> > >  			return -EOPNOTSUPP;
> > >  
> > > -		assign_bit(SOCK_PASSSEC, &sock->flags, valbool);
> > > +		sk->sk_scm_security = valbool;
> > 
> > Is it safe to switch from atomic to non-atomic updates?
> > 
> > Reads and writes can race. Especially given that these are bit stores, so RMW.
> 
> Exactly, will move them down after sockopt_lock_sock().

So all reads in the datapath are with the socket locked? Okay, that
was not immediately obvious to me. If respinning, please add a
comment.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 7/9] af_unix: Inherit sk_flags at connect().
  2025-05-12 22:34     ` Kuniyuki Iwashima
@ 2025-05-13  2:48       ` Willem de Bruijn
  2025-05-13  3:20         ` Kuniyuki Iwashima
  0 siblings, 1 reply; 23+ messages in thread
From: Willem de Bruijn @ 2025-05-13  2:48 UTC (permalink / raw)
  To: Kuniyuki Iwashima, willemdebruijn.kernel
  Cc: brauner, davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev,
	pabeni, willemb

Kuniyuki Iwashima wrote:
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Mon, 12 May 2025 15:38:13 -0400
> > Kuniyuki Iwashima wrote:
> > > For SOCK_STREAM embryo sockets, the SO_PASS{CRED,PIDFD,SEC} options
> > > are inherited from the parent listen()ing socket.
> > > 
> > > Currently, this inheritance happens at accept(), because these
> > > attributes were stored in sk->sk_socket->flags and the struct socket
> > > is not allocated until accept().
> > > 
> > > This leads to unintentional behaviour.
> > > 
> > > When a peer sends data to an embryo socket in the accept() queue,
> > > unix_maybe_add_creds() embeds credentials into the skb, even if
> > > neither the peer nor the listener has enabled these options.
> > > 
> > > If the option is enabled, the embryo socket receives the ancillary
> > > data after accept().  If not, the data is silently discarded.
> > > 
> > > This conservative approach works for SO_PASS{CRED,PIDFD,SEC}, but not
> > > for SO_PASSRIGHTS; once an SCM_RIGHTS with a hung file descriptor is
> > > sent, it’s game over.
> > 
> > Should this be a fix to net then?
> 
> Regarding SO_PASS{CRED,PIDFD,SEC}, this patch is a small optimisation
> to save unnecessary get_pid() etc, like 16e572626961
> 
> And, SO_PASSRIGHTS is not yet added here, so this is not a fix.

Ack, thanks.
 
> Maybe I should have clarified like "this works but would not for SO_PASSRIGHTS".
> 
> 
> > 
> > It depends on the move of this one bit from socket to sock. So is not
> > a stand-alone patch. But does not need all of the previous cleanup
> > patches if needs to be backportable.
> > 
> > > 
> > > To avoid this, we will need to preserve SOCK_PASSRIGHTS even on embryo
> > > sockets.
> > > 
> > > A recent change made it possible to access the parent's flags in
> > > sendmsg() via unix_sk(other)->listener->sk->sk_socket->flags, but
> > > this introduces an unnecessary condition that is irrelevant for
> > > most sockets, accept()ed sockets and clients.
> > 
> > What is this condition and how is it irrelevant? A constraint on the
> > kernel having the recent change? I.e., not backportable?
> 
> Commit aed6ecef55d7 ("af_unix: Save listener for embryo socket.") is
> added for a new GC but is a standalone patch.
> 
> If we want to use the listener's flags, the condition will be like...
> 
> 	if (UNIXCB(skb).fp &&
> 	    ((other->sk_socket && other->sk_socket->sk_flags & SOCK_PASSRIGHTS) ||
> 	     (!other->sk_socket && unix_sk(other)->listener->sk->sk_socket->sk_flags && SOCK_PASSRIGHTS)))

No need to change as this stays as net-next.

Might we helpful to replace "a recent commit" with the full commit reference.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 4/9] tcp: Restrict SO_TXREHASH to TCP socket.
  2025-05-13  2:42       ` Willem de Bruijn
@ 2025-05-13  3:11         ` Kuniyuki Iwashima
  0 siblings, 0 replies; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-13  3:11 UTC (permalink / raw)
  To: willemdebruijn.kernel
  Cc: brauner, davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev,
	pabeni, willemb

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Mon, 12 May 2025 22:42:25 -0400
> Kuniyuki Iwashima wrote:
> > From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> > Date: Mon, 12 May 2025 15:18:12 -0400
> > > Kuniyuki Iwashima wrote:
> > > > sk->sk_txrehash is only used for TCP.
> > > > 
> > > > Let's restrict SO_TXREHASH to TCP to reflect this.
> > > > 
> > > > Later, we will make sk_txrehash a part of the union for other
> > > > protocol families, so we set 0 explicitly in getsockopt().
> > > > 
> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > > ---
> > > >  net/core/sock.c | 9 +++++++--
> > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > > index b64df2463300..5c84a608ddd7 100644
> > > > --- a/net/core/sock.c
> > > > +++ b/net/core/sock.c
> > > > @@ -1276,6 +1276,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> > > >  		return 0;
> > > >  		}
> > > >  	case SO_TXREHASH:
> > > > +		if (!sk_is_tcp(sk))
> > > > +			return -EOPNOTSUPP;
> > > >  		if (val < -1 || val > 1)
> > > >  			return -EINVAL;
> > > >  		if ((u8)val == SOCK_TXREHASH_DEFAULT)
> > > > @@ -2102,8 +2104,11 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
> > > >  		break;
> > > >  
> > > >  	case SO_TXREHASH:
> > > > -		/* Paired with WRITE_ONCE() in sk_setsockopt() */
> > > > -		v.val = READ_ONCE(sk->sk_txrehash);
> > > > +		if (sk_is_tcp(sk))
> > > > +			/* Paired with WRITE_ONCE() in sk_setsockopt() */
> > > > +			v.val = READ_ONCE(sk->sk_txrehash);
> > > > +		else
> > > > +			v.val = 0;
> > > 
> > > Here and in the following getsockopt calls: should the call fail with
> > > EOPNOTSUPP rather than return a value that is legal where the option
> > > is supported (in TCP).
> > 
> > I was wondering which is better but didn't have preference, so will
> > return -EOPNOTSUPP in v3.
> 
> It's a reminder that this is breaking an existing API.
> 
> It is unlikely to affect any real users in this case, as SO_TXREHASH
> never was function for Unix sockets. But for this and subsequent such
> changes we have to be aware that it is in principle a user visible
> change.

I agree it's unlikely, and given we recently added 5b0af621c3f6 ("net:
restrict SO_REUSEPORT to inet sockets"), I think this type of change
is okay where it's appropriate.

And I noticed 5b0af621c3f6 didn't change getsockopt().

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
  2025-05-13  2:44       ` Willem de Bruijn
@ 2025-05-13  3:18         ` Kuniyuki Iwashima
  0 siblings, 0 replies; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-13  3:18 UTC (permalink / raw)
  To: willemdebruijn.kernel
  Cc: brauner, davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev,
	pabeni, willemb

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Mon, 12 May 2025 22:44:04 -0400
> Kuniyuki Iwashima wrote:
> > From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> > Date: Mon, 12 May 2025 15:20:54 -0400
> > > Kuniyuki Iwashima wrote:
> > > > As explained in the next patch, SO_PASSRIGHTS would have a problem
> > > > if we assigned a corresponding bit to socket->flags, so it must be
> > > > managed in struct sock.
> > > > 
> > > > Mixing socket->flags and sk->sk_flags for similar options will look
> > > > confusing, and sk->sk_flags does not have enough space on 32bit system.
> > > > 
> > > > Also, as mentioned in commit 16e572626961 ("af_unix: dont send
> > > > SCM_CREDENTIALS by default"), SOCK_PASSCRED and SOCK_PASSPID handling
> > > > is known to be slow, and managing the flags in struct socket cannot
> > > > avoid that for embryo sockets.
> > > > 
> > > > Let's move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
> > > > 
> > > > While at it, other SOCK_XXX flags in net.h are grouped as enum.
> > > > 
> > > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> > > 
> > > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > > index 1ab59efbafc5..9540cbe3d83e 100644
> > > > --- a/net/core/sock.c
> > > > +++ b/net/core/sock.c
> > > > @@ -1224,19 +1224,19 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> > > >  		if (!sk_may_scm_recv(sk))
> > > >  			return -EOPNOTSUPP;
> > > >  
> > > > -		assign_bit(SOCK_PASSSEC, &sock->flags, valbool);
> > > > +		sk->sk_scm_security = valbool;
> > > 
> > > Is it safe to switch from atomic to non-atomic updates?
> > > 
> > > Reads and writes can race. Especially given that these are bit stores, so RMW.
> > 
> > Exactly, will move them down after sockopt_lock_sock().
> 
> So all reads in the datapath are with the socket locked? Okay, that
> was not immediately obvious to me. If respinning, please add a
> comment.

Sure, will add a comment.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 7/9] af_unix: Inherit sk_flags at connect().
  2025-05-13  2:48       ` Willem de Bruijn
@ 2025-05-13  3:20         ` Kuniyuki Iwashima
  0 siblings, 0 replies; 23+ messages in thread
From: Kuniyuki Iwashima @ 2025-05-13  3:20 UTC (permalink / raw)
  To: willemdebruijn.kernel
  Cc: brauner, davem, edumazet, horms, kuba, kuni1840, kuniyu, netdev,
	pabeni, willemb

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Mon, 12 May 2025 22:48:09 -0400
> Kuniyuki Iwashima wrote:
> > From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> > Date: Mon, 12 May 2025 15:38:13 -0400
> > > Kuniyuki Iwashima wrote:
> > > > For SOCK_STREAM embryo sockets, the SO_PASS{CRED,PIDFD,SEC} options
> > > > are inherited from the parent listen()ing socket.
> > > > 
> > > > Currently, this inheritance happens at accept(), because these
> > > > attributes were stored in sk->sk_socket->flags and the struct socket
> > > > is not allocated until accept().
> > > > 
> > > > This leads to unintentional behaviour.
> > > > 
> > > > When a peer sends data to an embryo socket in the accept() queue,
> > > > unix_maybe_add_creds() embeds credentials into the skb, even if
> > > > neither the peer nor the listener has enabled these options.
> > > > 
> > > > If the option is enabled, the embryo socket receives the ancillary
> > > > data after accept().  If not, the data is silently discarded.
> > > > 
> > > > This conservative approach works for SO_PASS{CRED,PIDFD,SEC}, but not
> > > > for SO_PASSRIGHTS; once an SCM_RIGHTS with a hung file descriptor is
> > > > sent, it’s game over.
> > > 
> > > Should this be a fix to net then?
> > 
> > Regarding SO_PASS{CRED,PIDFD,SEC}, this patch is a small optimisation
> > to save unnecessary get_pid() etc, like 16e572626961
> > 
> > And, SO_PASSRIGHTS is not yet added here, so this is not a fix.
> 
> Ack, thanks.
>  
> > Maybe I should have clarified like "this works but would not for SO_PASSRIGHTS".
> > 
> > 
> > > 
> > > It depends on the move of this one bit from socket to sock. So is not
> > > a stand-alone patch. But does not need all of the previous cleanup
> > > patches if needs to be backportable.
> > > 
> > > > 
> > > > To avoid this, we will need to preserve SOCK_PASSRIGHTS even on embryo
> > > > sockets.
> > > > 
> > > > A recent change made it possible to access the parent's flags in
> > > > sendmsg() via unix_sk(other)->listener->sk->sk_socket->flags, but
> > > > this introduces an unnecessary condition that is irrelevant for
> > > > most sockets, accept()ed sockets and clients.
> > > 
> > > What is this condition and how is it irrelevant? A constraint on the
> > > kernel having the recent change? I.e., not backportable?
> > 
> > Commit aed6ecef55d7 ("af_unix: Save listener for embryo socket.") is
> > added for a new GC but is a standalone patch.
> > 
> > If we want to use the listener's flags, the condition will be like...
> > 
> > 	if (UNIXCB(skb).fp &&
> > 	    ((other->sk_socket && other->sk_socket->sk_flags & SOCK_PASSRIGHTS) ||
> > 	     (!other->sk_socket && unix_sk(other)->listener->sk->sk_socket->sk_flags && SOCK_PASSRIGHTS)))
> 
> No need to change as this stays as net-next.
> 
> Might we helpful to replace "a recent commit" with the full commit reference.

Will do.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2025-05-13  3:20 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-10  1:56 [PATCH v2 net-next 0/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
2025-05-10  1:56 ` [PATCH v2 net-next 1/9] af_unix: Factorise test_bit() for SOCK_PASSCRED and SOCK_PASSPIDFD Kuniyuki Iwashima
2025-05-10  1:56 ` [PATCH v2 net-next 2/9] af_unix: Don't pass struct socket to maybe_add_creds() Kuniyuki Iwashima
2025-05-10  1:56 ` [PATCH v2 net-next 3/9] scm: Move scm_recv() from scm.h to scm.c Kuniyuki Iwashima
2025-05-10  1:56 ` [PATCH v2 net-next 4/9] tcp: Restrict SO_TXREHASH to TCP socket Kuniyuki Iwashima
2025-05-12 19:18   ` Willem de Bruijn
2025-05-12 22:14     ` Kuniyuki Iwashima
2025-05-13  2:42       ` Willem de Bruijn
2025-05-13  3:11         ` Kuniyuki Iwashima
2025-05-10  1:56 ` [PATCH v2 net-next 5/9] net: Restrict SO_PASS{CRED,PIDFD,SEC} to AF_{UNIX,NETLINK,BLUETOOTH} Kuniyuki Iwashima
2025-05-10  1:56 ` [PATCH v2 net-next 6/9] af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock Kuniyuki Iwashima
2025-05-12 19:20   ` Willem de Bruijn
2025-05-12 22:20     ` Kuniyuki Iwashima
2025-05-13  2:44       ` Willem de Bruijn
2025-05-13  3:18         ` Kuniyuki Iwashima
2025-05-13  2:03   ` kernel test robot
2025-05-10  1:56 ` [PATCH v2 net-next 7/9] af_unix: Inherit sk_flags at connect() Kuniyuki Iwashima
2025-05-12 19:38   ` Willem de Bruijn
2025-05-12 22:34     ` Kuniyuki Iwashima
2025-05-13  2:48       ` Willem de Bruijn
2025-05-13  3:20         ` Kuniyuki Iwashima
2025-05-10  1:56 ` [PATCH v2 net-next 8/9] af_unix: Introduce SO_PASSRIGHTS Kuniyuki Iwashima
2025-05-10  1:56 ` [PATCH v2 net-next 9/9] selftest: af_unix: Test SO_PASSRIGHTS Kuniyuki Iwashima

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).