Linux RDMA and InfiniBand development
* [PATCH net v3] rds: filter RDS_INFO_* getsockopt by caller's netns
@ 2026-05-11  7:02 Maoyi Xie
  2026-05-12  5:42 ` kernel test robot
  2026-05-12 19:43 ` Allison Henderson
  0 siblings, 2 replies; 3+ messages in thread
From: Maoyi Xie @ 2026-05-11  7:02 UTC
  To: Simon Horman, Allison Henderson, netdev
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-rdma, rds-devel, linux-kernel, Praveen Kakkolangara,
	Maoyi Xie

The RDS_INFO_* family of getsockopt(2) options reads several
file-scope global lists that are not per-netns:

  rds_sock_info / rds6_sock_info,
  rds_sock_inc_info / rds6_sock_inc_info        -> rds_sock_list
  rds_tcp_tc_info / rds6_tcp_tc_info            -> rds_tcp_tc_list
  rds_conn_info / rds6_conn_info,
  rds_conn_message_info_cmn (for the *_SEND_MESSAGES and
  *_RETRANS_MESSAGES variants),
  rds_for_each_conn_info (for RDS_INFO_IB_CONNECTIONS)
                                                -> rds_conn_hash[]

The handlers do not filter by the caller's network namespace.
rds_info_getsockopt() has no netns or capable() check, and
rds_create() has no capable() check, so AF_RDS is reachable from
an unprivileged user namespace. As a result, an unprivileged
caller in a fresh user_ns plus netns can read the bound address
and sock inode of every RDS socket on the host, the peer address
of incoming messages on every RDS socket on the host, the peer
address and TCP sequence numbers of every rds-tcp connection on
the host, and the peer address and RDS sequence numbers of every
RDS connection on the host.

The rds-tcp transport is reachable from a non-initial netns (see
rds_set_transport()), so a one-shot init_net gate at
rds_info_getsockopt() would deny legitimate per-netns visibility
to rds-tcp callers. Instead, filter at each handler by comparing
the netns of the caller's socket to the netns of the list entry,
or to rds_conn_net(conn) for connection paths. Only copy entries
whose netns matches the caller. Counters (RDS_INFO_COUNTERS) are
aggregate statistics and remain global.

Reproducer (KASAN VM, rds and rds_tcp loaded): an AF_RDS socket
binds 127.0.0.1:4242 in init_net as root. A child process enters
a fresh user_ns plus netns and opens AF_RDS there, then calls
getsockopt(SOL_RDS, RDS_INFO_SOCKETS). Before this change, the
child sees the init_net socket. After this change, the child
sees zero entries.

Suggested-by: Allison Henderson <achender@kernel.org>
Suggested-by: Simon Horman <horms@kernel.org>
Reviewed-by: Allison Henderson <achender@kernel.org>
Co-developed-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
Signed-off-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
---
v3: Address Simon Horman's review of v2. The size precheck and the
    lens count are now both restricted to the caller's netns in
    rds_sock_info, rds6_sock_info, rds_tcp_tc_info and
    rds6_tcp_tc_info. Each handler now does a first pass under the
    list lock to count entries visible in the caller's netns, then
    short-circuits with that count if the user buffer is too small,
    then a second pass to fill data. This closes both issues Simon
    flagged: a zero-length probe no longer returns the global count,
    and a caller that sizes its buffer to the value returned by lens
    no longer hits ENOSPC on the second call.
    Re-verified on KASAN VM with the v1 PoC: attacker in fresh
    user_ns + netns sees zero RDS_INFO_SOCKETS entries; init_net
    access sees its own entries; lens returns the ns-scoped count
    on both probe and full reads.
v2: rebased onto net/main tip (b266bacba) so patchwork can apply.
    No code changes. Carries forward Reviewed-by from v1 review.
v1: https://lore.kernel.org/r/20260506075031.2238596-1-maoyixie.tju@gmail.com
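
For readers following the v3 note above, the count-then-fill flow can be
modelled in userspace roughly as follows. This is only a sketch under
stated assumptions, not the kernel code: struct model_sock, struct
model_info, struct model_lengths and model_sock_info() are hypothetical
names, an integer netns_id stands in for the net_eq() comparison, a plain
array stands in for rds_sock_list, and no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not names from net/rds. */
struct model_sock {
	int netns_id;		/* stands in for sock_net(rds_rs_to_sk(rs)) */
	unsigned int inum;	/* stands in for the socket inode number */
};

struct model_info {
	unsigned int inum;
};

struct model_lengths {
	unsigned int nr;	/* entries visible to the caller */
	unsigned int each;	/* size of one entry in bytes */
};

/*
 * Two-pass handler model: count the entries in the caller's netns, then
 * copy them only if the buffer can hold all of them. lens always carries
 * the netns-scoped count. Returns the number of entries written to out.
 */
static unsigned int model_sock_info(const struct model_sock *list, size_t n,
				    int caller_ns, size_t len_bytes,
				    struct model_info *out,
				    struct model_lengths *lens)
{
	size_t room = len_bytes / sizeof(*out);
	unsigned int cnt = 0, copied = 0;
	size_t i;

	assert(lens != NULL);

	/* First pass: count entries visible in the caller's netns. */
	for (i = 0; i < n; i++)
		if (list[i].netns_id == caller_ns)
			cnt++;

	/* Second pass runs only when every visible entry fits. */
	if (room >= cnt) {
		for (i = 0; i < n; i++) {
			if (list[i].netns_id != caller_ns)
				continue;
			out[copied++].inum = list[i].inum;
		}
	}

	lens->nr = cnt;
	lens->each = sizeof(*out);
	return copied;
}
```

In this model a caller probes with a zero-length buffer, reads lens.nr and
lens.each, sizes its buffer to nr * each, and calls again. Because both the
precheck and lens.nr use the same scoped count, the second call fits, and
entries from other namespaces are never counted or copied.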

 net/rds/af_rds.c     | 42 ++++++++++++++++++++++++++++++++++++------
 net/rds/connection.c | 13 +++++++++++++
 net/rds/tcp.c        | 35 +++++++++++++++++++++++++++++++----
 3 files changed, 80 insertions(+), 10 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 76f625986..6e22b516b 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -735,6 +735,7 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
 			      struct rds_info_iterator *iter,
 			      struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(sock->sk);
 	struct rds_sock *rs;
 	struct rds_incoming *inc;
 	unsigned int total = 0;
@@ -744,6 +745,9 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
 	spin_lock_bh(&rds_sock_lock);
 
 	list_for_each_entry(rs, &rds_sock_list, rs_item) {
+		/* Only show sockets in the caller's netns. */
+		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
+			continue;
 		/* This option only supports IPv4 sockets. */
 		if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
 			continue;
@@ -774,6 +778,7 @@ static void rds6_sock_inc_info(struct socket *sock, unsigned int len,
 			       struct rds_info_iterator *iter,
 			       struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(sock->sk);
 	struct rds_incoming *inc;
 	unsigned int total = 0;
 	struct rds_sock *rs;
@@ -783,6 +788,9 @@ static void rds6_sock_inc_info(struct socket *sock, unsigned int len,
 	spin_lock_bh(&rds_sock_lock);
 
 	list_for_each_entry(rs, &rds_sock_list, rs_item) {
+		/* Only show sockets in the caller's netns. */
+		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
+			continue;
 		read_lock(&rs->rs_recv_lock);
 
 		list_for_each_entry(inc, &rs->rs_recv_queue, i_item) {
@@ -806,6 +814,7 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
 			  struct rds_info_iterator *iter,
 			  struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(sock->sk);
 	struct rds_info_socket sinfo;
 	unsigned int cnt = 0;
 	struct rds_sock *rs;
@@ -814,12 +823,22 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
 
 	spin_lock_bh(&rds_sock_lock);
 
-	if (len < rds_sock_count) {
-		cnt = rds_sock_count;
-		goto out;
+	/* First pass: count entries visible in the caller's netns. */
+	list_for_each_entry(rs, &rds_sock_list, rs_item) {
+		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
+			continue;
+		if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
+			continue;
+		cnt++;
 	}
 
+	if (len < cnt)
+		goto out;
+
 	list_for_each_entry(rs, &rds_sock_list, rs_item) {
+		/* Only show sockets in the caller's netns. */
+		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
+			continue;
 		/* This option only supports IPv4 sockets. */
 		if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
 			continue;
@@ -832,7 +851,6 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
 		sinfo.inum = sock_i_ino(rds_rs_to_sk(rs));
 
 		rds_info_copy(iter, &sinfo, sizeof(sinfo));
-		cnt++;
 	}
 
 out:
@@ -847,17 +865,29 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
 			   struct rds_info_iterator *iter,
 			   struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(sock->sk);
 	struct rds6_info_socket sinfo6;
+	unsigned int cnt = 0;
 	struct rds_sock *rs;
 
 	len /= sizeof(struct rds6_info_socket);
 
 	spin_lock_bh(&rds_sock_lock);
 
-	if (len < rds_sock_count)
+	/* First pass: count entries visible in the caller's netns. */
+	list_for_each_entry(rs, &rds_sock_list, rs_item) {
+		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
+			continue;
+		cnt++;
+	}
+
+	if (len < cnt)
 		goto out;
 
 	list_for_each_entry(rs, &rds_sock_list, rs_item) {
+		/* Only show sockets in the caller's netns. */
+		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
+			continue;
 		sinfo6.sndbuf = rds_sk_sndbuf(rs);
 		sinfo6.rcvbuf = rds_sk_rcvbuf(rs);
 		sinfo6.bound_addr = rs->rs_bound_addr;
@@ -870,7 +900,7 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
 	}
 
  out:
-	lens->nr = rds_sock_count;
+	lens->nr = cnt;
 	lens->each = sizeof(struct rds6_info_socket);
 
 	spin_unlock_bh(&rds_sock_lock);
diff --git a/net/rds/connection.c b/net/rds/connection.c
index c10b7ed06..7c8ab8e97 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -568,6 +568,7 @@ static void rds_conn_message_info_cmn(struct socket *sock, unsigned int len,
 				      struct rds_info_lengths *lens,
 				      int want_send, bool isv6)
 {
+	struct net *net = sock_net(sock->sk);
 	struct hlist_head *head;
 	struct list_head *list;
 	struct rds_connection *conn;
@@ -590,6 +591,9 @@ static void rds_conn_message_info_cmn(struct socket *sock, unsigned int len,
 			struct rds_conn_path *cp;
 			int npaths;
 
+			/* Only show connections in the caller's netns. */
+			if (!net_eq(rds_conn_net(conn), net))
+				continue;
 			if (!isv6 && conn->c_isv6)
 				continue;
 
@@ -688,6 +692,7 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
 			  u64 *buffer,
 			  size_t item_len)
 {
+	struct net *net = sock_net(sock->sk);
 	struct hlist_head *head;
 	struct rds_connection *conn;
 	size_t i;
@@ -700,6 +705,9 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
 	for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash);
 	     i++, head++) {
 		hlist_for_each_entry_rcu(conn, head, c_hash_node) {
+			/* Only show connections in the caller's netns. */
+			if (!net_eq(rds_conn_net(conn), net))
+				continue;
 
 			/* Zero the per-item buffer before handing it to the
 			 * visitor so any field the visitor does not write -
@@ -733,6 +741,7 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
 				    u64 *buffer,
 				    size_t item_len)
 {
+	struct net *net = sock_net(sock->sk);
 	struct hlist_head *head;
 	struct rds_connection *conn;
 	size_t i;
@@ -747,6 +756,10 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
 		hlist_for_each_entry_rcu(conn, head, c_hash_node) {
 			struct rds_conn_path *cp;
 
+			/* Only show connections in the caller's netns. */
+			if (!net_eq(rds_conn_net(conn), net))
+				continue;
+
 			/* XXX We only copy the information from the first
 			 * path for now.  The problem is that if there are
 			 * more than one underlying paths, we cannot report
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 654e23d13..105e83507 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -235,13 +235,24 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
 			    struct rds_info_iterator *iter,
 			    struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(rds_sock->sk);
 	struct rds_info_tcp_socket tsinfo;
 	struct rds_tcp_connection *tc;
+	unsigned int cnt = 0;
 	unsigned long flags;
 
 	spin_lock_irqsave(&rds_tcp_tc_list_lock, flags);
 
-	if (len / sizeof(tsinfo) < rds_tcp_tc_count)
+	/* First pass: count entries visible in the caller's netns. */
+	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
+		if (tc->t_cpath->cp_conn->c_isv6)
+			continue;
+		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
+			continue;
+		cnt++;
+	}
+
+	if (len / sizeof(tsinfo) < cnt)
 		goto out;
 
 	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
@@ -249,6 +260,9 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
 
 		if (tc->t_cpath->cp_conn->c_isv6)
 			continue;
+		/* Only show connections in the caller's netns. */
+		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
+			continue;
 
 		tsinfo.local_addr = inet->inet_saddr;
 		tsinfo.local_port = inet->inet_sport;
@@ -266,7 +280,7 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
 	}
 
 out:
-	lens->nr = rds_tcp_tc_count;
+	lens->nr = cnt;
 	lens->each = sizeof(tsinfo);
 
 	spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags);
@@ -281,19 +295,32 @@ static void rds6_tcp_tc_info(struct socket *sock, unsigned int len,
 			     struct rds_info_iterator *iter,
 			     struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(sock->sk);
 	struct rds6_info_tcp_socket tsinfo6;
 	struct rds_tcp_connection *tc;
+	unsigned int cnt = 0;
 	unsigned long flags;
 
 	spin_lock_irqsave(&rds_tcp_tc_list_lock, flags);
 
-	if (len / sizeof(tsinfo6) < rds6_tcp_tc_count)
+	/* First pass: count entries visible in the caller's netns. */
+	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
+		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
+			continue;
+		cnt++;
+	}
+
+	if (len / sizeof(tsinfo6) < cnt)
 		goto out;
 
 	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
 		struct sock *sk = tc->t_sock->sk;
 		struct inet_sock *inet = inet_sk(sk);
 
+		/* Only show connections in the caller's netns. */
+		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
+			continue;
+
 		tsinfo6.local_addr = sk->sk_v6_rcv_saddr;
 		tsinfo6.local_port = inet->inet_sport;
 		tsinfo6.peer_addr = sk->sk_v6_daddr;
@@ -309,7 +336,7 @@ static void rds6_tcp_tc_info(struct socket *sock, unsigned int len,
 	}
 
 out:
-	lens->nr = rds6_tcp_tc_count;
+	lens->nr = cnt;
 	lens->each = sizeof(tsinfo6);
 
 	spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags);

base-commit: b266bacba796ff5c4dcd2ae2fc08aacf7ab39153
-- 
2.34.1



* Re: [PATCH net v3] rds: filter RDS_INFO_* getsockopt by caller's netns
  2026-05-11  7:02 [PATCH net v3] rds: filter RDS_INFO_* getsockopt by caller's netns Maoyi Xie
@ 2026-05-12  5:42 ` kernel test robot
  2026-05-12 19:43 ` Allison Henderson
  1 sibling, 0 replies; 3+ messages in thread
From: kernel test robot @ 2026-05-12  5:42 UTC
  To: Maoyi Xie, Simon Horman, Allison Henderson, netdev
  Cc: llvm, oe-kbuild-all, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, linux-rdma, rds-devel, linux-kernel,
	Praveen Kakkolangara, Maoyi Xie

Hi Maoyi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on b266bacba796ff5c4dcd2ae2fc08aacf7ab39153]

url:    https://github.com/intel-lab-lkp/linux/commits/Maoyi-Xie/rds-filter-RDS_INFO_-getsockopt-by-caller-s-netns/20260512-045249
base:   b266bacba796ff5c4dcd2ae2fc08aacf7ab39153
patch link:    https://lore.kernel.org/r/20260511070211.1033178-1-maoyi.xie%40ntu.edu.sg
patch subject: [PATCH net v3] rds: filter RDS_INFO_* getsockopt by caller's netns
config: arm-randconfig-001-20260512 (https://download.01.org/0day-ci/archive/20260512/202605121353.896RQym5-lkp@intel.com/config)
compiler: clang version 23.0.0git (https://github.com/llvm/llvm-project 5bac06718f502014fade905512f1d26d578a18f3)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260512/202605121353.896RQym5-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605121353.896RQym5-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from net/rds/tcp.c:42:
   In file included from net/rds/rds.h:10:
   include/uapi/linux/rds.h:233:18: warning: field peer_addr within 'struct rds6_info_tcp_socket' is less aligned than 'struct in6_addr' and is usually due to 'struct rds6_info_tcp_socket' being packed, which can lead to unaligned accesses [-Wunaligned-access]
     233 |         struct in6_addr peer_addr;
         |                         ^
>> net/rds/tcp.c:52:21: warning: variable 'rds_tcp_tc_count' set but not used [-Wunused-but-set-global]
      52 | static unsigned int rds_tcp_tc_count;
         |                     ^
>> net/rds/tcp.c:54:21: warning: variable 'rds6_tcp_tc_count' set but not used [-Wunused-but-set-global]
      54 | static unsigned int rds6_tcp_tc_count;
         |                     ^
   3 warnings generated.
--
>> net/rds/af_rds.c:46:22: warning: variable 'rds_sock_count' set but not used [-Wunused-but-set-global]
      46 | static unsigned long rds_sock_count;
         |                      ^
   1 warning generated.


vim +/rds_tcp_tc_count +52 net/rds/tcp.c

70041088e3b9766 Andy Grover       2009-08-21  41  
70041088e3b9766 Andy Grover       2009-08-21 @42  #include "rds.h"
70041088e3b9766 Andy Grover       2009-08-21  43  #include "tcp.h"
70041088e3b9766 Andy Grover       2009-08-21  44  
70041088e3b9766 Andy Grover       2009-08-21  45  /* only for info exporting */
70041088e3b9766 Andy Grover       2009-08-21  46  static DEFINE_SPINLOCK(rds_tcp_tc_list_lock);
70041088e3b9766 Andy Grover       2009-08-21  47  static LIST_HEAD(rds_tcp_tc_list);
1e2b44e78eead7b Ka-Cheong Poon    2018-07-23  48  
1e2b44e78eead7b Ka-Cheong Poon    2018-07-23  49  /* rds_tcp_tc_count counts only IPv4 connections.
1e2b44e78eead7b Ka-Cheong Poon    2018-07-23  50   * rds6_tcp_tc_count counts both IPv4 and IPv6 connections.
1e2b44e78eead7b Ka-Cheong Poon    2018-07-23  51   */
ff51bf841587c75 stephen hemminger 2010-10-19 @52  static unsigned int rds_tcp_tc_count;
e65d4d96334e3ff Ka-Cheong Poon    2018-07-30  53  #if IS_ENABLED(CONFIG_IPV6)
1e2b44e78eead7b Ka-Cheong Poon    2018-07-23 @54  static unsigned int rds6_tcp_tc_count;
e65d4d96334e3ff Ka-Cheong Poon    2018-07-30  55  #endif
70041088e3b9766 Andy Grover       2009-08-21  56  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH net v3] rds: filter RDS_INFO_* getsockopt by caller's netns
  2026-05-11  7:02 [PATCH net v3] rds: filter RDS_INFO_* getsockopt by caller's netns Maoyi Xie
  2026-05-12  5:42 ` kernel test robot
@ 2026-05-12 19:43 ` Allison Henderson
  1 sibling, 0 replies; 3+ messages in thread
From: Allison Henderson @ 2026-05-12 19:43 UTC
  To: Maoyi Xie, Simon Horman, netdev
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-rdma, rds-devel, linux-kernel, Praveen Kakkolangara,
	Maoyi Xie

On Mon, 2026-05-11 at 15:02 +0800, Maoyi Xie wrote:
> The RDS_INFO_* family of getsockopt(2) options reads several
> file-scope global lists that are not per-netns:
> 
>   rds_sock_info / rds6_sock_info,
>   rds_sock_inc_info / rds6_sock_inc_info        -> rds_sock_list
>   rds_tcp_tc_info / rds6_tcp_tc_info            -> rds_tcp_tc_list
>   rds_conn_info / rds6_conn_info,
>   rds_conn_message_info_cmn (for the *_SEND_MESSAGES and
>   *_RETRANS_MESSAGES variants),
>   rds_for_each_conn_info (for RDS_INFO_IB_CONNECTIONS)
>                                                 -> rds_conn_hash[]
> 
> The handlers do not filter by the caller's network namespace.
> rds_info_getsockopt() has no netns or capable() check, and
> rds_create() has no capable() check, so AF_RDS is reachable from
> an unprivileged user namespace. As a result, an unprivileged
> caller in a fresh user_ns plus netns can read the bound address
> and sock inode of every RDS socket on the host, the peer address
> of incoming messages on every RDS socket on the host, the peer
> address and TCP sequence numbers of every rds-tcp connection on
> the host, and the peer address and RDS sequence numbers of every
> RDS connection on the host.
> 
> The rds-tcp transport is reachable from a non-initial netns (see
> rds_set_transport()), so a one-shot init_net gate at
> rds_info_getsockopt() would deny legitimate per-netns visibility
> to rds-tcp callers. Instead, filter at each handler by comparing
> the netns of the caller's socket to the netns of the list entry,
> or to rds_conn_net(conn) for connection paths. Only copy entries
> whose netns matches the caller. Counters (RDS_INFO_COUNTERS) are
> aggregate statistics and remain global.
> 
> Reproducer (KASAN VM, rds and rds_tcp loaded): an AF_RDS socket
> binds 127.0.0.1:4242 in init_net as root. A child process enters
> a fresh user_ns plus netns and opens AF_RDS there, then calls
> getsockopt(SOL_RDS, RDS_INFO_SOCKETS). Before this change, the
> child sees the init_net socket. After this change, the child
> sees zero entries.
> 
> Suggested-by: Allison Henderson <achender@kernel.org>
> Suggested-by: Simon Horman <horms@kernel.org>
> Reviewed-by: Allison Henderson <achender@kernel.org>
> Co-developed-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
> Signed-off-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
> ---
> v3: Address Simon Horman's review of v2. The size precheck and the
>     lens count are now both restricted to the caller's netns in
>     rds_sock_info, rds6_sock_info, rds_tcp_tc_info and
>     rds6_tcp_tc_info. Each handler now does a first pass under the
>     list lock to count entries visible in the caller's netns, then
>     short-circuits with that count if the user buffer is too small,
>     then a second pass to fill data. This closes both issues Simon
>     flagged: a zero-length probe no longer returns the global count,
>     and a caller that sizes its buffer to the value returned by lens
>     no longer hits ENOSPC on the second call.
>     Re-verified on KASAN VM with the v1 PoC: attacker in fresh
>     user_ns + netns sees zero RDS_INFO_SOCKETS entries; init_net
>     access sees its own entries; lens returns the ns-scoped count
>     on both probe and full reads.
> v2: rebased onto net/main tip (b266bacba) so patchwork can apply.
>     No code changes. Carries forward Reviewed-by from v1 review.
> v1: https://lore.kernel.org/r/20260506075031.2238596-1-maoyixie.tju@gmail.com
> 
Hi Maoyi,

The two-pass approach looks good to me. The zero-length probe now returns
an appropriately ns-scoped count.  I already gave my Reviewed-by on v2, but
I think v3 is a cleaner solution.

Thanks,
Allison


>  net/rds/af_rds.c     | 42 ++++++++++++++++++++++++++++++++++++------
>  net/rds/connection.c | 13 +++++++++++++
>  net/rds/tcp.c        | 35 +++++++++++++++++++++++++++++++----
>  3 files changed, 80 insertions(+), 10 deletions(-)
> 
> diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
> index 76f625986..6e22b516b 100644
> --- a/net/rds/af_rds.c
> +++ b/net/rds/af_rds.c
> @@ -735,6 +735,7 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
>  			      struct rds_info_iterator *iter,
>  			      struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct rds_sock *rs;
>  	struct rds_incoming *inc;
>  	unsigned int total = 0;
> @@ -744,6 +745,9 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
>  	spin_lock_bh(&rds_sock_lock);
>  
>  	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		/* Only show sockets in the caller's netns. */
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
>  		/* This option only supports IPv4 sockets. */
>  		if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
>  			continue;
> @@ -774,6 +778,7 @@ static void rds6_sock_inc_info(struct socket *sock, unsigned int len,
>  			       struct rds_info_iterator *iter,
>  			       struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct rds_incoming *inc;
>  	unsigned int total = 0;
>  	struct rds_sock *rs;
> @@ -783,6 +788,9 @@ static void rds6_sock_inc_info(struct socket *sock, unsigned int len,
>  	spin_lock_bh(&rds_sock_lock);
>  
>  	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		/* Only show sockets in the caller's netns. */
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
>  		read_lock(&rs->rs_recv_lock);
>  
>  		list_for_each_entry(inc, &rs->rs_recv_queue, i_item) {
> @@ -806,6 +814,7 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
>  			  struct rds_info_iterator *iter,
>  			  struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct rds_info_socket sinfo;
>  	unsigned int cnt = 0;
>  	struct rds_sock *rs;
> @@ -814,12 +823,22 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
>  
>  	spin_lock_bh(&rds_sock_lock);
>  
> -	if (len < rds_sock_count) {
> -		cnt = rds_sock_count;
> -		goto out;
> +	/* First pass: count entries visible in the caller's netns. */
> +	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
> +		if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
> +			continue;
> +		cnt++;
>  	}
>  
> +	if (len < cnt)
> +		goto out;
> +
>  	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		/* Only show sockets in the caller's netns. */
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
>  		/* This option only supports IPv4 sockets. */
>  		if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
>  			continue;
> @@ -832,7 +851,6 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
>  		sinfo.inum = sock_i_ino(rds_rs_to_sk(rs));
>  
>  		rds_info_copy(iter, &sinfo, sizeof(sinfo));
> -		cnt++;
>  	}
>  
>  out:
> @@ -847,17 +865,29 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
>  			   struct rds_info_iterator *iter,
>  			   struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct rds6_info_socket sinfo6;
> +	unsigned int cnt = 0;
>  	struct rds_sock *rs;
>  
>  	len /= sizeof(struct rds6_info_socket);
>  
>  	spin_lock_bh(&rds_sock_lock);
>  
> -	if (len < rds_sock_count)
> +	/* First pass: count entries visible in the caller's netns. */
> +	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
> +		cnt++;
> +	}
> +
> +	if (len < cnt)
>  		goto out;
>  
>  	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		/* Only show sockets in the caller's netns. */
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
>  		sinfo6.sndbuf = rds_sk_sndbuf(rs);
>  		sinfo6.rcvbuf = rds_sk_rcvbuf(rs);
>  		sinfo6.bound_addr = rs->rs_bound_addr;
> @@ -870,7 +900,7 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
>  	}
>  
>   out:
> -	lens->nr = rds_sock_count;
> +	lens->nr = cnt;
>  	lens->each = sizeof(struct rds6_info_socket);
>  
>  	spin_unlock_bh(&rds_sock_lock);
> diff --git a/net/rds/connection.c b/net/rds/connection.c
> index c10b7ed06..7c8ab8e97 100644
> --- a/net/rds/connection.c
> +++ b/net/rds/connection.c
> @@ -568,6 +568,7 @@ static void rds_conn_message_info_cmn(struct socket *sock, unsigned int len,
>  				      struct rds_info_lengths *lens,
>  				      int want_send, bool isv6)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct hlist_head *head;
>  	struct list_head *list;
>  	struct rds_connection *conn;
> @@ -590,6 +591,9 @@ static void rds_conn_message_info_cmn(struct socket *sock, unsigned int len,
>  			struct rds_conn_path *cp;
>  			int npaths;
>  
> +			/* Only show connections in the caller's netns. */
> +			if (!net_eq(rds_conn_net(conn), net))
> +				continue;
>  			if (!isv6 && conn->c_isv6)
>  				continue;
>  
> @@ -688,6 +692,7 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
>  			  u64 *buffer,
>  			  size_t item_len)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct hlist_head *head;
>  	struct rds_connection *conn;
>  	size_t i;
> @@ -700,6 +705,9 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
>  	for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash);
>  	     i++, head++) {
>  		hlist_for_each_entry_rcu(conn, head, c_hash_node) {
> +			/* Only show connections in the caller's netns. */
> +			if (!net_eq(rds_conn_net(conn), net))
> +				continue;
>  
>  			/* Zero the per-item buffer before handing it to the
>  			 * visitor so any field the visitor does not write -
> @@ -733,6 +741,7 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
>  				    u64 *buffer,
>  				    size_t item_len)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct hlist_head *head;
>  	struct rds_connection *conn;
>  	size_t i;
> @@ -747,6 +756,10 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
>  		hlist_for_each_entry_rcu(conn, head, c_hash_node) {
>  			struct rds_conn_path *cp;
>  
> +			/* Only show connections in the caller's netns. */
> +			if (!net_eq(rds_conn_net(conn), net))
> +				continue;
> +
>  			/* XXX We only copy the information from the first
>  			 * path for now.  The problem is that if there are
>  			 * more than one underlying paths, we cannot report
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 654e23d13..105e83507 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -235,13 +235,24 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
>  			    struct rds_info_iterator *iter,
>  			    struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(rds_sock->sk);
>  	struct rds_info_tcp_socket tsinfo;
>  	struct rds_tcp_connection *tc;
> +	unsigned int cnt = 0;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&rds_tcp_tc_list_lock, flags);
>  
> -	if (len / sizeof(tsinfo) < rds_tcp_tc_count)
> +	/* First pass: count entries visible in the caller's netns. */
> +	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
> +		if (tc->t_cpath->cp_conn->c_isv6)
> +			continue;
> +		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> +			continue;
> +		cnt++;
> +	}
> +
> +	if (len / sizeof(tsinfo) < cnt)
>  		goto out;
>  
>  	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
> @@ -249,6 +260,9 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
>  
>  		if (tc->t_cpath->cp_conn->c_isv6)
>  			continue;
> +		/* Only show connections in the caller's netns. */
> +		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> +			continue;
>  
>  		tsinfo.local_addr = inet->inet_saddr;
>  		tsinfo.local_port = inet->inet_sport;
> @@ -266,7 +280,7 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
>  	}
>  
>  out:
> -	lens->nr = rds_tcp_tc_count;
> +	lens->nr = cnt;
>  	lens->each = sizeof(tsinfo);
>  
>  	spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags);
> @@ -281,19 +295,32 @@ static void rds6_tcp_tc_info(struct socket *sock, unsigned int len,
>  			     struct rds_info_iterator *iter,
>  			     struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct rds6_info_tcp_socket tsinfo6;
>  	struct rds_tcp_connection *tc;
> +	unsigned int cnt = 0;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&rds_tcp_tc_list_lock, flags);
>  
> -	if (len / sizeof(tsinfo6) < rds6_tcp_tc_count)
> +	/* First pass: count entries visible in the caller's netns. */
> +	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
> +		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> +			continue;
> +		cnt++;
> +	}
> +
> +	if (len / sizeof(tsinfo6) < cnt)
>  		goto out;
>  
>  	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
>  		struct sock *sk = tc->t_sock->sk;
>  		struct inet_sock *inet = inet_sk(sk);
>  
> +		/* Only show connections in the caller's netns. */
> +		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> +			continue;
> +
>  		tsinfo6.local_addr = sk->sk_v6_rcv_saddr;
>  		tsinfo6.local_port = inet->inet_sport;
>  		tsinfo6.peer_addr = sk->sk_v6_daddr;
> @@ -309,7 +336,7 @@ static void rds6_tcp_tc_info(struct socket *sock, unsigned int len,
>  	}
>  
>  out:
> -	lens->nr = rds6_tcp_tc_count;
> +	lens->nr = cnt;
>  	lens->each = sizeof(tsinfo6);
>  
>  	spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags);
> 
> base-commit: b266bacba796ff5c4dcd2ae2fc08aacf7ab39153


