The Linux Kernel Mailing List
* [PATCH net v2] rds: filter RDS_INFO_* getsockopt by caller's netns
@ 2026-05-07  8:13 Maoyi Xie
  2026-05-10 14:54 ` Simon Horman
  0 siblings, 1 reply; 4+ messages in thread
From: Maoyi Xie @ 2026-05-07  8:13 UTC (permalink / raw)
  To: Allison Henderson
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, netdev, linux-rdma, rds-devel, linux-kernel,
	Maoyi Xie, Praveen Kakkolangara

From: Maoyi Xie <maoyi.xie@ntu.edu.sg>

The RDS_INFO_* family of getsockopt(2) options reads several
file-scope global lists that are not per-netns:

  rds_sock_info / rds6_sock_info,
  rds_sock_inc_info / rds6_sock_inc_info        -> rds_sock_list
  rds_tcp_tc_info / rds6_tcp_tc_info            -> rds_tcp_tc_list
  rds_conn_info / rds6_conn_info,
  rds_conn_message_info_cmn (for the *_SEND_MESSAGES and
  *_RETRANS_MESSAGES variants),
  rds_for_each_conn_info (for RDS_INFO_IB_CONNECTIONS)
                                                -> rds_conn_hash[]

The handlers do not filter by the caller's network namespace.
rds_info_getsockopt() has no netns or capable() check, and
rds_create() has no capable() check, so AF_RDS is reachable from
an unprivileged user namespace. As a result, an unprivileged
caller in a fresh user_ns plus netns can read:

  - the bound address and sock inode of every RDS socket on
    the host,
  - the peer address of incoming messages on every RDS socket,
  - the peer address and TCP sequence numbers of every rds-tcp
    connection,
  - the peer address and RDS sequence numbers of every RDS
    connection.

The rds-tcp transport is reachable from a non-initial netns (see
rds_set_transport()), so a one-shot init_net gate at
rds_info_getsockopt() would deny legitimate per-netns visibility
to rds-tcp callers. Instead, filter at each handler by comparing
the netns of the caller's socket to the netns of the list entry,
or to rds_conn_net(conn) for connection paths. Only copy entries
whose netns matches the caller. Counters (RDS_INFO_COUNTERS) are
aggregate statistics and remain global.

Reproducer (KASAN VM, rds and rds_tcp loaded): an AF_RDS socket
binds 127.0.0.1:4242 in init_net as root. A child process enters
a fresh user_ns plus netns and opens AF_RDS there, then calls
getsockopt(SOL_RDS, RDS_INFO_SOCKETS). Before this change, the
child sees the init_net socket. After this change, the child
sees zero entries.

Suggested-by: Allison Henderson <achender@kernel.org>
Reviewed-by: Allison Henderson <achender@kernel.org>
Co-developed-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
Signed-off-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
---
v2: rebased onto net/main tip (b266bacba) so patchwork can apply.
    No code changes. Carries forward Reviewed-by from v1 review.
    Re-verified on KASAN VM with the same PoC: attacker in fresh
    user_ns plus netns sees zero RDS_INFO_SOCKETS entries while
    init_net access is preserved.
v1: https://lore.kernel.org/r/20260506075031.2238596-1-maoyixie.tju@gmail.com

 net/rds/af_rds.c     | 24 ++++++++++++++++++++++--
 net/rds/connection.c | 13 +++++++++++++
 net/rds/tcp.c        | 25 +++++++++++++++++++++----
 3 files changed, 56 insertions(+), 6 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 76f625986..98f3cfd48 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -735,6 +735,7 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
 			      struct rds_info_iterator *iter,
 			      struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(sock->sk);
 	struct rds_sock *rs;
 	struct rds_incoming *inc;
 	unsigned int total = 0;
@@ -744,6 +745,9 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
 	spin_lock_bh(&rds_sock_lock);
 
 	list_for_each_entry(rs, &rds_sock_list, rs_item) {
+		/* Only show sockets in the caller's netns. */
+		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
+			continue;
 		/* This option only supports IPv4 sockets. */
 		if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
 			continue;
@@ -774,6 +778,7 @@ static void rds6_sock_inc_info(struct socket *sock, unsigned int len,
 			       struct rds_info_iterator *iter,
 			       struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(sock->sk);
 	struct rds_incoming *inc;
 	unsigned int total = 0;
 	struct rds_sock *rs;
@@ -783,6 +788,9 @@ static void rds6_sock_inc_info(struct socket *sock, unsigned int len,
 	spin_lock_bh(&rds_sock_lock);
 
 	list_for_each_entry(rs, &rds_sock_list, rs_item) {
+		/* Only show sockets in the caller's netns. */
+		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
+			continue;
 		read_lock(&rs->rs_recv_lock);
 
 		list_for_each_entry(inc, &rs->rs_recv_queue, i_item) {
@@ -806,6 +814,7 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
 			  struct rds_info_iterator *iter,
 			  struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(sock->sk);
 	struct rds_info_socket sinfo;
 	unsigned int cnt = 0;
 	struct rds_sock *rs;
@@ -820,6 +829,9 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
 	}
 
 	list_for_each_entry(rs, &rds_sock_list, rs_item) {
+		/* Only show sockets in the caller's netns. */
+		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
+			continue;
 		/* This option only supports IPv4 sockets. */
 		if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
 			continue;
@@ -847,17 +859,24 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
 			   struct rds_info_iterator *iter,
 			   struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(sock->sk);
 	struct rds6_info_socket sinfo6;
+	unsigned int cnt = 0;
 	struct rds_sock *rs;
 
 	len /= sizeof(struct rds6_info_socket);
 
 	spin_lock_bh(&rds_sock_lock);
 
-	if (len < rds_sock_count)
+	if (len < rds_sock_count) {
+		cnt = rds_sock_count;
 		goto out;
+	}
 
 	list_for_each_entry(rs, &rds_sock_list, rs_item) {
+		/* Only show sockets in the caller's netns. */
+		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
+			continue;
 		sinfo6.sndbuf = rds_sk_sndbuf(rs);
 		sinfo6.rcvbuf = rds_sk_rcvbuf(rs);
 		sinfo6.bound_addr = rs->rs_bound_addr;
@@ -867,10 +886,11 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
 		sinfo6.inum = sock_i_ino(rds_rs_to_sk(rs));
 
 		rds_info_copy(iter, &sinfo6, sizeof(sinfo6));
+		cnt++;
 	}
 
  out:
-	lens->nr = rds_sock_count;
+	lens->nr = cnt;
 	lens->each = sizeof(struct rds6_info_socket);
 
 	spin_unlock_bh(&rds_sock_lock);
diff --git a/net/rds/connection.c b/net/rds/connection.c
index c10b7ed06..7c8ab8e97 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -568,6 +568,7 @@ static void rds_conn_message_info_cmn(struct socket *sock, unsigned int len,
 				      struct rds_info_lengths *lens,
 				      int want_send, bool isv6)
 {
+	struct net *net = sock_net(sock->sk);
 	struct hlist_head *head;
 	struct list_head *list;
 	struct rds_connection *conn;
@@ -590,6 +591,9 @@ static void rds_conn_message_info_cmn(struct socket *sock, unsigned int len,
 			struct rds_conn_path *cp;
 			int npaths;
 
+			/* Only show connections in the caller's netns. */
+			if (!net_eq(rds_conn_net(conn), net))
+				continue;
 			if (!isv6 && conn->c_isv6)
 				continue;
 
@@ -688,6 +692,7 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
 			  u64 *buffer,
 			  size_t item_len)
 {
+	struct net *net = sock_net(sock->sk);
 	struct hlist_head *head;
 	struct rds_connection *conn;
 	size_t i;
@@ -700,6 +705,9 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
 	for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash);
 	     i++, head++) {
 		hlist_for_each_entry_rcu(conn, head, c_hash_node) {
+			/* Only show connections in the caller's netns. */
+			if (!net_eq(rds_conn_net(conn), net))
+				continue;
 
 			/* Zero the per-item buffer before handing it to the
 			 * visitor so any field the visitor does not write -
@@ -733,6 +741,7 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
 				    u64 *buffer,
 				    size_t item_len)
 {
+	struct net *net = sock_net(sock->sk);
 	struct hlist_head *head;
 	struct rds_connection *conn;
 	size_t i;
@@ -747,6 +756,10 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
 		hlist_for_each_entry_rcu(conn, head, c_hash_node) {
 			struct rds_conn_path *cp;
 
+			/* Only show connections in the caller's netns. */
+			if (!net_eq(rds_conn_net(conn), net))
+				continue;
+
 			/* XXX We only copy the information from the first
 			 * path for now.  The problem is that if there are
 			 * more than one underlying paths, we cannot report
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 654e23d13..ef9e958ca 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -235,20 +235,27 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
 			    struct rds_info_iterator *iter,
 			    struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(rds_sock->sk);
 	struct rds_info_tcp_socket tsinfo;
 	struct rds_tcp_connection *tc;
+	unsigned int cnt = 0;
 	unsigned long flags;
 
 	spin_lock_irqsave(&rds_tcp_tc_list_lock, flags);
 
-	if (len / sizeof(tsinfo) < rds_tcp_tc_count)
+	if (len / sizeof(tsinfo) < rds_tcp_tc_count) {
+		cnt = rds_tcp_tc_count;
 		goto out;
+	}
 
 	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
 		struct inet_sock *inet = inet_sk(tc->t_sock->sk);
 
 		if (tc->t_cpath->cp_conn->c_isv6)
 			continue;
+		/* Only show connections in the caller's netns. */
+		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
+			continue;
 
 		tsinfo.local_addr = inet->inet_saddr;
 		tsinfo.local_port = inet->inet_sport;
@@ -263,10 +270,11 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
 		tsinfo.tos = tc->t_cpath->cp_conn->c_tos;
 
 		rds_info_copy(iter, &tsinfo, sizeof(tsinfo));
+		cnt++;
 	}
 
 out:
-	lens->nr = rds_tcp_tc_count;
+	lens->nr = cnt;
 	lens->each = sizeof(tsinfo);
 
 	spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags);
@@ -281,19 +289,27 @@ static void rds6_tcp_tc_info(struct socket *sock, unsigned int len,
 			     struct rds_info_iterator *iter,
 			     struct rds_info_lengths *lens)
 {
+	struct net *net = sock_net(sock->sk);
 	struct rds6_info_tcp_socket tsinfo6;
 	struct rds_tcp_connection *tc;
+	unsigned int cnt = 0;
 	unsigned long flags;
 
 	spin_lock_irqsave(&rds_tcp_tc_list_lock, flags);
 
-	if (len / sizeof(tsinfo6) < rds6_tcp_tc_count)
+	if (len / sizeof(tsinfo6) < rds6_tcp_tc_count) {
+		cnt = rds6_tcp_tc_count;
 		goto out;
+	}
 
 	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
 		struct sock *sk = tc->t_sock->sk;
 		struct inet_sock *inet = inet_sk(sk);
 
+		/* Only show connections in the caller's netns. */
+		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
+			continue;
+
 		tsinfo6.local_addr = sk->sk_v6_rcv_saddr;
 		tsinfo6.local_port = inet->inet_sport;
 		tsinfo6.peer_addr = sk->sk_v6_daddr;
@@ -306,10 +322,11 @@ static void rds6_tcp_tc_info(struct socket *sock, unsigned int len,
 		tsinfo6.last_seen_una = tc->t_last_seen_una;
 
 		rds_info_copy(iter, &tsinfo6, sizeof(tsinfo6));
+		cnt++;
 	}
 
 out:
-	lens->nr = rds6_tcp_tc_count;
+	lens->nr = cnt;
 	lens->each = sizeof(tsinfo6);
 
 	spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags);

base-commit: b266bacba796ff5c4dcd2ae2fc08aacf7ab39153
-- 
2.34.1



* Re: [PATCH net v2] rds: filter RDS_INFO_* getsockopt by caller's netns
  2026-05-07  8:13 [PATCH net v2] rds: filter RDS_INFO_* getsockopt by caller's netns Maoyi Xie
@ 2026-05-10 14:54 ` Simon Horman
  2026-05-11  6:41   ` Maoyi Xie
  0 siblings, 1 reply; 4+ messages in thread
From: Simon Horman @ 2026-05-10 14:54 UTC (permalink / raw)
  To: maoyixie.tju
  Cc: 'Simon Horman', achender, davem, edumazet, kuba, pabeni,
	netdev, linux-rdma, rds-devel, linux-kernel, maoyi.xie,
	praveen.kakkolangara

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
rds: filter RDS_INFO_* getsockopt by caller's netns

This commit updates the RDS_INFO_* getsockopt handlers to filter sockets and
connections by the caller's network namespace. This prevents users in
isolated namespaces from reading host-wide socket state.

> diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
> index 76f625986a7f2..98f3cfd48b813 100644
> --- a/net/rds/af_rds.c
> +++ b/net/rds/af_rds.c
[ ... ]
> @@ -847,17 +859,24 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
>  			   struct rds_info_iterator *iter,
>  			   struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct rds6_info_socket sinfo6;
> +	unsigned int cnt = 0;
>  	struct rds_sock *rs;
>  
>  	len /= sizeof(struct rds6_info_socket);
>  
>  	spin_lock_bh(&rds_sock_lock);
>  
> -	if (len < rds_sock_count)
> +	if (len < rds_sock_count) {
> +		cnt = rds_sock_count;
>  		goto out;
> +	}

Does this early-out check using the global rds_sock_count break the namespace
isolation and force callers to over-allocate memory?

If a user in an isolated network namespace provides a 0-length buffer to probe
the required size, len < rds_sock_count evaluates to true and the global count
is returned to userspace. Could this allow a user to track the exact host-wide
number of RDS sockets?

Additionally, if an isolated container has 1 socket but the host system has
thousands, the kernel might reject a correctly sized buffer with -ENOSPC and
demand a buffer large enough for all system-wide sockets. Is it possible to
remove this global counter check and always iterate the list to return the
exact netns-filtered count?

Simon says: I'm not convinced that the issue flagged above is a bug.
            But it does seem suboptimal to me.

>  	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		/* Only show sockets in the caller's netns. */
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
[ ... ]
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 654e23d13e3df..ef9e958cafb4f 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
[ ... ]
> @@ -235,20 +235,27 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
>  			    struct rds_info_iterator *iter,
>  			    struct rds_info_lengths *lens)
>  {
[ ... ]
>  	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
>  		struct inet_sock *inet = inet_sk(tc->t_sock->sk);

Simon says: As a pre-existing issue I don't think this needs to block
            progress of this patch. But it does seem worth investigating
	    (perhaps that is already happening?).

Can concurrent getsockopt calls trigger a NULL pointer dereference here?

While this isn't a regression introduced by this patch, it appears there is
a race condition during list traversal. Looking at rds_tcp_set_callbacks(), a
newly allocated connection tc is added to rds_tcp_tc_list under the lock, but
the lock is released before tc->t_sock is assigned:

rds_tcp_set_callbacks() {
    spin_lock(&rds_tcp_tc_list_lock);
    list_add_tail(&tc->t_list_item, &rds_tcp_tc_list);
    ...
    spin_unlock(&rds_tcp_tc_list_lock);
    ...
    tc->t_sock = sock;
}

If a caller concurrently executes this getsockopt handler during that window,
it would acquire the lock, observe the new entry, and attempt to evaluate
inet_sk(tc->t_sock->sk). Since tc->t_sock is still NULL, would dereferencing
NULL->sk result in a panic?

>  		if (tc->t_cpath->cp_conn->c_isv6)
>  			continue;
> +		/* Only show connections in the caller's netns. */
> +		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> +			continue;
>  
>  		tsinfo.local_addr = inet->inet_saddr;


* Re: [PATCH net v2] rds: filter RDS_INFO_* getsockopt by caller's netns
  2026-05-10 14:54 ` Simon Horman
@ 2026-05-11  6:41   ` Maoyi Xie
  2026-05-12 10:37     ` Simon Horman
  0 siblings, 1 reply; 4+ messages in thread
From: Maoyi Xie @ 2026-05-11  6:41 UTC (permalink / raw)
  To: Simon Horman
  Cc: achender, davem, edumazet, kuba, pabeni, netdev, linux-rdma,
	rds-devel, linux-kernel, maoyi.xie, praveen.kakkolangara

Hi Simon,

Thanks for the review.

> Does this early-out check using the global rds_sock_count break the
> namespace isolation and force callers to over-allocate memory?

Both effects are present. The size returned via lens to a probing
caller is still the global count. A caller in an isolated ns that
sizes its buffer to that value can also see ENOSPC on the second
call. The precheck compares against the global count. The data
written only covers entries in the caller's ns.

v3 addresses this. Each handler now does a first pass to count
entries in the caller's ns. The precheck uses that count. A second
pass fills the buffer. The change applies to four handlers:
rds_sock_info and rds6_sock_info in net/rds/af_rds.c, plus
rds_tcp_tc_info and rds6_tcp_tc_info in net/rds/tcp.c. lens->nr
now reflects the ns scoped count on both probe and full reads.

Re-verified on a KASAN VM. One AF_RDS socket is bound in init_net
to 127.0.0.1:4242. The attacker is the same process after
unshare(CLONE_NEWUSER | CLONE_NEWNET) and uid_map "0 0 1".

  [init]     count-probe rc=-1 errno=ENOSPC optlen_after=28 entries=1
  [init]     full-read   rc=0  len=28 entries=1 (127.0.0.1:4242)
  [attacker] count-probe rc=0  optlen_after=0 entries=0
  [attacker] full-read   rc=0  len=0 entries=0

Pre-v3 the precheck returned the global count of 1 to the attacker
via lens->nr on the zero-length probe. v3 returns 0.

> Can concurrent getsockopt calls trigger a NULL pointer dereference
> here?

Yes, the window looks reachable.

The writer takes rds_tcp_tc_list_lock and calls list_add_tail. It
releases the lock. Only after that it assigns tc->t_sock = sock.
The reader takes the same lock, walks the list, and dereferences
inet_sk(tc->t_sock->sk). There is no NULL check on the read side.
A reader that enters between the writer's spin_unlock and the
t_sock store observes a list entry whose t_sock is still NULL.

The companion restore_callbacks path is safe. list_del_init is
inside the lock. A reader holding the lock cannot observe the
unlinked entry. The matching tc->t_sock = NULL outside the lock
is then harmless. Another reader in the same file at line 676
already checks !tc->t_sock before use.

The smallest fix is to move tc->t_sock = sock into the
rds_tcp_tc_list_lock critical section in rds_tcp_set_callbacks,
just before list_add_tail. The list insertion and the t_sock
store then become atomic from the reader's view. The diff is
one line moved. It does not affect the callback_lock side
effects below.
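
Against the simplified rds_tcp_set_callbacks() shape quoted earlier
in the thread, the one-line move would look roughly like this (a
sketch, not a tested patch; the real function may use a different
lock variant):

```diff
 rds_tcp_set_callbacks() {
     spin_lock(&rds_tcp_tc_list_lock);
+    tc->t_sock = sock;
     list_add_tail(&tc->t_list_item, &rds_tcp_tc_list);
     ...
     spin_unlock(&rds_tcp_tc_list_lock);
     ...
-    tc->t_sock = sock;
 }
```

Any entry a reader finds on the list under the lock then has t_sock
already set.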

This is independent of the netns filter. I have not built a
runtime PoC. The window is short. Does the code analysis above
match your reading? If yes, I can send this as a separate patch
with a Fixes tag.

Thanks,
Maoyi


* Re: [PATCH net v2] rds: filter RDS_INFO_* getsockopt by caller's netns
  2026-05-11  6:41   ` Maoyi Xie
@ 2026-05-12 10:37     ` Simon Horman
  0 siblings, 0 replies; 4+ messages in thread
From: Simon Horman @ 2026-05-12 10:37 UTC (permalink / raw)
  To: Maoyi Xie
  Cc: achender, davem, edumazet, kuba, pabeni, netdev, linux-rdma,
	rds-devel, linux-kernel, maoyi.xie, praveen.kakkolangara

On Mon, May 11, 2026 at 02:41:58PM +0800, Maoyi Xie wrote:
> Hi Simon,
> 
> Thanks for the review.
> 
> > Does this early-out check using the global rds_sock_count break the
> > namespace isolation and force callers to over-allocate memory?
> 
> Both effects are present. The size returned via lens to a probing
> caller is still the global count. A caller in an isolated ns that
> sizes its buffer to that value can also see ENOSPC on the second
> call. The precheck compares against the global count. The data
> written only covers entries in the caller's ns.
> 
> v3 addresses this. Each handler now does a first pass to count
> entries in the caller's ns. The precheck uses that count. A second
> pass fills the buffer. The change applies to four handlers:
> rds_sock_info and rds6_sock_info in net/rds/af_rds.c, plus
> rds_tcp_tc_info and rds6_tcp_tc_info in net/rds/tcp.c. lens->nr
> now reflects the ns scoped count on both probe and full reads.
> 
> Re-verified on a KASAN VM. One AF_RDS socket is bound in init_net
> to 127.0.0.1:4242. The attacker is the same process after
> unshare(CLONE_NEWUSER | CLONE_NEWNET) and uid_map "0 0 1".
> 
>   [init]     count-probe rc=-1 errno=ENOSPC optlen_after=28 entries=1
>   [init]     full-read   rc=0  len=28 entries=1 (127.0.0.1:4242)
>   [attacker] count-probe rc=0  optlen_after=0 entries=0
>   [attacker] full-read   rc=0  len=0 entries=0
> 
> Pre-v3 the precheck returned the global count of 1 to the attacker
> via lens->nr on the zero-length probe. v3 returns 0.

Thanks. I will look over the updated code.

> > Can concurrent getsockopt calls trigger a NULL pointer dereference
> > here?
> 
> Yes, the window looks reachable.
> 
> The writer takes rds_tcp_tc_list_lock and calls list_add_tail. It
> releases the lock. Only after that it assigns tc->t_sock = sock.
> The reader takes the same lock, walks the list, and dereferences
> inet_sk(tc->t_sock->sk). There is no NULL check on the read side.
> A reader that enters between the writer's spin_unlock and the
> t_sock store observes a list entry whose t_sock is still NULL.
> 
> The companion restore_callbacks path is safe. list_del_init is
> inside the lock. A reader holding the lock cannot observe the
> unlinked entry. The matching tc->t_sock = NULL outside the lock
> is then harmless. Another reader in the same file at line 676
> already checks !tc->t_sock before use.
> 
> The smallest fix is to move tc->t_sock = sock into the
> rds_tcp_tc_list_lock critical section in rds_tcp_set_callbacks,
> just before list_add_tail. The list insertion and the t_sock
> store then become atomic from the reader's view. The diff is
> one line moved. It does not affect the callback_lock side
> effects below.
> 
> This is independent of the netns filter. I have not built a
> runtime PoC. The window is short. Does the code analysis above
> match your reading? If yes, I can send this as a separate patch
> with a Fixes tag.

Likewise, Thanks.

I agree that should be sufficient to address this problem.
And it is appropriate to post it as a separate patch with a Fixes tag.

