Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
From: Allison Henderson <achender@kernel.org>
To: Maoyi Xie <maoyixie.tju@gmail.com>,
	Simon Horman <horms@kernel.org>,
	 netdev@vger.kernel.org
Cc: "David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	linux-rdma@vger.kernel.org,  rds-devel@oss.oracle.com,
	linux-kernel@vger.kernel.org,
	Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>,
	Maoyi Xie <maoyi.xie@ntu.edu.sg>
Subject: Re: [PATCH net v3] rds: filter RDS_INFO_* getsockopt by caller's netns
Date: Tue, 12 May 2026 12:43:18 -0700	[thread overview]
Message-ID: <9b6148b29f57728d97a9fea8a400e87a3f6a1526.camel@kernel.org> (raw)
In-Reply-To: <20260511070211.1033178-1-maoyi.xie@ntu.edu.sg>

On Mon, 2026-05-11 at 15:02 +0800, Maoyi Xie wrote:
> The RDS_INFO_* family of getsockopt(2) options reads several
> file-scope global lists that are not per-netns:
> 
>   rds_sock_info / rds6_sock_info,
>   rds_sock_inc_info / rds6_sock_inc_info        -> rds_sock_list
>   rds_tcp_tc_info / rds6_tcp_tc_info            -> rds_tcp_tc_list
>   rds_conn_info / rds6_conn_info,
>   rds_conn_message_info_cmn (for the *_SEND_MESSAGES and
>   *_RETRANS_MESSAGES variants),
>   rds_for_each_conn_info (for RDS_INFO_IB_CONNECTIONS)
>                                                 -> rds_conn_hash[]
> 
> The handlers do not filter by the caller's network namespace.
> rds_info_getsockopt() has no netns or capable() check, and
> rds_create() has no capable() check, so AF_RDS is reachable from
> an unprivileged user namespace. As a result, an unprivileged
> caller in a fresh user_ns plus netns can read the bound address
> and sock inode of every RDS socket on the host, the peer address
> of incoming messages on every RDS socket on the host, the peer
> address and TCP sequence numbers of every rds-tcp connection on
> the host, and the peer address and RDS sequence numbers of every
> RDS connection on the host.
> 
> The rds-tcp transport is reachable from a non-initial netns (see
> rds_set_transport()), so a one-shot init_net gate at
> rds_info_getsockopt() would deny legitimate per-netns visibility
> to rds-tcp callers. Instead, filter at each handler by comparing
> the netns of the caller's socket to the netns of the list entry,
> or to rds_conn_net(conn) for connection paths. Only copy entries
> whose netns matches the caller. Counters (RDS_INFO_COUNTERS) are
> aggregate statistics and remain global.
> 
> Reproducer (KASAN VM, rds and rds_tcp loaded): an AF_RDS socket
> binds 127.0.0.1:4242 in init_net as root. A child process enters
> a fresh user_ns plus netns and opens AF_RDS there, then calls
> getsockopt(SOL_RDS, RDS_INFO_SOCKETS). Before this change, the
> child sees the init_net socket. After this change, the child
> sees zero entries.
> 
> Suggested-by: Allison Henderson <achender@kernel.org>
> Suggested-by: Simon Horman <horms@kernel.org>
> Reviewed-by: Allison Henderson <achender@kernel.org>
> Co-developed-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
> Signed-off-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
> ---
> v3: Address Simon Horman's review of v2. The size precheck and the
>     lens count are now both restricted to the caller's netns in
>     rds_sock_info, rds6_sock_info, rds_tcp_tc_info and
>     rds6_tcp_tc_info. Each handler now does a first pass under the
>     list lock to count entries visible in the caller's netns, then
>     short-circuits with that count if the user buffer is too small,
>     then a second pass to fill data. This closes both issues Simon
>     flagged: a zero-length probe no longer returns the global count,
>     and a caller that sizes its buffer to the value returned by lens
>     no longer hits ENOSPC on the second call.
>     Re-verified on KASAN VM with the v1 PoC: attacker in fresh
>     user_ns + netns sees zero RDS_INFO_SOCKETS entries; init_net
>     access sees its own entries; lens returns the ns-scoped count
>     on both probe and full reads.
> v2: rebased onto net/main tip (b266bacba) so patchwork can apply.
>     No code changes. Carries forward Reviewed-by from v1 review.
> v1: https://lore.kernel.org/r/20260506075031.2238596-1-maoyixie.tju@gmail.com
> 
Hi Maoyi,

The two-pass approach looks good to me. The zero-length probe now returns
an appropriately ns-scoped count.  I've already gave the rvb on v2, but I think
v3 is a cleaner solution.

Thanks,
Allison


>  net/rds/af_rds.c     | 42 ++++++++++++++++++++++++++++++++++++------
>  net/rds/connection.c | 13 +++++++++++++
>  net/rds/tcp.c        | 35 +++++++++++++++++++++++++++++++----
>  3 files changed, 80 insertions(+), 10 deletions(-)
> 
> diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
> index 76f625986..6e22b516b 100644
> --- a/net/rds/af_rds.c
> +++ b/net/rds/af_rds.c
> @@ -735,6 +735,7 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
>  			      struct rds_info_iterator *iter,
>  			      struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct rds_sock *rs;
>  	struct rds_incoming *inc;
>  	unsigned int total = 0;
> @@ -744,6 +745,9 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
>  	spin_lock_bh(&rds_sock_lock);
>  
>  	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		/* Only show sockets in the caller's netns. */
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
>  		/* This option only supports IPv4 sockets. */
>  		if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
>  			continue;
> @@ -774,6 +778,7 @@ static void rds6_sock_inc_info(struct socket *sock, unsigned int len,
>  			       struct rds_info_iterator *iter,
>  			       struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct rds_incoming *inc;
>  	unsigned int total = 0;
>  	struct rds_sock *rs;
> @@ -783,6 +788,9 @@ static void rds6_sock_inc_info(struct socket *sock, unsigned int len,
>  	spin_lock_bh(&rds_sock_lock);
>  
>  	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		/* Only show sockets in the caller's netns. */
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
>  		read_lock(&rs->rs_recv_lock);
>  
>  		list_for_each_entry(inc, &rs->rs_recv_queue, i_item) {
> @@ -806,6 +814,7 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
>  			  struct rds_info_iterator *iter,
>  			  struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct rds_info_socket sinfo;
>  	unsigned int cnt = 0;
>  	struct rds_sock *rs;
> @@ -814,12 +823,22 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
>  
>  	spin_lock_bh(&rds_sock_lock);
>  
> -	if (len < rds_sock_count) {
> -		cnt = rds_sock_count;
> -		goto out;
> +	/* First pass: count entries visible in the caller's netns. */
> +	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
> +		if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
> +			continue;
> +		cnt++;
>  	}
>  
> +	if (len < cnt)
> +		goto out;
> +
>  	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		/* Only show sockets in the caller's netns. */
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
>  		/* This option only supports IPv4 sockets. */
>  		if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
>  			continue;
> @@ -832,7 +851,6 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
>  		sinfo.inum = sock_i_ino(rds_rs_to_sk(rs));
>  
>  		rds_info_copy(iter, &sinfo, sizeof(sinfo));
> -		cnt++;
>  	}
>  
>  out:
> @@ -847,17 +865,29 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
>  			   struct rds_info_iterator *iter,
>  			   struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct rds6_info_socket sinfo6;
> +	unsigned int cnt = 0;
>  	struct rds_sock *rs;
>  
>  	len /= sizeof(struct rds6_info_socket);
>  
>  	spin_lock_bh(&rds_sock_lock);
>  
> -	if (len < rds_sock_count)
> +	/* First pass: count entries visible in the caller's netns. */
> +	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
> +		cnt++;
> +	}
> +
> +	if (len < cnt)
>  		goto out;
>  
>  	list_for_each_entry(rs, &rds_sock_list, rs_item) {
> +		/* Only show sockets in the caller's netns. */
> +		if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> +			continue;
>  		sinfo6.sndbuf = rds_sk_sndbuf(rs);
>  		sinfo6.rcvbuf = rds_sk_rcvbuf(rs);
>  		sinfo6.bound_addr = rs->rs_bound_addr;
> @@ -870,7 +900,7 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
>  	}
>  
>   out:
> -	lens->nr = rds_sock_count;
> +	lens->nr = cnt;
>  	lens->each = sizeof(struct rds6_info_socket);
>  
>  	spin_unlock_bh(&rds_sock_lock);
> diff --git a/net/rds/connection.c b/net/rds/connection.c
> index c10b7ed06..7c8ab8e97 100644
> --- a/net/rds/connection.c
> +++ b/net/rds/connection.c
> @@ -568,6 +568,7 @@ static void rds_conn_message_info_cmn(struct socket *sock, unsigned int len,
>  				      struct rds_info_lengths *lens,
>  				      int want_send, bool isv6)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct hlist_head *head;
>  	struct list_head *list;
>  	struct rds_connection *conn;
> @@ -590,6 +591,9 @@ static void rds_conn_message_info_cmn(struct socket *sock, unsigned int len,
>  			struct rds_conn_path *cp;
>  			int npaths;
>  
> +			/* Only show connections in the caller's netns. */
> +			if (!net_eq(rds_conn_net(conn), net))
> +				continue;
>  			if (!isv6 && conn->c_isv6)
>  				continue;
>  
> @@ -688,6 +692,7 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
>  			  u64 *buffer,
>  			  size_t item_len)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct hlist_head *head;
>  	struct rds_connection *conn;
>  	size_t i;
> @@ -700,6 +705,9 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
>  	for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash);
>  	     i++, head++) {
>  		hlist_for_each_entry_rcu(conn, head, c_hash_node) {
> +			/* Only show connections in the caller's netns. */
> +			if (!net_eq(rds_conn_net(conn), net))
> +				continue;
>  
>  			/* Zero the per-item buffer before handing it to the
>  			 * visitor so any field the visitor does not write -
> @@ -733,6 +741,7 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
>  				    u64 *buffer,
>  				    size_t item_len)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct hlist_head *head;
>  	struct rds_connection *conn;
>  	size_t i;
> @@ -747,6 +756,10 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
>  		hlist_for_each_entry_rcu(conn, head, c_hash_node) {
>  			struct rds_conn_path *cp;
>  
> +			/* Only show connections in the caller's netns. */
> +			if (!net_eq(rds_conn_net(conn), net))
> +				continue;
> +
>  			/* XXX We only copy the information from the first
>  			 * path for now.  The problem is that if there are
>  			 * more than one underlying paths, we cannot report
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 654e23d13..105e83507 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -235,13 +235,24 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
>  			    struct rds_info_iterator *iter,
>  			    struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(rds_sock->sk);
>  	struct rds_info_tcp_socket tsinfo;
>  	struct rds_tcp_connection *tc;
> +	unsigned int cnt = 0;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&rds_tcp_tc_list_lock, flags);
>  
> -	if (len / sizeof(tsinfo) < rds_tcp_tc_count)
> +	/* First pass: count entries visible in the caller's netns. */
> +	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
> +		if (tc->t_cpath->cp_conn->c_isv6)
> +			continue;
> +		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> +			continue;
> +		cnt++;
> +	}
> +
> +	if (len / sizeof(tsinfo) < cnt)
>  		goto out;
>  
>  	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
> @@ -249,6 +260,9 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
>  
>  		if (tc->t_cpath->cp_conn->c_isv6)
>  			continue;
> +		/* Only show connections in the caller's netns. */
> +		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> +			continue;
>  
>  		tsinfo.local_addr = inet->inet_saddr;
>  		tsinfo.local_port = inet->inet_sport;
> @@ -266,7 +280,7 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
>  	}
>  
>  out:
> -	lens->nr = rds_tcp_tc_count;
> +	lens->nr = cnt;
>  	lens->each = sizeof(tsinfo);
>  
>  	spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags);
> @@ -281,19 +295,32 @@ static void rds6_tcp_tc_info(struct socket *sock, unsigned int len,
>  			     struct rds_info_iterator *iter,
>  			     struct rds_info_lengths *lens)
>  {
> +	struct net *net = sock_net(sock->sk);
>  	struct rds6_info_tcp_socket tsinfo6;
>  	struct rds_tcp_connection *tc;
> +	unsigned int cnt = 0;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&rds_tcp_tc_list_lock, flags);
>  
> -	if (len / sizeof(tsinfo6) < rds6_tcp_tc_count)
> +	/* First pass: count entries visible in the caller's netns. */
> +	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
> +		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> +			continue;
> +		cnt++;
> +	}
> +
> +	if (len / sizeof(tsinfo6) < cnt)
>  		goto out;
>  
>  	list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
>  		struct sock *sk = tc->t_sock->sk;
>  		struct inet_sock *inet = inet_sk(sk);
>  
> +		/* Only show connections in the caller's netns. */
> +		if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> +			continue;
> +
>  		tsinfo6.local_addr = sk->sk_v6_rcv_saddr;
>  		tsinfo6.local_port = inet->inet_sport;
>  		tsinfo6.peer_addr = sk->sk_v6_daddr;
> @@ -309,7 +336,7 @@ static void rds6_tcp_tc_info(struct socket *sock, unsigned int len,
>  	}
>  
>  out:
> -	lens->nr = rds6_tcp_tc_count;
> +	lens->nr = cnt;
>  	lens->each = sizeof(tsinfo6);
>  
>  	spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags);
> 
> base-commit: b266bacba796ff5c4dcd2ae2fc08aacf7ab39153


      parent reply	other threads:[~2026-05-12 19:43 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-11  7:02 [PATCH net v3] rds: filter RDS_INFO_* getsockopt by caller's netns Maoyi Xie
2026-05-12  5:42 ` kernel test robot
2026-05-12 19:43 ` Allison Henderson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9b6148b29f57728d97a9fea8a400e87a3f6a1526.camel@kernel.org \
    --to=achender@kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=maoyi.xie@ntu.edu.sg \
    --cc=maoyixie.tju@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=praveen.kakkolangara@aumovio.com \
    --cc=rds-devel@oss.oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox