From: Simon Horman <horms@kernel.org>
To: maoyixie.tju@gmail.com
Cc: 'Simon Horman' <horms@kernel.org>,
achender@kernel.org, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org, rds-devel@oss.oracle.com,
linux-kernel@vger.kernel.org, maoyi.xie@ntu.edu.sg,
praveen.kakkolangara@aumovio.com
Subject: Re: [PATCH net v2] rds: filter RDS_INFO_* getsockopt by caller's netns
Date: Sun, 10 May 2026 15:54:27 +0100 [thread overview]
Message-ID: <20260510145425.1372018-3-horms@kernel.org> (raw)
In-Reply-To: <20260507081332.2868770-1-maoyixie.tju@gmail.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
rds: filter RDS_INFO_* getsockopt by caller's netns
This commit updates the RDS_INFO_* getsockopt handlers to filter sockets and
connections by the caller's network namespace. This prevents users in
isolated namespaces from reading host-wide socket state.
> diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
> index 76f625986a7f2..98f3cfd48b813 100644
> --- a/net/rds/af_rds.c
> +++ b/net/rds/af_rds.c
[ ... ]
> @@ -847,17 +859,24 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
> struct rds_info_iterator *iter,
> struct rds_info_lengths *lens)
> {
> + struct net *net = sock_net(sock->sk);
> struct rds6_info_socket sinfo6;
> + unsigned int cnt = 0;
> struct rds_sock *rs;
>
> len /= sizeof(struct rds6_info_socket);
>
> spin_lock_bh(&rds_sock_lock);
>
> - if (len < rds_sock_count)
> + if (len < rds_sock_count) {
> + cnt = rds_sock_count;
> goto out;
> + }
Does this early-out check against the global rds_sock_count break namespace
isolation and force callers to over-allocate memory?
If a user in an isolated network namespace passes a 0-length buffer to probe
the required size, len < rds_sock_count evaluates to true and the global count
is returned to userspace. Could this let such a user observe the exact
host-wide number of RDS sockets?
Additionally, if an isolated container has 1 socket but the host system has
thousands, the kernel might reject a correctly sized buffer with -ENOSPC and
demand a buffer large enough for all system-wide sockets. Is it possible to
remove this global counter check and always iterate the list to return the
exact netns-filtered count?
Simon says: I'm not convinced that the issue flagged above is a bug.
But it does seem suboptimal to me.
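The "always iterate, return the exact netns-filtered count" alternative can be
sketched as a small userspace model. This is not kernel code: the struct,
field, and function names below (model_sock, netns_id, count_in_netns) are
hypothetical stand-ins for the rds_sock list and sock_net() comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the suggested counting logic: instead of comparing
 * the caller's buffer against a global counter, walk the list and count
 * only the entries visible in the caller's netns. All names here are
 * illustrative, not the kernel's.
 */
struct model_sock {
	int netns_id;			/* stands in for sock_net(sk) */
	struct model_sock *next;
};

static unsigned int count_in_netns(struct model_sock *head, int caller_netns)
{
	struct model_sock *rs;
	unsigned int cnt = 0;

	for (rs = head; rs; rs = rs->next)
		if (rs->netns_id == caller_netns)	/* models net_eq() */
			cnt++;
	return cnt;
}
```

With this shape, the length check and the reported lens->nr would both use the
per-netns count, so a container never learns (or is forced to allocate for)
the host-wide total.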
> list_for_each_entry(rs, &rds_sock_list, rs_item) {
> + /* Only show sockets in the caller's netns. */
> + if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> + continue;
[ ... ]
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 654e23d13e3df..ef9e958cafb4f 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
[ ... ]
> @@ -235,20 +235,27 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
> struct rds_info_iterator *iter,
> struct rds_info_lengths *lens)
> {
[ ... ]
> list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
> struct inet_sock *inet = inet_sk(tc->t_sock->sk);
Simon says: As a pre-existing issue I don't think this needs to block
progress of this patch. But it does seem worth investigating
(perhaps that is already happening?).
Can concurrent getsockopt calls trigger a NULL pointer dereference here?
While this isn't a regression introduced by this patch, it appears there is
a race condition during list traversal. Looking at rds_tcp_set_callbacks(), a
newly allocated connection tc is added to rds_tcp_tc_list under the lock, but
the lock is released before tc->t_sock is assigned:
rds_tcp_set_callbacks() {
        spin_lock(&rds_tcp_tc_list_lock);
        list_add_tail(&tc->t_list_item, &rds_tcp_tc_list);
        ...
        spin_unlock(&rds_tcp_tc_list_lock);
        ...
        tc->t_sock = sock;
}
If a caller concurrently executes this getsockopt handler during that window,
it would acquire the lock, observe the new entry, and attempt to evaluate
inet_sk(tc->t_sock->sk). Since tc->t_sock is still NULL at that point, would
the resulting NULL pointer dereference oops the kernel?
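One way to see the two possible fixes is a userspace model: either publish the
entry only after it is fully initialized, or have readers skip entries whose
t_sock has not been set yet. The names below (model_tc, count_ready) are
hypothetical and only illustrate the ordering/defensive-check idea, not the
actual RDS code.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the reader-side defensive fix: an entry may be
 * linked into the list before its t_sock is assigned, so the iterator
 * skips unpublished entries instead of dereferencing NULL.
 */
struct model_tc {
	void *t_sock;			/* NULL until the transport is bound */
	struct model_tc *next;
};

static unsigned int count_ready(struct model_tc *head)
{
	struct model_tc *tc;
	unsigned int cnt = 0;

	for (tc = head; tc; tc = tc->next) {
		if (!tc->t_sock)
			continue;	/* not yet published; skip safely */
		cnt++;
	}
	return cnt;
}
```

The writer-side alternative is simpler still: assign tc->t_sock before the
list_add_tail() inside the locked region, so no reader holding the lock can
ever observe a half-constructed entry.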
> if (tc->t_cpath->cp_conn->c_isv6)
> continue;
> + /* Only show connections in the caller's netns. */
> + if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> + continue;
>
> tsinfo.local_addr = inet->inet_saddr;
Thread overview: 4+ messages
2026-05-07 8:13 [PATCH net v2] rds: filter RDS_INFO_* getsockopt by caller's netns Maoyi Xie
2026-05-10 14:54 ` Simon Horman [this message]
2026-05-11 6:41 ` Maoyi Xie
2026-05-12 10:37 ` Simon Horman