From: Allison Henderson <achender@kernel.org>
To: Maoyi Xie <maoyixie.tju@gmail.com>
Cc: "David S . Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
rds-devel@oss.oracle.com, linux-kernel@vger.kernel.org,
Maoyi Xie <maoyi.xie@ntu.edu.sg>,
Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
Subject: Re: [PATCH net] rds: filter RDS_INFO_* getsockopt by caller's netns
Date: Wed, 06 May 2026 22:17:43 -0700
Message-ID: <4df3eb5dc608cf8b4649f1358cb74b9fbbfbbca6.camel@kernel.org>
In-Reply-To: <20260506075031.2238596-1-maoyixie.tju@gmail.com>
On Wed, 2026-05-06 at 15:50 +0800, Maoyi Xie wrote:
> From: Maoyi Xie <maoyi.xie@ntu.edu.sg>
>
> The RDS_INFO_* family of getsockopt(2) options reads several
> file-scope global lists that are not per-netns:
>
> rds_sock_info / rds6_sock_info,
> rds_sock_inc_info / rds6_sock_inc_info -> rds_sock_list
> rds_tcp_tc_info / rds6_tcp_tc_info -> rds_tcp_tc_list
> rds_conn_info / rds6_conn_info,
> rds_conn_message_info_cmn (for the *_SEND_MESSAGES and
> *_RETRANS_MESSAGES variants),
> rds_for_each_conn_info (for RDS_INFO_IB_CONNECTIONS)
> -> rds_conn_hash[]
>
> The handlers do not filter by the caller's network namespace.
> rds_info_getsockopt() has no netns or capable() check, and
> rds_create() has no capable() check, so AF_RDS is reachable from
> an unprivileged user namespace. As a result, an unprivileged
> caller in a fresh user_ns plus netns can read the bound address
> and sock inode of every RDS socket on the host, the peer address
> of incoming messages on every RDS socket on the host, the peer
> address and TCP sequence numbers of every rds-tcp connection on
> the host, and the peer address and RDS sequence numbers of every
> RDS connection on the host.
>
> The rds-tcp transport is reachable from a non-initial netns (see
> rds_set_transport()), so a one-shot init_net gate at
> rds_info_getsockopt() would deny legitimate per-netns visibility
> to rds-tcp callers. Instead, filter at each handler by comparing
> the netns of the caller's socket to the netns of the list entry,
> or to rds_conn_net(conn) for connection paths. Only copy entries
> whose netns matches the caller. Counters (RDS_INFO_COUNTERS) are
> aggregate statistics and remain global.
>
> Reproducer (KASAN VM, rds and rds_tcp loaded): an AF_RDS socket
> binds 127.0.0.1:4242 in init_net as root. A child process enters
> a fresh user_ns plus netns and opens AF_RDS there, then calls
> getsockopt(SOL_RDS, RDS_INFO_SOCKETS). Before this change, the
> child sees the init_net socket. After this change, the child
> sees zero entries.
>
> Suggested-by: Allison Henderson <achender@kernel.org>
> Co-developed-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
> Signed-off-by: Praveen Kakkolangara <praveen.kakkolangara@aumovio.com>
> Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg>
Thanks Xie. This looks good to me. I notice that patchwork failed to
apply this patch though, so you may need to rebase it and send a v2 on
top of net/main. Other than that, I think it looks good. Thank you!
Reviewed-by: Allison Henderson <achender@kernel.org>
> ---
> net/rds/af_rds.c | 24 ++++++++++++++++++++++--
> net/rds/connection.c | 13 +++++++++++++
> net/rds/tcp.c | 25 +++++++++++++++++++++----
> 3 files changed, 56 insertions(+), 6 deletions(-)
>
> diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
> index b396c673d..469891131 100644
> --- a/net/rds/af_rds.c
> +++ b/net/rds/af_rds.c
> @@ -729,6 +729,7 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
> struct rds_info_iterator *iter,
> struct rds_info_lengths *lens)
> {
> + struct net *net = sock_net(sock->sk);
> struct rds_sock *rs;
> struct rds_incoming *inc;
> unsigned int total = 0;
> @@ -738,6 +739,9 @@ static void rds_sock_inc_info(struct socket *sock, unsigned int len,
> spin_lock_bh(&rds_sock_lock);
>
> list_for_each_entry(rs, &rds_sock_list, rs_item) {
> + /* Only show sockets in the caller's netns. */
> + if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> + continue;
> /* This option only supports IPv4 sockets. */
> if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
> continue;
> @@ -768,6 +772,7 @@ static void rds6_sock_inc_info(struct socket *sock, unsigned int len,
> struct rds_info_iterator *iter,
> struct rds_info_lengths *lens)
> {
> + struct net *net = sock_net(sock->sk);
> struct rds_incoming *inc;
> unsigned int total = 0;
> struct rds_sock *rs;
> @@ -777,6 +782,9 @@ static void rds6_sock_inc_info(struct socket *sock, unsigned int len,
> spin_lock_bh(&rds_sock_lock);
>
> list_for_each_entry(rs, &rds_sock_list, rs_item) {
> + /* Only show sockets in the caller's netns. */
> + if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> + continue;
> read_lock(&rs->rs_recv_lock);
>
> list_for_each_entry(inc, &rs->rs_recv_queue, i_item) {
> @@ -800,6 +808,7 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
> struct rds_info_iterator *iter,
> struct rds_info_lengths *lens)
> {
> + struct net *net = sock_net(sock->sk);
> struct rds_info_socket sinfo;
> unsigned int cnt = 0;
> struct rds_sock *rs;
> @@ -814,6 +823,9 @@ static void rds_sock_info(struct socket *sock, unsigned int len,
> }
>
> list_for_each_entry(rs, &rds_sock_list, rs_item) {
> + /* Only show sockets in the caller's netns. */
> + if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> + continue;
> /* This option only supports IPv4 sockets. */
> if (!ipv6_addr_v4mapped(&rs->rs_bound_addr))
> continue;
> @@ -841,17 +853,24 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
> struct rds_info_iterator *iter,
> struct rds_info_lengths *lens)
> {
> + struct net *net = sock_net(sock->sk);
> struct rds6_info_socket sinfo6;
> + unsigned int cnt = 0;
> struct rds_sock *rs;
>
> len /= sizeof(struct rds6_info_socket);
>
> spin_lock_bh(&rds_sock_lock);
>
> - if (len < rds_sock_count)
> + if (len < rds_sock_count) {
> + cnt = rds_sock_count;
> goto out;
> + }
>
> list_for_each_entry(rs, &rds_sock_list, rs_item) {
> + /* Only show sockets in the caller's netns. */
> + if (!net_eq(sock_net(rds_rs_to_sk(rs)), net))
> + continue;
> sinfo6.sndbuf = rds_sk_sndbuf(rs);
> sinfo6.rcvbuf = rds_sk_rcvbuf(rs);
> sinfo6.bound_addr = rs->rs_bound_addr;
> @@ -861,10 +880,11 @@ static void rds6_sock_info(struct socket *sock, unsigned int len,
> sinfo6.inum = sock_i_ino(rds_rs_to_sk(rs));
>
> rds_info_copy(iter, &sinfo6, sizeof(sinfo6));
> + cnt++;
> }
>
> out:
> - lens->nr = rds_sock_count;
> + lens->nr = cnt;
> lens->each = sizeof(struct rds6_info_socket);
>
> spin_unlock_bh(&rds_sock_lock);
> diff --git a/net/rds/connection.c b/net/rds/connection.c
> index 412441aaa..a73554816 100644
> --- a/net/rds/connection.c
> +++ b/net/rds/connection.c
> @@ -568,6 +568,7 @@ static void rds_conn_message_info_cmn(struct socket *sock, unsigned int len,
> struct rds_info_lengths *lens,
> int want_send, bool isv6)
> {
> + struct net *net = sock_net(sock->sk);
> struct hlist_head *head;
> struct list_head *list;
> struct rds_connection *conn;
> @@ -590,6 +591,9 @@ static void rds_conn_message_info_cmn(struct socket *sock, unsigned int len,
> struct rds_conn_path *cp;
> int npaths;
>
> + /* Only show connections in the caller's netns. */
> + if (!net_eq(rds_conn_net(conn), net))
> + continue;
> if (!isv6 && conn->c_isv6)
> continue;
>
> @@ -688,6 +692,7 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
> u64 *buffer,
> size_t item_len)
> {
> + struct net *net = sock_net(sock->sk);
> struct hlist_head *head;
> struct rds_connection *conn;
> size_t i;
> @@ -700,6 +705,9 @@ void rds_for_each_conn_info(struct socket *sock, unsigned int len,
> for (i = 0, head = rds_conn_hash; i < ARRAY_SIZE(rds_conn_hash);
> i++, head++) {
> hlist_for_each_entry_rcu(conn, head, c_hash_node) {
> + /* Only show connections in the caller's netns. */
> + if (!net_eq(rds_conn_net(conn), net))
> + continue;
>
> /* XXX no c_lock usage.. */
> if (!visitor(conn, buffer))
> @@ -726,6 +734,7 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
> u64 *buffer,
> size_t item_len)
> {
> + struct net *net = sock_net(sock->sk);
> struct hlist_head *head;
> struct rds_connection *conn;
> size_t i;
> @@ -740,6 +749,10 @@ static void rds_walk_conn_path_info(struct socket *sock, unsigned int len,
> hlist_for_each_entry_rcu(conn, head, c_hash_node) {
> struct rds_conn_path *cp;
>
> + /* Only show connections in the caller's netns. */
> + if (!net_eq(rds_conn_net(conn), net))
> + continue;
> +
> /* XXX We only copy the information from the first
> * path for now. The problem is that if there are
> * more than one underlying paths, we cannot report
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 654e23d13..ef9e958ca 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -235,20 +235,27 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
> struct rds_info_iterator *iter,
> struct rds_info_lengths *lens)
> {
> + struct net *net = sock_net(rds_sock->sk);
> struct rds_info_tcp_socket tsinfo;
> struct rds_tcp_connection *tc;
> + unsigned int cnt = 0;
> unsigned long flags;
>
> spin_lock_irqsave(&rds_tcp_tc_list_lock, flags);
>
> - if (len / sizeof(tsinfo) < rds_tcp_tc_count)
> + if (len / sizeof(tsinfo) < rds_tcp_tc_count) {
> + cnt = rds_tcp_tc_count;
> goto out;
> + }
>
> list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
> struct inet_sock *inet = inet_sk(tc->t_sock->sk);
>
> if (tc->t_cpath->cp_conn->c_isv6)
> continue;
> + /* Only show connections in the caller's netns. */
> + if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> + continue;
>
> tsinfo.local_addr = inet->inet_saddr;
> tsinfo.local_port = inet->inet_sport;
> @@ -263,10 +270,11 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
> tsinfo.tos = tc->t_cpath->cp_conn->c_tos;
>
> rds_info_copy(iter, &tsinfo, sizeof(tsinfo));
> + cnt++;
> }
>
> out:
> - lens->nr = rds_tcp_tc_count;
> + lens->nr = cnt;
> lens->each = sizeof(tsinfo);
>
> spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags);
> @@ -281,19 +289,27 @@ static void rds6_tcp_tc_info(struct socket *sock, unsigned int len,
> struct rds_info_iterator *iter,
> struct rds_info_lengths *lens)
> {
> + struct net *net = sock_net(sock->sk);
> struct rds6_info_tcp_socket tsinfo6;
> struct rds_tcp_connection *tc;
> + unsigned int cnt = 0;
> unsigned long flags;
>
> spin_lock_irqsave(&rds_tcp_tc_list_lock, flags);
>
> - if (len / sizeof(tsinfo6) < rds6_tcp_tc_count)
> + if (len / sizeof(tsinfo6) < rds6_tcp_tc_count) {
> + cnt = rds6_tcp_tc_count;
> goto out;
> + }
>
> list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
> struct sock *sk = tc->t_sock->sk;
> struct inet_sock *inet = inet_sk(sk);
>
> + /* Only show connections in the caller's netns. */
> + if (!net_eq(rds_conn_net(tc->t_cpath->cp_conn), net))
> + continue;
> +
> tsinfo6.local_addr = sk->sk_v6_rcv_saddr;
> tsinfo6.local_port = inet->inet_sport;
> tsinfo6.peer_addr = sk->sk_v6_daddr;
> @@ -306,10 +322,11 @@ static void rds6_tcp_tc_info(struct socket *sock, unsigned int len,
> tsinfo6.last_seen_una = tc->t_last_seen_una;
>
> rds_info_copy(iter, &tsinfo6, sizeof(tsinfo6));
> + cnt++;
> }
>
> out:
> - lens->nr = rds6_tcp_tc_count;
> + lens->nr = cnt;
> lens->each = sizeof(tsinfo6);
>
> spin_unlock_irqrestore(&rds_tcp_tc_list_lock, flags);
>
> base-commit: 028ef9c96e96197026887c0f092424679298aae8
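
The filter-and-count pattern that the patch applies in each handler can be
sketched in userspace as follows. This is only an illustration, not kernel
code: struct entry, netns_id and copy_visible() are hypothetical names, an
integer stands in for struct net *, and the != comparison stands in for
!net_eq(). It models only the per-entry filtering and the reported count,
not the locking or the short-buffer path.

```c
#include <stddef.h>

/* Hypothetical stand-in for a list entry tagged with its owning netns. */
struct entry {
	int netns_id;		/* stands in for struct net * */
	unsigned int port;	/* example payload */
};

/*
 * Copy only the entries whose namespace matches the caller's and return
 * how many were copied, so the reported count never includes entries
 * from other namespaces.
 */
static size_t copy_visible(const struct entry *all, size_t n_all,
			   int caller_ns, struct entry *out, size_t out_cap)
{
	size_t cnt = 0;
	size_t i;

	for (i = 0; i < n_all; i++) {
		/* Skip entries that belong to another namespace. */
		if (all[i].netns_id != caller_ns)
			continue;
		/* Stop once the caller's buffer is full. */
		if (cnt == out_cap)
			break;
		out[cnt++] = all[i];
	}

	return cnt;
}
```

With three entries tagged {1, 2, 1}, a caller in namespace 1 gets two
entries back and a caller in a fresh namespace gets none, which matches
the before/after behaviour described in the reproducer.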