Re: [syzbot] KASAN: use-after-free Read in tcp_retransmit

linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [syzbot] KASAN: use-after-free Read in tcp_retransmit_timer (5)
       [not found]   ` <c389e47f-8f82-fd62-8c1d-d9481d2f71ff@I-love.SAKURA.ne.jp>
@ 2022-04-22 14:40     ` Tetsuo Handa
  2022-04-24  3:57       ` Tetsuo Handa
  0 siblings, 1 reply; 30+ messages in thread
From: Tetsuo Handa @ 2022-04-22 14:40 UTC (permalink / raw)
  To: Santosh Shilimkar, OFED mailing list
  Cc: syzbot, andrii, andriin, ast, daniel, davem, dsahern, edumazet,
	john.fastabend, kafai, kpsingh, kuba, kuznet, netdev,
	songliubraving, syzkaller-bugs, tpa, yhs, yoshfuji, bpf

Hello, RDS developers.

I was thinking that BPF program is relevant with the TCP/IPv6 socket triggering
use-after-free access. But disassembling syzkaller-generated BPF program concluded
that what "char program[2053]" is doing is not important
( https://lkml.kernel.org/r/d21e278f-a3ff-8603-f6ba-b51a8cddafa8@I-love.SAKURA.ne.jp ).

Then, I realized that TCP/IPv6 port 16385 (which the reproducer is accessing) is
used by kernel RDS server, which can explain
"It seems that a socket with sk->sk_net_refcnt=0 is created by unshare(CLONE_NEWNET)"
at https://lkml.kernel.org/r/fa445f0e-32b7-5e0d-9326-94bc5adba4c1@I-love.SAKURA.ne.jp
because the kernel RDS server starts during boot procedure.

------------------------------------------------------------
root@fuzz:~# unshare -n netstat -tanpe
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
tcp6       0      0 :::16385                :::*                    LISTEN      0          19627      -
------------------------------------------------------------

With the debug printk() patch shown below,

------------------------------------------------------------
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 0ec2f5906a27..20b3c42b4140 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -429,7 +429,8 @@ static void net_free(struct net *net)
 {
 	if (refcount_dec_and_test(&net->passive)) {
 		kfree(rcu_access_pointer(net->gen));
-		kmem_cache_free(net_cachep, net);
+		memset(net, POISON_FREE, sizeof(struct net));
+		//kmem_cache_free(net_cachep, net);
 	}
 }
 
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index 09cadd556d1e..5792fe3df8ac 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -146,10 +146,9 @@ int rds_tcp_accept_one(struct socket *sock)
 	my_addr = &saddr;
 	peer_addr = &daddr;
 #endif
-	rdsdebug("accepted family %d tcp %pI6c:%u -> %pI6c:%u\n",
-		 sock->sk->sk_family,
-		 my_addr, ntohs(inet->inet_sport),
-		 peer_addr, ntohs(inet->inet_dport));
+	pr_info("accepted family %d tcp %pI6c:%u -> %pI6c:%u refcnt=%d sock_net=%px init_net=%px\n",
+		sock->sk->sk_family, my_addr, ntohs(inet->inet_sport), peer_addr,
+		ntohs(inet->inet_dport), sock->sk->sk_net_refcnt, sock_net(sock->sk), &init_net);
 
 #if IS_ENABLED(CONFIG_IPV6)
 	/* sk_bound_dev_if is not set if the peer address is not link local
------------------------------------------------------------

I get

    accepted family 10 tcp ::ffff:127.0.0.1:16385 -> ::ffff:127.0.0.1:33086 refcnt=0 sock_net=ffffffff860d89c0 init_net=ffffffff860d89c0

if I do

    # echo > /dev/tcp/127.0.0.1/16385

 from init_net namespace, and I get

    accepted family 10 tcp ::ffff:127.0.0.1:16385 -> ::ffff:127.0.0.1:33088 refcnt=0 sock_net=ffff88810a208000 init_net=ffffffff860d89c0

if I do

    # echo > /dev/tcp/127.0.0.1/16385

 from non-init_net namespace. Note that sock->sk->sk_net_refcnt is 0 in both cases.

Like commit 2303f994b3e18709 ("mptcp: Associate MPTCP context with TCP socket") says

    /* kernel sockets do not by default acquire net ref, but TCP timer
     * needs it.
     */

, I came to feel that e.g. rds_tcp_accept_one() is accessing sock_net(sock->sk) on
accepted sockets with sock->sk->sk_net_refcnt=0 (because the listening socket was
created by kernel) is causing this problem. Why not rds kernel server does

	sock->sk->sk_net_refcnt = 1;
	get_net_track(net, &sock->sk->ns_tracker, GFP_KERNEL);
	sock_inuse_add(net, 1);

on accepted sockets like mptcp_subflow_create_socket() does?

For your testing, below is the latest reproducer.
You can try this reproducer with keep-memory-poisoned patch shown above.

------------------------------------------------------------
// https://syzkaller.appspot.com/bug?id=8f0e04b2beffcd42f044d46879cc224f6eb71a99
// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <arpa/inet.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <net/if.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#ifndef MSG_PROBE
#define MSG_PROBE 0x10
#endif

struct nlmsg {
	char* pos;
	int nesting;
	struct nlattr* nested[8];
	char buf[4096];
};

static void netlink_init(struct nlmsg* nlmsg, int typ, int flags,
                         const void* data, int size)
{
	memset(nlmsg, 0, sizeof(*nlmsg));
	struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg->buf;
	hdr->nlmsg_type = typ;
	hdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | flags;
	memcpy(hdr + 1, data, size);
	nlmsg->pos = (char*)(hdr + 1) + NLMSG_ALIGN(size);
}

static void netlink_attr(struct nlmsg* nlmsg, int typ, const void* data,
                         int size)
{
	struct nlattr* attr = (struct nlattr*)nlmsg->pos;
	attr->nla_len = sizeof(*attr) + size;
	attr->nla_type = typ;
	if (size > 0)
		memcpy(attr + 1, data, size);
	nlmsg->pos += NLMSG_ALIGN(attr->nla_len);
}

static int netlink_send_ext(struct nlmsg* nlmsg, int sock, uint16_t reply_type,
                            int* reply_len, bool dofail)
{
	if (nlmsg->pos > nlmsg->buf + sizeof(nlmsg->buf) || nlmsg->nesting)
		exit(1);
	struct nlmsghdr* hdr = (struct nlmsghdr*)nlmsg->buf;
	hdr->nlmsg_len = nlmsg->pos - nlmsg->buf;
	struct sockaddr_nl addr;
	memset(&addr, 0, sizeof(addr));
	addr.nl_family = AF_NETLINK;
	ssize_t n = sendto(sock, nlmsg->buf, hdr->nlmsg_len, 0,
			   (struct sockaddr*)&addr, sizeof(addr));
	if (n != (ssize_t)hdr->nlmsg_len) {
		if (dofail)
			exit(1);
		return -1;
	}
	n = recv(sock, nlmsg->buf, sizeof(nlmsg->buf), 0);
	if (reply_len)
		*reply_len = 0;
	if (n < 0) {
		if (dofail)
			exit(1);
		return -1;
	}
	if (n < (ssize_t)sizeof(struct nlmsghdr)) {
		errno = EINVAL;
		if (dofail)
			exit(1);
		return -1;
	}
	if (hdr->nlmsg_type == NLMSG_DONE)
		return 0;
	if (reply_len && hdr->nlmsg_type == reply_type) {
		*reply_len = n;
		return 0;
	}
	if (n < (ssize_t)(sizeof(struct nlmsghdr) + sizeof(struct nlmsgerr))) {
		errno = EINVAL;
		if (dofail)
			exit(1);
		return -1;
	}
	if (hdr->nlmsg_type != NLMSG_ERROR) {
		errno = EINVAL;
		if (dofail)
			exit(1);
		return -1;
	}
	errno = -((struct nlmsgerr*)(hdr + 1))->error;
	return -errno;
}

static int netlink_send(struct nlmsg* nlmsg, int sock)
{
	return netlink_send_ext(nlmsg, sock, 0, NULL, true);
}

static void netlink_device_change(int sock, const char* name, const void* mac, int macsize)
{
	struct nlmsg nlmsg;
	struct ifinfomsg hdr;
	memset(&hdr, 0, sizeof(hdr));
	hdr.ifi_flags = hdr.ifi_change = IFF_UP;
	hdr.ifi_index = if_nametoindex(name);
	netlink_init(&nlmsg, RTM_NEWLINK, 0, &hdr, sizeof(hdr));
	netlink_attr(&nlmsg, IFLA_ADDRESS, mac, macsize);
	netlink_send(&nlmsg, sock);
}

static void netlink_add_addr(int sock, const char* dev, const void* addr, int addrsize)
{
	struct nlmsg nlmsg;
	struct ifaddrmsg hdr;
	memset(&hdr, 0, sizeof(hdr));
	hdr.ifa_family = addrsize == 4 ? AF_INET : AF_INET6;
	hdr.ifa_prefixlen = addrsize == 4 ? 24 : 120;
	hdr.ifa_scope = RT_SCOPE_UNIVERSE;
	hdr.ifa_index = if_nametoindex(dev);
	netlink_init(&nlmsg, RTM_NEWADDR, NLM_F_CREATE | NLM_F_REPLACE, &hdr,
		     sizeof(hdr));
	netlink_attr(&nlmsg, IFA_LOCAL, addr, addrsize);
	netlink_attr(&nlmsg, IFA_ADDRESS, addr, addrsize);
	netlink_send(&nlmsg, sock);
}

static void netlink_add_addr4(int sock, const char* dev, const char* addr)
{
	struct in_addr in_addr;
	inet_pton(AF_INET, addr, &in_addr);
	netlink_add_addr(sock, dev, &in_addr, sizeof(in_addr));
}

static void netlink_add_addr6(int sock, const char* dev, const char* addr)
{
	struct in6_addr in6_addr;
	inet_pton(AF_INET6, addr, &in6_addr);
	netlink_add_addr(sock, dev, &in6_addr, sizeof(in6_addr));
}

static void initialize_netdevices(void)
{
	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	uint64_t macaddr = 0x00aaaaaaaaaa;
	if (fd == EOF)
		exit(1);
	netlink_add_addr4(fd, "lo", "127.0.0.1");
	netlink_add_addr6(fd, "lo", "::1");
	netlink_device_change(fd, "lo", &macaddr, ETH_ALEN);
	close(fd);
}

#ifndef __NR_bpf
#define __NR_bpf 321
#endif

static void execute_one(void)
{
	const union bpf_attr attr = {
		.prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
		.insn_cnt = 2,
		.insns = (unsigned long long) "\xb7\x00\x00\x00\x00\x00\x00\x00\x95\x00\x00\x00\x00\x00\x00\x00",
		.license = (unsigned long long) "GPL",
	};
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(0x4001), /* where kernel RDS TCPv6 socket is listening */
		.sin_addr.s_addr = inet_addr("127.0.0.1")
	};
	const struct msghdr msg = {
		.msg_name = &addr,
		.msg_namelen = sizeof(addr),
	};
	const int bpf_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, 72);
	const int sock_fd = socket(PF_INET, SOCK_STREAM, 0);
	alarm(3);
	while (1) {
		sendmsg(sock_fd, &msg, MSG_OOB | MSG_PROBE | MSG_CONFIRM | MSG_FASTOPEN);
		setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_BPF, &bpf_fd, sizeof(bpf_fd));
	}
}

int main(int argc, char *argv[])
{
	if (unshare(CLONE_NEWNET))
		return 1;
	initialize_netdevices();
	execute_one();
	return 0;
}
------------------------------------------------------------


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [syzbot] KASAN: use-after-free Read in tcp_retransmit_timer (5)
  2022-04-22 14:40     ` [syzbot] KASAN: use-after-free Read in tcp_retransmit_timer (5) Tetsuo Handa
@ 2022-04-24  3:57       ` Tetsuo Handa
  2022-05-01 15:29         ` [PATCH] net: rds: acquire refcount on TCP sockets Tetsuo Handa
  0 siblings, 1 reply; 30+ messages in thread
From: Tetsuo Handa @ 2022-04-24  3:57 UTC (permalink / raw)
  To: Santosh Shilimkar, OFED mailing list
  Cc: syzbot, andrii, andriin, ast, daniel, davem, dsahern, edumazet,
	john.fastabend, kafai, kpsingh, kuba, kuznet, netdev,
	songliubraving, syzkaller-bugs, tpa, yhs, yoshfuji, bpf

OK. I succeeded to reproduce this problem without BPF program.
Just dropping TCP packets is sufficient. That is, this bug should be fixed in RDS code.

------------------------------------------------------------
root@fuzz:~# unshare -n sh -c '
ip link set lo up
iptables -A OUTPUT -p tcp --sport 16385 --tcp-flags SYN NONE -m state --state ESTABLISHED,RELATED -j DROP
ip6tables -A OUTPUT -p tcp --sport 16385 --tcp-flags SYN NONE -m state --state ESTABLISHED,RELATED -j DROP
telnet 127.0.0.1 16385
dmesg -c
netstat -tanpe' < /dev/null
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
Connection closed by foreign host.
[   54.922280] accepted family 10 tcp ::ffff:127.0.0.1:16385 -> ::ffff:127.0.0.1:58780 refcnt=0 sock_net=ffff888035c98000 init_net=ffffffff860d89c0
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name
tcp        0      1 127.0.0.1:58780         127.0.0.1:16385         FIN_WAIT1   0          0          -
tcp6       0      0 :::16385                :::*                    LISTEN      0          18301      -
tcp6       1      1 127.0.0.1:16385         127.0.0.1:58780         LAST_ACK    0          0          -
------------------------------------------------------------

------------------------------------------------------------
fuzz login: [   54.849128][ T2718] ip (2718) used greatest stack depth: 11192 bytes left
[   54.922280][  T764] accepted family 10 tcp ::ffff:127.0.0.1:16385 -> ::ffff:127.0.0.1:58780 refcnt=0 sock_net=ffff888035c98000 init_net=ffffffff860d89c0
[  224.330990][    C0] general protection fault, probably for non-canonical address 0x6b6af3ebe92b6bc3: 0000 [#1] PREEMPT SMP
[  224.344491][    C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.0-rc3-00016-gb253435746d9-dirty #767
[  224.355974][    C0] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  224.361184][    C0] RIP: 0010:__tcp_transmit_skb+0x5e5/0xbf0
[  224.364559][    C0] Code: 0f 84 33 05 00 00 4c 89 2c 24 49 89 c5 48 c7 40 10 00 00 00 00 e9 c0 fa ff ff 49 8b 46 30 41 0f b7 55 30 48 8b 80 b8 02 00 00 <65> 48 01 50 58 e9 8e fe ff ff 41 8b 86 fc 08 00 00 48 69 c0 e8 03
[  224.375318][    C0] RSP: 0018:ffffc90000003d38 EFLAGS: 00010297
[  224.378682][    C0] RAX: 6b6b6b6b6b6b6b6b RBX: 000000009e2a2659 RCX: ffff888104a39000
[  224.383253][    C0] RDX: 0000000000000001 RSI: ffff8881008054e0 RDI: ffff888035340000
[  224.387171][    C0] RBP: ffff888100805508 R08: 0000000000000000 R09: 0000000000000000
[  224.389612][    C0] R10: ffff888104a39140 R11: 0000000000000000 R12: 0000000000000001
[  224.392646][    C0] R13: ffff8881008054e0 R14: ffff888035340000 R15: 0000000000000020
[  224.395626][    C0] FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[  224.398662][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  224.400880][    C0] CR2: 000056264812f99c CR3: 000000000a58e000 CR4: 00000000000506f0
[  224.403964][    C0] Call Trace:
[  224.405212][    C0]  <IRQ>
[  224.406355][    C0]  ? tcp_write_timer_handler+0x280/0x280
[  224.408259][    C0]  tcp_write_wakeup+0x112/0x160
[  224.409932][    C0]  ? ktime_get+0x1cb/0x260
[  224.411636][    C0]  tcp_send_probe0+0x13/0x150
[  224.413393][    C0]  tcp_write_timer_handler+0x248/0x280
[  224.415433][    C0]  tcp_write_timer+0xa5/0x110
[  224.417040][    C0]  ? tcp_write_timer_handler+0x280/0x280
[  224.419142][    C0]  call_timer_fn+0xa6/0x300
[  224.420949][    C0]  __run_timers.part.0+0x209/0x320
[  224.422915][    C0]  run_timer_softirq+0x2c/0x60
[  224.424791][    C0]  __do_softirq+0x174/0x53f
[  224.426462][    C0]  __irq_exit_rcu+0xcb/0x120
[  224.428188][    C0]  irq_exit_rcu+0x5/0x20
[  224.430176][    C0]  sysvec_apic_timer_interrupt+0x8e/0xc0
[  224.432301][    C0]  </IRQ>
[  224.433394][    C0]  <TASK>
[  224.434514][    C0]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[  224.436500][    C0] RIP: 0010:default_idle+0xb/0x10
[  224.438220][    C0] Code: 8b 04 25 40 af 01 00 f0 80 60 02 df c3 0f ae f0 0f ae 38 0f ae f0 eb b9 0f 1f 80 00 00 00 00 eb 07 0f 00 2d e3 b6 56 00 fb f4 <c3> cc cc cc cc 53 48 89 fb e8 67 fb fe ff 48 8b 15 a0 91 4e 02 89
[  224.444865][    C0] RSP: 0018:ffffffff83e03ea8 EFLAGS: 00000202
[  224.447077][    C0] RAX: 00000000000223b5 RBX: ffffffff83e61a00 RCX: 0000000000000001
[  224.449957][    C0] RDX: 0000000000000000 RSI: ffffffff832e9bf1 RDI: ffffffff83246666
[  224.452916][    C0] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[  224.455677][    C0] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  224.458458][    C0] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  224.461642][    C0]  default_idle_call+0x54/0x90
[  224.463888][    C0]  do_idle+0x1f3/0x240
[  224.465531][    C0]  cpu_startup_entry+0x14/0x20
[  224.467193][    C0]  start_kernel+0x69c/0x6c1
[  224.469040][    C0]  secondary_startup_64_no_verify+0xc3/0xcb
[  224.471179][    C0]  </TASK>
[  224.472438][    C0] Modules linked in:
[  224.474387][    C0] ---[ end trace 0000000000000000 ]---
[  224.476521][    C0] RIP: 0010:__tcp_transmit_skb+0x5e5/0xbf0
[  224.478893][    C0] Code: 0f 84 33 05 00 00 4c 89 2c 24 49 89 c5 48 c7 40 10 00 00 00 00 e9 c0 fa ff ff 49 8b 46 30 41 0f b7 55 30 48 8b 80 b8 02 00 00 <65> 48 01 50 58 e9 8e fe ff ff 41 8b 86 fc 08 00 00 48 69 c0 e8 03
[  224.485948][    C0] RSP: 0018:ffffc90000003d38 EFLAGS: 00010297
[  224.488110][    C0] RAX: 6b6b6b6b6b6b6b6b RBX: 000000009e2a2659 RCX: ffff888104a39000
[  224.491186][    C0] RDX: 0000000000000001 RSI: ffff8881008054e0 RDI: ffff888035340000
[  224.494378][    C0] RBP: ffff888100805508 R08: 0000000000000000 R09: 0000000000000000
[  224.497576][    C0] R10: ffff888104a39140 R11: 0000000000000000 R12: 0000000000000001
[  224.500600][    C0] R13: ffff8881008054e0 R14: ffff888035340000 R15: 0000000000000020
[  224.503814][    C0] FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[  224.507136][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  224.509421][    C0] CR2: 000056264812f99c CR3: 000000000a58e000 CR4: 00000000000506f0
[  224.512699][    C0] Kernel panic - not syncing: Fatal exception in interrupt
[  224.515847][    C0] Kernel Offset: disabled
[  224.517636][    C0] Rebooting in 10 seconds..
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] net: rds: acquire refcount on TCP sockets
  2022-04-24  3:57       ` Tetsuo Handa
@ 2022-05-01 15:29         ` Tetsuo Handa
  2022-05-01 16:14           ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: Tetsuo Handa @ 2022-05-01 15:29 UTC (permalink / raw)
  To: Santosh Shilimkar, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: syzbot, netdev, syzkaller-bugs, OFED mailing list

syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
for TCP socket used by RDS is accessing sock_net() without acquiring a
refcount on net namespace. Since TCP's retransmission can happen after
a process which created net namespace terminated, we need to explicitly
acquire a refcount.

Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
---
 net/rds/tcp.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 5327d130c4b5..8015d2695784 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -493,6 +493,15 @@ void rds_tcp_tune(struct socket *sock)
 	struct net *net = sock_net(sk);
 	struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
 
+	/* TCP timer functions might access net namespace even after
+	 * a process which created this net namespace terminated.
+	 */
+	if (!sk->sk_net_refcnt) {
+		sk->sk_net_refcnt = 1;
+		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
+		sock_inuse_add(net, 1);
+	}
+
 	tcp_sock_set_nodelay(sock->sk);
 	lock_sock(sk);
 	if (rtn->sndbuf_size > 0) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: rds: acquire refcount on TCP sockets
  2022-05-01 15:29         ` [PATCH] net: rds: acquire refcount on TCP sockets Tetsuo Handa
@ 2022-05-01 16:14           ` Eric Dumazet
  2022-05-02  1:40             ` [PATCH v2] " Tetsuo Handa
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2022-05-01 16:14 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Santosh Shilimkar, David S. Miller, Jakub Kicinski, Paolo Abeni,
	syzbot, netdev, syzkaller-bugs, OFED mailing list

On Sun, May 1, 2022 at 8:29 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> for TCP socket used by RDS is accessing sock_net() without acquiring a
> refcount on net namespace. Since TCP's retransmission can happen after
> a process which created net namespace terminated, we need to explicitly
> acquire a refcount.
>

Please add a Fixes: tag

> Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> ---
>  net/rds/tcp.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 5327d130c4b5..8015d2695784 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -493,6 +493,15 @@ void rds_tcp_tune(struct socket *sock)
>         struct net *net = sock_net(sk);
>         struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
>
> +       /* TCP timer functions might access net namespace even after
> +        * a process which created this net namespace terminated.
> +        */

Please move this after the lock_sock(sk) [1], so that we are protected
correctly ?

> +       if (!sk->sk_net_refcnt) {
> +               sk->sk_net_refcnt = 1;
> +               get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> +               sock_inuse_add(net, 1);
> +       }
> +
>         tcp_sock_set_nodelay(sock->sk);

>         lock_sock(sk);

[1] Here.

>         if (rtn->sndbuf_size > 0) {
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-01 16:14           ` Eric Dumazet
@ 2022-05-02  1:40             ` Tetsuo Handa
  2022-05-02 14:12               ` Haakon Bugge
                                 ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Tetsuo Handa @ 2022-05-02  1:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Santosh Shilimkar, David S. Miller, Jakub Kicinski, Paolo Abeni,
	syzbot, netdev, syzkaller-bugs, OFED mailing list

syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
for TCP socket used by RDS is accessing sock_net() without acquiring a
refcount on net namespace. Since TCP's retransmission can happen after
a process which created net namespace terminated, we need to explicitly
acquire a refcount.

Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
---
Changes in v2:
  Add Fixes: tag.
  Move to inside lock_sock() section.

I chose 26abe14379f8e2fa and 8a68173691f03661 which went to 4.2 for Fixes: tag,
for refcount was implicitly taken when 70041088e3b97662 ("RDS: Add TCP transport
to RDS") was added to 2.6.32.

 net/rds/tcp.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 5327d130c4b5..2f638f8b7b1e 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -495,6 +495,14 @@ void rds_tcp_tune(struct socket *sock)
 
 	tcp_sock_set_nodelay(sock->sk);
 	lock_sock(sk);
+	/* TCP timer functions might access net namespace even after
+	 * a process which created this net namespace terminated.
+	 */
+	if (!sk->sk_net_refcnt) {
+		sk->sk_net_refcnt = 1;
+		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
+		sock_inuse_add(net, 1);
+	}
 	if (rtn->sndbuf_size > 0) {
 		sk->sk_sndbuf = rtn->sndbuf_size;
 		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-02  1:40             ` [PATCH v2] " Tetsuo Handa
@ 2022-05-02 14:12               ` Haakon Bugge
  2022-05-02 14:29                 ` Tetsuo Handa
  2022-05-03  9:02               ` Paolo Abeni
  2022-05-03 11:40               ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 30+ messages in thread
From: Haakon Bugge @ 2022-05-02 14:12 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Eric Dumazet, Santosh Shilimkar, David S. Miller, Jakub Kicinski,
	Paolo Abeni, syzbot, netdev, syzkaller-bugs, OFED mailing list



> On 2 May 2022, at 03:40, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
> 
> syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> for TCP socket used by RDS is accessing sock_net() without acquiring a
> refcount on net namespace. Since TCP's retransmission can happen after
> a process which created net namespace terminated, we need to explicitly
> acquire a refcount.
> 
> Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
> Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> ---
> Changes in v2:
>  Add Fixes: tag.
>  Move to inside lock_sock() section.
> 
> I chose 26abe14379f8e2fa and 8a68173691f03661 which went to 4.2 for Fixes: tag,
> for refcount was implicitly taken when 70041088e3b97662 ("RDS: Add TCP transport
> to RDS") was added to 2.6.32.
> 
> net/rds/tcp.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
> 
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 5327d130c4b5..2f638f8b7b1e 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -495,6 +495,14 @@ void rds_tcp_tune(struct socket *sock)
> 
> 	tcp_sock_set_nodelay(sock->sk);
> 	lock_sock(sk);
> +	/* TCP timer functions might access net namespace even after
> +	 * a process which created this net namespace terminated.
> +	 */
> +	if (!sk->sk_net_refcnt) {
> +		sk->sk_net_refcnt = 1;
> +		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);

Don't you need a corresponding put_net_track()?


Thxs, Håkon


> +		sock_inuse_add(net, 1);
> +	}
> 	if (rtn->sndbuf_size > 0) {
> 		sk->sk_sndbuf = rtn->sndbuf_size;
> 		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
> -- 
> 2.34.1
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-02 14:12               ` Haakon Bugge
@ 2022-05-02 14:29                 ` Tetsuo Handa
  0 siblings, 0 replies; 30+ messages in thread
From: Tetsuo Handa @ 2022-05-02 14:29 UTC (permalink / raw)
  To: Haakon Bugge
  Cc: Eric Dumazet, Santosh Shilimkar, David S. Miller, Jakub Kicinski,
	Paolo Abeni, syzbot, netdev, syzkaller-bugs, OFED mailing list

On 2022/05/02 23:12, Haakon Bugge wrote:
>> +	/* TCP timer functions might access net namespace even after
>> +	 * a process which created this net namespace terminated.
>> +	 */
>> +	if (!sk->sk_net_refcnt) {
>> +		sk->sk_net_refcnt = 1;
>> +		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> 
> Don't you need a corresponding put_net_track()?

__sk_free() and __sk_destruct() will do if sk->sk_net_refcnt is set.

> 
>> +		sock_inuse_add(net, 1);
>> +	}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-02  1:40             ` [PATCH v2] " Tetsuo Handa
  2022-05-02 14:12               ` Haakon Bugge
@ 2022-05-03  9:02               ` Paolo Abeni
  2022-05-03  9:56                 ` Tetsuo Handa
                                   ` (2 more replies)
  2022-05-03 11:40               ` patchwork-bot+netdevbpf
  2 siblings, 3 replies; 30+ messages in thread
From: Paolo Abeni @ 2022-05-03  9:02 UTC (permalink / raw)
  To: Tetsuo Handa, Eric Dumazet
  Cc: Santosh Shilimkar, David S. Miller, Jakub Kicinski, syzbot,
	netdev, syzkaller-bugs, OFED mailing list

Hello,

On Mon, 2022-05-02 at 10:40 +0900, Tetsuo Handa wrote:
> syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> for TCP socket used by RDS is accessing sock_net() without acquiring a
> refcount on net namespace. Since TCP's retransmission can happen after
> a process which created net namespace terminated, we need to explicitly
> acquire a refcount.
> 
> Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
> Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> ---
> Changes in v2:
>   Add Fixes: tag.
>   Move to inside lock_sock() section.
> 
> I chose 26abe14379f8e2fa and 8a68173691f03661 which went to 4.2 for Fixes: tag,
> for refcount was implicitly taken when 70041088e3b97662 ("RDS: Add TCP transport
> to RDS") was added to 2.6.32.
> 
>  net/rds/tcp.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 5327d130c4b5..2f638f8b7b1e 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -495,6 +495,14 @@ void rds_tcp_tune(struct socket *sock)
>  
>  	tcp_sock_set_nodelay(sock->sk);
>  	lock_sock(sk);
> +	/* TCP timer functions might access net namespace even after
> +	 * a process which created this net namespace terminated.
> +	 */
> +	if (!sk->sk_net_refcnt) {
> +		sk->sk_net_refcnt = 1;
> +		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> +		sock_inuse_add(net, 1);
> +	}
>  	if (rtn->sndbuf_size > 0) {
>  		sk->sk_sndbuf = rtn->sndbuf_size;
>  		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;

This looks equivalent to the fix presented here:

https://lore.kernel.org/all/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/

but the latter looks a more generic solution. @Tetsuo could you please
test the above in your setup?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-03  9:02               ` Paolo Abeni
@ 2022-05-03  9:56                 ` Tetsuo Handa
  2022-05-03 11:10                   ` Paolo Abeni
  2022-05-03 13:27                 ` David Laight
  2022-05-03 13:45                 ` Eric Dumazet
  2 siblings, 1 reply; 30+ messages in thread
From: Tetsuo Handa @ 2022-05-03  9:56 UTC (permalink / raw)
  To: Paolo Abeni, Eric Dumazet
  Cc: Santosh Shilimkar, David S. Miller, Jakub Kicinski, syzbot,
	netdev, syzkaller-bugs, OFED mailing list

On 2022/05/03 18:02, Paolo Abeni wrote:
> This looks equivalent to the fix presented here:

Not equivalent.

> 
> https://lore.kernel.org/all/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
> 
> but the latter looks a more generic solution. @Tetsuo could you please
> test the above in your setup?

I already tested that fix, and the result was
https://lore.kernel.org/all/78cdbf25-4511-a567-bb09-0c07edae8b50@I-love.SAKURA.ne.jp/ .

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-03  9:56                 ` Tetsuo Handa
@ 2022-05-03 11:10                   ` Paolo Abeni
  0 siblings, 0 replies; 30+ messages in thread
From: Paolo Abeni @ 2022-05-03 11:10 UTC (permalink / raw)
  To: Tetsuo Handa, Eric Dumazet
  Cc: Santosh Shilimkar, David S. Miller, Jakub Kicinski, syzbot,
	netdev, syzkaller-bugs, OFED mailing list

On Tue, 2022-05-03 at 18:56 +0900, Tetsuo Handa wrote:
> On 2022/05/03 18:02, Paolo Abeni wrote:
> 
> > https://lore.kernel.org/all/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
> > 
> > but the latter looks a more generic solution. @Tetsuo could you please
> > test the above in your setup?
> 
> I already tested that fix, and the result was
> https://lore.kernel.org/all/78cdbf25-4511-a567-bb09-0c07edae8b50@I-love.SAKURA.ne.jp/ .

Thanks, I somewhat missed that reply.

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-03  9:02               ` Paolo Abeni
  2022-05-03  9:56                 ` Tetsuo Handa
@ 2022-05-03 13:27                 ` David Laight
  2022-05-03 13:43                   ` Eric Dumazet
  2022-05-03 13:45                 ` Eric Dumazet
  2 siblings, 1 reply; 30+ messages in thread
From: David Laight @ 2022-05-03 13:27 UTC (permalink / raw)
  To: 'Paolo Abeni', Tetsuo Handa, Eric Dumazet
  Cc: Santosh Shilimkar, David S. Miller, Jakub Kicinski, syzbot,
	netdev, syzkaller-bugs, OFED mailing list

From: Paolo Abeni
> Sent: 03 May 2022 10:03
> 
> Hello,
> 
> On Mon, 2022-05-02 at 10:40 +0900, Tetsuo Handa wrote:
> > syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> > for TCP socket used by RDS is accessing sock_net() without acquiring a
> > refcount on net namespace. Since TCP's retransmission can happen after
> > a process which created net namespace terminated, we need to explicitly
> > acquire a refcount.
> >
> > Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> > Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
> > Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a
> kernel socket")
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > ---
> > Changes in v2:
> >   Add Fixes: tag.
> >   Move to inside lock_sock() section.
> >
> > I chose 26abe14379f8e2fa and 8a68173691f03661 which went to 4.2 for Fixes: tag,
> > for refcount was implicitly taken when 70041088e3b97662 ("RDS: Add TCP transport
> > to RDS") was added to 2.6.32.
> >
> >  net/rds/tcp.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> > index 5327d130c4b5..2f638f8b7b1e 100644
> > --- a/net/rds/tcp.c
> > +++ b/net/rds/tcp.c
> > @@ -495,6 +495,14 @@ void rds_tcp_tune(struct socket *sock)
> >
> >  	tcp_sock_set_nodelay(sock->sk);
> >  	lock_sock(sk);
> > +	/* TCP timer functions might access net namespace even after
> > +	 * a process which created this net namespace terminated.
> > +	 */
> > +	if (!sk->sk_net_refcnt) {
> > +		sk->sk_net_refcnt = 1;
> > +		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> > +		sock_inuse_add(net, 1);
> > +	}
> >  	if (rtn->sndbuf_size > 0) {
> >  		sk->sk_sndbuf = rtn->sndbuf_size;
> >  		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
> 
> This looks equivalent to the fix presented here:
> 
> https://lore.kernel.org/all/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
> 
> but the latter looks a more generic solution. @Tetsuo could you please
> test the above in your setup?

Wouldn't a more generic solution be to add a flag to sock_create_kern()
so that it acquires a reference to the namespace?
This could be a bit on one of the existing parameters - like SOCK_NONBLOCK.

I've a driver that uses __sock_create() in order to get that reference.
I'm pretty sure the extra 'security' check will never fail.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-03 13:27                 ` David Laight
@ 2022-05-03 13:43                   ` Eric Dumazet
  2022-05-03 14:25                     ` David Laight
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2022-05-03 13:43 UTC (permalink / raw)
  To: David Laight
  Cc: Paolo Abeni, Tetsuo Handa, Santosh Shilimkar, David S. Miller,
	Jakub Kicinski, syzbot, netdev, syzkaller-bugs, OFED mailing list

On Tue, May 3, 2022 at 6:27 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Paolo Abeni
> > Sent: 03 May 2022 10:03
> >
> > Hello,
> >
> > On Mon, 2022-05-02 at 10:40 +0900, Tetsuo Handa wrote:
> > > syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> > > for TCP socket used by RDS is accessing sock_net() without acquiring a
> > > refcount on net namespace. Since TCP's retransmission can happen after
> > > a process which created net namespace terminated, we need to explicitly
> > > acquire a refcount.
> > >
> > > Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> > > Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > > Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
> > > Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a
> > kernel socket")
> > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > > ---
> > > Changes in v2:
> > >   Add Fixes: tag.
> > >   Move to inside lock_sock() section.
> > >
> > > I chose 26abe14379f8e2fa and 8a68173691f03661 which went to 4.2 for Fixes: tag,
> > > for refcount was implicitly taken when 70041088e3b97662 ("RDS: Add TCP transport
> > > to RDS") was added to 2.6.32.
> > >
> > >  net/rds/tcp.c | 8 ++++++++
> > >  1 file changed, 8 insertions(+)
> > >
> > > diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> > > index 5327d130c4b5..2f638f8b7b1e 100644
> > > --- a/net/rds/tcp.c
> > > +++ b/net/rds/tcp.c
> > > @@ -495,6 +495,14 @@ void rds_tcp_tune(struct socket *sock)
> > >
> > >     tcp_sock_set_nodelay(sock->sk);
> > >     lock_sock(sk);
> > > +   /* TCP timer functions might access net namespace even after
> > > +    * a process which created this net namespace terminated.
> > > +    */
> > > +   if (!sk->sk_net_refcnt) {
> > > +           sk->sk_net_refcnt = 1;
> > > +           get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> > > +           sock_inuse_add(net, 1);
> > > +   }
> > >     if (rtn->sndbuf_size > 0) {
> > >             sk->sk_sndbuf = rtn->sndbuf_size;
> > >             sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
> >
> > This looks equivalent to the fix presented here:
> >
> > https://lore.kernel.org/all/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
> >
> > but the latter looks a more generic solution. @Tetsuo could you please
> > test the above in your setup?
>
> Wouldn't a more generic solution be to add a flag to sock_create_kern()
> so that it acquires a reference to the namespace?
> This could be a bit on one of the existing parameters - like SOCK_NONBLOCK.
>
> I've a driver that uses __sock_create() in order to get that reference.
> I'm pretty sure the extra 'security' check will never fail.
>

This would be silly really.

Definition of a 'kernel socket' is that it does not hold a reference
to the namespace.
(otherwise a netns could not be destroyed by user space)

A kernel layer using kernel sockets needs to properly dismantle them
when a namespace is destroyed.

In the RDS case, the socket was a user socket, or RDS lacked proper
tracking of all the sockets
so that they can be dismantled properly.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-03 13:43                   ` Eric Dumazet
@ 2022-05-03 14:25                     ` David Laight
  0 siblings, 0 replies; 30+ messages in thread
From: David Laight @ 2022-05-03 14:25 UTC (permalink / raw)
  To: 'Eric Dumazet'
  Cc: Paolo Abeni, Tetsuo Handa, Santosh Shilimkar, David S. Miller,
	Jakub Kicinski, syzbot, netdev, syzkaller-bugs, OFED mailing list

From: Eric Dumazet
> Sent: 03 May 2022 14:43
> 
> On Tue, May 3, 2022 at 6:27 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Paolo Abeni
> > > Sent: 03 May 2022 10:03
> > >
> > > Hello,
> > >
> > > On Mon, 2022-05-02 at 10:40 +0900, Tetsuo Handa wrote:
> > > > syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> > > > for TCP socket used by RDS is accessing sock_net() without acquiring a
> > > > refcount on net namespace. Since TCP's retransmission can happen after
> > > > a process which created net namespace terminated, we need to explicitly
> > > > acquire a refcount.
> > > >
> > > > Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> > > > Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > > > Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel
> sockets.")
> > > > Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a
> > > kernel socket")
> > > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > > > ---
> > > > Changes in v2:
> > > >   Add Fixes: tag.
> > > >   Move to inside lock_sock() section.
> > > >
> > > > I chose 26abe14379f8e2fa and 8a68173691f03661 which went to 4.2 for Fixes: tag,
> > > > for refcount was implicitly taken when 70041088e3b97662 ("RDS: Add TCP transport
> > > > to RDS") was added to 2.6.32.
> > > >
> > > >  net/rds/tcp.c | 8 ++++++++
> > > >  1 file changed, 8 insertions(+)
> > > >
> > > > diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> > > > index 5327d130c4b5..2f638f8b7b1e 100644
> > > > --- a/net/rds/tcp.c
> > > > +++ b/net/rds/tcp.c
> > > > @@ -495,6 +495,14 @@ void rds_tcp_tune(struct socket *sock)
> > > >
> > > >     tcp_sock_set_nodelay(sock->sk);
> > > >     lock_sock(sk);
> > > > +   /* TCP timer functions might access net namespace even after
> > > > +    * a process which created this net namespace terminated.
> > > > +    */
> > > > +   if (!sk->sk_net_refcnt) {
> > > > +           sk->sk_net_refcnt = 1;
> > > > +           get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> > > > +           sock_inuse_add(net, 1);
> > > > +   }
> > > >     if (rtn->sndbuf_size > 0) {
> > > >             sk->sk_sndbuf = rtn->sndbuf_size;
> > > >             sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
> > >
> > > This looks equivalent to the fix presented here:
> > >
> > > https://lore.kernel.org/all/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/
> > >
> > > but the latter looks a more generic solution. @Tetsuo could you please
> > > test the above in your setup?
> >
> > Wouldn't a more generic solution be to add a flag to sock_create_kern()
> > so that it acquires a reference to the namespace?
> > This could be a bit on one of the existing parameters - like SOCK_NONBLOCK.
> >
> > I've a driver that uses __sock_create() in order to get that reference.
> > I'm pretty sure the extra 'security' check will never fail.
> >
> 
> This would be silly really.
> 
> Definition of a 'kernel socket' is that it does not hold a reference
> to the namespace.
> (otherwise a netns could not be destroyed by user space)
> 
> A kernel layer using kernel sockets needs to properly dismantle them
> when a namespace is destroyed.

I think it depends on why the driver is using a socket.

If the driver is a 'user' of a TCP connection that happens to
be is a kernel driver then holding the a reference to the namespace
is no different to an application socket holding a reference.
An example might be nfs/tcp - you need to unmount the filesystem
before you can delete the namespace.

OTOH if part of a protocol stack is using a socket for internal
calls (I think I've seen routing sockets used that way) then the
presence of the socket probably shouldn't stop the namespace
being deleted.

Listening sockets are a slight problem - probably for userspace as well.
It would be nicer to be able to get TCP (etc) to error out listening
sockets if they are the only thing stopping a namespace being deleted.

> In the RDS case, the socket was a user socket, or RDS lacked proper
> tracking of all the sockets
> so that they can be dismantled properly.

I think they probably are sockets created in order act on requests
from applications.
I think they should have the same effect on namespaces as a direct
user socket - you can't delete the socket while the connection is
active.
Kill all the relevant processes, tell the driver to stop, and you
can delete the namespace.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-03  9:02               ` Paolo Abeni
  2022-05-03  9:56                 ` Tetsuo Handa
  2022-05-03 13:27                 ` David Laight
@ 2022-05-03 13:45                 ` Eric Dumazet
  2022-05-03 14:08                   ` Tetsuo Handa
  2 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2022-05-03 13:45 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Tetsuo Handa, Santosh Shilimkar, David S. Miller, Jakub Kicinski,
	syzbot, netdev, syzkaller-bugs, OFED mailing list

On Tue, May 3, 2022 at 2:02 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hello,
>
> On Mon, 2022-05-02 at 10:40 +0900, Tetsuo Handa wrote:
> > syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> > for TCP socket used by RDS is accessing sock_net() without acquiring a
> > refcount on net namespace. Since TCP's retransmission can happen after
> > a process which created net namespace terminated, we need to explicitly
> > acquire a refcount.
> >
> > Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> > Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
> > Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > ---
> > Changes in v2:
> >   Add Fixes: tag.
> >   Move to inside lock_sock() section.
> >
> > I chose 26abe14379f8e2fa and 8a68173691f03661 which went to 4.2 for Fixes: tag,
> > for refcount was implicitly taken when 70041088e3b97662 ("RDS: Add TCP transport
> > to RDS") was added to 2.6.32.
> >
> >  net/rds/tcp.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> > index 5327d130c4b5..2f638f8b7b1e 100644
> > --- a/net/rds/tcp.c
> > +++ b/net/rds/tcp.c
> > @@ -495,6 +495,14 @@ void rds_tcp_tune(struct socket *sock)
> >
> >       tcp_sock_set_nodelay(sock->sk);
> >       lock_sock(sk);
> > +     /* TCP timer functions might access net namespace even after
> > +      * a process which created this net namespace terminated.
> > +      */
> > +     if (!sk->sk_net_refcnt) {
> > +             sk->sk_net_refcnt = 1;
> > +             get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
> > +             sock_inuse_add(net, 1);
> > +     }
> >       if (rtn->sndbuf_size > 0) {
> >               sk->sk_sndbuf = rtn->sndbuf_size;
> >               sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
>
> This looks equivalent to the fix presented here:
>
> https://lore.kernel.org/all/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/

I think this is still needed for layers (NFS ?) that dismantle their
TCP sockets whenever a netns
is dismantled. But RDS case was different, only the listener is a kernel socket.

>
> but the latter looks a more generic solution. @Tetsuo could you please
> test the above in your setup?
>
> Thanks!
>
> Paolo
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-03 13:45                 ` Eric Dumazet
@ 2022-05-03 14:08                   ` Tetsuo Handa
  0 siblings, 0 replies; 30+ messages in thread
From: Tetsuo Handa @ 2022-05-03 14:08 UTC (permalink / raw)
  To: Eric Dumazet, Paolo Abeni
  Cc: Santosh Shilimkar, David S. Miller, Jakub Kicinski, syzbot,
	netdev, syzkaller-bugs, OFED mailing list

On 2022/05/03 22:45, Eric Dumazet wrote:
>> This looks equivalent to the fix presented here:
>>
>> https://lore.kernel.org/all/CANn89i+484ffqb93aQm1N-tjxxvb3WDKX0EbD7318RwRgsatjw@mail.gmail.com/

I retested the fix above using

unshare -n sh -c '
ip link set lo up
iptables -A OUTPUT -p tcp --sport 16385 --tcp-flags SYN NONE -m state --state ESTABLISHED,RELATED -j DROP
ip6tables -A OUTPUT -p tcp --sport 16385 --tcp-flags SYN NONE -m state --state ESTABLISHED,RELATED -j DROP
telnet 127.0.0.1 16385
dmesg -c
netstat -tanpe' < /dev/null

as a test case, but it seems racy; sometimes timer function is called again and crashes.

[  426.086565][    C2] general protection fault, probably for non-canonical address 0x6b6af3ebcc3b6bc3: 0000 [#1] PREEMPT SMP KASAN
[  426.096339][    C2] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.18.0-rc5-dirty #807
[  426.103769][    C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  426.111851][    C2] RIP: 0010:__tcp_transmit_skb+0xe72/0x1b80
[  426.117512][    C2] Code: e8 b3 ea dc fd 48 8d 7d 30 45 0f b7 77 30 e8 95 ec dc fd 48 8b 5d 30 48 8d bb b8 02 00 00 e8 85 ec dc fd 48 8b 83 b8 02 00 00 <65> 4c 01 70 58 e9 67 fd ff ff e8 ef 56 ac fd 48 8d bd d0 09 00 00
[  426.124692][    C2] RSP: 0018:ffff888060d09ac8 EFLAGS: 00010246
[  426.126845][    C2] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8880145c8000 RCX: ffffffff838cc28b
[  426.129616][    C2] RDX: dffffc0000000000 RSI: 0000000000000001 RDI: ffff8880145c82b8
[  426.132374][    C2] RBP: ffff8880129f8000 R08: 0000000000000000 R09: 0000000000000007
[  426.135077][    C2] R10: ffffffff838cbfd4 R11: 0000000000000001 R12: ffff8880129f8760
[  426.137793][    C2] R13: ffff88800f6e0118 R14: 0000000000000001 R15: ffff88800f6e00e8
[  426.140489][    C2] FS:  0000000000000000(0000) GS:ffff888060d00000(0000) knlGS:0000000000000000
[  426.143525][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  426.145792][    C2] CR2: 000055b5bb0adabc CR3: 000000000e003000 CR4: 00000000000506e0
[  426.148509][    C2] Call Trace:
[  426.149442][    C2]  <IRQ>
[  426.150183][    C2]  ? __tcp_select_window+0x710/0x710
[  426.151457][    C2]  ? __sanitizer_cov_trace_cmp4+0x1c/0x70
[  426.153007][    C2]  ? tcp_current_mss+0x165/0x280
[  426.154245][    C2]  ? tcp_trim_head+0x300/0x300
[  426.155396][    C2]  ? find_held_lock+0x85/0xa0
[  426.156734][    C2]  ? mark_held_locks+0x65/0x90
[  426.157967][    C2]  tcp_write_wakeup+0x2e2/0x340
[  426.159149][    C2]  tcp_send_probe0+0x2a/0x2c0
[  426.160368][    C2]  tcp_write_timer_handler+0x5cb/0x670
[  426.161740][    C2]  tcp_write_timer+0x86/0x250
[  426.162896][    C2]  ? tcp_write_timer_handler+0x670/0x670
[  426.164285][    C2]  call_timer_fn+0x15d/0x5f0
[  426.165481][    C2]  ? add_timer_on+0x2e0/0x2e0
[  426.166667][    C2]  ? lock_downgrade+0x3c0/0x3c0
[  426.167921][    C2]  ? mark_held_locks+0x24/0x90
[  426.169263][    C2]  ? _raw_spin_unlock_irq+0x1f/0x40
[  426.170564][    C2]  ? tcp_write_timer_handler+0x670/0x670
[  426.171920][    C2]  __run_timers.part.0+0x523/0x740
[  426.173181][    C2]  ? call_timer_fn+0x5f0/0x5f0
[  426.174321][    C2]  ? pvclock_clocksource_read+0xdc/0x1a0
[  426.175655][    C2]  run_timer_softirq+0x66/0xe0
[  426.176825][    C2]  __do_softirq+0x1c2/0x670
[  426.177944][    C2]  __irq_exit_rcu+0xf8/0x140
[  426.179120][    C2]  irq_exit_rcu+0x5/0x20
[  426.180150][    C2]  sysvec_apic_timer_interrupt+0x8e/0xc0
[  426.181486][    C2]  </IRQ>
[  426.182180][    C2]  <TASK>
[  426.182845][    C2]  asm_sysvec_apic_timer_interrupt+0x12/0x20

> 
> I think this is still needed for layers (NFS ?) that dismantle their
> TCP sockets whenever a netns
> is dismantled. But RDS case was different, only the listener is a kernel socket.

We can't apply the fix above.

I think that the fundamental problem is that we use net->ns.count for both
"avoiding use-after-free" purpose and "allowing dismantle from user event" purpose.
Why not to use separated counters?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-02  1:40             ` [PATCH v2] " Tetsuo Handa
  2022-05-02 14:12               ` Haakon Bugge
  2022-05-03  9:02               ` Paolo Abeni
@ 2022-05-03 11:40               ` patchwork-bot+netdevbpf
  2022-05-03 21:17                 ` Eric Dumazet
  2 siblings, 1 reply; 30+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-05-03 11:40 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: edumazet, santosh.shilimkar, davem, kuba, pabeni,
	syzbot+694120e1002c117747ed, netdev, syzkaller-bugs, linux-rdma

Hello:

This patch was applied to netdev/net.git (master)
by Paolo Abeni <pabeni@redhat.com>:

On Mon, 2 May 2022 10:40:18 +0900 you wrote:
> syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> for TCP socket used by RDS is accessing sock_net() without acquiring a
> refcount on net namespace. Since TCP's retransmission can happen after
> a process which created net namespace terminated, we need to explicitly
> acquire a refcount.
> 
> Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
> Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> 
> [...]

Here is the summary with links:
  - [v2] net: rds: acquire refcount on TCP sockets
    https://git.kernel.org/netdev/net/c/3a58f13a881e

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-03 11:40               ` patchwork-bot+netdevbpf
@ 2022-05-03 21:17                 ` Eric Dumazet
  2022-05-03 22:37                   ` Eric Dumazet
  2022-05-04 13:09                   ` [PATCH v2] net: rds: acquire " Paolo Abeni
  0 siblings, 2 replies; 30+ messages in thread
From: Eric Dumazet @ 2022-05-03 21:17 UTC (permalink / raw)
  To: patchwork-bot+netdevbpf
  Cc: Tetsuo Handa, Santosh Shilimkar, David Miller, Jakub Kicinski,
	Paolo Abeni, syzbot, netdev, syzkaller-bugs, linux-rdma

On Tue, May 3, 2022 at 4:40 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
>
> Hello:
>
> This patch was applied to netdev/net.git (master)
> by Paolo Abeni <pabeni@redhat.com>:
>
> On Mon, 2 May 2022 10:40:18 +0900 you wrote:
> > syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> > for TCP socket used by RDS is accessing sock_net() without acquiring a
> > refcount on net namespace. Since TCP's retransmission can happen after
> > a process which created net namespace terminated, we need to explicitly
> > acquire a refcount.
> >
> > Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> > Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
> > Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> >
> > [...]
>
> Here is the summary with links:
>   - [v2] net: rds: acquire refcount on TCP sockets
>     https://git.kernel.org/netdev/net/c/3a58f13a881e
>
> You are awesome, thank you!
> --
> Deet-doot-dot, I am a bot.
> https://korg.docs.kernel.org/patchwork/pwbot.html
>
>

I think we merged this patch too soon.

My question is : What prevents rds_tcp_conn_path_connect(), and thus
rds_tcp_tune() to be called
after the netns refcount already reached 0 ?

I guess we can wait for next syzbot report, but I think that get_net()
should be replaced
by maybe_get_net()

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-03 21:17                 ` Eric Dumazet
@ 2022-05-03 22:37                   ` Eric Dumazet
  2022-05-04  1:04                     ` Tetsuo Handa
  2022-05-04 13:09                   ` [PATCH v2] net: rds: acquire " Paolo Abeni
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2022-05-03 22:37 UTC (permalink / raw)
  To: patchwork-bot+netdevbpf
  Cc: Tetsuo Handa, Santosh Shilimkar, David Miller, Jakub Kicinski,
	Paolo Abeni, syzbot, netdev, syzkaller-bugs, linux-rdma

On Tue, May 3, 2022 at 2:17 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, May 3, 2022 at 4:40 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
> >
> > Hello:
> >
> > This patch was applied to netdev/net.git (master)
> > by Paolo Abeni <pabeni@redhat.com>:
> >
> > On Mon, 2 May 2022 10:40:18 +0900 you wrote:
> > > syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> > > for TCP socket used by RDS is accessing sock_net() without acquiring a
> > > refcount on net namespace. Since TCP's retransmission can happen after
> > > a process which created net namespace terminated, we need to explicitly
> > > acquire a refcount.
> > >
> > > Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> > > Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > > Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
> > > Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
> > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > >
> > > [...]
> >
> > Here is the summary with links:
> >   - [v2] net: rds: acquire refcount on TCP sockets
> >     https://git.kernel.org/netdev/net/c/3a58f13a881e
> >
> > You are awesome, thank you!
> > --
> > Deet-doot-dot, I am a bot.
> > https://korg.docs.kernel.org/patchwork/pwbot.html
> >
> >
>
> I think we merged this patch too soon.
>
> My question is : What prevents rds_tcp_conn_path_connect(), and thus
> rds_tcp_tune() to be called
> after the netns refcount already reached 0 ?
>
> I guess we can wait for next syzbot report, but I think that get_net()
> should be replaced
> by maybe_get_net()

Yes, syzbot was fast to trigger this exact issue:

HEAD commit:    3a58f13a net: rds: acquire refcount on TCP sockets
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 6934 at lib/refcount.c:25
refcount_warn_saturate+0x169/0x1e0 lib/refcount.c:25
Modules linked in:
CPU: 1 PID: 6934 Comm: kworker/u4:17 Not tainted
5.18.0-rc4-syzkaller-00209-g3a58f13a881e #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Workqueue: krdsd rds_connect_worker
RIP: 0010:refcount_warn_saturate+0x169/0x1e0 lib/refcount.c:25
Code: 09 31 ff 89 de e8 f7 b9 81 fd 84 db 0f 85 36 ff ff ff e8 0a b6
81 fd 48 c7 c7 40 eb 26 8a c6 05 75 1f ac 09 01 e8 56 75 2d 05 <0f> 0b
e9 17 ff ff ff e8 eb b5 81 fd 0f b6 1d 5a 1f ac 09 31 ff 89
RSP: 0018:ffffc9000b5e7b80 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88807a948000 RSI: ffffffff81600c08 RDI: fffff520016bcf62
RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815fb5de R11: 0000000000000000 R12: ffff888021e69b80
R13: ffff88805bc82a00 R14: ffff888021e69ccc R15: ffff8880741a2900
FS:  0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2cb5c000 CR3: 000000005688f000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __refcount_add include/linux/refcount.h:199 [inline]
 __refcount_inc include/linux/refcount.h:250 [inline]
 refcount_inc include/linux/refcount.h:267 [inline]
 get_net include/net/net_namespace.h:248 [inline]
 get_net_track include/net/net_namespace.h:334 [inline]
 rds_tcp_tune+0x5a0/0x5f0 net/rds/tcp.c:503
 rds_tcp_conn_path_connect+0x489/0x880 net/rds/tcp_connect.c:127
 rds_connect_worker+0x1a5/0x2c0 net/rds/threads.c:176
 process_one_work+0x996/0x1610 kernel/workqueue.c:2289
 worker_thread+0x665/0x1080 kernel/workqueue.c:2436
 kthread+0x2e9/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
 </TASK>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-03 22:37                   ` Eric Dumazet
@ 2022-05-04  1:04                     ` Tetsuo Handa
  2022-05-04  3:09                       ` Eric Dumazet
  0 siblings, 1 reply; 30+ messages in thread
From: Tetsuo Handa @ 2022-05-04  1:04 UTC (permalink / raw)
  To: Eric Dumazet, patchwork-bot+netdevbpf
  Cc: Santosh Shilimkar, David Miller, Jakub Kicinski, Paolo Abeni,
	syzbot, netdev, syzkaller-bugs, linux-rdma

On 2022/05/04 7:37, Eric Dumazet wrote:
>> I think we merged this patch too soon.
>>
>> My question is : What prevents rds_tcp_conn_path_connect(), and thus
>> rds_tcp_tune() to be called
>> after the netns refcount already reached 0 ?
>>
>> I guess we can wait for next syzbot report, but I think that get_net()
>> should be replaced
>> by maybe_get_net()
> 
> Yes, syzbot was fast to trigger this exact issue:

Does maybe_get_net() help?

Since rds_conn_net() returns a net namespace without holding a ref, it is theoretically
possible that the net namespace returned by rds_conn_net() is already kmem_cache_free()d
if refcount dropped to 0 by the moment sk_alloc() calls sock_net_set().

rds_tcp_conn_path_connect() {
  sock_create_kern(net = rds_conn_net(conn)) {
    __sock_create(net = rds_conn_net(conn), kern = 1) {
      err = pf->create(net = rds_conn_net(conn), kern = 1) {
        // pf->create is either inet_create or inet6_create
        sk_alloc(net = rds_conn_net(conn), kern = 1) {
          sk->sk_net_refcnt = kern ? 0 : 1;
          if (likely(sk->sk_net_refcnt)) {
            get_net_track(net, &sk->ns_tracker, priority);
            sock_inuse_add(net, 1);
          }
          sock_net_set(sk, net);
        }
      }
    }
  }
  rds_tcp_tune() {
    if (!sk->sk_net_refcnt) {
      sk->sk_net_refcnt = 1;
      get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
      sock_inuse_add(net, 1);
    }
  }
}

"struct rds_connection" needs to hold a ref in order to safely allow
rds_tcp_tune() to call maybe_get_net(), which in turn makes pointless
to use maybe_get_net() from rds_tcp_tune() because "struct rds_connection"
must have a ref. Situation where we are protected by maybe_get_net() is
quite limited if long-lived object is not holding a ref.

Hmm, can we simply use &init_net instead of rds_conn_net(conn) ?


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-04  1:04                     ` Tetsuo Handa
@ 2022-05-04  3:09                       ` Eric Dumazet
  2022-05-04  4:58                         ` Tetsuo Handa
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2022-05-04  3:09 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: patchwork-bot+netdevbpf, Santosh Shilimkar, David Miller,
	Jakub Kicinski, Paolo Abeni, syzbot, netdev, syzkaller-bugs,
	linux-rdma

On Tue, May 3, 2022 at 6:04 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2022/05/04 7:37, Eric Dumazet wrote:
> >> I think we merged this patch too soon.
> >>
> >> My question is : What prevents rds_tcp_conn_path_connect(), and thus
> >> rds_tcp_tune() to be called
> >> after the netns refcount already reached 0 ?
> >>
> >> I guess we can wait for next syzbot report, but I think that get_net()
> >> should be replaced
> >> by maybe_get_net()
> >
> > Yes, syzbot was fast to trigger this exact issue:
>
> Does maybe_get_net() help?
>
> Since rds_conn_net() returns a net namespace without holding a ref, it is theoretically
> possible that the net namespace returned by rds_conn_net() is already kmem_cache_free()d
> if refcount dropped to 0 by the moment sk_alloc() calls sock_net_set().

Nope. RDS has an exit() handler called from cleanup_net()

(struct pernet_operations)->exit() or exit_batch() :
rds_tcp_exit_net() (rds_tcp_kill_sock())

This exit() handler _has_ to remove all known listeners, and
definitely cancel work queues (synchronous operation)
before the actual "struct net" free can happen later.



>
> rds_tcp_conn_path_connect() {
>   sock_create_kern(net = rds_conn_net(conn)) {
>     __sock_create(net = rds_conn_net(conn), kern = 1) {
>       err = pf->create(net = rds_conn_net(conn), kern = 1) {
>         // pf->create is either inet_create or inet6_create
>         sk_alloc(net = rds_conn_net(conn), kern = 1) {
>           sk->sk_net_refcnt = kern ? 0 : 1;
>           if (likely(sk->sk_net_refcnt)) {
>             get_net_track(net, &sk->ns_tracker, priority);
>             sock_inuse_add(net, 1);
>           }
>           sock_net_set(sk, net);
>         }
>       }
>     }
>   }
>   rds_tcp_tune() {
>     if (!sk->sk_net_refcnt) {
>       sk->sk_net_refcnt = 1;
>       get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
>       sock_inuse_add(net, 1);
>     }
>   }
> }
>
> "struct rds_connection" needs to hold a ref in order to safely allow
> rds_tcp_tune() to call maybe_get_net(), which in turn makes pointless
> to use maybe_get_net() from rds_tcp_tune() because "struct rds_connection"
> must have a ref. Situation where we are protected by maybe_get_net() is
> quite limited if long-lived object is not holding a ref.
>
> Hmm, can we simply use &init_net instead of rds_conn_net(conn) ?

Only if you plan making RDS unavailable for non init netns.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-04  3:09                       ` Eric Dumazet
@ 2022-05-04  4:58                         ` Tetsuo Handa
  2022-05-04 15:15                           ` Tetsuo Handa
  0 siblings, 1 reply; 30+ messages in thread
From: Tetsuo Handa @ 2022-05-04  4:58 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: patchwork-bot+netdevbpf, Santosh Shilimkar, David Miller,
	Jakub Kicinski, Paolo Abeni, syzbot, netdev, syzkaller-bugs,
	linux-rdma

On 2022/05/04 12:09, Eric Dumazet wrote:
>> Does maybe_get_net() help?
>>
>> Since rds_conn_net() returns a net namespace without holding a ref, it is theoretically
>> possible that the net namespace returned by rds_conn_net() is already kmem_cache_free()d
>> if refcount dropped to 0 by the moment sk_alloc() calls sock_net_set().
> 
> Nope. RDS has an exit() handler called from cleanup_net()
> 
> (struct pernet_operations)->exit() or exit_batch() :
> rds_tcp_exit_net() (rds_tcp_kill_sock())

Hmm, when put_net() called __put_net(), this "struct net" is chained to cleanup_list.
When cleanup_net() is called via net_cleanup_work, rds_tcp_exit_net() is called from
ops_exit_list(). Therefore, we can call maybe_get_net() until rds_tcp_exit_net() returns.
That's good.

> 
> This exit() handler _has_ to remove all known listeners, and
> definitely cancel work queues (synchronous operation)
> before the actual "struct net" free can happen later.

But in your report, rds_tcp_tune() is called from rds_tcp_conn_path_connect() from
rds_connect_worker() via "struct rds_connection"->cp_conn_w work. I can see that
rds_tcp_kill_sock() calls rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w), and
rds_tcp_listen_stop() calls flush_workqueue(rds_wq) and flush_work(&rtn->rds_tcp_accept_w).

But I can't see how rds_tcp_exit_net() synchronously cancels all works associated
with "struct rds_conn_path".

struct rds_conn_path {
        struct delayed_work     cp_send_w;
        struct delayed_work     cp_recv_w;
        struct delayed_work     cp_conn_w;
        struct work_struct      cp_down_w;
}

These works are queued to rds_wq, but flush_workqueue() waits for completion only
if already queued. What if timer for queue_delayed_work() has not expired, or was
about to call queue_delayed_work() ? Is flush_workqueue(rds_wq) sufficient?

Anyway, if rds_tcp_kill_sock() can somehow guarantee that all works are completed
or cancelled, the fix would look like something below?

 net/rds/tcp.c         | 11 ++++++++---
 net/rds/tcp.h         |  2 +-
 net/rds/tcp_connect.c |  5 ++++-
 net/rds/tcp_listen.c  |  5 ++++-
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 2f638f8b7b1e..8e26bcf02044 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -487,11 +487,11 @@ struct rds_tcp_net {
 /* All module specific customizations to the RDS-TCP socket should be done in
  * rds_tcp_tune() and applied after socket creation.
  */
-void rds_tcp_tune(struct socket *sock)
+bool rds_tcp_tune(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
 	struct net *net = sock_net(sk);
-	struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
+	struct rds_tcp_net *rtn;
 
 	tcp_sock_set_nodelay(sock->sk);
 	lock_sock(sk);
@@ -499,10 +499,14 @@ void rds_tcp_tune(struct socket *sock)
 	 * a process which created this net namespace terminated.
 	 */
 	if (!sk->sk_net_refcnt) {
+		if (!maybe_get_net(net)) {
+			release_sock(sk);
+			return false;
+		}
 		sk->sk_net_refcnt = 1;
-		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
 		sock_inuse_add(net, 1);
 	}
+	rtn = net_generic(net, rds_tcp_netid);
 	if (rtn->sndbuf_size > 0) {
 		sk->sk_sndbuf = rtn->sndbuf_size;
 		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
@@ -512,6 +516,7 @@ void rds_tcp_tune(struct socket *sock)
 		sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
 	}
 	release_sock(sk);
+	return true;
 }
 
 static void rds_tcp_accept_worker(struct work_struct *work)
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index dc8d745d6857..f8b5930d7b34 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -49,7 +49,7 @@ struct rds_tcp_statistics {
 };
 
 /* tcp.c */
-void rds_tcp_tune(struct socket *sock);
+bool rds_tcp_tune(struct socket *sock);
 void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp);
 void rds_tcp_reset_callbacks(struct socket *sock, struct rds_conn_path *cp);
 void rds_tcp_restore_callbacks(struct socket *sock,
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index 5461d77fff4f..f0c477c5d1db 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -124,7 +124,10 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
 	if (ret < 0)
 		goto out;
 
-	rds_tcp_tune(sock);
+	if (!rds_tcp_tune(sock)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	if (isv6) {
 		sin6.sin6_family = AF_INET6;
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index 09cadd556d1e..7edf2e69d3fe 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -133,7 +133,10 @@ int rds_tcp_accept_one(struct socket *sock)
 	__module_get(new_sock->ops->owner);
 
 	rds_tcp_keepalive(new_sock);
-	rds_tcp_tune(new_sock);
+	if (!rds_tcp_tune(new_sock)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	inet = inet_sk(new_sock->sk);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-04  4:58                         ` Tetsuo Handa
@ 2022-05-04 15:15                           ` Tetsuo Handa
  2022-05-05  0:45                             ` [PATCH] net: rds: use maybe_get_net() when acquiring " Tetsuo Handa
  0 siblings, 1 reply; 30+ messages in thread
From: Tetsuo Handa @ 2022-05-04 15:15 UTC (permalink / raw)
  To: Eric Dumazet, Paolo Abeni
  Cc: patchwork-bot+netdevbpf, Santosh Shilimkar, David Miller,
	Jakub Kicinski, syzbot, netdev, syzkaller-bugs, linux-rdma

On 2022/05/04 13:58, Tetsuo Handa wrote:
> On 2022/05/04 12:09, Eric Dumazet wrote:
>> This exit() handler _has_ to remove all known listeners, and
>> definitely cancel work queues (synchronous operation)
>> before the actual "struct net" free can happen later.
> 
> But in your report, rds_tcp_tune() is called from rds_tcp_conn_path_connect() from
> rds_connect_worker() via "struct rds_connection"->cp_conn_w work. I can see that
> rds_tcp_kill_sock() calls rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w), and
> rds_tcp_listen_stop() calls flush_workqueue(rds_wq) and flush_work(&rtn->rds_tcp_accept_w).
> 
> But I can't see how rds_tcp_exit_net() synchronously cancels all works associated
> with "struct rds_conn_path".
> 
> struct rds_conn_path {
>         struct delayed_work     cp_send_w;
>         struct delayed_work     cp_recv_w;
>         struct delayed_work     cp_conn_w;
>         struct work_struct      cp_down_w;
> }
> 
> These works are queued to rds_wq, but flush_workqueue() waits for completion only
> if already queued. What if timer for queue_delayed_work() has not expired, or was
> about to call queue_delayed_work() ? Is flush_workqueue(rds_wq) sufficient?

 rds_tcp_tune+0x5a0/0x5f0 net/rds/tcp.c:503
 rds_tcp_conn_path_connect+0x489/0x880 net/rds/tcp_connect.c:127
 rds_connect_worker+0x1a5/0x2c0 net/rds/threads.c:176
 process_one_work+0x996/0x1610 kernel/workqueue.c:2289

rds_tcp_conn_path_connect is referenced by
"struct rds_transport rds_tcp_transport"->conn_path_connect.
It is invoked by

  ret = conn->c_trans->conn_path_connect(cp)

in rds_connect_worker().

rds_connect_worker is referenced by "struct rds_conn_path"->cp_conn_w
via INIT_DELAYED_WORK().

queue_delayed_work(rds_wq, &cp->cp_conn_w, *) is called by
rds_queue_reconnect() or rds_conn_path_connect_if_down().

If rds_conn_path_connect_if_down() were called from
rds_tcp_accept_one_path() from rds_tcp_accept_one(),
rds_tcp_tune() from rds_tcp_accept_one() was already called
before rds_tcp_tune() from rds_tcp_conn_path_connect() is called.
Since the addition on 0 was not reported at rds_tcp_tune() from
rds_tcp_accept_one(), what Eric is reporting cannot be from
rds_tcp_accept_one() from rds_tcp_accept_worker().

Despite rds_tcp_kill_sock() sets rtn->rds_tcp_listen_sock = NULL and
waits for rds_tcp_accept_one() from rds_tcp_accept_worker() to complete
using flush_workqueue(rds_wq), what Eric is reporting is different from
what syzbot+694120e1002c117747ed was reporting.

> 
> Anyway, if rds_tcp_kill_sock() can somehow guarantee that all works are completed
> or cancelled, the fix would look like something below?

I think it is OK to apply below diff in order to avoid addition on 0 problem, but
it is not proven that kmem_cache_free() is not yet called. What should we do?

> 
>  net/rds/tcp.c         | 11 ++++++++---
>  net/rds/tcp.h         |  2 +-
>  net/rds/tcp_connect.c |  5 ++++-
>  net/rds/tcp_listen.c  |  5 ++++-
>  4 files changed, 17 insertions(+), 6 deletions(-)
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH] net: rds: use maybe_get_net() when acquiring refcount on TCP sockets
  2022-05-04 15:15                           ` Tetsuo Handa
@ 2022-05-05  0:45                             ` Tetsuo Handa
  2022-05-05  0:53                               ` Eric Dumazet
                                                 ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Tetsuo Handa @ 2022-05-05  0:45 UTC (permalink / raw)
  To: Eric Dumazet, Paolo Abeni
  Cc: patchwork-bot+netdevbpf, Santosh Shilimkar, David Miller,
	Jakub Kicinski, syzbot, netdev, syzkaller-bugs, linux-rdma

Eric Dumazet is reporting addition on 0 problem at rds_tcp_tune(), for
delayed works queued in rds_wq might be invoked after a net namespace's
refcount already reached 0.

Since rds_tcp_exit_net() from cleanup_net() calls flush_workqueue(rds_wq),
it is guaranteed that we can instead use maybe_get_net() from delayed work
functions until rds_tcp_exit_net() returns.

Note that I'm not convinced that all works which might access a net
namespace are already queued in rds_wq by the moment rds_tcp_exit_net()
calls flush_workqueue(rds_wq). If some race is there, rds_tcp_exit_net()
will fail to wait for work functions, and kmem_cache_free() could be
called from net_free() before maybe_get_net() is called from
rds_tcp_tune().

Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: 3a58f13a881ed351 ("net: rds: acquire refcount on TCP sockets")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 net/rds/tcp.c         | 11 ++++++++---
 net/rds/tcp.h         |  2 +-
 net/rds/tcp_connect.c |  5 ++++-
 net/rds/tcp_listen.c  |  5 ++++-
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 2f638f8b7b1e..8e26bcf02044 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -487,11 +487,11 @@ struct rds_tcp_net {
 /* All module specific customizations to the RDS-TCP socket should be done in
  * rds_tcp_tune() and applied after socket creation.
  */
-void rds_tcp_tune(struct socket *sock)
+bool rds_tcp_tune(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
 	struct net *net = sock_net(sk);
-	struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
+	struct rds_tcp_net *rtn;
 
 	tcp_sock_set_nodelay(sock->sk);
 	lock_sock(sk);
@@ -499,10 +499,14 @@ void rds_tcp_tune(struct socket *sock)
 	 * a process which created this net namespace terminated.
 	 */
 	if (!sk->sk_net_refcnt) {
+		if (!maybe_get_net(net)) {
+			release_sock(sk);
+			return false;
+		}
 		sk->sk_net_refcnt = 1;
-		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
 		sock_inuse_add(net, 1);
 	}
+	rtn = net_generic(net, rds_tcp_netid);
 	if (rtn->sndbuf_size > 0) {
 		sk->sk_sndbuf = rtn->sndbuf_size;
 		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
@@ -512,6 +516,7 @@ void rds_tcp_tune(struct socket *sock)
 		sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
 	}
 	release_sock(sk);
+	return true;
 }
 
 static void rds_tcp_accept_worker(struct work_struct *work)
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index dc8d745d6857..f8b5930d7b34 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -49,7 +49,7 @@ struct rds_tcp_statistics {
 };
 
 /* tcp.c */
-void rds_tcp_tune(struct socket *sock);
+bool rds_tcp_tune(struct socket *sock);
 void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp);
 void rds_tcp_reset_callbacks(struct socket *sock, struct rds_conn_path *cp);
 void rds_tcp_restore_callbacks(struct socket *sock,
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index 5461d77fff4f..f0c477c5d1db 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -124,7 +124,10 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
 	if (ret < 0)
 		goto out;
 
-	rds_tcp_tune(sock);
+	if (!rds_tcp_tune(sock)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	if (isv6) {
 		sin6.sin6_family = AF_INET6;
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index 09cadd556d1e..7edf2e69d3fe 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -133,7 +133,10 @@ int rds_tcp_accept_one(struct socket *sock)
 	__module_get(new_sock->ops->owner);
 
 	rds_tcp_keepalive(new_sock);
-	rds_tcp_tune(new_sock);
+	if (!rds_tcp_tune(new_sock)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	inet = inet_sk(new_sock->sk);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: rds: use maybe_get_net() when acquiring refcount on TCP sockets
  2022-05-05  0:45                             ` [PATCH] net: rds: use maybe_get_net() when acquiring " Tetsuo Handa
@ 2022-05-05  0:53                               ` Eric Dumazet
  2022-05-05  1:04                               ` Jakub Kicinski
  2022-05-05  1:53                               ` [PATCH net v2] " Tetsuo Handa
  2 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2022-05-05  0:53 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Paolo Abeni, patchwork-bot+netdevbpf, Santosh Shilimkar,
	David Miller, Jakub Kicinski, syzbot, netdev, syzkaller-bugs,
	linux-rdma

On Wed, May 4, 2022 at 5:45 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Eric Dumazet is reporting addition on 0 problem at rds_tcp_tune(), for
> delayed works queued in rds_wq might be invoked after a net namespace's
> refcount already reached 0.
>
> Since rds_tcp_exit_net() from cleanup_net() calls flush_workqueue(rds_wq),
> it is guaranteed that we can instead use maybe_get_net() from delayed work
> functions until rds_tcp_exit_net() returns.
>
> Note that I'm not convinced that all works which might access a net
> namespace are already queued in rds_wq by the moment rds_tcp_exit_net()
> calls flush_workqueue(rds_wq). If some race is there, rds_tcp_exit_net()
> will fail to wait for work functions, and kmem_cache_free() could be
> called from net_free() before maybe_get_net() is called from
> rds_tcp_tune().
>
> Reported-by: Eric Dumazet <edumazet@google.com>
> Fixes: 3a58f13a881ed351 ("net: rds: acquire refcount on TCP sockets")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  net/rds/tcp.c         | 11 ++++++++---
>  net/rds/tcp.h         |  2 +-
>  net/rds/tcp_connect.c |  5 ++++-
>  net/rds/tcp_listen.c  |  5 ++++-
>  4 files changed, 17 insertions(+), 6 deletions(-)
>
> diff --git a/net/rds/tcp.c b/net/rds/tcp.c
> index 2f638f8b7b1e..8e26bcf02044 100644
> --- a/net/rds/tcp.c
> +++ b/net/rds/tcp.c
> @@ -487,11 +487,11 @@ struct rds_tcp_net {
>  /* All module specific customizations to the RDS-TCP socket should be done in
>   * rds_tcp_tune() and applied after socket creation.
>   */
> -void rds_tcp_tune(struct socket *sock)
> +bool rds_tcp_tune(struct socket *sock)
>  {
>         struct sock *sk = sock->sk;
>         struct net *net = sock_net(sk);
> -       struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
> +       struct rds_tcp_net *rtn;
>
>         tcp_sock_set_nodelay(sock->sk);
>         lock_sock(sk);
> @@ -499,10 +499,14 @@ void rds_tcp_tune(struct socket *sock)
>          * a process which created this net namespace terminated.
>          */
>         if (!sk->sk_net_refcnt) {
> +               if (!maybe_get_net(net)) {


> +                       release_sock(sk);
> +                       return false;
> +               }
>                 sk->sk_net_refcnt = 1;
> -               get_net_track(net, &sk->ns_tracker, GFP_KERNEL);

This could use:
                  netns_tracker_alloc(net, &sk->ns_tracker, GFP_KERNEL);

>                 sock_inuse_add(net, 1);
>         }
> +       rtn = net_generic(net, rds_tcp_netid);
>         if (rtn->sndbuf_size > 0) {
>                 sk->sk_sndbuf = rtn->sndbuf_size;
>                 sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
> @@ -512,6 +516,7 @@ void rds_tcp_tune(struct socket *sock)
>                 sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
>         }
>         release_sock(sk);
> +       return true;
>  }
>

Otherwise, patch looks good to me, thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] net: rds: use maybe_get_net() when acquiring refcount on TCP sockets
  2022-05-05  0:45                             ` [PATCH] net: rds: use maybe_get_net() when acquiring " Tetsuo Handa
  2022-05-05  0:53                               ` Eric Dumazet
@ 2022-05-05  1:04                               ` Jakub Kicinski
  2022-05-05  1:53                               ` [PATCH net v2] " Tetsuo Handa
  2 siblings, 0 replies; 30+ messages in thread
From: Jakub Kicinski @ 2022-05-05  1:04 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Eric Dumazet, Paolo Abeni, patchwork-bot+netdevbpf,
	Santosh Shilimkar, David Miller, syzbot, netdev, syzkaller-bugs,
	linux-rdma

On Thu, 5 May 2022 09:45:49 +0900 Tetsuo Handa wrote:
> Subject: [PATCH] net: rds: use maybe_get_net() when acquiring refcount on TCP  sockets

Please tag the next version as [PATCH net v2], and make sure it applies
cleanly on top of net/master, 'cause reportedly this one didn't?
https://patchwork.kernel.org/project/netdevbpf/patch/63dab11e-2aeb-5608-6dcb-6ebc3e98056e@I-love.SAKURA.ne.jp/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH net v2] net: rds: use maybe_get_net() when acquiring refcount on TCP sockets
  2022-05-05  0:45                             ` [PATCH] net: rds: use maybe_get_net() when acquiring " Tetsuo Handa
  2022-05-05  0:53                               ` Eric Dumazet
  2022-05-05  1:04                               ` Jakub Kicinski
@ 2022-05-05  1:53                               ` Tetsuo Handa
  2022-05-05 19:13                                 ` Eric Dumazet
  2022-05-06  1:20                                 ` patchwork-bot+netdevbpf
  2 siblings, 2 replies; 30+ messages in thread
From: Tetsuo Handa @ 2022-05-05  1:53 UTC (permalink / raw)
  To: Eric Dumazet, Paolo Abeni
  Cc: patchwork-bot+netdevbpf, Santosh Shilimkar, David Miller,
	Jakub Kicinski, syzbot, netdev, syzkaller-bugs, linux-rdma

Eric Dumazet is reporting addition on 0 problem at rds_tcp_tune(), for
delayed works queued in rds_wq might be invoked after a net namespace's
refcount already reached 0.

Since rds_tcp_exit_net() from cleanup_net() calls flush_workqueue(rds_wq),
it is guaranteed that we can instead use maybe_get_net() from delayed work
functions until rds_tcp_exit_net() returns.

Note that I'm not convinced that all works which might access a net
namespace are already queued in rds_wq by the moment rds_tcp_exit_net()
calls flush_workqueue(rds_wq). If some race is there, rds_tcp_exit_net()
will fail to wait for work functions, and kmem_cache_free() could be
called from net_free() before maybe_get_net() is called from
rds_tcp_tune().

Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: 3a58f13a881ed351 ("net: rds: acquire refcount on TCP sockets")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
Changes in v2:
  Add netns_tracker_alloc().

 net/rds/tcp.c         | 12 +++++++++---
 net/rds/tcp.h         |  2 +-
 net/rds/tcp_connect.c |  5 ++++-
 net/rds/tcp_listen.c  |  5 ++++-
 4 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 2f638f8b7b1e..73ee2771093d 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -487,11 +487,11 @@ struct rds_tcp_net {
 /* All module specific customizations to the RDS-TCP socket should be done in
  * rds_tcp_tune() and applied after socket creation.
  */
-void rds_tcp_tune(struct socket *sock)
+bool rds_tcp_tune(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
 	struct net *net = sock_net(sk);
-	struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
+	struct rds_tcp_net *rtn;
 
 	tcp_sock_set_nodelay(sock->sk);
 	lock_sock(sk);
@@ -499,10 +499,15 @@ void rds_tcp_tune(struct socket *sock)
 	 * a process which created this net namespace terminated.
 	 */
 	if (!sk->sk_net_refcnt) {
+		if (!maybe_get_net(net)) {
+			release_sock(sk);
+			return false;
+		}
 		sk->sk_net_refcnt = 1;
-		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
+		netns_tracker_alloc(net, &sk->ns_tracker, GFP_KERNEL);
 		sock_inuse_add(net, 1);
 	}
+	rtn = net_generic(net, rds_tcp_netid);
 	if (rtn->sndbuf_size > 0) {
 		sk->sk_sndbuf = rtn->sndbuf_size;
 		sk->sk_userlocks |= SOCK_SNDBUF_LOCK;
@@ -512,6 +517,7 @@ void rds_tcp_tune(struct socket *sock)
 		sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
 	}
 	release_sock(sk);
+	return true;
 }
 
 static void rds_tcp_accept_worker(struct work_struct *work)
diff --git a/net/rds/tcp.h b/net/rds/tcp.h
index dc8d745d6857..f8b5930d7b34 100644
--- a/net/rds/tcp.h
+++ b/net/rds/tcp.h
@@ -49,7 +49,7 @@ struct rds_tcp_statistics {
 };
 
 /* tcp.c */
-void rds_tcp_tune(struct socket *sock);
+bool rds_tcp_tune(struct socket *sock);
 void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp);
 void rds_tcp_reset_callbacks(struct socket *sock, struct rds_conn_path *cp);
 void rds_tcp_restore_callbacks(struct socket *sock,
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index 5461d77fff4f..f0c477c5d1db 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -124,7 +124,10 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
 	if (ret < 0)
 		goto out;
 
-	rds_tcp_tune(sock);
+	if (!rds_tcp_tune(sock)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	if (isv6) {
 		sin6.sin6_family = AF_INET6;
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index 09cadd556d1e..7edf2e69d3fe 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -133,7 +133,10 @@ int rds_tcp_accept_one(struct socket *sock)
 	__module_get(new_sock->ops->owner);
 
 	rds_tcp_keepalive(new_sock);
-	rds_tcp_tune(new_sock);
+	if (!rds_tcp_tune(new_sock)) {
+		ret = -EINVAL;
+		goto out;
+	}
 
 	inet = inet_sk(new_sock->sk);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net v2] net: rds: use maybe_get_net() when acquiring refcount on TCP sockets
  2022-05-05  1:53                               ` [PATCH net v2] " Tetsuo Handa
@ 2022-05-05 19:13                                 ` Eric Dumazet
  2022-05-06  1:20                                 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2022-05-05 19:13 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Paolo Abeni, patchwork-bot+netdevbpf, Santosh Shilimkar,
	David Miller, Jakub Kicinski, syzbot, netdev, syzkaller-bugs,
	linux-rdma

On Wed, May 4, 2022 at 6:54 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> Eric Dumazet is reporting addition on 0 problem at rds_tcp_tune(), for
> delayed works queued in rds_wq might be invoked after a net namespace's
> refcount already reached 0.
>
> Since rds_tcp_exit_net() from cleanup_net() calls flush_workqueue(rds_wq),
> it is guaranteed that we can instead use maybe_get_net() from delayed work
> functions until rds_tcp_exit_net() returns.
>
> Note that I'm not convinced that all works which might access a net
> namespace are already queued in rds_wq by the moment rds_tcp_exit_net()
> calls flush_workqueue(rds_wq). If some race is there, rds_tcp_exit_net()
> will fail to wait for work functions, and kmem_cache_free() could be
> called from net_free() before maybe_get_net() is called from
> rds_tcp_tune().
>
> Reported-by: Eric Dumazet <edumazet@google.com>
> Fixes: 3a58f13a881ed351 ("net: rds: acquire refcount on TCP sockets")
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net v2] net: rds: use maybe_get_net() when acquiring refcount on TCP sockets
  2022-05-05  1:53                               ` [PATCH net v2] " Tetsuo Handa
  2022-05-05 19:13                                 ` Eric Dumazet
@ 2022-05-06  1:20                                 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 30+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-05-06  1:20 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: edumazet, pabeni, patchwork-bot+netdevbpf, santosh.shilimkar,
	davem, kuba, syzbot+694120e1002c117747ed, netdev, syzkaller-bugs,
	linux-rdma

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 5 May 2022 10:53:53 +0900 you wrote:
> Eric Dumazet is reporting addition on 0 problem at rds_tcp_tune(), for
> delayed works queued in rds_wq might be invoked after a net namespace's
> refcount already reached 0.
> 
> Since rds_tcp_exit_net() from cleanup_net() calls flush_workqueue(rds_wq),
> it is guaranteed that we can instead use maybe_get_net() from delayed work
> functions until rds_tcp_exit_net() returns.
> 
> [...]

Here is the summary with links:
  - [net,v2] net: rds: use maybe_get_net() when acquiring refcount on TCP sockets
    https://git.kernel.org/netdev/net/c/6997fbd7a3da

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-03 21:17                 ` Eric Dumazet
  2022-05-03 22:37                   ` Eric Dumazet
@ 2022-05-04 13:09                   ` Paolo Abeni
  2022-05-04 13:25                     ` Eric Dumazet
  1 sibling, 1 reply; 30+ messages in thread
From: Paolo Abeni @ 2022-05-04 13:09 UTC (permalink / raw)
  To: Eric Dumazet, patchwork-bot+netdevbpf
  Cc: Tetsuo Handa, Santosh Shilimkar, David Miller, Jakub Kicinski,
	syzbot, netdev, syzkaller-bugs, linux-rdma

On Tue, 2022-05-03 at 14:17 -0700, Eric Dumazet wrote:
> On Tue, May 3, 2022 at 4:40 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > 
> > Hello:
> > 
> > This patch was applied to netdev/net.git (master)
> > by Paolo Abeni <pabeni@redhat.com>:
> > 
> > On Mon, 2 May 2022 10:40:18 +0900 you wrote:
> > > syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> > > for TCP socket used by RDS is accessing sock_net() without acquiring a
> > > refcount on net namespace. Since TCP's retransmission can happen after
> > > a process which created net namespace terminated, we need to explicitly
> > > acquire a refcount.
> > > 
> > > Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> > > Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > > Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
> > > Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
> > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > > 
> > > [...]
> > 
> > Here is the summary with links:
> >   - [v2] net: rds: acquire refcount on TCP sockets
> >     https://git.kernel.org/netdev/net/c/3a58f13a881e
> > 
> > You are awesome, thank you!
> > --
> > Deet-doot-dot, I am a bot.
> > https://korg.docs.kernel.org/patchwork/pwbot.html
> > 
> > 
> 
> I think we merged this patch too soon.

My fault.


> My question is : What prevents rds_tcp_conn_path_connect(), and thus
> rds_tcp_tune() to be called
> after the netns refcount already reached 0 ?
> 
> I guess we can wait for next syzbot report, but I think that get_net()
> should be replaced
> by maybe_get_net()
> 
Should we revert this patch before the next pull request, if a suitable
incremental fix is not available by then?

It looks like the window of opportunity for the race is roughly the
same?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2] net: rds: acquire refcount on TCP sockets
  2022-05-04 13:09                   ` [PATCH v2] net: rds: acquire " Paolo Abeni
@ 2022-05-04 13:25                     ` Eric Dumazet
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2022-05-04 13:25 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: patchwork-bot+netdevbpf, Tetsuo Handa, Santosh Shilimkar,
	David Miller, Jakub Kicinski, syzbot, netdev, syzkaller-bugs,
	linux-rdma

On Wed, May 4, 2022 at 6:09 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2022-05-03 at 14:17 -0700, Eric Dumazet wrote:
> > On Tue, May 3, 2022 at 4:40 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > >
> > > Hello:
> > >
> > > This patch was applied to netdev/net.git (master)
> > > by Paolo Abeni <pabeni@redhat.com>:
> > >
> > > On Mon, 2 May 2022 10:40:18 +0900 you wrote:
> > > > syzbot is reporting use-after-free read in tcp_retransmit_timer() [1],
> > > > for TCP socket used by RDS is accessing sock_net() without acquiring a
> > > > refcount on net namespace. Since TCP's retransmission can happen after
> > > > a process which created net namespace terminated, we need to explicitly
> > > > acquire a refcount.
> > > >
> > > > Link: https://syzkaller.appspot.com/bug?extid=694120e1002c117747ed [1]
> > > > Reported-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > > > Fixes: 26abe14379f8e2fa ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
> > > > Fixes: 8a68173691f03661 ("net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket")
> > > > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > Tested-by: syzbot <syzbot+694120e1002c117747ed@syzkaller.appspotmail.com>
> > > >
> > > > [...]
> > >
> > > Here is the summary with links:
> > >   - [v2] net: rds: acquire refcount on TCP sockets
> > >     https://git.kernel.org/netdev/net/c/3a58f13a881e
> > >
> > > You are awesome, thank you!
> > > --
> > > Deet-doot-dot, I am a bot.
> > > https://korg.docs.kernel.org/patchwork/pwbot.html
> > >
> > >
> >
> > I think we merged this patch too soon.
>
> My fault.
>
>
> > My question is : What prevents rds_tcp_conn_path_connect(), and thus
> > rds_tcp_tune() to be called
> > after the netns refcount already reached 0 ?
> >
> > I guess we can wait for next syzbot report, but I think that get_net()
> > should be replaced
> > by maybe_get_net()
> >
> Should we revert this patch before the next pull request, if a suitable
> incremental fix is not available by then?
>
> It looks like the window of opportunity for the race is roughly the
> same?
>

No need to revert the patch, we certainly are in a better situation,
as refcount_t helps here.

We can refine the logic in a followup.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-05-06  1:20 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <00000000000045dc96059f4d7b02@google.com>
     [not found] ` <000000000000f75af905d3ba0716@google.com>
     [not found]   ` <c389e47f-8f82-fd62-8c1d-d9481d2f71ff@I-love.SAKURA.ne.jp>
2022-04-22 14:40     ` [syzbot] KASAN: use-after-free Read in tcp_retransmit_timer (5) Tetsuo Handa
2022-04-24  3:57       ` Tetsuo Handa
2022-05-01 15:29         ` [PATCH] net: rds: acquire refcount on TCP sockets Tetsuo Handa
2022-05-01 16:14           ` Eric Dumazet
2022-05-02  1:40             ` [PATCH v2] " Tetsuo Handa
2022-05-02 14:12               ` Haakon Bugge
2022-05-02 14:29                 ` Tetsuo Handa
2022-05-03  9:02               ` Paolo Abeni
2022-05-03  9:56                 ` Tetsuo Handa
2022-05-03 11:10                   ` Paolo Abeni
2022-05-03 13:27                 ` David Laight
2022-05-03 13:43                   ` Eric Dumazet
2022-05-03 14:25                     ` David Laight
2022-05-03 13:45                 ` Eric Dumazet
2022-05-03 14:08                   ` Tetsuo Handa
2022-05-03 11:40               ` patchwork-bot+netdevbpf
2022-05-03 21:17                 ` Eric Dumazet
2022-05-03 22:37                   ` Eric Dumazet
2022-05-04  1:04                     ` Tetsuo Handa
2022-05-04  3:09                       ` Eric Dumazet
2022-05-04  4:58                         ` Tetsuo Handa
2022-05-04 15:15                           ` Tetsuo Handa
2022-05-05  0:45                             ` [PATCH] net: rds: use maybe_get_net() when acquiring " Tetsuo Handa
2022-05-05  0:53                               ` Eric Dumazet
2022-05-05  1:04                               ` Jakub Kicinski
2022-05-05  1:53                               ` [PATCH net v2] " Tetsuo Handa
2022-05-05 19:13                                 ` Eric Dumazet
2022-05-06  1:20                                 ` patchwork-bot+netdevbpf
2022-05-04 13:09                   ` [PATCH v2] net: rds: acquire " Paolo Abeni
2022-05-04 13:25                     ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).