netdev.vger.kernel.org archive mirror
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Wang Yufen <wangyufen@huawei.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Jakub Sitnicki <jakub@cloudflare.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Sasha Levin <sashal@kernel.org>,
	ast@kernel.org, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, yoshfuji@linux-ipv6.org,
	dsahern@kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 5.19 010/105] bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues
Date: Thu, 11 Aug 2022 11:26:54 -0400
Message-ID: <20220811152851.1520029-10-sashal@kernel.org>
In-Reply-To: <20220811152851.1520029-1-sashal@kernel.org>

From: Wang Yufen <wangyufen@huawei.com>

[ Upstream commit d8616ee2affcff37c5d315310da557a694a3303d ]

During a TCP sockmap redirect stress test, the following warning is triggered:

WARNING: CPU: 3 PID: 2145 at net/core/stream.c:205 sk_stream_kill_queues+0xbc/0xd0
CPU: 3 PID: 2145 Comm: iperf Kdump: loaded Tainted: G        W         5.10.0+ #9
Call Trace:
 inet_csk_destroy_sock+0x55/0x110
 inet_csk_listen_stop+0xbb/0x380
 tcp_close+0x41b/0x480
 inet_release+0x42/0x80
 __sock_release+0x3d/0xa0
 sock_close+0x11/0x20
 __fput+0x9d/0x240
 task_work_run+0x62/0x90
 exit_to_user_mode_prepare+0x110/0x120
 syscall_exit_to_user_mode+0x27/0x190
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The cause we observed is as follows:

When the listener is closing, a connection may have completed the three-way
handshake but not yet been accepted, and the client may already have sent
some packets. The child sks in the accept queue are released via
inet_child_forget()->inet_csk_destroy_sock(), but the psocks of those child
sks are not released, so sk->sk_forward_alloc is still non-zero when
sk_stream_kill_queues() checks it.

To fix this, add sock_map_destroy() to release the psocks.
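
The save-and-wrap pattern behind the fix can be sketched in user space.
This is only a toy model under stated assumptions, not kernel code: every
toy_* name is hypothetical, forward_alloc merely stands in for
sk->sk_forward_alloc, and the psock's accounting is reduced to a single
counter. It shows that if the destroy callback is not hooked, the charge
held through the psock is still present after the base destroy runs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for kernel types; every name here is hypothetical. */
struct toy_sock;

struct toy_proto {
	void (*destroy)(struct toy_sock *sk);
};

struct toy_psock {
	int queued_bytes;                           /* memory still charged to the socket */
	void (*saved_destroy)(struct toy_sock *sk); /* original callback, may be NULL */
};

struct toy_sock {
	const struct toy_proto *prot;
	struct toy_psock *psock;
	int forward_alloc;      /* models sk->sk_forward_alloc */
	int base_destroy_calls; /* how often the original destroy ran */
};

static void toy_base_destroy(struct toy_sock *sk)
{
	sk->base_destroy_calls++;
}

static const struct toy_proto toy_base_proto = { .destroy = toy_base_destroy };

/* Wrapper: release the psock first, then chain to the saved callback. */
static void toy_map_destroy(struct toy_sock *sk)
{
	struct toy_psock *psock = sk->psock;
	void (*saved)(struct toy_sock *sk);

	if (!psock) {
		/* Nothing attached: just run the base callback. */
		toy_base_destroy(sk);
		return;
	}

	saved = psock->saved_destroy;
	sk->forward_alloc -= psock->queued_bytes; /* drop the psock's charge */
	sk->psock = NULL;
	sk->prot = &toy_base_proto;
	free(psock);
	if (saved)
		saved(sk);
}

static const struct toy_proto toy_bpf_proto = { .destroy = toy_map_destroy };

/* Attach a psock; hook_destroy selects the behaviour with the wrapper. */
static int toy_attach(struct toy_sock *sk, int queued_bytes, int hook_destroy)
{
	struct toy_psock *psock;

	assert(queued_bytes >= 0);
	psock = malloc(sizeof(*psock));
	if (!psock)
		return -1;

	psock->queued_bytes = queued_bytes;
	psock->saved_destroy = sk->prot->destroy;
	sk->psock = psock;
	sk->forward_alloc += queued_bytes;
	if (hook_destroy)
		sk->prot = &toy_bpf_proto;
	return 0;
}

/* Models the destroy path; returns the charge left over afterwards. */
static int toy_destroy_sock(struct toy_sock *sk)
{
	int leftover;

	if (sk->prot->destroy)
		sk->prot->destroy(sk);
	leftover = sk->forward_alloc;
	free(sk->psock); /* toy cleanup only, so the model does not leak heap */
	sk->psock = NULL;
	return leftover;
}
```

Calling toy_attach() with hook_destroy set to 0 and then
toy_destroy_sock() returns the full charge, which is the condition the
warning reports; with hook_destroy set to 1 it returns 0, and the original
destroy callback still runs exactly once.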

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220524075311.649153-1-wangyufen@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/bpf.h   |  1 +
 include/linux/skmsg.h |  1 +
 net/core/skmsg.c      |  1 +
 net/core/sock_map.c   | 23 +++++++++++++++++++++++
 net/ipv4/tcp_bpf.c    |  1 +
 5 files changed, 27 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2b914a56a2c5..8e6092d0ea95 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2104,6 +2104,7 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr,
 			    union bpf_attr __user *uattr);
 
 void sock_map_unhash(struct sock *sk);
+void sock_map_destroy(struct sock *sk);
 void sock_map_close(struct sock *sk, long timeout);
 #else
 static inline int bpf_prog_offload_init(struct bpf_prog *prog,
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c5a2d6f50f25..153b6dec9b6a 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -95,6 +95,7 @@ struct sk_psock {
 	spinlock_t			link_lock;
 	refcount_t			refcnt;
 	void (*saved_unhash)(struct sock *sk);
+	void (*saved_destroy)(struct sock *sk);
 	void (*saved_close)(struct sock *sk, long timeout);
 	void (*saved_write_space)(struct sock *sk);
 	void (*saved_data_ready)(struct sock *sk);
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index b0fcd0200e84..fc69154bbc88 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -720,6 +720,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node)
 	psock->eval = __SK_NONE;
 	psock->sk_proto = prot;
 	psock->saved_unhash = prot->unhash;
+	psock->saved_destroy = prot->destroy;
 	psock->saved_close = prot->close;
 	psock->saved_write_space = sk->sk_write_space;
 
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 81d4b4756a02..9f08ccfaf6da 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1561,6 +1561,29 @@ void sock_map_unhash(struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(sock_map_unhash);
 
+void sock_map_destroy(struct sock *sk)
+{
+	void (*saved_destroy)(struct sock *sk);
+	struct sk_psock *psock;
+
+	rcu_read_lock();
+	psock = sk_psock_get(sk);
+	if (unlikely(!psock)) {
+		rcu_read_unlock();
+		if (sk->sk_prot->destroy)
+			sk->sk_prot->destroy(sk);
+		return;
+	}
+
+	saved_destroy = psock->saved_destroy;
+	sock_map_remove_links(sk, psock);
+	rcu_read_unlock();
+	sk_psock_stop(psock, true);
+	sk_psock_put(sk, psock);
+	saved_destroy(sk);
+}
+EXPORT_SYMBOL_GPL(sock_map_destroy);
+
 void sock_map_close(struct sock *sk, long timeout)
 {
 	void (*saved_close)(struct sock *sk, long timeout);
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 0d3f68bb51c0..a1626afe87a1 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -540,6 +540,7 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
 				   struct proto *base)
 {
 	prot[TCP_BPF_BASE]			= *base;
+	prot[TCP_BPF_BASE].destroy		= sock_map_destroy;
 	prot[TCP_BPF_BASE].close		= sock_map_close;
 	prot[TCP_BPF_BASE].recvmsg		= tcp_bpf_recvmsg;
 	prot[TCP_BPF_BASE].sock_is_readable	= sk_msg_is_readable;
-- 
2.35.1

