From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Wang Yufen <wangyufen@huawei.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Jakub Sitnicki <jakub@cloudflare.com>,
John Fastabend <john.fastabend@gmail.com>,
Sasha Levin <sashal@kernel.org>,
ast@kernel.org, davem@davemloft.net, edumazet@google.com,
kuba@kernel.org, pabeni@redhat.com, yoshfuji@linux-ipv6.org,
dsahern@kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org
Subject: [PATCH AUTOSEL 5.19 010/105] bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues
Date: Thu, 11 Aug 2022 11:26:54 -0400
Message-ID: <20220811152851.1520029-10-sashal@kernel.org>
In-Reply-To: <20220811152851.1520029-1-sashal@kernel.org>
From: Wang Yufen <wangyufen@huawei.com>
[ Upstream commit d8616ee2affcff37c5d315310da557a694a3303d ]
During a TCP sockmap redirect pressure test, the following warning is triggered:
WARNING: CPU: 3 PID: 2145 at net/core/stream.c:205 sk_stream_kill_queues+0xbc/0xd0
CPU: 3 PID: 2145 Comm: iperf Kdump: loaded Tainted: G W 5.10.0+ #9
Call Trace:
inet_csk_destroy_sock+0x55/0x110
inet_csk_listen_stop+0xbb/0x380
tcp_close+0x41b/0x480
inet_release+0x42/0x80
__sock_release+0x3d/0xa0
sock_close+0x11/0x20
__fput+0x9d/0x240
task_work_run+0x62/0x90
exit_to_user_mode_prepare+0x110/0x120
syscall_exit_to_user_mode+0x27/0x190
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The cause we identified is as follows:
When the listener is closing, a connection may have completed the three-way
handshake but not yet been accepted, and the client may already have sent some
packets. The child sks in the accept queue are released by
inet_child_forget()->inet_csk_destroy_sock(), but the psocks of those child sks
are not released.
To fix this, add sock_map_destroy() to release the psocks.
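For illustration only, the save-and-override idea behind the fix can be
sketched as a small user-space model. Every type and function below
(struct psock, charged, psock_release, hooked_destroy,
destroy_unaccepted_child) is a hypothetical stand-in, not the kernel
definition, and the accounting is reduced to a single counter. The sketch
assumes the destroy callback runs before the forward_alloc check, as the
call trace above suggests.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel structures. */
struct sock;

struct proto {
	void (*destroy)(struct sock *sk);
};

struct psock {
	void (*saved_destroy)(struct sock *sk); /* original callback, saved at attach time */
	int charged;                            /* memory charged to the socket for queued data */
};

struct sock {
	const struct proto *prot;
	struct psock *psock;   /* NULL when no psock is attached */
	int forward_alloc;     /* simplified stand-in for sk->sk_forward_alloc */
};

static void base_destroy(struct sock *sk)
{
	(void)sk; /* the original callback knows nothing about psocks */
}

/* Return the memory charged through the psock and detach it. */
static void psock_release(struct sock *sk)
{
	struct psock *p = sk->psock;

	sk->forward_alloc -= p->charged;
	sk->psock = NULL;
	free(p);
}

/* Overriding callback: release the psock first, then chain to the saved one. */
static void hooked_destroy(struct sock *sk)
{
	void (*saved)(struct sock *sk);

	if (!sk->psock)
		return;
	saved = sk->psock->saved_destroy;
	psock_release(sk);
	if (saved)
		saved(sk);
}

/*
 * Model of destroying a child socket that was never accepted.
 * Returns 1 if the (modelled) warning would fire, 0 otherwise.
 */
static int destroy_unaccepted_child(int use_hook)
{
	static const struct proto base = { .destroy = base_destroy };
	static const struct proto hooked = { .destroy = hooked_destroy };
	struct psock *p = malloc(sizeof(*p));
	struct sock sk;
	int warned;

	assert(p != NULL);
	p->saved_destroy = base.destroy;
	p->charged = 4096;

	sk.prot = use_hook ? &hooked : &base;
	sk.psock = p;
	sk.forward_alloc = 4096; /* client data arrived before accept() */

	if (sk.prot->destroy)
		sk.prot->destroy(&sk);
	warned = (sk.forward_alloc != 0); /* stands in for the WARN_ON */

	free(sk.psock); /* avoid leaking in the unhooked case; free(NULL) is a no-op */
	return warned;
}
```

In this model, destroy_unaccepted_child(0) reports the warning because
nothing releases the psock, while destroy_unaccepted_child(1) does not,
since the overriding callback releases it before chaining to the saved one.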
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20220524075311.649153-1-wangyufen@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
include/linux/bpf.h | 1 +
include/linux/skmsg.h | 1 +
net/core/skmsg.c | 1 +
net/core/sock_map.c | 23 +++++++++++++++++++++++
net/ipv4/tcp_bpf.c | 1 +
5 files changed, 27 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2b914a56a2c5..8e6092d0ea95 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2104,6 +2104,7 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr);
void sock_map_unhash(struct sock *sk);
+void sock_map_destroy(struct sock *sk);
void sock_map_close(struct sock *sk, long timeout);
#else
static inline int bpf_prog_offload_init(struct bpf_prog *prog,
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index c5a2d6f50f25..153b6dec9b6a 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -95,6 +95,7 @@ struct sk_psock {
spinlock_t link_lock;
refcount_t refcnt;
void (*saved_unhash)(struct sock *sk);
+ void (*saved_destroy)(struct sock *sk);
void (*saved_close)(struct sock *sk, long timeout);
void (*saved_write_space)(struct sock *sk);
void (*saved_data_ready)(struct sock *sk);
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index b0fcd0200e84..fc69154bbc88 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -720,6 +720,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node)
psock->eval = __SK_NONE;
psock->sk_proto = prot;
psock->saved_unhash = prot->unhash;
+ psock->saved_destroy = prot->destroy;
psock->saved_close = prot->close;
psock->saved_write_space = sk->sk_write_space;
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 81d4b4756a02..9f08ccfaf6da 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -1561,6 +1561,29 @@ void sock_map_unhash(struct sock *sk)
}
EXPORT_SYMBOL_GPL(sock_map_unhash);
+void sock_map_destroy(struct sock *sk)
+{
+ void (*saved_destroy)(struct sock *sk);
+ struct sk_psock *psock;
+
+ rcu_read_lock();
+ psock = sk_psock_get(sk);
+ if (unlikely(!psock)) {
+ rcu_read_unlock();
+ if (sk->sk_prot->destroy)
+ sk->sk_prot->destroy(sk);
+ return;
+ }
+
+ saved_destroy = psock->saved_destroy;
+ sock_map_remove_links(sk, psock);
+ rcu_read_unlock();
+ sk_psock_stop(psock, true);
+ sk_psock_put(sk, psock);
+ saved_destroy(sk);
+}
+EXPORT_SYMBOL_GPL(sock_map_destroy);
+
void sock_map_close(struct sock *sk, long timeout)
{
void (*saved_close)(struct sock *sk, long timeout);
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 0d3f68bb51c0..a1626afe87a1 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -540,6 +540,7 @@ static void tcp_bpf_rebuild_protos(struct proto prot[TCP_BPF_NUM_CFGS],
struct proto *base)
{
prot[TCP_BPF_BASE] = *base;
+ prot[TCP_BPF_BASE].destroy = sock_map_destroy;
prot[TCP_BPF_BASE].close = sock_map_close;
prot[TCP_BPF_BASE].recvmsg = tcp_bpf_recvmsg;
prot[TCP_BPF_BASE].sock_is_readable = sk_msg_is_readable;
--
2.35.1