From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Jakub Sitnicki <jakub@cloudflare.com>,
Martin KaFai Lau <kafai@fb.com>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.13 06/35] bpf, sockmap: On cleanup we additionally need to remove cached skb
Date: Fri, 6 Aug 2021 10:16:49 +0200 [thread overview]
Message-ID: <20210806081113.920577986@linuxfoundation.org> (raw)
In-Reply-To: <20210806081113.718626745@linuxfoundation.org>
From: John Fastabend <john.fastabend@gmail.com>
[ Upstream commit 476d98018f32e68e7c5d4e8456940cf2b6d66f10 ]
Its possible if a socket is closed and the receive thread is under memory
pressure it may have cached a skb. We need to ensure these skbs are
free'd along with the normal ingress_skb queue.
Before 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()") tear
down and backlog processing both had sock_lock for the common case of
socket close or unhash. So it was not possible to have both running in
parrallel so all we would need is the kfree in those kernels.
But, latest kernels include the commit 799aa7f98d5e and this requires a
bit more work. Without the ingress_lock guarding reading/writing the
state->skb case its possible the tear down could run before the state
update causing it to leak memory or worse when the backlog reads the state
it could potentially run interleaved with the tear down and we might end up
free'ing the state->skb from tear down side but already have the reference
from backlog side. To resolve such races we wrap accesses in ingress_lock
on both sides serializing tear down and backlog case. In both cases this
only happens after an EAGAIN error case so having an extra lock in place
is likely fine. The normal path will skip the locks.
Note, we check state->skb before grabbing lock. This works because
we can only enqueue with the mutex we hold already. Avoiding a race
on adding state->skb after the check. And if tear down path is running
that is also fine if the tear down path then removes state->skb we
will simply set skb=NULL and the subsequent goto is skipped. This
slight complication avoids locking in normal case.
With this fix we no longer see this warning splat from tcp side on
socket close when we hit the above case with redirect to ingress self.
[224913.935822] WARNING: CPU: 3 PID: 32100 at net/core/stream.c:208 sk_stream_kill_queues+0x212/0x220
[224913.935841] Modules linked in: fuse overlay bpf_preload x86_pkg_temp_thermal intel_uncore wmi_bmof squashfs sch_fq_codel efivarfs ip_tables x_tables uas xhci_pci ixgbe mdio xfrm_algo xhci_hcd wmi
[224913.935897] CPU: 3 PID: 32100 Comm: fgs-bench Tainted: G I 5.14.0-rc1alu+ #181
[224913.935908] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[224913.935914] RIP: 0010:sk_stream_kill_queues+0x212/0x220
[224913.935923] Code: 8b 83 20 02 00 00 85 c0 75 20 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 df e8 2b 11 fe ff eb c3 0f 0b e9 7c ff ff ff 0f 0b eb ce <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 90 0f 1f 44 00 00 41 57 41
[224913.935932] RSP: 0018:ffff88816271fd38 EFLAGS: 00010206
[224913.935941] RAX: 0000000000000ae8 RBX: ffff88815acd5240 RCX: dffffc0000000000
[224913.935948] RDX: 0000000000000003 RSI: 0000000000000ae8 RDI: ffff88815acd5460
[224913.935954] RBP: ffff88815acd5460 R08: ffffffff955c0ae8 R09: fffffbfff2e6f543
[224913.935961] R10: ffffffff9737aa17 R11: fffffbfff2e6f542 R12: ffff88815acd5390
[224913.935967] R13: ffff88815acd5480 R14: ffffffff98d0c080 R15: ffffffff96267500
[224913.935974] FS: 00007f86e6bd1700(0000) GS:ffff888451cc0000(0000) knlGS:0000000000000000
[224913.935981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[224913.935988] CR2: 000000c0008eb000 CR3: 00000001020e0005 CR4: 00000000003706e0
[224913.935994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[224913.936000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[224913.936007] Call Trace:
[224913.936016] inet_csk_destroy_sock+0xba/0x1f0
[224913.936033] __tcp_close+0x620/0x790
[224913.936047] tcp_close+0x20/0x80
[224913.936056] inet_release+0x8f/0xf0
[224913.936070] __sock_release+0x72/0x120
[224913.936083] sock_close+0x14/0x20
Fixes: a136678c0bdbb ("bpf: sk_msg, zap ingress queue on psock down")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210727160500.1713554-3-john.fastabend@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
net/core/skmsg.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index b088fe07fc00..7e7205e93258 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -613,23 +613,42 @@ static void sock_drop(struct sock *sk, struct sk_buff *skb)
kfree_skb(skb);
}
+static void sk_psock_skb_state(struct sk_psock *psock,
+ struct sk_psock_work_state *state,
+ struct sk_buff *skb,
+ int len, int off)
+{
+ spin_lock_bh(&psock->ingress_lock);
+ if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
+ state->skb = skb;
+ state->len = len;
+ state->off = off;
+ } else {
+ sock_drop(psock->sk, skb);
+ }
+ spin_unlock_bh(&psock->ingress_lock);
+}
+
static void sk_psock_backlog(struct work_struct *work)
{
struct sk_psock *psock = container_of(work, struct sk_psock, work);
struct sk_psock_work_state *state = &psock->work_state;
- struct sk_buff *skb;
+ struct sk_buff *skb = NULL;
bool ingress;
u32 len, off;
int ret;
mutex_lock(&psock->work_mutex);
- if (state->skb) {
+ if (unlikely(state->skb)) {
+ spin_lock_bh(&psock->ingress_lock);
skb = state->skb;
len = state->len;
off = state->off;
state->skb = NULL;
- goto start;
+ spin_unlock_bh(&psock->ingress_lock);
}
+ if (skb)
+ goto start;
while ((skb = skb_dequeue(&psock->ingress_skb))) {
len = skb->len;
@@ -644,9 +663,8 @@ static void sk_psock_backlog(struct work_struct *work)
len, ingress);
if (ret <= 0) {
if (ret == -EAGAIN) {
- state->skb = skb;
- state->len = len;
- state->off = off;
+ sk_psock_skb_state(psock, state, skb,
+ len, off);
goto end;
}
/* Hard errors break pipe and stop xmit. */
@@ -745,6 +763,11 @@ static void __sk_psock_zap_ingress(struct sk_psock *psock)
skb_bpf_redirect_clear(skb);
sock_drop(psock->sk, skb);
}
+ kfree_skb(psock->work_state.skb);
+ /* We null the skb here to ensure that calls to sk_psock_backlog
+ * do not pick up the free'd skb.
+ */
+ psock->work_state.skb = NULL;
__sk_psock_purge_ingress_msg(psock);
}
--
2.30.2
next prev parent reply other threads:[~2021-08-06 8:22 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-06 8:16 [PATCH 5.13 00/35] 5.13.9-rc1 review Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 01/35] drm/i915: Revert "drm/i915/gem: Asynchronous cmdparser" Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 02/35] Revert "drm/i915: Propagate errors on awaiting already signaled fences" Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 03/35] power: supply: ab8500: Call battery population once Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 04/35] skmsg: Increase sk->sk_drops when dropping packets Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 05/35] skmsg: Pass source psock to sk_psock_skb_redirect() Greg Kroah-Hartman
2021-08-06 8:16 ` Greg Kroah-Hartman [this message]
2021-08-07 17:45 ` [PATCH 5.13 06/35] bpf, sockmap: On cleanup we additionally need to remove cached skb Naresh Kamboju
2021-08-09 17:33 ` John Fastabend
2021-08-06 8:16 ` [PATCH 5.13 07/35] cifs: use helpers when parsing uid/gid mount options and validate them Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 08/35] cifs: add missing parsing of backupuid Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 09/35] net: dsa: sja1105: parameterize the number of ports Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 10/35] net: dsa: sja1105: fix address learning getting disabled on the CPU port Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 11/35] ASoC: Intel: boards: handle hda-dsp-common as a module Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 12/35] ASoC: Intel: boards: create sof-maxim-common module Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 13/35] ASoC: Intel: boards: fix xrun issue on platform with max98373 Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 14/35] regulator: rtmv20: Fix wrong mask for strobe-polarity-high Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 15/35] regulator: rt5033: Fix n_voltages settings for BUCK and LDO Greg Kroah-Hartman
2021-08-06 8:16 ` [PATCH 5.13 16/35] spi: stm32h7: fix full duplex irq handler handling Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 17/35] ASoC: tlv320aic31xx: fix reversed bclk/wclk master bits Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 18/35] regulator: mtk-dvfsrc: Fix wrong dev pointer for devm_regulator_register Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 19/35] r8152: Fix potential PM refcount imbalance Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 20/35] r8152: Fix a deadlock by doubly PM resume Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 21/35] qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union() Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 22/35] ASoC: rt5682: Fix the issue of garbled recording after powerd_dbus_suspend Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 23/35] net: Fix zero-copy head len calculation Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 24/35] ASoC: ti: j721e-evm: Fix unbalanced domain activity tracking during startup Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 25/35] ASoC: ti: j721e-evm: Check for not initialized parent_clk_id Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 26/35] efi/mokvar: Reserve the table only if it is in boot services data Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 27/35] nvme: fix nvme_setup_command metadata trace event Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 28/35] drm/amd/display: Fix comparison error in dcn21 DML Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 29/35] drm/amd/display: Fix max vstartup calculation for modes with borders Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 30/35] io_uring: never attempt iopoll reissue from release path Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 31/35] io_uring: explicitly catch any illegal async queue attempt Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 32/35] Revert "spi: mediatek: fix fifo rx mode" Greg Kroah-Hartman
2021-08-06 18:54 ` Guenter Roeck
2021-08-07 8:20 ` Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 33/35] Revert "Bluetooth: Shutdown controller after workqueues are flushed or cancelled" Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 34/35] Revert "watchdog: iTCO_wdt: Account for rebooting on second timeout" Greg Kroah-Hartman
2021-08-06 8:17 ` [PATCH 5.13 35/35] drm/amd/display: Fix ASSR regression on embedded panels Greg Kroah-Hartman
2021-08-06 10:44 ` [PATCH 5.13 00/35] 5.13.9-rc1 review Fox Chen
2021-08-06 14:33 ` Jon Hunter
2021-08-06 18:59 ` Guenter Roeck
2021-08-06 23:45 ` Justin Forbes
2021-08-07 13:16 ` Aakash Hemadri
2021-08-07 17:55 ` Naresh Kamboju
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210806081113.920577986@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=andrii@kernel.org \
--cc=jakub@cloudflare.com \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox