From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Christoph Paasch <cpaasch@apple.com>,
Paolo Abeni <pabeni@redhat.com>,
Matthieu Baerts <matthieu.baerts@tessares.net>,
"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 6.1 06/16] mptcp: fix accept vs worker race
Date: Fri, 28 Apr 2023 13:27:58 +0200 [thread overview]
Message-ID: <20230428112040.280464855@linuxfoundation.org> (raw)
In-Reply-To: <20230428112040.063291126@linuxfoundation.org>
From: Paolo Abeni <pabeni@redhat.com>
commit 63740448a32eb662e05894425b47bcc5814136f4 upstream.
The mptcp worker and mptcp_accept() can race, as reported by Christoph:
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 14351 at lib/refcount.c:25 refcount_warn_saturate+0x105/0x1b0 lib/refcount.c:25
Modules linked in:
CPU: 1 PID: 14351 Comm: syz-executor.2 Not tainted 6.3.0-rc1-gde5e8fd0123c #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:refcount_warn_saturate+0x105/0x1b0 lib/refcount.c:25
Code: 02 31 ff 89 de e8 1b f0 a7 ff 84 db 0f 85 6e ff ff ff e8 3e f5 a7 ff 48 c7 c7 d8 c7 34 83 c6 05 6d 2d 0f 02 01 e8 cb 3d 90 ff <0f> 0b e9 4f ff ff ff e8 1f f5 a7 ff 0f b6 1d 54 2d 0f 02 31 ff 89
RSP: 0018:ffffc90000a47bf8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88802eae98c0 RSI: ffffffff81097d4f RDI: 0000000000000001
RBP: ffff88802e712180 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: ffff88802eaea148 R12: ffff88802e712100
R13: ffff88802e712a88 R14: ffff888005cb93a8 R15: ffff88802e712a88
FS: 0000000000000000(0000) GS:ffff88803ed00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f277fd89120 CR3: 0000000035486002 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__refcount_add include/linux/refcount.h:199 [inline]
__refcount_inc include/linux/refcount.h:250 [inline]
refcount_inc include/linux/refcount.h:267 [inline]
sock_hold include/net/sock.h:775 [inline]
__mptcp_close+0x4c6/0x4d0 net/mptcp/protocol.c:3051
mptcp_close+0x24/0xe0 net/mptcp/protocol.c:3072
inet_release+0x56/0xa0 net/ipv4/af_inet.c:429
__sock_release+0x51/0xf0 net/socket.c:653
sock_close+0x18/0x20 net/socket.c:1395
__fput+0x113/0x430 fs/file_table.c:321
task_work_run+0x96/0x100 kernel/task_work.c:179
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0x4fc/0x10c0 kernel/exit.c:869
do_group_exit+0x51/0xf0 kernel/exit.c:1019
get_signal+0x12b0/0x1390 kernel/signal.c:2859
arch_do_signal_or_restart+0x25/0x260 arch/x86/kernel/signal.c:306
exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
exit_to_user_mode_prepare+0x131/0x1a0 kernel/entry/common.c:203
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x19/0x40 kernel/entry/common.c:296
do_syscall_64+0x46/0x90 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7fec4b4926a9
Code: Unable to access opcode bytes at 0x7fec4b49267f.
RSP: 002b:00007fec49f9dd78 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00000000006bc058 RCX: 00007fec4b4926a9
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00000000006bc058
RBP: 00000000006bc050 R08: 00000000007df998 R09: 00000000007df998
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006bc05c
R13: fffffffffffffea8 R14: 000000000000000b R15: 000000000001fe40
</TASK>
The root cause is that the worker can force fallback to TCP the first
mptcp subflow, actually deleting the unaccepted msk socket.
We can explicitly prevent the race delaying the unaccepted msk deletion
at listener shutdown time. In case the closed subflow is later accepted,
just drop the mptcp context and let the user-space deal with the
paired mptcp socket.
Fixes: b6985b9b8295 ("mptcp: use the workqueue to destroy unaccepted sockets")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/375
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/mptcp/protocol.c | 68 +++++++++++++++++++++++++++++++++------------------
net/mptcp/protocol.h | 1
net/mptcp/subflow.c | 22 +++++++++-------
3 files changed, 58 insertions(+), 33 deletions(-)
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2330,7 +2330,26 @@ static void __mptcp_close_ssk(struct soc
unsigned int flags)
{
struct mptcp_sock *msk = mptcp_sk(sk);
- bool need_push, dispose_it;
+ bool dispose_it, need_push = false;
+
+ /* If the first subflow moved to a close state before accept, e.g. due
+ * to an incoming reset, mptcp either:
+ * - if either the subflow or the msk are dead, destroy the context
+ * (the subflow socket is deleted by inet_child_forget) and the msk
+ * - otherwise do nothing at the moment and take action at accept and/or
+ * listener shutdown - user-space must be able to accept() the closed
+ * socket.
+ */
+ if (msk->in_accept_queue && msk->first == ssk) {
+ if (!sock_flag(sk, SOCK_DEAD) && !sock_flag(ssk, SOCK_DEAD))
+ return;
+
+ /* ensure later check in mptcp_worker() will dispose the msk */
+ sock_set_flag(sk, SOCK_DEAD);
+ lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
+ mptcp_subflow_drop_ctx(ssk);
+ goto out_release;
+ }
dispose_it = !msk->subflow || ssk != msk->subflow->sk;
if (dispose_it)
@@ -2366,18 +2385,6 @@ static void __mptcp_close_ssk(struct soc
if (!inet_csk(ssk)->icsk_ulp_ops) {
WARN_ON_ONCE(!sock_flag(ssk, SOCK_DEAD));
kfree_rcu(subflow, rcu);
- } else if (msk->in_accept_queue && msk->first == ssk) {
- /* if the first subflow moved to a close state, e.g. due to
- * incoming reset and we reach here before inet_child_forget()
- * the TCP stack could later try to close it via
- * inet_csk_listen_stop(), or deliver it to the user space via
- * accept().
- * We can't delete the subflow - or risk a double free - nor let
- * the msk survive - or will be leaked in the non accept scenario:
- * fallback and let TCP cope with the subflow cleanup.
- */
- WARN_ON_ONCE(sock_flag(ssk, SOCK_DEAD));
- mptcp_subflow_drop_ctx(ssk);
} else {
/* otherwise tcp will dispose of the ssk and subflow ctx */
if (ssk->sk_state == TCP_LISTEN) {
@@ -2391,6 +2398,8 @@ static void __mptcp_close_ssk(struct soc
/* close acquired an extra ref */
__sock_put(ssk);
}
+
+out_release:
release_sock(ssk);
sock_put(ssk);
@@ -2445,21 +2454,14 @@ static void __mptcp_close_subflow(struct
mptcp_close_ssk(sk, ssk, subflow);
}
- /* if the MPC subflow has been closed before the msk is accepted,
- * msk will never be accept-ed, close it now
- */
- if (!msk->first && msk->in_accept_queue) {
- sock_set_flag(sk, SOCK_DEAD);
- inet_sk_state_store(sk, TCP_CLOSE);
- }
}
-static bool mptcp_check_close_timeout(const struct sock *sk)
+static bool mptcp_should_close(const struct sock *sk)
{
s32 delta = tcp_jiffies32 - inet_csk(sk)->icsk_mtup.probe_timestamp;
struct mptcp_subflow_context *subflow;
- if (delta >= TCP_TIMEWAIT_LEN)
+ if (delta >= TCP_TIMEWAIT_LEN || mptcp_sk(sk)->in_accept_queue)
return true;
/* if all subflows are in closed status don't bother with additional
@@ -2667,7 +2669,7 @@ static void mptcp_worker(struct work_str
* even if it is orphaned and in FIN_WAIT2 state
*/
if (sock_flag(sk, SOCK_DEAD)) {
- if (mptcp_check_close_timeout(sk)) {
+ if (mptcp_should_close(sk)) {
inet_sk_state_store(sk, TCP_CLOSE);
mptcp_do_fastclose(sk);
}
@@ -2912,6 +2914,14 @@ static void __mptcp_destroy_sock(struct
sock_put(sk);
}
+void __mptcp_unaccepted_force_close(struct sock *sk)
+{
+ sock_set_flag(sk, SOCK_DEAD);
+ inet_sk_state_store(sk, TCP_CLOSE);
+ mptcp_do_fastclose(sk);
+ __mptcp_destroy_sock(sk);
+}
+
static __poll_t mptcp_check_readable(struct mptcp_sock *msk)
{
/* Concurrent splices from sk_receive_queue into receive_queue will
@@ -3759,6 +3769,18 @@ static int mptcp_stream_accept(struct so
if (!ssk->sk_socket)
mptcp_sock_graft(ssk, newsock);
}
+
+ /* Do late cleanup for the first subflow as necessary. Also
+ * deal with bad peers not doing a complete shutdown.
+ */
+ if (msk->first &&
+ unlikely(inet_sk_state_load(msk->first) == TCP_CLOSE)) {
+ __mptcp_close_ssk(newsk, msk->first,
+ mptcp_subflow_ctx(msk->first), 0);
+ if (unlikely(list_empty(&msk->conn_list)))
+ inet_sk_state_store(newsk, TCP_CLOSE);
+ }
+
release_sock(newsk);
}
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -620,6 +620,7 @@ void mptcp_sock_graft(struct sock *sk, s
struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk);
bool __mptcp_close(struct sock *sk, long timeout);
void mptcp_cancel_work(struct sock *sk);
+void __mptcp_unaccepted_force_close(struct sock *sk);
bool mptcp_addresses_equal(const struct mptcp_addr_info *a,
const struct mptcp_addr_info *b, bool use_port);
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -661,9 +661,12 @@ void mptcp_subflow_drop_ctx(struct sock
if (!ctx)
return;
- subflow_ulp_fallback(ssk, ctx);
- if (ctx->conn)
- sock_put(ctx->conn);
+ list_del(&mptcp_subflow_ctx(ssk)->node);
+ if (inet_csk(ssk)->icsk_ulp_ops) {
+ subflow_ulp_fallback(ssk, ctx);
+ if (ctx->conn)
+ sock_put(ctx->conn);
+ }
kfree_rcu(ctx, rcu);
}
@@ -1763,6 +1766,7 @@ void mptcp_subflow_queue_clean(struct so
struct request_sock_queue *queue = &inet_csk(listener_ssk)->icsk_accept_queue;
struct mptcp_sock *msk, *next, *head = NULL;
struct request_sock *req;
+ struct sock *sk;
/* build a list of all unaccepted mptcp sockets */
spin_lock_bh(&queue->rskq_lock);
@@ -1778,11 +1782,12 @@ void mptcp_subflow_queue_clean(struct so
continue;
/* skip if already in list */
- msk = mptcp_sk(subflow->conn);
+ sk = subflow->conn;
+ msk = mptcp_sk(sk);
if (msk->dl_next || msk == head)
continue;
- sock_hold(subflow->conn);
+ sock_hold(sk);
msk->dl_next = head;
head = msk;
}
@@ -1796,16 +1801,13 @@ void mptcp_subflow_queue_clean(struct so
release_sock(listener_ssk);
for (msk = head; msk; msk = next) {
- struct sock *sk = (struct sock *)msk;
+ sk = (struct sock *)msk;
lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
next = msk->dl_next;
msk->dl_next = NULL;
- /* prevent the stack from later re-schedule the worker for
- * this socket
- */
- inet_sk_state_store(sk, TCP_CLOSE);
+ __mptcp_unaccepted_force_close(sk);
release_sock(sk);
/* lockdep will report a false positive ABBA deadlock
next prev parent reply other threads:[~2023-04-28 11:29 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-28 11:27 [PATCH 6.1 00/16] 6.1.27-rc1 review Greg Kroah-Hartman
2023-04-28 11:27 ` [PATCH 6.1 01/16] um: Only disable SSE on clang to work around old GCC bugs Greg Kroah-Hartman
2023-04-28 11:27 ` [PATCH 6.1 02/16] phy: phy-brcm-usb: Utilize platform_get_irq_byname_optional() Greg Kroah-Hartman
2023-04-28 11:27 ` [PATCH 6.1 03/16] KVM: arm64: Retry fault if vma_lookup() results become invalid Greg Kroah-Hartman
2023-04-28 11:27 ` [PATCH 6.1 04/16] mm/mempolicy: fix use-after-free of VMA iterator Greg Kroah-Hartman
2023-04-28 11:27 ` [PATCH 6.1 05/16] mptcp: stops worker on unaccepted sockets at listener close Greg Kroah-Hartman
2023-04-28 11:27 ` Greg Kroah-Hartman [this message]
2023-04-28 11:27 ` [PATCH 6.1 07/16] wifi: brcmfmac: slab-out-of-bounds read in brcmf_get_assoc_ies() Greg Kroah-Hartman
2023-04-28 11:28 ` [PATCH 6.1 08/16] drm/fb-helper: set x/yres_virtual in drm_fb_helper_check_var Greg Kroah-Hartman
2023-04-28 11:28 ` [PATCH 6.1 09/16] gpiolib: acpi: Add a ignore wakeup quirk for Clevo NL5xNU Greg Kroah-Hartman
2023-04-28 11:28 ` [PATCH 6.1 10/16] bluetooth: Perform careful capability checks in hci_sock_ioctl() Greg Kroah-Hartman
2023-04-28 11:28 ` [PATCH 6.1 11/16] btrfs: fix uninitialized variable warnings Greg Kroah-Hartman
2023-04-28 11:28 ` [PATCH 6.1 12/16] USB: serial: option: add UNISOC vendor and TOZED LT70C product Greg Kroah-Hartman
2023-04-28 11:28 ` [PATCH 6.1 13/16] driver core: Dont require dynamic_debug for initcall_debug probe timing Greg Kroah-Hartman
2023-04-28 11:28 ` [PATCH 6.1 14/16] riscv: Move early dtb mapping into the fixmap region Greg Kroah-Hartman
2023-04-28 11:28 ` [PATCH 6.1 15/16] riscv: Do not set initial_boot_params to the linear address of the dtb Greg Kroah-Hartman
2023-04-28 11:28 ` [PATCH 6.1 16/16] riscv: No need to relocate the dtb as it lies in the fixmap region Greg Kroah-Hartman
2023-04-28 14:35 ` [PATCH 6.1 00/16] 6.1.27-rc1 review Markus Reichelt
2023-04-28 22:06 ` Naresh Kamboju
2023-04-28 22:29 ` Shuah Khan
2023-04-29 4:11 ` Guenter Roeck
2023-04-29 6:08 ` Ron Economos
2023-04-29 7:43 ` Bagas Sanjaya
2023-04-29 9:54 ` Conor Dooley
2023-04-29 9:56 ` ogasawara takeshi
2023-04-29 17:14 ` Florian Fainelli
2023-05-02 5:39 ` Chris Paterson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230428112040.280464855@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=cpaasch@apple.com \
--cc=davem@davemloft.net \
--cc=matthieu.baerts@tessares.net \
--cc=pabeni@redhat.com \
--cc=patches@lists.linux.dev \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).