* [PATCH net 0/6] mptcp: fixes for 6.4
From: Matthieu Baerts @ 2023-06-20 16:24 UTC (permalink / raw)
To: mptcp, Mat Martineau, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Florian Westphal
Cc: netdev, linux-kernel, Matthieu Baerts, stable, Christoph Paasch
Patch 1 correctly handles disconnect() failures that can happen in some
specific cases: now the socket state is set as unconnected, as expected.
That fixes an issue introduced in v6.2.
Patch 2 fixes a divide by zero bug in mptcp_recvmsg() with a fix similar
to a recent one from Eric Dumazet for TCP, introducing the sk_wait_pending
flag. It addresses an issue present in MPTCP almost from the beginning,
since v5.9.
Patch 3 fixes a possible list corruption on passive MPJ. Even if the race
seems very unlikely, better safe than sorry. The possible issue has been
present since v5.17.
Patch 4 consolidates the fallback and non-fallback state machines to avoid
leaking some MPTCP sockets. The fix is likely needed for versions from
v5.11.
Patch 5 drops code that is no longer used after the introduction of
patch 4/6. This is not really a fix, but the patch can probably land in
the -net tree as well, so as not to leave unused code around.
Patch 6 ensures listeners are unhashed before updating their sk status,
to avoid possible deadlocks when diag info is retrieved under a lock.
Even if it should not be visible with the way diag info is currently
gathered, the issue has been present since v5.17.
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
---
Paolo Abeni (6):
mptcp: handle correctly disconnect() failures
mptcp: fix possible divide by zero in recvmsg()
mptcp: fix possible list corruption on passive MPJ
mptcp: consolidate fallback and non fallback state machine
mptcp: drop legacy code around RX EOF
mptcp: ensure listener is unhashed before updating the sk status
net/mptcp/pm_netlink.c | 1 +
net/mptcp/protocol.c | 160 ++++++++++++++++++++-----------------------------
net/mptcp/protocol.h | 5 +-
net/mptcp/subflow.c | 17 +++---
4 files changed, 76 insertions(+), 107 deletions(-)
---
base-commit: 9a43827e876c9a071826cc81783aa2222b020f1d
change-id: 20230620-upstream-net-20230620-misc-fixes-for-v6-4-55ef43802324
Best regards,
--
Matthieu Baerts <matthieu.baerts@tessares.net>
* [PATCH net 1/6] mptcp: handle correctly disconnect() failures
From: Matthieu Baerts @ 2023-06-20 16:24 UTC (permalink / raw)
To: mptcp, Mat Martineau, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Florian Westphal
Cc: netdev, linux-kernel, Matthieu Baerts, stable, Christoph Paasch
From: Paolo Abeni <pabeni@redhat.com>
Currently the mptcp code assumes that disconnect() can fail only
at mptcp_sendmsg_fastopen() time - to avoid a deadlock scenario - and
doesn't even bother returning an error code.
Soon mptcp_disconnect() will handle more error conditions: let's track
them explicitly.
As a bonus, explicitly annotate the TCP-level disconnect as not failing:
the mptcp code never blocks waiting for events on the subflows.
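For context, here is a minimal userspace sketch of the affected flow
(illustrative only, not part of the patch; the loopback address, port and
lack of error handling are assumptions). With TCP_FASTOPEN_CONNECT set on
an MPTCP socket, the actual connect is performed by the first send(); if
that implicit connect fails, this change makes sure the socket is moved
back to an unconnected state, so a later connect() on the same fd can be
retried:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif
#ifndef TCP_FASTOPEN_CONNECT
#define TCP_FASTOPEN_CONNECT 30
#endif

int main(void)
{
	struct sockaddr_in dst = {
		.sin_family = AF_INET,
		.sin_port = htons(8080),		/* assumed test server */
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
	};
	int one = 1;
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
	setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN_CONNECT, &one, sizeof(one));

	/* connect() only records the destination, the SYN is deferred... */
	connect(fd, (struct sockaddr *)&dst, sizeof(dst));

	/* ...the first send() performs the actual (fastopen) connect; if it
	 * fails, mptcp_disconnect() is attempted and, when it succeeds, the
	 * socket goes back to SS_UNCONNECTED instead of staying stuck.
	 */
	if (send(fd, "hello", 5, 0) < 0)
		perror("implicit fastopen connect failed");

	close(fd);
	return 0;
}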
Fixes: 7d803344fdc3 ("mptcp: fix deadlock in fastopen error path")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
---
net/mptcp/protocol.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 67311e7d5b21..86f8a7621aff 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1727,7 +1727,13 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
if (ret && ret != -EINPROGRESS && ret != -ERESTARTSYS && ret != -EINTR)
*copied_syn = 0;
} else if (ret && ret != -EINPROGRESS) {
- mptcp_disconnect(sk, 0);
+ /* The disconnect() op called by tcp_sendmsg_fastopen()/
+ * __inet_stream_connect() can fail, due to the locking check,
+ * see mptcp_disconnect().
+ * Attempt it again outside the problematic scope.
+ */
+ if (!mptcp_disconnect(sk, 0))
+ sk->sk_socket->state = SS_UNCONNECTED;
}
inet_sk(sk)->defer_connect = 0;
@@ -2389,7 +2395,10 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
need_push = (flags & MPTCP_CF_PUSH) && __mptcp_retransmit_pending_data(sk);
if (!dispose_it) {
- tcp_disconnect(ssk, 0);
+ /* The MPTCP code never waits on the subflow sockets, so the TCP-level
+ * disconnect should never fail
+ */
+ WARN_ON_ONCE(tcp_disconnect(ssk, 0));
msk->subflow->state = SS_UNCONNECTED;
mptcp_subflow_ctx_reset(subflow);
release_sock(ssk);
@@ -2812,7 +2821,7 @@ void mptcp_subflow_shutdown(struct sock *sk, struct sock *ssk, int how)
break;
fallthrough;
case TCP_SYN_SENT:
- tcp_disconnect(ssk, O_NONBLOCK);
+ WARN_ON_ONCE(tcp_disconnect(ssk, O_NONBLOCK));
break;
default:
if (__mptcp_check_fallback(mptcp_sk(sk))) {
@@ -3075,11 +3084,10 @@ static int mptcp_disconnect(struct sock *sk, int flags)
/* We are on the fastopen error path. We can't call straight into the
* subflows cleanup code due to lock nesting (we are already under
- * msk->firstsocket lock). Do nothing and leave the cleanup to the
- * caller.
+ * msk->firstsocket lock).
*/
if (msk->fastopening)
- return 0;
+ return -EBUSY;
mptcp_listen_inuse_dec(sk);
inet_sk_state_store(sk, TCP_CLOSE);
--
2.40.1
* [PATCH net 2/6] mptcp: fix possible divide by zero in recvmsg()
From: Matthieu Baerts @ 2023-06-20 16:24 UTC (permalink / raw)
To: mptcp, Mat Martineau, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Florian Westphal
Cc: netdev, linux-kernel, Matthieu Baerts, stable, Christoph Paasch
From: Paolo Abeni <pabeni@redhat.com>
Christoph reported a divide by zero bug in mptcp_recvmsg():
divide error: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 19978 Comm: syz-executor.6 Not tainted 6.4.0-rc2-gffcc7899081b #20
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:__tcp_select_window+0x30e/0x420 net/ipv4/tcp_output.c:3018
Code: 11 ff 0f b7 cd c1 e9 0c b8 ff ff ff ff d3 e0 89 c1 f7 d1 01 cb 21 c3 eb 17 e8 2e 83 11 ff 31 db eb 0e e8 25 83 11 ff 89 d8 99 <f7> 7c 24 04 29 d3 65 48 8b 04 25 28 00 00 00 48 3b 44 24 10 75 60
RSP: 0018:ffffc90000a07a18 EFLAGS: 00010246
RAX: 000000000000ffd7 RBX: 000000000000ffd7 RCX: 0000000000040000
RDX: 0000000000000000 RSI: 000000000003ffff RDI: 0000000000040000
RBP: 000000000000ffd7 R08: ffffffff820cf297 R09: 0000000000000001
R10: 0000000000000000 R11: ffffffff8103d1a0 R12: 0000000000003f00
R13: 0000000000300000 R14: ffff888101cf3540 R15: 0000000000180000
FS: 00007f9af4c09640(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b33824000 CR3: 000000012f241001 CR4: 0000000000170ee0
Call Trace:
<TASK>
__tcp_cleanup_rbuf+0x138/0x1d0 net/ipv4/tcp.c:1611
mptcp_recvmsg+0xcb8/0xdd0 net/mptcp/protocol.c:2034
inet_recvmsg+0x127/0x1f0 net/ipv4/af_inet.c:861
____sys_recvmsg+0x269/0x2b0 net/socket.c:1019
___sys_recvmsg+0xe6/0x260 net/socket.c:2764
do_recvmmsg+0x1a5/0x470 net/socket.c:2858
__do_sys_recvmmsg net/socket.c:2937 [inline]
__se_sys_recvmmsg net/socket.c:2953 [inline]
__x64_sys_recvmmsg+0xa6/0x130 net/socket.c:2953
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x47/0xa0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f9af58fc6a9
Code: 5c c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4f 37 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007f9af4c08cd8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 00000000006bc050 RCX: 00007f9af58fc6a9
RDX: 0000000000000001 RSI: 0000000020000140 RDI: 0000000000000004
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000f00 R11: 0000000000000246 R12: 00000000006bc05c
R13: fffffffffffffea8 R14: 00000000006bc050 R15: 000000000001fe40
</TASK>
mptcp_recvmsg is allowed to release the msk socket lock when
blocking, and before re-acquiring it another thread could have
switched the sock to TCP_LISTEN status - with a prior
connect(AF_UNSPEC) - also clearing icsk_ack.rcv_mss.
Address the issue by preventing the disconnect if some other process is
concurrently performing a blocking syscall on the same socket, similar to
commit 4faeee0cf8a5 ("tcp: deny tcp_disconnect() when threads are waiting").
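As an illustration only (not part of the patch), a userspace sketch of
the two racing calls: one thread blocked in recv() on an MPTCP socket
while another disconnects the same socket with connect(AF_UNSPEC). With
this change the disconnect is refused with EBUSY while the other thread
is still waiting. The socket setup is assumed to have happened elsewhere
and is omitted:

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>

static int fd = -1;	/* assumed: a connected IPPROTO_MPTCP socket */

static void *reader(void *arg)
{
	char buf[128];

	(void)arg;
	/* may block; inside the kernel the msk socket lock is released */
	recv(fd, buf, sizeof(buf), 0);
	return NULL;
}

static void *disconnector(void *arg)
{
	struct sockaddr sa = { .sa_family = AF_UNSPEC };

	(void)arg;
	if (connect(fd, &sa, sizeof(sa)) < 0 && errno == EBUSY)
		printf("disconnect denied while another thread is waiting\n");
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	/* fd setup - socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP), connect(),
	 * etc. - is omitted; only the two racing calls are shown.
	 */
	pthread_create(&a, NULL, reader, NULL);
	pthread_create(&b, NULL, disconnector, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}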
Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning")
Cc: stable@vger.kernel.org
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/404
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
---
net/mptcp/protocol.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 86f8a7621aff..ee357700b27b 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3082,6 +3082,12 @@ static int mptcp_disconnect(struct sock *sk, int flags)
{
struct mptcp_sock *msk = mptcp_sk(sk);
+ /* Deny disconnect if other threads are blocked in sk_wait_event()
+ * or inet_wait_for_connect().
+ */
+ if (sk->sk_wait_pending)
+ return -EBUSY;
+
/* We are on the fastopen error path. We can't call straight into the
* subflows cleanup code due to lock nesting (we are already under
* msk->firstsocket lock).
@@ -3148,6 +3154,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
inet_sk(nsk)->pinet6 = mptcp_inet6_sk(nsk);
#endif
+ nsk->sk_wait_pending = 0;
__mptcp_init_sock(nsk);
msk = mptcp_sk(nsk);
--
2.40.1
* [PATCH net 3/6] mptcp: fix possible list corruption on passive MPJ
From: Matthieu Baerts @ 2023-06-20 16:24 UTC (permalink / raw)
To: mptcp, Mat Martineau, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Florian Westphal
Cc: netdev, linux-kernel, Matthieu Baerts, stable
From: Paolo Abeni <pabeni@redhat.com>
At passive MPJ time, if the msk socket lock is held by the user,
the new subflow is appended to the msk->join_list under the msk
data lock.
In mptcp_release_cb()/__mptcp_flush_join_list(), the subflows in
that list are moved from the join_list into the conn_list under the
msk socket lock.
Append and removal could race, possibly corrupting that list.
Address the issue by splicing the join list into a temporary one while
still under the msk data lock.
Found by code inspection, the race itself should be almost impossible
to trigger in practice.
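A minimal userspace analogue of the pattern used by the fix (a pthread
mutex stands in for the msk data lock; purely illustrative, not kernel
code): producers append to a shared list under the lock, the consumer
splices the whole list onto a private head while still holding the lock,
then walks the private copy after dropping it, so concurrent appends
cannot corrupt the traversal:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
	int id;
	struct node *next;
};

static struct node *join_list;		/* shared, protected by data_lock */
static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;

static void join_list_add(int id)	/* "passive MPJ" side */
{
	struct node *n = malloc(sizeof(*n));

	n->id = id;
	pthread_mutex_lock(&data_lock);
	n->next = join_list;
	join_list = n;
	pthread_mutex_unlock(&data_lock);
}

static void flush_join_list(void)	/* "release callback" side */
{
	struct node *local, *n;

	/* splice the whole list while the lock is held... */
	pthread_mutex_lock(&data_lock);
	local = join_list;
	join_list = NULL;
	pthread_mutex_unlock(&data_lock);

	/* ...and process it without the lock: new appends go to the now
	 * empty shared list and cannot race with this walk
	 */
	while ((n = local)) {
		local = n->next;
		printf("moving subflow %d to conn_list\n", n->id);
		free(n);
	}
}

int main(void)
{
	join_list_add(1);
	join_list_add(2);
	flush_join_list();
	return 0;
}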
Fixes: 3e5014909b56 ("mptcp: cleanup MPJ subflow list handling")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
---
net/mptcp/protocol.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ee357700b27b..9a40dae31cec 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -850,12 +850,12 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
return true;
}
-static void __mptcp_flush_join_list(struct sock *sk)
+static void __mptcp_flush_join_list(struct sock *sk, struct list_head *join_list)
{
struct mptcp_subflow_context *tmp, *subflow;
struct mptcp_sock *msk = mptcp_sk(sk);
- list_for_each_entry_safe(subflow, tmp, &msk->join_list, node) {
+ list_for_each_entry_safe(subflow, tmp, join_list, node) {
struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
bool slow = lock_sock_fast(ssk);
@@ -3342,9 +3342,14 @@ static void mptcp_release_cb(struct sock *sk)
for (;;) {
unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED) |
msk->push_pending;
+ struct list_head join_list;
+
if (!flags)
break;
+ INIT_LIST_HEAD(&join_list);
+ list_splice_init(&msk->join_list, &join_list);
+
/* the following actions acquire the subflow socket lock
*
* 1) can't be invoked in atomic scope
@@ -3355,8 +3360,9 @@ static void mptcp_release_cb(struct sock *sk)
msk->push_pending = 0;
msk->cb_flags &= ~flags;
spin_unlock_bh(&sk->sk_lock.slock);
+
if (flags & BIT(MPTCP_FLUSH_JOIN_LIST))
- __mptcp_flush_join_list(sk);
+ __mptcp_flush_join_list(sk, &join_list);
if (flags & BIT(MPTCP_PUSH_PENDING))
__mptcp_push_pending(sk, 0);
if (flags & BIT(MPTCP_RETRANSMIT))
--
2.40.1
* [PATCH net 4/6] mptcp: consolidate fallback and non fallback state machine
From: Matthieu Baerts @ 2023-06-20 16:24 UTC (permalink / raw)
To: mptcp, Mat Martineau, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Florian Westphal
Cc: netdev, linux-kernel, Matthieu Baerts, stable
From: Paolo Abeni <pabeni@redhat.com>
An orphaned msk releases the used resources via the worker,
when the latter first sees the msk in CLOSED status.
If the msk status transitions to TCP_CLOSE in the release callback
invoked by the worker's final release_sock(), that instance of the
worker will not take any action.
Additionally the MPTCP code prevents scheduling the worker once the
socket reaches the CLOSE status: such msk resources will be leaked.
The only code path that can trigger the above scenario is
__mptcp_check_send_data_fin() in fallback mode.
Address the issue by removing the special handling of fallback sockets
in __mptcp_check_send_data_fin(), consolidating the state machine
for fallback and non-fallback sockets.
Since fallback sockets do not send and do not receive data_fin,
the mptcp code can update the msk internal status to match the next
step in the SM every time a data_fin (ack) should be generated or
received.
As a consequence we can remove a bunch of checks for fallback from
the fastpath.
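As a rough mental model only (names simplified, TIME_WAIT omitted; this
is not the kernel code), the close-side state machine the msk now follows
in both fallback and non-fallback mode can be sketched as:

#include <stdio.h>

enum state { ESTABLISHED, FIN_WAIT1, FIN_WAIT2, CLOSING, LAST_ACK,
	     CLOSE_WAIT, CLOSED };

/* our data_fin has been acked (or, on fallback, the ack is simulated) */
static enum state on_data_fin_acked(enum state s)
{
	switch (s) {
	case FIN_WAIT1:	return FIN_WAIT2;
	case CLOSING:	return CLOSED;	/* TIME_WAIT skipped for brevity */
	case LAST_ACK:	return CLOSED;
	default:	return s;
	}
}

/* a data_fin from the peer was received (or, on fallback, a dummy one) */
static enum state on_data_fin_received(enum state s)
{
	switch (s) {
	case ESTABLISHED:	return CLOSE_WAIT;
	case FIN_WAIT1:		return CLOSING;
	case FIN_WAIT2:		return CLOSED;
	default:		return s;
	}
}

int main(void)
{
	enum state s = FIN_WAIT1;	/* local close: data_fin just sent */

	s = on_data_fin_acked(s);	/* -> FIN_WAIT2 */
	s = on_data_fin_received(s);	/* -> CLOSED */
	printf("final state: %d\n", s);
	return 0;
}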
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
---
net/mptcp/protocol.c | 41 +++++++++++++++--------------------------
net/mptcp/subflow.c | 17 ++++++++++-------
2 files changed, 25 insertions(+), 33 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 9a40dae31cec..27d206f7af62 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -44,7 +44,7 @@ enum {
static struct percpu_counter mptcp_sockets_allocated ____cacheline_aligned_in_smp;
static void __mptcp_destroy_sock(struct sock *sk);
-static void __mptcp_check_send_data_fin(struct sock *sk);
+static void mptcp_check_send_data_fin(struct sock *sk);
DEFINE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
static struct net_device mptcp_napi_dev;
@@ -424,8 +424,7 @@ static bool mptcp_pending_data_fin_ack(struct sock *sk)
{
struct mptcp_sock *msk = mptcp_sk(sk);
- return !__mptcp_check_fallback(msk) &&
- ((1 << sk->sk_state) &
+ return ((1 << sk->sk_state) &
(TCPF_FIN_WAIT1 | TCPF_CLOSING | TCPF_LAST_ACK)) &&
msk->write_seq == READ_ONCE(msk->snd_una);
}
@@ -583,9 +582,6 @@ static bool mptcp_check_data_fin(struct sock *sk)
u64 rcv_data_fin_seq;
bool ret = false;
- if (__mptcp_check_fallback(msk))
- return ret;
-
/* Need to ack a DATA_FIN received from a peer while this side
* of the connection is in ESTABLISHED, FIN_WAIT1, or FIN_WAIT2.
* msk->rcv_data_fin was set when parsing the incoming options
@@ -623,7 +619,8 @@ static bool mptcp_check_data_fin(struct sock *sk)
}
ret = true;
- mptcp_send_ack(msk);
+ if (!__mptcp_check_fallback(msk))
+ mptcp_send_ack(msk);
mptcp_close_wake_up(sk);
}
return ret;
@@ -1609,7 +1606,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
if (!mptcp_timer_pending(sk))
mptcp_reset_timer(sk);
if (do_check_data_fin)
- __mptcp_check_send_data_fin(sk);
+ mptcp_check_send_data_fin(sk);
}
static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool first)
@@ -2680,8 +2677,6 @@ static void mptcp_worker(struct work_struct *work)
if (unlikely((1 << state) & (TCPF_CLOSE | TCPF_LISTEN)))
goto unlock;
- mptcp_check_data_fin_ack(sk);
-
mptcp_check_fastclose(msk);
mptcp_pm_nl_work(msk);
@@ -2689,7 +2684,8 @@ static void mptcp_worker(struct work_struct *work)
if (test_and_clear_bit(MPTCP_WORK_EOF, &msk->flags))
mptcp_check_for_eof(msk);
- __mptcp_check_send_data_fin(sk);
+ mptcp_check_send_data_fin(sk);
+ mptcp_check_data_fin_ack(sk);
mptcp_check_data_fin(sk);
if (test_and_clear_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags))
@@ -2828,6 +2824,12 @@ void mptcp_subflow_shutdown(struct sock *sk, struct sock *ssk, int how)
pr_debug("Fallback");
ssk->sk_shutdown |= how;
tcp_shutdown(ssk, how);
+
+ /* simulate the data_fin ack reception to let the state
+ * machine move forward
+ */
+ WRITE_ONCE(mptcp_sk(sk)->snd_una, mptcp_sk(sk)->snd_nxt);
+ mptcp_schedule_work(sk);
} else {
pr_debug("Sending DATA_FIN on subflow %p", ssk);
tcp_send_ack(ssk);
@@ -2867,7 +2869,7 @@ static int mptcp_close_state(struct sock *sk)
return next & TCP_ACTION_FIN;
}
-static void __mptcp_check_send_data_fin(struct sock *sk)
+static void mptcp_check_send_data_fin(struct sock *sk)
{
struct mptcp_subflow_context *subflow;
struct mptcp_sock *msk = mptcp_sk(sk);
@@ -2885,19 +2887,6 @@ static void __mptcp_check_send_data_fin(struct sock *sk)
WRITE_ONCE(msk->snd_nxt, msk->write_seq);
- /* fallback socket will not get data_fin/ack, can move to the next
- * state now
- */
- if (__mptcp_check_fallback(msk)) {
- WRITE_ONCE(msk->snd_una, msk->write_seq);
- if ((1 << sk->sk_state) & (TCPF_CLOSING | TCPF_LAST_ACK)) {
- inet_sk_state_store(sk, TCP_CLOSE);
- mptcp_close_wake_up(sk);
- } else if (sk->sk_state == TCP_FIN_WAIT1) {
- inet_sk_state_store(sk, TCP_FIN_WAIT2);
- }
- }
-
mptcp_for_each_subflow(msk, subflow) {
struct sock *tcp_sk = mptcp_subflow_tcp_sock(subflow);
@@ -2917,7 +2906,7 @@ static void __mptcp_wr_shutdown(struct sock *sk)
WRITE_ONCE(msk->write_seq, msk->write_seq + 1);
WRITE_ONCE(msk->snd_data_fin_enable, 1);
- __mptcp_check_send_data_fin(sk);
+ mptcp_check_send_data_fin(sk);
}
static void __mptcp_destroy_sock(struct sock *sk)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 4688daa6b38b..d9c8b21c6076 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1749,14 +1749,16 @@ static void subflow_state_change(struct sock *sk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
struct sock *parent = subflow->conn;
+ struct mptcp_sock *msk;
__subflow_state_change(sk);
+ msk = mptcp_sk(parent);
if (subflow_simultaneous_connect(sk)) {
mptcp_propagate_sndbuf(parent, sk);
mptcp_do_fallback(sk);
- mptcp_rcv_space_init(mptcp_sk(parent), sk);
- pr_fallback(mptcp_sk(parent));
+ mptcp_rcv_space_init(msk, sk);
+ pr_fallback(msk);
subflow->conn_finished = 1;
mptcp_set_connected(parent);
}
@@ -1772,11 +1774,12 @@ static void subflow_state_change(struct sock *sk)
subflow_sched_work_if_closed(mptcp_sk(parent), sk);
- if (__mptcp_check_fallback(mptcp_sk(parent)) &&
- !subflow->rx_eof && subflow_is_done(sk)) {
- subflow->rx_eof = 1;
- mptcp_subflow_eof(parent);
- }
+ /* when the fallback subflow closes the rx side, trigger a 'dummy'
+ * ingress data fin, so that the msk state will follow along
+ */
+ if (__mptcp_check_fallback(msk) && subflow_is_done(sk) && msk->first == sk &&
+ mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true))
+ mptcp_schedule_work(parent);
}
void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_ssk)
--
2.40.1
* [PATCH net 5/6] mptcp: drop legacy code around RX EOF
From: Matthieu Baerts @ 2023-06-20 16:24 UTC (permalink / raw)
To: mptcp, Mat Martineau, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Florian Westphal
Cc: netdev, linux-kernel, Matthieu Baerts
From: Paolo Abeni <pabeni@redhat.com>
Thanks to the previous patch -- "mptcp: consolidate fallback and non
fallback state machine" -- we can finally drop the "temporary hack"
used to detect rx eof.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
---
net/mptcp/protocol.c | 49 -------------------------------------------------
net/mptcp/protocol.h | 5 +----
2 files changed, 1 insertion(+), 53 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 27d206f7af62..a66ec341485e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -894,49 +894,6 @@ bool mptcp_schedule_work(struct sock *sk)
return false;
}
-void mptcp_subflow_eof(struct sock *sk)
-{
- if (!test_and_set_bit(MPTCP_WORK_EOF, &mptcp_sk(sk)->flags))
- mptcp_schedule_work(sk);
-}
-
-static void mptcp_check_for_eof(struct mptcp_sock *msk)
-{
- struct mptcp_subflow_context *subflow;
- struct sock *sk = (struct sock *)msk;
- int receivers = 0;
-
- mptcp_for_each_subflow(msk, subflow)
- receivers += !subflow->rx_eof;
- if (receivers)
- return;
-
- if (!(sk->sk_shutdown & RCV_SHUTDOWN)) {
- /* hopefully temporary hack: propagate shutdown status
- * to msk, when all subflows agree on it
- */
- WRITE_ONCE(sk->sk_shutdown, sk->sk_shutdown | RCV_SHUTDOWN);
-
- smp_mb__before_atomic(); /* SHUTDOWN must be visible first */
- sk->sk_data_ready(sk);
- }
-
- switch (sk->sk_state) {
- case TCP_ESTABLISHED:
- inet_sk_state_store(sk, TCP_CLOSE_WAIT);
- break;
- case TCP_FIN_WAIT1:
- inet_sk_state_store(sk, TCP_CLOSING);
- break;
- case TCP_FIN_WAIT2:
- inet_sk_state_store(sk, TCP_CLOSE);
- break;
- default:
- return;
- }
- mptcp_close_wake_up(sk);
-}
-
static struct sock *mptcp_subflow_recv_lookup(const struct mptcp_sock *msk)
{
struct mptcp_subflow_context *subflow;
@@ -2161,9 +2118,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
break;
}
- if (test_and_clear_bit(MPTCP_WORK_EOF, &msk->flags))
- mptcp_check_for_eof(msk);
-
if (sk->sk_shutdown & RCV_SHUTDOWN) {
/* race breaker: the shutdown could be after the
* previous receive queue check
@@ -2681,9 +2635,6 @@ static void mptcp_worker(struct work_struct *work)
mptcp_pm_nl_work(msk);
- if (test_and_clear_bit(MPTCP_WORK_EOF, &msk->flags))
- mptcp_check_for_eof(msk);
-
mptcp_check_send_data_fin(sk);
mptcp_check_data_fin_ack(sk);
mptcp_check_data_fin(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 70c957bc56a8..d3783a7056e1 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -113,7 +113,6 @@
/* MPTCP socket atomic flags */
#define MPTCP_NOSPACE 1
#define MPTCP_WORK_RTX 2
-#define MPTCP_WORK_EOF 3
#define MPTCP_FALLBACK_DONE 4
#define MPTCP_WORK_CLOSE_SUBFLOW 5
@@ -476,14 +475,13 @@ struct mptcp_subflow_context {
send_mp_fail : 1,
send_fastclose : 1,
send_infinite_map : 1,
- rx_eof : 1,
remote_key_valid : 1, /* received the peer key from */
disposable : 1, /* ctx can be free at ulp release time */
stale : 1, /* unable to snd/rcv data, do not use for xmit */
local_id_valid : 1, /* local_id is correctly initialized */
valid_csum_seen : 1, /* at least one csum validated */
is_mptfo : 1, /* subflow is doing TFO */
- __unused : 8;
+ __unused : 9;
enum mptcp_data_avail data_avail;
u32 remote_nonce;
u64 thmac;
@@ -720,7 +718,6 @@ static inline u64 mptcp_expand_seq(u64 old_seq, u64 cur_seq, bool use_64bit)
void __mptcp_check_push(struct sock *sk, struct sock *ssk);
void __mptcp_data_acked(struct sock *sk);
void __mptcp_error_report(struct sock *sk);
-void mptcp_subflow_eof(struct sock *sk);
bool mptcp_update_rcv_data_fin(struct mptcp_sock *msk, u64 data_fin_seq, bool use_64bit);
static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
{
--
2.40.1
* [PATCH net 6/6] mptcp: ensure listener is unhashed before updating the sk status
From: Matthieu Baerts @ 2023-06-20 16:24 UTC (permalink / raw)
To: mptcp, Mat Martineau, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Florian Westphal
Cc: netdev, linux-kernel, Matthieu Baerts, Christoph Paasch, stable
From: Paolo Abeni <pabeni@redhat.com>
The MPTCP protocol accesses the listener subflow in a lockless
manner in a couple of places (poll, diag). That works only if
the msk itself leaves the listener status only after the
subflow itself has been closed/disconnected. Otherwise we risk a
deadlock in diag, as reported by Christoph.
Address the issue by ensuring that the first subflow (the listener
one) is always disconnected before updating the msk socket status.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/407
Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
---
net/mptcp/pm_netlink.c | 1 +
net/mptcp/protocol.c | 31 +++++++++++++++++++------------
2 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 59f8f3124855..1224dfca5bf3 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1047,6 +1047,7 @@ static int mptcp_pm_nl_create_listen_socket(struct sock *sk,
if (err)
return err;
+ inet_sk_state_store(newsk, TCP_LISTEN);
err = kernel_listen(ssock, backlog);
if (err)
return err;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index a66ec341485e..a6c7f2d24909 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2368,13 +2368,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
kfree_rcu(subflow, rcu);
} else {
/* otherwise tcp will dispose of the ssk and subflow ctx */
- if (ssk->sk_state == TCP_LISTEN) {
- tcp_set_state(ssk, TCP_CLOSE);
- mptcp_subflow_queue_clean(sk, ssk);
- inet_csk_listen_stop(ssk);
- mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CLOSED);
- }
-
__tcp_close(ssk, 0);
/* close acquired an extra ref */
@@ -2902,10 +2895,24 @@ static __poll_t mptcp_check_readable(struct mptcp_sock *msk)
return EPOLLIN | EPOLLRDNORM;
}
-static void mptcp_listen_inuse_dec(struct sock *sk)
+static void mptcp_check_listen_stop(struct sock *sk)
{
- if (inet_sk_state_load(sk) == TCP_LISTEN)
- sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
+ struct sock *ssk;
+
+ if (inet_sk_state_load(sk) != TCP_LISTEN)
+ return;
+
+ sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);
+ ssk = mptcp_sk(sk)->first;
+ if (WARN_ON_ONCE(!ssk || inet_sk_state_load(ssk) != TCP_LISTEN))
+ return;
+
+ lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
+ mptcp_subflow_queue_clean(sk, ssk);
+ inet_csk_listen_stop(ssk);
+ mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CLOSED);
+ tcp_set_state(ssk, TCP_CLOSE);
+ release_sock(ssk);
}
bool __mptcp_close(struct sock *sk, long timeout)
@@ -2918,7 +2925,7 @@ bool __mptcp_close(struct sock *sk, long timeout)
WRITE_ONCE(sk->sk_shutdown, SHUTDOWN_MASK);
if ((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) {
- mptcp_listen_inuse_dec(sk);
+ mptcp_check_listen_stop(sk);
inet_sk_state_store(sk, TCP_CLOSE);
goto cleanup;
}
@@ -3035,7 +3042,7 @@ static int mptcp_disconnect(struct sock *sk, int flags)
if (msk->fastopening)
return -EBUSY;
- mptcp_listen_inuse_dec(sk);
+ mptcp_check_listen_stop(sk);
inet_sk_state_store(sk, TCP_CLOSE);
mptcp_stop_timer(sk);
--
2.40.1
* Re: [PATCH net 0/6] mptcp: fixes for 6.4
From: patchwork-bot+netdevbpf @ 2023-06-22 5:50 UTC (permalink / raw)
To: Matthieu Baerts
Cc: mptcp, martineau, davem, edumazet, kuba, pabeni, fw, netdev,
linux-kernel, stable, cpaasch
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 20 Jun 2023 18:24:17 +0200 you wrote:
> Patch 1 correctly handles disconnect() failures that can happen in some
> specific cases: now the socket state is set as unconnected, as expected.
> That fixes an issue introduced in v6.2.
>
> Patch 2 fixes a divide by zero bug in mptcp_recvmsg() with a fix similar
> to a recent one from Eric Dumazet for TCP, introducing the sk_wait_pending
> flag. It addresses an issue present in MPTCP almost from the beginning,
> since v5.9.
>
> [...]
Here is the summary with links:
- [net,1/6] mptcp: handle correctly disconnect() failures
https://git.kernel.org/netdev/net/c/c2b2ae3925b6
- [net,2/6] mptcp: fix possible divide by zero in recvmsg()
https://git.kernel.org/netdev/net/c/0ad529d9fd2b
- [net,3/6] mptcp: fix possible list corruption on passive MPJ
https://git.kernel.org/netdev/net/c/56a666c48b03
- [net,4/6] mptcp: consolidate fallback and non fallback state machine
https://git.kernel.org/netdev/net/c/81c1d0290160
- [net,5/6] mptcp: drop legacy code around RX EOF
https://git.kernel.org/netdev/net/c/b7535cfed223
- [net,6/6] mptcp: ensure listener is unhashed before updating the sk status
https://git.kernel.org/netdev/net/c/57fc0f1ceaa4
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html