* [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races
@ 2025-07-05 7:24 Paolo Abeni
2025-07-05 7:24 ` [PATCH mptcp-net v2 1/5] mptcp: make fallback action and fallback decision atomic Paolo Abeni
` (7 more replies)
0 siblings, 8 replies; 28+ messages in thread
From: Paolo Abeni @ 2025-07-05 7:24 UTC (permalink / raw)
To: mptcp
This series contains 3 fixes somewhat related to various races we have
while handling fallback, and 2 small follow-ups likely more suited for
net-next.
The root cause of the issues addressed here is that the check for
"we can fall back to TCP now" and the related action are not atomic. That
also applies to fallback due to MP_FAIL - where the race window is even
wider.
Address the issue by introducing an additional spinlock to bundle together
all the relevant events, as per patches 1 and 2.
Note that mptcp_disconnect() unconditionally
clears the fallback status (zeroing msk->flags) and that may race with
operations still running on the (closing) subflows.
Such a race is addressed in patch 3.
Patch 4 cleans up the fallback code a bit, introducing a specific MIB
counter for each fallback reason, and patch 5 drops the hopefully now
redundant pr_fallback().
Paolo Abeni (5):
mptcp: make fallback action and fallback decision atomic
mptcp: plug races between subflow fail and subflow creation
mptcp: fix status reset on disconnect()
mptcp: track fallbacks accurately via mibs
mptcp: remove pr_fallback()
net/mptcp/ctrl.c | 4 +-
net/mptcp/mib.c | 5 ++
net/mptcp/mib.h | 7 +++
net/mptcp/options.c | 4 +-
net/mptcp/pm.c | 8 ++-
net/mptcp/protocol.c | 126 +++++++++++++++++++++++++++++++++++--------
net/mptcp/protocol.h | 35 ++++++------
net/mptcp/subflow.c | 35 ++++++------
8 files changed, 164 insertions(+), 60 deletions(-)
--
2.49.0
* [PATCH mptcp-net v2 1/5] mptcp: make fallback action and fallback decision atomic
2025-07-05 7:24 [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races Paolo Abeni
@ 2025-07-05 7:24 ` Paolo Abeni
2025-07-08 14:54 ` Matthieu Baerts
2025-07-05 7:24 ` [PATCH mptcp-net v2 2/5] mptcp: plug races between subflow fail and subflow creation Paolo Abeni
` (6 subsequent siblings)
7 siblings, 1 reply; 28+ messages in thread
From: Paolo Abeni @ 2025-07-05 7:24 UTC (permalink / raw)
To: mptcp
Syzkaller reported the following splat:
WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 __mptcp_do_fallback net/mptcp/protocol.h:1223 [inline]
WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_do_fallback net/mptcp/protocol.h:1244 [inline]
WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 check_fully_established net/mptcp/options.c:982 [inline]
WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153
Modules linked in:
CPU: 1 UID: 0 PID: 7704 Comm: syz.3.1419 Not tainted 6.16.0-rc3-gbd5ce2324dba #20 PREEMPT(voluntary)
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:__mptcp_do_fallback net/mptcp/protocol.h:1223 [inline]
RIP: 0010:mptcp_do_fallback net/mptcp/protocol.h:1244 [inline]
RIP: 0010:check_fully_established net/mptcp/options.c:982 [inline]
RIP: 0010:mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153
Code: 24 18 e8 bb 2a 00 fd e9 1b df ff ff e8 b1 21 0f 00 e8 ec 5f c4 fc 44 0f b7 ac 24 b0 00 00 00 e9 54 f1 ff ff e8 d9 5f c4 fc 90 <0f> 0b 90 e9 b8 f4 ff ff e8 8b 2a 00 fd e9 8d e6 ff ff e8 81 2a 00
RSP: 0018:ffff8880a3f08448 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880180a8000 RCX: ffffffff84afcf45
RDX: ffff888090223700 RSI: ffffffff84afdaa7 RDI: 0000000000000001
RBP: ffff888017955780 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8880180a8910 R14: ffff8880a3e9d058 R15: 0000000000000000
FS: 00005555791b8500(0000) GS:ffff88811c495000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000110c2800b7 CR3: 0000000058e44000 CR4: 0000000000350ef0
Call Trace:
<IRQ>
tcp_reset+0x26f/0x2b0 net/ipv4/tcp_input.c:4432
tcp_validate_incoming+0x1057/0x1b60 net/ipv4/tcp_input.c:5975
tcp_rcv_established+0x5b5/0x21f0 net/ipv4/tcp_input.c:6166
tcp_v4_do_rcv+0x5dc/0xa70 net/ipv4/tcp_ipv4.c:1925
tcp_v4_rcv+0x3473/0x44a0 net/ipv4/tcp_ipv4.c:2363
ip_protocol_deliver_rcu+0xba/0x480 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x2f1/0x500 net/ipv4/ip_input.c:233
NF_HOOK include/linux/netfilter.h:317 [inline]
NF_HOOK include/linux/netfilter.h:311 [inline]
ip_local_deliver+0x1be/0x560 net/ipv4/ip_input.c:254
dst_input include/net/dst.h:469 [inline]
ip_rcv_finish net/ipv4/ip_input.c:447 [inline]
NF_HOOK include/linux/netfilter.h:317 [inline]
NF_HOOK include/linux/netfilter.h:311 [inline]
ip_rcv+0x514/0x810 net/ipv4/ip_input.c:567
__netif_receive_skb_one_core+0x197/0x1e0 net/core/dev.c:5975
__netif_receive_skb+0x1f/0x120 net/core/dev.c:6088
process_backlog+0x301/0x1360 net/core/dev.c:6440
__napi_poll.constprop.0+0xba/0x550 net/core/dev.c:7453
napi_poll net/core/dev.c:7517 [inline]
net_rx_action+0xb44/0x1010 net/core/dev.c:7644
handle_softirqs+0x1d0/0x770 kernel/softirq.c:579
do_softirq+0x3f/0x90 kernel/softirq.c:480
</IRQ>
<TASK>
__local_bh_enable_ip+0xed/0x110 kernel/softirq.c:407
local_bh_enable include/linux/bottom_half.h:33 [inline]
inet_csk_listen_stop+0x2c5/0x1070 net/ipv4/inet_connection_sock.c:1524
mptcp_check_listen_stop.part.0+0x1cc/0x220 net/mptcp/protocol.c:2985
mptcp_check_listen_stop net/mptcp/mib.h:118 [inline]
__mptcp_close+0x9b9/0xbd0 net/mptcp/protocol.c:3000
mptcp_close+0x2f/0x140 net/mptcp/protocol.c:3066
inet_release+0xed/0x200 net/ipv4/af_inet.c:435
inet6_release+0x4f/0x70 net/ipv6/af_inet6.c:487
__sock_release+0xb3/0x270 net/socket.c:649
sock_close+0x1c/0x30 net/socket.c:1439
__fput+0x402/0xb70 fs/file_table.c:465
task_work_run+0x150/0x240 kernel/task_work.c:227
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop+0xd4/0xe0 kernel/entry/common.c:114
exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline]
do_syscall_64+0x245/0x360 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fc92f8a36ad
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcf52802d8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
RAX: 0000000000000000 RBX: 00007ffcf52803a8 RCX: 00007fc92f8a36ad
RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
RBP: 00007fc92fae7ba0 R08: 0000000000000001 R09: 0000002800000000
R10: 00007fc92f700000 R11: 0000000000000246 R12: 00007fc92fae5fac
R13: 00007fc92fae5fa0 R14: 0000000000026d00 R15: 0000000000026c51
</TASK>
irq event stamp: 4068
hardirqs last enabled at (4076): [<ffffffff81544816>] __up_console_sem+0x76/0x80 kernel/printk/printk.c:344
hardirqs last disabled at (4085): [<ffffffff815447fb>] __up_console_sem+0x5b/0x80 kernel/printk/printk.c:342
softirqs last enabled at (3096): [<ffffffff840e1be0>] local_bh_enable include/linux/bottom_half.h:33 [inline]
softirqs last enabled at (3096): [<ffffffff840e1be0>] inet_csk_listen_stop+0x2c0/0x1070 net/ipv4/inet_connection_sock.c:1524
softirqs last disabled at (3097): [<ffffffff813b6b9f>] do_softirq+0x3f/0x90 kernel/softirq.c:480
Since we need to track the 'fallback is possible' condition and the
fallback status separately, there are a few possible races open between
the check and the actual fallback action.
Add a spinlock to protect the fallback-related information and use it
to close all the possible related races. While at it, also remove the
too-early clearing of allow_infinite_fallback in __mptcp_subflow_connect():
the field will be correctly cleared by subflow_finish_connect() if/when
the connection completes successfully.
If fallback is not possible, as per the RFC, reset the current subflow.
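For illustration only, here is a minimal, self-contained userspace sketch
of the pattern this patch adopts (invented names, a pthread mutex standing
in for the new msk spinlock - this is not the kernel code): the "can we
still fall back?" check and the state change acting on it happen under a
single lock, so no other path can observe the stale answer and act on it
later.

#include <pthread.h>
#include <stdbool.h>

struct fb_state {
	pthread_mutex_t lock;
	bool allow_infinite_fallback;	/* can we still fall back? */
	bool fallback_done;		/* has the fallback happened? */
};

/* Return true if the connection is (now) in fallback mode; on false the
 * caller is expected to reset the subflow.
 */
static bool try_fallback(struct fb_state *s)
{
	bool ok;

	pthread_mutex_lock(&s->lock);
	if (s->fallback_done) {
		ok = true;			/* already fell back */
	} else if (!s->allow_infinite_fallback) {
		ok = false;			/* too late to fall back */
	} else {
		s->fallback_done = true;	/* decision + action, atomically */
		ok = true;
	}
	pthread_mutex_unlock(&s->lock);
	return ok;
}

/* Paths that make fallback impossible flip the flag under the same lock. */
static void disallow_fallback(struct fb_state *s)
{
	pthread_mutex_lock(&s->lock);
	s->allow_infinite_fallback = false;
	pthread_mutex_unlock(&s->lock);
}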
Reported-by: Matthieu Baerts <matttbe@kernel.org>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/570
Reported-by: syzbot+5cf807c20386d699b524@syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/555
Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v1 -> v2:
- remove the unneeded and racy clearing of allow_infinite_fallback in
__mptcp_subflow_connect()
- really close the race in subflow_check_data_avail()
Sharing early to feed syzkaller.
Hopefully a follow-up on mptcp-net-next will clean up the fallback
handling a bit; I tried to minimize the delta for backport's sake
---
net/mptcp/options.c | 3 ++-
net/mptcp/protocol.c | 40 +++++++++++++++++++++++++++++++++++-----
net/mptcp/protocol.h | 21 +++++++++++++++------
net/mptcp/subflow.c | 8 ++++----
4 files changed, 56 insertions(+), 16 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 56fd7bcb786c..4680d980e605 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -978,8 +978,9 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
if (subflow->mp_join)
goto reset;
subflow->mp_capable = 0;
+ if (mptcp_do_fallback(ssk))
+ goto reset;
pr_fallback(msk);
- mptcp_do_fallback(ssk);
return false;
}
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1de42dc4e8ea..38bb11865b50 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -560,10 +560,9 @@ static bool mptcp_check_data_fin(struct sock *sk)
static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk)
{
- if (READ_ONCE(msk->allow_infinite_fallback)) {
+ if (mptcp_do_fallback(ssk)) {
MPTCP_INC_STATS(sock_net(ssk),
MPTCP_MIB_DSSCORRUPTIONFALLBACK);
- mptcp_do_fallback(ssk);
} else {
MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DSSCORRUPTIONRESET);
mptcp_subflow_reset(ssk);
@@ -803,6 +802,14 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
if (sk->sk_state != TCP_ESTABLISHED)
return false;
+ spin_lock_bh(&msk->fallback_lock);
+ if (__mptcp_check_fallback(msk)) {
+ spin_unlock_bh(&msk->fallback_lock);
+ return false;
+ }
+ mptcp_subflow_joined(msk, ssk);
+ spin_unlock_bh(&msk->fallback_lock);
+
/* attach to msk socket only after we are sure we will deal with it
* at close time
*/
@@ -811,7 +818,6 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
mptcp_sockopt_sync_locked(msk, ssk);
- mptcp_subflow_joined(msk, ssk);
mptcp_stop_tout_timer(sk);
__mptcp_propagate_sndbuf(sk, ssk);
return true;
@@ -1136,10 +1142,14 @@ static void mptcp_update_infinite_map(struct mptcp_sock *msk,
mpext->infinite_map = 1;
mpext->data_len = 0;
+ if (!mptcp_do_fallback(ssk)) {
+ mptcp_subflow_reset(ssk);
+ return;
+ }
+
MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_INFINITEMAPTX);
mptcp_subflow_ctx(ssk)->send_infinite_map = 0;
pr_fallback(msk);
- mptcp_do_fallback(ssk);
}
#define MPTCP_MAX_GSO_SIZE (GSO_LEGACY_MAX_SIZE - (MAX_TCP_HEADER + 1))
@@ -2543,9 +2553,9 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
static void __mptcp_retrans(struct sock *sk)
{
+ struct mptcp_sendmsg_info info = { .data_lock_held = true, };
struct mptcp_sock *msk = mptcp_sk(sk);
struct mptcp_subflow_context *subflow;
- struct mptcp_sendmsg_info info = {};
struct mptcp_data_frag *dfrag;
struct sock *ssk;
int ret, err;
@@ -2590,6 +2600,18 @@ static void __mptcp_retrans(struct sock *sk)
info.sent = 0;
info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len :
dfrag->already_sent;
+
+ /*
+ * make the whole retrans decision, xmit, disallow
+ * fallback atomic
+ */
+ spin_lock_bh(&msk->fallback_lock);
+ if (__mptcp_check_fallback(msk)) {
+ spin_unlock_bh(&msk->fallback_lock);
+ release_sock(ssk);
+ return;
+ }
+
while (info.sent < info.limit) {
ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
if (ret <= 0)
@@ -2605,6 +2627,7 @@ static void __mptcp_retrans(struct sock *sk)
info.size_goal);
WRITE_ONCE(msk->allow_infinite_fallback, false);
}
+ spin_unlock_bh(&msk->fallback_lock);
release_sock(ssk);
}
@@ -2738,6 +2761,7 @@ static void __mptcp_init_sock(struct sock *sk)
msk->last_ack_recv = tcp_jiffies32;
mptcp_pm_data_init(msk);
+ spin_lock_init(&msk->fallback_lock);
/* re-use the csk retrans timer for MPTCP-level retrans */
timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0);
@@ -3524,7 +3548,13 @@ bool mptcp_finish_join(struct sock *ssk)
/* active subflow, already present inside the conn_list */
if (!list_empty(&subflow->node)) {
+ spin_lock_bh(&msk->fallback_lock);
+ if (__mptcp_check_fallback(msk)) {
+ spin_unlock_bh(&msk->fallback_lock);
+ return false;
+ }
mptcp_subflow_joined(msk, ssk);
+ spin_unlock_bh(&msk->fallback_lock);
mptcp_propagate_sndbuf(parent, ssk);
return true;
}
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e36ae88b10ad..52c9175c7e49 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -352,6 +352,7 @@ struct mptcp_sock {
u32 subflow_id;
u32 setsockopt_seq;
char ca_name[TCP_CA_NAME_MAX];
+ spinlock_t fallback_lock; /* protects fallback && allow_infinite_fallback */
};
#define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)
@@ -1214,15 +1215,21 @@ static inline bool mptcp_check_fallback(const struct sock *sk)
return __mptcp_check_fallback(msk);
}
-static inline void __mptcp_do_fallback(struct mptcp_sock *msk)
+static inline bool __mptcp_do_fallback(struct mptcp_sock *msk)
{
if (__mptcp_check_fallback(msk)) {
pr_debug("TCP fallback already done (msk=%p)\n", msk);
- return;
+ return true;
}
- if (WARN_ON_ONCE(!READ_ONCE(msk->allow_infinite_fallback)))
- return;
+ spin_lock_bh(&msk->fallback_lock);
+ if (!msk->allow_infinite_fallback) {
+ spin_unlock_bh(&msk->fallback_lock);
+ return false;
+ }
+
set_bit(MPTCP_FALLBACK_DONE, &msk->flags);
+ spin_unlock_bh(&msk->fallback_lock);
+ return true;
}
static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
@@ -1234,14 +1241,15 @@ static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
TCPF_SYN_RECV | TCPF_LISTEN));
}
-static inline void mptcp_do_fallback(struct sock *ssk)
+static inline bool mptcp_do_fallback(struct sock *ssk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
struct sock *sk = subflow->conn;
struct mptcp_sock *msk;
msk = mptcp_sk(sk);
- __mptcp_do_fallback(msk);
+ if (!__mptcp_do_fallback(msk))
+ return false;
if (READ_ONCE(msk->snd_data_fin_enable) && !(ssk->sk_shutdown & SEND_SHUTDOWN)) {
gfp_t saved_allocation = ssk->sk_allocation;
@@ -1253,6 +1261,7 @@ static inline void mptcp_do_fallback(struct sock *ssk)
tcp_shutdown(ssk, SEND_SHUTDOWN);
ssk->sk_allocation = saved_allocation;
}
+ return true;
}
#define pr_fallback(a) pr_debug("%s:fallback to TCP (msk=%p)\n", __func__, a)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 15613d691bfe..433c0b25dbc1 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -544,9 +544,11 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
mptcp_get_options(skb, &mp_opt);
if (subflow->request_mptcp) {
if (!(mp_opt.suboptions & OPTION_MPTCP_MPC_SYNACK)) {
+ if (!mptcp_do_fallback(sk))
+ goto do_reset;
+
MPTCP_INC_STATS(sock_net(sk),
MPTCP_MIB_MPCAPABLEACTIVEFALLBACK);
- mptcp_do_fallback(sk);
pr_fallback(msk);
goto fallback;
}
@@ -1395,7 +1397,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
return true;
}
- if (!READ_ONCE(msk->allow_infinite_fallback)) {
+ if (!mptcp_do_fallback(ssk)) {
/* fatal protocol error, close the socket.
* subflow_error_report() will introduce the appropriate barriers
*/
@@ -1414,7 +1416,6 @@ static bool subflow_check_data_avail(struct sock *ssk)
return false;
}
- mptcp_do_fallback(ssk);
}
skb = skb_peek(&ssk->sk_receive_queue);
@@ -1679,7 +1680,6 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_pm_local *local,
/* discard the subflow socket */
mptcp_sock_graft(ssk, sk->sk_socket);
iput(SOCK_INODE(sf));
- WRITE_ONCE(msk->allow_infinite_fallback, false);
mptcp_stop_tout_timer(sk);
return 0;
--
2.49.0
* [PATCH mptcp-net v2 2/5] mptcp: plug races between subflow fail and subflow creation
2025-07-05 7:24 [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races Paolo Abeni
2025-07-05 7:24 ` [PATCH mptcp-net v2 1/5] mptcp: make fallback action and fallback decision atomic Paolo Abeni
@ 2025-07-05 7:24 ` Paolo Abeni
2025-07-08 15:42 ` Matthieu Baerts
2025-07-05 7:24 ` [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect() Paolo Abeni
` (5 subsequent siblings)
7 siblings, 1 reply; 28+ messages in thread
From: Paolo Abeni @ 2025-07-05 7:24 UTC (permalink / raw)
To: mptcp
We have races, similar to the one addressed by the previous patch,
between subflow failure and additional subflow creation. They are
just harder to trigger.
The solution is similar: use a separate flag, protected by the fallback
lock, to track the condition 'the socket state prevents any additional
subflow creation'.
Socket fallback sets such a flag, and so does receiving or sending an
MPTCP MP_FAIL option.
The field 'allow_infinite_fallback' is now always touched under the
relevant lock, so we can drop the ONCE annotation on its writes.
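Extending, for illustration only, the userspace sketch from the previous
patch description (again invented names, not the kernel code): the same
lock now also guards a second flag, so "fall back" and "accept one more
subflow" can never both win the race.

#include <pthread.h>
#include <stdbool.h>

struct fb_state {
	pthread_mutex_t lock;
	bool allow_infinite_fallback;
	bool allow_subflows;
	bool fallback_done;
};

/* Falling back (or accepting an MP_FAIL) also forbids any later subflow. */
static bool try_fallback(struct fb_state *s)
{
	bool ok = false;

	pthread_mutex_lock(&s->lock);
	if (s->fallback_done || s->allow_infinite_fallback) {
		s->allow_subflows = false;
		s->fallback_done = true;
		ok = true;
	}
	pthread_mutex_unlock(&s->lock);
	return ok;
}

/* A new subflow is accepted only while subflows are still allowed; once
 * one is accepted, infinite fallback is no longer possible.
 */
static bool try_join(struct fb_state *s)
{
	bool ok = false;

	pthread_mutex_lock(&s->lock);
	if (s->allow_subflows) {
		s->allow_infinite_fallback = false;
		ok = true;
	}
	pthread_mutex_unlock(&s->lock);
	return ok;
}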
Fixes: 478d770008b0 ("mptcp: send out MP_FAIL when data checksum fails")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/pm.c | 8 +++++++-
net/mptcp/protocol.c | 11 ++++++-----
net/mptcp/protocol.h | 8 +++++++-
net/mptcp/subflow.c | 19 ++++++++++++++-----
4 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 25ce2d5135bc..687dbb59d084 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -765,8 +765,14 @@ void mptcp_pm_mp_fail_received(struct sock *sk, u64 fail_seq)
pr_debug("fail_seq=%llu\n", fail_seq);
- if (!READ_ONCE(msk->allow_infinite_fallback))
+ /* After accepting the fail, we can't create any other subflows */
+ spin_lock_bh(&msk->fallback_lock);
+ if (!msk->allow_infinite_fallback) {
+ spin_unlock_bh(&msk->fallback_lock);
return;
+ }
+ msk->allow_subflows = false;
+ spin_unlock_bh(&msk->fallback_lock);
if (!subflow->fail_tout) {
pr_debug("send MP_FAIL response and infinite map\n");
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 38bb11865b50..71b258aebcd9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -791,7 +791,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
static void mptcp_subflow_joined(struct mptcp_sock *msk, struct sock *ssk)
{
mptcp_subflow_ctx(ssk)->map_seq = READ_ONCE(msk->ack_seq);
- WRITE_ONCE(msk->allow_infinite_fallback, false);
+ msk->allow_infinite_fallback = false;
mptcp_event(MPTCP_EVENT_SUB_ESTABLISHED, msk, ssk, GFP_ATOMIC);
}
@@ -803,7 +803,7 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
return false;
spin_lock_bh(&msk->fallback_lock);
- if (__mptcp_check_fallback(msk)) {
+ if (!msk->allow_subflows) {
spin_unlock_bh(&msk->fallback_lock);
return false;
}
@@ -2625,7 +2625,7 @@ static void __mptcp_retrans(struct sock *sk)
len = max(copied, len);
tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
info.size_goal);
- WRITE_ONCE(msk->allow_infinite_fallback, false);
+ msk->allow_infinite_fallback = false;
}
spin_unlock_bh(&msk->fallback_lock);
@@ -2753,7 +2753,8 @@ static void __mptcp_init_sock(struct sock *sk)
WRITE_ONCE(msk->first, NULL);
inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
WRITE_ONCE(msk->csum_enabled, mptcp_is_checksum_enabled(sock_net(sk)));
- WRITE_ONCE(msk->allow_infinite_fallback, true);
+ msk->allow_infinite_fallback = true;
+ msk->allow_subflows = true;
msk->recovery = false;
msk->subflow_id = 1;
msk->last_data_sent = tcp_jiffies32;
@@ -3549,7 +3550,7 @@ bool mptcp_finish_join(struct sock *ssk)
/* active subflow, already present inside the conn_list */
if (!list_empty(&subflow->node)) {
spin_lock_bh(&msk->fallback_lock);
- if (__mptcp_check_fallback(msk)) {
+ if (!msk->allow_subflows) {
spin_unlock_bh(&msk->fallback_lock);
return false;
}
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 52c9175c7e49..34c0863f1eae 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -348,11 +348,16 @@ struct mptcp_sock {
u64 rtt_us; /* last maximum rtt of subflows */
} rcvq_space;
u8 scaling_ratio;
+ bool allow_subflows;
u32 subflow_id;
u32 setsockopt_seq;
char ca_name[TCP_CA_NAME_MAX];
- spinlock_t fallback_lock; /* protects fallback && allow_infinite_fallback */
+
+ spinlock_t fallback_lock; /* protects fallback,
+ * allow_infinite_fallback and
+ * allow_join
+ */
};
#define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)
@@ -1227,6 +1232,7 @@ static inline bool __mptcp_do_fallback(struct mptcp_sock *msk)
return false;
}
+ msk->allow_subflows = false;
set_bit(MPTCP_FALLBACK_DONE, &msk->flags);
spin_unlock_bh(&msk->fallback_lock);
return true;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 433c0b25dbc1..11c5b042c2e1 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1302,20 +1302,29 @@ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ss
mptcp_schedule_work(sk);
}
-static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)
+static bool mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
unsigned long fail_tout;
+ /* we are really failing, prevent any later subflow join */
+ spin_lock_bh(&msk->fallback_lock);
+ if (!msk->allow_infinite_fallback) {
+ spin_unlock_bh(&msk->fallback_lock);
+ return false;
+ }
+ msk->allow_subflows = false;
+ spin_unlock_bh(&msk->fallback_lock);
+
/* graceful failure can happen only on the MPC subflow */
if (WARN_ON_ONCE(ssk != READ_ONCE(msk->first)))
- return;
+ return false;
/* since the close timeout take precedence on the fail one,
* no need to start the latter when the first is already set
*/
if (sock_flag((struct sock *)msk, SOCK_DEAD))
- return;
+ return true;
/* we don't need extreme accuracy here, use a zero fail_tout as special
* value meaning no fail timeout at all;
@@ -1327,6 +1336,7 @@ static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)
tcp_send_ack(ssk);
mptcp_reset_tout_timer(msk, subflow->fail_tout);
+ return true;
}
static bool subflow_check_data_avail(struct sock *ssk)
@@ -1387,12 +1397,11 @@ static bool subflow_check_data_avail(struct sock *ssk)
(subflow->mp_join || subflow->valid_csum_seen)) {
subflow->send_mp_fail = 1;
- if (!READ_ONCE(msk->allow_infinite_fallback)) {
+ if (!mptcp_subflow_fail(msk, ssk)) {
subflow->reset_transient = 0;
subflow->reset_reason = MPTCP_RST_EMIDDLEBOX;
goto reset;
}
- mptcp_subflow_fail(msk, ssk);
WRITE_ONCE(subflow->data_avail, true);
return true;
}
--
2.49.0
* [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()
2025-07-05 7:24 [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races Paolo Abeni
2025-07-05 7:24 ` [PATCH mptcp-net v2 1/5] mptcp: make fallback action and fallback decision atomic Paolo Abeni
2025-07-05 7:24 ` [PATCH mptcp-net v2 2/5] mptcp: plug races between subflow fail and subflow creation Paolo Abeni
@ 2025-07-05 7:24 ` Paolo Abeni
2025-07-08 15:43 ` Matthieu Baerts
2025-07-05 7:24 ` [PATCH mptcp-net v2 4/5] mptcp: track fallbacks accurately via mibs Paolo Abeni
` (4 subsequent siblings)
7 siblings, 1 reply; 28+ messages in thread
From: Paolo Abeni @ 2025-07-05 7:24 UTC (permalink / raw)
To: mptcp
disconnect() can still race with subflow fallback, leaving the msk in
an inconsistent status. To avoid such a race, disconnect() must be
blocking; error out if the arguments prevent such a wait: the socket
will linger in SS_DISCONNECT state forever.
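As a rough, purely illustrative userspace analogy of the wait-then-reset
flow added here (invented names, a condition variable standing in for
sk_wait_event(), no signal handling): only after every subflow has
quiesced is it safe to clear the fallback-related status.

#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct msk_analogy {
	pthread_mutex_t lock;
	pthread_cond_t subflows_gone;	/* signalled when the last subflow closes */
	bool can_reset_fb_status;	/* no subflow can touch the wire anymore */
	bool fallback_done;
};

/* Return 0 on success, -EAGAIN if waiting is not allowed, -EBUSY if the
 * wait gave up before the subflows quiesced.
 */
static int disconnect_analogy(struct msk_analogy *m, bool nonblock,
			      const struct timespec *deadline)
{
	int err = 0;

	pthread_mutex_lock(&m->lock);
	while (!m->can_reset_fb_status) {
		if (nonblock) {
			err = -EAGAIN;
			break;
		}
		if (pthread_cond_timedwait(&m->subflows_gone, &m->lock,
					   deadline) == ETIMEDOUT) {
			if (!m->can_reset_fb_status)
				err = -EBUSY;
			break;
		}
	}
	if (!err)
		m->fallback_done = false;	/* safe to clear only now */
	pthread_mutex_unlock(&m->lock);
	return err;
}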
Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/protocol.c | 38 ++++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 71b258aebcd9..0dc55a17f274 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3119,8 +3119,19 @@ static void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk)
inet_sk(msk)->inet_rcv_saddr = inet_sk(ssk)->inet_rcv_saddr;
}
+/* can safely clear the fallback status only if no subflow is going to
+ * push or receive any data on the wire
+ */
+static bool mptcp_can_reset_fb_status(const struct mptcp_sock *msk)
+{
+ return list_empty(&msk->conn_list) ||
+ inet_sk_state_load(msk->first) == TCP_CLOSE;
+}
+
static int mptcp_disconnect(struct sock *sk, int flags)
{
+ long timeout = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);
struct mptcp_sock *msk = mptcp_sk(sk);
/* We are on the fastopen error path. We can't call straight into the
@@ -3142,7 +3153,34 @@ static int mptcp_disconnect(struct sock *sk, int flags)
* subflow
*/
mptcp_destroy_common(msk, MPTCP_CF_FASTCLOSE);
+
+ if (!mptcp_can_reset_fb_status(msk)) {
+ if (!timeout)
+ return -EAGAIN;
+
+ add_wait_queue(sk_sleep(sk), &wait);
+
+ /* __mptcp_close_ssk() will wake us */
+ do {
+ if (sk_wait_event(sk, &timeout,
+ mptcp_can_reset_fb_status(msk),
+ &wait))
+ break;
+ } while (!signal_pending(current) && timeout);
+
+ remove_wait_queue(sk_sleep(sk), &wait);
+
+ /* timeout/signal can prevent reaching the expected status */
+ if (!mptcp_can_reset_fb_status(msk))
+ return -EBUSY;
+ }
+
+ spin_lock_bh(&msk->fallback_lock);
+ msk->allow_subflows = true;
+ msk->allow_infinite_fallback = true;
WRITE_ONCE(msk->flags, 0);
+ spin_unlock_bh(&msk->fallback_lock);
+
msk->cb_flags = 0;
msk->recovery = false;
WRITE_ONCE(msk->can_ack, false);
--
2.49.0
* [PATCH mptcp-net v2 4/5] mptcp: track fallbacks accurately via mibs
2025-07-05 7:24 [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races Paolo Abeni
` (2 preceding siblings ...)
2025-07-05 7:24 ` [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect() Paolo Abeni
@ 2025-07-05 7:24 ` Paolo Abeni
2025-07-08 15:56 ` Matthieu Baerts
2025-07-05 7:24 ` [PATCH mptcp-net v2 5/5] mptcp: remove pr_fallback() Paolo Abeni
` (3 subsequent siblings)
7 siblings, 1 reply; 28+ messages in thread
From: Paolo Abeni @ 2025-07-05 7:24 UTC (permalink / raw)
To: mptcp
Add the mibs required to cover the few possible fallback causes still
lacking such info.
Move the relevant mib increments into the fallback helper, so that no
future fallback operation will miss a paired mib increment.
Additionally, track failed fallbacks via their own mib.
While at it, rename an existing helper to reduce long-line problems
all along.
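For illustration only, a tiny sketch of the "count inside the helper"
idea, with invented names and plain counters instead of the kernel
per-CPU MIB machinery: since every caller must pass its reason, no
fallback path can miss the paired increment, and failed attempts get
their own counter.

#include <stdbool.h>

enum fb_reason { FB_DATA, FB_MD5SUM, FB_DSS, FB_SIMULT, FB_REASON_MAX };

static unsigned long fb_count[FB_REASON_MAX];
static unsigned long fb_failed;

struct conn {
	bool allow_infinite_fallback;
	bool fallback_done;
};

static bool do_fallback(struct conn *c, enum fb_reason reason)
{
	if (c->fallback_done)
		return true;
	if (!c->allow_infinite_fallback) {
		fb_failed++;		/* failed fallback gets its own counter */
		return false;
	}
	c->fallback_done = true;
	fb_count[reason]++;		/* increment paired with the action */
	return true;
}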
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
still a couple of long lines I could not trim
---
net/mptcp/ctrl.c | 4 ++--
net/mptcp/mib.c | 5 +++++
net/mptcp/mib.h | 7 +++++++
net/mptcp/options.c | 2 +-
net/mptcp/protocol.c | 44 ++++++++++++++++++++++++++++++--------------
net/mptcp/protocol.h | 31 ++++++++-----------------------
net/mptcp/subflow.c | 10 ++++------
7 files changed, 57 insertions(+), 46 deletions(-)
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index d9290c5bb6c7..fed40dae5583 100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -533,9 +533,9 @@ void mptcp_active_detect_blackhole(struct sock *ssk, bool expired)
to_max = mptcp_get_pernet(net)->syn_retrans_before_tcp_fallback;
if (timeouts == to_max || (timeouts < to_max && expired)) {
- MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP);
subflow->mpc_drop = 1;
- mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow);
+ mptcp_early_fallback(mptcp_sk(subflow->conn), subflow,
+ MPTCP_MIB_MPCAPABLEACTIVEDROP);
}
}
diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index 0c24545f0e8d..064f5eaf0595 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -80,6 +80,11 @@ static const struct snmp_mib mptcp_snmp_list[] = {
SNMP_MIB_ITEM("RcvWndConflict", MPTCP_MIB_RCVWNDCONFLICT),
SNMP_MIB_ITEM("MPCurrEstab", MPTCP_MIB_CURRESTAB),
SNMP_MIB_ITEM("Blackhole", MPTCP_MIB_BLACKHOLE),
+ SNMP_MIB_ITEM("DataFallback", MPTCP_MIB_DATA_FALLBACK),
+ SNMP_MIB_ITEM("MD5sumFallback", MPTCP_MIB_MD5SUM_FALLBACK),
+ SNMP_MIB_ITEM("DssFallback", MPTCP_MIB_DSS_FALLBACK),
+ SNMP_MIB_ITEM("SimultFallback", MPTCP_MIB_SIMULT_FALLBACK),
+ SNMP_MIB_ITEM("FallbackFailed", MPTCP_MIB_FALLBACK_FAILED),
SNMP_MIB_SENTINEL
};
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 250c6b77977e..99c8788694c3 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -81,6 +81,13 @@ enum linux_mptcp_mib_field {
MPTCP_MIB_RCVWNDCONFLICT, /* Conflict with while updating msk rcv wnd */
MPTCP_MIB_CURRESTAB, /* Current established MPTCP connections */
MPTCP_MIB_BLACKHOLE, /* A blackhole has been detected */
+ MPTCP_MIB_DATA_FALLBACK, /* Missing DSS/MPC+data on first
+ * established packet
+ */
+ MPTCP_MIB_MD5SUM_FALLBACK, /* Conflicting TCP option enabled */
+ MPTCP_MIB_DSS_FALLBACK, /* Bad or missing DSS */
+ MPTCP_MIB_SIMULT_FALLBACK, /* Simultaneous connect */
+ MPTCP_MIB_FALLBACK_FAILED, /* Can't fallback due to msk status */
__MPTCP_MIB_MAX
};
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 4680d980e605..c9e1f3ec3772 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -978,7 +978,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
if (subflow->mp_join)
goto reset;
subflow->mp_capable = 0;
- if (mptcp_do_fallback(ssk))
+ if (mptcp_do_fallback(ssk, MPTCP_MIB_DATA_FALLBACK))
goto reset;
pr_fallback(msk);
return false;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0dc55a17f274..5f99f02c827c 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -67,6 +67,27 @@ static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
return &inet_stream_ops;
}
+bool __mptcp_do_fallback(struct mptcp_sock *msk, int fb_mib)
+{
+ struct net *net = sock_net((struct sock *)msk);
+
+ if (__mptcp_check_fallback(msk))
+ return true;
+
+ spin_lock_bh(&msk->fallback_lock);
+ if (!msk->allow_infinite_fallback) {
+ __MPTCP_INC_STATS(net, MPTCP_MIB_SIMULT_FALLBACK);
+ spin_unlock_bh(&msk->fallback_lock);
+ return false;
+ }
+
+ msk->allow_subflows = false;
+ set_bit(MPTCP_FALLBACK_DONE, &msk->flags);
+ __MPTCP_INC_STATS(net, fb_mib);
+ spin_unlock_bh(&msk->fallback_lock);
+ return true;
+}
+
static int __mptcp_socket_create(struct mptcp_sock *msk)
{
struct mptcp_subflow_context *subflow;
@@ -560,10 +581,7 @@ static bool mptcp_check_data_fin(struct sock *sk)
static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk)
{
- if (mptcp_do_fallback(ssk)) {
- MPTCP_INC_STATS(sock_net(ssk),
- MPTCP_MIB_DSSCORRUPTIONFALLBACK);
- } else {
+ if (!mptcp_do_fallback(ssk, MPTCP_MIB_DSSCORRUPTIONFALLBACK)) {
MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DSSCORRUPTIONRESET);
mptcp_subflow_reset(ssk);
}
@@ -1142,12 +1160,11 @@ static void mptcp_update_infinite_map(struct mptcp_sock *msk,
mpext->infinite_map = 1;
mpext->data_len = 0;
- if (!mptcp_do_fallback(ssk)) {
+ if (!mptcp_do_fallback(ssk, MPTCP_MIB_INFINITEMAPTX)) {
mptcp_subflow_reset(ssk);
return;
}
- MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_INFINITEMAPTX);
mptcp_subflow_ctx(ssk)->send_infinite_map = 0;
pr_fallback(msk);
}
@@ -3717,16 +3734,15 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
* TCP option space.
*/
if (rcu_access_pointer(tcp_sk(ssk)->md5sig_info))
- mptcp_subflow_early_fallback(msk, subflow);
+ mptcp_early_fallback(msk, subflow, MPTCP_MIB_MD5SUM_FALLBACK);
#endif
if (subflow->request_mptcp) {
- if (mptcp_active_should_disable(sk)) {
- MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_MPCAPABLEACTIVEDISABLED);
- mptcp_subflow_early_fallback(msk, subflow);
- } else if (mptcp_token_new_connect(ssk) < 0) {
- MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_TOKENFALLBACKINIT);
- mptcp_subflow_early_fallback(msk, subflow);
- }
+ if (mptcp_active_should_disable(sk))
+ mptcp_early_fallback(msk, subflow,
+ MPTCP_MIB_MPCAPABLEACTIVEDISABLED);
+ else if (mptcp_token_new_connect(ssk) < 0)
+ mptcp_early_fallback(msk, subflow,
+ MPTCP_MIB_TOKENFALLBACKINIT);
}
WRITE_ONCE(msk->write_seq, subflow->idsn);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 34c0863f1eae..ceb2f2ba2cc7 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1220,24 +1220,6 @@ static inline bool mptcp_check_fallback(const struct sock *sk)
return __mptcp_check_fallback(msk);
}
-static inline bool __mptcp_do_fallback(struct mptcp_sock *msk)
-{
- if (__mptcp_check_fallback(msk)) {
- pr_debug("TCP fallback already done (msk=%p)\n", msk);
- return true;
- }
- spin_lock_bh(&msk->fallback_lock);
- if (!msk->allow_infinite_fallback) {
- spin_unlock_bh(&msk->fallback_lock);
- return false;
- }
-
- msk->allow_subflows = false;
- set_bit(MPTCP_FALLBACK_DONE, &msk->flags);
- spin_unlock_bh(&msk->fallback_lock);
- return true;
-}
-
static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
{
struct sock *ssk = READ_ONCE(msk->first);
@@ -1247,14 +1229,16 @@ static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
TCPF_SYN_RECV | TCPF_LISTEN));
}
-static inline bool mptcp_do_fallback(struct sock *ssk)
+bool __mptcp_do_fallback(struct mptcp_sock *msk, int fb_mib);
+
+static inline bool mptcp_do_fallback(struct sock *ssk, int fb_mib)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
struct sock *sk = subflow->conn;
struct mptcp_sock *msk;
msk = mptcp_sk(sk);
- if (!__mptcp_do_fallback(msk))
+ if (!__mptcp_do_fallback(msk, fb_mib))
return false;
if (READ_ONCE(msk->snd_data_fin_enable) && !(ssk->sk_shutdown & SEND_SHUTDOWN)) {
gfp_t saved_allocation = ssk->sk_allocation;
@@ -1272,12 +1256,13 @@ static inline bool mptcp_do_fallback(struct sock *ssk)
#define pr_fallback(a) pr_debug("%s:fallback to TCP (msk=%p)\n", __func__, a)
-static inline void mptcp_subflow_early_fallback(struct mptcp_sock *msk,
- struct mptcp_subflow_context *subflow)
+static inline void mptcp_early_fallback(struct mptcp_sock *msk,
+ struct mptcp_subflow_context *subflow,
+ int fb_mib)
{
pr_fallback(msk);
subflow->request_mptcp = 0;
- __mptcp_do_fallback(msk);
+ __mptcp_do_fallback(msk, fb_mib);
}
static inline bool mptcp_check_infinite_map(struct sk_buff *skb)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 11c5b042c2e1..e140d032eb3e 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -544,11 +544,10 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
mptcp_get_options(skb, &mp_opt);
if (subflow->request_mptcp) {
if (!(mp_opt.suboptions & OPTION_MPTCP_MPC_SYNACK)) {
- if (!mptcp_do_fallback(sk))
+ if (!mptcp_do_fallback(sk,
+ MPTCP_MIB_MPCAPABLEACTIVEFALLBACK))
goto do_reset;
- MPTCP_INC_STATS(sock_net(sk),
- MPTCP_MIB_MPCAPABLEACTIVEFALLBACK);
pr_fallback(msk);
goto fallback;
}
@@ -1406,7 +1405,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
return true;
}
- if (!mptcp_do_fallback(ssk)) {
+ if (!mptcp_do_fallback(ssk, MPTCP_MIB_DSS_FALLBACK)) {
/* fatal protocol error, close the socket.
* subflow_error_report() will introduce the appropriate barriers
*/
@@ -1424,7 +1423,6 @@ static bool subflow_check_data_avail(struct sock *ssk)
WRITE_ONCE(subflow->data_avail, false);
return false;
}
-
}
skb = skb_peek(&ssk->sk_receive_queue);
@@ -1860,7 +1858,7 @@ static void subflow_state_change(struct sock *sk)
msk = mptcp_sk(parent);
if (subflow_simultaneous_connect(sk)) {
- mptcp_do_fallback(sk);
+ mptcp_do_fallback(sk, MPTCP_MIB_SIMULT_FALLBACK);
pr_fallback(msk);
subflow->conn_finished = 1;
mptcp_propagate_state(parent, sk, subflow, NULL);
--
2.49.0
* [PATCH mptcp-net v2 5/5] mptcp: remove pr_fallback()
2025-07-05 7:24 [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races Paolo Abeni
` (3 preceding siblings ...)
2025-07-05 7:24 ` [PATCH mptcp-net v2 4/5] mptcp: track fallbacks accurately via mibs Paolo Abeni
@ 2025-07-05 7:24 ` Paolo Abeni
2025-07-05 18:04 ` kernel test robot
2025-07-08 15:57 ` Matthieu Baerts
2025-07-07 7:43 ` [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races MPTCP CI
` (2 subsequent siblings)
7 siblings, 2 replies; 28+ messages in thread
From: Paolo Abeni @ 2025-07-05 7:24 UTC (permalink / raw)
To: mptcp
We can now fully track the fallback status of a given connection
via the relevant mibs; the mentioned helper is redundant. Remove
it completely.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/options.c | 1 -
net/mptcp/protocol.c | 1 -
net/mptcp/protocol.h | 3 ---
net/mptcp/subflow.c | 2 --
4 files changed, 7 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index c9e1f3ec3772..878bb147409c 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -980,7 +980,6 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
subflow->mp_capable = 0;
if (mptcp_do_fallback(ssk, MPTCP_MIB_DATA_FALLBACK))
goto reset;
- pr_fallback(msk);
return false;
}
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 5f99f02c827c..397a135cb811 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1166,7 +1166,6 @@ static void mptcp_update_infinite_map(struct mptcp_sock *msk,
}
mptcp_subflow_ctx(ssk)->send_infinite_map = 0;
- pr_fallback(msk);
}
#define MPTCP_MAX_GSO_SIZE (GSO_LEGACY_MAX_SIZE - (MAX_TCP_HEADER + 1))
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index ceb2f2ba2cc7..d39991c90582 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1254,13 +1254,10 @@ static inline bool mptcp_do_fallback(struct sock *ssk, int fb_mib)
return true;
}
-#define pr_fallback(a) pr_debug("%s:fallback to TCP (msk=%p)\n", __func__, a)
-
static inline void mptcp_early_fallback(struct mptcp_sock *msk,
struct mptcp_subflow_context *subflow,
int fb_mib)
{
- pr_fallback(msk);
subflow->request_mptcp = 0;
__mptcp_do_fallback(msk, fb_mib);
}
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index e140d032eb3e..af9f09b8736a 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -548,7 +548,6 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
MPTCP_MIB_MPCAPABLEACTIVEFALLBACK))
goto do_reset;
- pr_fallback(msk);
goto fallback;
}
@@ -1859,7 +1858,6 @@ static void subflow_state_change(struct sock *sk)
msk = mptcp_sk(parent);
if (subflow_simultaneous_connect(sk)) {
mptcp_do_fallback(sk, MPTCP_MIB_SIMULT_FALLBACK);
- pr_fallback(msk);
subflow->conn_finished = 1;
mptcp_propagate_state(parent, sk, subflow, NULL);
}
--
2.49.0
* Re: [PATCH mptcp-net v2 5/5] mptcp: remove pr_fallback()
2025-07-05 7:24 ` [PATCH mptcp-net v2 5/5] mptcp: remove pr_fallback() Paolo Abeni
@ 2025-07-05 18:04 ` kernel test robot
2025-07-08 15:57 ` Matthieu Baerts
1 sibling, 0 replies; 28+ messages in thread
From: kernel test robot @ 2025-07-05 18:04 UTC (permalink / raw)
To: Paolo Abeni, mptcp; +Cc: oe-kbuild-all
Hi Paolo,
kernel test robot noticed the following build warnings:
[auto build test WARNING on mptcp/export]
[also build test WARNING on mptcp/export-net linus/master v6.16-rc4 next-20250704]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Paolo-Abeni/mptcp-make-fallback-action-and-fallback-decision-atomic/20250705-152610
base: https://github.com/multipath-tcp/mptcp_net-next.git export
patch link: https://lore.kernel.org/r/189eef71d3406ce2931e6d9442c0c5bba1df42fa.1751700000.git.pabeni%40redhat.com
patch subject: [PATCH mptcp-net v2 5/5] mptcp: remove pr_fallback()
config: arm-randconfig-001-20250705 (https://download.01.org/0day-ci/archive/20250706/202507060115.0WOIqNXG-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 12.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250706/202507060115.0WOIqNXG-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507060115.0WOIqNXG-lkp@intel.com/
All warnings (new ones prefixed by >>):
net/mptcp/subflow.c: In function 'subflow_state_change':
>> net/mptcp/subflow.c:1854:28: warning: variable 'msk' set but not used [-Wunused-but-set-variable]
1854 | struct mptcp_sock *msk;
| ^~~
vim +/msk +1854 net/mptcp/subflow.c
648ef4b88673dad Mat Martineau 2020-01-21 1849
648ef4b88673dad Mat Martineau 2020-01-21 1850 static void subflow_state_change(struct sock *sk)
648ef4b88673dad Mat Martineau 2020-01-21 1851 {
648ef4b88673dad Mat Martineau 2020-01-21 1852 struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
dc093db5cc052b7 Paolo Abeni 2020-03-13 1853 struct sock *parent = subflow->conn;
81c1d029016001f Paolo Abeni 2023-06-20 @1854 struct mptcp_sock *msk;
648ef4b88673dad Mat Martineau 2020-01-21 1855
648ef4b88673dad Mat Martineau 2020-01-21 1856 __subflow_state_change(sk);
648ef4b88673dad Mat Martineau 2020-01-21 1857
81c1d029016001f Paolo Abeni 2023-06-20 1858 msk = mptcp_sk(parent);
8fd738049ac3d67 Davide Caratti 2020-06-29 1859 if (subflow_simultaneous_connect(sk)) {
be9cad90d7343cf Paolo Abeni 2025-07-05 1860 mptcp_do_fallback(sk, MPTCP_MIB_SIMULT_FALLBACK);
8fd738049ac3d67 Davide Caratti 2020-06-29 1861 subflow->conn_finished = 1;
e4a0fa47e816e18 Paolo Abeni 2024-02-08 1862 mptcp_propagate_state(parent, sk, subflow, NULL);
8fd738049ac3d67 Davide Caratti 2020-06-29 1863 }
8fd738049ac3d67 Davide Caratti 2020-06-29 1864
648ef4b88673dad Mat Martineau 2020-01-21 1865 /* as recvmsg() does not acquire the subflow socket for ssk selection
648ef4b88673dad Mat Martineau 2020-01-21 1866 * a fin packet carrying a DSS can be unnoticed if we don't trigger
648ef4b88673dad Mat Martineau 2020-01-21 1867 * the data available machinery here.
648ef4b88673dad Mat Martineau 2020-01-21 1868 */
e1ff9e82e2ea53d Davide Caratti 2020-06-29 1869 if (mptcp_subflow_data_available(sk))
2e52213c79c0b94 Florian Westphal 2020-02-26 1870 mptcp_data_ready(parent, sk);
499ada5073361c6 Paolo Abeni 2021-06-10 1871 else if (unlikely(sk->sk_err))
499ada5073361c6 Paolo Abeni 2021-06-10 1872 subflow_error_report(sk);
648ef4b88673dad Mat Martineau 2020-01-21 1873
40947e13997a1cb Florian Westphal 2021-02-12 1874 subflow_sched_work_if_closed(mptcp_sk(parent), sk);
648ef4b88673dad Mat Martineau 2020-01-21 1875 }
648ef4b88673dad Mat Martineau 2020-01-21 1876
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races
2025-07-05 7:24 [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races Paolo Abeni
` (4 preceding siblings ...)
2025-07-05 7:24 ` [PATCH mptcp-net v2 5/5] mptcp: remove pr_fallback() Paolo Abeni
@ 2025-07-07 7:43 ` MPTCP CI
2025-07-07 8:47 ` MPTCP CI
2025-07-08 15:41 ` Matthieu Baerts
7 siblings, 0 replies; 28+ messages in thread
From: MPTCP CI @ 2025-07-07 7:43 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
Thank you for your modifications, that's great!
But sadly, our CI spotted some issues with it when trying to build it.
You can find more details there:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/16110745400
Status: failure
Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/54ab1b69acaf
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=979295
Feel free to reply to this email if you cannot access logs, if you need
some support to fix the error, if this doesn't seem to be caused by your
modifications or if the error is a false positive one.
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
* Re: [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races
2025-07-05 7:24 [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races Paolo Abeni
` (5 preceding siblings ...)
2025-07-07 7:43 ` [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races MPTCP CI
@ 2025-07-07 8:47 ` MPTCP CI
2025-07-08 15:41 ` Matthieu Baerts
7 siblings, 0 replies; 28+ messages in thread
From: MPTCP CI @ 2025-07-07 8:47 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/16110745406
Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/54ab1b69acaf
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=979295
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
* Re: [PATCH mptcp-net v2 1/5] mptcp: make fallback action and fallback decision atomic
2025-07-05 7:24 ` [PATCH mptcp-net v2 1/5] mptcp: make fallback action and fallback decision atomic Paolo Abeni
@ 2025-07-08 14:54 ` Matthieu Baerts
2025-07-08 15:13 ` Matthieu Baerts
2025-07-09 9:09 ` Paolo Abeni
0 siblings, 2 replies; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-08 14:54 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
Hi Paolo,
On 05/07/2025 09:24, Paolo Abeni wrote:
> Syzkaller reported the following splat:
Thank you for working on that!
(...)
> Since we need to track the 'fallback is possible' condition and the
> fallback status separately, there are a few possible races open between
> the check and the actual fallback action.
>
> Add a spinlock to protect the fallback related information and use
> it close all the possible related races. While at it also remove the
> too-early clearing of allow_infinite_fallback in __mptcp_subflow_connect():
> the field will be correctly cleared by subflow_finish_connect() if/
> when the connection will complete successfully.
Just to be sure, will we correctly handle the case when:
- a SYN + MP_JOIN is sent
- an issue is detected, a fallback is done
- a SYN + ACK + MP_JOIN is received (or later on in the join process)?
> If fallback is not possible, as per RFC, reset the current subflow.
>
> Reported by: Matthieu Baerts <matttbe@kernel.org>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/570
> Reported-by: syzbot+5cf807c20386d699b524@syzkaller.appspotmail.com
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/555
> Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v1 -> v2:
> - remove unneeded and racing allow_infinite_fallback in
> __mptcp_subflow_connect()
> - really close race in subflow_check_data_avail()
>
> Sharing earlier to feed syzkaller.
> Hopefully a follow-up on mptcp-net-next will cleanup a bit the fallback
> handling, I tried to minimize the delta for backport's sake
I fed my 4 syzkaller instances 1 day and 6 hours ago, and everything
looks good so far. But please note that (1) it was not easy for them to
reproduce the issue, and (2) you removed the warning that was triggering
the splat :)
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index 56fd7bcb786c..4680d980e605 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -978,8 +978,9 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
> if (subflow->mp_join)
> goto reset;
> subflow->mp_capable = 0;
> + if (mptcp_do_fallback(ssk))
It looks strange: in case of success, we reset?
Note that the selftests don't validate this branch:
https://ci-results.mptcp.dev/lcov/export/mptcp/options.c.gcov.html#L977
(also, I wonder if we should not have a MIB counter here)
> + goto reset;
> pr_fallback(msk);
> - mptcp_do_fallback(ssk);
> return false;
> }
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 1de42dc4e8ea..38bb11865b50 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -560,10 +560,9 @@ static bool mptcp_check_data_fin(struct sock *sk)
>
> static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk)
> {
> - if (READ_ONCE(msk->allow_infinite_fallback)) {
> + if (mptcp_do_fallback(ssk)) {
> MPTCP_INC_STATS(sock_net(ssk),
> MPTCP_MIB_DSSCORRUPTIONFALLBACK);
> - mptcp_do_fallback(ssk);
> } else {
> MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DSSCORRUPTIONRESET);
> mptcp_subflow_reset(ssk);
> @@ -803,6 +802,14 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
> if (sk->sk_state != TCP_ESTABLISHED)
> return false;
>
> + spin_lock_bh(&msk->fallback_lock);
> + if (__mptcp_check_fallback(msk)) {
> + spin_unlock_bh(&msk->fallback_lock);
> + return false;
(I wonder if we should not have a MIB counter here as well)
> + }
> + mptcp_subflow_joined(msk, ssk);
> + spin_unlock_bh(&msk->fallback_lock);
> +
> /* attach to msk socket only after we are sure we will deal with it
> * at close time
> */
> @@ -811,7 +818,6 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
>
> mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
> mptcp_sockopt_sync_locked(msk, ssk);
> - mptcp_subflow_joined(msk, ssk);
> mptcp_stop_tout_timer(sk);
> __mptcp_propagate_sndbuf(sk, ssk);
> return true;
> @@ -1136,10 +1142,14 @@ static void mptcp_update_infinite_map(struct mptcp_sock *msk,
> mpext->infinite_map = 1;
> mpext->data_len = 0;
>
> + if (!mptcp_do_fallback(ssk)) {
> + mptcp_subflow_reset(ssk);
(I wonder if we should not have a MIB counter here as well)
> + return;
> + }
> +
> MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_INFINITEMAPTX);
> mptcp_subflow_ctx(ssk)->send_infinite_map = 0;
> pr_fallback(msk);
> - mptcp_do_fallback(ssk);
> }
>
> #define MPTCP_MAX_GSO_SIZE (GSO_LEGACY_MAX_SIZE - (MAX_TCP_HEADER + 1))
> @@ -2543,9 +2553,9 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
>
> static void __mptcp_retrans(struct sock *sk)
> {
> + struct mptcp_sendmsg_info info = { .data_lock_held = true, };
Just to be sure: is this modification here linked to the rest?
> struct mptcp_sock *msk = mptcp_sk(sk);
> struct mptcp_subflow_context *subflow;
> - struct mptcp_sendmsg_info info = {};
> struct mptcp_data_frag *dfrag;
> struct sock *ssk;
> int ret, err;
> @@ -2590,6 +2600,18 @@ static void __mptcp_retrans(struct sock *sk)
> info.sent = 0;
> info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len :
> dfrag->already_sent;
> +
> + /*
> + * make the whole retrans decision, xmit, disallow
> + * fallback atomic
> + */
> + spin_lock_bh(&msk->fallback_lock);
> + if (__mptcp_check_fallback(msk)) {
> + spin_unlock_bh(&msk->fallback_lock);
> + release_sock(ssk);
> + return;
> + }
> +
> while (info.sent < info.limit) {
> ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
> if (ret <= 0)
> @@ -2605,6 +2627,7 @@ static void __mptcp_retrans(struct sock *sk)
> info.size_goal);
> WRITE_ONCE(msk->allow_infinite_fallback, false);
> }
> + spin_unlock_bh(&msk->fallback_lock);
>
> release_sock(ssk);
> }
> @@ -2738,6 +2761,7 @@ static void __mptcp_init_sock(struct sock *sk)
> msk->last_ack_recv = tcp_jiffies32;
>
> mptcp_pm_data_init(msk);
> + spin_lock_init(&msk->fallback_lock);
>
> /* re-use the csk retrans timer for MPTCP-level retrans */
> timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0);
> @@ -3524,7 +3548,13 @@ bool mptcp_finish_join(struct sock *ssk)
>
> /* active subflow, already present inside the conn_list */
> if (!list_empty(&subflow->node)) {
> + spin_lock_bh(&msk->fallback_lock);
> + if (__mptcp_check_fallback(msk)) {
> + spin_unlock_bh(&msk->fallback_lock);
> + return false;
> + }
> mptcp_subflow_joined(msk, ssk);
> + spin_unlock_bh(&msk->fallback_lock);
Small detail: why not modify mptcp_subflow_joined() to have the
fallback check there? (+ rename the function?)
If it returns false, we can abort. (Same above where
mptcp_subflow_joined() is used.)
> mptcp_propagate_sndbuf(parent, ssk);
> return true;
> }
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index e36ae88b10ad..52c9175c7e49 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -352,6 +352,7 @@ struct mptcp_sock {
> u32 subflow_id;
> u32 setsockopt_seq;
> char ca_name[TCP_CA_NAME_MAX];
> + spinlock_t fallback_lock; /* protects fallback && allow_infinite_fallback */
> };
>
> #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)
> @@ -1214,15 +1215,21 @@ static inline bool mptcp_check_fallback(const struct sock *sk)
> return __mptcp_check_fallback(msk);
> }
>
> -static inline void __mptcp_do_fallback(struct mptcp_sock *msk)
> +static inline bool __mptcp_do_fallback(struct mptcp_sock *msk)
I wonder if this function should not be renamed, to help with the
backports, but also because we don't do a fallback if we cannot do it:
it is more "try to fall back, but first check whether it is possible" :)
Same for mptcp_do_fallback(). WDYT?
> {
> if (__mptcp_check_fallback(msk)) {
> pr_debug("TCP fallback already done (msk=%p)\n", msk);
> - return;
> + return true;
> }
> - if (WARN_ON_ONCE(!READ_ONCE(msk->allow_infinite_fallback)))
> - return;
> + spin_lock_bh(&msk->fallback_lock);
> + if (!msk->allow_infinite_fallback) {
> + spin_unlock_bh(&msk->fallback_lock);
> + return false;
> + }
> +
> set_bit(MPTCP_FALLBACK_DONE, &msk->flags);
> + spin_unlock_bh(&msk->fallback_lock);
> + return true;
> }
>
> static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
> @@ -1234,14 +1241,15 @@ static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
> TCPF_SYN_RECV | TCPF_LISTEN));
> }
>
> -static inline void mptcp_do_fallback(struct sock *ssk)
> +static inline bool mptcp_do_fallback(struct sock *ssk)
> {
> struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> struct sock *sk = subflow->conn;
> struct mptcp_sock *msk;
>
> msk = mptcp_sk(sk);
> - __mptcp_do_fallback(msk);
> + if (!__mptcp_do_fallback(msk))
> + return false;
> if (READ_ONCE(msk->snd_data_fin_enable) && !(ssk->sk_shutdown & SEND_SHUTDOWN)) {
> gfp_t saved_allocation = ssk->sk_allocation;
>
> @@ -1253,6 +1261,7 @@ static inline void mptcp_do_fallback(struct sock *ssk)
> tcp_shutdown(ssk, SEND_SHUTDOWN);
> ssk->sk_allocation = saved_allocation;
> }
> + return true;
> }
>
> #define pr_fallback(a) pr_debug("%s:fallback to TCP (msk=%p)\n", __func__, a)
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 15613d691bfe..433c0b25dbc1 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -544,9 +544,11 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> mptcp_get_options(skb, &mp_opt);
> if (subflow->request_mptcp) {
> if (!(mp_opt.suboptions & OPTION_MPTCP_MPC_SYNACK)) {
> + if (!mptcp_do_fallback(sk))
> + goto do_reset;
(I wonder if we should not have a MIB counter here as well)
> +
> MPTCP_INC_STATS(sock_net(sk),
> MPTCP_MIB_MPCAPABLEACTIVEFALLBACK);
> - mptcp_do_fallback(sk);
> pr_fallback(msk);
> goto fallback;
> }
> @@ -1395,7 +1397,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
> return true;
> }
>
> - if (!READ_ONCE(msk->allow_infinite_fallback)) {
> + if (!mptcp_do_fallback(ssk)) {
> /* fatal protocol error, close the socket.
> * subflow_error_report() will introduce the appropriate barriers
> */
> @@ -1414,7 +1416,6 @@ static bool subflow_check_data_avail(struct sock *ssk)
> return false;
> }
>
> - mptcp_do_fallback(ssk);
> }
>
> skb = skb_peek(&ssk->sk_receive_queue);
> @@ -1679,7 +1680,6 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_pm_local *local,
> /* discard the subflow socket */
> mptcp_sock_graft(ssk, sk->sk_socket);
> iput(SOCK_INODE(sf));
> - WRITE_ONCE(msk->allow_infinite_fallback, false);
> mptcp_stop_tout_timer(sk);
> return 0;
>
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 1/5] mptcp: make fallback action and fallback decision atomic
2025-07-08 14:54 ` Matthieu Baerts
@ 2025-07-08 15:13 ` Matthieu Baerts
2025-07-09 9:09 ` Paolo Abeni
1 sibling, 0 replies; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-08 15:13 UTC (permalink / raw)
To: Paolo Abeni; +Cc: mptcp
On 08/07/2025 16:54, Matthieu Baerts wrote:
(...)
>> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
>> index 56fd7bcb786c..4680d980e605 100644
>> --- a/net/mptcp/options.c
>> +++ b/net/mptcp/options.c
>> @@ -978,8 +978,9 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
>> if (subflow->mp_join)
>> goto reset;
>> subflow->mp_capable = 0;
>> + if (mptcp_do_fallback(ssk))
>
> It looks strange: in case of success, we reset?
>
> Note that the selftests don't validate this branch:
>
> https://ci-results.mptcp.dev/lcov/export/mptcp/options.c.gcov.html#L977
>
> (also, I wonder if we should not have a MIB counter here)
Sorry, I just noticed your patch 4/5. I pressed "Send" too quickly :)
(same below of course)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races
2025-07-05 7:24 [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races Paolo Abeni
` (6 preceding siblings ...)
2025-07-07 8:47 ` MPTCP CI
@ 2025-07-08 15:41 ` Matthieu Baerts
7 siblings, 0 replies; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-08 15:41 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 05/07/2025 09:24, Paolo Abeni wrote:
> This series contains 3 fixes somewhat related to various races we have
> while handling fallback and 2 small follow-up likely more suited for
> net-next.
>
> The root cause of the issues addressed here is that the check for
> "we can fallback to tcp now" and the related action are not atomic. That
> also applies to fallback due to MP_FAIL - where the window race is even
> wider.
>
> Address the issue introducing an additional spinlock to bundle together
> all the relevant events, as per patch 1 and 2.
>
> Note that mptcp_disconnect() unconditionally
> clears the fallback status (zeroing msk->flags) and that may race with
> operation still running on the (closing) subflows.
>
> Such race is addressed in patch 3.
>
> Patch 4 cleans up a bit the fallback code, introducing specific MIB for
> each FB reason, and patch 5 drops the, hopefully now redundant
> pr_fallback().
Thank you very much for the fixes!
> Paolo Abeni (5):
> mptcp: make fallback action and fallback decision atomic
> mptcp: plug races between subflow fail and subflow creation
> mptcp: fix status reset on disconnect()
> mptcp: track fallbacks accurately via mibs
> mptcp: remove pr_fallback()
I have a few small comments, please see the individual patches if you
don't mind!
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 2/5] mptcp: plug races between subflow fail and subflow creation
2025-07-05 7:24 ` [PATCH mptcp-net v2 2/5] mptcp: plug races between subflow fail and subflow creation Paolo Abeni
@ 2025-07-08 15:42 ` Matthieu Baerts
2025-07-09 9:14 ` Paolo Abeni
0 siblings, 1 reply; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-08 15:42 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 05/07/2025 09:24, Paolo Abeni wrote:
> We have races similar to the one addressed by the previous patch
> between subflow failing and additional subflow creation. They are
> just harder to trigger.
>
> The solution is similar. Use a separate flag to track the condition
> 'socket state prevent any additional subflow creation' protected by
> the fallback lock.
>
> The socket fallback makes such flag true, and so does receiving or sending
> an MPTCP MP_FAIL option.
Just to be sure, can we not rely on the MPTCP_FALLBACK_DONE flag for the
two cases?
In other words, do we not already have something set when
sending/receiving an MP_FAIL instead of adding a new field? It feels
like we already store this info, no?
Can we not continue to use __mptcp_check_fallback(msk) and
mptcp_check_infinite_map()?
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()
2025-07-05 7:24 ` [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect() Paolo Abeni
@ 2025-07-08 15:43 ` Matthieu Baerts
2025-07-09 9:27 ` Paolo Abeni
0 siblings, 1 reply; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-08 15:43 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 05/07/2025 09:24, Paolo Abeni wrote:
> disconnect() can still race with subflow fallback, leaving the msk in an
> inconsistent status. To avoid such race, disconnect() must be blocking,
> and error out if the arguments prevent such a wait: the socket will linger
> in SS_DISCONNECT state forever.
Good idea!
Just to be sure, this is not really a behaviour change compared to TCP,
right?
No risks of breaking some cases? (io_uring, etc.?)
In which cases is this MSG_DONTWAIT flag set with disconnect()?
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 4/5] mptcp: track fallbacks accurately via mibs
2025-07-05 7:24 ` [PATCH mptcp-net v2 4/5] mptcp: track fallbacks accurately via mibs Paolo Abeni
@ 2025-07-08 15:56 ` Matthieu Baerts
2025-07-09 9:55 ` Paolo Abeni
0 siblings, 1 reply; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-08 15:56 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 05/07/2025 09:24, Paolo Abeni wrote:
> Add the mibs required to cover the few possible fallback causes still
> lacking such info.
>
> Move the relevant mib increment into the fallback helper, so that no
> future fallback operation will miss a paired mib increment.
>
> Additionally track failed fallback via its own mib.
Good idea!
> While at it, rename an existing helper to reduce long-line problems
> all along.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> still a couple of long lines I could not trim
That's OK. My understanding is that we should try to have lines under 80
chars, but when this condition makes the code harder to read, it is fine
to go over.
> ---
> net/mptcp/ctrl.c | 4 ++--
> net/mptcp/mib.c | 5 +++++
> net/mptcp/mib.h | 7 +++++++
> net/mptcp/options.c | 2 +-
> net/mptcp/protocol.c | 44 ++++++++++++++++++++++++++++++--------------
> net/mptcp/protocol.h | 31 ++++++++-----------------------
> net/mptcp/subflow.c | 10 ++++------
> 7 files changed, 57 insertions(+), 46 deletions(-)
>
> diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
> index d9290c5bb6c7..fed40dae5583 100644
> --- a/net/mptcp/ctrl.c
> +++ b/net/mptcp/ctrl.c
> @@ -533,9 +533,9 @@ void mptcp_active_detect_blackhole(struct sock *ssk, bool expired)
> to_max = mptcp_get_pernet(net)->syn_retrans_before_tcp_fallback;
>
> if (timeouts == to_max || (timeouts < to_max && expired)) {
> - MPTCP_INC_STATS(net, MPTCP_MIB_MPCAPABLEACTIVEDROP);
> subflow->mpc_drop = 1;
> - mptcp_subflow_early_fallback(mptcp_sk(subflow->conn), subflow);
> + mptcp_early_fallback(mptcp_sk(subflow->conn), subflow,
> + MPTCP_MIB_MPCAPABLEACTIVEDROP);
> }
> }
>
> diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
> index 0c24545f0e8d..064f5eaf0595 100644
> --- a/net/mptcp/mib.c
> +++ b/net/mptcp/mib.c
> @@ -80,6 +80,11 @@ static const struct snmp_mib mptcp_snmp_list[] = {
> SNMP_MIB_ITEM("RcvWndConflict", MPTCP_MIB_RCVWNDCONFLICT),
> SNMP_MIB_ITEM("MPCurrEstab", MPTCP_MIB_CURRESTAB),
> SNMP_MIB_ITEM("Blackhole", MPTCP_MIB_BLACKHOLE),
> + SNMP_MIB_ITEM("DataFallback", MPTCP_MIB_DATA_FALLBACK),
Should it not be prefixed MPCapableFallback like others above?
Maybe: MPCapableFallbackData / MPTCP_MIB_MPCAPABLEDATAFALLBACK?
> + SNMP_MIB_ITEM("MD5sumFallback", MPTCP_MIB_MD5SUM_FALLBACK),
For the name: is it limited to MD5Sum? Or any TCP AO?
> + SNMP_MIB_ITEM("DssFallback", MPTCP_MIB_DSS_FALLBACK),
> + SNMP_MIB_ITEM("SimultFallback", MPTCP_MIB_SIMULT_FALLBACK),
> + SNMP_MIB_ITEM("FallbackFailed", MPTCP_MIB_FALLBACK_FAILED),
Do you think it would be worth it to check in the selftests
(mptcp_join.sh?) if any of these counters are incremented when they
should not be?
> SNMP_MIB_SENTINEL
> };
>
> diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
> index 250c6b77977e..99c8788694c3 100644
> --- a/net/mptcp/mib.h
> +++ b/net/mptcp/mib.h
> @@ -81,6 +81,13 @@ enum linux_mptcp_mib_field {
> MPTCP_MIB_RCVWNDCONFLICT, /* Conflict with while updating msk rcv wnd */
> MPTCP_MIB_CURRESTAB, /* Current established MPTCP connections */
> MPTCP_MIB_BLACKHOLE, /* A blackhole has been detected */
> + MPTCP_MIB_DATA_FALLBACK, /* Missing DSS/MPC+data on first
> + * established packet
> + */
> + MPTCP_MIB_MD5SUM_FALLBACK, /* Conflicting TCP option enabled */
> + MPTCP_MIB_DSS_FALLBACK, /* Bad or missing DSS */
> + MPTCP_MIB_SIMULT_FALLBACK, /* Simultaneous connect */
Maybe clearer with CONNECT or CONN in the name?
Maybe: SimultConnectFallback / MPTCP_MIB_SIMULT_CONN_FALLBACK?
> + MPTCP_MIB_FALLBACK_FAILED, /* Can't fallback due to msk status */
> __MPTCP_MIB_MAX
> };
>
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index 4680d980e605..c9e1f3ec3772 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -978,7 +978,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
> if (subflow->mp_join)
> goto reset;
> subflow->mp_capable = 0;
> - if (mptcp_do_fallback(ssk))
> + if (mptcp_do_fallback(ssk, MPTCP_MIB_DATA_FALLBACK))
> goto reset;
> pr_fallback(msk);
> return false;
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 0dc55a17f274..5f99f02c827c 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -67,6 +67,27 @@ static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
> return &inet_stream_ops;
> }
>
> +bool __mptcp_do_fallback(struct mptcp_sock *msk, int fb_mib)
> +{
> + struct net *net = sock_net((struct sock *)msk);
> +
> + if (__mptcp_check_fallback(msk))
> + return true;
> +
> + spin_lock_bh(&msk->fallback_lock);
> + if (!msk->allow_infinite_fallback) {
> + __MPTCP_INC_STATS(net, MPTCP_MIB_SIMULT_FALLBACK);
Is the name OK? To me, a counter with this name should be incremented
when __mptcp_check_fallback() returns true just above, no?
Here, we now try to do a fallback, but we cannot, a reset is probably
needed instead, but that's not because we try to do the same fallback in
parallel, no?
I guess you wanted to use MPTCP_MIB_FALLBACK_FAILED, no?
Anyway, even with 'fallback failed', because in the first patch, the
helper is now also used to check if a fallback can happen. It means that
here we can be in a "normal" situation where we just want to fallback if
we can. If not, that's not an error.
In other words, I don't think we can have a generic MIB here. If we do,
this counter will be incremented in parallel to one linked to a subflow
reset, e.g. MPTCP_MIB_DSSCORRUPTIONRESET. I wonder if (1) this helper
should not be renamed to avoid such confusion -- see patch 1 -- and if
(2) another argument should not be passed to increment another counter
linked to a subflow reset or an abort (MPTCP_MIB_FALLBACK_FAILED can be
passed if that's not normal). Or maybe clearer to add one new helper to
do a fallback or a reset + increment the right MIB counter depending on
the situation? (And another helper to do a fallback if possible, and if
not, only increment MPTCP_MIB_FALLBACK_FAILED?)
(also, detail: why not doing that after the spin_unlock?)
> + spin_unlock_bh(&msk->fallback_lock);
> + return false;
> + }
> +
> + msk->allow_subflows = false;
> + set_bit(MPTCP_FALLBACK_DONE, &msk->flags);
> + __MPTCP_INC_STATS(net, fb_mib);
(same here, why not doing that after the spin_unlock?)
> + spin_unlock_bh(&msk->fallback_lock);
> + return true;
> +}
> +
> static int __mptcp_socket_create(struct mptcp_sock *msk)
> {
> struct mptcp_subflow_context *subflow;
> @@ -560,10 +581,7 @@ static bool mptcp_check_data_fin(struct sock *sk)
>
> static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk)
> {
> - if (mptcp_do_fallback(ssk)) {
> - MPTCP_INC_STATS(sock_net(ssk),
> - MPTCP_MIB_DSSCORRUPTIONFALLBACK);
> - } else {
> + if (!mptcp_do_fallback(ssk, MPTCP_MIB_DSSCORRUPTIONFALLBACK)) {
> MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DSSCORRUPTIONRESET);
> mptcp_subflow_reset(ssk);
> }
> @@ -1142,12 +1160,11 @@ static void mptcp_update_infinite_map(struct mptcp_sock *msk,
> mpext->infinite_map = 1;
> mpext->data_len = 0;
>
> - if (!mptcp_do_fallback(ssk)) {
> + if (!mptcp_do_fallback(ssk, MPTCP_MIB_INFINITEMAPTX)) {
> mptcp_subflow_reset(ssk);
> return;
> }
>
> - MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_INFINITEMAPTX);
> mptcp_subflow_ctx(ssk)->send_infinite_map = 0;
> pr_fallback(msk);
> }
> @@ -3717,16 +3734,15 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> * TCP option space.
> */
> if (rcu_access_pointer(tcp_sk(ssk)->md5sig_info))
> - mptcp_subflow_early_fallback(msk, subflow);
> + mptcp_early_fallback(msk, subflow, MPTCP_MIB_MD5SUM_FALLBACK);
> #endif
> if (subflow->request_mptcp) {
> - if (mptcp_active_should_disable(sk)) {
> - MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_MPCAPABLEACTIVEDISABLED);
> - mptcp_subflow_early_fallback(msk, subflow);
> - } else if (mptcp_token_new_connect(ssk) < 0) {
> - MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_TOKENFALLBACKINIT);
> - mptcp_subflow_early_fallback(msk, subflow);
> - }
> + if (mptcp_active_should_disable(sk))
> + mptcp_early_fallback(msk, subflow,
> + MPTCP_MIB_MPCAPABLEACTIVEDISABLED);
> + else if (mptcp_token_new_connect(ssk) < 0)
> + mptcp_early_fallback(msk, subflow,
> + MPTCP_MIB_TOKENFALLBACKINIT);
> }
>
> WRITE_ONCE(msk->write_seq, subflow->idsn);
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 34c0863f1eae..ceb2f2ba2cc7 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -1220,24 +1220,6 @@ static inline bool mptcp_check_fallback(const struct sock *sk)
> return __mptcp_check_fallback(msk);
> }
>
> -static inline bool __mptcp_do_fallback(struct mptcp_sock *msk)
> -{
> - if (__mptcp_check_fallback(msk)) {
> - pr_debug("TCP fallback already done (msk=%p)\n", msk);
> - return true;
> - }
> - spin_lock_bh(&msk->fallback_lock);
> - if (!msk->allow_infinite_fallback) {
> - spin_unlock_bh(&msk->fallback_lock);
> - return false;
> - }
> -
> - msk->allow_subflows = false;
> - set_bit(MPTCP_FALLBACK_DONE, &msk->flags);
> - spin_unlock_bh(&msk->fallback_lock);
> - return true;
> -}
> -
> static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
> {
> struct sock *ssk = READ_ONCE(msk->first);
> @@ -1247,14 +1229,16 @@ static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
> TCPF_SYN_RECV | TCPF_LISTEN));
> }
>
> -static inline bool mptcp_do_fallback(struct sock *ssk)
> +bool __mptcp_do_fallback(struct mptcp_sock *msk, int fb_mib);
> +
> +static inline bool mptcp_do_fallback(struct sock *ssk, int fb_mib)
> {
> struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> struct sock *sk = subflow->conn;
> struct mptcp_sock *msk;
>
> msk = mptcp_sk(sk);
> - if (!__mptcp_do_fallback(msk))
> + if (!__mptcp_do_fallback(msk, fb_mib))
> return false;
> if (READ_ONCE(msk->snd_data_fin_enable) && !(ssk->sk_shutdown & SEND_SHUTDOWN)) {
> gfp_t saved_allocation = ssk->sk_allocation;
> @@ -1272,12 +1256,13 @@ static inline bool mptcp_do_fallback(struct sock *ssk)
>
> #define pr_fallback(a) pr_debug("%s:fallback to TCP (msk=%p)\n", __func__, a)
>
> -static inline void mptcp_subflow_early_fallback(struct mptcp_sock *msk,
> - struct mptcp_subflow_context *subflow)
> +static inline void mptcp_early_fallback(struct mptcp_sock *msk,
> + struct mptcp_subflow_context *subflow,
> + int fb_mib)
> {
> pr_fallback(msk);
> subflow->request_mptcp = 0;
> - __mptcp_do_fallback(msk);
> + __mptcp_do_fallback(msk, fb_mib);
> }
>
> static inline bool mptcp_check_infinite_map(struct sk_buff *skb)
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 11c5b042c2e1..e140d032eb3e 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -544,11 +544,10 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
> mptcp_get_options(skb, &mp_opt);
> if (subflow->request_mptcp) {
> if (!(mp_opt.suboptions & OPTION_MPTCP_MPC_SYNACK)) {
> - if (!mptcp_do_fallback(sk))
> + if (!mptcp_do_fallback(sk,
> + MPTCP_MIB_MPCAPABLEACTIVEFALLBACK))
> goto do_reset;
>
> - MPTCP_INC_STATS(sock_net(sk),
> - MPTCP_MIB_MPCAPABLEACTIVEFALLBACK);
> pr_fallback(msk);
> goto fallback;
> }
> @@ -1406,7 +1405,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
> return true;
> }
>
> - if (!mptcp_do_fallback(ssk)) {
> + if (!mptcp_do_fallback(ssk, MPTCP_MIB_DSS_FALLBACK)) {
> /* fatal protocol error, close the socket.
> * subflow_error_report() will introduce the appropriate barriers
> */
> @@ -1424,7 +1423,6 @@ static bool subflow_check_data_avail(struct sock *ssk)
> WRITE_ONCE(subflow->data_avail, false);
> return false;
> }
> -
Maybe better to remove this line in patch 1/5?
> }
>
> skb = skb_peek(&ssk->sk_receive_queue);
> @@ -1860,7 +1858,7 @@ static void subflow_state_change(struct sock *sk)
>
> msk = mptcp_sk(parent);
> if (subflow_simultaneous_connect(sk)) {
> - mptcp_do_fallback(sk);
> + mptcp_do_fallback(sk, MPTCP_MIB_SIMULT_FALLBACK);
> pr_fallback(msk);
> subflow->conn_finished = 1;
> mptcp_propagate_state(parent, sk, subflow, NULL);
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 5/5] mptcp: remove pr_fallback()
2025-07-05 7:24 ` [PATCH mptcp-net v2 5/5] mptcp: remove pr_fallback() Paolo Abeni
2025-07-05 18:04 ` kernel test robot
@ 2025-07-08 15:57 ` Matthieu Baerts
1 sibling, 0 replies; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-08 15:57 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 05/07/2025 09:24, Paolo Abeni wrote:
> We can now fully track the fallback status of a given connection
> via the relevant mibs; the mentioned helper is redundant. Remove
> it completely.
I don't often use the pr_debug(), but I don't know about the other devs.
If removing this helper is an issue, we could either move this
pr_fallback() into __mptcp_do_fallback(), or add a pr_debug() to the
macro incrementing any MPTCP MIB counter?
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 1/5] mptcp: make fallback action and fallback decision atomic
2025-07-08 14:54 ` Matthieu Baerts
2025-07-08 15:13 ` Matthieu Baerts
@ 2025-07-09 9:09 ` Paolo Abeni
1 sibling, 0 replies; 28+ messages in thread
From: Paolo Abeni @ 2025-07-09 9:09 UTC (permalink / raw)
To: Matthieu Baerts; +Cc: mptcp
On 7/8/25 4:54 PM, Matthieu Baerts wrote:
> On 05/07/2025 09:24, Paolo Abeni wrote:
>> Since we need to track the 'fallback is possible' condition and the
>> fallback status separately, there are a few possible races open between
>> the check and the actual fallback action.
>>
>> Add a spinlock to protect the fallback related information and use
>> it to close all the possible related races. While at it also remove the
>> too-early clearing of allow_infinite_fallback in __mptcp_subflow_connect():
>> the field will be correctly cleared by subflow_finish_connect() if/
>> when the connection completes successfully.
>
> Just to be sure, will we correctly handle the case when:
> - a SYN + MP_JOIN is sent
> - an issue is detected, a fallback is done
> - a SYN + ACK + MP_JOIN is received (or later on in the join process).
Plus this also plugs a similar race on the passive side.
>> If fallback is not possible, as per RFC, reset the current subflow.
>>
>> Reported by: Matthieu Baerts <matttbe@kernel.org>
>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/570
>> Reported-by: syzbot+5cf807c20386d699b524@syzkaller.appspotmail.com
>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/555
>> Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status")
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> v1 -> v2:
>> - remove unneeded and racing allow_infinite_fallback in
>> __mptcp_subflow_connect()
>> - really close race in subflow_check_data_avail()
>>
>> Sharing earlier to feed syzkaller.
>> Hopefully a follow-up on mptcp-net-next will cleanup a bit the fallback
>> handling, I tried to minimize the delta for backport's sake
>
> I fed my 4 syzkaller instances 1 day and 6 hours ago, and everything
> looks good so far. But please note that (1) it was not easy for them to
> reproduce the issue, and (2) you removed the warning that was triggering
> the splat :)
>> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
>> index 56fd7bcb786c..4680d980e605 100644
>> --- a/net/mptcp/options.c
>> +++ b/net/mptcp/options.c
>> @@ -978,8 +978,9 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
>> if (subflow->mp_join)
>> goto reset;
>> subflow->mp_capable = 0;
>> + if (mptcp_do_fallback(ssk))
>
> It looks strange: in case of success, we reset?
Good catch, the above is wrong, should be:
if (!mptcp_do_fallback(ssk))
>
> Note that the selftests don't validate this branch:
>
> https://ci-results.mptcp.dev/lcov/export/mptcp/options.c.gcov.html#L977
>
> (also, I wonder if we should not have a MIB counter here)
This is the minimal patch intended to reach stable. Mibs are added by a
later patch.
/P
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 2/5] mptcp: plug races between subflow fail and subflow creation
2025-07-08 15:42 ` Matthieu Baerts
@ 2025-07-09 9:14 ` Paolo Abeni
2025-07-09 9:18 ` Matthieu Baerts
0 siblings, 1 reply; 28+ messages in thread
From: Paolo Abeni @ 2025-07-09 9:14 UTC (permalink / raw)
To: Matthieu Baerts, mptcp
On 7/8/25 5:42 PM, Matthieu Baerts wrote:
> On 05/07/2025 09:24, Paolo Abeni wrote:
>> We have races similar to the one addressed by the previous patch
>> between subflow failing and additional subflow creation. They are
>> just harder to trigger.
>>
>> The solution is similar. Use a separate flag to track the condition
>> 'socket state prevent any additional subflow creation' protected by
>> the fallback lock.
>>
>> The socket fallback makes such flag true, and so does receiving or sending
>> an MPTCP MP_FAIL option.
>
> Just to be sure, can we not rely on the MPTCP_FALLBACK_DONE flag for the
> two cases?
>
> In other words, do we not already have something set when
> sending/receiving an MP_FAIL instead of adding a new field? It feels
> like we already store this info, no?
mp fail is not an alias for the new field, as the latter will be true if
e.g. mp fail has been received or sent, or in case of
fallback.
We could check all the above conditions in a helper, but I found it
simpler to add a new field.
> Can we not continue to use __mptcp_check_fallback(msk) and
> mptcp_check_infinite_map()?
Yes and no ;)
Meaning such checks outside the fallback lock are racy, so they are
possibly useful in the fastpath/datapath, but for fallback related
decision we need to acquire the lock first.
/P
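A minimal sketch of the kind of lock-protected decision meant above, assuming
the fallback_lock and allow_subflows fields introduced by this series; the
helper name and the attach step are illustrative only, not the actual patch:

	static bool mptcp_try_attach_subflow(struct mptcp_sock *msk)
	{
		/* Take the decision and act on it under the same lock, so a
		 * concurrent fallback - which clears allow_subflows - cannot
		 * slip in between the check and the action.
		 */
		spin_lock_bh(&msk->fallback_lock);
		if (!msk->allow_subflows) {
			spin_unlock_bh(&msk->fallback_lock);
			return false;
		}
		/* ... attach the new subflow to the msk here, still under
		 * the lock ...
		 */
		spin_unlock_bh(&msk->fallback_lock);
		return true;
	}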
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 2/5] mptcp: plug races between subflow fail and subflow creation
2025-07-09 9:14 ` Paolo Abeni
@ 2025-07-09 9:18 ` Matthieu Baerts
0 siblings, 0 replies; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-09 9:18 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
On 09/07/2025 11:14, Paolo Abeni wrote:
> On 7/8/25 5:42 PM, Matthieu Baerts wrote:
>> On 05/07/2025 09:24, Paolo Abeni wrote:
>>> We have races similar to the one addressed by the previous patch
>>> between subflow failing and additional subflow creation. They are
>>> just harder to trigger.
>>>
>>> The solution is similar. Use a separate flag to track the condition
>>> 'socket state prevent any additional subflow creation' protected by
>>> the fallback lock.
>>>
>>> The socket fallback makes such flag true, and so does receiving or sending
>>> an MPTCP MP_FAIL option.
>>
>> Just to be sure, can we not rely on the MPTCP_FALLBACK_DONE flag for the
>> two cases?
>>
>> In other words, do we not already have something set when
>> sending/receiving an MP_FAIL instead of adding a new field? It feels
>> like we already store this info, no?
>
> mp fail is not an alias for the new field, as the latter will be true if
> e.g. mp fail has been received or sent, or in case of
> fallback.
>
> We could check all the above conditions in a helper, but I found it
> simpler to add a new field.
>
>> Can we not continue to use __mptcp_check_fallback(msk) and
>> mptcp_check_infinite_map()?
>
> Yes and no ;)
>
> Meaning such checks outside the fallback lock are racy, so they are
> possibly useful in the fastpath/datapath, but for fallback related
> decision we need to acquire the lock first.
OK, I see, thank you.
Hopefully this new field will not be too complex to backport.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()
2025-07-08 15:43 ` Matthieu Baerts
@ 2025-07-09 9:27 ` Paolo Abeni
2025-07-09 10:39 ` Matthieu Baerts
0 siblings, 1 reply; 28+ messages in thread
From: Paolo Abeni @ 2025-07-09 9:27 UTC (permalink / raw)
To: Matthieu Baerts, mptcp
On 7/8/25 5:43 PM, Matthieu Baerts wrote:
> Hi Paolo,
>
> On 05/07/2025 09:24, Paolo Abeni wrote:
>> disconnect() can still race with subflow fallback, leaving the msk in an
>> inconsistent status. To avoid such race, disconnect() must be blocking,
>> and error out if the arguments prevent such a wait: the socket will linger
>> in SS_DISCONNECT state forever.
>
> Good idea!
>
> Just to be sure, this is not really a behaviour change compared to TCP,
> right?
Actually it is, as the TCP disconnect never blocks. On the flip side the
race actually exists and I haven't found another way to plug it.
An alternative I thought about was to create a new first subflow on
disconnect and let the old one 'detach' from the mptcp socket, but that
was problematic as the 'detached' ssk would have a NULL subflow->conn
and currently we assume that to be ~always != NULL
> No risks of breaking some cases? (io_uring, etc.?)
Does io_uring work with mptcp? The disconnect failure when blocking is not
possible should really never be a problem from a functional PoV:
disconnect() can fail per documentation and the application should
handle that case.
Blocking could cause some additional latency or slowdown in that code
path. I think it should not be problematic, but I have no real idea and
tend to be too optimistic WRT this kind of issue.
I suggest not sending this to stable (with a note in the commit message).
> In which cases is this MSG_DONTWAIT flag set with disconnect()?
Typo in the patch, should be O_NONBLOCK. The socket layer, the vfs layer
and iouring could set that.
/P
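For illustration only, the kind of bail-out being discussed could look like
the following sketch inside mptcp_disconnect(); the exact condition, flag
plumbing and error code are assumptions, not the actual patch:

	/* sketch: refuse a disconnect that cannot wait for the fallback
	 * status to settle; the caller gets -EAGAIN and the socket stays
	 * in SS_DISCONNECTING until close()
	 */
	if (flags & O_NONBLOCK)
		return -EAGAIN;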
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 4/5] mptcp: track fallbacks accurately via mibs
2025-07-08 15:56 ` Matthieu Baerts
@ 2025-07-09 9:55 ` Paolo Abeni
2025-07-09 10:23 ` Paolo Abeni
2025-07-09 10:55 ` Matthieu Baerts
0 siblings, 2 replies; 28+ messages in thread
From: Paolo Abeni @ 2025-07-09 9:55 UTC (permalink / raw)
To: Matthieu Baerts, mptcp
On 7/8/25 5:56 PM, Matthieu Baerts wrote:
> On 05/07/2025 09:24, Paolo Abeni wrote:
>> diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
>> index 0c24545f0e8d..064f5eaf0595 100644
>> --- a/net/mptcp/mib.c
>> +++ b/net/mptcp/mib.c
>> @@ -80,6 +80,11 @@ static const struct snmp_mib mptcp_snmp_list[] = {
>> SNMP_MIB_ITEM("RcvWndConflict", MPTCP_MIB_RCVWNDCONFLICT),
>> SNMP_MIB_ITEM("MPCurrEstab", MPTCP_MIB_CURRESTAB),
>> SNMP_MIB_ITEM("Blackhole", MPTCP_MIB_BLACKHOLE),
>> + SNMP_MIB_ITEM("DataFallback", MPTCP_MIB_DATA_FALLBACK),
>
> Should it not be prefixed MPCapableFallback like others above?
>
> Maybe: MPCapableFallbackData / MPTCP_MIB_MPCAPABLEDATAFALLBACK?
Fine by me
>
>> + SNMP_MIB_ITEM("MD5sumFallback", MPTCP_MIB_MD5SUM_FALLBACK),
>
> For the name: is it limited to MD5Sum? Or any TCP AO?
To be more accurate should be MD5_SIG_FALLBACK, or just MD5_FALLBACK
>
>> + SNMP_MIB_ITEM("DssFallback", MPTCP_MIB_DSS_FALLBACK),
>> + SNMP_MIB_ITEM("SimultFallback", MPTCP_MIB_SIMULT_FALLBACK),
>> + SNMP_MIB_ITEM("FallbackFailed", MPTCP_MIB_FALLBACK_FAILED),
>
> Do you think it would be worth it to check in the selftests
> (mptcp_join.sh?) if any of these counters are incremented when they
> should not be?
Could be a nice follow-up! (which I don't plan ATM).
>> SNMP_MIB_SENTINEL
>> };
>>
>> diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
>> index 250c6b77977e..99c8788694c3 100644
>> --- a/net/mptcp/mib.h
>> +++ b/net/mptcp/mib.h
>> @@ -81,6 +81,13 @@ enum linux_mptcp_mib_field {
>> MPTCP_MIB_RCVWNDCONFLICT, /* Conflict with while updating msk rcv wnd */
>> MPTCP_MIB_CURRESTAB, /* Current established MPTCP connections */
>> MPTCP_MIB_BLACKHOLE, /* A blackhole has been detected */
>> + MPTCP_MIB_DATA_FALLBACK, /* Missing DSS/MPC+data on first
>> + * established packet
>> + */
>> + MPTCP_MIB_MD5SUM_FALLBACK, /* Conflicting TCP option enabled */
>> + MPTCP_MIB_DSS_FALLBACK, /* Bad or missing DSS */
>> + MPTCP_MIB_SIMULT_FALLBACK, /* Simultaneous connect */
>
> Maybe clearer with CONNECT or CONN in the name?
>
> Maybe: SimultConnectFallback / MPTCP_MIB_SIMULT_CONN_FALLBACK?
Fine by me.
>
>> + MPTCP_MIB_FALLBACK_FAILED, /* Can't fallback due to msk status */
>> __MPTCP_MIB_MAX
>> };
>>
>> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
>> index 4680d980e605..c9e1f3ec3772 100644
>> --- a/net/mptcp/options.c
>> +++ b/net/mptcp/options.c
>> @@ -978,7 +978,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
>> if (subflow->mp_join)
>> goto reset;
>> subflow->mp_capable = 0;
>> - if (mptcp_do_fallback(ssk))
>> + if (mptcp_do_fallback(ssk, MPTCP_MIB_DATA_FALLBACK))
>> goto reset;
>> pr_fallback(msk);
>> return false;
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 0dc55a17f274..5f99f02c827c 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -67,6 +67,27 @@ static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
>> return &inet_stream_ops;
>> }
>>
>> +bool __mptcp_do_fallback(struct mptcp_sock *msk, int fb_mib)
>> +{
>> + struct net *net = sock_net((struct sock *)msk);
>> +
>> + if (__mptcp_check_fallback(msk))
>> + return true;
>> +
>> + spin_lock_bh(&msk->fallback_lock);
>> + if (!msk->allow_infinite_fallback) {
>> + __MPTCP_INC_STATS(net, MPTCP_MIB_SIMULT_FALLBACK);
>
> Is the name OK? To me, a counter with this name should be incremented
> when __mptcp_check_fallback() returns true just above, no?
Typo above should be MPTCP_MIB_FALLBACK_FAILED.
> Here, we now try to do a fallback, but we cannot, a reset is probably
> needed instead, but that's not because we try to do the same fallback in
> parallel, no?
>
> I guess you wanted to use MPTCP_MIB_FALLBACK_FAILED, no?
>
> Anyway, even with 'fallback failed', because in the first patch, the
> helper is now also used to check if a fallback can happen. It means that
> here we can be in a "normal" situation where we just want to fallback if
> we can. If not, that's not an error.
Modulo bugs it's either:
- mptcp_do_fallback() returns false and we reset the subflow
- It's an early fallback and mptcp_do_fallback() should not return
false. Currently the code does not even check for `false` return value
in this case, but we could/should add a WARN.
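A minimal sketch of such a WARN, reusing the mptcp_early_fallback() helper
quoted earlier in this thread; placement and naming are illustrative:

	static inline void mptcp_early_fallback(struct mptcp_sock *msk,
						struct mptcp_subflow_context *subflow,
						int fb_mib)
	{
		pr_fallback(msk);
		subflow->request_mptcp = 0;
		/* early fallback happens before any data is exchanged, so a
		 * failure here would be a bug worth flagging loudly
		 */
		WARN_ON_ONCE(!__mptcp_do_fallback(msk, fb_mib));
	}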
In any case it's not an error, but it's a fallback failure: the code
path should have caused a fallback, but instead the socket has been
reset. I think we should track the exceptional event with a MIB, and
FALLBACK_FAILED still sounds like a valid name to me.
> In other words, I don't think we can have a generic MIB here. If we do,
> this counter will be incremented in parallel to one linked to a subflow
> reset, e.g. MPTCP_MIB_DSSCORRUPTIONRESET.
AFAICS some resets, e.g. in options.c and in subflow_finish_connect(), don't
get their own MIB incremented. Even when that happens I think it's good:
having more mibs incremented allows understanding what happened exactly.
Note that there are already other exceptional events where multiple mibs
are incremented (IIRC even in the MPTCP code).
> I wonder if (1) this helper
> should not be renamed to avoid such confusion -- see patch 1 -- and if
> (2) another argument should not be passed to increment another counter
> linked to a subflow reset or an abort (MPTCP_MIB_FALLBACK_FAILED can be
> passed if that's not normal). Or maybe clearer to add one new helper to
> do a fallback or a reset + increment the right MIB counter depending on
> the situation? (And another helper to do a fallback if possible, and if
> not, only increment MPTCP_MIB_FALLBACK_FAILED?)
I would not over-complicate the code here. The reset depends on the
caller, having that separate AFAICS makes the code simpler.
I'm ok to rename [__]mptcp_do_fallback() to [__]mptcp_try_fallback(),
but I would not do that in patch 1, for backport's sake.
> (also, detail: why not doing that after the spin_unlock?)
So we don't need to lock/unlock BH again to update the MIB.
>> @@ -1424,7 +1423,6 @@ static bool subflow_check_data_avail(struct sock *ssk)
>> WRITE_ONCE(subflow->data_avail, false);
>> return false;
>> }
>> -
>
> Maybe better to remove this line in patch 1/5?
Fine by me.
/P
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 4/5] mptcp: track fallbacks accurately via mibs
2025-07-09 9:55 ` Paolo Abeni
@ 2025-07-09 10:23 ` Paolo Abeni
2025-07-09 10:42 ` Matthieu Baerts
2025-07-09 10:55 ` Matthieu Baerts
1 sibling, 1 reply; 28+ messages in thread
From: Paolo Abeni @ 2025-07-09 10:23 UTC (permalink / raw)
To: Matthieu Baerts, mptcp
On 7/9/25 11:55 AM, Paolo Abeni wrote:
> I'm ok to rename [__]mptcp_do_fallback() to [__]mptcp_try_fallback(),
> but I would not do that in patch 1, for backport's sake.
Actually patch 1 already touches almost all the places where
[__]mptcp_do_fallback is used, so it should be fine to do the rename in
such patch.
For consistency with other MIBs I'll additionally trim all the '_' in
the MPTCP-specific part of the newly introduced MIB name i.e.
MPTCP_MIB_FALLBACK_FAILED -> MPTCP_MIB_FALLBACKFAILED
/P
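Putting the naming feedback from this thread together, the new entries could
end up looking roughly as below; this is a sketch only, and the MD5/TCP-AO
related name in particular was still an open question at this point:

	SNMP_MIB_ITEM("MPCapableDataFallback", MPTCP_MIB_MPCAPABLEDATAFALLBACK),
	SNMP_MIB_ITEM("MD5SigFallback", MPTCP_MIB_MD5SIGFALLBACK),
	SNMP_MIB_ITEM("DssFallback", MPTCP_MIB_DSSFALLBACK),
	SNMP_MIB_ITEM("SimultConnectFallback", MPTCP_MIB_SIMULTCONNFALLBACK),
	SNMP_MIB_ITEM("FallbackFailed", MPTCP_MIB_FALLBACKFAILED),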
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()
2025-07-09 9:27 ` Paolo Abeni
@ 2025-07-09 10:39 ` Matthieu Baerts
2025-07-09 10:49 ` Paolo Abeni
0 siblings, 1 reply; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-09 10:39 UTC (permalink / raw)
To: Paolo Abeni, mptcp
Hi Paolo,
Thank you for your reply!
On 09/07/2025 11:27, Paolo Abeni wrote:
>
>
> On 7/8/25 5:43 PM, Matthieu Baerts wrote:
>> Hi Paolo,
>>
>> On 05/07/2025 09:24, Paolo Abeni wrote:
>>> disconnect() can still race with subflow fallback, leaving the msk in an
>>> inconsistent status. To avoid such race, disconnect() must be blocking,
>>> and error out if the arguments prevent such a wait: the socket will linger
>>> in SS_DISCONNECT state forever.
>>
>> Good idea!
>>
>> Just to be sure, this is not really a behaviour change compared to TCP,
>> right?
>
> Actually it is, as the TCP disconnect never blocks. On the flip side the
> race actually exists and I haven't found another way to plug it.
>
> An alternative I thought about was to create a new first subflow on
> disconnect and let the old one 'detach' from the mptcp socket, but that
> was problematic as the 'detached' ssk would have a NULL subflow->conn
> and currently we assume that to be ~always != NULL
Indeed, it doesn't sound safer :)
>> No risks of breaking some cases? (io_uring, etc.?)
>
> Does io_uring work with mptcp?
I don't know, I never tried. But I don't see why it wouldn't. (Of
course, not when used with zero copy, etc.)
> The disconnect failure when blocking is not
> possible should really never be a problem from a functional PoV:
> disconnect() can fail per documentation and the application should
> handle that case.
But it could set "O_NONBLOCK" and do the close later, no? Or is this
correctly handled?
> Blocking could cause some additional latency or slowdown in that code
> path. I think it should not be problematic, but I have no real idea and
> tend to be too optimistic WRT this kind of issue.
>
> I suggest not sending this to stable (with a note in the commit message).
Maybe safer indeed.
>> In which cases is this MSG_DONTWAIT flag set with disconnect()?
>
> Typo in the patch, should be O_NONBLOCK. The socket layer, the vfs layer
> and iouring could set that.
So typically what we do in the packetdrill tests. But maybe we don't
check the returned value?
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 4/5] mptcp: track fallbacks accurately via mibs
2025-07-09 10:23 ` Paolo Abeni
@ 2025-07-09 10:42 ` Matthieu Baerts
0 siblings, 0 replies; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-09 10:42 UTC (permalink / raw)
To: Paolo Abeni, mptcp
On 09/07/2025 12:23, Paolo Abeni wrote:
> On 7/9/25 11:55 AM, Paolo Abeni wrote:
>> I'm ok to rename [__]mptcp_do_fallback() to [__]mptcp_try_fallback(),
>> but I would not do that in patch 1, for backport's sake.
>
> Actually patch 1 already touches almost all the places where
> [__]mptcp_do_fallback is used, so it should be fine to do the rename in
> such patch.
Great, thank you!
Just to come back on your "for backport's sake": if the behaviour of a
function is modified, I think it is safer to rename it → so during the
backports, we can easily find extra calls that are only in stable
versions, but not in the dev branch, and do extra adaptations if needed.
> For consistency with other MIBs I'll additionally trim all the '_' in
> the MPTCP-specific part of the newly introduced MIB name i.e.
>
> MPTCP_MIB_FALLBACK_FAILED -> MPTCP_MIB_FALLBACKFAILED
Good point!
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()
2025-07-09 10:39 ` Matthieu Baerts
@ 2025-07-09 10:49 ` Paolo Abeni
2025-07-09 10:55 ` Matthieu Baerts
0 siblings, 1 reply; 28+ messages in thread
From: Paolo Abeni @ 2025-07-09 10:49 UTC (permalink / raw)
To: Matthieu Baerts, mptcp
On 7/9/25 12:39 PM, Matthieu Baerts wrote:
> On 09/07/2025 11:27, Paolo Abeni wrote:
>> The disconnect failure when blocking is not
>> possible should really never be a problem from a functional PoV:
>> disconnect() can fail per documentation and the application should
>> handle that case.
>
> But it could set "O_NONBLOCK" and do the close later, no? Or is this
> correctly handled?
Yes and no. Meaning the kernel does not offer any mechanism to complete
gracefully a blocking disconnect returning EAGAIN. In such a case the
socket will stay in SS_DISCONNECTING (socket) state until close time.
>>> In which cases is this MSG_DONTWAIT flag set with disconnect()?
>>
>> Typo in the patch, should be O_NONBLOCK. The socket layer, the vfs layer
>> and iouring could set that.
>
> So typically what we do in the packetdrill tests. But maybe we don't
> check the returned value?
The only relevant pktdrill test I could find is
`disconnect_after_accept.pkt` and it checks the return value.
Let me check how it will run after the MSG_DONTWAIT -> O_NONBLOCK fix
/P
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 4/5] mptcp: track fallbacks accurately via mibs
2025-07-09 9:55 ` Paolo Abeni
2025-07-09 10:23 ` Paolo Abeni
@ 2025-07-09 10:55 ` Matthieu Baerts
1 sibling, 0 replies; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-09 10:55 UTC (permalink / raw)
To: Paolo Abeni, mptcp
On 09/07/2025 11:55, Paolo Abeni wrote:
> On 7/8/25 5:56 PM, Matthieu Baerts wrote:
>> On 05/07/2025 09:24, Paolo Abeni wrote:
>>> diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
>>> index 0c24545f0e8d..064f5eaf0595 100644
>>> --- a/net/mptcp/mib.c
>>> +++ b/net/mptcp/mib.c
>>> @@ -80,6 +80,11 @@ static const struct snmp_mib mptcp_snmp_list[] = {
>>> SNMP_MIB_ITEM("RcvWndConflict", MPTCP_MIB_RCVWNDCONFLICT),
>>> SNMP_MIB_ITEM("MPCurrEstab", MPTCP_MIB_CURRESTAB),
>>> SNMP_MIB_ITEM("Blackhole", MPTCP_MIB_BLACKHOLE),
>>> + SNMP_MIB_ITEM("DataFallback", MPTCP_MIB_DATA_FALLBACK),
>>
>> Should it not be prefixed MPCapableFallback like others above?
>>
>> Maybe: MPCapableFallbackData / MPTCP_MIB_MPCAPABLEDATAFALLBACK?
>
> Fine by me
>
>>
>>> + SNMP_MIB_ITEM("MD5sumFallback", MPTCP_MIB_MD5SUM_FALLBACK),
>>
>> For the name: is it limited to MD5Sum? Or any TCP AO?
>
> To be more accurate should be MD5_SIG_FALLBACK, or just MD5_FALLBACK
Do we fallback to TCP when TCP AO is used? Or maybe it can be used in
parallel? I don't remember.
>>> + SNMP_MIB_ITEM("DssFallback", MPTCP_MIB_DSS_FALLBACK),
>>> + SNMP_MIB_ITEM("SimultFallback", MPTCP_MIB_SIMULT_FALLBACK),
>>> + SNMP_MIB_ITEM("FallbackFailed", MPTCP_MIB_FALLBACK_FAILED),
>>
>> Do you think it would be worth it to check in the selftests
>> (mptcp_join.sh?) if any of these counters are incremented when they
>> should not be?
>
> Could be a nice follow-up! (which I don't plan ATM).
I just created this:
https://github.com/multipath-tcp/mptcp_net-next/issues/571
>>> SNMP_MIB_SENTINEL
>>> };
>>>
>>> diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
>>> index 250c6b77977e..99c8788694c3 100644
>>> --- a/net/mptcp/mib.h
>>> +++ b/net/mptcp/mib.h
>>> @@ -81,6 +81,13 @@ enum linux_mptcp_mib_field {
>>> MPTCP_MIB_RCVWNDCONFLICT, /* Conflict with while updating msk rcv wnd */
>>> MPTCP_MIB_CURRESTAB, /* Current established MPTCP connections */
>>> MPTCP_MIB_BLACKHOLE, /* A blackhole has been detected */
>>> + MPTCP_MIB_DATA_FALLBACK, /* Missing DSS/MPC+data on first
>>> + * established packet
>>> + */
>>> + MPTCP_MIB_MD5SUM_FALLBACK, /* Conflicting TCP option enabled */
>>> + MPTCP_MIB_DSS_FALLBACK, /* Bad or missing DSS */
>>> + MPTCP_MIB_SIMULT_FALLBACK, /* Simultaneous connect */
>>
>> Maybe clearer with CONNECT or CONN in the name?
>>
>> Maybe: SimultConnectFallback / MPTCP_MIB_SIMULT_CONN_FALLBACK?
>
> Fine by me.
>
>>
>>> + MPTCP_MIB_FALLBACK_FAILED, /* Can't fallback due to msk status */
>>> __MPTCP_MIB_MAX
>>> };
>>>
>>> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
>>> index 4680d980e605..c9e1f3ec3772 100644
>>> --- a/net/mptcp/options.c
>>> +++ b/net/mptcp/options.c
>>> @@ -978,7 +978,7 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
>>> if (subflow->mp_join)
>>> goto reset;
>>> subflow->mp_capable = 0;
>>> - if (mptcp_do_fallback(ssk))
>>> + if (mptcp_do_fallback(ssk, MPTCP_MIB_DATA_FALLBACK))
>>> goto reset;
>>> pr_fallback(msk);
>>> return false;
>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index 0dc55a17f274..5f99f02c827c 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -67,6 +67,27 @@ static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
>>> return &inet_stream_ops;
>>> }
>>>
>>> +bool __mptcp_do_fallback(struct mptcp_sock *msk, int fb_mib)
>>> +{
>>> + struct net *net = sock_net((struct sock *)msk);
>>> +
>>> + if (__mptcp_check_fallback(msk))
>>> + return true;
>>> +
>>> + spin_lock_bh(&msk->fallback_lock);
>>> + if (!msk->allow_infinite_fallback) {
>>> + __MPTCP_INC_STATS(net, MPTCP_MIB_SIMULT_FALLBACK);
>>
>> Is the name OK? To me, a counter with this name should be incremented
>> when __mptcp_check_fallback() returns true just above, no?
>
> Typo above should be MPTCP_MIB_FALLBACK_FAILED.
>
>> Here, we now try to do a fallback, but we cannot, a reset is probably
>> needed instead, but that's not because we try to do the same fallback in
>> parallel, no?
>>
>> I guess you wanted to use MPTCP_MIB_FALLBACK_FAILED, no?
>>
>> Anyway, even with 'fallback failed', because in the first patch, the
>> helper is now also used to check if a fallback can happen. It means that
>> here we can be in a "normal" situation where we just want to fallback if
>> we can. If not, that's not an error.
>
> Modulo bugs it's either:
>
> - mptcp_do_fallback() returns false and we reset the subflow
> - It's an early fallback and mptcp_do_fallback() should not return
> false. Currently the code does not even check for `false` return value
> in this case, but we could/should add a WARN.
Good idea :)
> In any case it's not an error, but it's a fallback failure: the code
> path should have caused a fallback, but instead the socket has been
> reset. I think we should track the exceptional event with a MIB, and
> FALLBACK_FAILED still sounds like a valid name to me.
I don't think it is always an error, see when you set
MPTCP_MIB_DSSCORRUPTIONFALLBACK (mptcp_dss_corruption) or
MPTCP_MIB_DSS_FALLBACK (subflow_check_data_avail), no?
>> In other words, I don't think we can have a generic MIB here. If we do,
>> this counter will be incremented in parallel to one linked to a subflow
>> reset, e.g. MPTCP_MIB_DSSCORRUPTIONRESET.
>
> AFAICS some resets, e.g. in options.c and in subflow_finish_connect(), don't
> get their own MIB incremented. Even when that happens I think it's good:
> having more mibs incremented allows understanding what happened exactly.
> Note that there are already other exceptional events where multiple mibs
> are incremented (IIRC even in the MPTCP code).
Sure but it might be confusing to see that the "fallback failed" when it
was not supposed to even try to do a fallback at that stage of the
connection.
>> I wonder if (1) this helper
>> should not be renamed to avoid such confusion -- see patch 1 -- and if
>> (2) another argument should not be passed to increment another counter
>> linked to a subflow reset or an abort (MPTCP_MIB_FALLBACK_FAILED can be
>> passed if that's not normal). Or maybe clearer to add one new helper to
>> do a fallback or a reset + increment the right MIB counter depending on
>> the situation? (And another helper to do a fallback if possible, and if
>> not, only increment MPTCP_MIB_FALLBACK_FAILED?)
>
> I would not over-complicate the code here. The reset depends on the
> caller, having that separate AFAICS makes the code simpler.
Fine by me.
> I'm ok to rename [__]mptcp_do_fallback() to [__]mptcp_try_fallback(),
> but I would not do that in patch 1, for backport's sake.
>
>> (also, detail: why not doing that after the spin_unlock?)
>
> So we don't need to lock/unlock BH again to update the MIB.
Ah yes, you use the "__" version of MPTCP_INC_STATS, missed that.
>>> @@ -1424,7 +1423,6 @@ static bool subflow_check_data_avail(struct sock *ssk)
>>> WRITE_ONCE(subflow->data_avail, false);
>>> return false;
>>> }
>>> -
>>
>> Maybe better to remove this line in patch 1/5?
>
> Fine by me.
>
> /P
>
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()
2025-07-09 10:49 ` Paolo Abeni
@ 2025-07-09 10:55 ` Matthieu Baerts
2025-07-09 12:33 ` Paolo Abeni
0 siblings, 1 reply; 28+ messages in thread
From: Matthieu Baerts @ 2025-07-09 10:55 UTC (permalink / raw)
To: Paolo Abeni, mptcp
On 09/07/2025 12:49, Paolo Abeni wrote:
> On 7/9/25 12:39 PM, Matthieu Baerts wrote:
>> On 09/07/2025 11:27, Paolo Abeni wrote:
>>> The disconnect failure when blocking is not
>>> possible should really never be a problem from a functional PoV:
>>> disconnect() can fail per documentation and the application should
>>> handle that case.
>>
>> But it could set "O_NONBLOCK" and do the close later, no? Or is this
>> correctly handled?
>
> Yes and no. Meaning the kernel does not offer any mechanism to complete
> gracefully a blocking disconnect returning EAGAIN. In such a case the
> socket will stay in SS_DISCONNECTING (socket) state until close time.
>
>>>> In which cases is this MSG_DONTWAIT flag set with disconnect()?
>>>
>>> Typo in the patch, should be O_NONBLOCK. The socket layer, the vfs layer
>>> and iouring could set that.
>>
>> So typically what we do in the packetdrill tests. But maybe we don't
>> check the returned value?
>
> The only relevant pktdrill test I could find is
> `disconnect_after_accept.pkt` and it checks the return value.
>
> Let me check how it will run after the MSG_DONTWAIT -> O_NONBLOCK fix
Thank you!
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect()
2025-07-09 10:55 ` Matthieu Baerts
@ 2025-07-09 12:33 ` Paolo Abeni
0 siblings, 0 replies; 28+ messages in thread
From: Paolo Abeni @ 2025-07-09 12:33 UTC (permalink / raw)
To: Matthieu Baerts, mptcp
On 7/9/25 12:55 PM, Matthieu Baerts wrote:
> On 09/07/2025 12:49, Paolo Abeni wrote:
>> On 7/9/25 12:39 PM, Matthieu Baerts wrote:
>>> On 09/07/2025 11:27, Paolo Abeni wrote:
>>>> The disconnect failure when blocking is not
>>>> possible should really never be a problem from a functional PoV:
>>>> disconnect() can fail per documentation and the application should
>>>> handle that case.
>>>
>>> But it could set "O_NONBLOCK" and do the close later, no? Or is this
>>> correctly handled?
>>
>> Yes and no. Meaning the kernel does not offer any mechanism to complete
>> gracefully a blocking disconnect returning EAGAIN. In such a case the
>> socket will stay in SS_DISCONNECTING (socket) state until close time.
>>
>>>>> In which cases is this MSG_DONTWAIT flag set with disconnect()?
>>>>
>>>> Typo in the patch, should be O_NONBLOCK. The socket layer, the vfs layer
>>>> and iouring could set that.
>>>
>>> So typically what we do in the packetdrill tests. But maybe we don't
>>> check the returned value?
>>
>> The only relevant pktdrill test I could find is
>> `disconnect_after_accept.pkt` and it checks the return value.
>>
>> Let me check how it will run after the MSG_DONTWAIT -> O_NONBLOCK fix
>
> Thank you!
The test is passing here. I think that is due to mptcp_disconnect()
invoking mptcp_close_ssk() with the MPTCP_CF_FASTCLOSE flag, which in
turn forces tcp_disconnect() invocation on msk->first, which forcibly
closes the first subflow (sk_state == TCP_CLOSE after that).
So we can trim that blocking wait out of patch 3.
/P
^ permalink raw reply [flat|nested] 28+ messages in thread
End of thread.
Thread overview: 28+ messages
2025-07-05 7:24 [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races Paolo Abeni
2025-07-05 7:24 ` [PATCH mptcp-net v2 1/5] mptcp: make fallback action and fallback decision atomic Paolo Abeni
2025-07-08 14:54 ` Matthieu Baerts
2025-07-08 15:13 ` Matthieu Baerts
2025-07-09 9:09 ` Paolo Abeni
2025-07-05 7:24 ` [PATCH mptcp-net v2 2/5] mptcp: plug races between subflow fail and subflow creation Paolo Abeni
2025-07-08 15:42 ` Matthieu Baerts
2025-07-09 9:14 ` Paolo Abeni
2025-07-09 9:18 ` Matthieu Baerts
2025-07-05 7:24 ` [PATCH mptcp-net v2 3/5] mptcp: fix status reset on disconnect() Paolo Abeni
2025-07-08 15:43 ` Matthieu Baerts
2025-07-09 9:27 ` Paolo Abeni
2025-07-09 10:39 ` Matthieu Baerts
2025-07-09 10:49 ` Paolo Abeni
2025-07-09 10:55 ` Matthieu Baerts
2025-07-09 12:33 ` Paolo Abeni
2025-07-05 7:24 ` [PATCH mptcp-net v2 4/5] mptcp: track fallbacks accurately via mibs Paolo Abeni
2025-07-08 15:56 ` Matthieu Baerts
2025-07-09 9:55 ` Paolo Abeni
2025-07-09 10:23 ` Paolo Abeni
2025-07-09 10:42 ` Matthieu Baerts
2025-07-09 10:55 ` Matthieu Baerts
2025-07-05 7:24 ` [PATCH mptcp-net v2 5/5] mptcp: remove pr_fallback() Paolo Abeni
2025-07-05 18:04 ` kernel test robot
2025-07-08 15:57 ` Matthieu Baerts
2025-07-07 7:43 ` [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races MPTCP CI
2025-07-07 8:47 ` MPTCP CI
2025-07-08 15:41 ` Matthieu Baerts