* [PATCH net 1/3] mptcp: always handle address removal under msk socket lock
2025-02-24 18:11 [PATCH net 0/3] mptcp: misc. fixes Matthieu Baerts (NGI0)
@ 2025-02-24 18:11 ` Matthieu Baerts (NGI0)
2025-02-24 18:11 ` [PATCH net 2/3] mptcp: reset when MPTCP opts are dropped after join Matthieu Baerts (NGI0)
` (2 subsequent siblings)
3 siblings, 0 replies; 6+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-02-24 18:11 UTC (permalink / raw)
To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, linux-kernel, Matthieu Baerts (NGI0), stable,
syzbot+cd3ce3d03a3393ae9700
From: Paolo Abeni <pabeni@redhat.com>
Syzkaller reported a lockdep splat in the PM control path:
WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 sock_owned_by_me include/net/sock.h:1711 [inline]
WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 msk_owned_by_me net/mptcp/protocol.h:363 [inline]
WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 mptcp_pm_nl_addr_send_ack+0x57c/0x610 net/mptcp/pm_netlink.c:788
Modules linked in:
CPU: 0 UID: 0 PID: 6693 Comm: syz.0.205 Not tainted 6.14.0-rc2-syzkaller-00303-gad1b832bf1cf #0
Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
RIP: 0010:sock_owned_by_me include/net/sock.h:1711 [inline]
RIP: 0010:msk_owned_by_me net/mptcp/protocol.h:363 [inline]
RIP: 0010:mptcp_pm_nl_addr_send_ack+0x57c/0x610 net/mptcp/pm_netlink.c:788
Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ca 7b d3 f5 eb b9 e8 c3 7b d3 f5 90 0f 0b 90 e9 dd fb ff ff e8 b5 7b d3 f5 90 <0f> 0b 90 e9 3e fb ff ff 44 89 f1 80 e1 07 38 c1 0f 8c eb fb ff ff
RSP: 0000:ffffc900034f6f60 EFLAGS: 00010283
RAX: ffffffff8bee3c2b RBX: 0000000000000001 RCX: 0000000000080000
RDX: ffffc90004d42000 RSI: 000000000000a407 RDI: 000000000000a408
RBP: ffffc900034f7030 R08: ffffffff8bee37f6 R09: 0100000000000000
R10: dffffc0000000000 R11: ffffed100bcc62e4 R12: ffff88805e6316e0
R13: ffff88805e630c00 R14: dffffc0000000000 R15: ffff88805e630c00
FS: 00007f7e9a7e96c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2fd18ff8 CR3: 0000000032c24000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
mptcp_pm_remove_addr+0x103/0x1d0 net/mptcp/pm.c:59
mptcp_pm_remove_anno_addr+0x1f4/0x2f0 net/mptcp/pm_netlink.c:1486
mptcp_nl_remove_subflow_and_signal_addr net/mptcp/pm_netlink.c:1518 [inline]
mptcp_pm_nl_del_addr_doit+0x118d/0x1af0 net/mptcp/pm_netlink.c:1629
genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
genl_rcv_msg+0xb1f/0xec0 net/netlink/genetlink.c:1210
netlink_rcv_skb+0x206/0x480 net/netlink/af_netlink.c:2543
genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1348
netlink_sendmsg+0x8de/0xcb0 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:718 [inline]
__sock_sendmsg+0x221/0x270 net/socket.c:733
____sys_sendmsg+0x53a/0x860 net/socket.c:2573
___sys_sendmsg net/socket.c:2627 [inline]
__sys_sendmsg+0x269/0x350 net/socket.c:2659
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7e9998cde9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f7e9a7e9038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f7e99ba5fa0 RCX: 00007f7e9998cde9
RDX: 000000002000c094 RSI: 0000400000000000 RDI: 0000000000000007
RBP: 00007f7e99a0e2a0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f7e99ba5fa0 R15: 00007fff49231088
Indeed the PM can try to send a RM_ADDR over a msk without acquiring
first the msk socket lock.
The bugged code-path comes from an early optimization: when there
are no subflows, the PM should (usually) not send RM_ADDR
notifications.
The above statement is incorrect, as without locks another process
could concurrent create a new subflow and cause the RM_ADDR generation.
Additionally the supposed optimization is not very effective even
performance-wise, as most mptcp sockets should have at least one
subflow: the MPC one.
Address the issue removing the buggy code path, the existing "slow-path"
will handle correctly even the edge case.
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink")
Cc: stable@vger.kernel.org
Reported-by: syzbot+cd3ce3d03a3393ae9700@syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/546
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/pm_netlink.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index 572d160edca33c0a941203d8ae0b0bde0f2ef3e2..c0e47f4f7b1aa2fedf615c44ea595c1f9d2528f9 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -1514,11 +1514,6 @@ static int mptcp_nl_remove_subflow_and_signal_addr(struct net *net,
if (mptcp_pm_is_userspace(msk))
goto next;
- if (list_empty(&msk->conn_list)) {
- mptcp_pm_remove_anno_addr(msk, addr, false);
- goto next;
- }
-
lock_sock(sk);
remove_subflow = mptcp_lookup_subflow_by_saddr(&msk->conn_list, addr);
mptcp_pm_remove_anno_addr(msk, addr, remove_subflow &&
--
2.47.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* [PATCH net 2/3] mptcp: reset when MPTCP opts are dropped after join
2025-02-24 18:11 [PATCH net 0/3] mptcp: misc. fixes Matthieu Baerts (NGI0)
2025-02-24 18:11 ` [PATCH net 1/3] mptcp: always handle address removal under msk socket lock Matthieu Baerts (NGI0)
@ 2025-02-24 18:11 ` Matthieu Baerts (NGI0)
2025-02-25 12:04 ` Chester A. Unal
2025-02-24 18:11 ` [PATCH net 3/3] mptcp: safety check before fallback Matthieu Baerts (NGI0)
2025-02-26 3:20 ` [PATCH net 0/3] mptcp: misc. fixes patchwork-bot+netdevbpf
3 siblings, 1 reply; 6+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-02-24 18:11 UTC (permalink / raw)
To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, linux-kernel, Matthieu Baerts (NGI0), stable,
Chester A. Unal
Before this patch, if the checksum was not used, the subflow was only
reset if map_data_len was != 0. If there were no MPTCP options or an
invalid mapping, map_data_len was not set to the data len, and then the
subflow was not reset as it should have been, leaving the MPTCP
connection in a wrong fallback mode.
This map_data_len condition has been introduced to handle the reception
of the infinite mapping. Instead, a new dedicated mapping error could
have been returned and treated as a special case. However, the commit
31bf11de146c ("mptcp: introduce MAPPING_BAD_CSUM") has been introduced
by Paolo Abeni soon after, and backported later on to stable. It better
handle the csum case, and it means the exception for valid_csum_seen in
subflow_can_fallback(), plus this one for the infinite mapping in
subflow_check_data_avail(), are no longer needed.
In other words, the code can be simplified there: a fallback should only
be done if msk->allow_infinite_fallback is set. This boolean is set to
false once MPTCP-specific operations acting on the whole MPTCP
connection vs the initial path have been done, e.g. a second path has
been created, or an MPTCP re-injection -- yes, possible even with a
single subflow. The subflow_can_fallback() helper can then be dropped,
and replaced by this single condition.
This also makes the code clearer: a fallback should only be done if it
is possible to do so.
While at it, no need to set map_data_len to 0 in get_mapping_status()
for the infinite mapping case: it will be set to skb->len just after, at
the end of subflow_check_data_avail(), and not read in between.
Fixes: f8d4bcacff3b ("mptcp: infinite mapping receiving")
Cc: stable@vger.kernel.org
Reported-by: Chester A. Unal <chester.a.unal@xpedite-tech.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/544
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/subflow.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index dfcbef9c46246983d21c827d9551d4eb2c95430e..9f18217dddc865bc467a2c5c34aa4bf23bf3ac75 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1142,7 +1142,6 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
if (data_len == 0) {
pr_debug("infinite mapping received\n");
MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_INFINITEMAPRX);
- subflow->map_data_len = 0;
return MAPPING_INVALID;
}
@@ -1286,18 +1285,6 @@ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ss
mptcp_schedule_work(sk);
}
-static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
-{
- struct mptcp_sock *msk = mptcp_sk(subflow->conn);
-
- if (subflow->mp_join)
- return false;
- else if (READ_ONCE(msk->csum_enabled))
- return !subflow->valid_csum_seen;
- else
- return READ_ONCE(msk->allow_infinite_fallback);
-}
-
static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
@@ -1393,7 +1380,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
return true;
}
- if (!subflow_can_fallback(subflow) && subflow->map_data_len) {
+ if (!READ_ONCE(msk->allow_infinite_fallback)) {
/* fatal protocol error, close the socket.
* subflow_error_report() will introduce the appropriate barriers
*/
--
2.47.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH net 2/3] mptcp: reset when MPTCP opts are dropped after join
2025-02-24 18:11 ` [PATCH net 2/3] mptcp: reset when MPTCP opts are dropped after join Matthieu Baerts (NGI0)
@ 2025-02-25 12:04 ` Chester A. Unal
0 siblings, 0 replies; 6+ messages in thread
From: Chester A. Unal @ 2025-02-25 12:04 UTC (permalink / raw)
To: Matthieu Baerts (NGI0)
Cc: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, netdev, linux-kernel,
stable, Zenon van Deventer, dwayne Du Preez, Alexander Scholten
Tested-by: Chester A. Unal <chester.a.unal@xpedite-tech.com>
Thanks a ton!
Chester A.
On Mon, Feb 24, 2025 at 6:19 PM Matthieu Baerts (NGI0)
<matttbe@kernel.org> wrote:
>
> Before this patch, if the checksum was not used, the subflow was only
> reset if map_data_len was != 0. If there were no MPTCP options or an
> invalid mapping, map_data_len was not set to the data len, and then the
> subflow was not reset as it should have been, leaving the MPTCP
> connection in a wrong fallback mode.
>
> This map_data_len condition has been introduced to handle the reception
> of the infinite mapping. Instead, a new dedicated mapping error could
> have been returned and treated as a special case. However, the commit
> 31bf11de146c ("mptcp: introduce MAPPING_BAD_CSUM") has been introduced
> by Paolo Abeni soon after, and backported later on to stable. It better
> handle the csum case, and it means the exception for valid_csum_seen in
> subflow_can_fallback(), plus this one for the infinite mapping in
> subflow_check_data_avail(), are no longer needed.
>
> In other words, the code can be simplified there: a fallback should only
> be done if msk->allow_infinite_fallback is set. This boolean is set to
> false once MPTCP-specific operations acting on the whole MPTCP
> connection vs the initial path have been done, e.g. a second path has
> been created, or an MPTCP re-injection -- yes, possible even with a
> single subflow. The subflow_can_fallback() helper can then be dropped,
> and replaced by this single condition.
>
> This also makes the code clearer: a fallback should only be done if it
> is possible to do so.
>
> While at it, no need to set map_data_len to 0 in get_mapping_status()
> for the infinite mapping case: it will be set to skb->len just after, at
> the end of subflow_check_data_avail(), and not read in between.
>
> Fixes: f8d4bcacff3b ("mptcp: infinite mapping receiving")
> Cc: stable@vger.kernel.org
> Reported-by: Chester A. Unal <chester.a.unal@xpedite-tech.com>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/544
> Acked-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> net/mptcp/subflow.c | 15 +--------------
> 1 file changed, 1 insertion(+), 14 deletions(-)
>
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index dfcbef9c46246983d21c827d9551d4eb2c95430e..9f18217dddc865bc467a2c5c34aa4bf23bf3ac75 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1142,7 +1142,6 @@ static enum mapping_status get_mapping_status(struct sock *ssk,
> if (data_len == 0) {
> pr_debug("infinite mapping received\n");
> MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_INFINITEMAPRX);
> - subflow->map_data_len = 0;
> return MAPPING_INVALID;
> }
>
> @@ -1286,18 +1285,6 @@ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ss
> mptcp_schedule_work(sk);
> }
>
> -static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
> -{
> - struct mptcp_sock *msk = mptcp_sk(subflow->conn);
> -
> - if (subflow->mp_join)
> - return false;
> - else if (READ_ONCE(msk->csum_enabled))
> - return !subflow->valid_csum_seen;
> - else
> - return READ_ONCE(msk->allow_infinite_fallback);
> -}
> -
> static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)
> {
> struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> @@ -1393,7 +1380,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
> return true;
> }
>
> - if (!subflow_can_fallback(subflow) && subflow->map_data_len) {
> + if (!READ_ONCE(msk->allow_infinite_fallback)) {
> /* fatal protocol error, close the socket.
> * subflow_error_report() will introduce the appropriate barriers
> */
>
> --
> 2.47.1
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH net 3/3] mptcp: safety check before fallback
2025-02-24 18:11 [PATCH net 0/3] mptcp: misc. fixes Matthieu Baerts (NGI0)
2025-02-24 18:11 ` [PATCH net 1/3] mptcp: always handle address removal under msk socket lock Matthieu Baerts (NGI0)
2025-02-24 18:11 ` [PATCH net 2/3] mptcp: reset when MPTCP opts are dropped after join Matthieu Baerts (NGI0)
@ 2025-02-24 18:11 ` Matthieu Baerts (NGI0)
2025-02-26 3:20 ` [PATCH net 0/3] mptcp: misc. fixes patchwork-bot+netdevbpf
3 siblings, 0 replies; 6+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-02-24 18:11 UTC (permalink / raw)
To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, linux-kernel, Matthieu Baerts (NGI0)
Recently, some fallback have been initiated, while the connection was
not supposed to fallback.
Add a safety check with a warning to detect when an wrong attempt to
fallback is being done. This should help detecting any future issues
quicker.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/protocol.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index f6a207958459db5bd39f91ed7431b5a766669f92..ad21925af061263d908bd9faddffaef979f46c93 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1199,6 +1199,8 @@ static inline void __mptcp_do_fallback(struct mptcp_sock *msk)
pr_debug("TCP fallback already done (msk=%p)\n", msk);
return;
}
+ if (WARN_ON_ONCE(!READ_ONCE(msk->allow_infinite_fallback)))
+ return;
set_bit(MPTCP_FALLBACK_DONE, &msk->flags);
}
--
2.47.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH net 0/3] mptcp: misc. fixes
2025-02-24 18:11 [PATCH net 0/3] mptcp: misc. fixes Matthieu Baerts (NGI0)
` (2 preceding siblings ...)
2025-02-24 18:11 ` [PATCH net 3/3] mptcp: safety check before fallback Matthieu Baerts (NGI0)
@ 2025-02-26 3:20 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-02-26 3:20 UTC (permalink / raw)
To: Matthieu Baerts
Cc: mptcp, martineau, geliang, davem, edumazet, kuba, pabeni, horms,
netdev, linux-kernel, stable, syzbot+cd3ce3d03a3393ae9700,
chester.a.unal
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 24 Feb 2025 19:11:49 +0100 you wrote:
> Here are two unrelated fixes, plus an extra patch:
>
> - Patch 1: prevent a warning by removing an unneeded and incorrect small
> optimisation in the path-manager. A fix for v5.10.
>
> - Patch 2: reset a subflow when MPTCP opts have been dropped after
> having correctly added a new path. A fix for v5.19.
>
> [...]
Here is the summary with links:
- [net,1/3] mptcp: always handle address removal under msk socket lock
https://git.kernel.org/netdev/net/c/f865c24bc551
- [net,2/3] mptcp: reset when MPTCP opts are dropped after join
https://git.kernel.org/netdev/net/c/8668860b0ad3
- [net,3/3] mptcp: safety check before fallback
https://git.kernel.org/netdev/net/c/db75a16813aa
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 6+ messages in thread