* [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap
[not found] <20251105113625.148900-1-jiayuan.chen@linux.dev>
@ 2025-11-05 11:36 ` Jiayuan Chen
2025-11-05 14:39 ` Matthieu Baerts
2025-11-05 11:36 ` [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
1 sibling, 1 reply; 4+ messages in thread
From: Jiayuan Chen @ 2025-11-05 11:36 UTC (permalink / raw)
To: mptcp
Cc: Jiayuan Chen, stable, Matthieu Baerts, Mat Martineau,
Geliang Tang, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Shuah Khan, Florian Westphal, linux-kernel,
netdev, bpf, linux-kselftest
The sockmap feature allows bpf syscall from userspace using , or based
on bpf sockops, replacing the sk_prot of sockets during protocol stack
processing with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
subflow_syn_recv_sock()
tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
bpf_skops_established <== sockops
bpf_sock_map_update(sk) <== call bpf helper
tcp_bpf_update_proto() <== update sk_prot
'''
Consider two scenarios:
1. When the server has MPTCP enabled and the client also requests MPTCP,
the sk passed to the BPF program is a subflow sk. Since subflows only
handle partial data, replacing their sk_prot is meaningless and will
cause traffic disruption.
2. When the server has MPTCP enabled but the client sends a TCP SYN
without MPTCP, subflow_syn_recv_sock() performs a fallback on the
subflow, replacing the subflow sk's sk_prot with the native sk_prot.
'''
subflow_ulp_fallback()
subflow_drop_ctx()
mptcp_subflow_ops_undo_override()
'''
Subsequently, accept::mptcp_stream_accept::mptcp_fallback_tcp_ops()
converts the subflow to plain TCP.
For the first case, we should prevent it from being combined with sockmap
by setting sk_prot->psock_update_sk_prot to NULL, which will be blocked by
sockmap's own flow.
For the second case, since subflow_syn_recv_sock() has already restored
sk_prot to native tcp_prot/tcpv6_prot, no further action is needed.
Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
net/mptcp/subflow.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 30961b3d1702..ddd0fc6fcf45 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -2144,6 +2144,10 @@ void __init mptcp_subflow_init(void)
tcp_prot_override = tcp_prot;
tcp_prot_override.release_cb = tcp_release_cb_override;
tcp_prot_override.diag_destroy = tcp_abort_override;
+#ifdef CONFIG_BPF_SYSCALL
+ /* Disable sockmap processing for subflows */
+ tcp_prot_override.psock_update_sk_prot = NULL;
+#endif
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
/* In struct mptcp_subflow_request_sock, we assume the TCP request sock
@@ -2180,6 +2184,10 @@ void __init mptcp_subflow_init(void)
tcpv6_prot_override = tcpv6_prot;
tcpv6_prot_override.release_cb = tcp_release_cb_override;
tcpv6_prot_override.diag_destroy = tcp_abort_override;
+#ifdef CONFIG_BPF_SYSCALL
+ /* Disable sockmap processing for subflows */
+ tcpv6_prot_override.psock_update_sk_prot = NULL;
+#endif
#endif
mptcp_diag_subflow_init(&subflow_ulp_ops);
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF
[not found] <20251105113625.148900-1-jiayuan.chen@linux.dev>
2025-11-05 11:36 ` [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
@ 2025-11-05 11:36 ` Jiayuan Chen
2025-11-05 14:40 ` Matthieu Baerts
1 sibling, 1 reply; 4+ messages in thread
From: Jiayuan Chen @ 2025-11-05 11:36 UTC (permalink / raw)
To: mptcp
Cc: Jiayuan Chen, stable, Jakub Sitnicki, Matthieu Baerts,
Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest
The sockmap feature allows bpf syscall from userspace, or based
on bpf sockops, replacing the sk_prot of sockets during protocol stack
processing with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
syn_recv_sock()/subflow_syn_recv_sock()
tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
bpf_skops_established <== sockops
bpf_sock_map_update(sk) <== call bpf helper
tcp_bpf_update_proto() <== update sk_prot
'''
When the server has MPTCP enabled but the client sends a TCP SYN
without MPTCP, subflow_syn_recv_sock() performs a fallback on the
subflow, replacing the subflow sk's sk_prot with the native sk_prot.
'''
subflow_syn_recv_sock()
subflow_ulp_fallback()
subflow_drop_ctx()
mptcp_subflow_ops_undo_override()
'''
Then, this subflow can be normally used by sockmap, which replaces the
native sk_prot with sockmap's custom sk_prot. The issue occurs when the
user executes accept::mptcp_stream_accept::mptcp_fallback_tcp_ops().
Here, it uses sk->sk_prot to compare with the native sk_prot, but this
is incorrect when sockmap is used, as we may incorrectly set
sk->sk_socket->ops.
This fix uses the more generic sk_family for the comparison instead.
Additionally, this also prevents a WARNING from occurring:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 \
mptcp_stream_accept+0x34c/0x380
Modules linked in:
RIP: 0010:mptcp_stream_accept+0x34c/0x380
RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
PKRU: 55555554
Call Trace:
<TASK>
do_accept+0xeb/0x190
? __x64_sys_pselect6+0x61/0x80
? _raw_spin_unlock+0x12/0x30
? alloc_fd+0x11e/0x190
__sys_accept4+0x8c/0x100
__x64_sys_accept+0x1f/0x30
x64_sys_call+0x202f/0x20f0
do_syscall_64+0x72/0x9a0
? switch_fpu_return+0x60/0xf0
? irqentry_exit_to_user_mode+0xdb/0x1e0
? irqentry_exit+0x3f/0x50
? clear_bhb_loop+0x50/0xa0
? clear_bhb_loop+0x50/0xa0
? clear_bhb_loop+0x50/0xa0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
---[ end trace 0000000000000000 ]---
result from ./scripts/decode_stacktrace.sh:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 337 at net/mptcp/protocol.c:68 mptcp_stream_accept \
(net-next/net/mptcp/protocol.c:4005)
Modules linked in:
...
PKRU: 55555554
Call Trace:
<TASK>
do_accept (net-next/net/socket.c:1989)
__sys_accept4 (net-next/net/socket.c:2028 net-next/net/socket.c:2057)
__x64_sys_accept (net-next/net/socket.c:2067)
x64_sys_call (net-next/arch/x86/entry/syscall_64.c:41)
do_syscall_64 (net-next/arch/x86/entry/syscall_64.c:63 \
net-next/arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (net-next/arch/x86/entry/entry_64.S:130)
RIP: 0033:0x7f87ac92b83d
---[ end trace 0000000000000000 ]---
Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
---
net/mptcp/protocol.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4cd5df01446e..b5e5e130b158 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -61,11 +61,13 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
{
+ unsigned short family = READ_ONCE(sk->sk_family);
+
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
- if (sk->sk_prot == &tcpv6_prot)
+ if (family == AF_INET6)
return &inet6_stream_ops;
#endif
- WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
+ WARN_ON_ONCE(family != AF_INET);
return &inet_stream_ops;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap
2025-11-05 11:36 ` [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
@ 2025-11-05 14:39 ` Matthieu Baerts
0 siblings, 0 replies; 4+ messages in thread
From: Matthieu Baerts @ 2025-11-05 14:39 UTC (permalink / raw)
To: Jiayuan Chen, mptcp
Cc: stable, Mat Martineau, Geliang Tang, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
linux-kselftest
Hi Jiayuan,
On 05/11/2025 12:36, Jiayuan Chen wrote:
> The sockmap feature allows bpf syscall from userspace using , or based
(is there a word missing before the ','?)
> on bpf sockops, replacing the sk_prot of sockets during protocol stack
> processing with sockmap's custom read/write interfaces.
> '''
> tcp_rcv_state_process()
> subflow_syn_recv_sock()
> tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
> bpf_skops_established <== sockops
> bpf_sock_map_update(sk) <== call bpf helper
> tcp_bpf_update_proto() <== update sk_prot
> '''
> Consider two scenarios:
>
> 1. When the server has MPTCP enabled and the client also requests MPTCP,
> the sk passed to the BPF program is a subflow sk. Since subflows only
> handle partial data, replacing their sk_prot is meaningless and will
> cause traffic disruption.
>
> 2. When the server has MPTCP enabled but the client sends a TCP SYN
> without MPTCP, subflow_syn_recv_sock() performs a fallback on the
> subflow, replacing the subflow sk's sk_prot with the native sk_prot.
> '''
> subflow_ulp_fallback()
> subflow_drop_ctx()
> mptcp_subflow_ops_undo_override()
> '''
> Subsequently, accept::mptcp_stream_accept::mptcp_fallback_tcp_ops()
> converts the subflow to plain TCP.
>
> For the first case, we should prevent it from being combined with sockmap
> by setting sk_prot->psock_update_sk_prot to NULL, which will be blocked by
> sockmap's own flow.
>
> For the second case, since subflow_syn_recv_sock() has already restored
> sk_prot to native tcp_prot/tcpv6_prot, no further action is needed.
>
> Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
It should not change anything for the backport, but for this patch, the
Fixes tag can be older I think, e.g.
Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing
connections")
If you need to send a v5, please use this one.
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF
2025-11-05 11:36 ` [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
@ 2025-11-05 14:40 ` Matthieu Baerts
0 siblings, 0 replies; 4+ messages in thread
From: Matthieu Baerts @ 2025-11-05 14:40 UTC (permalink / raw)
To: Jiayuan Chen, mptcp
Cc: stable, Jakub Sitnicki, Mat Martineau, Geliang Tang,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Simon Horman, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
Hao Luo, Jiri Olsa, Shuah Khan, Florian Westphal, linux-kernel,
netdev, bpf, linux-kselftest
Hi Jiayuan,
On 05/11/2025 12:36, Jiayuan Chen wrote:
If you need to send a v5, please remove the 'net,' prefix from the
title. And maybe good to mention 'sockmap', e.g.
mptcp: fix proto fallback detection with sockmap
> The sockmap feature allows bpf syscall from userspace, or based
> on bpf sockops, replacing the sk_prot of sockets during protocol stack
> processing with sockmap's custom read/write interfaces.
> '''
> tcp_rcv_state_process()
> syn_recv_sock()/subflow_syn_recv_sock()
> tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
> bpf_skops_established <== sockops
> bpf_sock_map_update(sk) <== call bpf helper
> tcp_bpf_update_proto() <== update sk_prot
> '''
>
> When the server has MPTCP enabled but the client sends a TCP SYN
> without MPTCP, subflow_syn_recv_sock() performs a fallback on the
> subflow, replacing the subflow sk's sk_prot with the native sk_prot.
> '''
> subflow_syn_recv_sock()
> subflow_ulp_fallback()
> subflow_drop_ctx()
> mptcp_subflow_ops_undo_override()
> '''
>
> Then, this subflow can be normally used by sockmap, which replaces the
> native sk_prot with sockmap's custom sk_prot. The issue occurs when the
> user executes accept::mptcp_stream_accept::mptcp_fallback_tcp_ops().
> Here, it uses sk->sk_prot to compare with the native sk_prot, but this
> is incorrect when sockmap is used, as we may incorrectly set
> sk->sk_socket->ops.
>
> This fix uses the more generic sk_family for the comparison instead.
>
> Additionally, this also prevents a WARNING from occurring:
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 \
> mptcp_stream_accept+0x34c/0x380
> Modules linked in:
> RIP: 0010:mptcp_stream_accept+0x34c/0x380
> RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
> PKRU: 55555554
> Call Trace:
> <TASK>
> do_accept+0xeb/0x190
> ? __x64_sys_pselect6+0x61/0x80
> ? _raw_spin_unlock+0x12/0x30
> ? alloc_fd+0x11e/0x190
> __sys_accept4+0x8c/0x100
> __x64_sys_accept+0x1f/0x30
> x64_sys_call+0x202f/0x20f0
> do_syscall_64+0x72/0x9a0
> ? switch_fpu_return+0x60/0xf0
> ? irqentry_exit_to_user_mode+0xdb/0x1e0
> ? irqentry_exit+0x3f/0x50
> ? clear_bhb_loop+0x50/0xa0
> ? clear_bhb_loop+0x50/0xa0
> ? clear_bhb_loop+0x50/0xa0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> </TASK>
> ---[ end trace 0000000000000000 ]---
>
> result from ./scripts/decode_stacktrace.sh:
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 337 at net/mptcp/protocol.c:68 mptcp_stream_accept \
> (net-next/net/mptcp/protocol.c:4005)
> Modules linked in:
> ...
>
> PKRU: 55555554
> Call Trace:
> <TASK>
> do_accept (net-next/net/socket.c:1989)
> __sys_accept4 (net-next/net/socket.c:2028 net-next/net/socket.c:2057)
> __x64_sys_accept (net-next/net/socket.c:2067)
> x64_sys_call (net-next/arch/x86/entry/syscall_64.c:41)
> do_syscall_64 (net-next/arch/x86/entry/syscall_64.c:63 \
> net-next/arch/x86/entry/syscall_64.c:94)
> entry_SYSCALL_64_after_hwframe (net-next/arch/x86/entry/entry_64.S:130)
> RIP: 0033:0x7f87ac92b83d
>
> ---[ end trace 0000000000000000 ]---
If you need to send a v5, please remove the non-decoded stacktrace, only
the decoded one is interesting. You can also remove the 'net-next/'
prefix in the paths. So only to keep 'net/mptcp/protocol.c:4005' for
example.
>
> Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
> net/mptcp/protocol.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 4cd5df01446e..b5e5e130b158 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -61,11 +61,13 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
>
> static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
> {
> + unsigned short family = READ_ONCE(sk->sk_family);
> +
> #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> - if (sk->sk_prot == &tcpv6_prot)
> + if (family == AF_INET6)
> return &inet6_stream_ops;
> #endif
> - WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
> + WARN_ON_ONCE(family != AF_INET);
I wonder if it would be interesting to return NULL if the family is not
AF_INET{,6}. But I guess this will cause a crash quickly after, no?
If yes, probably better to continue returning &inet_stream_ops here.
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> return &inet_stream_ops;
> }
>
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-11-05 14:40 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20251105113625.148900-1-jiayuan.chen@linux.dev>
2025-11-05 11:36 ` [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
2025-11-05 14:39 ` Matthieu Baerts
2025-11-05 11:36 ` [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
2025-11-05 14:40 ` Matthieu Baerts
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).