From: Matthieu Baerts <matttbe@kernel.org>
To: Jiayuan Chen <jiayuan.chen@linux.dev>, mptcp@lists.linux.dev
Cc: stable@vger.kernel.org, Jakub Sitnicki <jakub@cloudflare.com>,
Mat Martineau <martineau@kernel.org>,
Geliang Tang <geliang@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
Florian Westphal <fw@strlen.de>,
linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
bpf@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF
Date: Wed, 5 Nov 2025 15:40:03 +0100
Message-ID: <7d12a5fb-f923-4176-901a-8dc967eda52e@kernel.org>
In-Reply-To: <20251105113625.148900-3-jiayuan.chen@linux.dev>
Hi Jiayuan,
On 05/11/2025 12:36, Jiayuan Chen wrote:
If you need to send a v5, please remove the 'net,' prefix from the
title. It may also be good to mention 'sockmap', e.g.
mptcp: fix proto fallback detection with sockmap
> The sockmap feature allows the sk_prot of a socket to be replaced with
> sockmap's custom read/write interfaces during protocol stack processing,
> either via the bpf syscall from userspace or from a BPF sockops program.
> '''
> tcp_rcv_state_process()
> syn_recv_sock()/subflow_syn_recv_sock()
> tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
> bpf_skops_established <== sockops
> bpf_sock_map_update(sk) <== call bpf helper
> tcp_bpf_update_proto() <== update sk_prot
> '''
>
> When the server has MPTCP enabled but the client sends a TCP SYN
> without MPTCP, subflow_syn_recv_sock() performs a fallback on the
> subflow, replacing the subflow sk's sk_prot with the native sk_prot.
> '''
> subflow_syn_recv_sock()
> subflow_ulp_fallback()
> subflow_drop_ctx()
> mptcp_subflow_ops_undo_override()
> '''
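The fallback step above can be modelled in userspace roughly as follows. This is a simplified sketch, not the kernel's code: `struct sock_mock`, the `MOCK_AF_*` constants, the `*_override` protos and `undo_override_mock()` are hypothetical stand-ins, and it assumes one subflow override proto per address family that gets swapped back to the native proto of the same family.

```c
/* Hypothetical, heavily reduced stand-ins for the kernel types. */
struct proto { const char *name; };

#define MOCK_AF_INET  2
#define MOCK_AF_INET6 10

struct sock_mock {
	unsigned short sk_family;    /* not modified by the fallback */
	const struct proto *sk_prot; /* swapped by fallback (and later by sockmap) */
};

/* Native TCP protos (mock instances). */
static const struct proto tcp_prot = { "TCP" };
static const struct proto tcpv6_prot = { "TCPv6" };

/* Assumed per-family MPTCP subflow override protos (hypothetical). */
static const struct proto tcp_prot_override = { "TCP (subflow override)" };
static const struct proto tcpv6_prot_override = { "TCPv6 (subflow override)" };

/*
 * Simplified model of mptcp_subflow_ops_undo_override(): on fallback,
 * put back the native proto matching the override that was installed.
 */
static void undo_override_mock(struct sock_mock *ssk)
{
	if (ssk->sk_prot == &tcpv6_prot_override)
		ssk->sk_prot = &tcpv6_prot;
	else
		ssk->sk_prot = &tcp_prot;
}
```

Right after this step, a pointer comparison against `tcp_prot`/`tcpv6_prot` still identifies the family correctly; it is only once sockmap installs its own proto on top that such a comparison stops matching.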
>
> Then, this subflow can be used by sockmap like a normal TCP socket, and
> sockmap replaces the native sk_prot with its own custom sk_prot. The
> issue occurs when the user calls accept(), i.e.
> accept() -> mptcp_stream_accept() -> mptcp_fallback_tcp_ops().
> mptcp_fallback_tcp_ops() compares sk->sk_prot with the native sk_prot
> (tcp_prot/tcpv6_prot), which no longer matches once sockmap has replaced
> it, so sk->sk_socket->ops may be set incorrectly: an IPv6 socket would
> get inet_stream_ops instead of inet6_stream_ops.
>
> This fix uses the more generic sk_family for the comparison instead.
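A minimal userspace sketch contrasting the old pointer-based check with the family-based check in the diff below. All types, constants, protos and the `warned` flag are hypothetical mocks: `sockmap_v4_prot`/`sockmap_v6_prot` stand in for whatever proto tcp_bpf_update_proto() installs, `warned` records where WARN_ON_ONCE() would fire, and READ_ONCE() is omitted.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-ins for the kernel types. */
struct proto { const char *name; };
struct proto_ops { const char *name; };

#define MOCK_AF_INET  2
#define MOCK_AF_INET6 10

struct sock_mock {
	unsigned short sk_family;
	const struct proto *sk_prot;
};

static const struct proto tcp_prot = { "TCP" };
static const struct proto tcpv6_prot = { "TCPv6" };
/* Stand-ins for protos installed by sockmap on top of the native ones. */
static const struct proto sockmap_v4_prot = { "TCP (sockmap)" };
static const struct proto sockmap_v6_prot = { "TCPv6 (sockmap)" };

static const struct proto_ops inet_stream_ops = { "inet_stream_ops" };
static const struct proto_ops inet6_stream_ops = { "inet6_stream_ops" };

static bool warned; /* set where WARN_ON_ONCE() would fire */

/* Old logic: infer the family from the identity of sk_prot. */
static const struct proto_ops *fallback_ops_old(const struct sock_mock *sk)
{
	if (sk->sk_prot == &tcpv6_prot)
		return &inet6_stream_ops;
	if (sk->sk_prot != &tcp_prot)
		warned = true;
	return &inet_stream_ops;
}

/* New logic: sk_family is left alone when sockmap replaces sk_prot. */
static const struct proto_ops *fallback_ops_new(const struct sock_mock *sk)
{
	if (sk->sk_family == MOCK_AF_INET6)
		return &inet6_stream_ops;
	if (sk->sk_family != MOCK_AF_INET)
		warned = true;
	return &inet_stream_ops;
}
```

With a sockmap-managed IPv6 socket, the old check warns and returns `inet_stream_ops`; with an IPv4 one it warns but still returns the right ops. The family-based check returns the expected ops in both cases without warning.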
>
> This also prevents the following WARNING from being triggered:
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 \
> mptcp_stream_accept+0x34c/0x380
> Modules linked in:
> RIP: 0010:mptcp_stream_accept+0x34c/0x380
> RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
> PKRU: 55555554
> Call Trace:
> <TASK>
> do_accept+0xeb/0x190
> ? __x64_sys_pselect6+0x61/0x80
> ? _raw_spin_unlock+0x12/0x30
> ? alloc_fd+0x11e/0x190
> __sys_accept4+0x8c/0x100
> __x64_sys_accept+0x1f/0x30
> x64_sys_call+0x202f/0x20f0
> do_syscall_64+0x72/0x9a0
> ? switch_fpu_return+0x60/0xf0
> ? irqentry_exit_to_user_mode+0xdb/0x1e0
> ? irqentry_exit+0x3f/0x50
> ? clear_bhb_loop+0x50/0xa0
> ? clear_bhb_loop+0x50/0xa0
> ? clear_bhb_loop+0x50/0xa0
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> </TASK>
> ---[ end trace 0000000000000000 ]---
>
> result from ./scripts/decode_stacktrace.sh:
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 337 at net/mptcp/protocol.c:68 mptcp_stream_accept \
> (net-next/net/mptcp/protocol.c:4005)
> Modules linked in:
> ...
>
> PKRU: 55555554
> Call Trace:
> <TASK>
> do_accept (net-next/net/socket.c:1989)
> __sys_accept4 (net-next/net/socket.c:2028 net-next/net/socket.c:2057)
> __x64_sys_accept (net-next/net/socket.c:2067)
> x64_sys_call (net-next/arch/x86/entry/syscall_64.c:41)
> do_syscall_64 (net-next/arch/x86/entry/syscall_64.c:63 \
> net-next/arch/x86/entry/syscall_64.c:94)
> entry_SYSCALL_64_after_hwframe (net-next/arch/x86/entry/entry_64.S:130)
> RIP: 0033:0x7f87ac92b83d
>
> ---[ end trace 0000000000000000 ]---
If you need to send a v5, please remove the non-decoded stacktrace: only
the decoded one is interesting. You can also remove the 'net-next/'
prefix from the paths, e.g. keeping only 'net/mptcp/protocol.c:4005'.
>
> Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
> net/mptcp/protocol.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 4cd5df01446e..b5e5e130b158 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -61,11 +61,13 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
>
> static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
> {
> + unsigned short family = READ_ONCE(sk->sk_family);
> +
> #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> - if (sk->sk_prot == &tcpv6_prot)
> + if (family == AF_INET6)
> return &inet6_stream_ops;
> #endif
> - WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
> + WARN_ON_ONCE(family != AF_INET);
I wonder if it would be interesting to return NULL if the family is not
AF_INET{,6}. But I guess this will cause a crash quickly after, no?
If yes, probably better to continue returning &inet_stream_ops here.
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> return &inet_stream_ops;
> }
>
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Thread overview: 11+ messages
2025-11-05 11:36 [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
2025-11-05 11:36 ` [PATCH net v4 1/3] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
2025-11-05 14:39 ` Matthieu Baerts
2025-11-05 11:36 ` [PATCH net v4 2/3] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
2025-11-05 14:40 ` Matthieu Baerts [this message]
2025-11-05 11:36 ` [PATCH net v4 3/3] selftests/bpf: Add mptcp test with sockmap Jiayuan Chen
2025-11-05 14:40 ` Matthieu Baerts
2025-11-05 16:12 ` Jiayuan Chen
2025-11-05 16:28 ` Matthieu Baerts
2025-11-06 1:46 ` Jiayuan Chen
2025-11-05 14:37 ` [PATCH net v4 0/3] mptcp: Fix conflicts between MPTCP and sockmap Matthieu Baerts