public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 6.1.y v1 0/2] mptcp: Fix conflicts between MPTCP and sockmap
@ 2025-11-30  3:23 Jiayuan Chen
  2025-11-30  3:23 ` [PATCH 6.1.y v1 1/2] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jiayuan Chen @ 2025-11-30  3:23 UTC (permalink / raw)
  To: stable, mptcp, matthieu.baerts, sashal, gregkh; +Cc: Jiayuan Chen

Overall, we encountered a warning [1] that can be triggered by running the
selftest I provided.

sockmap works by replacing sk_data_ready, recvmsg, sendmsg operations and
implementing fast socket-level forwarding logic:
1. Users can obtain file descriptors through userspace socket()/accept()
   interfaces, then call BPF syscall to perform these replacements.
2. Users can also use the bpf_sock_hash_update helper (in sockops programs)
   to replace handlers when TCP connections enter ESTABLISHED state
  (BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB/BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB)

However, when combined with MPTCP, an issue arises: MPTCP creates subflow
sk's and performs TCP handshakes, so the BPF program obtains subflow sk's
and may incorrectly replace their sk_prot. We need to reject such
operations. In patch 1, we set psock_update_sk_prot to NULL in the
subflow's custom sk_prot.

Additionally, if the server's listening socket has MPTCP enabled and the
client's TCP also uses MPTCP, we should allow the combination of subflow
and sockmap. This is because the latest Golang programs have enabled MPTCP
for listening sockets by default [2]. For programs already using sockmap,
upgrading Golang should not cause sockmap functionality to fail.

Patch 2 prevents the panic from occurring.

Despite these patches fixing stream corruption, users of sockmap must set
GODEBUG=multipathtcp=0 to disable MPTCP until sockmap fully supports it.

[1] truncated warning:
------------[ cut here ]------------

BUG: kernel NULL pointer dereference, address: 00000000000004bb
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 400 Comm: test_progs Not tainted 6.1.0+ #16
RIP: 0010:mptcp_stream_accept (./include/linux/list.h:88 net/mptcp/protocol.c:3719)

RSP: 0018:ffffc90000ef3cf0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8880089dcc58
RDX: 0000000000000003 RSI: 0000002c000000b0 RDI: 0000000000000000
RBP: ffffc90000ef3d38 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880089dc600
R13: ffff88800b859e00 R14: ffff88800638c680 R15: 0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000004bb CR3: 000000000b8e8006 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
? apparmor_socket_accept (security/apparmor/lsm.c:966)
do_accept (net/socket.c:1856)
__sys_accept4 (net/socket.c:1897 net/socket.c:1927)
__x64_sys_accept (net/socket.c:1941)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)

[2]: https://go-review.googlesource.com/c/go/+/607715

Jiayuan Chen (2):
  mptcp: disallow MPTCP subflows from sockmap
  net,mptcp: fix proto fallback detection with BPF

 net/mptcp/protocol.c | 5 +++--
 net/mptcp/subflow.c  | 8 ++++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 6.1.y v1 1/2] mptcp: disallow MPTCP subflows from sockmap
  2025-11-30  3:23 [PATCH 6.1.y v1 0/2] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
@ 2025-11-30  3:23 ` Jiayuan Chen
  2025-11-30  3:23 ` [PATCH 6.1.y v1 2/2] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
  2025-11-30 16:21 ` [PATCH 6.1.y v1 0/2] mptcp: Fix conflicts between MPTCP and sockmap Matthieu Baerts
  2 siblings, 0 replies; 6+ messages in thread
From: Jiayuan Chen @ 2025-11-30  3:23 UTC (permalink / raw)
  To: stable, mptcp, matthieu.baerts, sashal, gregkh
  Cc: Jiayuan Chen, Matthieu Baerts

The sockmap feature allows bpf syscall from userspace, or based on bpf
sockops, replacing the sk_prot of sockets during protocol stack processing
with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
  subflow_syn_recv_sock()
    tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
      bpf_skops_established       <== sockops
        bpf_sock_map_update(sk)   <== call bpf helper
          tcp_bpf_update_proto()  <== update sk_prot
'''
Consider two scenarios:

1. When the server has MPTCP enabled and the client also requests MPTCP,
   the sk passed to the BPF program is a subflow sk. Since subflows only
   handle partial data, replacing their sk_prot is meaningless and will
   cause traffic disruption.

2. When the server has MPTCP enabled but the client sends a TCP SYN
   without MPTCP, subflow_syn_recv_sock() performs a fallback on the
   subflow, replacing the subflow sk's sk_prot with the native sk_prot.
   '''
   subflow_ulp_fallback()
    subflow_drop_ctx()
      mptcp_subflow_ops_undo_override()
   '''
   Subsequently, accept::mptcp_stream_accept::mptcp_fallback_tcp_ops()
   converts the subflow to plain TCP.

For the first case, we should prevent it from being combined with sockmap
by setting sk_prot->psock_update_sk_prot to NULL, which will be blocked by
sockmap's own flow.

For the second case, since subflow_syn_recv_sock() has already restored
sk_prot to native tcp_prot/tcpv6_prot, no further action is needed.

Fixes: d2f77c53342e ("mptcp: check for plain TCP sock at accept time")
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/mptcp/subflow.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 2159b5f9988f..c922bbb12bd8 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1936,6 +1936,10 @@ void __init mptcp_subflow_init(void)
 
 	tcp_prot_override = tcp_prot;
 	tcp_prot_override.release_cb = tcp_release_cb_override;
+#ifdef CONFIG_BPF_SYSCALL
+	/* Disable sockmap processing for subflows */
+	tcp_prot_override.psock_update_sk_prot = NULL;
+#endif
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
 	subflow_request_sock_ipv6_ops = tcp_request_sock_ipv6_ops;
@@ -1957,6 +1961,10 @@ void __init mptcp_subflow_init(void)
 
 	tcpv6_prot_override = tcpv6_prot;
 	tcpv6_prot_override.release_cb = tcp_release_cb_override;
+#ifdef CONFIG_BPF_SYSCALL
+	/* Disable sockmap processing for subflows */
+	tcpv6_prot_override.psock_update_sk_prot = NULL;
+#endif
 #endif
 
 	mptcp_diag_subflow_init(&subflow_ulp_ops);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 6.1.y v1 2/2] net,mptcp: fix proto fallback detection with BPF
  2025-11-30  3:23 [PATCH 6.1.y v1 0/2] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
  2025-11-30  3:23 ` [PATCH 6.1.y v1 1/2] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
@ 2025-11-30  3:23 ` Jiayuan Chen
  2025-11-30 16:21   ` Matthieu Baerts
  2025-11-30 16:21 ` [PATCH 6.1.y v1 0/2] mptcp: Fix conflicts between MPTCP and sockmap Matthieu Baerts
  2 siblings, 1 reply; 6+ messages in thread
From: Jiayuan Chen @ 2025-11-30  3:23 UTC (permalink / raw)
  To: stable, mptcp, matthieu.baerts, sashal, gregkh
  Cc: Jiayuan Chen, Jakub Sitnicki, Matthieu Baerts

The sockmap feature allows bpf syscall from userspace, or based
on bpf sockops, replacing the sk_prot of sockets during protocol stack
processing with sockmap's custom read/write interfaces.
'''
tcp_rcv_state_process()
  syn_recv_sock()/subflow_syn_recv_sock()
    tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
      bpf_skops_established       <== sockops
        bpf_sock_map_update(sk)   <== call bpf helper
          tcp_bpf_update_proto()  <== update sk_prot
'''

When the server has MPTCP enabled but the client sends a TCP SYN
without MPTCP, subflow_syn_recv_sock() performs a fallback on the
subflow, replacing the subflow sk's sk_prot with the native sk_prot.
'''
subflow_syn_recv_sock()
  subflow_ulp_fallback()
    subflow_drop_ctx()
      mptcp_subflow_ops_undo_override()
'''

Then, this subflow can be normally used by sockmap, which replaces the
native sk_prot with sockmap's custom sk_prot. The issue occurs when the
user executes accept::mptcp_stream_accept::mptcp_fallback_tcp_ops().
Here, it uses sk->sk_prot to compare with the native sk_prot, but this
is incorrect when sockmap is used, as we may incorrectly set
sk->sk_socket->ops.

This fix uses the more generic sk_family for the comparison instead.

Additionally, this also prevents a PANIC from occurring:

result from ./scripts/decode_stacktrace.sh:
------------[ cut here ]------------

BUG: kernel NULL pointer dereference, address: 00000000000004bb
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 400 Comm: test_progs Not tainted 6.1.0+ #16
RIP: 0010:mptcp_stream_accept (./include/linux/list.h:88 net/mptcp/protocol.c:3719)

RSP: 0018:ffffc90000ef3cf0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8880089dcc58
RDX: 0000000000000003 RSI: 0000002c000000b0 RDI: 0000000000000000
RBP: ffffc90000ef3d38 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8880089dc600
R13: ffff88800b859e00 R14: ffff88800638c680 R15: 0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000004bb CR3: 000000000b8e8006 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
? apparmor_socket_accept (security/apparmor/lsm.c:966)
do_accept (net/socket.c:1856)
__sys_accept4 (net/socket.c:1897 net/socket.c:1927)
__x64_sys_accept (net/socket.c:1941)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)

Fixes: d2f77c53342e ("mptcp: check for plain TCP sock at accept time")
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/mptcp/protocol.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1dbc62537259..13e3510e6c8f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -79,8 +79,9 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
 static bool mptcp_is_tcpsk(struct sock *sk)
 {
 	struct socket *sock = sk->sk_socket;
+	unsigned short family = READ_ONCE(sk->sk_family);
 
-	if (unlikely(sk->sk_prot == &tcp_prot)) {
+	if (unlikely(family == AF_INET)) {
 		/* we are being invoked after mptcp_accept() has
 		 * accepted a non-mp-capable flow: sk is a tcp_sk,
 		 * not an mptcp one.
@@ -91,7 +92,7 @@ static bool mptcp_is_tcpsk(struct sock *sk)
 		sock->ops = &inet_stream_ops;
 		return true;
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
-	} else if (unlikely(sk->sk_prot == &tcpv6_prot)) {
+	} else if (unlikely(family == AF_INET6)) {
 		sock->ops = &inet6_stream_ops;
 		return true;
 #endif
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 6.1.y v1 2/2] net,mptcp: fix proto fallback detection with BPF
  2025-11-30  3:23 ` [PATCH 6.1.y v1 2/2] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
@ 2025-11-30 16:21   ` Matthieu Baerts
  2025-12-01  1:43     ` Jiayuan Chen
  0 siblings, 1 reply; 6+ messages in thread
From: Matthieu Baerts @ 2025-11-30 16:21 UTC (permalink / raw)
  To: Jiayuan Chen, stable, mptcp, sashal, gregkh; +Cc: Jakub Sitnicki

Hi Jiayuan,

On 30/11/2025 04:23, Jiayuan Chen wrote:
> The sockmap feature allows bpf syscall from userspace, or based
> on bpf sockops, replacing the sk_prot of sockets during protocol stack
> processing with sockmap's custom read/write interfaces.

(...)

> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 1dbc62537259..13e3510e6c8f 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -79,8 +79,9 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
>  static bool mptcp_is_tcpsk(struct sock *sk)
>  {
>  	struct socket *sock = sk->sk_socket;
> +	unsigned short family = READ_ONCE(sk->sk_family);
>  
> -	if (unlikely(sk->sk_prot == &tcp_prot)) {
> +	if (unlikely(family == AF_INET)) {
>  		/* we are being invoked after mptcp_accept() has
>  		 * accepted a non-mp-capable flow: sk is a tcp_sk,
>  		 * not an mptcp one.
> @@ -91,7 +92,7 @@ static bool mptcp_is_tcpsk(struct sock *sk)
>  		sock->ops = &inet_stream_ops;
>  		return true;
>  #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> -	} else if (unlikely(sk->sk_prot == &tcpv6_prot)) {
> +	} else if (unlikely(family == AF_INET6)) {

These modifications here break MPTCP: this function (mptcp_is_tcpsk) is
there to check if the socket is a "plain" TCP one (return "true") or an
MPTCP one (return "false"). If it is not an MPTCP one, the sock ops is
modified.

Here, you are saying: any IPv4 or IPv6 socket is a "plain" TCP one,
never an MPTCP socket then.

I suggest adding ...

  if (sk->sk_protocol == IPPROTO_MPTCP)
          return false;

... at the beginning of this function. I'm planning to send a patch
later on including this check. Once it is sent, do you mind checking it
with sockmap if you have the setup available, please?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 6.1.y v1 0/2] mptcp: Fix conflicts between MPTCP and sockmap
  2025-11-30  3:23 [PATCH 6.1.y v1 0/2] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
  2025-11-30  3:23 ` [PATCH 6.1.y v1 1/2] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
  2025-11-30  3:23 ` [PATCH 6.1.y v1 2/2] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
@ 2025-11-30 16:21 ` Matthieu Baerts
  2 siblings, 0 replies; 6+ messages in thread
From: Matthieu Baerts @ 2025-11-30 16:21 UTC (permalink / raw)
  To: Jiayuan Chen; +Cc: stable, mptcp, sashal, gregkh

Hi Jiayuan,

On 30/11/2025 04:23, Jiayuan Chen wrote:
> Overall, we encountered a warning [1] that can be triggered by running the
> selftest I provided.

Thank you for having shared these patches, but there are some issues
with them:

- Patch 1/2 has already been queued, see [1].

- Patch 2/2 is the exact same patch as the one sent by Sasha [2], and
recently dropped [3] because it breaks MPTCP. See my comment there.

- You need to follow the stable rules [4] when sending patches to
stable. In short, here, you should have kept the original upstream
commit message, then add the SHA and a note about the conflicts, e.g.
similar to [5]. Without that, it is a new patch, not a backport (and the
reviewed-by/acked-by/... cannot be kept).

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-6.1/mptcp-disallow-mptcp-subflows-from-sockmap.patch
[2]
https://lore.kernel.org/stable/9e6ef98f-12eb-4608-aece-cf321e0a38d7@kernel.org/T/#u
[3]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/commit/?id=a662939f16b515c5162e304e6a4cf95370e596e1
[4] https://docs.kernel.org/process/stable-kernel-rules.html
[5]
https://lore.kernel.org/stable/20251129165612.2125498-2-matttbe@kernel.org/T/#u

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 6.1.y v1 2/2] net,mptcp: fix proto fallback detection with BPF
  2025-11-30 16:21   ` Matthieu Baerts
@ 2025-12-01  1:43     ` Jiayuan Chen
  0 siblings, 0 replies; 6+ messages in thread
From: Jiayuan Chen @ 2025-12-01  1:43 UTC (permalink / raw)
  To: Matthieu Baerts, stable, mptcp, sashal, gregkh; +Cc: Jakub Sitnicki

2025/12/1 24:21, "Matthieu Baerts" <matttbe@kernel.org mailto:matttbe@kernel.org?to=%22Matthieu%20Baerts%22%20%3Cmatttbe%40kernel.org%3E > wrote:


> 
> Hi Jiayuan,
> 
> On 30/11/2025 04:23, Jiayuan Chen wrote:
> 
> > 
> > The sockmap feature allows bpf syscall from userspace, or based
> >  on bpf sockops, replacing the sk_prot of sockets during protocol stack
> >  processing with sockmap's custom read/write interfaces.
> > 
> (...)
> 
> > 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> >  index 1dbc62537259..13e3510e6c8f 100644
> >  --- a/net/mptcp/protocol.c
> >  +++ b/net/mptcp/protocol.c
> >  @@ -79,8 +79,9 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
> >  static bool mptcp_is_tcpsk(struct sock *sk)
> >  {
> >  struct socket *sock = sk->sk_socket;
> >  + unsigned short family = READ_ONCE(sk->sk_family);
> >  
> >  - if (unlikely(sk->sk_prot == &tcp_prot)) {
> >  + if (unlikely(family == AF_INET)) {
> >  /* we are being invoked after mptcp_accept() has
> >  * accepted a non-mp-capable flow: sk is a tcp_sk,
> >  * not an mptcp one.
> >  @@ -91,7 +92,7 @@ static bool mptcp_is_tcpsk(struct sock *sk)
> >  sock->ops = &inet_stream_ops;
> >  return true;
> >  #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> >  - } else if (unlikely(sk->sk_prot == &tcpv6_prot)) {
> >  + } else if (unlikely(family == AF_INET6)) {
> > 
> These modifications here break MPTCP: this function (mptcp_is_tcpsk) is
> there to check if the socket is a "plain" TCP one (return "true") or an
> MPTCP one (return "false"). If it is not an MPTCP one, the sock ops is
> modified.
> 
> Here, you are saying: any IPv4 or IPv6 socket is a "plain" TCP one,
> never an MPTCP socket then.
> 
> I suggest adding ...
> 
>  if (sk->sk_protocol == IPPROTO_MPTCP)
>  return false;
> 
> ... at the beginning of this function. I'm planning to send a patch
> later on including this check. Once it is sent, do you mind checking it
> with sockmap if you have the setup available, please?

Yes, of course. I can test it once I receive the patch.

> Cheers,
> Matt
> -- 
> Sponsored by the NGI0 Core fund.
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-12-01  1:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-30  3:23 [PATCH 6.1.y v1 0/2] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
2025-11-30  3:23 ` [PATCH 6.1.y v1 1/2] mptcp: disallow MPTCP subflows from sockmap Jiayuan Chen
2025-11-30  3:23 ` [PATCH 6.1.y v1 2/2] net,mptcp: fix proto fallback detection with BPF Jiayuan Chen
2025-11-30 16:21   ` Matthieu Baerts
2025-12-01  1:43     ` Jiayuan Chen
2025-11-30 16:21 ` [PATCH 6.1.y v1 0/2] mptcp: Fix conflicts between MPTCP and sockmap Matthieu Baerts

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox