[PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap

linux-kselftest.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap
@ 2025-10-23 12:54 Jiayuan Chen
  2025-10-23 12:54 ` [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap Jiayuan Chen
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Jiayuan Chen @ 2025-10-23 12:54 UTC (permalink / raw)
  To: mptcp
  Cc: Jiayuan Chen, John Fastabend, Jakub Sitnicki, Eric Dumazet,
	Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, David S. Miller,
	Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
	Geliang Tang, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
	linux-kselftest

Overall, we encountered a warning [1] that can be triggered by running the
selftest I provided.

MPTCP creates subflows for data transmission between two endpoints.
However, BPF can use sockops to perform additional operations when TCP
completes the three-way handshake. The issue arose because we used sockmap
in sockops, which replaces sk->sk_prot and some handlers. Since subflows
also have their own specialized handlers, this creates a conflict and leads
to traffic failure. Therefore, we need to reject operations targeting
subflows.

This patchset simply prevents the combination of subflows and sockmap
without changing any functionality.

A complete integration of MPTCP and sockmap would require more effort, for
example, we would need to retrieve the parent socket from subflows in
sockmap and implement handlers like read_skb.

If maintainers don't object, we can further improve this in subsequent
work.

[1] truncated warning:
[   18.234652] ------------[ cut here ]------------
[   18.234664] WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
[   18.234726] Modules linked in:
[   18.234755] RIP: 0010:mptcp_stream_accept+0x34c/0x380
[   18.234762] RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
[   18.234800] PKRU: 55555554
[   18.234806] Call Trace:
[   18.234810]  <TASK>
[   18.234837]  do_accept+0xeb/0x190
[   18.234861]  ? __x64_sys_pselect6+0x61/0x80
[   18.234898]  ? _raw_spin_unlock+0x12/0x30
[   18.234915]  ? alloc_fd+0x11e/0x190
[   18.234925]  __sys_accept4+0x8c/0x100
[   18.234930]  __x64_sys_accept+0x1f/0x30
[   18.234933]  x64_sys_call+0x202f/0x20f0
[   18.234966]  do_syscall_64+0x72/0x9a0
[   18.234979]  ? switch_fpu_return+0x60/0xf0
[   18.234993]  ? irqentry_exit_to_user_mode+0xdb/0x1e0
[   18.235002]  ? irqentry_exit+0x3f/0x50
[   18.235005]  ? clear_bhb_loop+0x50/0xa0
[   18.235022]  ? clear_bhb_loop+0x50/0xa0
[   18.235025]  ? clear_bhb_loop+0x50/0xa0
[   18.235028]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   18.235066]  </TASK>
[   18.235109] ---[ end trace 0000000000000000 ]---

---
v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/T/#t
    Some advice suggested by Jakub Sitnicki

v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linux.dev/T/#t
    Some advice from Matthieu Baerts.

Jiayuan Chen (3):
  net,mptcp: fix proto fallback detection with BPF sockmap
  bpf,sockmap: disallow MPTCP sockets from sockmap
  selftests/bpf: Add mptcp test with sockmap

 net/core/sock_map.c                           |  27 ++++
 net/mptcp/protocol.c                          |   9 +-
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 150 ++++++++++++++++++
 .../selftests/bpf/progs/mptcp_sockmap.c       |  43 +++++
 4 files changed, 227 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c

-- 
2.43.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap
  2025-10-23 12:54 [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
@ 2025-10-23 12:54 ` Jiayuan Chen
  2025-10-23 14:10   ` Matthieu Baerts
  2025-10-28 11:30   ` Paolo Abeni
  2025-10-23 12:54 ` [PATCH net v3 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap Jiayuan Chen
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 17+ messages in thread
From: Jiayuan Chen @ 2025-10-23 12:54 UTC (permalink / raw)
  To: mptcp
  Cc: Jiayuan Chen, stable, Jakub Sitnicki, John Fastabend,
	Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S. Miller, Jakub Kicinski, Simon Horman, Matthieu Baerts,
	Mat Martineau, Geliang Tang, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
	linux-kselftest

When the server has MPTCP enabled but receives a non-MP-capable request
from a client, it calls mptcp_fallback_tcp_ops().

Since non-MPTCP connections are allowed to use sockmap, which replaces
sk->sk_prot, using sk->sk_prot to determine the IP version in
mptcp_fallback_tcp_ops() becomes unreliable. This can lead to assigning
incorrect ops to sk->sk_socket->ops.

Additionally, when BPF Sockmap modifies the protocol handlers, the
original WARN_ON_ONCE(sk->sk_prot != &tcp_prot) check would falsely
trigger warnings.

Fix this by using the more stable sk_family to distinguish between IPv4
and IPv6 connections, ensuring correct fallback protocol operations are
selected even when BPF Sockmap has modified the socket protocol handlers.

Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/mptcp/protocol.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0292162a14ee..2393741bc310 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -61,11 +61,16 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
 
 static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
 {
+	/* When BPF sockmap is used, it may replace sk->sk_prot.
+	 * Using sk_family is a reliable way to determine the IP version.
+	 */
+	unsigned short family = READ_ONCE(sk->sk_family);
+
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
-	if (sk->sk_prot == &tcpv6_prot)
+	if (family == AF_INET6)
 		return &inet6_stream_ops;
 #endif
-	WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
+	WARN_ON_ONCE(family != AF_INET);
 	return &inet_stream_ops;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net v3 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap
  2025-10-23 12:54 [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
  2025-10-23 12:54 ` [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap Jiayuan Chen
@ 2025-10-23 12:54 ` Jiayuan Chen
  2025-10-28 12:03   ` Paolo Abeni
  2025-10-23 12:54 ` [PATCH net v3 3/3] selftests/bpf: Add mptcp test with sockmap Jiayuan Chen
  2025-10-23 14:10 ` [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap Matthieu Baerts
  3 siblings, 1 reply; 17+ messages in thread
From: Jiayuan Chen @ 2025-10-23 12:54 UTC (permalink / raw)
  To: mptcp
  Cc: Jiayuan Chen, stable, Jakub Sitnicki, John Fastabend,
	Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn,
	David S. Miller, Jakub Kicinski, Simon Horman, Matthieu Baerts,
	Mat Martineau, Geliang Tang, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
	linux-kselftest

MPTCP creates subflows for data transmission, and these sockets should not
be added to sockmap because MPTCP sets specialized data_ready handlers
that would be overridden by sockmap.

Additionally, for the parent socket of MPTCP subflows (plain TCP socket),
MPTCP sk requires specific protocol handling that conflicts with sockmap's
operation(mptcp_prot).

This patch adds proper checks to reject MPTCP subflows and their parent
sockets from being added to sockmap, while preserving compatibility with
reuseport functionality for listening MPTCP sockets.

We cannot add this logic to sock_map_sk_state_allowed() because the sockops
path doesn't execute this function, and the socket state coming from
sockops might be in states like SYN_RECV. So moving
sock_map_sk_state_allowed() to sock_{map,hash}_update_common() is not
appropriate. Instead, we introduce a new function to handle MPTCP checks.

Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
Cc: <stable@vger.kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/core/sock_map.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 5947b38e4f8b..5be38cdfb5cc 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -467,6 +467,27 @@ static int sock_map_get_next_key(struct bpf_map *map, void *key, void *next)
 	return 0;
 }
 
+/* Disallow MPTCP subflows and their parent sockets. However, a TCP_LISTEN
+ * MPTCP socket is permitted because sockmap can also serve for reuseport
+ * socket selection.
+ */
+static inline bool sock_map_sk_type_allowed(const struct sock *sk)
+{
+	/* MPTCP subflows are not intended for data I/O by user */
+	if (sk_is_tcp(sk) && sk_is_mptcp(sk))
+		goto disallow;
+
+	/* MPTCP parents use mptcp_prot - not supported with sockmap yet */
+	if (sk->sk_protocol == IPPROTO_MPTCP && sk->sk_state != TCP_LISTEN)
+		goto disallow;
+
+	return true;
+
+disallow:
+	pr_err_once("sockmap/sockhash: MPTCP sockets are not supported\n");
+	return false;
+}
+
 static int sock_map_update_common(struct bpf_map *map, u32 idx,
 				  struct sock *sk, u64 flags)
 {
@@ -482,6 +503,9 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
 	if (unlikely(idx >= map->max_entries))
 		return -E2BIG;
 
+	if (!sock_map_sk_type_allowed(sk))
+		return -EOPNOTSUPP;
+
 	link = sk_psock_init_link();
 	if (!link)
 		return -ENOMEM;
@@ -1003,6 +1027,9 @@ static int sock_hash_update_common(struct bpf_map *map, void *key,
 	if (unlikely(flags > BPF_EXIST))
 		return -EINVAL;
 
+	if (!sock_map_sk_type_allowed(sk))
+		return -EOPNOTSUPP;
+
 	link = sk_psock_init_link();
 	if (!link)
 		return -ENOMEM;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net v3 3/3] selftests/bpf: Add mptcp test with sockmap
  2025-10-23 12:54 [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
  2025-10-23 12:54 ` [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap Jiayuan Chen
  2025-10-23 12:54 ` [PATCH net v3 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap Jiayuan Chen
@ 2025-10-23 12:54 ` Jiayuan Chen
  2025-10-23 14:10 ` [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap Matthieu Baerts
  3 siblings, 0 replies; 17+ messages in thread
From: Jiayuan Chen @ 2025-10-23 12:54 UTC (permalink / raw)
  To: mptcp
  Cc: Jiayuan Chen, John Fastabend, Jakub Sitnicki, Eric Dumazet,
	Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, David S. Miller,
	Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
	Geliang Tang, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
	linux-kselftest

Add test cases to verify that when MPTCP falls back to plain TCP sockets,
they can properly work with sockmap.

Additionally, add test cases to ensure that sockmap correctly rejects
MPTCP sockets as expected.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 150 ++++++++++++++++++
 .../selftests/bpf/progs/mptcp_sockmap.c       |  43 +++++
 2 files changed, 193 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c

diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index f8eb7f9d4fd2..56c556f603cc 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -6,11 +6,14 @@
 #include <netinet/in.h>
 #include <test_progs.h>
 #include <unistd.h>
+#include <error.h>
 #include "cgroup_helpers.h"
 #include "network_helpers.h"
+#include "socket_helpers.h"
 #include "mptcp_sock.skel.h"
 #include "mptcpify.skel.h"
 #include "mptcp_subflow.skel.h"
+#include "mptcp_sockmap.skel.h"
 
 #define NS_TEST "mptcp_ns"
 #define ADDR_1	"10.0.1.1"
@@ -436,6 +439,151 @@ static void test_subflow(void)
 	close(cgroup_fd);
 }
 
+/* Test sockmap on MPTCP server handling non-mp-capable clients. */
+static void test_sockmap_with_mptcp_fallback(struct mptcp_sockmap *skel)
+{
+	int listen_fd = -1, client_fd1 = -1, client_fd2 = -1;
+	int server_fd1 = -1, server_fd2 = -1, sent, recvd;
+	char snd[9] = "123456789";
+	char rcv[10];
+
+	/* start server with MPTCP enabled */
+	listen_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
+	if (!ASSERT_OK_FD(listen_fd, "redirect:start_mptcp_server"))
+		return;
+
+	skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
+	skel->bss->sk_index = 0;
+	/* create client without MPTCP enabled */
+	client_fd1 = connect_to_fd_opts(listen_fd, NULL);
+	if (!ASSERT_OK_FD(client_fd1, "redirect:connect_to_fd"))
+		goto end;
+
+	server_fd1 = xaccept_nonblock(listen_fd, NULL, NULL);
+	skel->bss->sk_index = 1;
+	client_fd2 = connect_to_fd_opts(listen_fd, NULL);
+	if (!ASSERT_OK_FD(client_fd2, "redirect:connect_to_fd"))
+		goto end;
+
+	server_fd2 = xaccept_nonblock(listen_fd, NULL, NULL);
+	/* test normal redirect behavior: data sent by client_fd1 can be
+	 * received by client_fd2
+	 */
+	skel->bss->redirect_idx = 1;
+	sent = xsend(client_fd1, snd, sizeof(snd), 0);
+	if (!ASSERT_EQ(sent, sizeof(snd), "redirect:xsend(client_fd1)"))
+		goto end;
+
+	/* try to recv more bytes to avoid truncation check */
+	recvd = recv_timeout(client_fd2, rcv, sizeof(rcv), MSG_DONTWAIT, 2);
+	if (!ASSERT_EQ(recvd, sizeof(snd), "redirect:recv(client_fd2)"))
+		goto end;
+
+end:
+	if (client_fd1 > 1)
+		close(client_fd1);
+	if (client_fd2 > 1)
+		close(client_fd2);
+	if (server_fd1 > 0)
+		close(server_fd1);
+	if (server_fd2 > 0)
+		close(server_fd2);
+	close(listen_fd);
+}
+
+/* Test sockmap rejection of MPTCP sockets - both server and client sides. */
+static void test_sockmap_reject_mptcp(struct mptcp_sockmap *skel)
+{
+	int client_fd1 = -1, client_fd2 = -1;
+	int listen_fd = -1, server_fd = -1;
+	int err, zero = 0;
+
+	/* start server with MPTCP enabled */
+	listen_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
+	if (!ASSERT_OK_FD(listen_fd, "start_mptcp_server"))
+		return;
+
+	skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
+	skel->bss->sk_index = 0;
+	/* create client with MPTCP enabled */
+	client_fd1 = connect_to_fd(listen_fd, 0);
+	if (!ASSERT_OK_FD(client_fd1, "connect_to_fd client_fd1"))
+		goto end;
+
+	/* bpf_sock_map_update() called from sockops should reject MPTCP sk */
+	if (!ASSERT_EQ(skel->bss->helper_ret, -EOPNOTSUPP, "should reject"))
+		goto end;
+
+	/* set trace_port = -1 to stop sockops */
+	skel->bss->trace_port = -1;
+	client_fd2 = connect_to_fd(listen_fd, 0);
+	if (!ASSERT_OK_FD(client_fd2, "connect_to_fd client_fd2"))
+		goto end;
+
+	server_fd = xaccept_nonblock(listen_fd, NULL, NULL);
+	err = bpf_map_update_elem(bpf_map__fd(skel->maps.sock_map),
+				  &zero, &server_fd, BPF_NOEXIST);
+	if (!ASSERT_EQ(err, -EOPNOTSUPP, "server should be disallowed"))
+		goto end;
+
+	/* MPTCP client should also be disallowed */
+	err = bpf_map_update_elem(bpf_map__fd(skel->maps.sock_map),
+				  &zero, &client_fd1, BPF_NOEXIST);
+	if (!ASSERT_EQ(err, -EOPNOTSUPP, "client should be disallowed"))
+		goto end;
+end:
+	if (client_fd1 > 0)
+		close(client_fd1);
+	if (client_fd2 > 0)
+		close(client_fd2);
+	if (server_fd > 0)
+		close(server_fd);
+	close(listen_fd);
+}
+
+static void test_mptcp_sockmap(void)
+{
+	struct mptcp_sockmap *skel;
+	struct netns_obj *netns;
+	int cgroup_fd, err;
+
+	cgroup_fd = test__join_cgroup("/mptcp_sockmap");
+	if (!ASSERT_OK_FD(cgroup_fd, "join_cgroup: mptcp_sockmap"))
+		return;
+
+	skel = mptcp_sockmap__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_load: mptcp_sockmap"))
+		goto close_cgroup;
+
+	skel->links.mptcp_sockmap_inject =
+		bpf_program__attach_cgroup(skel->progs.mptcp_sockmap_inject, cgroup_fd);
+	if (!ASSERT_OK_PTR(skel->links.mptcp_sockmap_inject, "attach sockmap"))
+		goto skel_destroy;
+
+	err = bpf_prog_attach(bpf_program__fd(skel->progs.mptcp_sockmap_redirect),
+			      bpf_map__fd(skel->maps.sock_map),
+			      BPF_SK_SKB_STREAM_VERDICT, 0);
+	if (!ASSERT_OK(err, "bpf_prog_attach stream verdict"))
+		goto skel_destroy;
+
+	netns = netns_new(NS_TEST, true);
+	if (!ASSERT_OK_PTR(netns, "netns_new: mptcp_sockmap"))
+		goto skel_destroy;
+
+	if (endpoint_init("subflow") < 0)
+		goto close_netns;
+
+	test_sockmap_with_mptcp_fallback(skel);
+	test_sockmap_reject_mptcp(skel);
+
+close_netns:
+	netns_free(netns);
+skel_destroy:
+	mptcp_sockmap__destroy(skel);
+close_cgroup:
+	close(cgroup_fd);
+}
+
 void test_mptcp(void)
 {
 	if (test__start_subtest("base"))
@@ -444,4 +592,6 @@ void test_mptcp(void)
 		test_mptcpify();
 	if (test__start_subtest("subflow"))
 		test_subflow();
+	if (test__start_subtest("sockmap"))
+		test_mptcp_sockmap();
 }
diff --git a/tools/testing/selftests/bpf/progs/mptcp_sockmap.c b/tools/testing/selftests/bpf/progs/mptcp_sockmap.c
new file mode 100644
index 000000000000..d4eef0cbadb9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_sockmap.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "bpf_tracing_net.h"
+
+char _license[] SEC("license") = "GPL";
+
+int sk_index;
+int redirect_idx;
+int trace_port;
+int helper_ret;
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(__u32));
+	__uint(max_entries, 100);
+} sock_map SEC(".maps");
+
+SEC("sockops")
+int mptcp_sockmap_inject(struct bpf_sock_ops *skops)
+{
+	struct bpf_sock *sk;
+
+	/* only accept specified connection */
+	if (skops->local_port != trace_port ||
+	    skops->op != BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
+		return 1;
+
+	sk = skops->sk;
+	if (!sk)
+		return 1;
+
+	/* update sk handler */
+	helper_ret = bpf_sock_map_update(skops, &sock_map, &sk_index, BPF_NOEXIST);
+
+	return 1;
+}
+
+SEC("sk_skb/stream_verdict")
+int mptcp_sockmap_redirect(struct __sk_buff *skb)
+{
+	/* redirect skb to the sk under sock_map[redirect_idx] */
+	return bpf_sk_redirect_map(skb, &sock_map, redirect_idx, 0);
+}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap
  2025-10-23 12:54 [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
                   ` (2 preceding siblings ...)
  2025-10-23 12:54 ` [PATCH net v3 3/3] selftests/bpf: Add mptcp test with sockmap Jiayuan Chen
@ 2025-10-23 14:10 ` Matthieu Baerts
  2025-10-24  4:13   ` Jiayuan Chen
  3 siblings, 1 reply; 17+ messages in thread
From: Matthieu Baerts @ 2025-10-23 14:10 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S. Miller, Jakub Kicinski,
	Simon Horman, Mat Martineau, Geliang Tang, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

Hi Jiayuan,

Thank you for the v3. Sorry, I didn't have the opportunity to react on
the v2.

On 23/10/2025 14:54, Jiayuan Chen wrote:
> Overall, we encountered a warning [1] that can be triggered by running the
> selftest I provided.
> 
> MPTCP creates subflows for data transmission between two endpoints.
> However, BPF can use sockops to perform additional operations when TCP
> completes the three-way handshake. The issue arose because we used sockmap
> in sockops, which replaces sk->sk_prot and some handlers.

Do you know at what stage the sk->sk_prot is modified with sockmap? When
switching to TCP_ESTABLISHED?

Is it before or after having set "tcp_sk(sk)->is_mptcp = 0" (in
subflow_ulp_fallback(), coming from subflow_syn_recv_sock() I suppose)?

If MPTCP is still being used (sk_is_tcp(sk) && sk_is_mptcp(sk)), I guess
sockmap should never touch the in-kernel TCP subflows: they will likely
only carry a part of the data. Instead, sockmap should act on the MPTCP
sockets, not the in-kernel TCP subflows.

There is one particular case to take into consideration: an MPTCP
connection can fallback to "plain" TCP before being used by the
userspace. Typically, that's when an MPTCP listening socket receives a
"plain" TCP request (without MPTCP): a "plain" TCP socket will then be
created, and exposed to the userspace. In this case, sk_is_mptcp(sk)
will return false. I guess that's the case you are trying to handle,
right? (It might help BPF reviewers to mention that in the commit
message(s).)

I would then say that sk->sk_prot->psock_update_sk_prot should not point
to tcp_bpf_update_proto() when MPTCP is being used (or this callback
should take the MPTCP case into account, but I guess no). In case of
fallback before the accept() stage, the socket can then be used as a
"plain" TCP one. I guess when tcp_bpf_update_proto() will be called,
sk_prot is pointing to tcp(v6)_prot, not the MPTCP subflow override one,
right?

> Since subflows
> also have their own specialized handlers, this creates a conflict and leads
> to traffic failure. Therefore, we need to reject operations targeting
> subflows.

Would it not work to set sk_prot->psock_update_sk_prot to NULL for the
v4 and v6 subflows (in mptcp_subflow_init()) for the moment while
sockmap is not supported with MPTCP? This might save you some checks in
sock_map.c, no?

> This patchset simply prevents the combination of subflows and sockmap
> without changing any functionality.

In your case, you have an MPTCP listening socket, but you receive a TCP
request, right? The "sockmap update" is done when switching to
TCP_ESTABLISHED, when !sk_is_mptcp(sk), but that's before
mptcp_stream_accept(). That's why sk->sk_prot has been modified, but it
is fine to look at sk_family, and return inet(6)_stream_ops, right?

A more important question: what will typically happen in your case if
you receive an MPTCP request and sockmap is then not supported? Will the
connection be rejected or stay in a strange state because the userspace
will not expect that? In these cases, would it not be better to disallow
sockmap usage while the MPTCP support is not available? The userspace
would then get an error from the beginning that the protocol is not
supported, and should then not create an MPTCP socket in this case for
the moment, no?

I can understand that the switch from TCP to MPTCP was probably done
globally, and this transition should be as seamless as possible, but it
should not cause a regression with MPTCP requests. An alternative could
be to force a fallback to TCP when sockmap is used, even when an MPTCP
request is received, but not sure if it is practical to do, and might be
strange from the user point of view.

> A complete integration of MPTCP and sockmap would require more effort, for
> example, we would need to retrieve the parent socket from subflows in
> sockmap and implement handlers like read_skb.
> 
> If maintainers don't object, we can further improve this in subsequent
> work.

That would be great to add MPTCP support in sockmap! As mentioned above,
this should be done on the MPTCP socket. I guess the TCP "in-kernel"
subflows should not be modified.

> [1] truncated warning:
> [   18.234652] ------------[ cut here ]------------
> [   18.234664] WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
> [   18.234726] Modules linked in:
> [   18.234755] RIP: 0010:mptcp_stream_accept+0x34c/0x380
> [   18.234762] RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
> [   18.234800] PKRU: 55555554
> [   18.234806] Call Trace:
> [   18.234810]  <TASK>
> [   18.234837]  do_accept+0xeb/0x190
> [   18.234861]  ? __x64_sys_pselect6+0x61/0x80
> [   18.234898]  ? _raw_spin_unlock+0x12/0x30
> [   18.234915]  ? alloc_fd+0x11e/0x190
> [   18.234925]  __sys_accept4+0x8c/0x100
> [   18.234930]  __x64_sys_accept+0x1f/0x30
> [   18.234933]  x64_sys_call+0x202f/0x20f0
> [   18.234966]  do_syscall_64+0x72/0x9a0
> [   18.234979]  ? switch_fpu_return+0x60/0xf0
> [   18.234993]  ? irqentry_exit_to_user_mode+0xdb/0x1e0
> [   18.235002]  ? irqentry_exit+0x3f/0x50
> [   18.235005]  ? clear_bhb_loop+0x50/0xa0
> [   18.235022]  ? clear_bhb_loop+0x50/0xa0
> [   18.235025]  ? clear_bhb_loop+0x50/0xa0
> [   18.235028]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   18.235066]  </TASK>
> [   18.235109] ---[ end trace 0000000000000000 ]---

Please next time use the ./scripts/decode_stacktrace.sh if possible.
(and strip the timestamps if it is not giving useful info)

Just to be sure: is it the warning you get on top of net or net-next? Or
an older version? (Always useful to mention the base)

> ---
> v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/T/#t
>     Some advice suggested by Jakub Sitnicki
> 
> v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linux.dev/T/#t
>     Some advice from Matthieu Baerts.

(It usually helps reviewers to add more details in the notes/changelog
for the individual patch)

> Jiayuan Chen (3):
>   net,mptcp: fix proto fallback detection with BPF sockmap

(detail: you can use the "mptcp:" prefix, no need to add "net,")

>   bpf,sockmap: disallow MPTCP sockets from sockmap
>   selftests/bpf: Add mptcp test with sockmap
> 
>  net/core/sock_map.c                           |  27 ++++
>  net/mptcp/protocol.c                          |   9 +-
>  .../testing/selftests/bpf/prog_tests/mptcp.c  | 150 ++++++++++++++++++
>  .../selftests/bpf/progs/mptcp_sockmap.c       |  43 +++++
>  4 files changed, 227 insertions(+), 2 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c
> 

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap
  2025-10-23 12:54 ` [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap Jiayuan Chen
@ 2025-10-23 14:10   ` Matthieu Baerts
  2025-10-23 14:38     ` Jiayuan Chen
  2025-10-28 11:30   ` Paolo Abeni
  1 sibling, 1 reply; 17+ messages in thread
From: Matthieu Baerts @ 2025-10-23 14:10 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: stable, Jakub Sitnicki, John Fastabend, Eric Dumazet,
	Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, David S. Miller,
	Jakub Kicinski, Simon Horman, Mat Martineau, Geliang Tang,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

Hi Jiayuan,

On 23/10/2025 14:54, Jiayuan Chen wrote:
> When the server has MPTCP enabled but receives a non-MP-capable request
> from a client, it calls mptcp_fallback_tcp_ops().
> 
> Since non-MPTCP connections are allowed to use sockmap, which replaces
> sk->sk_prot, using sk->sk_prot to determine the IP version in
> mptcp_fallback_tcp_ops() becomes unreliable. This can lead to assigning
> incorrect ops to sk->sk_socket->ops.
> 
> Additionally, when BPF Sockmap modifies the protocol handlers, the
> original WARN_ON_ONCE(sk->sk_prot != &tcp_prot) check would falsely
> trigger warnings.
> 
> Fix this by using the more stable sk_family to distinguish between IPv4
> and IPv6 connections, ensuring correct fallback protocol operations are
> selected even when BPF Sockmap has modified the socket protocol handlers.
> 
> Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  net/mptcp/protocol.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 0292162a14ee..2393741bc310 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -61,11 +61,16 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
>  
>  static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
>  {
> +	/* When BPF sockmap is used, it may replace sk->sk_prot.
> +	 * Using sk_family is a reliable way to determine the IP version.
> +	 */
> +	unsigned short family = READ_ONCE(sk->sk_family);
> +
>  #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> -	if (sk->sk_prot == &tcpv6_prot)
> +	if (family == AF_INET6)
>  		return &inet6_stream_ops;
>  #endif
> -	WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
> +	WARN_ON_ONCE(family != AF_INET);
>  	return &inet_stream_ops;

Just to be sure: is there anything in BPF modifying sk->sk_socket->ops?
Because that's what mptcp_fallback_tcp_ops() will do somehow.

In other words, is it always fine to set inet(6)_stream_ops? (I guess
yes, but better to be sure while we are looking at that :) )

>  }
>  

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap
  2025-10-23 14:10   ` Matthieu Baerts
@ 2025-10-23 14:38     ` Jiayuan Chen
  0 siblings, 0 replies; 17+ messages in thread
From: Jiayuan Chen @ 2025-10-23 14:38 UTC (permalink / raw)
  To: Matthieu Baerts, mptcp
  Cc: stable, Jakub Sitnicki, John Fastabend, Eric Dumazet,
	Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, David S. Miller,
	Jakub Kicinski, Simon Horman, Mat Martineau, Geliang Tang,
	Andrii Nakryiko, Eduard Zingerman, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

October 23, 2025 at 22:10, "Matthieu Baerts" <matttbe@kernel.org mailto:matttbe@kernel.org?to=%22Matthieu%20Baerts%22%20%3Cmatttbe%40kernel.org%3E > wrote:


> 
> Hi Jiayuan,
> 
> On 23/10/2025 14:54, Jiayuan Chen wrote:
> 
> > 
> > When the server has MPTCP enabled but receives a non-MP-capable request
> >  from a client, it calls mptcp_fallback_tcp_ops().
> >  
> >  Since non-MPTCP connections are allowed to use sockmap, which replaces
> >  sk->sk_prot, using sk->sk_prot to determine the IP version in
> >  mptcp_fallback_tcp_ops() becomes unreliable. This can lead to assigning
> >  incorrect ops to sk->sk_socket->ops.
> >  
> >  Additionally, when BPF Sockmap modifies the protocol handlers, the
> >  original WARN_ON_ONCE(sk->sk_prot != &tcp_prot) check would falsely
> >  trigger warnings.
> >  
> >  Fix this by using the more stable sk_family to distinguish between IPv4
> >  and IPv6 connections, ensuring correct fallback protocol operations are
> >  selected even when BPF Sockmap has modified the socket protocol handlers.
> >  
> >  Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
> >  Cc: <stable@vger.kernel.org>
> >  Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> >  Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
> >  ---
> >  net/mptcp/protocol.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >  
> >  diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> >  index 0292162a14ee..2393741bc310 100644
> >  --- a/net/mptcp/protocol.c
> >  +++ b/net/mptcp/protocol.c
> >  @@ -61,11 +61,16 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
> >  
> >  static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
> >  {
> >  + /* When BPF sockmap is used, it may replace sk->sk_prot.
> >  + * Using sk_family is a reliable way to determine the IP version.
> >  + */
> >  + unsigned short family = READ_ONCE(sk->sk_family);
> >  +
> >  #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> >  - if (sk->sk_prot == &tcpv6_prot)
> >  + if (family == AF_INET6)
> >  return &inet6_stream_ops;
> >  #endif
> >  - WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
> >  + WARN_ON_ONCE(family != AF_INET);
> >  return &inet_stream_ops;
> > 
> Just to be sure: is there anything in BPF modifying sk->sk_socket->ops?
> Because that's what mptcp_fallback_tcp_ops() will do somehow.
> 
> In other words, is it always fine to set inet(6)_stream_ops? (I guess
> yes, but better to be sure while we are looking at that :) )



Hi Matt,

I can confirm that on the BPF side, the only special operations targeting
sockets currently are sockmap/sockhash. Their implementations do not modify
sk->sk_socket->ops. Currently, they only modify sk->prot, because the BPF
side typically operates on 'struct sock' and does not concern itself with
'struct socket'.

Therefore, setting inet(6)_stream_ops is fine.

Thanks,
Jiayuan

> > 
> > }
> > 
> Cheers,
> Matt
> -- 
> Sponsored by the NGI0 Core fund.
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap
  2025-10-23 14:10 ` [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap Matthieu Baerts
@ 2025-10-24  4:13   ` Jiayuan Chen
  2025-10-28 17:26     ` Matthieu Baerts
  0 siblings, 1 reply; 17+ messages in thread
From: Jiayuan Chen @ 2025-10-24  4:13 UTC (permalink / raw)
  To: Matthieu Baerts, mptcp
  Cc: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S. Miller, Jakub Kicinski,
	Simon Horman, Mat Martineau, Geliang Tang, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

2025/10/23 22:10, "Matthieu Baerts" <matttbe@kernel.org mailto:matttbe@kernel.org?to=%22Matthieu%20Baerts%22%20%3Cmatttbe%40kernel.org%3E > 写到:

> >  MPTCP creates subflows for data transmission between two endpoints.
> >  However, BPF can use sockops to perform additional operations when TCP
> >  completes the three-way handshake. The issue arose because we used sockmap
> >  in sockops, which replaces sk->sk_prot and some handlers.
> > 
> Do you know at what stage the sk->sk_prot is modified with sockmap? When
> switching to TCP_ESTABLISHED?
> Is it before or after having set "tcp_sk(sk)->is_mptcp = 0" (in
> subflow_ulp_fallback(), coming from subflow_syn_recv_sock() I suppose)?

Yes, there are two call points. One is after executing subflow_syn_recv_sock():
tcp_init_transfer(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB, skb);

So at this point, is_mptcp = 0. The other call point is when userspace calls
the BPF interface, passing in an fd while it's not a subflow but a parent sk
with its own mptcp_prot we will also reject it.

You can refer to my provided selftest, which covers these scenarios.

> If MPTCP is still being used (sk_is_tcp(sk) && sk_is_mptcp(sk)), I guess
> sockmap should never touch the in-kernel TCP subflows: they will likely
> only carry a part of the data. Instead, sockmap should act on the MPTCP
> sockets, not the in-kernel TCP subflows.

Yes, I agree.

For full functionality, we need to retrieve the parent socket from MPTCP
and integrate it with sockmap, rather than simply rejecting.

The current implementation rejects MPTCP because I previously attempted to
add sockmap support for MPTCP, but it required implementing many interfaces
and would take considerable time.

So for now, I'm proposing this as a fix to resolve the immediate issue.
Subsequently, we can continue working on fully integrating MPTCP with sockmap.

> There is one particular case to take into consideration: an MPTCP
> connection can fallback to "plain" TCP before being used by the
> userspace. Typically, that's when an MPTCP listening socket receives a
> "plain" TCP request (without MPTCP): a "plain" TCP socket will then be
> created, and exposed to the userspace. In this case, sk_is_mptcp(sk)
> will return false. I guess that's the case you are trying to handle,
> right? (It might help BPF reviewers to mention that in the commit
> message(s).)

Yes, this is primarily the case we're addressing. I will add this description
to the commit message.

> I would then say that sk->sk_prot->psock_update_sk_prot should not point
> to tcp_bpf_update_proto() when MPTCP is being used (or this callback
> should take the MPTCP case into account, but I guess no). In case of
> fallback before the accept() stage, the socket can then be used as a
> "plain" TCP one. I guess when tcp_bpf_update_proto() will be called,
> sk_prot is pointing to tcp(v6)_prot, not the MPTCP subflow override one,
> right?

Yes, when tcp_bpf_update_proto is called the sk_prot is pointing to tcp(v6)_prot.
subflow_syn_recv_sock
 mptcp_subflow_drop_ctx
  subflow_ulp_fallback
   mptcp_subflow_ops_undo_override -> reset sk_prot to original one

So [patch 2/3] aims to prevent psock_update_sk_prot from being executed on subflows.

Actually, replacing the subflow's callbacks is also incorrect, as you mentioned earlier,
because subflows only carry part of the data. By checking for subflows early and skipping
subsequent steps, we avoid incorrect logic.

Furthermore, there's another risk: if an IPv6 request comes in and we perform the replacement,
MPTCP will roll it back to inet_stream_ops. I haven't delved too deeply into the potential
impact, but I noticed that inet6_release has many V6-specific cleanup procedures not present
in inet_release.

> > 
> > Since subflows
> >  also have their own specialized handlers, this creates a conflict and leads
> >  to traffic failure. Therefore, we need to reject operations targeting
> >  subflows.
> > 
> Would it not work to set sk_prot->psock_update_sk_prot to NULL for the
> v4 and v6 subflows (in mptcp_subflow_init()) for the moment while
> sockmap is not supported with MPTCP? This might save you some checks in
> sock_map.c, no?

This seems like a reliable alternative I hadn't considered initially.

However, adding the check on the BPF side serves another purpose: to explicitly
warn users that sockmap and MPTCP are incompatible.

Since the latest Golang version enables MPTCP server by default, and if the client
doesn't support MPTCP, it falls back to TCP logic. We want to print a clear message
informing users who have upgraded to the latest Golang and are using sockmap.

Perhaps we could add a function like sk_is_mptcp_subflow() in the MPTCP side?
The implementation would simply be sk_is_tcp(sk) && sk_is_mptcp(sk).

Implementing this check logic on the BPF side might become invalid if MPTCP internals
change later; placing it in the MPTCP side might be a better choice.

> > 
> > This patchset simply prevents the combination of subflows and sockmap
> >  without changing any functionality.
> > 
> In your case, you have an MPTCP listening socket, but you receive a TCP
> request, right? The "sockmap update" is done when switching to
> TCP_ESTABLISHED, when !sk_is_mptcp(sk), but that's before
> mptcp_stream_accept(). That's why sk->sk_prot has been modified, but it
> is fine to look at sk_family, and return inet(6)_stream_ops, right?

I believe so. Since MPTCP is fundamentally based on TCP, using sk_family to
determine which ops to fall back to should be sufficient.

However, strictly speaking, this [patch 1/3] might not even be necessary if we
prevent the sk_prot replacement for subflows at the sockmap layer.

> A more important question: what will typically happen in your case if
> you receive an MPTCP request and sockmap is then not supported? Will the
> connection be rejected or stay in a strange state because the userspace
> will not expect that? In these cases, would it not be better to disallow
> sockmap usage while the MPTCP support is not available? The userspace
> would then get an error from the beginning that the protocol is not
> supported, and should then not create an MPTCP socket in this case for
> the moment, no?
>
> I can understand that the switch from TCP to MPTCP was probably done
> globally, and this transition should be as seamless as possible, but it
> should not cause a regression with MPTCP requests. An alternative could
> be to force a fallback to TCP when sockmap is used, even when an MPTCP
> request is received, but not sure if it is practical to do, and might be
> strange from the user point of view.

Actually, I understand this not as an MPTCP regression, but as a sockmap
regression.

Let me explain how users typically use sockmap:

Users typically create multiple sockets on a host and program using BPF+sockmap
to enable fast data redirection. This involves intercepting data sent or received
by one socket and redirecting it to the send or receive queue of another socket.

This requires explicit user programming. The goal is that when multiple microservices
on one host need to communicate, they can bypass most of the network stack and avoid
data copies between user and kernel space.

However, when an MPTCP request occurs, this redirection flow fails.

Since the sockmap workflow typically occurs after the three-way handshake, rolling
back at that point might be too late, and undoing the logic for MPTCP would be very
complex.

Regardless, the reality is that MPTCP and sockmap are already conflicting, and this
has been the case for some time. So I think our first step is to catch specific
behavior on the BPF side and print a message
"sockmap/sockhash: MPTCP sockets are not supported\n", informing users to either
stop using sockmap or not use MPTCP.

As for the logic to check for subflows, I think implementing it in subflow.c would be
beneficial, as this logic would likely be useful later if we want to
support MPTCP + sockmap.

Furthermore, this commit also addresses the issue of incorrectly selecting
inet_stream_ops due to the subflow prot replacement, as mentioned above.

> > 
> > A complete integration of MPTCP and sockmap would require more effort, for
> >  example, we would need to retrieve the parent socket from subflows in
> >  sockmap and implement handlers like read_skb.
> >  
> >  If maintainers don't object, we can further improve this in subsequent
> >  work.
> > 
> That would be great to add MPTCP support in sockmap! As mentioned above,
> this should be done on the MPTCP socket. I guess the TCP "in-kernel"
> subflows should not be modified.

I think we should first fix the issue by having sockmap reject operations on subflows.
Subsequently, we can work on fully integrating sockmap with MPTCP as a feature
(which would require implementing some handlers).

> > 
> > [1] truncated warning:
> >  [ 18.234652] ------------[ cut here ]------------
> >  [ 18.234664] WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
> >  [ 18.234726] Modules linked in:
> >  [ 18.234755] RIP: 0010:mptcp_stream_accept+0x34c/0x380
> >  [ 18.234762] RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
[...]
> > 
> Please next time use the ./scripts/decode_stacktrace.sh if possible.
> (and strip the timestamps if it is not giving useful info)
> Just to be sure: is it the warning you get on top of net or net-next? Or
> an older version? (Always useful to mention the base)

Thank you, Matthieu. I will pay attention to this.

> > 
> > ---
> >  v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/T/#t
> >  Some advice suggested by Jakub Sitnicki
> >  
> >  v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linux.dev/T/#t
> >  Some advice from Matthieu Baerts.
> > 
> (It usually helps reviewers to add more details in the notes/changelog
> for the individual patch)

Thank you, Matthieu. I will provide more detailed descriptions in the future.

Best regards,
Jiayuan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap
  2025-10-23 12:54 ` [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap Jiayuan Chen
  2025-10-23 14:10   ` Matthieu Baerts
@ 2025-10-28 11:30   ` Paolo Abeni
  2025-10-28 11:47     ` Paolo Abeni
  2025-11-03 12:44     ` Jiayuan Chen
  1 sibling, 2 replies; 17+ messages in thread
From: Paolo Abeni @ 2025-10-28 11:30 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: stable, Jakub Sitnicki, John Fastabend, Eric Dumazet,
	Kuniyuki Iwashima, Willem de Bruijn, David S. Miller,
	Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
	Geliang Tang, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
	linux-kselftest

On 10/23/25 2:54 PM, Jiayuan Chen wrote:
> When the server has MPTCP enabled but receives a non-MP-capable request
> from a client, it calls mptcp_fallback_tcp_ops().
> 
> Since non-MPTCP connections are allowed to use sockmap, which replaces
> sk->sk_prot, using sk->sk_prot to determine the IP version in
> mptcp_fallback_tcp_ops() becomes unreliable. This can lead to assigning
> incorrect ops to sk->sk_socket->ops.

I don't see how sockmap could modify the to-be-accepted socket sk_prot
before mptcp_fallback_tcp_ops(), as such call happens before the fd is
installed, and AFAICS sockmap can only fetch sockets via fds.

Is this patch needed?

/P


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap
  2025-10-28 11:30   ` Paolo Abeni
@ 2025-10-28 11:47     ` Paolo Abeni
  2025-11-03 12:45       ` Jiayuan Chen
  2025-11-03 12:44     ` Jiayuan Chen
  1 sibling, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2025-10-28 11:47 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: stable, Jakub Sitnicki, John Fastabend, Eric Dumazet,
	Kuniyuki Iwashima, Willem de Bruijn, David S. Miller,
	Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
	Geliang Tang, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
	linux-kselftest

On 10/28/25 12:30 PM, Paolo Abeni wrote:
> On 10/23/25 2:54 PM, Jiayuan Chen wrote:
>> When the server has MPTCP enabled but receives a non-MP-capable request
>> from a client, it calls mptcp_fallback_tcp_ops().
>>
>> Since non-MPTCP connections are allowed to use sockmap, which replaces
>> sk->sk_prot, using sk->sk_prot to determine the IP version in
>> mptcp_fallback_tcp_ops() becomes unreliable. This can lead to assigning
>> incorrect ops to sk->sk_socket->ops.
> 
> I don't see how sockmap could modify the to-be-accepted socket sk_prot
> before mptcp_fallback_tcp_ops(), as such call happens before the fd is
> installed, and AFAICS sockmap can only fetch sockets via fds.
> 
> Is this patch needed?

Matttbe explained off-list the details of how that could happen. I think
the commit message here must be more verbose to explain clearly the
whys, even to those non proficient in sockmap like me.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap
  2025-10-23 12:54 ` [PATCH net v3 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap Jiayuan Chen
@ 2025-10-28 12:03   ` Paolo Abeni
  2025-11-03 12:52     ` Jiayuan Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Paolo Abeni @ 2025-10-28 12:03 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: stable, Jakub Sitnicki, John Fastabend, Eric Dumazet,
	Kuniyuki Iwashima, Willem de Bruijn, David S. Miller,
	Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
	Geliang Tang, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
	linux-kselftest

On 10/23/25 2:54 PM, Jiayuan Chen wrote:
> MPTCP creates subflows for data transmission, and these sockets should not
> be added to sockmap because MPTCP sets specialized data_ready handlers
> that would be overridden by sockmap.
> 
> Additionally, for the parent socket of MPTCP subflows (plain TCP socket),
> MPTCP sk requires specific protocol handling that conflicts with sockmap's
> operation(mptcp_prot).
> 
> This patch adds proper checks to reject MPTCP subflows and their parent
> sockets from being added to sockmap, while preserving compatibility with
> reuseport functionality for listening MPTCP sockets.

It's unclear to me why that is safe. sockmap is going to change the
listener msk proto ops.

The listener could disconnect and create an egress connection, still
using the wrong ops.

I think sockmap should always be prevented for mptcp socket, or at least
a solid explanation of why such exception is safe should be included in
the commit message.

Note that the first option allows for solving the issue entirely in the
mptcp code, setting dummy/noop psock_update_sk_prot for mptcp sockets
and mptcp subflows.

/P


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap
  2025-10-24  4:13   ` Jiayuan Chen
@ 2025-10-28 17:26     ` Matthieu Baerts
  2025-11-03 12:34       ` Jiayuan Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Matthieu Baerts @ 2025-10-28 17:26 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S. Miller, Jakub Kicinski,
	Simon Horman, Mat Martineau, Geliang Tang, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

Hi Jiayuan,

Thank you for your reply!

On 24/10/2025 06:13, Jiayuan Chen wrote:
> 2025/10/23 22:10, "Matthieu Baerts" <matttbe@kernel.org mailto:matttbe@kernel.org?to=%22Matthieu%20Baerts%22%20%3Cmatttbe%40kernel.org%3E > 写到:
> 
> 
>>>  MPTCP creates subflows for data transmission between two endpoints.
>>>  However, BPF can use sockops to perform additional operations when TCP
>>>  completes the three-way handshake. The issue arose because we used sockmap
>>>  in sockops, which replaces sk->sk_prot and some handlers.
>>>
>> Do you know at what stage the sk->sk_prot is modified with sockmap? When
>> switching to TCP_ESTABLISHED?
>> Is it before or after having set "tcp_sk(sk)->is_mptcp = 0" (in
>> subflow_ulp_fallback(), coming from subflow_syn_recv_sock() I suppose)?
> 
> 
> Yes, there are two call points. One is after executing subflow_syn_recv_sock():
> tcp_init_transfer(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB, skb);
> 
> So at this point, is_mptcp = 0. The other call point is when userspace calls
> the BPF interface, passing in an fd while it's not a subflow but a parent sk
> with its own mptcp_prot we will also reject it.

OK, thank you for the explanations! I think your commit message in patch
1/3 should then explain the conditions to have mptcp_fallback_tcp_ops()
being called with a different sk_prot. In short: MPTCP listening socket,
TCP request without MPTCP, sk_prot reset to TCP (subflow_syn_recv_sock)
when SYN RECV, then reset by sockmap when ESTABLISHED, then accept part
and sk_prot is not the expected one.

> You can refer to my provided selftest, which covers these scenarios.
> 
>> If MPTCP is still being used (sk_is_tcp(sk) && sk_is_mptcp(sk)), I guess
>> sockmap should never touch the in-kernel TCP subflows: they will likely
>> only carry a part of the data. Instead, sockmap should act on the MPTCP
>> sockets, not the in-kernel TCP subflows.
> 
> Yes, I agree.
> 
> For full functionality, we need to retrieve the parent socket from MPTCP
> and integrate it with sockmap, rather than simply rejecting.

We should be careful when adding such exceptions. I will add more
details below.

> The current implementation rejects MPTCP because I previously attempted to
> add sockmap support for MPTCP, but it required implementing many interfaces
> and would take considerable time.
> 
> So for now, I'm proposing this as a fix to resolve the immediate issue.
> Subsequently, we can continue working on fully integrating MPTCP with sockmap.

It makes sense to start with the fix for stable, then the implementation
later. I think the implementation should not be that complex: it is just
that it has to be done at MPTCP level, not TCP. sockmap supports
different protocol, and it doesn't seem to be TCP specific, so that
should be feasible.

>> There is one particular case to take into consideration: an MPTCP
>> connection can fallback to "plain" TCP before being used by the
>> userspace. Typically, that's when an MPTCP listening socket receives a
>> "plain" TCP request (without MPTCP): a "plain" TCP socket will then be
>> created, and exposed to the userspace. In this case, sk_is_mptcp(sk)
>> will return false. I guess that's the case you are trying to handle,
>> right? (It might help BPF reviewers to mention that in the commit
>> message(s).)
> 
> Yes, this is primarily the case we're addressing. I will add this description
> to the commit message.

Thanks!

>> I would then say that sk->sk_prot->psock_update_sk_prot should not point
>> to tcp_bpf_update_proto() when MPTCP is being used (or this callback
>> should take the MPTCP case into account, but I guess no). In case of
>> fallback before the accept() stage, the socket can then be used as a
>> "plain" TCP one. I guess when tcp_bpf_update_proto() will be called,
>> sk_prot is pointing to tcp(v6)_prot, not the MPTCP subflow override one,
>> right?
> 
> Yes, when tcp_bpf_update_proto is called the sk_prot is pointing to tcp(v6)_prot.
> subflow_syn_recv_sock
>  mptcp_subflow_drop_ctx
>   subflow_ulp_fallback
>    mptcp_subflow_ops_undo_override -> reset sk_prot to original one

I see, it would be good to add that in the commit message as well.

> So [patch 2/3] aims to prevent psock_update_sk_prot from being executed on subflows.
> 
> Actually, replacing the subflow's callbacks is also incorrect, as you mentioned earlier,
> because subflows only carry part of the data. By checking for subflows early and skipping
> subsequent steps, we avoid incorrect logic.
> 
> Furthermore, there's another risk: if an IPv6 request comes in and we perform the replacement,
> MPTCP will roll it back to inet_stream_ops. I haven't delved too deeply into the potential
> impact, but I noticed that inet6_release has many V6-specific cleanup procedures not present
> in inet_release.

That's why we have the WARN_ON_ONCE(): this sk_prot was not expected, a
fix in the code is required if another value is accepted.

>>> Since subflows
>>>  also have their own specialized handlers, this creates a conflict and leads
>>>  to traffic failure. Therefore, we need to reject operations targeting
>>>  subflows.
>>>
>> Would it not work to set sk_prot->psock_update_sk_prot to NULL for the
>> v4 and v6 subflows (in mptcp_subflow_init()) for the moment while
>> sockmap is not supported with MPTCP? This might save you some checks in
>> sock_map.c, no?
> 
> This seems like a reliable alternative I hadn't considered initially.
> 
> However, adding the check on the BPF side serves another purpose: to explicitly
> warn users that sockmap and MPTCP are incompatible.
> 
> Since the latest Golang version enables MPTCP server by default, and if the client
> doesn't support MPTCP, it falls back to TCP logic. We want to print a clear message
> informing users who have upgraded to the latest Golang and are using sockmap.
> 
> Perhaps we could add a function like sk_is_mptcp_subflow() in the MPTCP side?
> The implementation would simply be sk_is_tcp(sk) && sk_is_mptcp(sk).
> 
> Implementing this check logic on the BPF side might become invalid if MPTCP internals
> change later; placing it in the MPTCP side might be a better choice.

I can understand that adding an error message can be helpful, but I
don't think we should add MPTCP specific checks in sockmap for the moment.

>>> This patchset simply prevents the combination of subflows and sockmap
>>>  without changing any functionality.
>>>
>> In your case, you have an MPTCP listening socket, but you receive a TCP
>> request, right? The "sockmap update" is done when switching to
>> TCP_ESTABLISHED, when !sk_is_mptcp(sk), but that's before
>> mptcp_stream_accept(). That's why sk->sk_prot has been modified, but it
>> is fine to look at sk_family, and return inet(6)_stream_ops, right?
> 
> I believe so. Since MPTCP is fundamentally based on TCP, using sk_family to
> determine which ops to fall back to should be sufficient.
> 
> However, strictly speaking, this [patch 1/3] might not even be necessary if we
> prevent the sk_prot replacement for subflows at the sockmap layer.
> 
>> A more important question: what will typically happen in your case if
>> you receive an MPTCP request and sockmap is then not supported? Will the
>> connection be rejected or stay in a strange state because the userspace
>> will not expect that? In these cases, would it not be better to disallow
>> sockmap usage while the MPTCP support is not available? The userspace
>> would then get an error from the beginning that the protocol is not
>> supported, and should then not create an MPTCP socket in this case for
>> the moment, no?
>>
>> I can understand that the switch from TCP to MPTCP was probably done
>> globally, and this transition should be as seamless as possible, but it
>> should not cause a regression with MPTCP requests. An alternative could
>> be to force a fallback to TCP when sockmap is used, even when an MPTCP
>> request is received, but not sure if it is practical to do, and might be
>> strange from the user point of view.
> 
> Actually, I understand this not as an MPTCP regression, but as a sockmap
> regression.
> 
> Let me explain how users typically use sockmap:
> 
> Users typically create multiple sockets on a host and program using BPF+sockmap
> to enable fast data redirection. This involves intercepting data sent or received
> by one socket and redirecting it to the send or receive queue of another socket.
> 
> This requires explicit user programming. The goal is that when multiple microservices
> on one host need to communicate, they can bypass most of the network stack and avoid
> data copies between user and kernel space.
> 
> However, when an MPTCP request occurs, this redirection flow fails.

This part bothers me a bit. Does it mean that when the userspace creates
a TCP listening socket (IPPROTO_TCP), MPTCP requests will be accepted,
but MPTCP will not be used ; but when an MPTCP socket is used instead,
MPTCP requests will be rejected?

If yes, it might be clearer not to allow sockmap on connections created
from MPTCP sockets. But when looking at sockmap and what's happening
when a TCP socket is created following a "plain TCP" request, we would
need specific MPTCP code to catch that in sockmap...

> Since the sockmap workflow typically occurs after the three-way handshake, rolling
> back at that point might be too late, and undoing the logic for MPTCP would be very
> complex.
> 
> Regardless, the reality is that MPTCP and sockmap are already conflicting, and this
> has been the case for some time. So I think our first step is to catch specific
> behavior on the BPF side and print a message
> "sockmap/sockhash: MPTCP sockets are not supported\n", informing users to either
> stop using sockmap or not use MPTCP.
> 
> As for the logic to check for subflows, I think implementing it in subflow.c would be
> beneficial, as this logic would likely be useful later if we want to
> support MPTCP + sockmap.

Probably yes.

> Furthermore, this commit also addresses the issue of incorrectly selecting
> inet_stream_ops due to the subflow prot replacement, as mentioned above.

(indeed, but this seems to happen only when sk_prot has been replaced by
sockmap :) )

>>> A complete integration of MPTCP and sockmap would require more effort, for
>>>  example, we would need to retrieve the parent socket from subflows in
>>>  sockmap and implement handlers like read_skb.
>>>  
>>>  If maintainers don't object, we can further improve this in subsequent
>>>  work.
>>>
>> That would be great to add MPTCP support in sockmap! As mentioned above,
>> this should be done on the MPTCP socket. I guess the TCP "in-kernel"
>> subflows should not be modified.
> 
> 
> I think we should first fix the issue by having sockmap reject operations on subflows.
> Subsequently, we can work on fully integrating sockmap with MPTCP as a feature
> (which would require implementing some handlers).

OK for me!

>>> [1] truncated warning:
>>>  [ 18.234652] ------------[ cut here ]------------
>>>  [ 18.234664] WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
>>>  [ 18.234726] Modules linked in:
>>>  [ 18.234755] RIP: 0010:mptcp_stream_accept+0x34c/0x380
>>>  [ 18.234762] RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
> [...]
>>>
>> Please next time use the ./scripts/decode_stacktrace.sh if possible.
>> (and strip the timestamps if it is not giving useful info)
>> Just to be sure: is it the warning you get on top of net or net-next? Or
>> an older version? (Always useful to mention the base)
> 
> Thank you, Matthieu. I will pay attention to this.
> 
> 
>>>
>>> ---
>>>  v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/T/#t
>>>  Some advice suggested by Jakub Sitnicki
>>>  
>>>  v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linux.dev/T/#t
>>>  Some advice from Matthieu Baerts.
>>>
>> (It usually helps reviewers to add more details in the notes/changelog
>> for the individual patch)
> 
> Thank you, Matthieu. I will provide more detailed descriptions in the future.

Thanks!

So for the v4, patch 2/3 would be replaced by one setting ...

  tcp_prot_override.psock_update_sk_prot = NULL;
  (...)
  tcpv6_prot_override.psock_update_sk_prot = NULL;

... in mptcp_subflow_init(). (+ more details for patch 1/3).

From there, we can discuss with other maintainers what to do with the
MPTCP listening socket + sockmap case. And in parallel, we can also
discuss MPTCP support with sockmap. WDYT?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

pw-bot: cr


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap
  2025-10-28 17:26     ` Matthieu Baerts
@ 2025-11-03 12:34       ` Jiayuan Chen
  2025-11-03 15:53         ` Matthieu Baerts
  0 siblings, 1 reply; 17+ messages in thread
From: Jiayuan Chen @ 2025-11-03 12:34 UTC (permalink / raw)
  To: Matthieu Baerts, mptcp
  Cc: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S. Miller, Jakub Kicinski,
	Simon Horman, Mat Martineau, Geliang Tang, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

October 29, 2025 at 1:26 AM, "Matthieu Baerts" <matttbe@kernel.org mailto:matttbe@kernel.org?to=%22Matthieu%20Baerts%22%20%3Cmatttbe%40kernel.org%3E > wrote:

Hi Matthieu, apologies for the delayed response.

> 
> Hi Jiayuan,
> 
> Thank you for your reply!
> 
> On 24/10/2025 06:13, Jiayuan Chen wrote:
> 
> > 
> > 2025/10/23 22:10, "Matthieu Baerts" <matttbe@kernel.org mailto:matttbe@kernel.org?to=%22Matthieu%20Baerts%22%20%3Cmatttbe%40kernel.org%3E > 写到:
> >  
> >  
> >  MPTCP creates subflows for data transmission between two endpoints.
> >  However, BPF can use sockops to perform additional operations when TCP
> >  completes the three-way handshake. The issue arose because we used sockmap
> >  in sockops, which replaces sk->sk_prot and some handlers.
> > 
> > > 
> > > Do you know at what stage the sk->sk_prot is modified with sockmap? When
> > >  switching to TCP_ESTABLISHED?
> > >  Is it before or after having set "tcp_sk(sk)->is_mptcp = 0" (in
> > >  subflow_ulp_fallback(), coming from subflow_syn_recv_sock() I suppose)?
> > > 
> >  
> >  
> >  Yes, there are two call points. One is after executing subflow_syn_recv_sock():
> >  tcp_init_transfer(sk, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB, skb);
> >  
> >  So at this point, is_mptcp = 0. The other call point is when userspace calls
> >  the BPF interface, passing in an fd while it's not a subflow but a parent sk
> >  with its own mptcp_prot we will also reject it.
> > 
> OK, thank you for the explanations! I think your commit message in patch
> 1/3 should then explain the conditions to have mptcp_fallback_tcp_ops()
> being called with a different sk_prot. In short: MPTCP listening socket,
> TCP request without MPTCP, sk_prot reset to TCP (subflow_syn_recv_sock)
> when SYN RECV, then reset by sockmap when ESTABLISHED, then accept part
> and sk_prot is not the expected one.
> 
> > 
> > You can refer to my provided selftest, which covers these scenarios.
> >  
> > 
> > > 
> > > If MPTCP is still being used (sk_is_tcp(sk) && sk_is_mptcp(sk)), I guess
> > >  sockmap should never touch the in-kernel TCP subflows: they will likely
> > >  only carry a part of the data. Instead, sockmap should act on the MPTCP
> > >  sockets, not the in-kernel TCP subflows.
> > > 
> >  
> >  Yes, I agree.
> >  
> >  For full functionality, we need to retrieve the parent socket from MPTCP
> >  and integrate it with sockmap, rather than simply rejecting.
> > 
> We should be careful when adding such exceptions. I will add more
> details below.
> 
> > 
> > The current implementation rejects MPTCP because I previously attempted to
> >  add sockmap support for MPTCP, but it required implementing many interfaces
> >  and would take considerable time.
> >  
> >  So for now, I'm proposing this as a fix to resolve the immediate issue.
> >  Subsequently, we can continue working on fully integrating MPTCP with sockmap.
> > 
> It makes sense to start with the fix for stable, then the implementation
> later. I think the implementation should not be that complex: it is just
> that it has to be done at MPTCP level, not TCP. sockmap supports
> different protocol, and it doesn't seem to be TCP specific, so that
> should be feasible.

I agree with that. From a userspace perspective, we can't really manipulate subflow
TCP directly, and I also think it's correct to handle this at the MPTCP layer.

But I didn't quite get your point about "it has to be done at MPTCP level." Currently,
BPF provides 'sockops' capability, which invokes BPF programs in the protocol stack.
The input parameter sk for the BPF program is actually a TCP sk (subflow).

Many helper functions (like sockmap) have no choice but to care about whether it's MPTCP
or not.

> > 
> > > 
> > > There is one particular case to take into consideration: an MPTCP
> > >  connection can fallback to "plain" TCP before being used by the
> > >  userspace. Typically, that's when an MPTCP listening socket receives a
> > >  "plain" TCP request (without MPTCP): a "plain" TCP socket will then be
> > >  created, and exposed to the userspace. In this case, sk_is_mptcp(sk)
> > >  will return false. I guess that's the case you are trying to handle,
> > >  right? (It might help BPF reviewers to mention that in the commit
> > >  message(s).)
> > > 
> >  
> >  Yes, this is primarily the case we're addressing. I will add this description
> >  to the commit message.
> > 
> Thanks!
> 
> > 
> > > 
> > > I would then say that sk->sk_prot->psock_update_sk_prot should not point
> > >  to tcp_bpf_update_proto() when MPTCP is being used (or this callback
> > >  should take the MPTCP case into account, but I guess no). In case of
> > >  fallback before the accept() stage, the socket can then be used as a
> > >  "plain" TCP one. I guess when tcp_bpf_update_proto() will be called,
> > >  sk_prot is pointing to tcp(v6)_prot, not the MPTCP subflow override one,
> > >  right?
> > > 
> >  
> >  Yes, when tcp_bpf_update_proto is called the sk_prot is pointing to tcp(v6)_prot.
> >  subflow_syn_recv_sock
> >  mptcp_subflow_drop_ctx
> >  subflow_ulp_fallback
> >  mptcp_subflow_ops_undo_override -> reset sk_prot to original one
> > 
> I see, it would be good to add that in the commit message as well.

Thanks, I will add it.

> 
> > 
> > So [patch 2/3] aims to prevent psock_update_sk_prot from being executed on subflows.
> >  
> >  Actually, replacing the subflow's callbacks is also incorrect, as you mentioned earlier,
> >  because subflows only carry part of the data. By checking for subflows early and skipping
> >  subsequent steps, we avoid incorrect logic.
> >  
> >  Furthermore, there's another risk: if an IPv6 request comes in and we perform the replacement,
> >  MPTCP will roll it back to inet_stream_ops. I haven't delved too deeply into the potential
> >  impact, but I noticed that inet6_release has many V6-specific cleanup procedures not present
> >  in inet_release.
> > 
> That's why we have the WARN_ON_ONCE(): this sk_prot was not expected, a
> fix in the code is required if another value is accepted.
> 
> > 
> > Since subflows
> >  also have their own specialized handlers, this creates a conflict and leads
> >  to traffic failure. Therefore, we need to reject operations targeting
> >  subflows.
> > 
> > > 
> > > Would it not work to set sk_prot->psock_update_sk_prot to NULL for the
> > >  v4 and v6 subflows (in mptcp_subflow_init()) for the moment while
> > >  sockmap is not supported with MPTCP? This might save you some checks in
> > >  sock_map.c, no?
> > > 
> >  
> >  This seems like a reliable alternative I hadn't considered initially.
> >  
> >  However, adding the check on the BPF side serves another purpose: to explicitly
> >  warn users that sockmap and MPTCP are incompatible.
> >  
> >  Since the latest Golang version enables MPTCP server by default, and if the client
> >  doesn't support MPTCP, it falls back to TCP logic. We want to print a clear message
> >  informing users who have upgraded to the latest Golang and are using sockmap.
> >  
> >  Perhaps we could add a function like sk_is_mptcp_subflow() in the MPTCP side?
> >  The implementation would simply be sk_is_tcp(sk) && sk_is_mptcp(sk).
> >  
> >  Implementing this check logic on the BPF side might become invalid if MPTCP internals
> >  change later; placing it in the MPTCP side might be a better choice.
> > 
> I can understand that adding an error message can be helpful, but I
> don't think we should add MPTCP specific checks in sockmap for the moment.

Thanks, I will try to set psock_update_sk_prot to NULL and test it, and if it works I will send a new patch then.

> > 
> > This patchset simply prevents the combination of subflows and sockmap
> >  without changing any functionality.
> > 
> > > 
> > > In your case, you have an MPTCP listening socket, but you receive a TCP
> > >  request, right? The "sockmap update" is done when switching to
> > >  TCP_ESTABLISHED, when !sk_is_mptcp(sk), but that's before
> > >  mptcp_stream_accept(). That's why sk->sk_prot has been modified, but it
> > >  is fine to look at sk_family, and return inet(6)_stream_ops, right?
> > > 
> >  
> >  I believe so. Since MPTCP is fundamentally based on TCP, using sk_family to
> >  determine which ops to fall back to should be sufficient.
> >  
> >  However, strictly speaking, this [patch 1/3] might not even be necessary if we
> >  prevent the sk_prot replacement for subflows at the sockmap layer.
> >  
> > 
> > > 
> > > A more important question: what will typically happen in your case if
> > >  you receive an MPTCP request and sockmap is then not supported? Will the
> > >  connection be rejected or stay in a strange state because the userspace
> > >  will not expect that? In these cases, would it not be better to disallow
> > >  sockmap usage while the MPTCP support is not available? The userspace
> > >  would then get an error from the beginning that the protocol is not
> > >  supported, and should then not create an MPTCP socket in this case for
> > >  the moment, no?
> > > 
> > >  I can understand that the switch from TCP to MPTCP was probably done
> > >  globally, and this transition should be as seamless as possible, but it
> > >  should not cause a regression with MPTCP requests. An alternative could
> > >  be to force a fallback to TCP when sockmap is used, even when an MPTCP
> > >  request is received, but not sure if it is practical to do, and might be
> > >  strange from the user point of view.
> > > 
> >  
> >  Actually, I understand this not as an MPTCP regression, but as a sockmap
> >  regression.
> >  
> >  Let me explain how users typically use sockmap:
> >  
> >  Users typically create multiple sockets on a host and program using BPF+sockmap
> >  to enable fast data redirection. This involves intercepting data sent or received
> >  by one socket and redirecting it to the send or receive queue of another socket.
> >  
> >  This requires explicit user programming. The goal is that when multiple microservices
> >  on one host need to communicate, they can bypass most of the network stack and avoid
> >  data copies between user and kernel space.
> >  
> >  However, when an MPTCP request occurs, this redirection flow fails.
> > 
> This part bothers me a bit. Does it mean that when the userspace creates
> a TCP listening socket (IPPROTO_TCP), MPTCP requests will be accepted,
> but MPTCP will not be used ; but when an MPTCP socket is used instead,
> MPTCP requests will be rejected?

"when the userspace creates a TCP listening socket (IPPROTO_TCP), MPTCP requests will be accepted,
but MPTCP will not be used"
--- Yes, that's essentially the logic behind MPTCP fallback, right? In this case, it should work
fine with sockmap as well. That's exactly what this patch aims to achieve.

"but when an MPTCP socket is used instead, MPTCP requests will be rejected?"
--- Exactly. Currently, because sockmap operates directly on the subflow sk, it breaks the MPTCP
connection. The purpose of this patch is to explicitly return an error when users try to replace
certain handlers of the subflow sk.

This way, users at least get a clear error message instead of just experiencing a mysterious connection
failure.

> If yes, it might be clearer not to allow sockmap on connections created
> from MPTCP sockets. But when looking at sockmap and what's happening
> when a TCP socket is created following a "plain TCP" request, we would
> need specific MPTCP code to catch that in sockmap...

I know what you're concerned about, and I also don't want to add any MPTCP-specific checks on the
sockmap or BPF side :).

I will try to set psock_update_sk_prot to NULL first.


> > 
> > Since the sockmap workflow typically occurs after the three-way handshake, rolling
> >  back at that point might be too late, and undoing the logic for MPTCP would be very
> >  complex.
> >  
> >  Regardless, the reality is that MPTCP and sockmap are already conflicting, and this
> >  has been the case for some time. So I think our first step is to catch specific
> >  behavior on the BPF side and print a message
> >  "sockmap/sockhash: MPTCP sockets are not supported\n", informing users to either
> >  stop using sockmap or not use MPTCP.
> >  
> >  As for the logic to check for subflows, I think implementing it in subflow.c would be
> >  beneficial, as this logic would likely be useful later if we want to
> >  support MPTCP + sockmap.
> > 
> Probably yes.
> > 
> > Furthermore, this commit also addresses the issue of incorrectly selecting
> >  inet_stream_ops due to the subflow prot replacement, as mentioned above.
> > 
> (indeed, but this seems to happen only when sk_prot has been replaced by
> sockmap :) )
> 
> > 
> > A complete integration of MPTCP and sockmap would require more effort, for
> >  example, we would need to retrieve the parent socket from subflows in
> >  sockmap and implement handlers like read_skb.
> >  
> >  If maintainers don't object, we can further improve this in subsequent
> >  work.
> > 
> > > 
> > > That would be great to add MPTCP support in sockmap! As mentioned above,
> > >  this should be done on the MPTCP socket. I guess the TCP "in-kernel"
> > >  subflows should not be modified.
> > > 
> >  
> >  
> >  I think we should first fix the issue by having sockmap reject operations on subflows.
> >  Subsequently, we can work on fully integrating sockmap with MPTCP as a feature
> >  (which would require implementing some handlers).
> > 
> OK for me!
> 
> > 
> > [1] truncated warning:
> >  [ 18.234652] ------------[ cut here ]------------
> >  [ 18.234664] WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
> >  [ 18.234726] Modules linked in:
> >  [ 18.234755] RIP: 0010:mptcp_stream_accept+0x34c/0x380
> >  [ 18.234762] RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
> >  [...]
> > 
> > > 
> > > Please next time use the ./scripts/decode_stacktrace.sh if possible.
> > >  (and strip the timestamps if it is not giving useful info)
> > >  Just to be sure: is it the warning you get on top of net or net-next? Or
> > >  an older version? (Always useful to mention the base)
> > > 
> >  
> >  Thank you, Matthieu. I will pay attention to this.
> >  
> >  
> > 
> >  ---
> >  v2: https://lore.kernel.org/bpf/20251020060503.325369-1-jiayuan.chen@linux.dev/T/#t
> >  Some advice suggested by Jakub Sitnicki
> >  
> >  v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linux.dev/T/#t
> >  Some advice from Matthieu Baerts.
> > 
> > > 
> > > (It usually helps reviewers to add more details in the notes/changelog
> > >  for the individual patch)
> > > 
> >  
> >  Thank you, Matthieu. I will provide more detailed descriptions in the future.
> > 
> Thanks!
> 
> So for the v4, patch 2/3 would be replaced by one setting ...
> 
>  tcp_prot_override.psock_update_sk_prot = NULL;
>  (...)
>  tcpv6_prot_override.psock_update_sk_prot = NULL;
> 
> ... in mptcp_subflow_init(). (+ more details for patch 1/3).
> From there, we can discuss with other maintainers what to do with the
> MPTCP listening socket + sockmap case. And in parallel, we can also
> discuss MPTCP support with sockmap. WDYT?
> 

Thanks, I will do it.

> Cheers,
> Matt
> -- 
> Sponsored by the NGI0 Core fund.
> 
> pw-bot: cr
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap
  2025-10-28 11:30   ` Paolo Abeni
  2025-10-28 11:47     ` Paolo Abeni
@ 2025-11-03 12:44     ` Jiayuan Chen
  1 sibling, 0 replies; 17+ messages in thread
From: Jiayuan Chen @ 2025-11-03 12:44 UTC (permalink / raw)
  To: Paolo Abeni, mptcp
  Cc: stable, Jakub Sitnicki, John Fastabend, Eric Dumazet,
	Kuniyuki Iwashima, Willem de Bruijn, David S. Miller,
	Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
	Geliang Tang, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
	linux-kselftest

October 28, 2025 at 19:30, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:


> 
> On 10/23/25 2:54 PM, Jiayuan Chen wrote:
> 
> > 
> > When the server has MPTCP enabled but receives a non-MP-capable request
> >  from a client, it calls mptcp_fallback_tcp_ops().
> >  
> >  Since non-MPTCP connections are allowed to use sockmap, which replaces
> >  sk->sk_prot, using sk->sk_prot to determine the IP version in
> >  mptcp_fallback_tcp_ops() becomes unreliable. This can lead to assigning
> >  incorrect ops to sk->sk_socket->ops.
> > 
> I don't see how sockmap could modify the to-be-accepted socket sk_prot
> before mptcp_fallback_tcp_ops(), as such call happens before the fd is
> installed, and AFAICS sockmap can only fetch sockets via fds.
> 
> Is this patch needed?

"mptcp_fallback_tcp_ops" is only called during the accept process. However,
before that, for an already established TCP socket, its sk_prot is replaced via the following path:
tcp_rcv_state_process()
  tcp_init_transfer(BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
    call bpf prog
        bpf_sock_map_update(sk)
           tcp_bpf_update_proto()

However, after discussing with Matthieu, we've concluded that this patch is indeed no
longer necessary, as we have a simpler way to intercept the operation."

Thanks~

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap
  2025-10-28 11:47     ` Paolo Abeni
@ 2025-11-03 12:45       ` Jiayuan Chen
  0 siblings, 0 replies; 17+ messages in thread
From: Jiayuan Chen @ 2025-11-03 12:45 UTC (permalink / raw)
  To: Paolo Abeni, mptcp
  Cc: stable, Jakub Sitnicki, John Fastabend, Eric Dumazet,
	Kuniyuki Iwashima, Willem de Bruijn, David S. Miller,
	Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
	Geliang Tang, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
	linux-kselftest

October 28, 2025 at 19:47, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:


> 
> On 10/28/25 12:30 PM, Paolo Abeni wrote:
> 
> > 
> > On 10/23/25 2:54 PM, Jiayuan Chen wrote:
> > 
> > > 
> > > When the server has MPTCP enabled but receives a non-MP-capable request
> > >  from a client, it calls mptcp_fallback_tcp_ops().
> > > 
> > >  Since non-MPTCP connections are allowed to use sockmap, which replaces
> > >  sk->sk_prot, using sk->sk_prot to determine the IP version in
> > >  mptcp_fallback_tcp_ops() becomes unreliable. This can lead to assigning
> > >  incorrect ops to sk->sk_socket->ops.
> > > 
> >  
> >  I don't see how sockmap could modify the to-be-accepted socket sk_prot
> >  before mptcp_fallback_tcp_ops(), as such call happens before the fd is
> >  installed, and AFAICS sockmap can only fetch sockets via fds.
> >  
> >  Is this patch needed?
> > 
> Matttbe explained off-list the details of how that could happen. I think
> the commit message here must be more verbose to explain clearly the
> whys, even to those non proficient in sockmap like me.
> 
> Thanks,
> 
> Paolo
>

Thanks, I will add more details into commit message :).

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap
  2025-10-28 12:03   ` Paolo Abeni
@ 2025-11-03 12:52     ` Jiayuan Chen
  0 siblings, 0 replies; 17+ messages in thread
From: Jiayuan Chen @ 2025-11-03 12:52 UTC (permalink / raw)
  To: Paolo Abeni, mptcp
  Cc: stable, Jakub Sitnicki, John Fastabend, Eric Dumazet,
	Kuniyuki Iwashima, Willem de Bruijn, David S. Miller,
	Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
	Geliang Tang, Andrii Nakryiko, Eduard Zingerman,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Shuah Khan, Florian Westphal, linux-kernel, netdev, bpf,
	linux-kselftest

October 28, 2025 at 20:03, "Paolo Abeni" <pabeni@redhat.com mailto:pabeni@redhat.com?to=%22Paolo%20Abeni%22%20%3Cpabeni%40redhat.com%3E > wrote:


> 
> On 10/23/25 2:54 PM, Jiayuan Chen wrote:
> 
> > 
> > MPTCP creates subflows for data transmission, and these sockets should not
> >  be added to sockmap because MPTCP sets specialized data_ready handlers
> >  that would be overridden by sockmap.
> >  
> >  Additionally, for the parent socket of MPTCP subflows (plain TCP socket),
> >  MPTCP sk requires specific protocol handling that conflicts with sockmap's
> >  operation(mptcp_prot).
> >  
> >  This patch adds proper checks to reject MPTCP subflows and their parent
> >  sockets from being added to sockmap, while preserving compatibility with
> >  reuseport functionality for listening MPTCP sockets.
> > 
> It's unclear to me why that is safe. sockmap is going to change the
> listener msk proto ops.
> 
> The listener could disconnect and create an egress connection, still
> using the wrong ops.

sockmap only replaces read/write handler of a sk and keeps another handler.

But I agree with you; I also don't think sockmap should replace the handlers of
the listen socket. Because for a listen socket, sockmap is merely used as a container, 
just like hash map or array map. But in reality, that's exactly what it does...

> I think sockmap should always be prevented for mptcp socket, or at least
> a solid explanation of why such exception is safe should be included in
> the commit message.
> 
> Note that the first option allows for solving the issue entirely in the
> mptcp code, setting dummy/noop psock_update_sk_prot for mptcp sockets
> and mptcp subflows.

I will do it.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap
  2025-11-03 12:34       ` Jiayuan Chen
@ 2025-11-03 15:53         ` Matthieu Baerts
  0 siblings, 0 replies; 17+ messages in thread
From: Matthieu Baerts @ 2025-11-03 15:53 UTC (permalink / raw)
  To: Jiayuan Chen, mptcp
  Cc: John Fastabend, Jakub Sitnicki, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, David S. Miller, Jakub Kicinski,
	Simon Horman, Mat Martineau, Geliang Tang, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Florian Westphal, linux-kernel, netdev, bpf, linux-kselftest

Hi Jiayuan,

On 03/11/2025 13:34, Jiayuan Chen wrote:
> October 29, 2025 at 1:26 AM, "Matthieu Baerts" <matttbe@kernel.org mailto:matttbe@kernel.org?to=%22Matthieu%20Baerts%22%20%3Cmatttbe%40kernel.org%3E > wrote:
>> On 24/10/2025 06:13, Jiayuan Chen wrote:

(...)

>>> The current implementation rejects MPTCP because I previously attempted to
>>>  add sockmap support for MPTCP, but it required implementing many interfaces
>>>  and would take considerable time.
>>>  
>>>  So for now, I'm proposing this as a fix to resolve the immediate issue.
>>>  Subsequently, we can continue working on fully integrating MPTCP with sockmap.
>>>
>> It makes sense to start with the fix for stable, then the implementation
>> later. I think the implementation should not be that complex: it is just
>> that it has to be done at MPTCP level, not TCP. sockmap supports
>> different protocol, and it doesn't seem to be TCP specific, so that
>> should be feasible.
> 
> I agree with that. From a userspace perspective, we can't really manipulate subflow
> TCP directly, and I also think it's correct to handle this at the MPTCP layer.
> 
> But I didn't quite get your point about "it has to be done at MPTCP level." Currently,
> BPF provides 'sockops' capability, which invokes BPF programs in the protocol stack.
> The input parameter sk for the BPF program is actually a TCP sk (subflow).
> 
> Many helper functions (like sockmap) have no choice but to care about whether it's MPTCP
> or not.

I see. Maybe new MPTCP equivalent hooks will be needed then?

(...)

>>>> A more important question: what will typically happen in your case if
>>>>  you receive an MPTCP request and sockmap is then not supported? Will the
>>>>  connection be rejected or stay in a strange state because the userspace
>>>>  will not expect that? In these cases, would it not be better to disallow
>>>>  sockmap usage while the MPTCP support is not available? The userspace
>>>>  would then get an error from the beginning that the protocol is not
>>>>  supported, and should then not create an MPTCP socket in this case for
>>>>  the moment, no?
>>>>
>>>>  I can understand that the switch from TCP to MPTCP was probably done
>>>>  globally, and this transition should be as seamless as possible, but it
>>>>  should not cause a regression with MPTCP requests. An alternative could
>>>>  be to force a fallback to TCP when sockmap is used, even when an MPTCP
>>>>  request is received, but not sure if it is practical to do, and might be
>>>>  strange from the user point of view.
>>>>
>>>  
>>>  Actually, I understand this not as an MPTCP regression, but as a sockmap
>>>  regression.
>>>  
>>>  Let me explain how users typically use sockmap:
>>>  
>>>  Users typically create multiple sockets on a host and program using BPF+sockmap
>>>  to enable fast data redirection. This involves intercepting data sent or received
>>>  by one socket and redirecting it to the send or receive queue of another socket.
>>>  
>>>  This requires explicit user programming. The goal is that when multiple microservices
>>>  on one host need to communicate, they can bypass most of the network stack and avoid
>>>  data copies between user and kernel space.
>>>  
>>>  However, when an MPTCP request occurs, this redirection flow fails.
>>>
>> This part bothers me a bit. Does it mean that when the userspace creates
>> a TCP listening socket (IPPROTO_TCP), MPTCP requests will be accepted,
>> but MPTCP will not be used ; but when an MPTCP socket is used instead,
>> MPTCP requests will be rejected?
> 
> "when the userspace creates a TCP listening socket (IPPROTO_TCP), MPTCP requests will be accepted,
> but MPTCP will not be used"
> --- Yes, that's essentially the logic behind MPTCP fallback, right? In this case, it should work
> fine with sockmap as well. That's exactly what this patch aims to achieve.

That's an MPTCP fallback to TCP for the client side here: the client
requests to use MPTCP, but the server doesn't support it. In this case,
MPTCP options will be ignored, and a "plain" TCP SYN+ACK will be sent
back to the client. In this case, the server using sockmap doesn't
handle MPTCP, because it created an IPPROTO_TCP.

In other words, the situation you had before GO 1.24, right?

> "but when an MPTCP socket is used instead, MPTCP requests will be rejected?"
> --- Exactly. Currently, because sockmap operates directly on the subflow sk, it breaks the MPTCP
> connection. The purpose of this patch is to explicitly return an error when users try to replace
> certain handlers of the subflow sk.

I don't think a message at that point is that useful. Ideally, the
userspace should get an error or a notice when setting sockmap up. But I
understand sockmap are not really attached to listening sockets, and it
doesn't seem possible to block sockmap at setup time because it is going
to be used with "accept()ed" connection created from an MPTCP listening
socket.

So I guess we will still need patch 1/3 (with a better commit message),
and patch 2/3 should be restricted to remove psock_update_sk_prot for
MPTCP subflows.

> This way, users at least get a clear error message instead of just experiencing a mysterious connection
> failure.
> 
>> If yes, it might be clearer not to allow sockmap on connections created
>> from MPTCP sockets. But when looking at sockmap and what's happening
>> when a TCP socket is created following a "plain TCP" request, we would
>> need specific MPTCP code to catch that in sockmap...
> 
> I know what you're concerned about, and I also don't want to add any MPTCP-specific checks on the
> sockmap or BPF side :).
> 
> I will try to set psock_update_sk_prot to NULL first.

Thanks!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2025-11-03 15:53 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-23 12:54 [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
2025-10-23 12:54 ` [PATCH net v3 1/3] net,mptcp: fix proto fallback detection with BPF sockmap Jiayuan Chen
2025-10-23 14:10   ` Matthieu Baerts
2025-10-23 14:38     ` Jiayuan Chen
2025-10-28 11:30   ` Paolo Abeni
2025-10-28 11:47     ` Paolo Abeni
2025-11-03 12:45       ` Jiayuan Chen
2025-11-03 12:44     ` Jiayuan Chen
2025-10-23 12:54 ` [PATCH net v3 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap Jiayuan Chen
2025-10-28 12:03   ` Paolo Abeni
2025-11-03 12:52     ` Jiayuan Chen
2025-10-23 12:54 ` [PATCH net v3 3/3] selftests/bpf: Add mptcp test with sockmap Jiayuan Chen
2025-10-23 14:10 ` [PATCH net v3 0/3] mptcp: Fix conflicts between MPTCP and sockmap Matthieu Baerts
2025-10-24  4:13   ` Jiayuan Chen
2025-10-28 17:26     ` Matthieu Baerts
2025-11-03 12:34       ` Jiayuan Chen
2025-11-03 15:53         ` Matthieu Baerts

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).