* [PATCH net v2 0/3] mptcp: Fix conflicts between MPTCP and sockmap
@ 2025-10-20 6:04 Jiayuan Chen
2025-10-20 6:04 ` [PATCH net v2 1/3] net,mptcp: fix incorrect IPv4/IPv6 fallback detection with BPF Sockmap Jiayuan Chen
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Jiayuan Chen @ 2025-10-20 6:04 UTC (permalink / raw)
To: mptcp, netdev, bpf
Cc: Jiayuan Chen, John Fastabend, Jakub Sitnicki, Eric Dumazet,
Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, David S. Miller,
Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
Geliang Tang, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Florian Westphal, linux-kernel, linux-kselftest
We encountered a warning [1] that can be triggered by running the
selftest included in this series.
MPTCP creates subflows for data transmission between two endpoints.
However, BPF can use sockops to perform additional operations when TCP
completes the three-way handshake. The issue arose because we used sockmap
in sockops, which replaces sk->sk_prot and some handlers. Since subflows
also have their own specialized handlers, this creates a conflict and leads
to traffic failure. Therefore, we need to reject operations targeting
subflows.
This patchset simply prevents the combination of subflows and sockmap
without changing any functionality.
A complete integration of MPTCP and sockmap would require more effort. For
example, we would need to retrieve the parent socket from subflows in
sockmap and implement handlers like read_skb.
If maintainers don't object, we can further improve this in subsequent
work.
v1: https://lore.kernel.org/mptcp/a0a2b87119a06c5ffaa51427a0964a05534fe6f1@linux.dev/T/#t
[1] truncated warning:
[ 18.234652] ------------[ cut here ]------------
[ 18.234664] WARNING: CPU: 1 PID: 388 at net/mptcp/protocol.c:68 mptcp_stream_accept+0x34c/0x380
[ 18.234726] Modules linked in:
[ 18.234755] RIP: 0010:mptcp_stream_accept+0x34c/0x380
[ 18.234762] RSP: 0018:ffffc90000cf3cf8 EFLAGS: 00010202
[ 18.234800] PKRU: 55555554
[ 18.234806] Call Trace:
[ 18.234810] <TASK>
[ 18.234837] do_accept+0xeb/0x190
[ 18.234861] ? __x64_sys_pselect6+0x61/0x80
[ 18.234898] ? _raw_spin_unlock+0x12/0x30
[ 18.234915] ? alloc_fd+0x11e/0x190
[ 18.234925] __sys_accept4+0x8c/0x100
[ 18.234930] __x64_sys_accept+0x1f/0x30
[ 18.234933] x64_sys_call+0x202f/0x20f0
[ 18.234966] do_syscall_64+0x72/0x9a0
[ 18.234979] ? switch_fpu_return+0x60/0xf0
[ 18.234993] ? irqentry_exit_to_user_mode+0xdb/0x1e0
[ 18.235002] ? irqentry_exit+0x3f/0x50
[ 18.235005] ? clear_bhb_loop+0x50/0xa0
[ 18.235022] ? clear_bhb_loop+0x50/0xa0
[ 18.235025] ? clear_bhb_loop+0x50/0xa0
[ 18.235028] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 18.235066] </TASK>
[ 18.235109] ---[ end trace 0000000000000000 ]---
[ 18.235677] sockmap: MPTCP sockets are not supported
Jiayuan Chen (3):
net,mptcp: fix incorrect IPv4/IPv6 fallback detection with BPF Sockmap
bpf,sockmap: disallow MPTCP sockets from sockmap updates
selftests/bpf: Add mptcp test with sockmap
net/core/sock_map.c | 9 ++
net/mptcp/protocol.c | 7 +-
.../testing/selftests/bpf/prog_tests/mptcp.c | 136 ++++++++++++++++++
.../selftests/bpf/progs/mptcp_sockmap.c | 43 ++++++
4 files changed, 193 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c
--
2.43.0
* [PATCH net v2 1/3] net,mptcp: fix incorrect IPv4/IPv6 fallback detection with BPF Sockmap
2025-10-20 6:04 [PATCH net v2 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
@ 2025-10-20 6:04 ` Jiayuan Chen
2025-10-21 10:24 ` Jakub Sitnicki
2025-10-20 6:04 ` [PATCH net v2 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap updates Jiayuan Chen
2025-10-20 6:04 ` [PATCH net v2 3/3] selftests/bpf: Add mptcp test with sockmap Jiayuan Chen
2 siblings, 1 reply; 8+ messages in thread
From: Jiayuan Chen @ 2025-10-20 6:04 UTC (permalink / raw)
To: mptcp, netdev, bpf
Cc: Jiayuan Chen, John Fastabend, Jakub Sitnicki, Eric Dumazet,
Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, David S. Miller,
Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
Geliang Tang, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Florian Westphal, linux-kernel, linux-kselftest
When the server has MPTCP enabled but receives a non-MP-capable request
from a client, it calls mptcp_fallback_tcp_ops().
Since non-MPTCP connections are allowed to use sockmap, which replaces
sk->sk_prot, using sk->sk_prot to determine the IP version in
mptcp_fallback_tcp_ops() becomes unreliable. This can lead to assigning
incorrect ops to sk->sk_socket->ops.
Additionally, when BPF Sockmap modifies the protocol handlers, the
original WARN_ON_ONCE(sk->sk_prot != &tcp_prot) check would falsely
trigger warnings.
Fix this by using the more stable sk_family to distinguish between IPv4
and IPv6 connections, ensuring correct fallback protocol operations are
selected even when BPF Sockmap has modified the socket protocol handlers.
Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
net/mptcp/protocol.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0292162a14ee..c2d1513615ae 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -61,11 +61,14 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
{
+ /* When BPF Sockmap is used, it replaces sk->sk_prot.
+ * Using sk_family is a reliable way to determine the IP version.
+ */
#if IS_ENABLED(CONFIG_MPTCP_IPV6)
- if (sk->sk_prot == &tcpv6_prot)
+ if (sk->sk_family == AF_INET6)
return &inet6_stream_ops;
#endif
- WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
+ WARN_ON_ONCE(sk->sk_family != AF_INET);
return &inet_stream_ops;
}
--
2.43.0
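The failure mode fixed by this patch can be modeled in user space. The sketch below is not kernel code: `struct proto`, `struct sock`, `model_sockmap_attach()`, the `sockmap_*_prot` objects and both `fallback_ops_by_*()` helpers are simplified, hypothetical stand-ins. It assumes only what the commit message states, namely that sockmap replaces sk->sk_prot while sk_family stays the same.

```c
#include <stdbool.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct proto { const char *name; };
struct proto_ops { const char *name; };

struct sock {
	unsigned short sk_family;
	const struct proto *sk_prot;
};

static const struct proto tcp_prot = { "TCP" };
static const struct proto tcpv6_prot = { "TCPv6" };
/* Model of the replacement protos that sockmap installs on a socket. */
static const struct proto sockmap_tcp_prot = { "TCP (sockmap copy)" };
static const struct proto sockmap_tcpv6_prot = { "TCPv6 (sockmap copy)" };

static const struct proto_ops inet_stream_ops = { "inet_stream_ops" };
static const struct proto_ops inet6_stream_ops = { "inet6_stream_ops" };

/* Model of adding a socket to sockmap: only sk_prot is swapped. */
static void model_sockmap_attach(struct sock *sk)
{
	sk->sk_prot = sk->sk_family == AF_INET6 ? &sockmap_tcpv6_prot
						: &sockmap_tcp_prot;
}

/* Old logic: pointer comparison on sk_prot; *warned models WARN_ON_ONCE(). */
static const struct proto_ops *fallback_ops_by_prot(const struct sock *sk,
						     bool *warned)
{
	*warned = false;
	if (sk->sk_prot == &tcpv6_prot)
		return &inet6_stream_ops;
	*warned = sk->sk_prot != &tcp_prot;
	return &inet_stream_ops;
}

/* New logic: sk_family is unaffected by the sk_prot replacement. */
static const struct proto_ops *fallback_ops_by_family(const struct sock *sk,
						       bool *warned)
{
	*warned = false;
	if (sk->sk_family == AF_INET6)
		return &inet6_stream_ops;
	*warned = sk->sk_family != AF_INET;
	return &inet_stream_ops;
}
```

In this model, an AF_INET6 socket that has been passed through model_sockmap_attach() makes fallback_ops_by_prot() return inet_stream_ops and set the warning flag, while fallback_ops_by_family() still returns inet6_stream_ops without warning.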
* [PATCH net v2 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap updates
2025-10-20 6:04 [PATCH net v2 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
2025-10-20 6:04 ` [PATCH net v2 1/3] net,mptcp: fix incorrect IPv4/IPv6 fallback detection with BPF Sockmap Jiayuan Chen
@ 2025-10-20 6:04 ` Jiayuan Chen
2025-10-21 10:49 ` Jakub Sitnicki
2025-10-20 6:04 ` [PATCH net v2 3/3] selftests/bpf: Add mptcp test with sockmap Jiayuan Chen
2 siblings, 1 reply; 8+ messages in thread
From: Jiayuan Chen @ 2025-10-20 6:04 UTC (permalink / raw)
To: mptcp, netdev, bpf
Cc: Jiayuan Chen, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, John Fastabend, Jakub Sitnicki, David S. Miller,
Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
Geliang Tang, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Florian Westphal, linux-kernel, linux-kselftest
MPTCP creates subflows for data transmission, and these sockets should not
be added to sockmap because MPTCP sets specialized data_ready handlers
that would be overridden by sockmap.
Additionally, the parent socket of the MPTCP subflows (which are plain TCP
sockets) is an MPTCP sk that uses mptcp_prot and requires specific protocol
handling that conflicts with sockmap's operation.
This patch adds proper checks to reject MPTCP subflows and their parent
sockets from being added to sockmap, while preserving compatibility with
reuseport functionality for listening MPTCP sockets.
Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
net/core/sock_map.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 5947b38e4f8b..da21deb970b3 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -535,6 +535,15 @@ static bool sock_map_redirect_allowed(const struct sock *sk)
static bool sock_map_sk_is_suitable(const struct sock *sk)
{
+ if ((sk_is_tcp(sk) && sk_is_mptcp(sk)) /* subflow */ ||
+ (sk->sk_protocol == IPPROTO_MPTCP && sk->sk_state != TCP_LISTEN)) {
+ /* Disallow MPTCP subflows and their parent socket.
+ * However, a TCP_LISTEN MPTCP socket is permitted because
+ * sockmap can also serve for reuseport socket selection.
+ */
+ pr_err_once("sockmap: MPTCP sockets are not supported\n");
+ return false;
+ }
return !!sk->sk_prot->psock_update_sk_prot;
}
--
2.43.0
* [PATCH net v2 3/3] selftests/bpf: Add mptcp test with sockmap
2025-10-20 6:04 [PATCH net v2 0/3] mptcp: Fix conflicts between MPTCP and sockmap Jiayuan Chen
2025-10-20 6:04 ` [PATCH net v2 1/3] net,mptcp: fix incorrect IPv4/IPv6 fallback detection with BPF Sockmap Jiayuan Chen
2025-10-20 6:04 ` [PATCH net v2 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap updates Jiayuan Chen
@ 2025-10-20 6:04 ` Jiayuan Chen
2025-10-23 10:18 ` Jakub Sitnicki
2 siblings, 1 reply; 8+ messages in thread
From: Jiayuan Chen @ 2025-10-20 6:04 UTC (permalink / raw)
To: mptcp, netdev, bpf
Cc: Jiayuan Chen, John Fastabend, Jakub Sitnicki, Eric Dumazet,
Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, David S. Miller,
Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
Geliang Tang, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Florian Westphal, linux-kernel, linux-kselftest
Add test cases to verify that when MPTCP falls back to plain TCP sockets,
they can properly work with sockmap.
Additionally, add test cases to ensure that sockmap correctly rejects
MPTCP sockets as expected.
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
.../testing/selftests/bpf/prog_tests/mptcp.c | 136 ++++++++++++++++++
.../selftests/bpf/progs/mptcp_sockmap.c | 43 ++++++
2 files changed, 179 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
index f8eb7f9d4fd2..54459b385439 100644
--- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -6,11 +6,14 @@
#include <netinet/in.h>
#include <test_progs.h>
#include <unistd.h>
+#include <error.h>
#include "cgroup_helpers.h"
#include "network_helpers.h"
+#include "socket_helpers.h"
#include "mptcp_sock.skel.h"
#include "mptcpify.skel.h"
#include "mptcp_subflow.skel.h"
+#include "mptcp_sockmap.skel.h"
#define NS_TEST "mptcp_ns"
#define ADDR_1 "10.0.1.1"
@@ -436,6 +439,137 @@ static void test_subflow(void)
close(cgroup_fd);
}
+/* Test sockmap on MPTCP server handling non-mp-capable clients. */
+static void test_sockmap_with_mptcp_fallback(struct mptcp_sockmap *skel)
+{
+ int listen_fd = -1, client_fd1 = -1, client_fd2 = -1;
+ int server_fd1 = -1, server_fd2 = -1, sent, recvd;
+ char snd[9] = "123456789";
+ char rcv[10];
+
+ listen_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
+ if (!ASSERT_OK_FD(listen_fd, "redirect:start_mptcp_server"))
+ return;
+
+ skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
+ skel->bss->sk_index = 0;
+ client_fd1 = connect_to_fd_opts(listen_fd, NULL);
+ if (!ASSERT_OK_FD(client_fd1, "redirect:connect_to_fd"))
+ goto end;
+ server_fd1 = xaccept_nonblock(listen_fd, NULL, NULL);
+ skel->bss->sk_index = 1;
+ client_fd2 = connect_to_fd_opts(listen_fd, NULL);
+ if (!ASSERT_OK_FD(client_fd2, "redirect:connect_to_fd"))
+ goto end;
+ server_fd2 = xaccept_nonblock(listen_fd, NULL, NULL);
+ /* test normal redirect behavior: the data sent by client_fd1 can be
+ * received by client_fd2
+ */
+ skel->bss->redirect_idx = 1;
+ sent = xsend(client_fd1, snd, sizeof(snd), 0);
+ if (!ASSERT_EQ(sent, sizeof(snd), "redirect:xsend(client_fd1)"))
+ goto end;
+
+ /* try to recv more byte to avoid truncation check */
+ recvd = recv_timeout(client_fd2, rcv, sizeof(rcv), MSG_DONTWAIT, 2);
+ if (!ASSERT_EQ(recvd, sizeof(snd), "redirect:recv(client_fd2)"))
+ goto end;
+
+end:
+ if (client_fd1 > 0)
+ close(client_fd1);
+ if (client_fd2 > 0)
+ close(client_fd2);
+ if (server_fd1 > 0)
+ close(server_fd1);
+ if (server_fd2 > 0)
+ close(server_fd2);
+ close(listen_fd);
+}
+
+static void test_sockmap_reject_mptcp(struct mptcp_sockmap *skel)
+{
+ int listen_fd = -1, server_fd = -1;
+ int client_fd1 = -1, client_fd2 = -1;
+ int err, zero = 0;
+
+ listen_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
+ if (!ASSERT_OK_FD(listen_fd, "start_mptcp_server"))
+ return;
+
+ skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
+ skel->bss->sk_index = 0;
+ client_fd1 = connect_to_fd(listen_fd, 0);
+ if (!ASSERT_OK_FD(client_fd1, "connect_to_fd client_fd1"))
+ goto end;
+ /* sockmap helper called from sockops prog should reject mptcp sk */
+ if (ASSERT_EQ(skel->bss->helper_ret, -EOPNOTSUPP, "should reject"))
+ goto end;
+
+ /* skip sockops prog */
+ skel->bss->trace_port = -1;
+ client_fd2 = connect_to_fd(listen_fd, 0);
+ if (!ASSERT_OK_FD(client_fd2, "connect_to_fd client_fd2"))
+ goto end;
+
+ server_fd = xaccept_nonblock(listen_fd, NULL, NULL);
+ err = bpf_map_update_elem(bpf_map__fd(skel->maps.sock_map),
+ &zero, &server_fd, BPF_NOEXIST);
+ if (ASSERT_EQ(err, -EOPNOTSUPP, "should reject"))
+ goto end;
+end:
+ if (client_fd1 > 0)
+ close(client_fd1);
+ if (client_fd2 > 0)
+ close(client_fd2);
+ if (server_fd > 0)
+ close(server_fd);
+ close(listen_fd);
+}
+
+static void test_mptcp_sockmap(void)
+{
+ struct mptcp_sockmap *skel;
+ struct netns_obj *netns;
+ int cgroup_fd, err;
+
+ cgroup_fd = test__join_cgroup("/mptcp_sockmap");
+ if (!ASSERT_OK_FD(cgroup_fd, "join_cgroup: mptcp_sockmap"))
+ return;
+
+ skel = mptcp_sockmap__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open_load: mptcp_sockmap"))
+ goto close_cgroup;
+
+ skel->links.mptcp_sockmap_inject =
+ bpf_program__attach_cgroup(skel->progs.mptcp_sockmap_inject, cgroup_fd);
+ if (!ASSERT_OK_PTR(skel->links.mptcp_sockmap_inject, "attach sockmap"))
+ goto skel_destroy;
+
+ err = bpf_prog_attach(bpf_program__fd(skel->progs.mptcp_sockmap_redirect),
+ bpf_map__fd(skel->maps.sock_map),
+ BPF_SK_SKB_STREAM_VERDICT, 0);
+ if (!ASSERT_OK(err, "bpf_prog_attach stream verdict"))
+ goto skel_destroy;
+
+ netns = netns_new(NS_TEST, true);
+ if (!ASSERT_OK_PTR(netns, "netns_new: mptcp_sockmap"))
+ goto skel_destroy;
+
+ if (endpoint_init("subflow") < 0)
+ goto close_netns;
+
+ test_sockmap_with_mptcp_fallback(skel);
+ test_sockmap_reject_mptcp(skel);
+
+close_netns:
+ netns_free(netns);
+skel_destroy:
+ mptcp_sockmap__destroy(skel);
+close_cgroup:
+ close(cgroup_fd);
+}
+
void test_mptcp(void)
{
if (test__start_subtest("base"))
@@ -444,4 +578,6 @@ void test_mptcp(void)
test_mptcpify();
if (test__start_subtest("subflow"))
test_subflow();
+ if (test__start_subtest("sockmap"))
+ test_mptcp_sockmap();
}
diff --git a/tools/testing/selftests/bpf/progs/mptcp_sockmap.c b/tools/testing/selftests/bpf/progs/mptcp_sockmap.c
new file mode 100644
index 000000000000..d4eef0cbadb9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_sockmap.c
@@ -0,0 +1,43 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "bpf_tracing_net.h"
+
+char _license[] SEC("license") = "GPL";
+
+int sk_index;
+int redirect_idx;
+int trace_port;
+int helper_ret;
+struct {
+ __uint(type, BPF_MAP_TYPE_SOCKMAP);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(__u32));
+ __uint(max_entries, 100);
+} sock_map SEC(".maps");
+
+SEC("sockops")
+int mptcp_sockmap_inject(struct bpf_sock_ops *skops)
+{
+ struct bpf_sock *sk;
+
+ /* only accept specified connection */
+ if (skops->local_port != trace_port ||
+ skops->op != BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
+ return 1;
+
+ sk = skops->sk;
+ if (!sk)
+ return 1;
+
+ /* update sk handler */
+ helper_ret = bpf_sock_map_update(skops, &sock_map, &sk_index, BPF_NOEXIST);
+
+ return 1;
+}
+
+SEC("sk_skb/stream_verdict")
+int mptcp_sockmap_redirect(struct __sk_buff *skb)
+{
+ /* redirect skb to the sk under sock_map[redirect_idx] */
+ return bpf_sk_redirect_map(skb, &sock_map, redirect_idx, 0);
+}
--
2.43.0
* Re: [PATCH net v2 1/3] net,mptcp: fix incorrect IPv4/IPv6 fallback detection with BPF Sockmap
2025-10-20 6:04 ` [PATCH net v2 1/3] net,mptcp: fix incorrect IPv4/IPv6 fallback detection with BPF Sockmap Jiayuan Chen
@ 2025-10-21 10:24 ` Jakub Sitnicki
0 siblings, 0 replies; 8+ messages in thread
From: Jakub Sitnicki @ 2025-10-21 10:24 UTC (permalink / raw)
To: Jiayuan Chen
Cc: mptcp, netdev, bpf, John Fastabend, Eric Dumazet,
Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, David S. Miller,
Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
Geliang Tang, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Florian Westphal, linux-kernel, linux-kselftest
On Mon, Oct 20, 2025 at 02:04 PM +08, Jiayuan Chen wrote:
> When the server has MPTCP enabled but receives a non-MP-capable request
> from a client, it calls mptcp_fallback_tcp_ops().
>
> Since non-MPTCP connections are allowed to use sockmap, which replaces
> sk->sk_prot, using sk->sk_prot to determine the IP version in
> mptcp_fallback_tcp_ops() becomes unreliable. This can lead to assigning
> incorrect ops to sk->sk_socket->ops.
>
> Additionally, when BPF Sockmap modifies the protocol handlers, the
> original WARN_ON_ONCE(sk->sk_prot != &tcp_prot) check would falsely
> trigger warnings.
>
> Fix this by using the more stable sk_family to distinguish between IPv4
> and IPv6 connections, ensuring correct fallback protocol operations are
> selected even when BPF Sockmap has modified the socket protocol handlers.
>
> Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
> net/mptcp/protocol.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 0292162a14ee..c2d1513615ae 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -61,11 +61,14 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
>
> static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
> {
> + /* When BPF Sockmap is used, it replaces sk->sk_prot.
> + * Using sk_family is a reliable way to determine the IP version.
> + */
> #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> - if (sk->sk_prot == &tcpv6_prot)
> + if (sk->sk_family == AF_INET6)
> return &inet6_stream_ops;
> #endif
> - WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
> + WARN_ON_ONCE(sk->sk_family != AF_INET);
> return &inet_stream_ops;
> }
Should probably be a READ_ONCE(sk->sk_family) based on what I see in
IPV6_ADDRFORM:
https://elixir.bootlin.com/linux/v6.18-rc1/source/net/ipv6/ipv6_sockglue.c#L607
Nit: It's BPF sockmap, cpumap, etc. We don't treat it as a proper noun.
Other than that:
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
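The READ_ONCE() suggestion above could look roughly like the following user-space sketch. It is an illustration under stated assumptions, not the code that was eventually posted: the `READ_ONCE()` macro here is a simplified approximation of the kernel one, `struct sock` and `struct proto_ops` are reduced mock types, and `fallback_tcp_ops_model()` with its `warned` out-parameter is a hypothetical stand-in for mptcp_fallback_tcp_ops() and WARN_ON_ONCE().

```c
#include <stdbool.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/* User-space approximation of the kernel's READ_ONCE(): one volatile load.
 * Relies on the GNU __typeof__ extension (gcc, clang).
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical mock types; only the fields needed for the sketch. */
struct proto_ops { const char *name; };
struct sock { unsigned short sk_family; };

static const struct proto_ops inet_stream_ops = { "inet_stream_ops" };
static const struct proto_ops inet6_stream_ops = { "inet6_stream_ops" };

static const struct proto_ops *fallback_tcp_ops_model(const struct sock *sk,
						       bool *warned)
{
	/* Load the family exactly once so that both checks below see the
	 * same value even if it is changed concurrently.
	 */
	unsigned short family = READ_ONCE(sk->sk_family);

	*warned = false;
	if (family == AF_INET6)
		return &inet6_stream_ops;

	*warned = family != AF_INET;	/* stands in for WARN_ON_ONCE() */
	return &inet_stream_ops;
}
```

Reading the field into a local variable also avoids evaluating sk->sk_family twice, once for the IPv6 test and once for the warning.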
* Re: [PATCH net v2 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap updates
2025-10-20 6:04 ` [PATCH net v2 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap updates Jiayuan Chen
@ 2025-10-21 10:49 ` Jakub Sitnicki
2025-10-21 12:16 ` Jiayuan Chen
0 siblings, 1 reply; 8+ messages in thread
From: Jakub Sitnicki @ 2025-10-21 10:49 UTC (permalink / raw)
To: Jiayuan Chen
Cc: mptcp, netdev, bpf, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, John Fastabend, David S. Miller, Jakub Kicinski,
Simon Horman, Matthieu Baerts, Mat Martineau, Geliang Tang,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
Florian Westphal, linux-kernel, linux-kselftest
On Mon, Oct 20, 2025 at 02:04 PM +08, Jiayuan Chen wrote:
> MPTCP creates subflows for data transmission, and these sockets should not
> be added to sockmap because MPTCP sets specialized data_ready handlers
> that would be overridden by sockmap.
>
> Additionally, for the parent socket of MPTCP subflows (plain TCP socket),
> MPTCP sk requires specific protocol handling that conflicts with sockmap's
> operation(mptcp_prot).
>
> This patch adds proper checks to reject MPTCP subflows and their parent
> sockets from being added to sockmap, while preserving compatibility with
> reuseport functionality for listening MPTCP sockets.
>
> Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
> net/core/sock_map.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 5947b38e4f8b..da21deb970b3 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -535,6 +535,15 @@ static bool sock_map_redirect_allowed(const struct sock *sk)
>
> static bool sock_map_sk_is_suitable(const struct sock *sk)
> {
> + if ((sk_is_tcp(sk) && sk_is_mptcp(sk)) /* subflow */ ||
> + (sk->sk_protocol == IPPROTO_MPTCP && sk->sk_state != TCP_LISTEN)) {
> + /* Disallow MPTCP subflows and their parent socket.
> + * However, a TCP_LISTEN MPTCP socket is permitted because
> + * sockmap can also serve for reuseport socket selection.
> + */
> + pr_err_once("sockmap: MPTCP sockets are not supported\n");
> + return false;
> + }
> return !!sk->sk_prot->psock_update_sk_prot;
> }
You're checking sk_state without sk_lock held. That doesn't seem right.
Take a look at how we always call sock_map_sk_state_allowed() after
grabbing the lock.
Same might apply to sk_is_mptcp(). Please double check.
* Re: [PATCH net v2 2/3] bpf,sockmap: disallow MPTCP sockets from sockmap updates
2025-10-21 10:49 ` Jakub Sitnicki
@ 2025-10-21 12:16 ` Jiayuan Chen
0 siblings, 0 replies; 8+ messages in thread
From: Jiayuan Chen @ 2025-10-21 12:16 UTC (permalink / raw)
To: Jakub Sitnicki
Cc: mptcp, netdev, bpf, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
Willem de Bruijn, John Fastabend, David S. Miller, Jakub Kicinski,
Simon Horman, Matthieu Baerts, Mat Martineau, Geliang Tang,
Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
Florian Westphal, linux-kernel, linux-kselftest
October 21, 2025 at 18:49, "Jakub Sitnicki" <jakub@cloudflare.com> wrote:
>
> On Mon, Oct 20, 2025 at 02:04 PM +08, Jiayuan Chen wrote:
>
> >
> > MPTCP creates subflows for data transmission, and these sockets should not
> > be added to sockmap because MPTCP sets specialized data_ready handlers
> > that would be overridden by sockmap.
> >
> > Additionally, for the parent socket of MPTCP subflows (plain TCP socket),
> > MPTCP sk requires specific protocol handling that conflicts with sockmap's
> > operation(mptcp_prot).
> >
> > This patch adds proper checks to reject MPTCP subflows and their parent
> > sockets from being added to sockmap, while preserving compatibility with
> > reuseport functionality for listening MPTCP sockets.
> >
> > Fixes: 0b4f33def7bb ("mptcp: fix tcp fallback crash")
> > Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> > ---
> > net/core/sock_map.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> > index 5947b38e4f8b..da21deb970b3 100644
> > --- a/net/core/sock_map.c
> > +++ b/net/core/sock_map.c
> > @@ -535,6 +535,15 @@ static bool sock_map_redirect_allowed(const struct sock *sk)
> >
> > static bool sock_map_sk_is_suitable(const struct sock *sk)
> > {
> > + if ((sk_is_tcp(sk) && sk_is_mptcp(sk)) /* subflow */ ||
> > + (sk->sk_protocol == IPPROTO_MPTCP && sk->sk_state != TCP_LISTEN)) {
> > + /* Disallow MPTCP subflows and their parent socket.
> > + * However, a TCP_LISTEN MPTCP socket is permitted because
> > + * sockmap can also serve for reuseport socket selection.
> > + */
> > + pr_err_once("sockmap: MPTCP sockets are not supported\n");
> > + return false;
> > + }
> > return !!sk->sk_prot->psock_update_sk_prot;
> > }
> >
> You're checking sk_state without sk_lock held. That doesn't seem right.
> Take a look how we always call sock_map_sk_state_allowed() after
> grabbing the lock.
> Same might apply to sk_is_mptcp(). Please double check.
>
Thank you for the suggestion. It seems more appropriate to place this logic
inside sock_map_sk_state_allowed().
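A minimal user-space sketch of that direction is shown below. Everything in it is hypothetical: the `MODEL_*` constants, the reduced `struct sock` with its `is_mptcp_subflow` and `lock_held` flags, and the two `*_model()` functions are illustrative stand-ins, not the kernel's sock_map_sk_state_allowed() and not the code of any later revision. It only shows the state-dependent MPTCP check being evaluated while the (modeled) socket lock is held.

```c
#include <stdbool.h>

/* Hypothetical constants mirroring the values used in the discussion. */
enum { MODEL_IPPROTO_TCP = 6, MODEL_IPPROTO_MPTCP = 262 };
enum { MODEL_ESTABLISHED = 1, MODEL_LISTEN = 10 };

/* Reduced mock socket; the two flags replace real kernel machinery. */
struct sock {
	int sk_protocol;
	int sk_state;
	bool is_mptcp_subflow;	/* models sk_is_mptcp() on a TCP socket */
	bool lock_held;		/* models holding the socket lock */
};

/* State check; the caller must hold the lock so that sk_state is stable. */
static bool sk_state_allowed_model(const struct sock *sk)
{
	if (!sk->lock_held)
		return false;	/* in the kernel this would be a locking bug */

	if (sk->sk_protocol == MODEL_IPPROTO_TCP) {
		if (sk->is_mptcp_subflow)
			return false;	/* reject MPTCP subflows */
		return sk->sk_state == MODEL_ESTABLISHED ||
		       sk->sk_state == MODEL_LISTEN;
	}

	/* The MPTCP parent socket is only allowed while listening. */
	if (sk->sk_protocol == MODEL_IPPROTO_MPTCP)
		return sk->sk_state == MODEL_LISTEN;

	return true;
}

/* Update path: take the lock first, then evaluate the state. */
static bool sockmap_update_model(struct sock *sk)
{
	bool ok;

	sk->lock_held = true;	/* models lock_sock() */
	ok = sk_state_allowed_model(sk);
	sk->lock_held = false;	/* models release_sock() */
	return ok;
}
```

Keeping the MPTCP rejection next to the other state checks means it is covered by the same locking rule as the existing sk_state tests, which is the concern raised in the review.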
* Re: [PATCH net v2 3/3] selftests/bpf: Add mptcp test with sockmap
2025-10-20 6:04 ` [PATCH net v2 3/3] selftests/bpf: Add mptcp test with sockmap Jiayuan Chen
@ 2025-10-23 10:18 ` Jakub Sitnicki
0 siblings, 0 replies; 8+ messages in thread
From: Jakub Sitnicki @ 2025-10-23 10:18 UTC (permalink / raw)
To: Jiayuan Chen
Cc: mptcp, netdev, bpf, John Fastabend, Eric Dumazet,
Kuniyuki Iwashima, Paolo Abeni, Willem de Bruijn, David S. Miller,
Jakub Kicinski, Simon Horman, Matthieu Baerts, Mat Martineau,
Geliang Tang, Andrii Nakryiko, Eduard Zingerman,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
Shuah Khan, Florian Westphal, linux-kernel, linux-kselftest
On Mon, Oct 20, 2025 at 02:04 PM +08, Jiayuan Chen wrote:
> Add test cases to verify that when MPTCP falls back to plain TCP sockets,
> they can properly work with sockmap.
>
> Additionally, add test cases to ensure that sockmap correctly rejects
> MPTCP sockets as expected.
>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
> .../testing/selftests/bpf/prog_tests/mptcp.c | 136 ++++++++++++++++++
> .../selftests/bpf/progs/mptcp_sockmap.c | 43 ++++++
> 2 files changed, 179 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/progs/mptcp_sockmap.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> index f8eb7f9d4fd2..54459b385439 100644
> --- a/tools/testing/selftests/bpf/prog_tests/mptcp.c
> +++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
> @@ -6,11 +6,14 @@
> #include <netinet/in.h>
> #include <test_progs.h>
> #include <unistd.h>
> +#include <error.h>
> #include "cgroup_helpers.h"
> #include "network_helpers.h"
> +#include "socket_helpers.h"
> #include "mptcp_sock.skel.h"
> #include "mptcpify.skel.h"
> #include "mptcp_subflow.skel.h"
> +#include "mptcp_sockmap.skel.h"
>
> #define NS_TEST "mptcp_ns"
> #define ADDR_1 "10.0.1.1"
> @@ -436,6 +439,137 @@ static void test_subflow(void)
> close(cgroup_fd);
> }
>
> +/* Test sockmap on MPTCP server handling non-mp-capable clients. */
> +static void test_sockmap_with_mptcp_fallback(struct mptcp_sockmap *skel)
> +{
> + int listen_fd = -1, client_fd1 = -1, client_fd2 = -1;
> + int server_fd1 = -1, server_fd2 = -1, sent, recvd;
> + char snd[9] = "123456789";
> + char rcv[10];
> +
> + listen_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
> + if (!ASSERT_OK_FD(listen_fd, "redirect:start_mptcp_server"))
> + return;
> +
> + skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
> + skel->bss->sk_index = 0;
> + client_fd1 = connect_to_fd_opts(listen_fd, NULL);
> + if (!ASSERT_OK_FD(client_fd1, "redirect:connect_to_fd"))
> + goto end;
> + server_fd1 = xaccept_nonblock(listen_fd, NULL, NULL);
> + skel->bss->sk_index = 1;
> + client_fd2 = connect_to_fd_opts(listen_fd, NULL);
> + if (!ASSERT_OK_FD(client_fd2, "redirect:connect_to_fd"))
> + goto end;
> + server_fd1 = xaccept_nonblock(listen_fd, NULL, NULL);
> + /* test normal redirect behavior: the data sent by client_fd1 can be
> + * received by client_fd2
> + */
> + skel->bss->redirect_idx = 1;
> + sent = xsend(client_fd1, snd, sizeof(snd), 0);
> + if (!ASSERT_EQ(sent, sizeof(snd), "redirect:xsend(client_fd1)"))
> + goto end;
> +
> + /* try to recv more byte to avoid truncation check */
> + recvd = recv_timeout(client_fd2, rcv, sizeof(rcv), MSG_DONTWAIT, 2);
> + if (!ASSERT_EQ(recvd, sizeof(snd), "redirect:recv(client_fd2)"))
> + goto end;
> +
> +end:
> + if (client_fd1 > 1)
> + close(client_fd1);
> + if (client_fd2 > 1)
> + close(client_fd2);
> + if (server_fd1 > 0)
> + close(server_fd1);
> + if (server_fd2 > 0)
> + close(server_fd2);
> + close(listen_fd);
> +}
> +
> +static void test_sockmap_reject_mptcp(struct mptcp_sockmap *skel)
> +{
> + int listen_fd = -1, server_fd = -1;
> + int client_fd1 = -1, client_fd2 = -1;
> + int err, zero = 0;
> +
> + listen_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
> + if (!ASSERT_OK_FD(listen_fd, "start_mptcp_server"))
> + return;
> +
> + skel->bss->trace_port = ntohs(get_socket_local_port(listen_fd));
> + skel->bss->sk_index = 0;
> + client_fd1 = connect_to_fd(listen_fd, 0);
> + if (!ASSERT_OK_FD(client_fd1, "connect_to_fd client_fd1"))
> + goto end;
> + /* sockmap helper called from sockops prog should reject mptcp sk */
> + if (ASSERT_EQ(skel->bss->helper_ret, -EOPNOTSUPP, "should reject"))
> + goto end;
I'm confused. Should we bail out (goto end) if EOPNOTSUPP is *not*
returned? That is "if (!ASSERT_EQ(...))".
> +
> + /* skip sockops prog */
> + skel->bss->trace_port = -1;
> + client_fd2 = connect_to_fd(listen_fd, 0);
> + if (!ASSERT_OK_FD(client_fd2, "connect_to_fd client_fd2"))
> + goto end;
> +
> + server_fd = xaccept_nonblock(listen_fd, NULL, NULL);
> + err = bpf_map_update_elem(bpf_map__fd(skel->maps.sock_map),
> + &zero, &server_fd, BPF_NOEXIST);
> + if (ASSERT_EQ(err, -EOPNOTSUPP, "should reject"))
> + goto end;
Same here. The check seems backward.
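The sense of the two checks flagged above can be illustrated with a small stand-alone model. This is a sketch under assumptions: `assert_eq_model()` is a simplified, hypothetical stand-in for the selftests' ASSERT_EQ() (it returns true when the values match and records a failure otherwise), and `run_steps_model()` is an invented helper that mirrors the control flow of the test, bailing out only when a check fails.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int failures;

/* Simplified stand-in for ASSERT_EQ(): true on match, logs on mismatch. */
static bool assert_eq_model(long actual, long expected, const char *name)
{
	if (actual != expected) {
		failures++;
		fprintf(stderr, "%s: got %ld, expected %ld\n",
			name, actual, expected);
		return false;
	}
	return true;
}

/* Returns how many steps completed; stops at the first failed check. */
static int run_steps_model(long helper_ret, long syscall_ret)
{
	int steps = 0;

	/* Bail out only when -EOPNOTSUPP was *not* returned. */
	if (!assert_eq_model(helper_ret, -EOPNOTSUPP, "sockops helper"))
		goto end;
	steps++;

	if (!assert_eq_model(syscall_ret, -EOPNOTSUPP, "map update"))
		goto end;
	steps++;
end:
	return steps;
}
```

With the negated form, a passing first check lets the test continue to the second scenario, whereas the un-negated form would skip it exactly when the first check succeeds.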
> +end:
> + if (client_fd1 > 0)
> + close(client_fd1);
> + if (client_fd2 > 0)
> + close(client_fd2);
> + if (server_fd > 0)
> + close(server_fd);
> + close(listen_fd);
> +}
> +
> +static void test_mptcp_sockmap(void)
> +{
> + struct mptcp_sockmap *skel;
> + struct netns_obj *netns;
> + int cgroup_fd, err;
> +
> + cgroup_fd = test__join_cgroup("/mptcp_sockmap");
> + if (!ASSERT_OK_FD(cgroup_fd, "join_cgroup: mptcp_sockmap"))
> + return;
> +
> + skel = mptcp_sockmap__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "skel_open_load: mptcp_sockmap"))
> + goto close_cgroup;
> +
> + skel->links.mptcp_sockmap_inject =
> + bpf_program__attach_cgroup(skel->progs.mptcp_sockmap_inject, cgroup_fd);
> + if (!ASSERT_OK_PTR(skel->links.mptcp_sockmap_inject, "attach sockmap"))
> + goto skel_destroy;
> +
> + err = bpf_prog_attach(bpf_program__fd(skel->progs.mptcp_sockmap_redirect),
> + bpf_map__fd(skel->maps.sock_map),
> + BPF_SK_SKB_STREAM_VERDICT, 0);
> + if (!ASSERT_OK(err, "bpf_prog_attach stream verdict"))
> + goto skel_destroy;
> +
> + netns = netns_new(NS_TEST, true);
> + if (!ASSERT_OK_PTR(netns, "netns_new: mptcp_sockmap"))
> + goto skel_destroy;
> +
> + if (endpoint_init("subflow") < 0)
> + goto close_netns;
> +
> + test_sockmap_with_mptcp_fallback(skel);
> + test_sockmap_reject_mptcp(skel);
> +
> +close_netns:
> + netns_free(netns);
> +skel_destroy:
> + mptcp_sockmap__destroy(skel);
> +close_cgroup:
> + close(cgroup_fd);
> +}
> +
> void test_mptcp(void)
> {
> if (test__start_subtest("base"))
> @@ -444,4 +578,6 @@ void test_mptcp(void)
> test_mptcpify();
> if (test__start_subtest("subflow"))
> test_subflow();
> + if (test__start_subtest("sockmap"))
> + test_mptcp_sockmap();
> }
> diff --git a/tools/testing/selftests/bpf/progs/mptcp_sockmap.c b/tools/testing/selftests/bpf/progs/mptcp_sockmap.c
> new file mode 100644
> index 000000000000..d4eef0cbadb9
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/mptcp_sockmap.c
> @@ -0,0 +1,43 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include "bpf_tracing_net.h"
> +
> +char _license[] SEC("license") = "GPL";
> +
> +int sk_index;
> +int redirect_idx;
> +int trace_port;
> +int helper_ret;
> +struct {
> + __uint(type, BPF_MAP_TYPE_SOCKMAP);
> + __uint(key_size, sizeof(__u32));
> + __uint(value_size, sizeof(__u32));
> + __uint(max_entries, 100);
> +} sock_map SEC(".maps");
> +
> +SEC("sockops")
> +int mptcp_sockmap_inject(struct bpf_sock_ops *skops)
> +{
> + struct bpf_sock *sk;
> +
> + /* only accept specified connection */
> + if (skops->local_port != trace_port ||
> + skops->op != BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB)
> + return 1;
> +
> + sk = skops->sk;
> + if (!sk)
> + return 1;
> +
> + /* update sk handler */
> + helper_ret = bpf_sock_map_update(skops, &sock_map, &sk_index, BPF_NOEXIST);
> +
> + return 1;
> +}
> +
> +SEC("sk_skb/stream_verdict")
> +int mptcp_sockmap_redirect(struct __sk_buff *skb)
> +{
> + /* redirect skb to the sk under sock_map[redirect_idx] */
> + return bpf_sk_redirect_map(skb, &sock_map, redirect_idx, 0);
> +}