* [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits
@ 2026-05-08 15:40 Matthieu Baerts (NGI0)
2026-05-08 15:40 ` [PATCH net-next 1/8] mptcp: pm: in-kernel: explicitly limit batches to array size Matthieu Baerts (NGI0)
` (8 more replies)
0 siblings, 9 replies; 11+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-05-08 15:40 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Shuah Khan,
linux-kselftest
Raise the maximum number of subflows and accepted ADD_ADDR from 8 to 64,
and the maximum number of MPTCP endpoints from 8 to 255.
The previous limit of 8 subflows makes sense in most cases. Using more
subflows will very likely *not* improve the situation, and could even
decrease performance. But there is no technical limitation nor
performance impact in raising this limit, so let's do it: this will allow
people with very specific use cases, and researchers, to easily create
more subflows and measure the performance impact by themselves.
- Patches 1-2: increase subflows and accepted ADD_ADDR limits.
- Patches 3-4: increase endpoints limit.
- Patches 5-7: validate the new limits: 64 subflows, 255 endpoints.
- Patch 8: selftests: use send()/recv() instead of sendto()/recvfrom().
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Matthieu Baerts (NGI0) (8):
mptcp: pm: in-kernel: explicitly limit batches to array size
mptcp: pm: in-kernel: increase all limits to 64
mptcp: pm: kernel: allow flushing more than 8 endpoints
mptcp: pm: in-kernel: increase endpoints limit
selftests: mptcp: join: allow changing ifaces nr per test
selftests: mptcp: join: validate 8x8 subflows
selftests: mptcp: pm: validate new limits
selftests: mptcp: pm: use simpler send/recv forms
net/mptcp/pm_kernel.c | 77 +++++++++++++++++--------
tools/testing/selftests/net/mptcp/mptcp_join.sh | 33 ++++++++++-
tools/testing/selftests/net/mptcp/pm_netlink.sh | 56 +++++++++++-------
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 8 +--
4 files changed, 121 insertions(+), 53 deletions(-)
---
base-commit: 6a4c4656b0d2d4056a1f0c35442db4e8a5cf8021
change-id: 20260508-net-next-mptcp-pm-inc-limits-b825af50e400
Best regards,
--
Matthieu Baerts (NGI0) <matttbe@kernel.org>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH net-next 1/8] mptcp: pm: in-kernel: explicitly limit batches to array size
2026-05-08 15:40 [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits Matthieu Baerts (NGI0)
@ 2026-05-08 15:40 ` Matthieu Baerts (NGI0)
2026-05-08 15:40 ` [PATCH net-next 2/8] mptcp: pm: in-kernel: increase all limits to 64 Matthieu Baerts (NGI0)
` (7 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-05-08 15:40 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0)
The in-kernel PM currently creates subflows in reply to ADD_ADDR in
batches of at most 8 subflows. The same applies when adding new "subflow"
endpoints with the fullmesh flag. This limit comes from the arrays
used during these steps.
There was no explicit bound on the array size (8), because the limit on
extra subflows was the same (8). It seems safer to use an explicit bound,
especially as these two sizes are going to differ in the next commit.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/pm_kernel.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
index fc818b63752e..f8987a33bed4 100644
--- a/net/mptcp/pm_kernel.c
+++ b/net/mptcp/pm_kernel.c
@@ -201,7 +201,8 @@ fill_remote_addr(struct mptcp_sock *msk, struct mptcp_addr_info *local,
static unsigned int
fill_remote_addresses_fullmesh(struct mptcp_sock *msk,
struct mptcp_addr_info *local,
- struct mptcp_addr_info *addrs)
+ struct mptcp_addr_info *addrs,
+ int addrs_size)
{
u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk);
bool deny_id0 = READ_ONCE(msk->pm.remote_deny_join_id0);
@@ -236,7 +237,8 @@ fill_remote_addresses_fullmesh(struct mptcp_sock *msk,
msk->pm.extra_subflows++;
i++;
- if (msk->pm.extra_subflows >= limit_extra_subflows)
+ if (msk->pm.extra_subflows >= limit_extra_subflows ||
+ i == addrs_size)
break;
}
@@ -248,7 +250,8 @@ fill_remote_addresses_fullmesh(struct mptcp_sock *msk,
*/
static unsigned int
fill_remote_addresses_vec(struct mptcp_sock *msk, struct mptcp_addr_info *local,
- bool fullmesh, struct mptcp_addr_info *addrs)
+ bool fullmesh, struct mptcp_addr_info *addrs,
+ int addrs_size)
{
/* Non-fullmesh: fill in the single entry corresponding to the primary
* MPC subflow remote address, and return 1, corresponding to 1 entry.
@@ -257,7 +260,7 @@ fill_remote_addresses_vec(struct mptcp_sock *msk, struct mptcp_addr_info *local,
return fill_remote_addr(msk, local, addrs);
/* Fullmesh endpoint: fill all possible remote addresses */
- return fill_remote_addresses_fullmesh(msk, local, addrs);
+ return fill_remote_addresses_fullmesh(msk, local, addrs, addrs_size);
}
static struct mptcp_pm_addr_entry *
@@ -410,7 +413,8 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
else /* local_addr_used is not decr for ID 0 */
msk->pm.local_addr_used++;
- nr = fill_remote_addresses_vec(msk, &local.addr, fullmesh, addrs);
+ nr = fill_remote_addresses_vec(msk, &local.addr, fullmesh,
+ addrs, ARRAY_SIZE(addrs));
if (nr == 0)
continue;
@@ -447,6 +451,7 @@ static unsigned int
fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk,
struct mptcp_addr_info *remote,
struct mptcp_pm_local *locals,
+ int locals_size,
bool c_flag_case)
{
u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk);
@@ -488,7 +493,8 @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk,
msk->pm.extra_subflows++;
i++;
- if (msk->pm.extra_subflows >= limit_extra_subflows)
+ if (msk->pm.extra_subflows >= limit_extra_subflows ||
+ i == locals_size)
break;
}
rcu_read_unlock();
@@ -559,7 +565,8 @@ fill_local_laminar_endp(struct mptcp_sock *msk, struct mptcp_addr_info *remote,
static unsigned int
fill_local_addresses_vec_c_flag(struct mptcp_sock *msk,
struct mptcp_addr_info *remote,
- struct mptcp_pm_local *locals)
+ struct mptcp_pm_local *locals,
+ int locals_size)
{
u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk);
struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk);
@@ -586,7 +593,8 @@ fill_local_addresses_vec_c_flag(struct mptcp_sock *msk,
msk->pm.extra_subflows++;
i++;
- if (msk->pm.extra_subflows >= limit_extra_subflows)
+ if (msk->pm.extra_subflows >= limit_extra_subflows ||
+ i == locals_size)
break;
}
@@ -620,13 +628,14 @@ fill_local_address_any(struct mptcp_sock *msk, struct mptcp_addr_info *remote,
*/
static unsigned int
fill_local_addresses_vec(struct mptcp_sock *msk, struct mptcp_addr_info *remote,
- struct mptcp_pm_local *locals)
+ struct mptcp_pm_local *locals, int locals_size)
{
bool c_flag_case = remote->id && mptcp_pm_add_addr_c_flag_case(msk);
/* If there is at least one MPTCP endpoint with a fullmesh flag */
if (mptcp_pm_get_endp_fullmesh_max(msk))
return fill_local_addresses_vec_fullmesh(msk, remote, locals,
+ locals_size,
c_flag_case);
/* If there is at least one MPTCP endpoint with a laminar flag */
@@ -637,7 +646,8 @@ fill_local_addresses_vec(struct mptcp_sock *msk, struct mptcp_addr_info *remote,
* limits are used -- accepting no ADD_ADDR -- and use subflow endpoints
*/
if (c_flag_case)
- return fill_local_addresses_vec_c_flag(msk, remote, locals);
+ return fill_local_addresses_vec_c_flag(msk, remote, locals,
+ locals_size);
/* No special case: fill in the single 'IPADDRANY' local address */
return fill_local_address_any(msk, remote, &locals[0]);
@@ -672,7 +682,7 @@ static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk)
/* connect to the specified remote address, using whatever
* local address the routing configuration will pick.
*/
- nr = fill_local_addresses_vec(msk, &remote, locals);
+ nr = fill_local_addresses_vec(msk, &remote, locals, ARRAY_SIZE(locals));
if (nr == 0)
return;
--
2.53.0
* [PATCH net-next 2/8] mptcp: pm: in-kernel: increase all limits to 64
2026-05-08 15:40 [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits Matthieu Baerts (NGI0)
2026-05-08 15:40 ` [PATCH net-next 1/8] mptcp: pm: in-kernel: explicitly limit batches to array size Matthieu Baerts (NGI0)
@ 2026-05-08 15:40 ` Matthieu Baerts (NGI0)
2026-05-08 15:40 ` [PATCH net-next 3/8] mptcp: pm: kernel: allow flushing more than 8 endpoints Matthieu Baerts (NGI0)
` (6 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-05-08 15:40 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0)
This raises the maximum number of subflows and accepted ADD_ADDR from 8
to 64.
The previous limit of 8 subflows makes sense in most cases. Using more
subflows will very likely *not* improve the situation, and could even
decrease performance. But there is no technical limitation nor
performance impact in raising this limit, so let's do it: this will allow
people with very specific use cases, and researchers, to easily create
more subflows and measure the performance impact by themselves.
The theoretical limit is 255 -- the ID is written in a u8 on the wire --
but 64 is more than enough. With so many subflows, it would be costly to
iterate over all of them when operations are done in the bottom half.
Note that the in-kernel PM will continue to create subflows in reply to
ADD_ADDR in a single batch of at most 8 subflows. The same applies when
adding new "subflow" endpoints with the fullmesh flag. Increasing those
batch limits would have a memory impact, and it looks fine not to cover
these cases with larger batches for the moment. If more is needed later,
the position of the last subflow in the list could be remembered, and the
list iteration could continue from there.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/434
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/pm_kernel.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
index f8987a33bed4..aabd73d15c15 100644
--- a/net/mptcp/pm_kernel.c
+++ b/net/mptcp/pm_kernel.c
@@ -30,6 +30,7 @@ struct pm_nl_pernet {
};
#define MPTCP_PM_ADDR_MAX 8
+#define MPTCP_PM_SUBFLOWS_MAX 64
static struct pm_nl_pernet *pm_nl_get_pernet(const struct net *net)
{
@@ -1381,10 +1382,10 @@ static int parse_limit(struct genl_info *info, int id, unsigned int *limit)
return 0;
*limit = nla_get_u32(attr);
- if (*limit > MPTCP_PM_ADDR_MAX) {
+ if (*limit > MPTCP_PM_SUBFLOWS_MAX) {
NL_SET_ERR_MSG_ATTR_FMT(info->extack, attr,
"limit greater than maximum (%u)",
- MPTCP_PM_ADDR_MAX);
+ MPTCP_PM_SUBFLOWS_MAX);
return -EINVAL;
}
return 0;
--
2.53.0
* [PATCH net-next 3/8] mptcp: pm: kernel: allow flushing more than 8 endpoints
2026-05-08 15:40 [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits Matthieu Baerts (NGI0)
2026-05-08 15:40 ` [PATCH net-next 1/8] mptcp: pm: in-kernel: explicitly limit batches to array size Matthieu Baerts (NGI0)
2026-05-08 15:40 ` [PATCH net-next 2/8] mptcp: pm: in-kernel: increase all limits to 64 Matthieu Baerts (NGI0)
@ 2026-05-08 15:40 ` Matthieu Baerts (NGI0)
2026-05-11 11:25 ` Matthieu Baerts
2026-05-08 15:40 ` [PATCH net-next 4/8] mptcp: pm: in-kernel: increase endpoints limit Matthieu Baerts (NGI0)
` (5 subsequent siblings)
8 siblings, 1 reply; 11+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-05-08 15:40 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0)
The mptcp_rm_list structure contains an array of 8 IDs, to be able to
send an RM_ADDR with up to 8 IDs. This limitation was fine so far because
there could be at most 8 endpoints.
But this is going to change in the next commit. To cope with that, if
one of the arrays is full, the iteration stops, the lists are processed,
then the iteration continues where it previously stopped.
Note that if there are many endpoints to remove, and multiple RM_ADDRs to
send, it is more likely that some of these RM_ADDRs are dropped or
lost. This is a known limitation: RM_ADDRs are not retransmitted in
MPTCPv1.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/pm_kernel.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
index aabd73d15c15..ea3a7ea82013 100644
--- a/net/mptcp/pm_kernel.c
+++ b/net/mptcp/pm_kernel.c
@@ -1223,19 +1223,30 @@ int mptcp_pm_nl_del_addr_doit(struct sk_buff *skb, struct genl_info *info)
}
static void mptcp_pm_flush_addrs_and_subflows(struct mptcp_sock *msk,
- struct list_head *rm_list)
+ struct list_head *rm_list,
+ struct mptcp_pm_addr_entry *entry)
{
- struct mptcp_rm_list alist = { .nr = 0 }, slist = { .nr = 0 };
- struct mptcp_pm_addr_entry *entry;
+ struct mptcp_rm_list alist, slist;
+ bool more;
- list_for_each_entry(entry, rm_list, list) {
- if (slist.nr < MPTCP_RM_IDS_MAX &&
- mptcp_lookup_subflow_by_saddr(&msk->conn_list, &entry->addr))
+again:
+ alist.nr = 0;
+ slist.nr = 0;
+ more = false;
+
+ entry = list_prepare_entry(entry, rm_list, list);
+ list_for_each_entry_continue(entry, rm_list, list) {
+ if (mptcp_lookup_subflow_by_saddr(&msk->conn_list, &entry->addr))
slist.ids[slist.nr++] = mptcp_endp_get_local_id(msk, &entry->addr);
- if (alist.nr < MPTCP_RM_IDS_MAX &&
- mptcp_remove_anno_list_by_saddr(msk, &entry->addr))
+ if (mptcp_remove_anno_list_by_saddr(msk, &entry->addr))
alist.ids[alist.nr++] = mptcp_endp_get_local_id(msk, &entry->addr);
+
+ if (slist.nr == MPTCP_RM_IDS_MAX ||
+ alist.nr == MPTCP_RM_IDS_MAX) {
+ more = !list_is_last(&entry->list, rm_list);
+ break;
+ }
}
spin_lock_bh(&msk->pm.lock);
@@ -1246,9 +1257,14 @@ static void mptcp_pm_flush_addrs_and_subflows(struct mptcp_sock *msk,
if (slist.nr)
mptcp_pm_rm_subflow(msk, &slist);
/* Reset counters: maybe some subflows have been removed before */
- bitmap_fill(msk->pm.id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
- msk->pm.local_addr_used = 0;
+ if (!more) {
+ bitmap_fill(msk->pm.id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
+ msk->pm.local_addr_used = 0;
+ }
spin_unlock_bh(&msk->pm.lock);
+
+ if (more)
+ goto again;
}
static void mptcp_nl_flush_addrs_list(struct net *net,
@@ -1265,7 +1281,7 @@ static void mptcp_nl_flush_addrs_list(struct net *net,
if (!mptcp_pm_is_userspace(msk)) {
lock_sock(sk);
- mptcp_pm_flush_addrs_and_subflows(msk, rm_list);
+ mptcp_pm_flush_addrs_and_subflows(msk, rm_list, NULL);
release_sock(sk);
}
--
2.53.0
* [PATCH net-next 4/8] mptcp: pm: in-kernel: increase endpoints limit
2026-05-08 15:40 [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits Matthieu Baerts (NGI0)
` (2 preceding siblings ...)
2026-05-08 15:40 ` [PATCH net-next 3/8] mptcp: pm: kernel: allow flushing more than 8 endpoints Matthieu Baerts (NGI0)
@ 2026-05-08 15:40 ` Matthieu Baerts (NGI0)
2026-05-08 15:40 ` [PATCH net-next 5/8] selftests: mptcp: join: allow changing ifaces nr per test Matthieu Baerts (NGI0)
` (4 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-05-08 15:40 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0)
The endpoints are managed in a list which was limited to 8 entries.
This limit can be too small in some cases: with the same limit as the
number of subflows, it might not be possible to create all expected
subflows when having a mix of v4 and v6 addresses that can all use MPTCP
on v4-only or v6-only networks.
While increasing the limit above the new subflows one, why not use the
technical limit of 255? Indeed, each endpoint will have an ID that is
used on the wire, limited to a u8, and ID 0 is reserved for the initial
subflow.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/pm_kernel.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
index ea3a7ea82013..4ba4346d7adc 100644
--- a/net/mptcp/pm_kernel.c
+++ b/net/mptcp/pm_kernel.c
@@ -746,7 +746,7 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet,
*/
if (pernet->next_id == MPTCP_PM_MAX_ADDR_ID)
pernet->next_id = 1;
- if (pernet->endpoints >= MPTCP_PM_ADDR_MAX) {
+ if (pernet->endpoints == MPTCP_PM_MAX_ADDR_ID) {
ret = -ERANGE;
goto out;
}
--
2.53.0
* [PATCH net-next 5/8] selftests: mptcp: join: allow changing ifaces nr per test
2026-05-08 15:40 [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits Matthieu Baerts (NGI0)
` (3 preceding siblings ...)
2026-05-08 15:40 ` [PATCH net-next 4/8] mptcp: pm: in-kernel: increase endpoints limit Matthieu Baerts (NGI0)
@ 2026-05-08 15:40 ` Matthieu Baerts (NGI0)
2026-05-08 15:40 ` [PATCH net-next 6/8] selftests: mptcp: join: validate 8x8 subflows Matthieu Baerts (NGI0)
` (3 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-05-08 15:40 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Shuah Khan,
linux-kselftest
By default, 4 network interfaces are created per subtest in a dedicated
net namespace. Each netns has a dedicated pair of v4 and v6 addresses.
Future tests will need more.
Simply always creating more network interfaces per test would increase
the execution time of all other tests, for no benefit. So it is now
possible to change this number only when needed, by setting ifaces_nr
when calling 'reset' and 'init_shapers', e.g.
ifaces_nr=8 reset "Subtest title"
ifaces_nr=8 init_shapers
Note that it might also be interesting to decrease the default value to
2 to reduce the setup time, especially when a debug kernel config is
being used.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index beec41f6662a..28da9df797ae 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -63,6 +63,7 @@ unset fastclose
unset fullmesh
unset speed
unset bind_addr
+unset ifaces_nr
unset join_syn_rej
unset join_csum_ns1
unset join_csum_ns2
@@ -146,7 +147,7 @@ init_partial()
# ns1eth4 ns2eth4
local i
- for i in $(seq 1 4); do
+ for i in $(seq 1 "${ifaces_nr:-4}"); do
ip link add ns1eth$i netns "$ns1" type veth peer name ns2eth$i netns "$ns2"
ip -net "$ns1" addr add 10.0.$i.1/24 dev ns1eth$i
ip -net "$ns1" addr add dead:beef:$i::1/64 dev ns1eth$i nodad
@@ -165,7 +166,7 @@ init_partial()
init_shapers()
{
local i
- for i in $(seq 1 4); do
+ for i in $(seq 1 "${ifaces_nr:-4}"); do
tc -n $ns1 qdisc add dev ns1eth$i root netem rate 20mbit delay 1ms
tc -n $ns2 qdisc add dev ns2eth$i root netem rate 20mbit delay 1ms
done
--
2.53.0
* [PATCH net-next 6/8] selftests: mptcp: join: validate 8x8 subflows
2026-05-08 15:40 [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits Matthieu Baerts (NGI0)
` (4 preceding siblings ...)
2026-05-08 15:40 ` [PATCH net-next 5/8] selftests: mptcp: join: allow changing ifaces nr per test Matthieu Baerts (NGI0)
@ 2026-05-08 15:40 ` Matthieu Baerts (NGI0)
2026-05-08 15:40 ` [PATCH net-next 7/8] selftests: mptcp: pm: validate new limits Matthieu Baerts (NGI0)
` (2 subsequent siblings)
8 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-05-08 15:40 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Shuah Khan,
linux-kselftest
The limits have been recently increased; validate that having 64 subflows
is now allowed.
Here, both the client and the server have 8 network interfaces. The
server has 8 endpoints marked as 'signal' to announce all its v4
addresses. The client also has 8 endpoints, but marked as 'subflow' and
'fullmesh' in order to create 8 subflows to each address announced by
the server. This means 63 additional subflows will be created after the
initial one.
If it is not possible to increase the limits to 64, it means an older
kernel version is being used, and the test is skipped.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 28 +++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 28da9df797ae..c6bb345d056b 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -513,6 +513,19 @@ reset_with_tcp_filter()
fi
}
+# For kernel supporting limits above 8
+# $1: title ; $2,4: addrs limit ns1,2 ; $3,5: subflows limit ns1,2
+reset_with_high_limits()
+{
+ reset "${1}" || return 1
+
+ if ! pm_nl_set_limits "${ns1}" "${2}" "${3}" 2>/dev/null ||
+ ! pm_nl_set_limits "${ns2}" "${4}" "${5}" 2>/dev/null; then
+ mark_as_skipped "unable to set the limits to ${*:2}"
+ return 1
+ fi
+}
+
# $1: err msg
fail_test()
{
@@ -3670,6 +3683,21 @@ fullmesh_tests()
chk_prio_nr 0 1 1 0
chk_rm_nr 0 1
fi
+
+ # fullmesh in 8x8 to create 63 additional subflows
+ if ifaces_nr=8 reset_with_high_limits "fullmesh 8x8" 64 64 64 64; then
+ # higher chance to lose ADD_ADDR: allow retransmissions
+ ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
+ local i
+ for i in $(seq 1 8); do
+ pm_nl_add_endpoint $ns2 10.0.$i.2 flags subflow,fullmesh
+ pm_nl_add_endpoint $ns1 10.0.$i.1 flags signal
+ done
+ speed=slow \
+ run_tests $ns1 $ns2 10.0.1.1
+ chk_join_nr 63 63 63
+ fi
+
}
fastclose_tests()
--
2.53.0
* [PATCH net-next 7/8] selftests: mptcp: pm: validate new limits
2026-05-08 15:40 [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits Matthieu Baerts (NGI0)
` (5 preceding siblings ...)
2026-05-08 15:40 ` [PATCH net-next 6/8] selftests: mptcp: join: validate 8x8 subflows Matthieu Baerts (NGI0)
@ 2026-05-08 15:40 ` Matthieu Baerts (NGI0)
2026-05-08 15:40 ` [PATCH net-next 8/8] selftests: mptcp: pm: use simpler send/recv forms Matthieu Baerts (NGI0)
2026-05-12 1:19 ` [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits patchwork-bot+netdevbpf
8 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-05-08 15:40 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Shuah Khan,
linux-kselftest
These limits have been recently updated, from 8 to:
- 64 for the subflows and accepted add_addr
- 255 for the MPTCP endpoints
These modifications validate the new limits, but are also compatible
with the previous ones, to be able to continue validating stable kernels
with the latest version of the selftests. That's why new variables are
now used instead of hard-coded values.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/mptcp/pm_netlink.sh | 56 +++++++++++++++----------
1 file changed, 35 insertions(+), 21 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/pm_netlink.sh b/tools/testing/selftests/net/mptcp/pm_netlink.sh
index 04594dfc22b1..21bfe1311f11 100755
--- a/tools/testing/selftests/net/mptcp/pm_netlink.sh
+++ b/tools/testing/selftests/net/mptcp/pm_netlink.sh
@@ -66,6 +66,15 @@ get_limits() {
fi
}
+get_limits_nb() {
+ if mptcp_lib_is_ip_mptcp; then
+ ip -n "${ns1}" mptcp limits | awk '{ print $2" "$4 }'
+ else
+ ip netns exec "${ns1}" ./pm_nl_ctl limits | \
+ awk '{ printf "%s ", $2 }'
+ fi
+}
+
format_endpoints() {
mptcp_lib_pm_nl_format_endpoints "${@}"
}
@@ -164,6 +173,7 @@ check "get_endpoint 2" "" "simple del addr" 1
check "show_endpoints" \
"$(format_endpoints "1,10.0.1.1" \
"3,10.0.1.3,signal backup")" "dump addrs after del"
+add_endpoint 10.0.1.2 id 2
add_endpoint 10.0.1.3 2>/dev/null
check "get_endpoint 4" "" "duplicate addr" 1
@@ -171,25 +181,29 @@ check "get_endpoint 4" "" "duplicate addr" 1
add_endpoint 10.0.1.4 flags signal
check "get_endpoint 4" "$(format_endpoints "4,10.0.1.4,signal")" "id addr increment"
-for i in $(seq 5 9); do
- add_endpoint "10.0.1.${i}" flags signal >/dev/null 2>&1
-done
-check "get_endpoint 9" "$(format_endpoints "9,10.0.1.9,signal")" "hard addr limit"
-check "get_endpoint 10" "" "above hard addr limit" 1
+read -r -a default_limits_nb <<< "$(get_limits_nb)"
+# limits have been increased: from 8 to 64 for subflows/add_addr & 255 for endp
+if mptcp_lib_expect_all_features || set_limits 9 9 2>/dev/null; then
+ max_endp=255
+ max_limits=64
+else
+ max_endp=8
+ max_limits=8
+fi
+set_limits "${default_limits_nb[@]}"
-del_endpoint 9
-for i in $(seq 10 255); do
- add_endpoint 10.0.0.9 id "${i}"
- del_endpoint "${i}"
+for i in $(seq 5 ${max_endp}); do
+ add_endpoint "10.0.0.${i}" id "${i}"
done
-check "show_endpoints" \
- "$(format_endpoints "1,10.0.1.1" \
- "3,10.0.1.3,signal backup" \
- "4,10.0.1.4,signal" \
- "5,10.0.1.5,signal" \
- "6,10.0.1.6,signal" \
- "7,10.0.1.7,signal" \
- "8,10.0.1.8,signal")" "id limit"
+check "get_endpoint ${max_endp}" \
+ "$(format_endpoints "${max_endp},10.0.0.${max_endp}")" "id limit"
+
+if add_endpoint '10.0.0.1' &>/dev/null; then
+ hardlimit="no error"
+else
+ hardlimit="error"
+fi
+check "echo ${hardlimit}" "error" "above hard addr limit"
flush_endpoint
check "show_endpoints" "" "flush addrs"
@@ -202,15 +216,15 @@ if ! mptcp_lib_is_ip_mptcp; then
flush_endpoint
fi
-set_limits 9 1 2>/dev/null
+set_limits $((max_limits + 1)) 1 2>/dev/null
check "get_limits" "${default_limits}" "rcv addrs above hard limit"
-set_limits 1 9 2>/dev/null
+set_limits 1 $((max_limits + 1)) 2>/dev/null
check "get_limits" "${default_limits}" "subflows above hard limit"
-set_limits 8 8
+set_limits ${max_limits} ${max_limits}
flush_endpoint ## to make sure it doesn't affect the limits
-check "get_limits" "$(format_limits 8 8)" "set limits"
+check "get_limits" "$(format_limits ${max_limits} ${max_limits})" "set limits"
flush_endpoint
add_endpoint 10.0.1.1
--
2.53.0
* [PATCH net-next 8/8] selftests: mptcp: pm: use simpler send/recv forms
2026-05-08 15:40 [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits Matthieu Baerts (NGI0)
` (6 preceding siblings ...)
2026-05-08 15:40 ` [PATCH net-next 7/8] selftests: mptcp: pm: validate new limits Matthieu Baerts (NGI0)
@ 2026-05-08 15:40 ` Matthieu Baerts (NGI0)
2026-05-12 1:19 ` [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits patchwork-bot+netdevbpf
8 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-05-08 15:40 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Shuah Khan,
linux-kselftest
Use send() and recv() instead of sendto() and recvfrom(), which take the
netlink address that was already provided before.
It is just simpler and easier to read without the to/from variants.
While at it, fix a checkpatch warning by removing multiple assignments.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
index 99eecccbf0c8..78180da1efcc 100644
--- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
+++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c
@@ -217,8 +217,6 @@ static int capture_events(int fd, int event_group)
/* do a netlink command and, if max > 0, fetch the reply ; nh's size >1024B */
static int do_nl_req(int fd, struct nlmsghdr *nh, int len, int max)
{
- struct sockaddr_nl nladdr = { .nl_family = AF_NETLINK };
- socklen_t addr_len;
void *data = nh;
int rem, ret;
int err = 0;
@@ -230,15 +228,15 @@ static int do_nl_req(int fd, struct nlmsghdr *nh, int len, int max)
}
nh->nlmsg_len = len;
- ret = sendto(fd, data, len, 0, (void *)&nladdr, sizeof(nladdr));
+ ret = send(fd, data, len, 0);
if (ret != len)
error(1, errno, "send netlink: %uB != %uB\n", ret, len);
- addr_len = sizeof(nladdr);
- rem = ret = recvfrom(fd, data, max, 0, (void *)&nladdr, &addr_len);
+ ret = recv(fd, data, max, 0);
if (ret < 0)
error(1, errno, "recv netlink: %uB\n", ret);
+ rem = ret;
/* Beware: the NLMSG_NEXT macro updates the 'rem' argument */
for (; NLMSG_OK(nh, rem); nh = NLMSG_NEXT(nh, rem)) {
if (nh->nlmsg_type == NLMSG_DONE)
--
2.53.0
* Re: [PATCH net-next 3/8] mptcp: pm: kernel: allow flushing more than 8 endpoints
2026-05-08 15:40 ` [PATCH net-next 3/8] mptcp: pm: kernel: allow flushing more than 8 endpoints Matthieu Baerts (NGI0)
@ 2026-05-11 11:25 ` Matthieu Baerts
0 siblings, 0 replies; 11+ messages in thread
From: Matthieu Baerts @ 2026-05-11 11:25 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel
Hello,
On 08/05/2026 17:40, Matthieu Baerts (NGI0) wrote:
> The mptcp_rm_list structure contains an array of IDs of 8 entries: to be
> able to send a RM_ADDR with 8 IDs. This limitation was OK so far because
> there could maximum 8 endpoints.
>
> But this is going to change in the next commit. To cope with that, if
> one of the arrays is full, the iteration stops, the lists are processed,
> then the iteration continues where it previously stopped.
>
> Note that if there are many endpoints to remove, and multiple RM_ADDR to
> send, it might be more likely that some of these RM_ADDRs are dropped or
> lost. This is a known limitation: RM_ADDR are not retransmitted in
> MPTCPv1.
(...)
> diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
> index aabd73d15c15..ea3a7ea82013 100644
> --- a/net/mptcp/pm_kernel.c
> +++ b/net/mptcp/pm_kernel.c
> @@ -1223,19 +1223,30 @@ int mptcp_pm_nl_del_addr_doit(struct sk_buff *skb, struct genl_info *info)
> }
>
> static void mptcp_pm_flush_addrs_and_subflows(struct mptcp_sock *msk,
> - struct list_head *rm_list)
> + struct list_head *rm_list,
> + struct mptcp_pm_addr_entry *entry)
> {
> - struct mptcp_rm_list alist = { .nr = 0 }, slist = { .nr = 0 };
> - struct mptcp_pm_addr_entry *entry;
> + struct mptcp_rm_list alist, slist;
> + bool more;
>
> - list_for_each_entry(entry, rm_list, list) {
> - if (slist.nr < MPTCP_RM_IDS_MAX &&
> - mptcp_lookup_subflow_by_saddr(&msk->conn_list, &entry->addr))
> +again:
> + alist.nr = 0;
> + slist.nr = 0;
FYI, Sashiko Gemini is saying:
> Are the ids arrays in alist and slist left uninitialized on the stack here?
> Later, in mptcp_pm_remove_addr(), a full struct assignment
> (msk->pm.rm_list_tx = *rm_list) copies the structure. Could this copy
> uninitialized stack memory into the persistent socket structure and
> trigger KMSAN use-of-uninitialized-value warnings?
It is not an issue: if 'nr' is 0, nothing else is read from these
structures. AFAICS, KMSAN will not complain as long as the
uninitialized values are never read.
> + more = false;
Sashiko Gemini is saying:
> If "more" is true and the function loops back to process another batch,
> mptcp_pm_remove_addr() will have already set the MPTCP_RM_ADDR_SIGNAL bit in
> msk->pm.addr_signal during the first iteration.
> Since mptcp_pm_flush_addrs_and_subflows() is called with lock_sock(sk) held,
> the MPTCP TX path cannot run to transmit the RM_ADDR and clear the signal bit
> between iterations.
> When the loop processes the second batch and calls mptcp_pm_remove_addr()
> again, msk->pm.addr_signal is still set.
> Will this cause mptcp_pm_remove_addr() to return -EINVAL and silently drop all
> batches after the first locally?
That's fine: RM_ADDRs are notifications that can be lost anyway. What
matters is that the addresses are removed internally.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
* Re: [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits
2026-05-08 15:40 [PATCH net-next 0/8] mptcp: pm: in-kernel: increase limits Matthieu Baerts (NGI0)
` (7 preceding siblings ...)
2026-05-08 15:40 ` [PATCH net-next 8/8] selftests: mptcp: pm: use simpler send/recv forms Matthieu Baerts (NGI0)
@ 2026-05-12 1:19 ` patchwork-bot+netdevbpf
8 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-12 1:19 UTC (permalink / raw)
To: Matthieu Baerts
Cc: martineau, geliang, davem, edumazet, kuba, pabeni, horms, netdev,
mptcp, linux-kernel, shuah, linux-kselftest
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Fri, 08 May 2026 17:40:45 +0200 you wrote:
> Allow switching from 8 to 64 for the maximum number of subflows and
> accepted ADD_ADDR, and from 8 to 255 for the number of MPTCP endpoints.
>
> The previous limit of 8 subflows makes sense in most cases. Using more
> subflows will very likely *not* improve the situation, and could even
> decrease the performances. But there are no technical limitations nor
> performance impact to raise this limit, so let's do it: this will allow
> people with very specific use-cases, and researchers to easily create
> more subflows, and measure the performance impact by themselves.
>
> [...]
Here is the summary with links:
- [net-next,1/8] mptcp: pm: in-kernel: explicitly limit batches to array size
https://git.kernel.org/netdev/net-next/c/9031e5e31d5d
- [net-next,2/8] mptcp: pm: in-kernel: increase all limits to 64
https://git.kernel.org/netdev/net-next/c/c8646664fbf1
- [net-next,3/8] mptcp: pm: kernel: allow flushing more than 8 endpoints
https://git.kernel.org/netdev/net-next/c/607f16ab462b
- [net-next,4/8] mptcp: pm: in-kernel: increase endpoints limit
https://git.kernel.org/netdev/net-next/c/e845e6397d78
- [net-next,5/8] selftests: mptcp: join: allow changing ifaces nr per test
https://git.kernel.org/netdev/net-next/c/e1515a1a494b
- [net-next,6/8] selftests: mptcp: join: validate 8x8 subflows
https://git.kernel.org/netdev/net-next/c/1697837a67fa
- [net-next,7/8] selftests: mptcp: pm: validate new limits
https://git.kernel.org/netdev/net-next/c/c9b581e619d2
- [net-next,8/8] selftests: mptcp: pm: use simpler send/recv forms
https://git.kernel.org/netdev/net-next/c/ed5372634c5b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html