* [PATCH net-next 1/3] mptcp: make ADD_ADDR retransmission timeout adaptive
2025-09-07 15:32 [PATCH net-next 0/3] mptcp: make ADD_ADDR retransmission timeout adaptive Matthieu Baerts (NGI0)
@ 2025-09-07 15:32 ` Matthieu Baerts (NGI0)
2025-09-07 15:32 ` [PATCH net-next 2/3] selftests: mptcp: join: tolerate more ADD_ADDR Matthieu Baerts (NGI0)
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-09-07 15:32 UTC (permalink / raw)
To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
Shuah Khan
Cc: netdev, linux-doc, linux-kernel, linux-kselftest,
Matthieu Baerts (NGI0), Christoph Paasch, Geliang Tang
From: Geliang Tang <tanggeliang@kylinos.cn>
Currently the ADD_ADDR option is retransmitted with a fixed timeout. This
patch makes the retransmission timeout adaptive by using the maximum RTO
among all the subflows, while still capping it at the configured maximum
value (add_addr_timeout_max). This improves responsiveness when
establishing new subflows.
Specifically:
1. Adds mptcp_adjust_add_addr_timeout() helper to compute the adaptive
timeout.
2. Uses maximum subflow RTO (icsk_rto) when available.
3. Applies exponential backoff based on retransmission count.
4. Maintains fallback to configured max timeout when no RTO data exists.
This slightly changes the behaviour of the MPTCP "add_addr_timeout"
sysctl knob to be used as a maximum instead of a fixed value. But this
is seen as an improvement: the ADD_ADDR might be sent quicker than
before to improve the overall MPTCP connection. Also, the default
value is set to 2 min, which was already way too long, and caused the
ADD_ADDR not to be retransmitted for connections shorter than 2 minutes.
Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/576
Reviewed-by: Christoph Paasch <cpaasch@openai.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
v2: no changes.
---
Documentation/networking/mptcp-sysctl.rst | 8 +++++---
net/mptcp/pm.c | 28 ++++++++++++++++++++++++----
2 files changed, 29 insertions(+), 7 deletions(-)
diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
index 1683c139821e3ba6d9eaa3c59330a523d29f1164..1eb6af26b4a7acdedd575a126c576210a78f0d4d 100644
--- a/Documentation/networking/mptcp-sysctl.rst
+++ b/Documentation/networking/mptcp-sysctl.rst
@@ -8,9 +8,11 @@ MPTCP Sysfs variables
===============================
add_addr_timeout - INTEGER (seconds)
- Set the timeout after which an ADD_ADDR control message will be
- resent to an MPTCP peer that has not acknowledged a previous
- ADD_ADDR message.
+ Set the maximum value of timeout after which an ADD_ADDR control message
+ will be resent to an MPTCP peer that has not acknowledged a previous
+ ADD_ADDR message. A dynamically estimated retransmission timeout based
+ on the estimated connection round-trip-time is used if this value is
+ lower than the maximum one.
Do not retransmit if set to 0.
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 136a380602cae872b76560649c924330e5f42533..204e1f61212e2be77a8476f024b59be67d04b80a 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -268,6 +268,27 @@ int mptcp_pm_mp_prio_send_ack(struct mptcp_sock *msk,
return -EINVAL;
}
+static unsigned int mptcp_adjust_add_addr_timeout(struct mptcp_sock *msk)
+{
+ const struct net *net = sock_net((struct sock *)msk);
+ unsigned int rto = mptcp_get_add_addr_timeout(net);
+ struct mptcp_subflow_context *subflow;
+ unsigned int max = 0;
+
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+ struct inet_connection_sock *icsk = inet_csk(ssk);
+
+ if (icsk->icsk_rto > max)
+ max = icsk->icsk_rto;
+ }
+
+ if (max && max < rto)
+ rto = max;
+
+ return rto;
+}
+
static void mptcp_pm_add_timer(struct timer_list *timer)
{
struct mptcp_pm_add_entry *entry = timer_container_of(entry, timer,
@@ -292,7 +313,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer)
goto out;
}
- timeout = mptcp_get_add_addr_timeout(sock_net(sk));
+ timeout = mptcp_adjust_add_addr_timeout(msk);
if (!timeout)
goto out;
@@ -307,7 +328,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer)
if (entry->retrans_times < ADD_ADDR_RETRANS_MAX)
sk_reset_timer(sk, timer,
- jiffies + timeout);
+ jiffies + (timeout << entry->retrans_times));
spin_unlock_bh(&msk->pm.lock);
@@ -348,7 +369,6 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
{
struct mptcp_pm_add_entry *add_entry = NULL;
struct sock *sk = (struct sock *)msk;
- struct net *net = sock_net(sk);
unsigned int timeout;
lockdep_assert_held(&msk->pm.lock);
@@ -374,7 +394,7 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
timer_setup(&add_entry->add_timer, mptcp_pm_add_timer, 0);
reset_timer:
- timeout = mptcp_get_add_addr_timeout(net);
+ timeout = mptcp_adjust_add_addr_timeout(msk);
if (timeout)
sk_reset_timer(sk, &add_entry->add_timer, jiffies + timeout);
--
2.51.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH net-next 2/3] selftests: mptcp: join: tolerate more ADD_ADDR
2025-09-07 15:32 [PATCH net-next 0/3] mptcp: make ADD_ADDR retransmission timeout adaptive Matthieu Baerts (NGI0)
2025-09-07 15:32 ` [PATCH net-next 1/3] " Matthieu Baerts (NGI0)
@ 2025-09-07 15:32 ` Matthieu Baerts (NGI0)
2025-09-07 15:32 ` [PATCH net-next 3/3] selftests: mptcp: join: allow more time to send ADD_ADDR Matthieu Baerts (NGI0)
2025-09-10 2:20 ` [PATCH net-next 0/3] mptcp: make ADD_ADDR retransmission timeout adaptive patchwork-bot+netdevbpf
3 siblings, 0 replies; 5+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-09-07 15:32 UTC (permalink / raw)
To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
Shuah Khan
Cc: netdev, linux-doc, linux-kernel, linux-kselftest,
Matthieu Baerts (NGI0)
ADD_ADDR can be retransmitted, and with, the parent commit, these
retransmissions can be sent quicker: from 2 minutes to less than one
second.
To avoid false positives where retransmitted ADD_ADDR causes higher
counters than expected, it is required to be more tolerant. Errors are
now only reported when fewer ADD_ADDRs have been sent/received, except
if no ADD_ADDR are expected.
Before the parent commit, the tolerance was present for each tests where
the ADD_ADDR could be retransmitted in a reasonable time (1 sec). Now
that all tests can have retransmitted ADD_ADDR, it is normal to apply
the same tolerance for all tests.
An alternative could be to disable the ADD_ADDR retransmissions by
default, but that's changing the default kernel behaviour. Plus,
ADD_ADDR retransmissions can be required for some tests. To avoid adding
exceptions to many tests, it seems better to increase the tolerance.
Later, we could add a new MIB counter to identify the ADD_ADDR
retransmissions, and remove the tolerance when this counter is
available.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 2f046167a0b6cc6fb5531a033d8d95c9ea399cf9..e9e11a9e60fd5374c8a98c3b7159ccbca8053030 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -358,6 +358,7 @@ reset_with_add_addr_timeout()
tables="${ip6tables}"
fi
+ # set a maximum, to avoid too long timeout with exponential backoff
ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \
@@ -1669,7 +1670,6 @@ chk_add_nr()
local tx=""
local rx=""
local count
- local timeout
if [[ $ns_invert = "invert" ]]; then
ns_tx=$ns2
@@ -1678,15 +1678,13 @@ chk_add_nr()
rx=" server"
fi
- timeout=$(ip netns exec ${ns_tx} sysctl -n net.mptcp.add_addr_timeout)
-
print_check "add addr rx${rx}"
count=$(mptcp_lib_get_counter ${ns_rx} "MPTcpExtAddAddr")
if [ -z "$count" ]; then
print_skip
- # if the test configured a short timeout tolerate greater then expected
- # add addrs options, due to retransmissions
- elif [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then
+ # Tolerate more ADD_ADDR then expected (if any), due to retransmissions
+ elif [ "$count" != "$add_nr" ] &&
+ { [ "$add_nr" -eq 0 ] || [ "$count" -lt "$add_nr" ]; }; then
fail_test "got $count ADD_ADDR[s] expected $add_nr"
else
print_ok
@@ -1774,18 +1772,15 @@ chk_add_tx_nr()
{
local add_tx_nr=$1
local echo_tx_nr=$2
- local timeout
local count
- timeout=$(ip netns exec $ns1 sysctl -n net.mptcp.add_addr_timeout)
-
print_check "add addr tx"
count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtAddAddrTx")
if [ -z "$count" ]; then
print_skip
- # if the test configured a short timeout tolerate greater then expected
- # add addrs options, due to retransmissions
- elif [ "$count" != "$add_tx_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_tx_nr" ]; }; then
+ # Tolerate more ADD_ADDR then expected (if any), due to retransmissions
+ elif [ "$count" != "$add_tx_nr" ] &&
+ { [ "$add_tx_nr" -eq 0 ] || [ "$count" -lt "$add_tx_nr" ]; }; then
fail_test "got $count ADD_ADDR[s] TX, expected $add_tx_nr"
else
print_ok
--
2.51.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* [PATCH net-next 3/3] selftests: mptcp: join: allow more time to send ADD_ADDR
2025-09-07 15:32 [PATCH net-next 0/3] mptcp: make ADD_ADDR retransmission timeout adaptive Matthieu Baerts (NGI0)
2025-09-07 15:32 ` [PATCH net-next 1/3] " Matthieu Baerts (NGI0)
2025-09-07 15:32 ` [PATCH net-next 2/3] selftests: mptcp: join: tolerate more ADD_ADDR Matthieu Baerts (NGI0)
@ 2025-09-07 15:32 ` Matthieu Baerts (NGI0)
2025-09-10 2:20 ` [PATCH net-next 0/3] mptcp: make ADD_ADDR retransmission timeout adaptive patchwork-bot+netdevbpf
3 siblings, 0 replies; 5+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-09-07 15:32 UTC (permalink / raw)
To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
Shuah Khan
Cc: netdev, linux-doc, linux-kernel, linux-kselftest,
Matthieu Baerts (NGI0)
When many ADD_ADDR need to be sent, it can take some time to send each
of them, and create new subflows. Some CIs seem to occasionally have
issues with these tests, especially with "debug" kernels.
Two subtests will now run for a slightly longer time: the last two where
3 or more ADD_ADDR are sent during the test.
Reviewed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index e9e11a9e60fd5374c8a98c3b7159ccbca8053030..b41cebfa1f921ce9ea6a88a908bf6d5e6027b367 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -2268,7 +2268,8 @@ signal_address_tests()
pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
pm_nl_add_endpoint $ns1 10.0.4.1 flags signal
pm_nl_set_limits $ns2 3 3
- run_tests $ns1 $ns2 10.0.1.1
+ speed=slow \
+ run_tests $ns1 $ns2 10.0.1.1
chk_join_nr 3 3 3
chk_add_nr 3 3
fi
@@ -2280,7 +2281,8 @@ signal_address_tests()
pm_nl_add_endpoint $ns1 10.0.3.1 flags signal
pm_nl_add_endpoint $ns1 10.0.14.1 flags signal
pm_nl_set_limits $ns2 3 3
- run_tests $ns1 $ns2 10.0.1.1
+ speed=slow \
+ run_tests $ns1 $ns2 10.0.1.1
join_syn_tx=3 \
chk_join_nr 1 1 1
chk_add_nr 3 3
--
2.51.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net-next 0/3] mptcp: make ADD_ADDR retransmission timeout adaptive
2025-09-07 15:32 [PATCH net-next 0/3] mptcp: make ADD_ADDR retransmission timeout adaptive Matthieu Baerts (NGI0)
` (2 preceding siblings ...)
2025-09-07 15:32 ` [PATCH net-next 3/3] selftests: mptcp: join: allow more time to send ADD_ADDR Matthieu Baerts (NGI0)
@ 2025-09-10 2:20 ` patchwork-bot+netdevbpf
3 siblings, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-09-10 2:20 UTC (permalink / raw)
To: Matthieu Baerts
Cc: mptcp, martineau, geliang, davem, edumazet, kuba, pabeni, horms,
corbet, shuah, netdev, linux-doc, linux-kernel, linux-kselftest,
cpaasch
Hello:
This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Sun, 07 Sep 2025 17:32:41 +0200 you wrote:
> Currently, the MPTCP ADD_ADDR notifications are retransmitted after a
> fixed timeout controlled by the net.mptcp.add_addr_timeout sysctl knob,
> if the corresponding "echo" packets are not received before. This can be
> too slow (or too quick), especially with a too cautious default value
> set to 2 minutes.
>
> - Patch 1: make ADD_ADDR retransmission timeout adaptive, using the
> TCP's retransmission timeout. The corresponding sysctl knob is now
> used as a maximum value.
>
> [...]
Here is the summary with links:
- [net-next,1/3] mptcp: make ADD_ADDR retransmission timeout adaptive
https://git.kernel.org/netdev/net-next/c/30549eebc4d8
- [net-next,2/3] selftests: mptcp: join: tolerate more ADD_ADDR
https://git.kernel.org/netdev/net-next/c/63c31d42cf6f
- [net-next,3/3] selftests: mptcp: join: allow more time to send ADD_ADDR
https://git.kernel.org/netdev/net-next/c/e2cda6343bfe
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 5+ messages in thread