[PATCH net-next 0/6] mptcp: misc. features for v6.18

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next 0/6] mptcp: misc. features for v6.18
@ 2025-09-01  9:39 Matthieu Baerts (NGI0)
  2025-09-01  9:39 ` [PATCH net-next 1/6] mptcp: use HMAC-SHA256 library instead of open-coded HMAC Matthieu Baerts (NGI0)
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-09-01  9:39 UTC (permalink / raw)
  To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan
  Cc: netdev, linux-kernel, linux-doc, linux-kselftest,
	Matthieu Baerts (NGI0), Eric Biggers, Christoph Paasch, Gang Yan,
	Geliang Tang

This series contains 4 independent new features:

- Patch 1: use HMAC-SHA256 library instead of open-coded HMAC.

- Patches 2-3: make ADD_ADDR retransmission timeout adaptive + simplify
  selftests.

- Patch 4: selftests: check for unexpected fallback counter increments.

- Patches 5-6: record subflows in RPS table, for aRFS support.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Christoph Paasch (2):
      net: Add rfs_needed() helper
      mptcp: record subflows in RPS table

Eric Biggers (1):
      mptcp: use HMAC-SHA256 library instead of open-coded HMAC

Gang Yan (1):
      selftests: mptcp: add checks for fallback counters

Geliang Tang (2):
      mptcp: make ADD_ADDR retransmission timeout adaptive
      selftests: mptcp: remove add_addr_timeout settings

 Documentation/networking/mptcp-sysctl.rst       |   8 +-
 include/net/rps.h                               |  85 ++++++++++------
 net/mptcp/crypto.c                              |  35 +------
 net/mptcp/pm.c                                  |  28 +++++-
 net/mptcp/protocol.c                            |  21 ++++
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 126 +++++++++++++++++++++++-
 6 files changed, 231 insertions(+), 72 deletions(-)
---
base-commit: d23ad54de795ec0054f90ecb03b41e8f2c410f3a
change-id: 20250829-net-next-mptcp-misc-feat-6-18-722fa87a60f1

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe@kernel.org>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next 1/6] mptcp: use HMAC-SHA256 library instead of open-coded HMAC
  2025-09-01  9:39 [PATCH net-next 0/6] mptcp: misc. features for v6.18 Matthieu Baerts (NGI0)
@ 2025-09-01  9:39 ` Matthieu Baerts (NGI0)
  2025-09-01  9:39 ` [PATCH net-next 2/6] mptcp: make ADD_ADDR retransmission timeout adaptive Matthieu Baerts (NGI0)
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-09-01  9:39 UTC (permalink / raw)
  To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan
  Cc: netdev, linux-kernel, linux-doc, linux-kselftest,
	Matthieu Baerts (NGI0), Eric Biggers

From: Eric Biggers <ebiggers@kernel.org>

Now that there are easy-to-use HMAC-SHA256 library functions, use these
in net/mptcp/crypto.c instead of open-coding the HMAC algorithm.

Remove the WARN_ON_ONCE() for messages longer than SHA256_DIGEST_SIZE.
The new implementation handles all message lengths correctly.

The mptcp-crypto KUnit test still passes after this change.

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/crypto.c | 35 ++---------------------------------
 1 file changed, 2 insertions(+), 33 deletions(-)

diff --git a/net/mptcp/crypto.c b/net/mptcp/crypto.c
index b08ba959ac4fd485bb833043ff58d4c8bac8a37a..31948e18d97da7ee0ee2ae9e4f7c9ca0e3b330a7 100644
--- a/net/mptcp/crypto.c
+++ b/net/mptcp/crypto.c
@@ -22,7 +22,6 @@
 
 #include <linux/kernel.h>
 #include <crypto/sha2.h>
-#include <linux/unaligned.h>
 
 #include "protocol.h"
 
@@ -43,39 +42,9 @@ void mptcp_crypto_key_sha(u64 key, u32 *token, u64 *idsn)
 
 void mptcp_crypto_hmac_sha(u64 key1, u64 key2, u8 *msg, int len, void *hmac)
 {
-	u8 input[SHA256_BLOCK_SIZE + SHA256_DIGEST_SIZE];
-	u8 key1be[8];
-	u8 key2be[8];
-	int i;
+	__be64 key[2] = { cpu_to_be64(key1), cpu_to_be64(key2) };
 
-	if (WARN_ON_ONCE(len > SHA256_DIGEST_SIZE))
-		len = SHA256_DIGEST_SIZE;
-
-	put_unaligned_be64(key1, key1be);
-	put_unaligned_be64(key2, key2be);
-
-	/* Generate key xored with ipad */
-	memset(input, 0x36, SHA256_BLOCK_SIZE);
-	for (i = 0; i < 8; i++)
-		input[i] ^= key1be[i];
-	for (i = 0; i < 8; i++)
-		input[i + 8] ^= key2be[i];
-
-	memcpy(&input[SHA256_BLOCK_SIZE], msg, len);
-
-	/* emit sha256(K1 || msg) on the second input block, so we can
-	 * reuse 'input' for the last hashing
-	 */
-	sha256(input, SHA256_BLOCK_SIZE + len, &input[SHA256_BLOCK_SIZE]);
-
-	/* Prepare second part of hmac */
-	memset(input, 0x5C, SHA256_BLOCK_SIZE);
-	for (i = 0; i < 8; i++)
-		input[i] ^= key1be[i];
-	for (i = 0; i < 8; i++)
-		input[i + 8] ^= key2be[i];
-
-	sha256(input, SHA256_BLOCK_SIZE + SHA256_DIGEST_SIZE, hmac);
+	hmac_sha256_usingrawkey((const u8 *)key, sizeof(key), msg, len, hmac);
 }
 
 #if IS_MODULE(CONFIG_MPTCP_KUNIT_TEST)

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next 2/6] mptcp: make ADD_ADDR retransmission timeout adaptive
  2025-09-01  9:39 [PATCH net-next 0/6] mptcp: misc. features for v6.18 Matthieu Baerts (NGI0)
  2025-09-01  9:39 ` [PATCH net-next 1/6] mptcp: use HMAC-SHA256 library instead of open-coded HMAC Matthieu Baerts (NGI0)
@ 2025-09-01  9:39 ` Matthieu Baerts (NGI0)
  2025-09-01  9:39 ` [PATCH net-next 3/6] selftests: mptcp: remove add_addr_timeout settings Matthieu Baerts (NGI0)
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-09-01  9:39 UTC (permalink / raw)
  To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan
  Cc: netdev, linux-kernel, linux-doc, linux-kselftest,
	Matthieu Baerts (NGI0), Christoph Paasch, Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

Currently the ADD_ADDR option is retransmitted with a fixed timeout. This
patch makes the retransmission timeout adaptive by using the maximum RTO
among all the subflows, while still capping it at the configured maximum
value (add_addr_timeout_max). This improves responsiveness when
establishing new subflows.

Specifically:
1. Adds mptcp_adjust_add_addr_timeout() helper to compute the adaptive
timeout.
2. Uses maximum subflow RTO (icsk_rto) when available.
3. Applies exponential backoff based on retransmission count.
4. Maintains fallback to configured max timeout when no RTO data exists.

This slightly changes the behaviour of the MPTCP "add_addr_timeout"
sysctl knob to be used as a maximum instead of a fixed value. But this
is seen as an improvement: the ADD_ADDR might be sent quicker than
before to improve the overall MPTCP connection. Also, the default
value is set to 2 min, which was already way too long, and caused the
ADD_ADDR not to be retransmitted for connections shorter than 2 minutes.

Suggested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/576
Reviewed-by: Christoph Paasch <cpaasch@openai.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 Documentation/networking/mptcp-sysctl.rst |  8 +++++---
 net/mptcp/pm.c                            | 28 ++++++++++++++++++++++++----
 2 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
index 1683c139821e3ba6d9eaa3c59330a523d29f1164..1eb6af26b4a7acdedd575a126c576210a78f0d4d 100644
--- a/Documentation/networking/mptcp-sysctl.rst
+++ b/Documentation/networking/mptcp-sysctl.rst
@@ -8,9 +8,11 @@ MPTCP Sysfs variables
 ===============================
 
 add_addr_timeout - INTEGER (seconds)
-	Set the timeout after which an ADD_ADDR control message will be
-	resent to an MPTCP peer that has not acknowledged a previous
-	ADD_ADDR message.
+	Set the maximum value of timeout after which an ADD_ADDR control message
+	will be resent to an MPTCP peer that has not acknowledged a previous
+	ADD_ADDR message. A dynamically estimated retransmission timeout based
+	on the estimated connection round-trip-time is used if this value is
+	lower than the maximum one.
 
 	Do not retransmit if set to 0.
 
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 136a380602cae872b76560649c924330e5f42533..204e1f61212e2be77a8476f024b59be67d04b80a 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -268,6 +268,27 @@ int mptcp_pm_mp_prio_send_ack(struct mptcp_sock *msk,
 	return -EINVAL;
 }
 
+static unsigned int mptcp_adjust_add_addr_timeout(struct mptcp_sock *msk)
+{
+	const struct net *net = sock_net((struct sock *)msk);
+	unsigned int rto = mptcp_get_add_addr_timeout(net);
+	struct mptcp_subflow_context *subflow;
+	unsigned int max = 0;
+
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		struct inet_connection_sock *icsk = inet_csk(ssk);
+
+		if (icsk->icsk_rto > max)
+			max = icsk->icsk_rto;
+	}
+
+	if (max && max < rto)
+		rto = max;
+
+	return rto;
+}
+
 static void mptcp_pm_add_timer(struct timer_list *timer)
 {
 	struct mptcp_pm_add_entry *entry = timer_container_of(entry, timer,
@@ -292,7 +313,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer)
 		goto out;
 	}
 
-	timeout = mptcp_get_add_addr_timeout(sock_net(sk));
+	timeout = mptcp_adjust_add_addr_timeout(msk);
 	if (!timeout)
 		goto out;
 
@@ -307,7 +328,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer)
 
 	if (entry->retrans_times < ADD_ADDR_RETRANS_MAX)
 		sk_reset_timer(sk, timer,
-			       jiffies + timeout);
+			       jiffies + (timeout << entry->retrans_times));
 
 	spin_unlock_bh(&msk->pm.lock);
 
@@ -348,7 +369,6 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
 {
 	struct mptcp_pm_add_entry *add_entry = NULL;
 	struct sock *sk = (struct sock *)msk;
-	struct net *net = sock_net(sk);
 	unsigned int timeout;
 
 	lockdep_assert_held(&msk->pm.lock);
@@ -374,7 +394,7 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
 
 	timer_setup(&add_entry->add_timer, mptcp_pm_add_timer, 0);
 reset_timer:
-	timeout = mptcp_get_add_addr_timeout(net);
+	timeout = mptcp_adjust_add_addr_timeout(msk);
 	if (timeout)
 		sk_reset_timer(sk, &add_entry->add_timer, jiffies + timeout);
 

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next 3/6] selftests: mptcp: remove add_addr_timeout settings
  2025-09-01  9:39 [PATCH net-next 0/6] mptcp: misc. features for v6.18 Matthieu Baerts (NGI0)
  2025-09-01  9:39 ` [PATCH net-next 1/6] mptcp: use HMAC-SHA256 library instead of open-coded HMAC Matthieu Baerts (NGI0)
  2025-09-01  9:39 ` [PATCH net-next 2/6] mptcp: make ADD_ADDR retransmission timeout adaptive Matthieu Baerts (NGI0)
@ 2025-09-01  9:39 ` Matthieu Baerts (NGI0)
  2025-09-01  9:39 ` [PATCH net-next 4/6] selftests: mptcp: add checks for fallback counters Matthieu Baerts (NGI0)
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-09-01  9:39 UTC (permalink / raw)
  To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan
  Cc: netdev, linux-kernel, linux-doc, linux-kselftest,
	Matthieu Baerts (NGI0), Geliang Tang

From: Geliang Tang <tanggeliang@kylinos.cn>

Now that add_addr_timeout can be dynamically adjusted, there is no need
to set specific timeout values in the mptcp_join.sh tests. This patch
removes the explicit sysctl settings for net.mptcp.add_addr_timeout
from the test scripts.

The change simplifies the test setup and ensures the tests work with
the default or dynamically adjusted timeout values.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 82cae37d9c2026cc55466636d53a76f929a03452..e85bb62046e020dbacbbd44e1f9e110e1d0104c7 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -347,8 +347,6 @@ reset_with_add_addr_timeout()
 		tables="${ip6tables}"
 	fi
 
-	ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
-
 	if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \
 			-m tcp --tcp-option 30 \
 			-m bpf --bytecode \
@@ -2183,7 +2181,6 @@ signal_address_tests()
 		pm_nl_add_endpoint $ns2 10.0.4.2 flags signal
 
 		# the peer could possibly miss some addr notification, allow retransmission
-		ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
 		speed=slow \
 			run_tests $ns1 $ns2 10.0.1.1
 

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next 4/6] selftests: mptcp: add checks for fallback counters
  2025-09-01  9:39 [PATCH net-next 0/6] mptcp: misc. features for v6.18 Matthieu Baerts (NGI0)
                   ` (2 preceding siblings ...)
  2025-09-01  9:39 ` [PATCH net-next 3/6] selftests: mptcp: remove add_addr_timeout settings Matthieu Baerts (NGI0)
@ 2025-09-01  9:39 ` Matthieu Baerts (NGI0)
  2025-09-01  9:39 ` [PATCH net-next 5/6] net: Add rfs_needed() helper Matthieu Baerts (NGI0)
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-09-01  9:39 UTC (permalink / raw)
  To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan
  Cc: netdev, linux-kernel, linux-doc, linux-kselftest,
	Matthieu Baerts (NGI0), Gang Yan

From: Gang Yan <yangang@kylinos.cn>

Recently, some mib counters about fallback has been added, this patch
provides a method to check the expected behavior of these mib counters
during the test execution.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/571
Signed-off-by: Gang Yan <yangang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 123 ++++++++++++++++++++++++
 1 file changed, 123 insertions(+)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index e85bb62046e020dbacbbd44e1f9e110e1d0104c7..a97b568104bc284f050b2f0e09fe3fdd3341c5cb 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -74,6 +74,17 @@ unset join_create_err
 unset join_bind_err
 unset join_connect_err
 
+unset fb_ns1
+unset fb_ns2
+unset fb_infinite_map_tx
+unset fb_dss_corruption
+unset fb_simult_conn
+unset fb_mpc_passive
+unset fb_mpc_active
+unset fb_mpc_data
+unset fb_md5_sig
+unset fb_dss
+
 # generated using "nfbpf_compile '(ip && (ip[54] & 0xf0) == 0x30) ||
 #				  (ip6 && (ip6[74] & 0xf0) == 0x30)'"
 CBPF_MPTCP_SUBOPTION_ADD_ADDR="14,
@@ -1397,6 +1408,115 @@ chk_join_tx_nr()
 	print_results "join Tx" ${rc}
 }
 
+chk_fallback_nr()
+{
+	local infinite_map_tx=${fb_infinite_map_tx:-0}
+	local dss_corruption=${fb_dss_corruption:-0}
+	local simult_conn=${fb_simult_conn:-0}
+	local mpc_passive=${fb_mpc_passive:-0}
+	local mpc_active=${fb_mpc_active:-0}
+	local mpc_data=${fb_mpc_data:-0}
+	local md5_sig=${fb_md5_sig:-0}
+	local dss=${fb_dss:-0}
+	local rc=${KSFT_PASS}
+	local ns=$1
+	local count
+
+	count=$(mptcp_lib_get_counter ${!ns} "MPTcpExtInfiniteMapTx")
+	if [ -z "$count" ]; then
+		rc=${KSFT_SKIP}
+	elif [ "$count" != "$infinite_map_tx" ]; then
+		rc=${KSFT_FAIL}
+		print_check "$ns infinite map tx fallback"
+		fail_test "got $count infinite map tx fallback[s] in $ns expected $infinite_map_tx"
+	fi
+
+	count=$(mptcp_lib_get_counter ${!ns} "MPTcpExtDSSCorruptionFallback")
+	if [ -z "$count" ]; then
+		rc=${KSFT_SKIP}
+	elif [ "$count" != "$dss_corruption" ]; then
+		rc=${KSFT_FAIL}
+		print_check "$ns dss corruption fallback"
+		fail_test "got $count dss corruption fallback[s] in $ns expected $dss_corruption"
+	fi
+
+	count=$(mptcp_lib_get_counter ${!ns} "MPTcpExtSimultConnectFallback")
+	if [ -z "$count" ]; then
+		rc=${KSFT_SKIP}
+	elif [ "$count" != "$simult_conn" ]; then
+		rc=${KSFT_FAIL}
+		print_check "$ns simult conn fallback"
+		fail_test "got $count simult conn fallback[s] in $ns expected $simult_conn"
+	fi
+
+	count=$(mptcp_lib_get_counter ${!ns} "MPTcpExtMPCapableFallbackACK")
+	if [ -z "$count" ]; then
+		rc=${KSFT_SKIP}
+	elif [ "$count" != "$mpc_passive" ]; then
+		rc=${KSFT_FAIL}
+		print_check "$ns mpc passive fallback"
+		fail_test "got $count mpc passive fallback[s] in $ns expected $mpc_passive"
+	fi
+
+	count=$(mptcp_lib_get_counter ${!ns} "MPTcpExtMPCapableFallbackSYNACK")
+	if [ -z "$count" ]; then
+		rc=${KSFT_SKIP}
+	elif [ "$count" != "$mpc_active" ]; then
+		rc=${KSFT_FAIL}
+		print_check "$ns mpc active fallback"
+		fail_test "got $count mpc active fallback[s] in $ns expected $mpc_active"
+	fi
+
+	count=$(mptcp_lib_get_counter ${!ns} "MPTcpExtMPCapableDataFallback")
+	if [ -z "$count" ]; then
+		rc=${KSFT_SKIP}
+	elif [ "$count" != "$mpc_data" ]; then
+		rc=${KSFT_FAIL}
+		print_check "$ns mpc data fallback"
+		fail_test "got $count mpc data fallback[s] in $ns expected $mpc_data"
+	fi
+
+	count=$(mptcp_lib_get_counter ${!ns} "MPTcpExtMD5SigFallback")
+	if [ -z "$count" ]; then
+		rc=${KSFT_SKIP}
+	elif [ "$count" != "$md5_sig" ]; then
+		rc=${KSFT_FAIL}
+		print_check "$ns MD5 Sig fallback"
+		fail_test "got $count MD5 Sig fallback[s] in $ns expected $md5_sig"
+	fi
+
+	count=$(mptcp_lib_get_counter ${!ns} "MPTcpExtDssFallback")
+	if [ -z "$count" ]; then
+		rc=${KSFT_SKIP}
+	elif [ "$count" != "$dss" ]; then
+		rc=${KSFT_FAIL}
+		print_check "$ns dss fallback"
+		fail_test "got $count dss fallback[s] in $ns expected $dss"
+	fi
+
+	return $rc
+}
+
+chk_fallback_nr_all()
+{
+	local netns=("ns1" "ns2")
+	local fb_ns=("fb_ns1" "fb_ns2")
+	local rc=${KSFT_PASS}
+
+	for i in 0 1; do
+		if [ -n "${!fb_ns[i]}" ]; then
+			eval "${!fb_ns[i]}" \
+				chk_fallback_nr ${netns[i]} || rc=${?}
+		else
+			chk_fallback_nr ${netns[i]} || rc=${?}
+		fi
+	done
+
+	if [ "${rc}" != "${KSFT_PASS}" ]; then
+		print_results "fallback" ${rc}
+	fi
+}
+
 chk_join_nr()
 {
 	local syn_nr=$1
@@ -1482,6 +1602,8 @@ chk_join_nr()
 	join_syn_tx="${join_syn_tx:-${syn_nr}}" \
 		chk_join_tx_nr
 
+	chk_fallback_nr_all
+
 	if $validate_checksum; then
 		chk_csum_nr $csum_ns1 $csum_ns2
 		chk_fail_nr $fail_nr $fail_nr
@@ -3334,6 +3456,7 @@ fail_tests()
 		join_csum_ns1=+1 join_csum_ns2=+0 \
 			join_fail_nr=1 join_rst_nr=0 join_infi_nr=1 \
 			join_corrupted_pkts="$(pedit_action_pkts)" \
+			fb_ns1="fb_dss=1" fb_ns2="fb_infinite_map_tx=1" \
 			chk_join_nr 0 0 0
 		chk_fail_nr 1 -1 invert
 	fi

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next 5/6] net: Add rfs_needed() helper
  2025-09-01  9:39 [PATCH net-next 0/6] mptcp: misc. features for v6.18 Matthieu Baerts (NGI0)
                   ` (3 preceding siblings ...)
  2025-09-01  9:39 ` [PATCH net-next 4/6] selftests: mptcp: add checks for fallback counters Matthieu Baerts (NGI0)
@ 2025-09-01  9:39 ` Matthieu Baerts (NGI0)
  2025-09-01  9:39 ` [PATCH net-next 6/6] mptcp: record subflows in RPS table Matthieu Baerts (NGI0)
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-09-01  9:39 UTC (permalink / raw)
  To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan
  Cc: netdev, linux-kernel, linux-doc, linux-kselftest,
	Matthieu Baerts (NGI0), Christoph Paasch

From: Christoph Paasch <cpaasch@openai.com>

Add a helper to check if RFS is needed or not. Allows to make the code a
bit cleaner and the next patch to have MPTCP use this helper to decide
whether or not to iterate over the subflows.

tun_flow_update() was calling sock_rps_record_flow_hash() regardless of
the state of rfs_needed. This was not really a bug as sock_flow_table
simply ends up being NULL and thus everything will be fine.
This commit here thus also implicitly makes tun_flow_update() respect
the state of rfs_needed.

Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 include/net/rps.h | 85 ++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 56 insertions(+), 29 deletions(-)

diff --git a/include/net/rps.h b/include/net/rps.h
index 9917dce42ca457e9c25d9e84ee450235f771d09b..f1794cd2e7fb32a36bde9959fab651663ab190fd 100644
--- a/include/net/rps.h
+++ b/include/net/rps.h
@@ -85,11 +85,8 @@ static inline void rps_record_sock_flow(struct rps_sock_flow_table *table,
 		WRITE_ONCE(table->ents[index], val);
 }
 
-#endif /* CONFIG_RPS */
-
-static inline void sock_rps_record_flow_hash(__u32 hash)
+static inline void _sock_rps_record_flow_hash(__u32 hash)
 {
-#ifdef CONFIG_RPS
 	struct rps_sock_flow_table *sock_flow_table;
 
 	if (!hash)
@@ -99,42 +96,33 @@ static inline void sock_rps_record_flow_hash(__u32 hash)
 	if (sock_flow_table)
 		rps_record_sock_flow(sock_flow_table, hash);
 	rcu_read_unlock();
-#endif
 }
 
-static inline void sock_rps_record_flow(const struct sock *sk)
+static inline void _sock_rps_record_flow(const struct sock *sk)
 {
-#ifdef CONFIG_RPS
-	if (static_branch_unlikely(&rfs_needed)) {
-		/* Reading sk->sk_rxhash might incur an expensive cache line
-		 * miss.
-		 *
-		 * TCP_ESTABLISHED does cover almost all states where RFS
-		 * might be useful, and is cheaper [1] than testing :
-		 *	IPv4: inet_sk(sk)->inet_daddr
-		 * 	IPv6: ipv6_addr_any(&sk->sk_v6_daddr)
-		 * OR	an additional socket flag
-		 * [1] : sk_state and sk_prot are in the same cache line.
+	/* Reading sk->sk_rxhash might incur an expensive cache line
+	 * miss.
+	 *
+	 * TCP_ESTABLISHED does cover almost all states where RFS
+	 * might be useful, and is cheaper [1] than testing :
+	 *	IPv4: inet_sk(sk)->inet_daddr
+	 *	IPv6: ipv6_addr_any(&sk->sk_v6_daddr)
+	 * OR	an additional socket flag
+	 * [1] : sk_state and sk_prot are in the same cache line.
+	 */
+	if (sk->sk_state == TCP_ESTABLISHED) {
+		/* This READ_ONCE() is paired with the WRITE_ONCE()
+		 * from sock_rps_save_rxhash() and sock_rps_reset_rxhash().
 		 */
-		if (sk->sk_state == TCP_ESTABLISHED) {
-			/* This READ_ONCE() is paired with the WRITE_ONCE()
-			 * from sock_rps_save_rxhash() and sock_rps_reset_rxhash().
-			 */
-			sock_rps_record_flow_hash(READ_ONCE(sk->sk_rxhash));
-		}
+		_sock_rps_record_flow_hash(READ_ONCE(sk->sk_rxhash));
 	}
-#endif
 }
 
-static inline void sock_rps_delete_flow(const struct sock *sk)
+static inline void _sock_rps_delete_flow(const struct sock *sk)
 {
-#ifdef CONFIG_RPS
 	struct rps_sock_flow_table *table;
 	u32 hash, index;
 
-	if (!static_branch_unlikely(&rfs_needed))
-		return;
-
 	hash = READ_ONCE(sk->sk_rxhash);
 	if (!hash)
 		return;
@@ -147,6 +135,45 @@ static inline void sock_rps_delete_flow(const struct sock *sk)
 			WRITE_ONCE(table->ents[index], RPS_NO_CPU);
 	}
 	rcu_read_unlock();
+}
+#endif /* CONFIG_RPS */
+
+static inline bool rfs_is_needed(void)
+{
+#ifdef CONFIG_RPS
+	return static_branch_unlikely(&rfs_needed);
+#else
+	return false;
+#endif
+}
+
+static inline void sock_rps_record_flow_hash(__u32 hash)
+{
+#ifdef CONFIG_RPS
+	if (!rfs_is_needed())
+		return;
+
+	_sock_rps_record_flow_hash(hash);
+#endif
+}
+
+static inline void sock_rps_record_flow(const struct sock *sk)
+{
+#ifdef CONFIG_RPS
+	if (!rfs_is_needed())
+		return;
+
+	_sock_rps_record_flow(sk);
+#endif
+}
+
+static inline void sock_rps_delete_flow(const struct sock *sk)
+{
+#ifdef CONFIG_RPS
+	if (!rfs_is_needed())
+		return;
+
+	_sock_rps_delete_flow(sk);
 #endif
 }
 

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next 6/6] mptcp: record subflows in RPS table
  2025-09-01  9:39 [PATCH net-next 0/6] mptcp: misc. features for v6.18 Matthieu Baerts (NGI0)
                   ` (4 preceding siblings ...)
  2025-09-01  9:39 ` [PATCH net-next 5/6] net: Add rfs_needed() helper Matthieu Baerts (NGI0)
@ 2025-09-01  9:39 ` Matthieu Baerts (NGI0)
  2025-09-02 14:26 ` [PATCH net-next 0/6] mptcp: misc. features for v6.18 Jakub Kicinski
  2025-09-02 14:29 ` Matthieu Baerts
  7 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2025-09-01  9:39 UTC (permalink / raw)
  To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan
  Cc: netdev, linux-kernel, linux-doc, linux-kselftest,
	Matthieu Baerts (NGI0), Christoph Paasch

From: Christoph Paasch <cpaasch@openai.com>

Accelerated Receive Flow Steering (aRFS) relies on sockets recording
their RX flow hash into the rps_sock_flow_table so that incoming packets
are steered to the CPU where the application runs.

With MPTCP, the application interacts with the parent MPTCP socket while
data is carried over per-subflow TCP sockets. Without recording these
subflows, aRFS cannot steer interrupts and RX processing for the flows
to the desired CPU.

Record all subflows in the RPS table by calling sock_rps_record_flow()
for each subflow at the start of mptcp_sendmsg(), mptcp_recvmsg() and
mptcp_stream_accept(), by using the new helper
mptcp_rps_record_subflows().

It does not by itself improve throughput, but ensures that IRQ and RX
processing are directed to the right CPU, which is a
prerequisite for effective aRFS.

Signed-off-by: Christoph Paasch <cpaasch@openai.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ad41c48126e44fda646f1ec1c81957db1407a6cc..a8d57b88578dfea807d3d55e430849aa8005c637 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -12,6 +12,7 @@
 #include <linux/sched/signal.h>
 #include <linux/atomic.h>
 #include <net/aligned_data.h>
+#include <net/rps.h>
 #include <net/sock.h>
 #include <net/inet_common.h>
 #include <net/inet_hashtables.h>
@@ -1740,6 +1741,20 @@ static u32 mptcp_send_limit(const struct sock *sk)
 	return limit - not_sent;
 }
 
+static void mptcp_rps_record_subflows(const struct mptcp_sock *msk)
+{
+	struct mptcp_subflow_context *subflow;
+
+	if (!rfs_is_needed())
+		return;
+
+	mptcp_for_each_subflow(msk, subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+		sock_rps_record_flow(ssk);
+	}
+}
+
 static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -1753,6 +1768,8 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	lock_sock(sk);
 
+	mptcp_rps_record_subflows(msk);
+
 	if (unlikely(inet_test_bit(DEFER_CONNECT, sk) ||
 		     msg->msg_flags & MSG_FASTOPEN)) {
 		int copied_syn = 0;
@@ -2131,6 +2148,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		goto out_err;
 	}
 
+	mptcp_rps_record_subflows(msk);
+
 	timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
 
 	len = min_t(size_t, len, INT_MAX);
@@ -3922,6 +3941,8 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 				mptcp_sock_graft(ssk, newsock);
 		}
 
+		mptcp_rps_record_subflows(msk);
+
 		/* Do late cleanup for the first subflow as necessary. Also
 		 * deal with bad peers not doing a complete shutdown.
 		 */

-- 
2.50.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-01  9:39 [PATCH net-next 0/6] mptcp: misc. features for v6.18 Matthieu Baerts (NGI0)
                   ` (5 preceding siblings ...)
  2025-09-01  9:39 ` [PATCH net-next 6/6] mptcp: record subflows in RPS table Matthieu Baerts (NGI0)
@ 2025-09-02 14:26 ` Jakub Kicinski
  2025-09-02 14:51   ` Matthieu Baerts
  2025-09-02 14:29 ` Matthieu Baerts
  7 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-09-02 14:26 UTC (permalink / raw)
  To: Matthieu Baerts (NGI0)
  Cc: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, netdev,
	linux-kernel, linux-doc, linux-kselftest, Eric Biggers,
	Christoph Paasch, Gang Yan

On Mon, 01 Sep 2025 11:39:09 +0200 Matthieu Baerts (NGI0) wrote:
> This series contains 4 independent new features:
> 
> - Patch 1: use HMAC-SHA256 library instead of open-coded HMAC.
> 
> - Patches 2-3: make ADD_ADDR retransmission timeout adaptive + simplify
>   selftests.
> 
> - Patch 4: selftests: check for unexpected fallback counter increments.
> 
> - Patches 5-6: record subflows in RPS table, for aRFS support.

I don't see why, but kmemleak started to hit this with the join test
2 branches ago :\ Have you seen any kmemleak issues on your side?
We also see occasional leaked skb in driver tests which makes no sense.

unreferenced object 0xffff8880029d3340 (size 3016):
  comm "softirq", pid 0, jiffies 4297316940
  hex dump (first 32 bytes):
    0a 00 01 02 0a 00 01 01 00 00 00 00 9e b8 7d 27  ..............}'
    0a 00 07 41 00 00 00 00 00 00 00 00 00 00 00 00  ...A............
  backtrace (crc 3653d88c):
    kmem_cache_alloc_noprof+0x284/0x330
    sk_prot_alloc.constprop.0+0x4e/0x1b0
    sk_clone_lock+0x4b/0x10d0
    mptcp_sk_clone_init+0x2e/0x10d0
    subflow_syn_recv_sock+0x9d1/0x1680
    tcp_check_req+0x3a4/0x1910
    tcp_v4_rcv+0x1004/0x30a0
    ip_protocol_deliver_rcu+0x82/0x350
    ip_local_deliver_finish+0x35d/0x620
    ip_local_deliver+0x19c/0x470
    ip_rcv+0xc2/0x370
    __netif_receive_skb_one_core+0x108/0x180
    process_backlog+0x3c1/0x13e0
    __napi_poll.constprop.0+0x9f/0x460
    net_rx_action+0x54f/0xda0
    handle_softirqs+0x215/0x610

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-01  9:39 [PATCH net-next 0/6] mptcp: misc. features for v6.18 Matthieu Baerts (NGI0)
                   ` (6 preceding siblings ...)
  2025-09-02 14:26 ` [PATCH net-next 0/6] mptcp: misc. features for v6.18 Jakub Kicinski
@ 2025-09-02 14:29 ` Matthieu Baerts
  2025-09-02 19:09   ` Jakub Kicinski
  7 siblings, 1 reply; 19+ messages in thread
From: Matthieu Baerts @ 2025-09-02 14:29 UTC (permalink / raw)
  To: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, Jonathan Corbet,
	Shuah Khan
  Cc: netdev, linux-kernel, linux-doc, linux-kselftest, Eric Biggers,
	Christoph Paasch, Gang Yan

Hello,

On 01/09/2025 11:39, Matthieu Baerts (NGI0) wrote:
> This series contains 4 independent new features:
> 
> - Patch 1: use HMAC-SHA256 library instead of open-coded HMAC.
> 
> - Patches 2-3: make ADD_ADDR retransmission timeout adaptive + simplify
>   selftests.

I just noticed that NIPA reported some issues due to these 2 patches. In
short, some packets (MPTCP ADD_ADDR notifications) can now be
retransmitted quicker, but some tests check MIB counters and don't
expect retransmissions. If the environment is a bit slow, it is possible
to have more retransmissions. We should adapt the tests to avoid false
positives.

Is it possible to drop just these two patches? Or do you prefer to mark
the whole series as "Changes requested"?

> - Patch 4: selftests: check for unexpected fallback counter increments.
> 
> - Patches 5-6: record subflows in RPS table, for aRFS support.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-02 14:26 ` [PATCH net-next 0/6] mptcp: misc. features for v6.18 Jakub Kicinski
@ 2025-09-02 14:51   ` Matthieu Baerts
  2025-09-02 15:27     ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Matthieu Baerts @ 2025-09-02 14:51 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, netdev,
	linux-kernel, linux-doc, linux-kselftest, Eric Biggers,
	Christoph Paasch, Gang Yan

Hi Jakub,

On 02/09/2025 16:26, Jakub Kicinski wrote:
> On Mon, 01 Sep 2025 11:39:09 +0200 Matthieu Baerts (NGI0) wrote:
>> This series contains 4 independent new features:
>>
>> - Patch 1: use HMAC-SHA256 library instead of open-coded HMAC.
>>
>> - Patches 2-3: make ADD_ADDR retransmission timeout adaptive + simplify
>>   selftests.
>>
>> - Patch 4: selftests: check for unexpected fallback counter increments.
>>
>> - Patches 5-6: record subflows in RPS table, for aRFS support.
> 
> I don't see why, but kmemleak started to hit this with the join test
> 2 branches ago :\ Have you seen any kmemleak issues on your side?
> We also see occasional leaked skb in driver tests which makes no sense.
> 
> unreferenced object 0xffff8880029d3340 (size 3016):
>   comm "softirq", pid 0, jiffies 4297316940
>   hex dump (first 32 bytes):
>     0a 00 01 02 0a 00 01 01 00 00 00 00 9e b8 7d 27  ..............}'
>     0a 00 07 41 00 00 00 00 00 00 00 00 00 00 00 00  ...A............
>   backtrace (crc 3653d88c):
>     kmem_cache_alloc_noprof+0x284/0x330
>     sk_prot_alloc.constprop.0+0x4e/0x1b0
>     sk_clone_lock+0x4b/0x10d0
>     mptcp_sk_clone_init+0x2e/0x10d0
>     subflow_syn_recv_sock+0x9d1/0x1680
>     tcp_check_req+0x3a4/0x1910
>     tcp_v4_rcv+0x1004/0x30a0
>     ip_protocol_deliver_rcu+0x82/0x350
>     ip_local_deliver_finish+0x35d/0x620
>     ip_local_deliver+0x19c/0x470
>     ip_rcv+0xc2/0x370
>     __netif_receive_skb_one_core+0x108/0x180
>     process_backlog+0x3c1/0x13e0
>     __napi_poll.constprop.0+0x9f/0x460
>     net_rx_action+0x54f/0xda0
>     handle_softirqs+0x215/0x610

Thank you for this notification!

No, I didn't notice that on our side. For KMemLeak, now I'm waiting 5
seconds, then I force the scan, and check for issues once. On NIPA, I
see that there are still 2 scans + cat, and apparently, the issue was
always visible during the 2nd scan:


https://netdev-3.bots.linux.dev/vmksft-mptcp-dbg/results/279881/1-mptcp-join-sh/stdout

https://netdev-3.bots.linux.dev/vmksft-mptcp-dbg/results/280062/1-mptcp-join-sh/stdout

It is unclear why a second scan is needed and only the second one caught
something. Was it the same with the strange issues you mentioned in
driver tests? Do you think I should re-add the second scan + cat?

When looking at the modifications of this series, it is unclear what
could cause that.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-02 14:51   ` Matthieu Baerts
@ 2025-09-02 15:27     ` Jakub Kicinski
  2025-09-02 18:25       ` Catalin Marinas
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-09-02 15:27 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, netdev,
	linux-kernel, linux-doc, linux-kselftest, Eric Biggers,
	Christoph Paasch, Gang Yan, Catalin Marinas

On Tue, 2 Sep 2025 16:51:47 +0200 Matthieu Baerts wrote:
> It is unclear why a second scan is needed and only the second one caught
> something. Was it the same with the strange issues you mentioned in
> driver tests? Do you think I should re-add the second scan + cat?

Not sure, cc: Catalin, from experience it seems like second scan often
surfaces issues the first scan missed.

> When looking at the modifications of this series, it is unclear what
> could cause that.

Yes, I don't think it's related to the series. For one thing the series
a couple of before the first report.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-02 15:27     ` Jakub Kicinski
@ 2025-09-02 18:25       ` Catalin Marinas
  2025-09-02 18:50         ` Matthieu Baerts
  0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2025-09-02 18:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Matthieu Baerts, mptcp, Mat Martineau, Geliang Tang,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Shuah Khan, netdev, linux-kernel, linux-doc,
	linux-kselftest, Eric Biggers, Christoph Paasch, Gang Yan

On Tue, Sep 02, 2025 at 08:27:59AM -0700, Jakub Kicinski wrote:
> On Tue, 2 Sep 2025 16:51:47 +0200 Matthieu Baerts wrote:
> > It is unclear why a second scan is needed and only the second one caught
> > something. Was it the same with the strange issues you mentioned in
> > driver tests? Do you think I should re-add the second scan + cat?
> 
> Not sure, cc: Catalin, from experience it seems like second scan often
> surfaces issues the first scan missed.

It's some of the kmemleak heuristics to reduce false positives. It does
a checksum of the object during scanning and only reports a leak if the
checksum is the same in two consecutive scans.

-- 
Catalin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-02 18:25       ` Catalin Marinas
@ 2025-09-02 18:50         ` Matthieu Baerts
  2025-09-02 21:18           ` Catalin Marinas
  0 siblings, 1 reply; 19+ messages in thread
From: Matthieu Baerts @ 2025-09-02 18:50 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Jakub Kicinski, mptcp, Mat Martineau, Geliang Tang,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Shuah Khan, netdev, linux-kernel, linux-doc,
	linux-kselftest, Eric Biggers, Christoph Paasch, Gang Yan

Hi Catalin,

2 Sept 2025 20:25:19 Catalin Marinas <catalin.marinas@arm.com>:

> On Tue, Sep 02, 2025 at 08:27:59AM -0700, Jakub Kicinski wrote:
>> On Tue, 2 Sep 2025 16:51:47 +0200 Matthieu Baerts wrote:
>>> It is unclear why a second scan is needed and only the second one caught
>>> something. Was it the same with the strange issues you mentioned in
>>> driver tests? Do you think I should re-add the second scan + cat?
>>
>> Not sure, cc: Catalin, from experience it seems like second scan often
>> surfaces issues the first scan missed.
>
> It's some of the kmemleak heuristics to reduce false positives. It does
> a checksum of the object during scanning and only reports a leak if the
> checksum is the same in two consecutive scans.

Thank you for the explanation!

Does that mean a scan should be triggered at the end of the tests,
then wait 5 second for the grace period, then trigger another scan
and check the results?

Or wait 5 seconds, then trigger two consecutive scans?

Cheers,
Matt

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-02 14:29 ` Matthieu Baerts
@ 2025-09-02 19:09   ` Jakub Kicinski
  2025-09-02 19:25     ` Matthieu Baerts
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2025-09-02 19:09 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, netdev,
	linux-kernel, linux-doc, linux-kselftest, Eric Biggers,
	Christoph Paasch, Gang Yan

On Tue, 2 Sep 2025 16:29:33 +0200 Matthieu Baerts wrote:
> On 01/09/2025 11:39, Matthieu Baerts (NGI0) wrote:
> > This series contains 4 independent new features:
> > 
> > - Patch 1: use HMAC-SHA256 library instead of open-coded HMAC.
> > 
> > - Patches 2-3: make ADD_ADDR retransmission timeout adaptive + simplify
> >   selftests.  
> 
> I just noticed that NIPA reported some issues due to these 2 patches. In
> short, some packets (MPTCP ADD_ADDR notifications) can now be
> retransmitted quicker, but some tests check MIB counters and don't
> expect retransmissions. If the environment is a bit slow, it is possible
> to have more retransmissions. We should adapt the tests to avoid false
> positives.
> 
> Is it possible to drop just these two patches? Or do you prefer to mark
> the whole series as "Changes requested"?

Your call, we can also apply as is. mptcp-join is ignored, anyway.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-02 19:09   ` Jakub Kicinski
@ 2025-09-02 19:25     ` Matthieu Baerts
  2025-09-02 20:53       ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Matthieu Baerts @ 2025-09-02 19:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, netdev,
	linux-kernel, linux-doc, linux-kselftest, Eric Biggers,
	Christoph Paasch, Gang Yan

2 Sept 2025 21:09:36 Jakub Kicinski <kuba@kernel.org>:

> On Tue, 2 Sep 2025 16:29:33 +0200 Matthieu Baerts wrote:
>> On 01/09/2025 11:39, Matthieu Baerts (NGI0) wrote:
>>> This series contains 4 independent new features:
>>>
>>> - Patch 1: use HMAC-SHA256 library instead of open-coded HMAC.
>>>
>>> - Patches 2-3: make ADD_ADDR retransmission timeout adaptive + simplify
>>>   selftests. 
>>
>> I just noticed that NIPA reported some issues due to these 2 patches. In
>> short, some packets (MPTCP ADD_ADDR notifications) can now be
>> retransmitted quicker, but some tests check MIB counters and don't
>> expect retransmissions. If the environment is a bit slow, it is possible
>> to have more retransmissions. We should adapt the tests to avoid false
>> positives.
>>
>> Is it possible to drop just these two patches? Or do you prefer to mark
>> the whole series as "Changes requested"?
>
> Your call, we can also apply as is. mptcp-join is ignored, anyway.

I realised patch 3/6 is going to cause issues when running on older
kernels, so we would need to revert it if we want to apply all patches.

But if you prefer a v2 for the whole series instead of applying 1,4-6,
I can also do that :)

Cheers,
Matt

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-02 19:25     ` Matthieu Baerts
@ 2025-09-02 20:53       ` Jakub Kicinski
  0 siblings, 0 replies; 19+ messages in thread
From: Jakub Kicinski @ 2025-09-02 20:53 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: mptcp, Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan, netdev,
	linux-kernel, linux-doc, linux-kselftest, Eric Biggers,
	Christoph Paasch, Gang Yan

On Tue, 2 Sep 2025 21:25:33 +0200 (GMT+02:00) Matthieu Baerts wrote:
> >> I just noticed that NIPA reported some issues due to these 2 patches. In
> >> short, some packets (MPTCP ADD_ADDR notifications) can now be
> >> retransmitted quicker, but some tests check MIB counters and don't
> >> expect retransmissions. If the environment is a bit slow, it is possible
> >> to have more retransmissions. We should adapt the tests to avoid false
> >> positives.
> >>
> >> Is it possible to drop just these two patches? Or do you prefer to mark
> >> the whole series as "Changes requested"?  
> >
> > Your call, we can also apply as is. mptcp-join is ignored, anyway.  
> 
> I realised patch 3/6 is going to cause issues when running on older
> kernels, so we would need to revert it if we want to apply all patches.
> 
> But if you prefer a v2 for the whole series instead of applying 1,4-6,
> I can also do that :)

Alright, please send a v2, then. Sorry for the flip-flop.
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-02 18:50         ` Matthieu Baerts
@ 2025-09-02 21:18           ` Catalin Marinas
  2025-09-02 21:38             ` Matthieu Baerts
  0 siblings, 1 reply; 19+ messages in thread
From: Catalin Marinas @ 2025-09-02 21:18 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Jakub Kicinski, mptcp, Mat Martineau, Geliang Tang,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Shuah Khan, netdev, linux-kernel, linux-doc,
	linux-kselftest, Eric Biggers, Christoph Paasch, Gang Yan

On Tue, Sep 02, 2025 at 08:50:19PM +0200, Matthieu Baerts wrote:
> Hi Catalin,
> 
> 2 Sept 2025 20:25:19 Catalin Marinas <catalin.marinas@arm.com>:
> 
> > On Tue, Sep 02, 2025 at 08:27:59AM -0700, Jakub Kicinski wrote:
> >> On Tue, 2 Sep 2025 16:51:47 +0200 Matthieu Baerts wrote:
> >>> It is unclear why a second scan is needed and only the second one caught
> >>> something. Was it the same with the strange issues you mentioned in
> >>> driver tests? Do you think I should re-add the second scan + cat?
> >>
> >> Not sure, cc: Catalin, from experience it seems like second scan often
> >> surfaces issues the first scan missed.
> >
> > It's some of the kmemleak heuristics to reduce false positives. It does
> > a checksum of the object during scanning and only reports a leak if the
> > checksum is the same in two consecutive scans.
> 
> Thank you for the explanation!
> 
> Does that mean a scan should be triggered at the end of the tests,
> then wait 5 second for the grace period, then trigger another scan
> and check the results?
> 
> Or wait 5 seconds, then trigger two consecutive scans?

The 5 seconds is the minimum age of an object before it gets reported as
a leak. It's not related to the scanning process. So you could do two
scans in succession and wait 5 seconds before checking for leaks.

However, I'd go with the first option - do a scan, wait 5 seconds and do
another. That's mostly because at the end of the scan kmemleak prints if
it found new unreferenced objects. It might not print the message if a
leaked object is younger than 5 seconds. In practice, though, the scan
may take longer, depending on how loaded your system is.

The second option works as well but waiting between them has a better
chance of removing false positives if, say, some objects are moved
between lists and two consecutive scans do not detect the list_head
change (and update the object's checksum).

-- 
Catalin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-02 21:18           ` Catalin Marinas
@ 2025-09-02 21:38             ` Matthieu Baerts
  2025-09-02 22:21               ` Christoph Paasch
  0 siblings, 1 reply; 19+ messages in thread
From: Matthieu Baerts @ 2025-09-02 21:38 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Jakub Kicinski, mptcp, Mat Martineau, Geliang Tang,
	David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Jonathan Corbet, Shuah Khan, netdev, linux-kernel, linux-doc,
	linux-kselftest, Eric Biggers, Christoph Paasch, Gang Yan

2 Sept 2025 23:18:56 Catalin Marinas <catalin.marinas@arm.com>:

> On Tue, Sep 02, 2025 at 08:50:19PM +0200, Matthieu Baerts wrote:
>> Hi Catalin,
>>
>> 2 Sept 2025 20:25:19 Catalin Marinas <catalin.marinas@arm.com>:
>>
>>> On Tue, Sep 02, 2025 at 08:27:59AM -0700, Jakub Kicinski wrote:
>>>> On Tue, 2 Sep 2025 16:51:47 +0200 Matthieu Baerts wrote:
>>>>> It is unclear why a second scan is needed and only the second one caught
>>>>> something. Was it the same with the strange issues you mentioned in
>>>>> driver tests? Do you think I should re-add the second scan + cat?
>>>>
>>>> Not sure, cc: Catalin, from experience it seems like second scan often
>>>> surfaces issues the first scan missed.
>>>
>>> It's some of the kmemleak heuristics to reduce false positives. It does
>>> a checksum of the object during scanning and only reports a leak if the
>>> checksum is the same in two consecutive scans.
>>
>> Thank you for the explanation!
>>
>> Does that mean a scan should be triggered at the end of the tests,
>> then wait 5 second for the grace period, then trigger another scan
>> and check the results?
>>
>> Or wait 5 seconds, then trigger two consecutive scans?
>
> The 5 seconds is the minimum age of an object before it gets reported as
> a leak. It's not related to the scanning process. So you could do two
> scans in succession and wait 5 seconds before checking for leaks.
>
> However, I'd go with the first option - do a scan, wait 5 seconds and do
> another. That's mostly because at the end of the scan kmemleak prints if
> it found new unreferenced objects. It might not print the message if a
> leaked object is younger than 5 seconds. In practice, though, the scan
> may take longer, depending on how loaded your system is.
>
> The second option works as well but waiting between them has a better
> chance of removing false positives if, say, some objects are moved
> between lists and two consecutive scans do not detect the list_head
> change (and update the object's checksum).

Thank you for this very nice reply, that's very clear!

I will then adapt our CI having CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF
to do a manual scan at the very end, wait 5 seconds and do another.

Cheers,
Matt


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next 0/6] mptcp: misc. features for v6.18
  2025-09-02 21:38             ` Matthieu Baerts
@ 2025-09-02 22:21               ` Christoph Paasch
  0 siblings, 0 replies; 19+ messages in thread
From: Christoph Paasch @ 2025-09-02 22:21 UTC (permalink / raw)
  To: Matthieu Baerts
  Cc: Catalin Marinas, Jakub Kicinski, mptcp, Mat Martineau,
	Geliang Tang, David S. Miller, Eric Dumazet, Paolo Abeni,
	Simon Horman, Jonathan Corbet, Shuah Khan, netdev, linux-kernel,
	linux-doc, linux-kselftest, Eric Biggers, Gang Yan

On Tue, Sep 2, 2025 at 2:38 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> 2 Sept 2025 23:18:56 Catalin Marinas <catalin.marinas@arm.com>:
>
> > On Tue, Sep 02, 2025 at 08:50:19PM +0200, Matthieu Baerts wrote:
> >> Hi Catalin,
> >>
> >> 2 Sept 2025 20:25:19 Catalin Marinas <catalin.marinas@arm.com>:
> >>
> >>> On Tue, Sep 02, 2025 at 08:27:59AM -0700, Jakub Kicinski wrote:
> >>>> On Tue, 2 Sep 2025 16:51:47 +0200 Matthieu Baerts wrote:
> >>>>> It is unclear why a second scan is needed and only the second one caught
> >>>>> something. Was it the same with the strange issues you mentioned in
> >>>>> driver tests? Do you think I should re-add the second scan + cat?
> >>>>
> >>>> Not sure, cc: Catalin, from experience it seems like second scan often
> >>>> surfaces issues the first scan missed.
> >>>
> >>> It's some of the kmemleak heuristics to reduce false positives. It does
> >>> a checksum of the object during scanning and only reports a leak if the
> >>> checksum is the same in two consecutive scans.
> >>
> >> Thank you for the explanation!
> >>
> >> Does that mean a scan should be triggered at the end of the tests,
> >> then wait 5 second for the grace period, then trigger another scan
> >> and check the results?
> >>
> >> Or wait 5 seconds, then trigger two consecutive scans?
> >
> > The 5 seconds is the minimum age of an object before it gets reported as
> > a leak. It's not related to the scanning process. So you could do two
> > scans in succession and wait 5 seconds before checking for leaks.
> >
> > However, I'd go with the first option - do a scan, wait 5 seconds and do
> > another. That's mostly because at the end of the scan kmemleak prints if
> > it found new unreferenced objects. It might not print the message if a
> > leaked object is younger than 5 seconds. In practice, though, the scan
> > may take longer, depending on how loaded your system is.
> >
> > The second option works as well but waiting between them has a better
> > chance of removing false positives if, say, some objects are moved
> > between lists and two consecutive scans do not detect the list_head
> > change (and update the object's checksum).
>
> Thank you for this very nice reply, that's very clear!
>
> I will then adapt our CI having CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF
> to do a manual scan at the very end, wait 5 seconds and do another.

FWIW - I am able to pretty reliably reproduce the kmemleak. However, I
also tried adding an inline kmemleak scan to the test harness (did it
once with, once without a sleep). When I do that the kmemleak
disappears :-)

(not saying that adding the scan isn't useful, just pointing out that
this particular leak seems to be related to how quickly we iterate
over the testcases)


Christoph

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-09-02 22:21 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-01  9:39 [PATCH net-next 0/6] mptcp: misc. features for v6.18 Matthieu Baerts (NGI0)
2025-09-01  9:39 ` [PATCH net-next 1/6] mptcp: use HMAC-SHA256 library instead of open-coded HMAC Matthieu Baerts (NGI0)
2025-09-01  9:39 ` [PATCH net-next 2/6] mptcp: make ADD_ADDR retransmission timeout adaptive Matthieu Baerts (NGI0)
2025-09-01  9:39 ` [PATCH net-next 3/6] selftests: mptcp: remove add_addr_timeout settings Matthieu Baerts (NGI0)
2025-09-01  9:39 ` [PATCH net-next 4/6] selftests: mptcp: add checks for fallback counters Matthieu Baerts (NGI0)
2025-09-01  9:39 ` [PATCH net-next 5/6] net: Add rfs_needed() helper Matthieu Baerts (NGI0)
2025-09-01  9:39 ` [PATCH net-next 6/6] mptcp: record subflows in RPS table Matthieu Baerts (NGI0)
2025-09-02 14:26 ` [PATCH net-next 0/6] mptcp: misc. features for v6.18 Jakub Kicinski
2025-09-02 14:51   ` Matthieu Baerts
2025-09-02 15:27     ` Jakub Kicinski
2025-09-02 18:25       ` Catalin Marinas
2025-09-02 18:50         ` Matthieu Baerts
2025-09-02 21:18           ` Catalin Marinas
2025-09-02 21:38             ` Matthieu Baerts
2025-09-02 22:21               ` Christoph Paasch
2025-09-02 14:29 ` Matthieu Baerts
2025-09-02 19:09   ` Jakub Kicinski
2025-09-02 19:25     ` Matthieu Baerts
2025-09-02 20:53       ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).