* [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port
@ 2026-06-01 5:22 Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 01/11] mptcp: options: suboptions sizes can be negative Matthieu Baerts (NGI0)
` (10 more replies)
0 siblings, 11 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0),
Jonathan Corbet, Shuah Khan, linux-doc, linux-kselftest,
Neal Cardwell, Kuniyuki Iwashima, Shuah Khan
Up to this series, it was possible to add a "signal" MPTCP endpoint with
an IPv6 address and a port, or to directly request to send an ADD_ADDR
with a v6 address and a port, but the expected ADD_ADDR wasn't sent when
TCP timestamps was used for the connection.
In fact, such signalling option cannot be sent when TCP timestamps is
used due to a lack of option space: the limit is at 40 bytes, and, with
padding, TCP timestamps is taking 12 bytes, while an ADD_ADDR IPv6 +
port is taking 30 bytes. The selected solution here is to simply drop
the TCP timestamps option when such ADD_ADDR of 30 bytes needs to be
sent.
- Patches 1-3: small cleanups to avoid computing ADD/RM_ADDR twice.
- Patches 4-7: the new feature, controlled by a new sysctl knob.
- Patch 8: extra checks in the MPTCP Join selftests.
- Patches 9-11: small pcap-related improvements in the selftests.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Matthieu Baerts (NGI0) (11):
mptcp: options: suboptions sizes can be negative
mptcp: pm: avoid computing rm_addr size twice
mptcp: pm: avoid computing add_addr size twice
mptcp: introduce add_addr_v6_port_drop_ts sysctl knob
tcp: allow mptcp to drop TS for some packets
mptcp: pm: drop TCP TS with ADD_ADDRv6 + port
selftests: mptcp: validate ADD_ADDRv6 + TS + port
selftests: mptcp: always check sent/dropped ADD_ADDRs
selftests: mptcp: connect: test name in pcap file
selftests: mptcp: simult_flow: test name in pcap file
selftests: mptcp: pcap: drop most of the payload
Documentation/networking/mptcp-sysctl.rst | 13 ++++
include/net/mptcp.h | 3 +-
net/ipv4/tcp_output.c | 6 +-
net/mptcp/ctrl.c | 18 ++++-
net/mptcp/options.c | 64 ++++++----------
net/mptcp/pm.c | 49 +++++++++++--
net/mptcp/protocol.h | 30 +-------
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 8 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 85 ++++++++++------------
tools/testing/selftests/net/mptcp/simult_flows.sh | 11 ++-
10 files changed, 159 insertions(+), 128 deletions(-)
---
base-commit: 8415598365503ced2e3d019491b0a2756c85c494
change-id: 20260601-net-next-mptcp-add-addr6-port-ts-40d8d74d8e20
Best regards,
--
Matthieu Baerts (NGI0) <matttbe@kernel.org>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH net-next 01/11] mptcp: options: suboptions sizes can be negative
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
@ 2026-06-01 5:22 ` Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 02/11] mptcp: pm: avoid computing rm_addr size twice Matthieu Baerts (NGI0)
` (9 subsequent siblings)
10 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0)
Use a signed int for the returned size, because when other options are
dropped, the size can be negative, e.g. to send an echo ADD_ADDR with a
v4 address, and no port.
The behaviour is not changed, because it was working as expected with an
overflow. But it is clearer like this, and it will help later on.
Even if, for the moment, only the ADD_ADDR size can be negative in some
cases, a signed int is now used for all mptcp_established_options_*()
helpers, not to mismatch the type, and as a question of uniformity.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/options.c | 28 +++++++++++-----------------
1 file changed, 11 insertions(+), 17 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 8a1c5698983c..1db418a9d4a6 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -447,8 +447,7 @@ static void clear_3rdack_retransmission(struct sock *sk)
}
static bool mptcp_established_options_mp(struct sock *sk, struct sk_buff *skb,
- bool snd_data_fin_enable,
- unsigned int *size,
+ bool snd_data_fin_enable, int *size,
struct mptcp_out_options *opts)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
@@ -560,8 +559,7 @@ static void mptcp_write_data_fin(struct mptcp_subflow_context *subflow,
}
static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
- bool snd_data_fin_enable,
- unsigned int *size,
+ bool snd_data_fin_enable, int *size,
struct mptcp_out_options *opts)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
@@ -651,8 +649,8 @@ static u64 add_addr_generate_hmac(u64 key1, u64 key2,
return get_unaligned_be64(&hmac[SHA256_DIGEST_SIZE - sizeof(u64)]);
}
-static bool mptcp_established_options_add_addr(struct sock *sk, struct sk_buff *skb,
- unsigned int *size,
+static bool mptcp_established_options_add_addr(struct sock *sk,
+ struct sk_buff *skb, int *size,
unsigned int remaining,
struct mptcp_out_options *opts)
{
@@ -715,8 +713,7 @@ static bool mptcp_established_options_add_addr(struct sock *sk, struct sk_buff *
return true;
}
-static bool mptcp_established_options_rm_addr(struct sock *sk,
- unsigned int *size,
+static bool mptcp_established_options_rm_addr(struct sock *sk, int *size,
unsigned int remaining,
struct mptcp_out_options *opts)
{
@@ -745,8 +742,7 @@ static bool mptcp_established_options_rm_addr(struct sock *sk,
return true;
}
-static bool mptcp_established_options_mp_prio(struct sock *sk,
- unsigned int *size,
+static bool mptcp_established_options_mp_prio(struct sock *sk, int *size,
unsigned int remaining,
struct mptcp_out_options *opts)
{
@@ -772,7 +768,7 @@ static bool mptcp_established_options_mp_prio(struct sock *sk,
}
static noinline bool mptcp_established_options_rst(struct sock *sk, struct sk_buff *skb,
- unsigned int *size,
+ int *size,
unsigned int remaining,
struct mptcp_out_options *opts)
{
@@ -790,8 +786,7 @@ static noinline bool mptcp_established_options_rst(struct sock *sk, struct sk_bu
return true;
}
-static bool mptcp_established_options_fastclose(struct sock *sk,
- unsigned int *size,
+static bool mptcp_established_options_fastclose(struct sock *sk, int *size,
unsigned int remaining,
struct mptcp_out_options *opts)
{
@@ -813,8 +808,7 @@ static bool mptcp_established_options_fastclose(struct sock *sk,
return true;
}
-static bool mptcp_established_options_mp_fail(struct sock *sk,
- unsigned int *size,
+static bool mptcp_established_options_mp_fail(struct sock *sk, int *size,
unsigned int remaining,
struct mptcp_out_options *opts)
{
@@ -842,9 +836,9 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
struct mptcp_sock *msk = mptcp_sk(subflow->conn);
- unsigned int opt_size = 0;
bool snd_data_fin;
bool ret = false;
+ int opt_size = 0;
opts->suboptions = 0;
@@ -872,7 +866,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
if (mptcp_established_options_mp(sk, skb, snd_data_fin, &opt_size, opts))
ret = true;
else if (mptcp_established_options_dss(sk, skb, snd_data_fin, &opt_size, opts)) {
- unsigned int mp_fail_size;
+ int mp_fail_size;
ret = true;
if (mptcp_established_options_mp_fail(sk, &mp_fail_size,
--
2.53.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 02/11] mptcp: pm: avoid computing rm_addr size twice
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 01/11] mptcp: options: suboptions sizes can be negative Matthieu Baerts (NGI0)
@ 2026-06-01 5:22 ` Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 03/11] mptcp: pm: avoid computing add_addr " Matthieu Baerts (NGI0)
` (8 subsequent siblings)
10 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0)
mptcp_rm_addr_len helper was called twice: in mptcp_pm_rm_addr_signal,
then just after in mptcp_established_options_rm_addr. Both to check the
remaining space.
The second call is not needed: if there is not enough space,
mptcp_pm_rm_addr_signal will return false, and the caller,
mptcp_established_options_rm_addr, will do the same without re-checking
the size again. Instead, mptcp_pm_rm_addr_signal can directly set the
size.
While at it, move mptcp_rm_addr_len to pm.c, as it is now only used
there, once.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/options.c | 11 ++---------
net/mptcp/pm.c | 11 ++++++++++-
net/mptcp/protocol.h | 10 +---------
3 files changed, 13 insertions(+), 19 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 1db418a9d4a6..05c08034a15d 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -720,19 +720,12 @@ static bool mptcp_established_options_rm_addr(struct sock *sk, int *size,
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
struct mptcp_sock *msk = mptcp_sk(subflow->conn);
struct mptcp_rm_list rm_list;
- int i, len;
+ int i;
if (!mptcp_pm_should_rm_signal(msk) ||
- !(mptcp_pm_rm_addr_signal(msk, remaining, &rm_list)))
+ !(mptcp_pm_rm_addr_signal(msk, remaining, &rm_list, size)))
return false;
- len = mptcp_rm_addr_len(&rm_list);
- if (len < 0)
- return false;
- if (remaining < len)
- return false;
-
- *size = len;
opts->suboptions |= OPTION_MPTCP_RM_ADDR;
opts->rm_list = rm_list;
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 3e770c7407e1..48299c8fe2a4 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -960,8 +960,16 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
return ret;
}
+static int mptcp_rm_addr_len(const struct mptcp_rm_list *rm_list)
+{
+ if (rm_list->nr == 0 || rm_list->nr > MPTCP_RM_IDS_MAX)
+ return -EINVAL;
+
+ return TCPOLEN_MPTCP_RM_ADDR_BASE + roundup(rm_list->nr - 1, 4) + 1;
+}
+
bool mptcp_pm_rm_addr_signal(struct mptcp_sock *msk, unsigned int remaining,
- struct mptcp_rm_list *rm_list)
+ struct mptcp_rm_list *rm_list, int *size)
{
int ret = false, len;
u8 rm_addr;
@@ -981,6 +989,7 @@ bool mptcp_pm_rm_addr_signal(struct mptcp_sock *msk, unsigned int remaining,
if (remaining < len)
goto out_unlock;
+ *size = len;
*rm_list = msk->pm.rm_list_tx;
WRITE_ONCE(msk->pm.addr_signal, rm_addr);
ret = true;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e4f5aba24da7..da677f5cef71 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1221,20 +1221,12 @@ static inline unsigned int mptcp_add_addr_len(int family, bool echo, bool port)
return len;
}
-static inline int mptcp_rm_addr_len(const struct mptcp_rm_list *rm_list)
-{
- if (rm_list->nr == 0 || rm_list->nr > MPTCP_RM_IDS_MAX)
- return -EINVAL;
-
- return TCPOLEN_MPTCP_RM_ADDR_BASE + roundup(rm_list->nr - 1, 4) + 1;
-}
-
bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
unsigned int opt_size, unsigned int remaining,
struct mptcp_addr_info *addr, bool *echo,
bool *drop_other_suboptions);
bool mptcp_pm_rm_addr_signal(struct mptcp_sock *msk, unsigned int remaining,
- struct mptcp_rm_list *rm_list);
+ struct mptcp_rm_list *rm_list, int *len);
int mptcp_pm_get_local_id(struct mptcp_sock *msk, struct sock_common *skc);
int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk,
struct mptcp_pm_addr_entry *skc);
--
2.53.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 03/11] mptcp: pm: avoid computing add_addr size twice
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 01/11] mptcp: options: suboptions sizes can be negative Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 02/11] mptcp: pm: avoid computing rm_addr size twice Matthieu Baerts (NGI0)
@ 2026-06-01 5:22 ` Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 04/11] mptcp: introduce add_addr_v6_port_drop_ts sysctl knob Matthieu Baerts (NGI0)
` (7 subsequent siblings)
10 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0)
mptcp_add_addr_len helper was called twice: in mptcp_pm_add_addr_signal,
then just after in mptcp_established_options_add_addr. Both to check
the remaining space.
The second call is not needed: if there is not enough space,
mptcp_pm_add_addr_signal will return false, and the caller,
mptcp_established_options_add_addr, will do the same without re-checking
the size again. Instead, mptcp_pm_add_addr_signal can directly set the
size.
Note that the returned size can be negative when other suboptions are
dropped, e.g. to send an echo ADD_ADDR with a v4 address, and no port.
While at it:
- move mptcp_add_addr_len to pm.c, as it is now only used from there
- use 'int' in mptcp_add_addr_len for the size, instead of having a mix
- use a bool for 'ret' in mptcp_pm_add_addr_signal
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/options.c | 16 +++-------------
net/mptcp/pm.c | 26 ++++++++++++++++++++++----
net/mptcp/protocol.h | 17 +----------------
3 files changed, 26 insertions(+), 33 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 05c08034a15d..be85607733f3 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -657,34 +657,25 @@ static bool mptcp_established_options_add_addr(struct sock *sk,
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
struct mptcp_sock *msk = mptcp_sk(subflow->conn);
bool drop_other_suboptions = false;
- unsigned int opt_size = *size;
struct mptcp_addr_info addr;
bool echo;
- int len;
/* add addr will strip the existing options, be sure to avoid breaking
* MPC/MPJ handshakes
*/
if (!mptcp_pm_should_add_signal(msk) ||
(opts->suboptions & (OPTION_MPTCP_MPJ_ACK | OPTION_MPTCP_MPC_ACK)) ||
- !mptcp_pm_add_addr_signal(msk, skb, opt_size, remaining, &addr,
- &echo, &drop_other_suboptions))
+ !mptcp_pm_add_addr_signal(msk, skb, size, remaining, &addr, &echo,
+ &drop_other_suboptions))
return false;
/*
* Later on, mptcp_write_options() will enforce mutually exclusion with
* DSS, bail out if such option is set and we can't drop it.
*/
- if (drop_other_suboptions)
- remaining += opt_size;
- else if (opts->suboptions & OPTION_MPTCP_DSS)
+ if (!drop_other_suboptions && opts->suboptions & OPTION_MPTCP_DSS)
return false;
- len = mptcp_add_addr_len(addr.family, echo, !!addr.port);
- if (remaining < len)
- return false;
-
- *size = len;
if (drop_other_suboptions) {
pr_debug("drop other suboptions\n");
opts->suboptions = 0;
@@ -695,7 +686,6 @@ static bool mptcp_established_options_add_addr(struct sock *sk,
* options
*/
opts->ahmac = 0;
- *size -= opt_size;
}
opts->addr = addr;
opts->suboptions |= OPTION_MPTCP_ADD_ADDR;
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 48299c8fe2a4..6a2cbe8616d3 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -887,14 +887,30 @@ void mptcp_pm_mp_fail_received(struct sock *sk, u64 fail_seq)
}
}
+static int mptcp_add_addr_len(int family, bool echo, bool port)
+{
+ int len = TCPOLEN_MPTCP_ADD_ADDR_BASE;
+
+ if (family == AF_INET6)
+ len = TCPOLEN_MPTCP_ADD_ADDR6_BASE;
+ if (!echo)
+ len += MPTCPOPT_THMAC_LEN;
+ /* account for 2 trailing 'nop' options */
+ if (port)
+ len += TCPOLEN_MPTCP_PORT_LEN + TCPOLEN_MPTCP_PORT_ALIGN;
+
+ return len;
+}
+
bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
- unsigned int opt_size, unsigned int remaining,
+ int *size, int remaining,
struct mptcp_addr_info *addr, bool *echo,
bool *drop_other_suboptions)
{
bool skip_add_addr = false;
- int ret = false;
+ bool ret = false;
u8 add_addr;
+ int len = 0;
u8 family;
bool port;
@@ -909,7 +925,7 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
* if any, will be carried by the 'original' TCP ack
*/
if (skb && skb_is_tcp_pure_ack(skb)) {
- remaining += opt_size;
+ len -= *size;
*drop_other_suboptions = true;
}
@@ -926,7 +942,8 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
family = msk->pm.local.family;
}
- if (remaining < mptcp_add_addr_len(family, *echo, port)) {
+ len += mptcp_add_addr_len(family, *echo, port);
+ if (len > remaining) {
struct net *net = sock_net((struct sock *)msk);
if (!*drop_other_suboptions)
@@ -942,6 +959,7 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
}
ret = true;
+ *size = len;
drop_signal_mark:
WRITE_ONCE(msk->pm.addr_signal, add_addr);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index da677f5cef71..e0ffebaa6795 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1206,23 +1206,8 @@ static inline bool mptcp_pm_is_kernel(const struct mptcp_sock *msk)
return READ_ONCE(msk->pm.pm_type) == MPTCP_PM_TYPE_KERNEL;
}
-static inline unsigned int mptcp_add_addr_len(int family, bool echo, bool port)
-{
- u8 len = TCPOLEN_MPTCP_ADD_ADDR_BASE;
-
- if (family == AF_INET6)
- len = TCPOLEN_MPTCP_ADD_ADDR6_BASE;
- if (!echo)
- len += MPTCPOPT_THMAC_LEN;
- /* account for 2 trailing 'nop' options */
- if (port)
- len += TCPOLEN_MPTCP_PORT_LEN + TCPOLEN_MPTCP_PORT_ALIGN;
-
- return len;
-}
-
bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
- unsigned int opt_size, unsigned int remaining,
+ int *size, int remaining,
struct mptcp_addr_info *addr, bool *echo,
bool *drop_other_suboptions);
bool mptcp_pm_rm_addr_signal(struct mptcp_sock *msk, unsigned int remaining,
--
2.53.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 04/11] mptcp: introduce add_addr_v6_port_drop_ts sysctl knob
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
` (2 preceding siblings ...)
2026-06-01 5:22 ` [PATCH net-next 03/11] mptcp: pm: avoid computing add_addr " Matthieu Baerts (NGI0)
@ 2026-06-01 5:22 ` Matthieu Baerts (NGI0)
2026-06-01 5:44 ` Eric Dumazet
2026-06-01 5:22 ` [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets Matthieu Baerts (NGI0)
` (6 subsequent siblings)
10 siblings, 1 reply; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0),
Jonathan Corbet, Shuah Khan, linux-doc, linux-kselftest
This sysctl is going to be used in the next commits to drop TCP
timestamps option, to be able to send an ADD_ADDR with a v6 IP address
and a port number. It is enabled by default.
This knob is explicitly disabled in the MPTCP Join selftest, with the
"signal addr list progresses after tx drop" subtest, to continue
verifying the previous behaviour where the ADD_ADDR is not sent due to a
lack of space.
While at it, move syn_retrans_before_tcp_fallback down from struct
mptcp_pernet, to avoid creating another 3 bytes hole.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Jonathan Corbet <corbet@lwn.net>
To: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-doc@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
---
Documentation/networking/mptcp-sysctl.rst | 13 +++++++++++++
net/mptcp/ctrl.c | 18 +++++++++++++++++-
net/mptcp/protocol.h | 1 +
tools/testing/selftests/net/mptcp/mptcp_join.sh | 1 +
4 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst
index 1eb6af26b4a7..b9b5f58e0625 100644
--- a/Documentation/networking/mptcp-sysctl.rst
+++ b/Documentation/networking/mptcp-sysctl.rst
@@ -21,6 +21,19 @@ add_addr_timeout - INTEGER (seconds)
Default: 120
+add_addr_v6_port_drop_ts - BOOLEAN
+ Control whether preparing an ADD_ADDR with an IPv6 address and a port
+ should drop the TCP timestamps option to have enough option space to
+ send the signal.
+
+ If there is not enough option space, and the TCP timestamps option
+ cannot be dropped, the signal cannot be sent. Note that dropping the TCP
+ timestamps option for one packet of the connection could disrupt some
+ middleboxes: even if it should be unlikely, they could drop the packet
+ or block the connection. This is a per-namespace sysctl.
+
+ Default: 1 (enabled)
+
allow_join_initial_addr_port - BOOLEAN
Allow peers to send join requests to the IP address and port number used
by the initial subflow if the value is 1. This controls a flag that is
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index d96130e49942..c94a192f4118 100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -32,12 +32,13 @@ struct mptcp_pernet {
unsigned int close_timeout;
unsigned int stale_loss_cnt;
atomic_t active_disable_times;
- u8 syn_retrans_before_tcp_fallback;
unsigned long active_disable_stamp;
+ u8 syn_retrans_before_tcp_fallback;
u8 mptcp_enabled;
u8 checksum_enabled;
u8 allow_join_initial_addr_port;
u8 pm_type;
+ u8 add_addr_v6_port_drop_ts;
char scheduler[MPTCP_SCHED_NAME_MAX];
char path_manager[MPTCP_PM_NAME_MAX];
};
@@ -94,6 +95,11 @@ const char *mptcp_get_scheduler(const struct net *net)
return mptcp_get_pernet(net)->scheduler;
}
+unsigned int mptcp_add_addr_v6_port_drop_ts(const struct net *net)
+{
+ return mptcp_get_pernet(net)->add_addr_v6_port_drop_ts;
+}
+
static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet)
{
pernet->mptcp_enabled = 1;
@@ -108,6 +114,7 @@ static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet)
pernet->pm_type = MPTCP_PM_TYPE_KERNEL;
strscpy(pernet->scheduler, "default", sizeof(pernet->scheduler));
strscpy(pernet->path_manager, "kernel", sizeof(pernet->path_manager));
+ pernet->add_addr_v6_port_drop_ts = 1;
}
#ifdef CONFIG_SYSCTL
@@ -362,6 +369,14 @@ static struct ctl_table mptcp_sysctl_table[] = {
.mode = 0444,
.proc_handler = proc_available_path_managers,
},
+ {
+ .procname = "add_addr_v6_port_drop_ts",
+ .maxlen = sizeof(u8),
+ .mode = 0644,
+ .proc_handler = proc_dou8vec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE
+ },
};
static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet)
@@ -389,6 +404,7 @@ static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet)
table[10].data = &pernet->syn_retrans_before_tcp_fallback;
table[11].data = &pernet->path_manager;
/* table[12] is for available_path_managers which is read-only info */
+ table[13].data = &pernet->add_addr_v6_port_drop_ts;
hdr = register_net_sysctl_sz(net, MPTCP_SYSCTL_PATH, table,
ARRAY_SIZE(mptcp_sysctl_table));
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e0ffebaa6795..f4276980d78a 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -798,6 +798,7 @@ unsigned int mptcp_close_timeout(const struct sock *sk);
int mptcp_get_pm_type(const struct net *net);
const char *mptcp_get_path_manager(const struct net *net);
const char *mptcp_get_scheduler(const struct net *net);
+unsigned int mptcp_add_addr_v6_port_drop_ts(const struct net *net);
void mptcp_active_disable(struct sock *sk);
bool mptcp_active_should_disable(struct sock *ssk);
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 5d4d0f127f79..23b17957686a 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -3313,6 +3313,7 @@ add_addr_ports_tests()
if reset "signal addr list progresses after tx drop"; then
pm_nl_set_limits $ns1 0 2
pm_nl_set_limits $ns2 1 0
+ ip netns exec $ns1 sysctl -q net.mptcp.add_addr_v6_port_drop_ts=0 2>/dev/null || true
ip netns exec $ns1 sysctl -q net.ipv4.tcp_timestamps=1
ip netns exec $ns2 sysctl -q net.ipv4.tcp_timestamps=1
--
2.53.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
` (3 preceding siblings ...)
2026-06-01 5:22 ` [PATCH net-next 04/11] mptcp: introduce add_addr_v6_port_drop_ts sysctl knob Matthieu Baerts (NGI0)
@ 2026-06-01 5:22 ` Matthieu Baerts (NGI0)
2026-06-01 6:04 ` Eric Dumazet
2026-06-01 5:22 ` [PATCH net-next 06/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
` (5 subsequent siblings)
10 siblings, 1 reply; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0),
Neal Cardwell, Kuniyuki Iwashima
With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
taking 30 bytes, the 40-byte limit for the TCP options is reached. In
this case, it is then not possible to send the address signal.
The idea is to let MPTCP dropping the TCP-timestamps option for some
specific packets, to be able to send some specific pure ACK carrying >28
bytes of MPTCP options, like with this specific ADD_ADDR. A new
parameter is passed from tcp_established_options to the MPTCP side to
indicate if the TCP TS option is used, and if it should be dropped. The
next commit implements the part on MPTCP side, but split into two
patches to help TCP maintainers to identify the modifications on TCP
side. This feature will be controlled by a new add_addr_v6_port_drop_ts
MPTCP sysctl knob.
It is important to keep in mind that dropping the TCP timestamps option
for one packet of the connection could eventually disrupt some
middleboxes: even if it should be unlikely, they could drop the packet
or even block the connection. That's why this new feature will be
controlled by a sysctl knob.
Note that it would be technically possible to squeeze both options into
the header if the ADD_ADDR is first written, and then the TCP timestamps
without the NOPs preceding it. But this means more modifications on TCP
side, plus some middleboxes could still be disrupted by that.
About the implementation, instead of passing a new boolean (drop_ts),
another option would be to pass the whole option structure (opts),
but 'struct tcp_out_options' is currently defined in tcp_output.c, and
would need to be exported. Plus that means the removal of the TCP TS
option would be done on the MPTCP side, and not here on the TCP side.
It feels clearer to remove other TCP options from the TCP side, than
hiding that from the MPTCP side.
Yet an other alternative would be to pass the size already taken by the
other TCP options, and have a way to drop them all when needed. But this
feels better to target only the timestamps option where dropping it
should be safe, even if it is currently the only option that would be
set before MPTCP, when MPTCP is used.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Neal Cardwell <ncardwell@google.com>
To: Kuniyuki Iwashima <kuniyu@google.com>
---
include/net/mptcp.h | 3 ++-
net/ipv4/tcp_output.c | 6 +++++-
net/mptcp/options.c | 3 ++-
3 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index f7263fe2a2e4..000b6593bfa4 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -151,7 +151,7 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
struct mptcp_out_options *opts);
bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
unsigned int *size, unsigned int remaining,
- struct mptcp_out_options *opts);
+ bool *drop_ts, struct mptcp_out_options *opts);
bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
@@ -270,6 +270,7 @@ static inline bool mptcp_established_options(struct sock *sk,
struct sk_buff *skb,
unsigned int *size,
unsigned int remaining,
+ bool *drop_ts,
struct mptcp_out_options *opts)
{
return false;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ef0c10cd31c7..53ee4c8f5f8c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1181,12 +1181,16 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
*/
if (sk_is_mptcp(sk)) {
unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
+ bool drop_ts = opts->options & OPTION_TS;
unsigned int opt_size = 0;
if (mptcp_established_options(sk, skb, &opt_size, remaining,
- &opts->mptcp)) {
+ &drop_ts, &opts->mptcp)) {
opts->options |= OPTION_MPTCP;
size += opt_size;
+
+ if (drop_ts)
+ opts->options &= ~OPTION_TS;
}
}
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index be85607733f3..ccb5ac0aa729 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -815,7 +815,7 @@ static bool mptcp_established_options_mp_fail(struct sock *sk, int *size,
bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
unsigned int *size, unsigned int remaining,
- struct mptcp_out_options *opts)
+ bool *drop_ts, struct mptcp_out_options *opts)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
struct mptcp_sock *msk = mptcp_sk(subflow->conn);
@@ -824,6 +824,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
int opt_size = 0;
opts->suboptions = 0;
+ *drop_ts = false;
/* Force later mptcp_write_options(), but do not use any actual
* option space.
--
2.53.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 06/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
` (4 preceding siblings ...)
2026-06-01 5:22 ` [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets Matthieu Baerts (NGI0)
@ 2026-06-01 5:22 ` Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 07/11] selftests: mptcp: validate ADD_ADDRv6 + TS " Matthieu Baerts (NGI0)
` (4 subsequent siblings)
10 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0)
With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
taking 30 bytes, the 40-byte limit for the TCP options is reached. In
this case, it is then not possible to send the signal.
To be able to send this ADD_ADDR, the TCP timestamps option can now be
dropped. This is done, when needed by setting the *drop_ts parameter
from mptcp_established_options. This feature is controlled by a new
net.mptcp.add_addr_v6_port_drop_ts sysctl knob, enabled by default.
It is important to keep in mind that dropping the TCP timestamps option
for one packet of the connection could eventually disrupt some
middleboxes: even if it should be unlikely, they could drop the packet
or even block the connection. That's why this new feature can be
controlled by a sysctl knob.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/448
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/options.c | 8 ++++++--
net/mptcp/pm.c | 12 +++++++++++-
net/mptcp/protocol.h | 2 +-
3 files changed, 18 insertions(+), 4 deletions(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index ccb5ac0aa729..02336d1c1550 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -652,6 +652,7 @@ static u64 add_addr_generate_hmac(u64 key1, u64 key2,
static bool mptcp_established_options_add_addr(struct sock *sk,
struct sk_buff *skb, int *size,
unsigned int remaining,
+ bool *drop_ts,
struct mptcp_out_options *opts)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
@@ -666,7 +667,7 @@ static bool mptcp_established_options_add_addr(struct sock *sk,
if (!mptcp_pm_should_add_signal(msk) ||
(opts->suboptions & (OPTION_MPTCP_MPJ_ACK | OPTION_MPTCP_MPC_ACK)) ||
!mptcp_pm_add_addr_signal(msk, skb, size, remaining, &addr, &echo,
- &drop_other_suboptions))
+ &drop_other_suboptions, drop_ts))
return false;
/*
@@ -819,6 +820,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
struct mptcp_sock *msk = mptcp_sk(subflow->conn);
+ bool add_addr_drop_ts = *drop_ts;
bool snd_data_fin;
bool ret = false;
int opt_size = 0;
@@ -869,10 +871,12 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
*size += opt_size;
remaining -= opt_size;
- if (mptcp_established_options_add_addr(sk, skb, &opt_size, remaining, opts)) {
+ if (mptcp_established_options_add_addr(sk, skb, &opt_size, remaining,
+ &add_addr_drop_ts, opts)) {
*size += opt_size;
remaining -= opt_size;
ret = true;
+ *drop_ts = add_addr_drop_ts;
} else if (mptcp_established_options_rm_addr(sk, &opt_size, remaining, opts)) {
*size += opt_size;
remaining -= opt_size;
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 6a2cbe8616d3..b1b3f7482f7c 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -905,7 +905,7 @@ static int mptcp_add_addr_len(int family, bool echo, bool port)
bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
int *size, int remaining,
struct mptcp_addr_info *addr, bool *echo,
- bool *drop_other_suboptions)
+ bool *drop_other_suboptions, bool *drop_ts)
{
bool skip_add_addr = false;
bool ret = false;
@@ -949,6 +949,13 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
if (!*drop_other_suboptions)
goto out_unlock;
+ if (*drop_ts && mptcp_add_addr_v6_port_drop_ts(net)) {
+ /* OK without TCP Timestamps? */
+ len -= TCPOLEN_TSTAMP_ALIGNED;
+ if (len <= remaining)
+ goto enough_space;
+ }
+
if (*echo) {
MPTCP_INC_STATS(net, MPTCP_MIB_ECHOADDTXDROP);
} else {
@@ -958,6 +965,9 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
goto drop_signal_mark;
}
+ *drop_ts = false;
+
+enough_space:
ret = true;
*size = len;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index f4276980d78a..50c3205cab46 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1210,7 +1210,7 @@ static inline bool mptcp_pm_is_kernel(const struct mptcp_sock *msk)
bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, const struct sk_buff *skb,
int *size, int remaining,
struct mptcp_addr_info *addr, bool *echo,
- bool *drop_other_suboptions);
+ bool *drop_other_suboptions, bool *drop_ts);
bool mptcp_pm_rm_addr_signal(struct mptcp_sock *msk, unsigned int remaining,
struct mptcp_rm_list *rm_list, int *len);
int mptcp_pm_get_local_id(struct mptcp_sock *msk, struct sock_common *skc);
--
2.53.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 07/11] selftests: mptcp: validate ADD_ADDRv6 + TS + port
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
` (5 preceding siblings ...)
2026-06-01 5:22 ` [PATCH net-next 06/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
@ 2026-06-01 5:22 ` Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 08/11] selftests: mptcp: always check sent/dropped ADD_ADDRs Matthieu Baerts (NGI0)
` (3 subsequent siblings)
10 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Shuah Khan,
linux-kselftest
This validates the feature added by parent commit, where it is now
possible to send an ADD_ADDR with a v6 IP address and a port number,
while the connection is using TCP Timestamps.
This test is simply a copy of the previous one: "signal address with
port", but using IPv6 addresses. This test is only executed if the
add_addr_v6_port_drop_ts sysctl knob is available. If not, it means the
kernel doesn't support this feature.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 23b17957686a..d491c3e964d6 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -3214,6 +3214,17 @@ add_addr_ports_tests()
chk_add_nr 1 1 1
fi
+ # signal address v6 with port
+ if reset "signal address v6 with port" &&
+ continue_if mptcp_lib_has_file '/proc/sys/net/mptcp/add_addr_v6_port_drop_ts'; then
+ pm_nl_set_limits $ns1 0 1
+ pm_nl_set_limits $ns2 1 1
+ pm_nl_add_endpoint $ns1 dead:beef:2::1 flags signal port 10100
+ run_tests $ns1 $ns2 dead:beef:1::1
+ chk_join_nr 1 1 1
+ chk_add_nr 1 1 1
+ fi
+
# subflow and signal with port
if reset "subflow and signal with port"; then
pm_nl_add_endpoint $ns1 10.0.2.1 flags signal port 10100
--
2.53.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 08/11] selftests: mptcp: always check sent/dropped ADD_ADDRs
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
` (6 preceding siblings ...)
2026-06-01 5:22 ` [PATCH net-next 07/11] selftests: mptcp: validate ADD_ADDRv6 + TS " Matthieu Baerts (NGI0)
@ 2026-06-01 5:22 ` Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 09/11] selftests: mptcp: connect: test name in pcap file Matthieu Baerts (NGI0)
` (2 subsequent siblings)
10 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Shuah Khan,
linux-kselftest
Before, they were only checked on demand, but it seems better to check
them each time received ADD_ADDRs are checked.
Errors are only reported when the counter exists, and the value is not
the expected one. This is similar to what is done in chk_join_nr: it
reduces the output, and avoids a lot of 'skip' when validating older
kernels. Also here, some tests need to adapt the default expected
counters, e.g. when ADD_ADDR echo are dropped on the reception side, or
it is not possible to send an ADD_ADDR due to the limited option space.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 71 ++++++++++---------------
1 file changed, 27 insertions(+), 44 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index d491c3e964d6..82c0f7df3be2 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -87,6 +87,10 @@ unset fb_mpc_data
unset fb_md5_sig
unset fb_dss
+unset add_addr_tx_nr
+unset add_addr_echo_tx_nr
+unset add_addr_drop_tx_nr
+
# generated using "nfbpf_compile '(ip && (ip[54] & 0xf0) == 0x30) ||
# (ip6 && (ip6[74] & 0xf0) == 0x30)'"
CBPF_MPTCP_SUBOPTION_ADD_ADDR="14,
@@ -1710,6 +1714,9 @@ chk_add_nr()
local ack_nr=$port_nr
local mis_syn_nr=0
local mis_ack_nr=0
+ local add_tx_nr=${add_addr_tx_nr:-${add_nr}}
+ local echo_tx_nr=${add_addr_echo_tx_nr:-${echo_nr}}
+ local drop_tx_nr=${add_addr_drop_tx_nr:-0}
local ns_tx=$ns1
local ns_rx=$ns2
local tx=""
@@ -1811,50 +1818,25 @@ chk_add_nr()
print_ok
fi
fi
-}
-chk_add_tx_nr()
-{
- local add_tx_nr=$1
- local echo_tx_nr=$2
- local count
-
- print_check "add addr tx"
- count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtAddAddrTx")
- if [ -z "$count" ]; then
- print_skip
+ count=$(mptcp_lib_get_counter ${ns_tx} "MPTcpExtAddAddrTx")
# Tolerate more ADD_ADDR then expected (if any), due to retransmissions
- elif [ "$count" != "$add_tx_nr" ] &&
- { [ "$add_tx_nr" -eq 0 ] || [ "$count" -lt "$add_tx_nr" ]; }; then
+ if [ -n "$count" ] && [ "$count" != "$add_tx_nr" ] &&
+ { [ "$add_tx_nr" -eq 0 ] || [ "$count" -lt "$add_tx_nr" ]; }; then
+ print_check "add addr tx"
fail_test "got $count ADD_ADDR[s] TX, expected $add_tx_nr"
- else
- print_ok
fi
- print_check "add addr echo tx"
- count=$(mptcp_lib_get_counter ${ns2} "MPTcpExtEchoAddTx")
- if [ -z "$count" ]; then
- print_skip
- elif [ "$count" != "$echo_tx_nr" ]; then
+ count=$(mptcp_lib_get_counter ${ns_rx} "MPTcpExtEchoAddTx")
+ if [ -n "$count" ] && [ "$count" != "$echo_tx_nr" ]; then
+ print_check "add addr echo tx"
fail_test "got $count ADD_ADDR echo[s] TX, expected $echo_tx_nr"
- else
- print_ok
fi
-}
-chk_add_drop_tx_nr()
-{
- local drop_tx_nr=$1
- local count
-
- print_check "add addr tx drop"
- count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtAddAddrTxDrop")
- if [ -z "$count" ]; then
- print_skip
- elif [ "$count" != "$drop_tx_nr" ]; then
+ count=$(mptcp_lib_get_counter ${ns_tx} "MPTcpExtAddAddrTxDrop")
+ if [ -n "$count" ] && [ "$count" != "$drop_tx_nr" ]; then
+ print_check "add addr tx drop"
fail_test "got $count ADD_ADDR drop[s] TX, expected $drop_tx_nr"
- else
- print_ok
fi
}
@@ -2267,7 +2249,6 @@ signal_address_tests()
pm_nl_add_endpoint $ns1 10.0.2.1 flags signal
run_tests $ns1 $ns2 10.0.1.1
chk_join_nr 0 0 0
- chk_add_tx_nr 1 1
chk_add_nr 1 1
fi
@@ -2545,8 +2526,8 @@ add_addr_timeout_tests()
speed=slow \
run_tests $ns1 $ns2 10.0.1.1
chk_join_nr 1 1 1
- chk_add_tx_nr 4 4
- chk_add_nr 4 0
+ add_addr_echo_tx_nr=4 \
+ chk_add_nr 4 0
fi
# add_addr timeout IPv6
@@ -2557,7 +2538,8 @@ add_addr_timeout_tests()
speed=slow \
run_tests $ns1 $ns2 dead:beef:1::1
chk_join_nr 1 1 1
- chk_add_nr 4 0
+ add_addr_echo_tx_nr=4 \
+ chk_add_nr 4 0
fi
# signal addresses timeout
@@ -2569,7 +2551,8 @@ add_addr_timeout_tests()
speed=10 \
run_tests $ns1 $ns2 10.0.1.1
chk_join_nr 2 2 2
- chk_add_nr 8 0
+ add_addr_echo_tx_nr=8 \
+ chk_add_nr 8 0
fi
# signal invalid addresses timeout
@@ -2582,7 +2565,8 @@ add_addr_timeout_tests()
run_tests $ns1 $ns2 10.0.1.1
join_syn_tx=2 \
chk_join_nr 1 1 1
- chk_add_nr 8 0
+ add_addr_echo_tx_nr=7 \
+ chk_add_nr 8 0
fi
}
@@ -3331,9 +3315,8 @@ add_addr_ports_tests()
pm_nl_add_endpoint $ns1 dead:beef:2::1 flags signal port 10100
pm_nl_add_endpoint $ns1 dead:beef:3::1 flags signal
run_tests $ns1 $ns2 dead:beef:1::1
- chk_add_drop_tx_nr 1
- chk_add_tx_nr 1 1
- chk_add_nr 1 1 0
+ add_addr_drop_tx_nr=1 \
+ chk_add_nr 1 1 0
fi
}
--
2.53.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 09/11] selftests: mptcp: connect: test name in pcap file
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
` (7 preceding siblings ...)
2026-06-01 5:22 ` [PATCH net-next 08/11] selftests: mptcp: always check sent/dropped ADD_ADDRs Matthieu Baerts (NGI0)
@ 2026-06-01 5:22 ` Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 10/11] selftests: mptcp: simult_flow: " Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 11/11] selftests: mptcp: pcap: drop most of the payload Matthieu Baerts (NGI0)
10 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Shuah Khan,
linux-kselftest
Even if the pcap prefix is printed in the test, it is clearer if this
prefix also include the test name: mptcp_connect.
With this, it is easily possible to find out which pcap was produced by
which test, and easily delete the right ones.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index d158678fa6ab..5befd8584a4d 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -212,8 +212,8 @@ if $checksum; then
fi
if $capture; then
- rndh="${ns1:4}"
- mptcp_lib_pr_info "Packet capture files will have this prefix: ${rndh}-"
+ capprefix="mptcp_connect-${ns1:4}"
+ mptcp_lib_pr_info "pcap will have this prefix: ${capprefix}-"
fi
set_ethtool_flags() {
@@ -372,7 +372,7 @@ do_transfer()
capuser="-Z $SUDO_USER"
fi
- local capfile="${rndh}-${connector_ns:0:3}-${listener_ns:0:3}-${cl_proto}-${srv_proto}-${connect_addr}-${port}"
+ local capfile="${capprefix}-${connector_ns:0:3}-${listener_ns:0:3}-${cl_proto}-${srv_proto}-${connect_addr}-${port}"
local capopt="-i any -s 65535 -B 32768 ${capuser}"
ip netns exec ${listener_ns} tcpdump ${capopt} \
--
2.53.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 10/11] selftests: mptcp: simult_flow: test name in pcap file
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
` (8 preceding siblings ...)
2026-06-01 5:22 ` [PATCH net-next 09/11] selftests: mptcp: connect: test name in pcap file Matthieu Baerts (NGI0)
@ 2026-06-01 5:22 ` Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 11/11] selftests: mptcp: pcap: drop most of the payload Matthieu Baerts (NGI0)
10 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Shuah Khan,
linux-kselftest
To be able to easily find out which pcap was produced by which test, the
selftest name is now added to the pcap file, similar to the other tests.
While at it, print the prefix name to be able to find which capture
files have been produced by which test after several runs. This prefix
was not printed anywhere before.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/mptcp/simult_flows.sh | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index 7b9aabe10170..d723261bdc62 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -24,6 +24,7 @@ small=""
sout=""
cout=""
capout=""
+capprefix=""
size=0
usage() {
@@ -70,6 +71,11 @@ setup()
mptcp_lib_ns_init ns1 ns2 ns3
+ if $capture; then
+ capprefix="simult_flows-${ns1:4}"
+ mptcp_lib_pr_info "pcap will have this prefix: ${capprefix}-"
+ fi
+
ip link add ns1eth1 netns "$ns1" type veth peer name ns2eth1 netns "$ns2"
ip link add ns1eth2 netns "$ns1" type veth peer name ns2eth2 netns "$ns2"
ip link add ns2eth3 netns "$ns2" type veth peer name ns3eth1 netns "$ns3"
@@ -136,14 +142,13 @@ do_transfer()
if $capture; then
local capuser
- local rndh="${ns1:4}"
if [ -z $SUDO_USER ] ; then
capuser=""
else
capuser="-Z $SUDO_USER"
fi
- local capfile="${rndh}-${port}"
+ local capfile="${capprefix}-${port}"
local capopt="-i any -s 65535 -B 32768 ${capuser}"
ip netns exec ${ns3} tcpdump ${capopt} -w "${capfile}-listener.pcap" >> "${capout}" 2>&1 &
--
2.53.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH net-next 11/11] selftests: mptcp: pcap: drop most of the payload
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
` (9 preceding siblings ...)
2026-06-01 5:22 ` [PATCH net-next 10/11] selftests: mptcp: simult_flow: " Matthieu Baerts (NGI0)
@ 2026-06-01 5:22 ` Matthieu Baerts (NGI0)
10 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts (NGI0) @ 2026-06-01 5:22 UTC (permalink / raw)
To: Mat Martineau, Geliang Tang, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman
Cc: netdev, mptcp, linux-kernel, Matthieu Baerts (NGI0), Shuah Khan,
linux-kselftest
Limit the size of each captured packet to 108B (IPv4 only) or 128B (a
mix of v4 and v6): this should drop most of the payload that is
generally not needed when debugging an issue.
8 bytes are left in this payload, to be able to inspect the beginning,
just in case.
Please also note that generally, this payload is usually mostly filled
with 0, except at the end. This reduces the .pcap sizes, and reduce IO
usage, which helps debugging issues.
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
To: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
---
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 2 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 2 +-
tools/testing/selftests/net/mptcp/simult_flows.sh | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 5befd8584a4d..7a2a851fa0ad 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -373,7 +373,7 @@ do_transfer()
fi
local capfile="${capprefix}-${connector_ns:0:3}-${listener_ns:0:3}-${cl_proto}-${srv_proto}-${connect_addr}-${port}"
- local capopt="-i any -s 65535 -B 32768 ${capuser}"
+ local capopt="-i any -s 128 -B 32768 ${capuser}"
ip netns exec ${listener_ns} tcpdump ${capopt} \
-w "${capfile}-listener.pcap" >> "${capout}" 2>&1 &
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 82c0f7df3be2..600eddb1796f 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -971,7 +971,7 @@ cond_start_capture()
capfile=$(printf "mp_join-%02u-%s.pcap" "$MPTCP_LIB_TEST_COUNTER" "$ns")
echo "Capturing traffic for test $MPTCP_LIB_TEST_COUNTER into $capfile"
- ip netns exec "$ns" tcpdump -i any -s 65535 -B 32768 $capuser -w "$capfile" > "$capout" 2>&1 &
+ ip netns exec "$ns" tcpdump -i any -s 128 -B 32768 $capuser -w "$capfile" > "$capout" 2>&1 &
cappid=$!
sleep 1
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index d723261bdc62..3ea3d1efe32e 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -149,7 +149,7 @@ do_transfer()
fi
local capfile="${capprefix}-${port}"
- local capopt="-i any -s 65535 -B 32768 ${capuser}"
+ local capopt="-i any -s 108 -B 32768 ${capuser}"
ip netns exec ${ns3} tcpdump ${capopt} -w "${capfile}-listener.pcap" >> "${capout}" 2>&1 &
local cappid_listener=$!
--
2.53.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 04/11] mptcp: introduce add_addr_v6_port_drop_ts sysctl knob
2026-06-01 5:22 ` [PATCH net-next 04/11] mptcp: introduce add_addr_v6_port_drop_ts sysctl knob Matthieu Baerts (NGI0)
@ 2026-06-01 5:44 ` Eric Dumazet
2026-06-01 6:04 ` Matthieu Baerts
0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2026-06-01 5:44 UTC (permalink / raw)
To: Matthieu Baerts (NGI0)
Cc: Mat Martineau, Geliang Tang, David S. Miller, Jakub Kicinski,
Paolo Abeni, Simon Horman, netdev, mptcp, linux-kernel,
Jonathan Corbet, Shuah Khan, linux-doc, linux-kselftest
On Sun, May 31, 2026 at 10:24 PM Matthieu Baerts (NGI0)
<matttbe@kernel.org> wrote:
>
> This sysctl is going to be used in the next commits to drop TCP
> timestamps option, to be able to send an ADD_ADDR with a v6 IP address
> and a port number. It is enabled by default.
>
> This knob is explicitly disabled in the MPTCP Join selftest, with the
> "signal addr list progresses after tx drop" subtest, to continue
> verifying the previous behaviour where the ADD_ADDR is not sent due to a
> lack of space.
>
> While at it, move syn_retrans_before_tcp_fallback down from struct
> mptcp_pernet, to avoid creating another 3 bytes hole.
>
> Reviewed-by: Mat Martineau <martineau@kernel.org>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
...
> };
> @@ -94,6 +95,11 @@ const char *mptcp_get_scheduler(const struct net *net)
> return mptcp_get_pernet(net)->scheduler;
> }
>
> +unsigned int mptcp_add_addr_v6_port_drop_ts(const struct net *net)
> +{
> + return mptcp_get_pernet(net)->add_addr_v6_port_drop_ts;
> +}
Please use READ_ONCE() over sysctls.
This will avoid future patches from KCSAN bots.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 04/11] mptcp: introduce add_addr_v6_port_drop_ts sysctl knob
2026-06-01 5:44 ` Eric Dumazet
@ 2026-06-01 6:04 ` Matthieu Baerts
0 siblings, 0 replies; 19+ messages in thread
From: Matthieu Baerts @ 2026-06-01 6:04 UTC (permalink / raw)
To: Eric Dumazet
Cc: Mat Martineau, Geliang Tang, David S. Miller, Jakub Kicinski,
Paolo Abeni, Simon Horman, netdev, mptcp, linux-kernel,
Jonathan Corbet, Shuah Khan, linux-doc, linux-kselftest
Hi Eric,
On 01/06/2026 15:44, Eric Dumazet wrote:
> On Sun, May 31, 2026 at 10:24 PM Matthieu Baerts (NGI0)
> <matttbe@kernel.org> wrote:
>>
>> This sysctl is going to be used in the next commits to drop TCP
>> timestamps option, to be able to send an ADD_ADDR with a v6 IP address
>> and a port number. It is enabled by default.
>>
>> This knob is explicitly disabled in the MPTCP Join selftest, with the
>> "signal addr list progresses after tx drop" subtest, to continue
>> verifying the previous behaviour where the ADD_ADDR is not sent due to a
>> lack of space.
>>
>> While at it, move syn_retrans_before_tcp_fallback down from struct
>> mptcp_pernet, to avoid creating another 3 bytes hole.
>>
>> Reviewed-by: Mat Martineau <martineau@kernel.org>
>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>
> ...
>
>> };
>> @@ -94,6 +95,11 @@ const char *mptcp_get_scheduler(const struct net *net)
>> return mptcp_get_pernet(net)->scheduler;
>> }
>>
>> +unsigned int mptcp_add_addr_v6_port_drop_ts(const struct net *net)
>> +{
>> + return mptcp_get_pernet(net)->add_addr_v6_port_drop_ts;
>> +}
>
> Please use READ_ONCE() over sysctls.
> This will avoid future patches from KCSAN bots.
Good point, I will do that.
I see READ_ONCE() should be used over all other MPTCP sysctls. I can
send fixes to net for those.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets
2026-06-01 5:22 ` [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets Matthieu Baerts (NGI0)
@ 2026-06-01 6:04 ` Eric Dumazet
2026-06-01 6:52 ` Matthieu Baerts
0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2026-06-01 6:04 UTC (permalink / raw)
To: Matthieu Baerts (NGI0)
Cc: Mat Martineau, Geliang Tang, David S. Miller, Jakub Kicinski,
Paolo Abeni, Simon Horman, netdev, mptcp, linux-kernel,
Neal Cardwell, Kuniyuki Iwashima
On Sun, May 31, 2026 at 10:25 PM Matthieu Baerts (NGI0)
<matttbe@kernel.org> wrote:
>
> With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
> taking 30 bytes, the 40-byte limit for the TCP options is reached. In
> this case, it is then not possible to send the address signal.
>
> The idea is to let MPTCP dropping the TCP-timestamps option for some
> specific packets, to be able to send some specific pure ACK carrying >28
> bytes of MPTCP options, like with this specific ADD_ADDR. A new
> parameter is passed from tcp_established_options to the MPTCP side to
> indicate if the TCP TS option is used, and if it should be dropped. The
> next commit implements the part on MPTCP side, but split into two
> patches to help TCP maintainers to identify the modifications on TCP
> side. This feature will be controlled by a new add_addr_v6_port_drop_ts
> MPTCP sysctl knob.
>
> It is important to keep in mind that dropping the TCP timestamps option
> for one packet of the connection could eventually disrupt some
> middleboxes: even if it should be unlikely, they could drop the packet
> or even block the connection. That's why this new feature will be
> controlled by a sysctl knob.
>
> Note that it would be technically possible to squeeze both options into
> the header if the ADD_ADDR is first written, and then the TCP timestamps
> without the NOPs preceding it. But this means more modifications on TCP
> side, plus some middleboxes could still be disrupted by that.
>
> About the implementation, instead of passing a new boolean (drop_ts),
> another option would be to pass the whole option structure (opts),
> but 'struct tcp_out_options' is currently defined in tcp_output.c, and
> would need to be exported. Plus that means the removal of the TCP TS
> option would be done on the MPTCP side, and not here on the TCP side.
> It feels clearer to remove other TCP options from the TCP side, than
> hiding that from the MPTCP side.
>
> Yet an other alternative would be to pass the size already taken by the
> other TCP options, and have a way to drop them all when needed. But this
> feels better to target only the timestamps option where dropping it
> should be safe, even if it is currently the only option that would be
> set before MPTCP, when MPTCP is used.
>
> Reviewed-by: Mat Martineau <martineau@kernel.org>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> To: Neal Cardwell <ncardwell@google.com>
> To: Kuniyuki Iwashima <kuniyu@google.com>
> ---
> include/net/mptcp.h | 3 ++-
> net/ipv4/tcp_output.c | 6 +++++-
> net/mptcp/options.c | 3 ++-
> 3 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> index f7263fe2a2e4..000b6593bfa4 100644
> --- a/include/net/mptcp.h
> +++ b/include/net/mptcp.h
> @@ -151,7 +151,7 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
> struct mptcp_out_options *opts);
> bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
> unsigned int *size, unsigned int remaining,
> - struct mptcp_out_options *opts);
> + bool *drop_ts, struct mptcp_out_options *opts);
> bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
>
> void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
> @@ -270,6 +270,7 @@ static inline bool mptcp_established_options(struct sock *sk,
> struct sk_buff *skb,
> unsigned int *size,
> unsigned int remaining,
> + bool *drop_ts,
> struct mptcp_out_options *opts)
> {
> return false;
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index ef0c10cd31c7..53ee4c8f5f8c 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1181,12 +1181,16 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
> */
> if (sk_is_mptcp(sk)) {
> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
> + bool drop_ts = opts->options & OPTION_TS;
> unsigned int opt_size = 0;
>
> if (mptcp_established_options(sk, skb, &opt_size, remaining,
> - &opts->mptcp)) {
> + &drop_ts, &opts->mptcp)) {
> opts->options |= OPTION_MPTCP;
> size += opt_size;
> +
> + if (drop_ts)
> + opts->options &= ~OPTION_TS;
> }
> }
Passing local variables' addresses forces the compiler to use a stack
canary in this hot function, even for non-MPTCP flows.
I was about to test the following patch, which removes the current
stack canary caused by MPTCP :/
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-92 (-92)
Function old new delta
tcp_options_write.isra 1423 1407 -16
mptcp_established_options 2746 2720 -26
tcp_established_options 553 503 -50
Total: Before=22110750, After=22110658, chg -0.00%
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index f7263fe2a2e40b507257c3720cc2d78d37357d6d..f55838fd6cca308908607243735f8768540bb419
100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -149,9 +149,9 @@ bool mptcp_syn_options(struct sock *sk, const
struct sk_buff *skb,
unsigned int *size, struct mptcp_out_options *opts);
bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
struct mptcp_out_options *opts);
-bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
- unsigned int *size, unsigned int remaining,
- struct mptcp_out_options *opts);
+u32 mptcp_established_options(struct sock *sk, struct sk_buff *skb,
+ unsigned int remaining,
+ struct mptcp_out_options *opts);
bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
@@ -266,13 +266,13 @@ static inline bool mptcp_synack_options(const
struct request_sock *req,
return false;
}
-static inline bool mptcp_established_options(struct sock *sk,
- struct sk_buff *skb,
- unsigned int *size,
- unsigned int remaining,
- struct mptcp_out_options *opts)
+static inline u32 mptcp_established_options(struct sock *sk,
+ struct sk_buff *skb,
+ unsigned int *size,
+ unsigned int remaining,
+ struct mptcp_out_options *opts)
{
- return false;
+ return 0;
}
static inline bool mptcp_incoming_options(struct sock *sk,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ef0c10cd31c71ff585a937fde37f2b08b1214b5a..594ec6ba02d5413d43842f79aefbf4d8355c4f3f
100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1183,10 +1183,11 @@ static unsigned int
tcp_established_options(struct sock *sk, struct sk_buff *skb
unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
unsigned int opt_size = 0;
- if (mptcp_established_options(sk, skb, &opt_size, remaining,
- &opts->mptcp)) {
+ opt_size = mptcp_established_options(sk, skb, remaining,
+ &opts->mptcp);
+ if (opt_size) {
opts->options |= OPTION_MPTCP;
- size += opt_size;
+ size += (opt_size & 63);
}
}
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index d96130e49942e2fb878cd1897ad43c1d420fb233..503ebd71d562134431cf0ea33276c035bddae00c
100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -49,7 +49,7 @@ static struct mptcp_pernet *mptcp_get_pernet(const
struct net *net)
int mptcp_is_enabled(const struct net *net)
{
- return mptcp_get_pernet(net)->mptcp_enabled;
+ return READ_ONCE(mptcp_get_pernet(net)->mptcp_enabled);
}
unsigned int mptcp_get_add_addr_timeout(const struct net *net)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 8a1c5698983cff3082d68290626dd8f1e044527f..4ac01cecb6bd965f1f95f6f2342515eb2b7591f5
100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -836,15 +836,15 @@ static bool
mptcp_established_options_mp_fail(struct sock *sk,
return true;
}
-bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
- unsigned int *size, unsigned int remaining,
- struct mptcp_out_options *opts)
+u32 mptcp_established_options(struct sock *sk, struct sk_buff *skb,
+ unsigned int remaining,
+ struct mptcp_out_options *opts)
{
struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
struct mptcp_sock *msk = mptcp_sk(subflow->conn);
unsigned int opt_size = 0;
+ u32 total_size = 0;
bool snd_data_fin;
- bool ret = false;
opts->suboptions = 0;
@@ -852,34 +852,33 @@ bool mptcp_established_options(struct sock *sk,
struct sk_buff *skb,
* option space.
*/
if (unlikely(__mptcp_check_fallback(msk) &&
!mptcp_check_infinite_map(skb)))
- return true;
+ return 64;
if (unlikely(skb && TCP_SKB_CB(skb)->tcp_flags & TCPHDR_RST)) {
if (mptcp_established_options_fastclose(sk, &opt_size,
remaining, opts) ||
mptcp_established_options_mp_fail(sk, &opt_size,
remaining, opts)) {
- *size += opt_size;
+ total_size += opt_size;
remaining -= opt_size;
}
/* MP_RST can be used with MP_FASTCLOSE and MP_FAIL if
there is room */
if (mptcp_established_options_rst(sk, skb, &opt_size,
remaining, opts)) {
- *size += opt_size;
+ total_size += opt_size;
remaining -= opt_size;
}
- return true;
+ return 64 + total_size;;
}
snd_data_fin = mptcp_data_fin_enabled(msk);
if (mptcp_established_options_mp(sk, skb, snd_data_fin,
&opt_size, opts))
- ret = true;
+ total_size += 64;
else if (mptcp_established_options_dss(sk, skb, snd_data_fin,
&opt_size, opts)) {
unsigned int mp_fail_size;
- ret = true;
if (mptcp_established_options_mp_fail(sk, &mp_fail_size,
remaining -
opt_size, opts)) {
- *size += opt_size + mp_fail_size;
+ total_size += opt_size + mp_fail_size;
remaining -= opt_size - mp_fail_size;
- return true;
+ return total_size;
}
}
@@ -887,27 +886,24 @@ bool mptcp_established_options(struct sock *sk,
struct sk_buff *skb,
* TCP option space would be fatal
*/
if (WARN_ON_ONCE(opt_size > remaining))
- return false;
+ return 0;
- *size += opt_size;
+ total_size += opt_size;
remaining -= opt_size;
if (mptcp_established_options_add_addr(sk, skb, &opt_size,
remaining, opts)) {
- *size += opt_size;
+ total_size += opt_size;
remaining -= opt_size;
- ret = true;
} else if (mptcp_established_options_rm_addr(sk, &opt_size,
remaining, opts)) {
- *size += opt_size;
+ total_size += opt_size;
remaining -= opt_size;
- ret = true;
}
if (mptcp_established_options_mp_prio(sk, &opt_size, remaining, opts)) {
- *size += opt_size;
+ total_size += opt_size;
remaining -= opt_size;
- ret = true;
}
- return ret;
+ return total_size;
}
bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets
2026-06-01 6:04 ` Eric Dumazet
@ 2026-06-01 6:52 ` Matthieu Baerts
2026-06-01 7:26 ` Eric Dumazet
0 siblings, 1 reply; 19+ messages in thread
From: Matthieu Baerts @ 2026-06-01 6:52 UTC (permalink / raw)
To: Eric Dumazet
Cc: Mat Martineau, Geliang Tang, David S. Miller, Jakub Kicinski,
Paolo Abeni, Simon Horman, netdev, mptcp, linux-kernel,
Neal Cardwell, Kuniyuki Iwashima
Hi Eric,
Thank you for the review!
On 01/06/2026 16:04, Eric Dumazet wrote:
> On Sun, May 31, 2026 at 10:25 PM Matthieu Baerts (NGI0)
> <matttbe@kernel.org> wrote:
>>
>> With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
>> taking 30 bytes, the 40-byte limit for the TCP options is reached. In
>> this case, it is then not possible to send the address signal.
>>
>> The idea is to let MPTCP dropping the TCP-timestamps option for some
>> specific packets, to be able to send some specific pure ACK carrying >28
>> bytes of MPTCP options, like with this specific ADD_ADDR. A new
>> parameter is passed from tcp_established_options to the MPTCP side to
>> indicate if the TCP TS option is used, and if it should be dropped. The
>> next commit implements the part on MPTCP side, but split into two
>> patches to help TCP maintainers to identify the modifications on TCP
>> side. This feature will be controlled by a new add_addr_v6_port_drop_ts
>> MPTCP sysctl knob.
>>
>> It is important to keep in mind that dropping the TCP timestamps option
>> for one packet of the connection could eventually disrupt some
>> middleboxes: even if it should be unlikely, they could drop the packet
>> or even block the connection. That's why this new feature will be
>> controlled by a sysctl knob.
>>
>> Note that it would be technically possible to squeeze both options into
>> the header if the ADD_ADDR is first written, and then the TCP timestamps
>> without the NOPs preceding it. But this means more modifications on TCP
>> side, plus some middleboxes could still be disrupted by that.
>>
>> About the implementation, instead of passing a new boolean (drop_ts),
>> another option would be to pass the whole option structure (opts),
>> but 'struct tcp_out_options' is currently defined in tcp_output.c, and
>> would need to be exported. Plus that means the removal of the TCP TS
>> option would be done on the MPTCP side, and not here on the TCP side.
>> It feels clearer to remove other TCP options from the TCP side, than
>> hiding that from the MPTCP side.
>>
>> Yet an other alternative would be to pass the size already taken by the
>> other TCP options, and have a way to drop them all when needed. But this
>> feels better to target only the timestamps option where dropping it
>> should be safe, even if it is currently the only option that would be
>> set before MPTCP, when MPTCP is used.
>>
>> Reviewed-by: Mat Martineau <martineau@kernel.org>
>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> ---
>> To: Neal Cardwell <ncardwell@google.com>
>> To: Kuniyuki Iwashima <kuniyu@google.com>
>> ---
>> include/net/mptcp.h | 3 ++-
>> net/ipv4/tcp_output.c | 6 +++++-
>> net/mptcp/options.c | 3 ++-
>> 3 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
>> index f7263fe2a2e4..000b6593bfa4 100644
>> --- a/include/net/mptcp.h
>> +++ b/include/net/mptcp.h
>> @@ -151,7 +151,7 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
>> struct mptcp_out_options *opts);
>> bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
>> unsigned int *size, unsigned int remaining,
>> - struct mptcp_out_options *opts);
>> + bool *drop_ts, struct mptcp_out_options *opts);
>> bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
>>
>> void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
>> @@ -270,6 +270,7 @@ static inline bool mptcp_established_options(struct sock *sk,
>> struct sk_buff *skb,
>> unsigned int *size,
>> unsigned int remaining,
>> + bool *drop_ts,
>> struct mptcp_out_options *opts)
>> {
>> return false;
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index ef0c10cd31c7..53ee4c8f5f8c 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -1181,12 +1181,16 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
>> */
>> if (sk_is_mptcp(sk)) {
>> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
>> + bool drop_ts = opts->options & OPTION_TS;
>> unsigned int opt_size = 0;
>>
>> if (mptcp_established_options(sk, skb, &opt_size, remaining,
>> - &opts->mptcp)) {
>> + &drop_ts, &opts->mptcp)) {
>> opts->options |= OPTION_MPTCP;
>> size += opt_size;
>> +
>> + if (drop_ts)
>> + opts->options &= ~OPTION_TS;
>> }
>> }
>
> Passing local variables' addresses forces the compiler to use a stack
> canary in this hot function, even for non-MPTCP flows.
>
> I was about to test the following patch, which removes the current
> stack canary caused by MPTCP :/
Sorry, I didn't know you were planning to do that.
Would that be OK for you if I use an unused bit in opts->mptcp? It's a
bit "hackish", but it avoids adding a new local variable address. Or do
you have another idea?
The modifications in net/ipv4/tcp_output.c would then be limited to:
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ef0c10cd31c7..f4edc9c4f3fc 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1181,12 +1181,18 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
*/
if (sk_is_mptcp(sk)) {
unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
+ bool has_ts = opts->options & OPTION_TS;
unsigned int opt_size = 0;
if (mptcp_established_options(sk, skb, &opt_size, remaining,
- &opts->mptcp)) {
+ has_ts, &opts->mptcp)) {
opts->options |= OPTION_MPTCP;
size += opt_size;
+
+#if IS_ENABLED(CONFIG_MPTCP)
+ if (opts->mptcp.drop_ts)
+ opts->options &= ~OPTION_TS;
+#endif
}
}
I can also avoid adding a new parameter in mptcp_established_options
(bool has_ts) by setting opts->mptcp.drop_ts before calling it, but
that's clearer with this new parameter I think.
> $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
> add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-92 (-92)
> Function old new delta
> tcp_options_write.isra 1423 1407 -16
> mptcp_established_options 2746 2720 -26
> tcp_established_options 553 503 -50
> Total: Before=22110750, After=22110658, chg -0.00%
>
Good reduction!
(...)
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index ef0c10cd31c71ff585a937fde37f2b08b1214b5a..594ec6ba02d5413d43842f79aefbf4d8355c4f3f
> 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1183,10 +1183,11 @@ static unsigned int
> tcp_established_options(struct sock *sk, struct sk_buff *skb
> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
> unsigned int opt_size = 0;
>
> - if (mptcp_established_options(sk, skb, &opt_size, remaining,
> - &opts->mptcp)) {
> + opt_size = mptcp_established_options(sk, skb, remaining,
> + &opts->mptcp);
> + if (opt_size) {
> opts->options |= OPTION_MPTCP;
> - size += opt_size;
> + size += (opt_size & 63);
Nice trick! What about returning a negative number when the MPTCP option
is not needed? Just to avoid playing with masks in the code?
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets
2026-06-01 6:52 ` Matthieu Baerts
@ 2026-06-01 7:26 ` Eric Dumazet
2026-06-01 7:36 ` Matthieu Baerts
2026-06-01 8:07 ` Matthieu Baerts
0 siblings, 2 replies; 19+ messages in thread
From: Eric Dumazet @ 2026-06-01 7:26 UTC (permalink / raw)
To: Matthieu Baerts
Cc: Mat Martineau, Geliang Tang, David S. Miller, Jakub Kicinski,
Paolo Abeni, Simon Horman, netdev, mptcp, linux-kernel,
Neal Cardwell, Kuniyuki Iwashima
On Sun, May 31, 2026 at 11:52 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>
> Hi Eric,
>
> Thank you for the review!
>
> On 01/06/2026 16:04, Eric Dumazet wrote:
> > On Sun, May 31, 2026 at 10:25 PM Matthieu Baerts (NGI0)
> > <matttbe@kernel.org> wrote:
> >>
> >> With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
> >> taking 30 bytes, the 40-byte limit for the TCP options is reached. In
> >> this case, it is then not possible to send the address signal.
> >>
> >> The idea is to let MPTCP dropping the TCP-timestamps option for some
> >> specific packets, to be able to send some specific pure ACK carrying >28
> >> bytes of MPTCP options, like with this specific ADD_ADDR. A new
> >> parameter is passed from tcp_established_options to the MPTCP side to
> >> indicate if the TCP TS option is used, and if it should be dropped. The
> >> next commit implements the part on MPTCP side, but split into two
> >> patches to help TCP maintainers to identify the modifications on TCP
> >> side. This feature will be controlled by a new add_addr_v6_port_drop_ts
> >> MPTCP sysctl knob.
> >>
> >> It is important to keep in mind that dropping the TCP timestamps option
> >> for one packet of the connection could eventually disrupt some
> >> middleboxes: even if it should be unlikely, they could drop the packet
> >> or even block the connection. That's why this new feature will be
> >> controlled by a sysctl knob.
> >>
> >> Note that it would be technically possible to squeeze both options into
> >> the header if the ADD_ADDR is first written, and then the TCP timestamps
> >> without the NOPs preceding it. But this means more modifications on TCP
> >> side, plus some middleboxes could still be disrupted by that.
> >>
> >> About the implementation, instead of passing a new boolean (drop_ts),
> >> another option would be to pass the whole option structure (opts),
> >> but 'struct tcp_out_options' is currently defined in tcp_output.c, and
> >> would need to be exported. Plus that means the removal of the TCP TS
> >> option would be done on the MPTCP side, and not here on the TCP side.
> >> It feels clearer to remove other TCP options from the TCP side, than
> >> hiding that from the MPTCP side.
> >>
> >> Yet an other alternative would be to pass the size already taken by the
> >> other TCP options, and have a way to drop them all when needed. But this
> >> feels better to target only the timestamps option where dropping it
> >> should be safe, even if it is currently the only option that would be
> >> set before MPTCP, when MPTCP is used.
> >>
> >> Reviewed-by: Mat Martineau <martineau@kernel.org>
> >> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> >> ---
> >> To: Neal Cardwell <ncardwell@google.com>
> >> To: Kuniyuki Iwashima <kuniyu@google.com>
> >> ---
> >> include/net/mptcp.h | 3 ++-
> >> net/ipv4/tcp_output.c | 6 +++++-
> >> net/mptcp/options.c | 3 ++-
> >> 3 files changed, 9 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> >> index f7263fe2a2e4..000b6593bfa4 100644
> >> --- a/include/net/mptcp.h
> >> +++ b/include/net/mptcp.h
> >> @@ -151,7 +151,7 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
> >> struct mptcp_out_options *opts);
> >> bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
> >> unsigned int *size, unsigned int remaining,
> >> - struct mptcp_out_options *opts);
> >> + bool *drop_ts, struct mptcp_out_options *opts);
> >> bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
> >>
> >> void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
> >> @@ -270,6 +270,7 @@ static inline bool mptcp_established_options(struct sock *sk,
> >> struct sk_buff *skb,
> >> unsigned int *size,
> >> unsigned int remaining,
> >> + bool *drop_ts,
> >> struct mptcp_out_options *opts)
> >> {
> >> return false;
> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> >> index ef0c10cd31c7..53ee4c8f5f8c 100644
> >> --- a/net/ipv4/tcp_output.c
> >> +++ b/net/ipv4/tcp_output.c
> >> @@ -1181,12 +1181,16 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
> >> */
> >> if (sk_is_mptcp(sk)) {
> >> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
> >> + bool drop_ts = opts->options & OPTION_TS;
> >> unsigned int opt_size = 0;
> >>
> >> if (mptcp_established_options(sk, skb, &opt_size, remaining,
> >> - &opts->mptcp)) {
> >> + &drop_ts, &opts->mptcp)) {
> >> opts->options |= OPTION_MPTCP;
> >> size += opt_size;
> >> +
> >> + if (drop_ts)
> >> + opts->options &= ~OPTION_TS;
> >> }
> >> }
> >
> > Passing local variables' addresses forces the compiler to use a stack
> > canary in this hot function, even for non-MPTCP flows.
> >
> > I was about to test the following patch, which removes the current
> > stack canary caused by MPTCP :/
>
> Sorry, I didn't know you were planning to do that.
>
> Would that be OK for you if I use an unused bit in opts->mptcp? It's a
> bit "hackish", but it avoids adding a new local variable address. Or do
> you have another idea?
>
> The modifications in net/ipv4/tcp_output.c would then be limited to:
>
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index ef0c10cd31c7..f4edc9c4f3fc 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1181,12 +1181,18 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
> */
> if (sk_is_mptcp(sk)) {
> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
> + bool has_ts = opts->options & OPTION_TS;
> unsigned int opt_size = 0;
>
> if (mptcp_established_options(sk, skb, &opt_size, remaining,
> - &opts->mptcp)) {
> + has_ts, &opts->mptcp)) {
> opts->options |= OPTION_MPTCP;
> size += opt_size;
> +
> +#if IS_ENABLED(CONFIG_MPTCP)
> + if (opts->mptcp.drop_ts)
> + opts->options &= ~OPTION_TS;
> +#endif
SGTM, but maybe the IS_ENABLED() is not needed in this block,
guarded by if (sk_is_mptcp(sk)) ?
Also I am unsure opts->mptcp.drop_ts is cleared already before
reaching tcp_established_options()?
> }
> }
>
>
> I can also avoid adding a new parameter in mptcp_established_options
> (bool has_ts) by setting opts->mptcp.drop_ts before calling it, but
> that's clearer with this new parameter I think.
>
> > $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
> > add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-92 (-92)
> > Function old new delta
> > tcp_options_write.isra 1423 1407 -16
> > mptcp_established_options 2746 2720 -26
> > tcp_established_options 553 503 -50
> > Total: Before=22110750, After=22110658, chg -0.00%
> >
>
> Good reduction!
>
> (...)
>
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index ef0c10cd31c71ff585a937fde37f2b08b1214b5a..594ec6ba02d5413d43842f79aefbf4d8355c4f3f
> > 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -1183,10 +1183,11 @@ static unsigned int
> > tcp_established_options(struct sock *sk, struct sk_buff *skb
> > unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
> > unsigned int opt_size = 0;
> >
> > - if (mptcp_established_options(sk, skb, &opt_size, remaining,
> > - &opts->mptcp)) {
> > + opt_size = mptcp_established_options(sk, skb, remaining,
> > + &opts->mptcp);
> > + if (opt_size) {
> > opts->options |= OPTION_MPTCP;
> > - size += opt_size;
> > + size += (opt_size & 63);
>
> Nice trick! What about returning a negative number when the MPTCP option
> is not needed? Just to avoid playing with masks in the code?
Yes, that would work, thanks.
I can provide a patch in a couple of hours.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets
2026-06-01 7:26 ` Eric Dumazet
@ 2026-06-01 7:36 ` Matthieu Baerts
2026-06-01 8:07 ` Matthieu Baerts
1 sibling, 0 replies; 19+ messages in thread
From: Matthieu Baerts @ 2026-06-01 7:36 UTC (permalink / raw)
To: Eric Dumazet
Cc: Mat Martineau, Geliang Tang, David S. Miller, Jakub Kicinski,
Paolo Abeni, Simon Horman, netdev, mptcp, linux-kernel,
Neal Cardwell, Kuniyuki Iwashima
On 01/06/2026 17:26, Eric Dumazet wrote:
> On Sun, May 31, 2026 at 11:52 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>>
>> Hi Eric,
>>
>> Thank you for the review!
>>
>> On 01/06/2026 16:04, Eric Dumazet wrote:
>>> On Sun, May 31, 2026 at 10:25 PM Matthieu Baerts (NGI0)
>>> <matttbe@kernel.org> wrote:
>>>>
>>>> With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
>>>> taking 30 bytes, the 40-byte limit for the TCP options is reached. In
>>>> this case, it is then not possible to send the address signal.
>>>>
>>>> The idea is to let MPTCP dropping the TCP-timestamps option for some
>>>> specific packets, to be able to send some specific pure ACK carrying >28
>>>> bytes of MPTCP options, like with this specific ADD_ADDR. A new
>>>> parameter is passed from tcp_established_options to the MPTCP side to
>>>> indicate if the TCP TS option is used, and if it should be dropped. The
>>>> next commit implements the part on MPTCP side, but split into two
>>>> patches to help TCP maintainers to identify the modifications on TCP
>>>> side. This feature will be controlled by a new add_addr_v6_port_drop_ts
>>>> MPTCP sysctl knob.
>>>>
>>>> It is important to keep in mind that dropping the TCP timestamps option
>>>> for one packet of the connection could eventually disrupt some
>>>> middleboxes: even if it should be unlikely, they could drop the packet
>>>> or even block the connection. That's why this new feature will be
>>>> controlled by a sysctl knob.
>>>>
>>>> Note that it would be technically possible to squeeze both options into
>>>> the header if the ADD_ADDR is first written, and then the TCP timestamps
>>>> without the NOPs preceding it. But this means more modifications on TCP
>>>> side, plus some middleboxes could still be disrupted by that.
>>>>
>>>> About the implementation, instead of passing a new boolean (drop_ts),
>>>> another option would be to pass the whole option structure (opts),
>>>> but 'struct tcp_out_options' is currently defined in tcp_output.c, and
>>>> would need to be exported. Plus that means the removal of the TCP TS
>>>> option would be done on the MPTCP side, and not here on the TCP side.
>>>> It feels clearer to remove other TCP options from the TCP side, than
>>>> hiding that from the MPTCP side.
>>>>
>>>> Yet an other alternative would be to pass the size already taken by the
>>>> other TCP options, and have a way to drop them all when needed. But this
>>>> feels better to target only the timestamps option where dropping it
>>>> should be safe, even if it is currently the only option that would be
>>>> set before MPTCP, when MPTCP is used.
>>>>
>>>> Reviewed-by: Mat Martineau <martineau@kernel.org>
>>>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>>> ---
>>>> To: Neal Cardwell <ncardwell@google.com>
>>>> To: Kuniyuki Iwashima <kuniyu@google.com>
>>>> ---
>>>> include/net/mptcp.h | 3 ++-
>>>> net/ipv4/tcp_output.c | 6 +++++-
>>>> net/mptcp/options.c | 3 ++-
>>>> 3 files changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
>>>> index f7263fe2a2e4..000b6593bfa4 100644
>>>> --- a/include/net/mptcp.h
>>>> +++ b/include/net/mptcp.h
>>>> @@ -151,7 +151,7 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
>>>> struct mptcp_out_options *opts);
>>>> bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
>>>> unsigned int *size, unsigned int remaining,
>>>> - struct mptcp_out_options *opts);
>>>> + bool *drop_ts, struct mptcp_out_options *opts);
>>>> bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
>>>>
>>>> void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
>>>> @@ -270,6 +270,7 @@ static inline bool mptcp_established_options(struct sock *sk,
>>>> struct sk_buff *skb,
>>>> unsigned int *size,
>>>> unsigned int remaining,
>>>> + bool *drop_ts,
>>>> struct mptcp_out_options *opts)
>>>> {
>>>> return false;
>>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>>> index ef0c10cd31c7..53ee4c8f5f8c 100644
>>>> --- a/net/ipv4/tcp_output.c
>>>> +++ b/net/ipv4/tcp_output.c
>>>> @@ -1181,12 +1181,16 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
>>>> */
>>>> if (sk_is_mptcp(sk)) {
>>>> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
>>>> + bool drop_ts = opts->options & OPTION_TS;
>>>> unsigned int opt_size = 0;
>>>>
>>>> if (mptcp_established_options(sk, skb, &opt_size, remaining,
>>>> - &opts->mptcp)) {
>>>> + &drop_ts, &opts->mptcp)) {
>>>> opts->options |= OPTION_MPTCP;
>>>> size += opt_size;
>>>> +
>>>> + if (drop_ts)
>>>> + opts->options &= ~OPTION_TS;
>>>> }
>>>> }
>>>
>>> Passing local variables' addresses forces the compiler to use a stack
>>> canary in this hot function, even for non-MPTCP flows.
>>>
>>> I was about to test the following patch, which removes the current
>>> stack canary caused by MPTCP :/
>>
>> Sorry, I didn't know you were planning to do that.
>>
>> Would that be OK for you if I use an unused bit in opts->mptcp? It's a
>> bit "hackish", but it avoids adding a new local variable address. Or do
>> you have another idea?
>>
>> The modifications in net/ipv4/tcp_output.c would then be limited to:
>>
>>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index ef0c10cd31c7..f4edc9c4f3fc 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -1181,12 +1181,18 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
>> */
>> if (sk_is_mptcp(sk)) {
>> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
>> + bool has_ts = opts->options & OPTION_TS;
>> unsigned int opt_size = 0;
>>
>> if (mptcp_established_options(sk, skb, &opt_size, remaining,
>> - &opts->mptcp)) {
>> + has_ts, &opts->mptcp)) {
>> opts->options |= OPTION_MPTCP;
>> size += opt_size;
>> +
>> +#if IS_ENABLED(CONFIG_MPTCP)
>> + if (opts->mptcp.drop_ts)
>> + opts->options &= ~OPTION_TS;
>> +#endif
>
> SGTM, but maybe the IS_ENABLED() is not needed in this block,
> guarded by if (sk_is_mptcp(sk)) ?
I didn't think about that, I will check.
> Also I am unsure opts->mptcp.drop_ts is cleared already before
> reaching tcp_established_options()?
Indeed, it is not. I explicitly reset it in mptcp_established_options,
but I could do it here, that would be clearer.
>> }
>> }
>>
>>
>> I can also avoid adding a new parameter in mptcp_established_options
>> (bool has_ts) by setting opts->mptcp.drop_ts before calling it, but
>> that's clearer with this new parameter I think.
>>
>>> $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
>>> add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-92 (-92)
>>> Function old new delta
>>> tcp_options_write.isra 1423 1407 -16
>>> mptcp_established_options 2746 2720 -26
>>> tcp_established_options 553 503 -50
>>> Total: Before=22110750, After=22110658, chg -0.00%
>>>
>>
>> Good reduction!
>>
>> (...)
>>
>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>> index ef0c10cd31c71ff585a937fde37f2b08b1214b5a..594ec6ba02d5413d43842f79aefbf4d8355c4f3f
>>> 100644
>>> --- a/net/ipv4/tcp_output.c
>>> +++ b/net/ipv4/tcp_output.c
>>> @@ -1183,10 +1183,11 @@ static unsigned int
>>> tcp_established_options(struct sock *sk, struct sk_buff *skb
>>> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
>>> unsigned int opt_size = 0;
>>>
>>> - if (mptcp_established_options(sk, skb, &opt_size, remaining,
>>> - &opts->mptcp)) {
>>> + opt_size = mptcp_established_options(sk, skb, remaining,
>>> + &opts->mptcp);
>>> + if (opt_size) {
>>> opts->options |= OPTION_MPTCP;
>>> - size += opt_size;
>>> + size += (opt_size & 63);
>>
>> Nice trick! What about returning a negative number when the MPTCP option
>> is not needed? Just to avoid playing with masks in the code?
>
> Yes, that would work, thanks.
>
> I can provide a patch in a couple of hours.
No hurry, thank you! I will wait for your patches to be applied before
sending a v2.
(While at it, no need to initialise opt_size to 0 here above, and there
was a double ";;" in mptcp_established_options, around "return 64 +
total_size;;" but this code will change anyway.)
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets
2026-06-01 7:26 ` Eric Dumazet
2026-06-01 7:36 ` Matthieu Baerts
@ 2026-06-01 8:07 ` Matthieu Baerts
1 sibling, 0 replies; 19+ messages in thread
From: Matthieu Baerts @ 2026-06-01 8:07 UTC (permalink / raw)
To: Eric Dumazet
Cc: Mat Martineau, Geliang Tang, David S. Miller, Jakub Kicinski,
Paolo Abeni, Simon Horman, netdev, mptcp, linux-kernel,
Neal Cardwell, Kuniyuki Iwashima
On 01/06/2026 17:26, Eric Dumazet wrote:
> On Sun, May 31, 2026 at 11:52 PM Matthieu Baerts <matttbe@kernel.org> wrote:
(...)
>> The modifications in net/ipv4/tcp_output.c would then be limited to:
>>
>>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index ef0c10cd31c7..f4edc9c4f3fc 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -1181,12 +1181,18 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
>> */
>> if (sk_is_mptcp(sk)) {
>> unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
>> + bool has_ts = opts->options & OPTION_TS;
>> unsigned int opt_size = 0;
>>
>> if (mptcp_established_options(sk, skb, &opt_size, remaining,
>> - &opts->mptcp)) {
>> + has_ts, &opts->mptcp)) {
>> opts->options |= OPTION_MPTCP;
>> size += opt_size;
>> +
>> +#if IS_ENABLED(CONFIG_MPTCP)
>> + if (opts->mptcp.drop_ts)
>> + opts->options &= ~OPTION_TS;
>> +#endif
>
> SGTM, but maybe the IS_ENABLED() is not needed in this block,
> guarded by if (sk_is_mptcp(sk)) ?
It looks like it is still needed, same if I use:
if (IS_ENABLED(CONFIG_MPTCP) && sk_is_mptcp(sk)) {
Or maybe I missed another technique to avoid an extra #if.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2026-06-01 8:07 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-01 5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 01/11] mptcp: options: suboptions sizes can be negative Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 02/11] mptcp: pm: avoid computing rm_addr size twice Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 03/11] mptcp: pm: avoid computing add_addr " Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 04/11] mptcp: introduce add_addr_v6_port_drop_ts sysctl knob Matthieu Baerts (NGI0)
2026-06-01 5:44 ` Eric Dumazet
2026-06-01 6:04 ` Matthieu Baerts
2026-06-01 5:22 ` [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets Matthieu Baerts (NGI0)
2026-06-01 6:04 ` Eric Dumazet
2026-06-01 6:52 ` Matthieu Baerts
2026-06-01 7:26 ` Eric Dumazet
2026-06-01 7:36 ` Matthieu Baerts
2026-06-01 8:07 ` Matthieu Baerts
2026-06-01 5:22 ` [PATCH net-next 06/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 07/11] selftests: mptcp: validate ADD_ADDRv6 + TS " Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 08/11] selftests: mptcp: always check sent/dropped ADD_ADDRs Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 09/11] selftests: mptcp: connect: test name in pcap file Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 10/11] selftests: mptcp: simult_flow: " Matthieu Baerts (NGI0)
2026-06-01 5:22 ` [PATCH net-next 11/11] selftests: mptcp: pcap: drop most of the payload Matthieu Baerts (NGI0)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox