From: Paolo Abeni <pabeni@redhat.com>
To: Eric Dumazet <edumazet@google.com>
Cc: netdev@vger.kernel.org, mptcp@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	Neal Cardwell <ncardwell@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	"Matthieu Baerts (NGI0)" <matttbe@kernel.org>,
	Mat Martineau <martineau@kernel.org>,
	Geliang Tang <geliang@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Simon Horman <horms@kernel.org>
Subject: Re: [PATCH net-next v2 05/15] tcp: allow mptcp to drop TS for some packets
Date: Thu, 11 Jun 2026 09:39:21 +0200
Message-ID: <fb96d949-dcf5-494d-8076-eb4a972fa5e4@redhat.com>
In-Reply-To: <20260605-net-next-mptcp-add-addr6-port-ts-v2-5-758e7ca73f4d@kernel.org>

Hi Eric,

On 6/5/26 11:21 AM, Matthieu Baerts (NGI0) wrote:
> With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
> taking 30 bytes, the 40-byte limit for the TCP options is reached. In
> this case, it is then not possible to send the address signal.
> 
> The idea is to let MPTCP drop the TCP-timestamps option for some
> specific packets, to be able to send some specific pure ACKs carrying >28
> bytes of MPTCP options, like with this specific ADD_ADDR. A new
> parameter is passed from tcp_established_options to the MPTCP side to
> indicate if the TCP TS option is used, and if it should be dropped. The
> next commit implements the MPTCP side; the change is split into two
> patches to help TCP maintainers identify the modifications on the TCP
> side. This feature will be controlled by a new add_addr_v6_port_drop_ts
> MPTCP sysctl knob.
> 
> It is important to keep in mind that dropping the TCP timestamps option
> for one packet of the connection could eventually disrupt some
> middleboxes: even if it should be unlikely, they could drop the packet
> or even block the connection. That's why this new feature will be
> controlled by a sysctl knob.
> 
> Note that it would be technically possible to squeeze both options into
> the header if the ADD_ADDR is first written, and then the TCP timestamps
> without the NOPs preceding it. But this means more modifications on TCP
> side, plus some middleboxes could still be disrupted by that.
> 
> In this implementation, an unused bit in the mptcp_out_options
> structure is used to avoid passing the address of a local variable.
> Reading and setting it needs CONFIG_MPTCP, so the whole block now has
> this #if condition: mptcp_established_options() is then no longer used
> without CONFIG_MPTCP.
> 
> About alternatives, instead of passing a new boolean (has_ts), another
> option would be to pass the whole option structure (opts), but
> 'struct tcp_out_options' is currently defined in tcp_output.c, and it
> would need to be exported. Plus that means the removal of the TCP TS
> option would be done on the MPTCP side, and not here on the TCP side.
> It feels clearer to remove other TCP options from the TCP side, than
> hiding that from the MPTCP side.
> 
> Yet another alternative would be to pass the size already taken by the
> other TCP options, and have a way to drop them all when needed. But it
> feels better to target only the timestamps option where dropping it
> should be safe, even if it is currently the only option that would be
> set before MPTCP, when MPTCP is used.
> 
> Reviewed-by: Mat Martineau <martineau@kernel.org>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> - v2: Avoid passing local variables' addresses to
>   mptcp_established_options not to force the compiler to use a stack
>   canary in this hot function, even for non-MPTCP flows. (Eric Dumazet)
> To: Neal Cardwell <ncardwell@google.com>
> To: Kuniyuki Iwashima <kuniyu@google.com>
> ---
>  include/net/mptcp.h   | 13 +++----------
>  net/ipv4/tcp_output.c | 10 +++++++++-
>  net/mptcp/options.c   |  2 +-
>  3 files changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
> index 24d1016a4664..71b9fc5a5796 100644
> --- a/include/net/mptcp.h
> +++ b/include/net/mptcp.h
> @@ -72,7 +72,8 @@ struct mptcp_out_options {
>  	u8 reset_reason:4,
>  	   reset_transient:1,
>  	   csum_reqd:1,
> -	   allow_join_id0:1;
> +	   allow_join_id0:1,
> +	   drop_ts:1;
>  	union {
>  		struct {
>  			u64 sndr_key;
> @@ -153,7 +154,7 @@ bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
>  bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
>  			  struct mptcp_out_options *opts);
>  int mptcp_established_options(struct sock *sk, struct sk_buff *skb,
> -			      unsigned int remaining,
> +			      unsigned int remaining, bool has_ts,
>  			      struct mptcp_out_options *opts);
>  bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
>  
> @@ -269,14 +270,6 @@ static inline bool mptcp_synack_options(const struct request_sock *req,
>  	return false;
>  }
>  
> -static inline int mptcp_established_options(struct sock *sk,
> -					    struct sk_buff *skb,
> -					    unsigned int remaining,
> -					    struct mptcp_out_options *opts)
> -{
> -	return -1;
> -}
> -
>  static inline bool mptcp_incoming_options(struct sock *sk,
>  					  struct sk_buff *skb)
>  {
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index d3b8e61d3c5e..26dd751ec72a 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1175,6 +1175,7 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
>  		size += TCPOLEN_TSTAMP_ALIGNED;
>  	}
>  
> +#if IS_ENABLED(CONFIG_MPTCP)
>  	/* MPTCP options have precedence over SACK for the limited TCP
>  	 * option space because a MPTCP connection would be forced to
>  	 * fall back to regular TCP if a required multipath option is
> @@ -1183,15 +1184,22 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
>  	 */
>  	if (sk_is_mptcp(sk)) {
>  		unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
> +		bool has_ts = opts->options & OPTION_TS;
>  		int opt_size;
>  
> -		opt_size = mptcp_established_options(sk, skb, remaining,
> +		opts->mptcp.drop_ts = 0;
> +
> +		opt_size = mptcp_established_options(sk, skb, remaining, has_ts,
>  						     &opts->mptcp);
>  		if (opt_size >= 0) {
>  			opts->options |= OPTION_MPTCP;
>  			size += opt_size;
> +
> +			if (opts->mptcp.drop_ts)
> +				opts->options &= ~OPTION_TS;

I'm wondering if you are ok with this patch in the current form?

One thing that was discussed on the mptcp ML was exposing the
tcp_out_options layout, so that mptcp_established_options() could receive
the whole structure as an argument, which would likely clean up this code
a bit.

That was not done here because placing tcp_out_options under `include`
felt a bit "too much". Perhaps adding a `net/ipv4/tcp_option.h` header
(and including it from the mptcp code) would be more palatable?

Thanks,

Paolo

