Netdev List
 help / color / mirror / Atom feed
From: Matthieu Baerts <matttbe@kernel.org>
To: Eric Dumazet <edumazet@google.com>
Cc: Mat Martineau <martineau@kernel.org>,
	Geliang Tang <geliang@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	netdev@vger.kernel.org, mptcp@lists.linux.dev,
	linux-kernel@vger.kernel.org,
	Neal Cardwell <ncardwell@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>
Subject: Re: [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets
Date: Mon, 1 Jun 2026 17:36:54 +1000	[thread overview]
Message-ID: <da187652-64c5-4775-a662-c52eb6cd8549@kernel.org> (raw)
In-Reply-To: <CANn89iLBh-tSEzDuZce5G2dmC6An1agoJ9UNq2qwAAC0+oTuPg@mail.gmail.com>

On 01/06/2026 17:26, Eric Dumazet wrote:
> On Sun, May 31, 2026 at 11:52 PM Matthieu Baerts <matttbe@kernel.org> wrote:
>>
>> Hi Eric,
>>
>> Thank you for the review!
>>
>> On 01/06/2026 16:04, Eric Dumazet wrote:
>>> On Sun, May 31, 2026 at 10:25 PM Matthieu Baerts (NGI0)
>>> <matttbe@kernel.org> wrote:
>>>>
>>>> With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port
>>>> taking 30 bytes, the 40-byte limit for the TCP options is reached. In
>>>> this case, it is then not possible to send the address signal.
>>>>
>>>> The idea is to let MPTCP dropping the TCP-timestamps option for some
>>>> specific packets, to be able to send some specific pure ACK carrying >28
>>>> bytes of MPTCP options, like with this specific ADD_ADDR. A new
>>>> parameter is passed from tcp_established_options to the MPTCP side to
>>>> indicate if the TCP TS option is used, and if it should be dropped. The
>>>> next commit implements the part on MPTCP side, but split into two
>>>> patches to help TCP maintainers to identify the modifications on TCP
>>>> side. This feature will be controlled by a new add_addr_v6_port_drop_ts
>>>> MPTCP sysctl knob.
>>>>
>>>> It is important to keep in mind that dropping the TCP timestamps option
>>>> for one packet of the connection could eventually disrupt some
>>>> middleboxes: even if it should be unlikely, they could drop the packet
>>>> or even block the connection. That's why this new feature will be
>>>> controlled by a sysctl knob.
>>>>
>>>> Note that it would be technically possible to squeeze both options into
>>>> the header if the ADD_ADDR is first written, and then the TCP timestamps
>>>> without the NOPs preceding it. But this means more modifications on TCP
>>>> side, plus some middleboxes could still be disrupted by that.
>>>>
>>>> About the implementation, instead of passing a new boolean (drop_ts),
>>>> another option would be to pass the whole option structure (opts),
>>>> but 'struct tcp_out_options' is currently defined in tcp_output.c, and
>>>> would need to be exported. Plus that means the removal of the TCP TS
>>>> option would be done on the MPTCP side, and not here on the TCP side.
>>>> It feels clearer to remove other TCP options from the TCP side, than
>>>> hiding that from the MPTCP side.
>>>>
>>>> Yet an other alternative would be to pass the size already taken by the
>>>> other TCP options, and have a way to drop them all when needed. But this
>>>> feels better to target only the timestamps option where dropping it
>>>> should be safe, even if it is currently the only option that would be
>>>> set before MPTCP, when MPTCP is used.
>>>>
>>>> Reviewed-by: Mat Martineau <martineau@kernel.org>
>>>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>>> ---
>>>> To: Neal Cardwell <ncardwell@google.com>
>>>> To: Kuniyuki Iwashima <kuniyu@google.com>
>>>> ---
>>>>  include/net/mptcp.h   | 3 ++-
>>>>  net/ipv4/tcp_output.c | 6 +++++-
>>>>  net/mptcp/options.c   | 3 ++-
>>>>  3 files changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/net/mptcp.h b/include/net/mptcp.h
>>>> index f7263fe2a2e4..000b6593bfa4 100644
>>>> --- a/include/net/mptcp.h
>>>> +++ b/include/net/mptcp.h
>>>> @@ -151,7 +151,7 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
>>>>                           struct mptcp_out_options *opts);
>>>>  bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
>>>>                                unsigned int *size, unsigned int remaining,
>>>> -                              struct mptcp_out_options *opts);
>>>> +                              bool *drop_ts, struct mptcp_out_options *opts);
>>>>  bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
>>>>
>>>>  void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
>>>> @@ -270,6 +270,7 @@ static inline bool mptcp_established_options(struct sock *sk,
>>>>                                              struct sk_buff *skb,
>>>>                                              unsigned int *size,
>>>>                                              unsigned int remaining,
>>>> +                                            bool *drop_ts,
>>>>                                              struct mptcp_out_options *opts)
>>>>  {
>>>>         return false;
>>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>>> index ef0c10cd31c7..53ee4c8f5f8c 100644
>>>> --- a/net/ipv4/tcp_output.c
>>>> +++ b/net/ipv4/tcp_output.c
>>>> @@ -1181,12 +1181,16 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
>>>>          */
>>>>         if (sk_is_mptcp(sk)) {
>>>>                 unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
>>>> +               bool drop_ts = opts->options & OPTION_TS;
>>>>                 unsigned int opt_size = 0;
>>>>
>>>>                 if (mptcp_established_options(sk, skb, &opt_size, remaining,
>>>> -                                             &opts->mptcp)) {
>>>> +                                             &drop_ts, &opts->mptcp)) {
>>>>                         opts->options |= OPTION_MPTCP;
>>>>                         size += opt_size;
>>>> +
>>>> +                       if (drop_ts)
>>>> +                               opts->options &= ~OPTION_TS;
>>>>                 }
>>>>         }
>>>
>>> Passing local variables' addresses forces the compiler to use a stack
>>> canary in this hot function, even for non-MPTCP flows.
>>>
>>> I was about to test the following patch, which removes the current
>>> stack canary caused by MPTCP :/
>>
>> Sorry, I didn't know you were planning to do that.
>>
>> Would that be OK for you if I use an unused bit in opts->mptcp? It's a
>> bit "hackish", but it avoids adding a new local variable address. Or do
>> you have another idea?
>>
>> The modifications in net/ipv4/tcp_output.c would then be limited to:
>>
>>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index ef0c10cd31c7..f4edc9c4f3fc 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -1181,12 +1181,18 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb
>>          */
>>         if (sk_is_mptcp(sk)) {
>>                 unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
>> +               bool has_ts = opts->options & OPTION_TS;
>>                 unsigned int opt_size = 0;
>>
>>                 if (mptcp_established_options(sk, skb, &opt_size, remaining,
>> -                                             &opts->mptcp)) {
>> +                                             has_ts, &opts->mptcp)) {
>>                         opts->options |= OPTION_MPTCP;
>>                         size += opt_size;
>> +
>> +#if IS_ENABLED(CONFIG_MPTCP)
>> +                       if (opts->mptcp.drop_ts)
>> +                               opts->options &= ~OPTION_TS;
>> +#endif
> 
> SGTM, but maybe the IS_ENABLED() is not needed in this block,
> guarded by  if (sk_is_mptcp(sk))  ?

I didn't think about that, I will check.

> Also I am unsure opts->mptcp.drop_ts is cleared already before
> reaching  tcp_established_options()?

Indeed, it is not. I explicitly reset it in mptcp_established_options,
but I could do it here, that would be clearer.

>>                 }
>>         }
>>
>>
>> I can also avoid adding a new parameter in mptcp_established_options
>> (bool has_ts) by setting opts->mptcp.drop_ts before calling it, but
>> that's clearer with this new parameter I think.
>>
>>> $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
>>> add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-92 (-92)
>>> Function                                     old     new   delta
>>> tcp_options_write.isra                      1423    1407     -16
>>> mptcp_established_options                   2746    2720     -26
>>> tcp_established_options                      553     503     -50
>>> Total: Before=22110750, After=22110658, chg -0.00%
>>>
>>
>> Good reduction!
>>
>> (...)
>>
>>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>>> index ef0c10cd31c71ff585a937fde37f2b08b1214b5a..594ec6ba02d5413d43842f79aefbf4d8355c4f3f
>>> 100644
>>> --- a/net/ipv4/tcp_output.c
>>> +++ b/net/ipv4/tcp_output.c
>>> @@ -1183,10 +1183,11 @@ static unsigned int
>>> tcp_established_options(struct sock *sk, struct sk_buff *skb
>>>                 unsigned int remaining = MAX_TCP_OPTION_SPACE - size;
>>>                 unsigned int opt_size = 0;
>>>
>>> -               if (mptcp_established_options(sk, skb, &opt_size, remaining,
>>> -                                             &opts->mptcp)) {
>>> +               opt_size = mptcp_established_options(sk, skb, remaining,
>>> +                                                    &opts->mptcp);
>>> +               if (opt_size) {
>>>                         opts->options |= OPTION_MPTCP;
>>> -                       size += opt_size;
>>> +                       size += (opt_size & 63);
>>
>> Nice trick! What about returning a negative number when the MPTCP option
>> is not needed? Just to avoid playing with masks in the code?
> 
> Yes, that would work, thanks.
> 
> I can provide a patch in a couple of hours.
No hurry, thank you! I will wait for your patches to be applied before
sending a v2.

(While at it, no need to initialise opt_size to 0 here above, and there
was a double ";;" in mptcp_established_options, around "return 64 +
total_size;;" but this code will change anyway.)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.


  reply	other threads:[~2026-06-01  7:37 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-01  5:22 [PATCH net-next 00/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
2026-06-01  5:22 ` [PATCH net-next 01/11] mptcp: options: suboptions sizes can be negative Matthieu Baerts (NGI0)
2026-06-01  5:22 ` [PATCH net-next 02/11] mptcp: pm: avoid computing rm_addr size twice Matthieu Baerts (NGI0)
2026-06-01  5:22 ` [PATCH net-next 03/11] mptcp: pm: avoid computing add_addr " Matthieu Baerts (NGI0)
2026-06-01  5:22 ` [PATCH net-next 04/11] mptcp: introduce add_addr_v6_port_drop_ts sysctl knob Matthieu Baerts (NGI0)
2026-06-01  5:44   ` Eric Dumazet
2026-06-01  6:04     ` Matthieu Baerts
2026-06-01  5:22 ` [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets Matthieu Baerts (NGI0)
2026-06-01  6:04   ` Eric Dumazet
2026-06-01  6:52     ` Matthieu Baerts
2026-06-01  7:26       ` Eric Dumazet
2026-06-01  7:36         ` Matthieu Baerts [this message]
2026-06-01  8:07         ` Matthieu Baerts
2026-06-01  5:22 ` [PATCH net-next 06/11] mptcp: pm: drop TCP TS with ADD_ADDRv6 + port Matthieu Baerts (NGI0)
2026-06-01  5:22 ` [PATCH net-next 07/11] selftests: mptcp: validate ADD_ADDRv6 + TS " Matthieu Baerts (NGI0)
2026-06-01  5:22 ` [PATCH net-next 08/11] selftests: mptcp: always check sent/dropped ADD_ADDRs Matthieu Baerts (NGI0)
2026-06-01  5:22 ` [PATCH net-next 09/11] selftests: mptcp: connect: test name in pcap file Matthieu Baerts (NGI0)
2026-06-01  5:22 ` [PATCH net-next 10/11] selftests: mptcp: simult_flow: " Matthieu Baerts (NGI0)
2026-06-01  5:22 ` [PATCH net-next 11/11] selftests: mptcp: pcap: drop most of the payload Matthieu Baerts (NGI0)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=da187652-64c5-4775-a662-c52eb6cd8549@kernel.org \
    --to=matttbe@kernel.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=geliang@kernel.org \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martineau@kernel.org \
    --cc=mptcp@lists.linux.dev \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox