netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: "Chia-Yu Chang (Nokia)" <chia-yu.chang@nokia-bell-labs.com>,
	"edumazet@google.com" <edumazet@google.com>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"horms@kernel.org" <horms@kernel.org>,
	"dsahern@kernel.org" <dsahern@kernel.org>,
	"kuniyu@amazon.com" <kuniyu@amazon.com>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"dave.taht@gmail.com" <dave.taht@gmail.com>,
	"jhs@mojatatu.com" <jhs@mojatatu.com>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>,
	"xiyou.wangcong@gmail.com" <xiyou.wangcong@gmail.com>,
	"jiri@resnulli.us" <jiri@resnulli.us>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"andrew+netdev@lunn.ch" <andrew+netdev@lunn.ch>,
	"donald.hunter@gmail.com" <donald.hunter@gmail.com>,
	"ast@fiberby.net" <ast@fiberby.net>,
	"liuhangbin@gmail.com" <liuhangbin@gmail.com>,
	"shuah@kernel.org" <shuah@kernel.org>,
	"linux-kselftest@vger.kernel.org"
	<linux-kselftest@vger.kernel.org>,
	"ij@kernel.org" <ij@kernel.org>,
	"ncardwell@google.com" <ncardwell@google.com>,
	"Koen De Schepper (Nokia)" <koen.de_schepper@nokia-bell-labs.com>,
	"g.white@cablelabs.com" <g.white@cablelabs.com>,
	"ingemar.s.johansson@ericsson.com"
	<ingemar.s.johansson@ericsson.com>,
	"mirja.kuehlewind@ericsson.com" <mirja.kuehlewind@ericsson.com>,
	"cheshire@apple.com" <cheshire@apple.com>,
	"rs.ietf@gmx.at" <rs.ietf@gmx.at>,
	"Jason_Livingood@comcast.com" <Jason_Livingood@comcast.com>,
	"vidhi_goel@apple.com" <vidhi_goel@apple.com>
Subject: Re: [PATCH v12 net-next 11/15] tcp: accecn: AccECN option
Date: Wed, 16 Jul 2025 10:29:27 +0200	[thread overview]
Message-ID: <f60c5ef5-47b6-4132-bd7c-9707c81289a2@redhat.com> (raw)
In-Reply-To: <PAXPR07MB7984F66EB2AD576D2385C351A357A@PAXPR07MB7984.eurprd07.prod.outlook.com>

On 7/15/25 4:49 PM, Chia-Yu Chang (Nokia) wrote:
> On 7/4/25 10:53 AM, chia-yu.chang@nokia-bell-labs.com wrote:
> [...]
>>> +}
>>> +
>>> +/* Handles AccECN option ECT and CE 24-bit byte counters update into
>>> + * the u32 value in tcp_sock. As we're processing TCP options, it is
>>> + * safe to access from - 1.
>>> + */
>>> +static inline s32 tcp_update_ecn_bytes(u32 *cnt, const char *from,
>>> +                                    u32 init_offset) {
>>> +     u32 truncated = (get_unaligned_be32(from - 1) - init_offset) &
>>> +                     0xFFFFFFU;
>>> +     u32 delta = (truncated - *cnt) & 0xFFFFFFU;
>>> +
>>> +     /* If delta has the highest bit set (24th bit) indicating
>>> +      * negative, sign extend to correct an estimation using
>>> +      * sign_extend32(delta, 24 - 1)
>>> +      */
>>> +     delta = sign_extend32(delta, 23);
>>
>> I'm under the impression that delta could be simply:
>>
>>         delta = (truncated - *cnt)
>>
>> What am I missing?
> 
> Hi Paolo,
> 
> I think this code is necessary to ensure delta will not a super large value in case of wrap adound.
> 
> For instance, if truncated = 0x0000001F and *cnt = 0x00FFFFFF, then (truncated - *cnt) = 0xFF000020
> 
> But sign_extend32(((truncated - *cnt) & 0xFFFFFFU, 23) = 0x00000020, which shall be corrrect.
> 
> Another example, if truncated = 0x0000001F and *cnt = 0x0000003F, then (truncated - *cnt) = 0xFFFFFFE0
> 
> And sign_extend32(((truncated - *cnt) & 0xFFFFFFU, 23) = 0xFFFFFFE0.
> 
> In this latter example, both are correct.

Ok, I missed the fact that *cnt is a 24 bit integer, too. Your code
looks good.

> 
> [...]
>>> a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 
>>> d98a1a17eb52..2169fd28594e 100644
>>> --- a/net/ipv4/tcp_output.c
>>> +++ b/net/ipv4/tcp_output.c
>>> @@ -385,6 +385,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
>>>  #define OPTION_SMC           BIT(9)
>>>  #define OPTION_MPTCP         BIT(10)
>>>  #define OPTION_AO            BIT(11)
>>> +#define OPTION_ACCECN                BIT(12)
>>>
>>>  static void smc_options_write(__be32 *ptr, u16 *options)  { @@ -406,6 
>>> +407,8 @@ struct tcp_out_options {
>>>       u16 mss;                /* 0 to disable */
>>>       u8 ws;                  /* window scale, 0 to disable */
>>>       u8 num_sack_blocks;     /* number of SACK blocks to include */
>>> +     u8 num_accecn_fields:7, /* number of AccECN fields needed */
>>> +        use_synack_ecn_bytes:1; /* Use synack_ecn_bytes or not */
>>>       u8 hash_size;           /* bytes in hash_location */
>>>       u8 bpf_opt_len;         /* length of BPF hdr option */
>>>       __u8 *hash_location;    /* temporary pointer, overloaded */
>>> @@ -621,6 +624,8 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
>>>                             struct tcp_out_options *opts,
>>>                             struct tcp_key *key)  {
>>> +     u8 leftover_highbyte = TCPOPT_NOP; /* replace 1st NOP if avail */
>>> +     u8 leftover_lowbyte = TCPOPT_NOP;  /* replace 2nd NOP in 
>>> + succession */
>>>       __be32 *ptr = (__be32 *)(th + 1);
>>>       u16 options = opts->options;    /* mungable copy */
>>>
>>> @@ -656,15 +661,79 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
>>>               *ptr++ = htonl(opts->tsecr);
>>>       }
>>>
>>> +     if (OPTION_ACCECN & options) {
>>> +             /* Initial values for AccECN option, ordered is based on ECN field bits
>>> +              * similar to received_ecn_bytes. Used for SYN/ACK AccECN option.
>>> +              */
>>> +             static u32 synack_ecn_bytes[3] = { 0, 0, 0 };
>>
>> I think this does not address Eric's concern on v9 WRT global variable, as every CPU will still touch the same memory while accessing the above array.
>>
>>> +             const u8 ect0_idx = INET_ECN_ECT_0 - 1;
>>> +             const u8 ect1_idx = INET_ECN_ECT_1 - 1;
>>> +             const u8 ce_idx = INET_ECN_CE - 1;
>>> +             u32 e0b;
>>> +             u32 e1b;
>>> +             u32 ceb;
>>> +             u8 len;
>>> +
>>> +             if (opts->use_synack_ecn_bytes) {
>>> +                     e0b = synack_ecn_bytes[ect0_idx] + TCP_ACCECN_E0B_INIT_OFFSET;
>>> +                     e1b = synack_ecn_bytes[ect1_idx] + TCP_ACCECN_E1B_INIT_OFFSET;
>>> +                     ceb = synack_ecn_bytes[ce_idx] + 
>>> + TCP_ACCECN_CEB_INIT_OFFSET;
>>
>> On the flip side I don't see such array modified here, not in later patches?!? If so you could make it const and a global variable would be ok.
> 
> Sure, I will make it as static const global variable, which I hope this is ok for you.
> 
> 
>>> +/* Calculates how long AccECN option will fit to @remaining option space.
>>> + *
>>> + * AccECN option can sometimes replace NOPs used for alignment of 
>>> +other
>>> + * TCP options (up to @max_combine_saving available).
>>> + *
>>> + * Only solutions with at least @required AccECN fields are accepted.
>>> + *
>>> + * Returns: The size of the AccECN option excluding space repurposed 
>>> +from
>>> + * the alignment of the other options.
>>> + */
>>> +static int tcp_options_fit_accecn(struct tcp_out_options *opts, int required,
>>> +                               int remaining) {
>>> +     int size = TCP_ACCECN_MAXSIZE;
>>> +     int max_combine_saving;
>>> +
>>> +     if (opts->use_synack_ecn_bytes)
>>> +             max_combine_saving = tcp_synack_options_combine_saving(opts);
>>> +     else
>>> +             max_combine_saving = opts->num_sack_blocks > 0 ? 2 : 0;
>>> +     opts->num_accecn_fields = TCP_ACCECN_NUMFIELDS;
>>> +     while (opts->num_accecn_fields >= required) {
>>> +             int leftover_size = size & 0x3;
>>> +             /* Pad to dword if cannot combine */
>>> +             if (leftover_size > max_combine_saving)
>>> +                     leftover_size = -((4 - leftover_size) & 0x3);
>>
>> I *think* that with the above you mean something alike:
>>
>>                         size = ALIGN(size, 4);
>>                         leftover_size = 0
>>
>> ?
>>
>> The used code looks quite obscure to me.
>>
>> /P
> 
> Indeed, I will make below changes in the next version by using ALIGN() and ALIGN_DOWN()
> 
> Here the aim is to pad up (if max_combine_saving is not enough) or trim down (if max_combine saving is enough) to DWORD.
> 
> And the final return size will be the the a multiple of DWORD.
> 
> Would it be more readable?
> 
> /* Pad to DWORD if cannot combine. Align_size represents
>  * the final size to be used by AccECN options.
>  * +======+=============+====================+============+
>  * | size | size exceed | max_combine_saving | align_size |
>  * |      |    DWORD    |                    |            |
>  * +======+=============+====================+============+
>  * |   2  |       2     |         < 2        |      4     |
>  * |   2  |       2     |         >=2        |      0     |
>  * |   5  |       1     |         < 1        |      8     |
>  * |   5  |       1     |         >=1        |      4     |
>  * |   8  |       0     |         Any        |      8     |
>  * |  11  |       3     |         < 3        |     12     |
>  * |  11  |       3     |         >=3        |      8     |
>  * +======+=============+====================+============+
>  */
> if ((size & 0x3) > max_combine_saving)
>         align_size = ALIGN(size, 4);
> else
>         align_size = ALIGN_DOWN(size, 4);
> 
> if (remaining >= align_size) {
>         size = align_size;
>         break;
> }

Yes, IMHO is more readable. No need to add the table, the original
comment is clear enough.

Thanks,

Paolo


  reply	other threads:[~2025-07-16  8:29 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-04  8:53 [PATCH v12 net-next 00/15] AccECN protocol patch series chia-yu.chang
2025-07-04  8:53 ` [PATCH v12 net-next 01/15] tcp: reorganize SYN ECN code chia-yu.chang
2025-07-04  8:53 ` [PATCH v12 net-next 02/15] tcp: fast path functions later chia-yu.chang
2025-07-04  8:53 ` [PATCH v12 net-next 03/15] tcp: reorganize tcp_sock_write_txrx group for variables later chia-yu.chang
2025-07-04  8:53 ` [PATCH v12 net-next 04/15] tcp: ecn functions in separated include file chia-yu.chang
2025-07-14 11:20   ` Paolo Abeni
2025-07-04  8:53 ` [PATCH v12 net-next 05/15] tcp: AccECN core chia-yu.chang
2025-07-04  8:53 ` [PATCH v12 net-next 06/15] tcp: accecn: AccECN negotiation chia-yu.chang
2025-07-14 13:10   ` Paolo Abeni
2025-07-04  8:53 ` [PATCH v12 net-next 07/15] tcp: Add wait_third_ack for ECN negotiation in simultaneous connect chia-yu.chang
2025-07-14 13:22   ` Paolo Abeni
2025-07-04  8:53 ` [PATCH v12 net-next 08/15] tcp: accecn: add AccECN rx byte counters chia-yu.chang
2025-07-14 13:30   ` Paolo Abeni
2025-07-04  8:53 ` [PATCH v12 net-next 09/15] tcp: accecn: AccECN needs to know delivered bytes chia-yu.chang
2025-07-14 13:33   ` Paolo Abeni
2025-07-15  9:08     ` Chia-Yu Chang (Nokia)
2025-07-04  8:53 ` [PATCH v12 net-next 10/15] tcp: sack option handling improvements chia-yu.chang
2025-07-04  8:53 ` [PATCH v12 net-next 11/15] tcp: accecn: AccECN option chia-yu.chang
2025-07-14 14:32   ` Paolo Abeni
2025-07-15 14:49     ` Chia-Yu Chang (Nokia)
2025-07-16  8:29       ` Paolo Abeni [this message]
2025-07-04  8:53 ` [PATCH v12 net-next 12/15] tcp: accecn: AccECN option send control chia-yu.chang
2025-07-14 14:54   ` Paolo Abeni
2025-07-15 15:14     ` Chia-Yu Chang (Nokia)
2025-07-16  8:33       ` Paolo Abeni
2025-07-14 15:12   ` Paolo Abeni
2025-07-04  8:53 ` [PATCH v12 net-next 13/15] tcp: accecn: AccECN option failure handling chia-yu.chang
2025-07-14 15:00   ` Paolo Abeni
2025-07-04  8:53 ` [PATCH v12 net-next 14/15] tcp: accecn: AccECN option ceb/cep and ACE field multi-wrap heuristics chia-yu.chang
2025-07-14 15:03   ` Paolo Abeni
2025-07-04  8:53 ` [PATCH v12 net-next 15/15] tcp: accecn: try to fit AccECN option with SACK chia-yu.chang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f60c5ef5-47b6-4132-bd7c-9707c81289a2@redhat.com \
    --to=pabeni@redhat.com \
    --cc=Jason_Livingood@comcast.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=ast@fiberby.net \
    --cc=bpf@vger.kernel.org \
    --cc=cheshire@apple.com \
    --cc=chia-yu.chang@nokia-bell-labs.com \
    --cc=corbet@lwn.net \
    --cc=dave.taht@gmail.com \
    --cc=davem@davemloft.net \
    --cc=donald.hunter@gmail.com \
    --cc=dsahern@kernel.org \
    --cc=edumazet@google.com \
    --cc=g.white@cablelabs.com \
    --cc=horms@kernel.org \
    --cc=ij@kernel.org \
    --cc=ingemar.s.johansson@ericsson.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=koen.de_schepper@nokia-bell-labs.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@amazon.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=liuhangbin@gmail.com \
    --cc=mirja.kuehlewind@ericsson.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=rs.ietf@gmx.at \
    --cc=shuah@kernel.org \
    --cc=stephen@networkplumber.org \
    --cc=vidhi_goel@apple.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).