netdev.vger.kernel.org archive mirror
From: Dumitru Ceara <dceara@redhat.com>
To: Ilya Maximets <i.maximets@ovn.org>, Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>,
	davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	netdev@vger.kernel.org
Subject: Re: [PATCH net-next 4/4] net: skbuff: fix l4_hash comment
Date: Thu, 11 May 2023 22:50:32 +0200
Message-ID: <11ece947-a839-0026-b272-7fb07bcaf1bb@redhat.com>
In-Reply-To: <e45f3257-dc5c-3bcd-2de4-64f478ebb470@ovn.org>

On 5/11/23 19:54, Ilya Maximets wrote:
> On 5/11/23 15:00, Dumitru Ceara wrote:
>> On 5/11/23 14:33, Eric Dumazet wrote:
>>> On Thu, May 11, 2023 at 2:10 PM Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> Hi Antoine,
>>>>
>>>> On 5/11/23 11:34, Antoine Tenart wrote:
>>>>> Since commit 877d1f6291f8 ("net: Set sk_txhash from a random number")
>>>>> sk->sk_txhash is not a canonical 4-tuple hash. sk->sk_txhash is
>>>>> used in the TCP Tx path to populate skb->hash, with skb->l4_hash=1.
>>>>> With this, skb->l4_hash does not always indicate the hash is a
>>>>> "canonical 4-tuple hash over transport ports" but rather a hash from L4
>>>>> layer to provide a uniform distribution over flows. Reword the comment
>>>>> accordingly, to avoid misunderstandings.
>>>>
>>>> But AFAIU the hash used to be a canonical 4-tuple hash and was used as
>>>> such by other components, e.g., OvS:
>>>>
>>>> https://elixir.bootlin.com/linux/latest/source/net/openvswitch/actions.c#L1069
>>>>
>>>> It seems to me at least unfortunate that semantics change without
>>>> considering other users.  The fact that we now fix the documentation
>>>> makes it seem like OvS was wrong to use the skb hash.  However, before
>>>> 877d1f6291f8 ("net: Set sk_txhash from a random number") it was OK for
>>>> OvS to use the skb hash as a canonical 4-tuple hash.
>>>>
>>>
>>> I do not think we can undo stuff that was done back in 2015
>>>
>>
>> I understand.  I guess I was kind of grasping at straws in the hope of
>> getting a canonical 4-tuple hash.
>>
>>> Has anyone complained ?
>>>
>>
>> It did go unnoticed for a while but recently we started getting
>> (indirect) reports due to the hash changing.
>>
>> This one is from an upstream OVN (OvS) user:
>> https://github.com/ovn-org/ovn/issues/112
>>
>> This is from an OpenShift (also running OVN/OvS) user:
>> https://issues.redhat.com/browse/OCPBUGS-7406
>>

I just realized we need a bit more context here.  It started being a
visible problem after 265f94ff54d6 ("net: Recompute sk_txhash on
negative routing advice") and also after 3acf3ec3f4b0 ("tcp: Change
txhash on every SYN and RTO retransmit") when retransmits started
changing the txhash and implicitly the hash used by OvS.

>>> Note that skb->hash has never been considered as canonical, for obvious reasons.
> 
> I guess, the other point here is that it's not an L4 hash either.
> 
> It's a random number.  So, the documentation will still not be
> correct even after the change proposed in this patch.
> 
> 
> One other solution to the problem might be to stop setting l4_hash
> flag while it's a random number.
> 
> One way to do that without breaking everything would be to introduce
> a new flag, e.g. 'rnd_hash', that marks a hash that is "not related
> to packet fields, but provides a uniform distribution over flows".
> 
> skb_get_hash() then may return the current hash if it's any of
> l4, rnd or sw.  That should preserve the current logic across
> the kernel code.
> But having a new flag, we could introduce a new helper, for example
> skb_get_stable_hash() or skb_get_hash_nonrandom() or something like
> that, that will be equal to the current version of skb_get_hash(),
> i.e. not take the random hash into account.
> 
> Affected subsystems (OVS, ECMP, SRv6) can be changed to use that
> new function.  This way these subsystems will get a software hash
> based on the real packet fields, if it was originally random.
> This will also preserve ability to use hash provided by the HW,
> since it is not normally random.
> 
> With that, we'll also not need to have in the API something that has
> 'L4' in the name and in the docs, but has no relation to packet fields.
> It can be argued that the description in the doc doesn't mean that
> this hash is computed using L4 packet fields, but it's confusing
> regardless and gets overlooked when writing new code, as shown
> by the issues in multiple subsystems.
> 
> Hope this makes some sense.
> 
> 
> Dumitru also had some alternative ideas on how to provide a stable
> hash to subsystems that need it, but I'll leave it to him.
> 
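To check my reading of the proposal quoted above, here is a rough
userspace C model of it.  This is only a sketch, not kernel code: the
'rnd_hash' field, every model_* name and the FNV-style mixing function
are assumptions made up for illustration.  The real flow dissector hash
is different.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the packet fields a 4-tuple hash would cover. */
struct flow_keys_model {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Userspace stand-in for the hash-related parts of struct sk_buff. */
struct skb_model {
	uint32_t hash;
	bool l4_hash;	/* hash was derived from L4 packet fields */
	bool sw_hash;	/* hash was computed by the software stack */
	bool rnd_hash;	/* hypothetical: hash is random, unrelated to packet fields */
	struct flow_keys_model keys;
};

/* Placeholder for the flow dissector: any deterministic mix of the keys. */
static uint32_t model_flow_hash(const struct flow_keys_model *k)
{
	uint32_t words[3] = {
		k->saddr, k->daddr, ((uint32_t)k->sport << 16) | k->dport
	};
	uint32_t h = 2166136261u;	/* FNV-1a offset basis */

	for (int i = 0; i < 3; i++) {
		for (int b = 0; b < 4; b++) {
			h ^= (words[i] >> (8 * b)) & 0xff;
			h *= 16777619u;	/* FNV-1a prime */
		}
	}
	return h;
}

/* Recompute the hash from packet fields and mark it as a software L4 hash. */
static void model_recompute(struct skb_model *skb)
{
	skb->hash = model_flow_hash(&skb->keys);
	skb->l4_hash = true;
	skb->sw_hash = true;
	skb->rnd_hash = false;
}

/* Accepts any existing hash (l4, sw or rnd), so current callers see no change. */
static uint32_t model_skb_get_hash(struct skb_model *skb)
{
	if (!skb->l4_hash && !skb->sw_hash && !skb->rnd_hash)
		model_recompute(skb);
	return skb->hash;
}

/* Hypothetical helper: a random hash does not count, so it gets recomputed. */
static uint32_t model_skb_get_stable_hash(struct skb_model *skb)
{
	if (!skb->l4_hash && !skb->sw_hash)
		model_recompute(skb);
	return skb->hash;
}
```

In this model, two packets of the same flow that carry different random
hashes still get the same value from model_skb_get_stable_hash(), while
model_skb_get_hash() keeps returning whatever was already set.  Note
that the stable helper overwrites the random hash here; whether that is
acceptable for later users of skb->hash is an open question.
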
What I had in mind is not really a stable hash but a "good enough
alternative".  It's probably "good enough" (at least for OvS/OVN) if the
hash used by OvS doesn't change throughout the lifetime of a TCP session.

Would it be possible to save the original (random) hash that was
generated for a locally terminated TCP session?  E.g., in a new field in
'struct sock'.  It would in essence be a random tag associated with the
session that doesn't change throughout the session's lifetime, unlike
sk->sk_txhash, which changes on retransmit/negative routing advice.

That means OvS doesn't have to compute a stable hash every time it
processes a packet.  It would just access this value through
skb->sk->good_name_for_this_new_tag.  The advantage is that it gives the
appearance of a canonical 4-tuple hash throughout the lifetime of a
session and it doesn't affect any of the use cases that required
877d1f6291f8 ("net: Set sk_txhash from a random number").

I probably missed relevant things but I thought it might be worth
sharing in case the idea has some value.
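
A minimal userspace sketch of what I mean, under loud assumptions: the
'sk_session_tag' field, the model_* helpers and the xorshift generator
are all hypothetical and only loosely mirror how the txhash is set and
later rethought.  It is not a patch.

```c
#include <stdint.h>

/* Userspace stand-in for the relevant parts of struct sock. */
struct sock_model {
	uint32_t sk_txhash;		/* may change on retransmit / negative advice */
	uint32_t sk_session_tag;	/* hypothetical: set once per session */
};

/*
 * Deterministic xorshift32 generator standing in for the kernel's random
 * source.  The state must be non-zero; the output is then never zero.
 */
static uint32_t model_prandom(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Session setup: pick the random txhash and remember it as the stable tag. */
static void model_sk_set_txhash(struct sock_model *sk, uint32_t *rng)
{
	sk->sk_txhash = model_prandom(rng);
	sk->sk_session_tag = sk->sk_txhash;
}

/* Retransmit or negative routing advice: only the txhash is re-rolled. */
static void model_sk_rethink_txhash(struct sock_model *sk, uint32_t *rng)
{
	sk->sk_txhash = model_prandom(rng);
}

/* What a consumer such as OvS might do: map a hash to one of n buckets. */
static uint32_t model_pick_bucket(uint32_t hash, uint32_t n_buckets)
{
	return hash % n_buckets;
}
```

A consumer that feeds sk_session_tag into model_pick_bucket() keeps
selecting the same bucket for the whole session, even after
model_sk_rethink_txhash() has changed sk_txhash.
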

Regards,
Dumitru

> Best regards, Ilya Maximets.
> 
>>>
>>>
>>>> Best regards,
>>>> Dumitru
>>>>
>>>>>
>>>>> Signed-off-by: Antoine Tenart <atenart@kernel.org>
>>>>> ---
>>>>>  include/linux/skbuff.h | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>>> index 738776ab8838..f54c84193b23 100644
>>>>> --- a/include/linux/skbuff.h
>>>>> +++ b/include/linux/skbuff.h
>>>>> @@ -791,8 +791,8 @@ typedef unsigned char *sk_buff_data_t;
>>>>>   *   @active_extensions: active extensions (skb_ext_id types)
>>>>>   *   @ndisc_nodetype: router type (from link layer)
>>>>>   *   @ooo_okay: allow the mapping of a socket to a queue to be changed
>>>>> - *   @l4_hash: indicate hash is a canonical 4-tuple hash over transport
>>>>> - *           ports.
>>>>> + *   @l4_hash: indicate hash is from layer 4 and provides a uniform
>>>>> + *           distribution over flows.
>>>>>   *   @sw_hash: indicates hash was computed in software stack
>>>>>   *   @wifi_acked_valid: wifi_acked was set
>>>>>   *   @wifi_acked: whether frame was acked on wifi or not
>>>>
>>>
>>
> 


