From: Dumitru Ceara <dceara@redhat.com>
To: Ilya Maximets <i.maximets@ovn.org>, Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
netdev@vger.kernel.org
Subject: Re: [PATCH net-next 4/4] net: skbuff: fix l4_hash comment
Date: Thu, 11 May 2023 22:50:32 +0200
Message-ID: <11ece947-a839-0026-b272-7fb07bcaf1bb@redhat.com>
In-Reply-To: <e45f3257-dc5c-3bcd-2de4-64f478ebb470@ovn.org>

On 5/11/23 19:54, Ilya Maximets wrote:
> On 5/11/23 15:00, Dumitru Ceara wrote:
>> On 5/11/23 14:33, Eric Dumazet wrote:
>>> On Thu, May 11, 2023 at 2:10 PM Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> Hi Antoine,
>>>>
>>>> On 5/11/23 11:34, Antoine Tenart wrote:
>>>>> Since commit 877d1f6291f8 ("net: Set sk_txhash from a random number")
>>>>> sk->sk_txhash is not a canonical 4-tuple hash. sk->sk_txhash is
>>>>> used in the TCP Tx path to populate skb->hash, with skb->l4_hash=1.
>>>>> With this, skb->l4_hash does not always indicate the hash is a
>>>>> "canonical 4-tuple hash over transport ports" but rather a hash from L4
>>>>> layer to provide a uniform distribution over flows. Reword the comment
>>>>> accordingly, to avoid misunderstandings.
>>>>
>>>> But AFAIU the hash used to be a canonical 4-tuple hash and was used as
>>>> such by other components, e.g., OvS:
>>>>
>>>> https://elixir.bootlin.com/linux/latest/source/net/openvswitch/actions.c#L1069
>>>>
>>>> It seems to me at least unfortunate that semantics change without
>>>> considering other users. The fact that we now fix the documentation
>>>> makes it seem like OvS was wrong to use the skb hash. However, before
>>>> 877d1f6291f8 ("net: Set sk_txhash from a random number") it was OK for
>>>> OvS to use the skb hash as a canonical 4-tuple hash.
>>>>
>>>
>>> I do not think we can undo stuff that was done back in 2015
>>>
>>
>> I understand. I guess I was kind of grasping at straws in the hope of
>> getting a canonical 4-tuple hash.
>>
>>> Has anyone complained ?
>>>
>>
>> It did go unnoticed for a while but recently we started getting
>> (indirect) reports due to the hash changing.
>>
>> This one is from an upstream OVN (OvS) user:
>> https://github.com/ovn-org/ovn/issues/112
>>
>> This is from an OpenShift (also running OVN/OvS) user:
>> https://issues.redhat.com/browse/OCPBUGS-7406
>>

I just realized we need a bit more context here. It started being a
visible problem after 265f94ff54d6 ("net: Recompute sk_txhash on
negative routing advice") and also after 3acf3ec3f4b0 ("tcp: Change
txhash on every SYN and RTO retransmit") when retransmits started
changing the txhash and implicitly the hash used by OvS.

>>> Note that skb->hash has never been considered as canonical, for obvious reasons.
>
> I guess, the other point here is that it's not an L4 hash either.
>
> It's a random number. So, the documentation will still not be
> correct even after the change proposed in this patch.
>
>
> One other solution to the problem might be to stop setting l4_hash
> flag while it's a random number.
>
> One way to not break everything doing that will be to introduce a
> new flag, e.g. 'rnd_hash' that will be a hash that is "not related
> to packet fields, but provides a uniform distribution over flows".
>
> skb_get_hash() then may return the current hash if it's any of
> l4, rnd or sw. That should preserve the current logic across
> the kernel code.
>
> But having a new flag, we could introduce a new helper, for example
> skb_get_stable_hash() or skb_get_hash_nonrandom() or something like
> that, that will be equal to the current version of skb_get_hash(),
> i.e. not take the random hash into account.
>
> Affected subsystems (OVS, ECMP, SRv6) can be changed to use that
> new function. This way these subsystems will get a software hash
> based on the real packet fields, if it was originally random.
> This will also preserve ability to use hash provided by the HW,
> since it is not normally random.
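
To make the proposed split concrete, here is a rough userspace model,
not kernel code. 'rnd_hash' and skb_get_stable_hash() are only the names
floated above; 'struct model_skb', every 'model_' function and the FNV-1a
stand-in for flow dissection are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: a few fields standing in for struct sk_buff. */
struct model_skb {
	uint32_t hash;
	bool l4_hash;  /* hash was derived from L4 packet fields */
	bool sw_hash;  /* hash was computed in software */
	bool rnd_hash; /* hypothetical flag: hash is a per-socket random value */
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Stand-in for flow dissection plus hashing (FNV-1a over the 4-tuple). */
static uint32_t model_flow_hash(const struct model_skb *skb)
{
	const uint32_t words[3] = {
		skb->saddr, skb->daddr,
		((uint32_t)skb->sport << 16) | skb->dport,
	};
	uint32_t h = 2166136261u;

	for (int i = 0; i < 3; i++) {
		for (int b = 0; b < 4; b++) {
			h ^= (words[i] >> (8 * b)) & 0xffu;
			h *= 16777619u;
		}
	}
	return h;
}

/* Replace whatever hash is stored with one computed from packet fields. */
static void model_set_sw_hash(struct model_skb *skb)
{
	skb->hash = model_flow_hash(skb);
	skb->sw_hash = true;
	skb->l4_hash = true;
	skb->rnd_hash = false;
}

/* Models skb_get_hash(): any existing hash is accepted, random or not. */
static uint32_t model_skb_get_hash(struct model_skb *skb)
{
	if (!skb->l4_hash && !skb->sw_hash && !skb->rnd_hash)
		model_set_sw_hash(skb);
	return skb->hash;
}

/* Models the proposed helper: a random hash does not count as valid. */
static uint32_t model_skb_get_stable_hash(struct model_skb *skb)
{
	if (!skb->l4_hash && !skb->sw_hash)
		model_set_sw_hash(skb);
	return skb->hash;
}

static void model_demo(void)
{
	struct model_skb skb = {
		.hash = 0x12345678u, .rnd_hash = true,
		.saddr = 0x0a000001u, .daddr = 0x0a000002u,
		.sport = 40000, .dport = 443,
	};

	/* Existing callers still see the random value. */
	assert(model_skb_get_hash(&skb) == 0x12345678u);
	/* OVS/ECMP-style callers get a hash over the packet fields. */
	assert(model_skb_get_stable_hash(&skb) == model_flow_hash(&skb));
}
```

In this model the stable helper overwrites the random value stored in the
skb, so later callers of the generic helper see the software hash as
well. Whether that side effect is acceptable is one of the things a real
patch would have to decide.
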
>
> With that, we'll also not need to have in the API something that has
> 'L4' in the name and in the docs, but has no relation to packet fields.
> It can be argued that the description in the doc doesn't mean that
> this hash is computed using L4 packet fields, but it's confusing
> regardless and getting overlooked while creating new code, as is
> shown by the issues in multiple subsystems.
>
> Hope this makes some sense.
>
>
> Dumitru also had some alternative ideas on how to provide a stable
> hash to subsystems that need it, but I'll leave it to him.
>

What I had in mind is not really a stable hash but a "good enough
alternative". It's probably "good enough" (at least for OvS/OVN) if the
hash used by OvS doesn't change throughout the lifetime of a TCP session.

Would it be possible to save the original (random) hash that was
generated for a locally terminated TCP session? E.g., a new field in
'struct sock'. It would be in essence a random tag associated to the
session that doesn't change throughout the lifetime of the session.
Unlike sk->sk_txhash which changes on retransmit/negative routing advice.

That means OvS doesn't have to compute a stable hash every time it
processes a packet. It would just access this value through
skb->sk->good_name_for_this_new_tag. The advantage is that it gives the
appearance of a canonical 4-tuple hash throughout the lifetime of a
session and it doesn't affect any of the use cases that required
877d1f6291f8 ("net: Set sk_txhash from a random number").

I probably missed relevant things but I thought it might be worth
sharing in case the idea has some value.
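
Roughly, and only as a userspace model of the idea rather than a patch:
'struct model_sock', 'sk_session_tag' and all 'model_' helpers below are
made-up names, and the xorshift generator merely stands in for whatever
random source the kernel would use.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: two fields standing in for struct sock. */
struct model_sock {
	uint32_t sk_txhash;      /* re-rolled on retransmit / routing advice */
	uint32_t sk_session_tag; /* hypothetical: fixed for the session */
};

/* xorshift32, a stand-in random source; state must be non-zero. */
static uint32_t model_next_random(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Session setup: pick the txhash and remember it as the session tag. */
static void model_sock_init(struct model_sock *sk, uint32_t *rng)
{
	sk->sk_txhash = model_next_random(rng);
	sk->sk_session_tag = sk->sk_txhash;
}

/* Retransmit or negative routing advice: only the txhash changes. */
static void model_sock_rethink_txhash(struct model_sock *sk, uint32_t *rng)
{
	sk->sk_txhash = model_next_random(rng);
}

/* What a consumer such as OvS would read instead of skb->hash. */
static uint32_t model_flow_tag(const struct model_sock *sk)
{
	return sk->sk_session_tag;
}

static void model_demo(void)
{
	uint32_t rng = 1;
	struct model_sock sk;
	uint32_t tag;

	model_sock_init(&sk, &rng);
	tag = model_flow_tag(&sk);

	model_sock_rethink_txhash(&sk, &rng);

	/* The tag survives the rehash while the txhash moves on. */
	assert(model_flow_tag(&sk) == tag);
	assert(sk.sk_txhash != tag);
}
```
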

Regards,
Dumitru

> Best regards, Ilya Maximets.
>
>>>
>>>
>>>> Best regards,
>>>> Dumitru
>>>>
>>>>>
>>>>> Signed-off-by: Antoine Tenart <atenart@kernel.org>
>>>>> ---
>>>>> include/linux/skbuff.h | 4 ++--
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>>> index 738776ab8838..f54c84193b23 100644
>>>>> --- a/include/linux/skbuff.h
>>>>> +++ b/include/linux/skbuff.h
>>>>> @@ -791,8 +791,8 @@ typedef unsigned char *sk_buff_data_t;
>>>>> * @active_extensions: active extensions (skb_ext_id types)
>>>>> * @ndisc_nodetype: router type (from link layer)
>>>>> * @ooo_okay: allow the mapping of a socket to a queue to be changed
>>>>> - * @l4_hash: indicate hash is a canonical 4-tuple hash over transport
>>>>> - * ports.
>>>>> + * @l4_hash: indicate hash is from layer 4 and provides a uniform
>>>>> + * distribution over flows.
>>>>> * @sw_hash: indicates hash was computed in software stack
>>>>> * @wifi_acked_valid: wifi_acked was set
>>>>> * @wifi_acked: whether frame was acked on wifi or not
>>>>
>>>
>>
>