netdev.vger.kernel.org archive mirror
From: Dust Li <dust.li@linux.alibaba.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Philo Lu <lulie@linux.alibaba.com>,
	netdev@vger.kernel.org, willemdebruijn.kernel@gmail.com,
	davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	dsahern@kernel.org, antony.antony@secunet.com,
	steffen.klassert@secunet.com, linux-kernel@vger.kernel.org,
	jakub@cloudflare.com
Subject: Re: [RFC PATCH net-next] net/udp: Add 4-tuple hash for connected socket
Date: Fri, 13 Sep 2024 23:06:49 +0800
Message-ID: <20240913150649.GB14069@linux.alibaba.com>
In-Reply-To: <CANn89iL9EYX1EYLcrsXxz6dZX6eYyAi+u4uCZuYjg=y3tbgh6A@mail.gmail.com>

On 2024-09-13 16:39:33, Eric Dumazet wrote:
>On Fri, Sep 13, 2024 at 4:22 PM Dust Li <dust.li@linux.alibaba.com> wrote:
>>
>> On 2024-09-13 13:49:03, Eric Dumazet wrote:
>> >On Fri, Sep 13, 2024 at 12:09 PM Philo Lu <lulie@linux.alibaba.com> wrote:
>> >>
>> >> This RFC patch introduces 4-tuple hash for connected udp sockets, to
>> >> make udp lookup faster. It is a tentative proposal and any comment is
>> >> welcome.
>> >>
>> >> Currently, the udp_table has two hash tables, the port hash and portaddr
>> >> hash. But for a UDP server, all sockets have the same local port and addr,
>> >> so they are all on the same hash slot within a reuseport group. And the
>> >> target sock is selected by scoring.
>> >>
>> >> In some applications, the UDP server uses connect() for each incoming
>> >> client, and then the socket (fd) is used exclusively by the client. In
>> >> such scenarios, the current scoring method can be inefficient with a large
>> >> number of connections, resulting in high softirq overhead.
>> >>
>> >> To solve the problem, a 4-tuple hash list is added to udp_table, and is
>> >> updated when calling connect(). Then __udp4_lib_lookup() first
>> >> searches the 4-tuple hash list, and returns directly on success. A new
>> >> sockopt UDP_HASH4 is added to enable it. So the usage is:
>> >> 1. socket()
>> >> 2. bind()
>> >> 3. setsockopt(UDP_HASH4)
>> >> 4. connect()
>> >>
>> >> AFAICT the patch (if useful) can be further improved by:
>> >> (a) Support disabling it with sockopt UDP_HASH4. Now it cannot be disabled
>> >> once turned on until the socket is closed.
>> >> (b) Better interact with hash2/reuseport. Now hash4 hardly affects other
>> >> mechanisms, but maintaining sockets in both hash4 and hash2 lists seems
>> >> unnecessary.
>> >> (c) Support early demux and ipv6.
>> >>
>> >> Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
>> >
>> >Adding a 4-tuple hash for UDP has been discussed in the past.
>>
>> Thanks for the information! We didn't know the history.
>>
>> >
>> >Main issue is that this is adding one cache line miss per incoming packet.
>>
>> What about adding something like a refcnt in 'struct udp_hslot'?
>> If someone enables uhash4 on the port, we increase the refcnt.
>> Then we can check whether that port has uhash4 enabled. If it's zero,
>> we can just bypass the uhash4 lookup process and go to the current
>> udp4_lib_lookup2().
>>
>
>Reading anything (thus a refcnt) in 'struct udp_hslot' will need the
>same cache line miss.

hslot2->head in 'struct udp_hslot' is read in udp4_lib_lookup2() in
any case, only a few instructions (about 20) later. So I don't think
the cache miss is a problem in this case.
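
To make the cache-line argument concrete, here is a small userspace
model, not kernel code. The hash4_cnt field and every *_model name are
assumptions made up for illustration; only 'head' and 'count' mirror
fields mentioned in this thread. The helper only checks whether two
field offsets land in the same cache line for a given slot address and
line size, so the result still depends on how the slot is aligned.

```c
#include <stddef.h>

/* Userspace model only; names ending in _model are hypothetical. */
struct hlist_head_model {
	void *first;
};

struct udp_hslot_model {
	struct hlist_head_model head;	/* read by the existing lookup */
	int count;			/* existing per-slot socket count */
	unsigned int hash4_cnt;		/* hypothetical: uhash4 users on this slot */
};

/*
 * Return 1 if the bytes at off_a and off_b fall in the same cache line
 * for a slot that starts at slot_addr, with lines of line_size bytes.
 */
static int fields_share_line(size_t slot_addr, size_t off_a, size_t off_b,
			     size_t line_size)
{
	return (slot_addr + off_a) / line_size ==
	       (slot_addr + off_b) / line_size;
}

/* Check head and hash4_cnt of the model, assuming 64-byte cache lines. */
static int hslot_model_single_miss(size_t slot_addr)
{
	return fields_share_line(slot_addr,
				 offsetof(struct udp_hslot_model, head),
				 offsetof(struct udp_hslot_model, hash4_cnt),
				 64);
}
```

For a slot that starts on a line boundary, hslot_model_single_miss(0)
returns 1, which is the case where reading the extra field costs no
additional miss. A slot that starts near the end of a line can still
straddle two lines.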

>
>Note that udp_hslot already has a 'count' field

Yes, but that's for uhash/uhash2. I'm thinking of adding something
to indicate that uhash4 is enabled on this port, so that we can avoid
touching extra cold memory when it is not. Maybe 'struct udp_hslot'
is not a good place for it.
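
A rough userspace sketch of the gating idea, under stated assumptions:
hash4_cnt, next4, the *_model types and lookup_model() are all
hypothetical names, and the fallback loop is a first-match stand-in
for the real scoring in udp4_lib_lookup2(). It only shows that the
4-tuple chain is not touched while the per-slot indicator is zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; this is not the kernel's struct sock. */
struct sock_model {
	uint32_t laddr, raddr;
	uint16_t lport, rport;
	struct sock_model *next;	/* link in the existing per-slot chain */
	struct sock_model *next4;	/* hypothetical link in a 4-tuple chain */
};

struct hslot_model {
	struct sock_model *head;	/* existing chain, read on every lookup */
	unsigned int hash4_cnt;		/* hypothetical: uhash4 users on this port */
};

/*
 * Probe the 4-tuple chain only when the slot says someone enabled it;
 * otherwise go straight to the existing chain. *hash4_probed reports
 * whether the (possibly cold) 4-tuple chain was touched.
 */
static struct sock_model *
lookup_model(const struct hslot_model *slot, struct sock_model *hash4_chain,
	     uint32_t laddr, uint16_t lport, uint32_t raddr, uint16_t rport,
	     int *hash4_probed)
{
	struct sock_model *sk;

	*hash4_probed = 0;
	if (slot->hash4_cnt) {
		*hash4_probed = 1;
		for (sk = hash4_chain; sk; sk = sk->next4)
			if (sk->laddr == laddr && sk->lport == lport &&
			    sk->raddr == raddr && sk->rport == rport)
				return sk;
	}

	/* Simplified fallback: first socket matching the local addr/port. */
	for (sk = slot->head; sk; sk = sk->next)
		if (sk->laddr == laddr && sk->lport == lport)
			return sk;

	return NULL;
}
```

With hash4_cnt == 0 the function never dereferences hash4_chain, so a
port without connected uhash4 sockets pays only for the slot it already
reads today.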

Best regards,
Dust


Thread overview: 8+ messages
2024-09-13 10:09 [RFC PATCH net-next] net/udp: Add 4-tuple hash for connected socket Philo Lu
2024-09-13 11:49 ` Eric Dumazet
2024-09-13 14:21   ` Dust Li
2024-09-13 14:39     ` Eric Dumazet
2024-09-13 15:06       ` Dust Li [this message]
2024-09-13 15:39         ` Eric Dumazet
2024-09-23  8:40   ` Philo Lu
2024-09-23  9:19     ` Eric Dumazet
