From: Eric Dumazet <eric.dumazet@gmail.com>
To: Kuniyuki Iwashima <kuniyu@amazon.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Kuniyuki Iwashima <kuni1840@gmail.com>, netdev@vger.kernel.org
Subject: Re: [PATCH v6 net-next 0/6] tcp: Introduce optional per-netns ehash.
Date: Thu, 8 Sep 2022 11:13:07 -0700
Message-ID: <248395fc-7dd7-3c7d-affc-ced4145c5285@gmail.com>
In-Reply-To: <20220908011022.45342-1-kuniyu@amazon.com>

On 9/7/22 18:10, Kuniyuki Iwashima wrote:
> The more sockets we have in the hash table, the longer we spend looking
> up the socket. While running a number of small workloads on the same
> host, they penalise each other and cause performance degradation.
>
> The root cause might be a single workload that consumes far more
> resources than the others. This often happens on cloud services where
> different workloads share the same computing resources.
>
> On an EC2 c5.24xlarge instance (192 GiB memory and 524288 (1Mi / 2) ehash
> entries), after running iperf3 in different netns, creating 24Mi sockets
> without data transfer in the root netns causes about 10% performance
> regression for the iperf3 connection.
>
> thash_entries  sockets  length  Gbps
>        524288        1       1  50.7
>                   24Mi      48  45.1
>
> The regression is basically a function of the length of each hash
> bucket's chain. To see how performance drops as the chains grow, I set
> thash_entries to 131072 (1Mi / 8), and here's the result.
>
> thash_entries  sockets  length  Gbps
>        131072        1       1  50.7
>                    1Mi       8  49.9
>                    2Mi      16  48.9
>                    4Mi      32  47.3
>                    8Mi      64  44.6
>                   16Mi     128  40.6
>                   24Mi     192  36.3
>                   32Mi     256  32.5
>                   40Mi     320  27.0
>                   48Mi     384  25.0
>
> To resolve the socket lookup degradation, we introduce an optional
> per-netns hash table for TCP. Only the ehash is per-netns; the global
> bhash, bhash2 and lhash2 are still shared.
>
> With a smaller ehash, we can look up non-listener sockets faster and
> isolate such noisy neighbours. Also, we can reduce lock contention.
>
> For details, please see the last patch.
>
> patch 1 - 4: prep for per-netns ehash
> patch 5: small optimisation for netns dismantle without TIME_WAIT sockets
> patch 6: add per-netns ehash
>
> Many thanks to Eric Dumazet for reviewing and advising.
>
>
> Changes:
> v6:
>   * Patch 6
>     * Use vmalloc_huge() in inet_pernet_hashinfo_alloc() and
>       update the changelog and doc about NUMA (Eric Dumazet)
>     * Use kmemdup() in inet_pernet_hashinfo_alloc() (Eric Dumazet)
>     * Use vfree() in inet_pernet_hashinfo_(alloc|free)()
>
> v5: https://lore.kernel.org/netdev/20220907005534.72876-1-kuniyu@amazon.com/
>   * Patch 2
>     * Keep the tw_refcount base value at 1 (Eric Dumazet)
>     * Add WARN_ON_ONCE() for tw_refcount (Eric Dumazet)
>   * Patch 5
>     * Test tw_refcount against 1 in tcp_twsk_purge()
>
> v4: https://lore.kernel.org/netdev/20220906162423.44410-1-kuniyu@amazon.com/
>   * Add Patch 2
>   * Patch 1
>     * Add cleanups in tcp_time_wait() and tcp_v[46]_connect()
>   * Patch 3
>     * /tcp_death_row/s/->/./
>   * Patch 4
>     * Add mellanox and netronome driver changes back (Paolo Abeni, Jakub Kicinski)
>     * /tcp_death_row/s/->/./
>   * Patch 5
>     * Simplify tcp_twsk_purge()
>   * Patch 6
>     * Move inet_pernet_hashinfo_free() into tcp_sk_exit_batch()
>
> v3: https://lore.kernel.org/netdev/20220830191518.77083-1-kuniyu@amazon.com/
>   * Patch 3
>     * Drop mellanox and netronome driver changes (Eric Dumazet)
>   * Patch 4
>     * Add test results in the changelog
>   * Patch 5
>     * Use roundup_pow_of_two() in tcp_set_hashinfo() (Eric Dumazet)
>     * Remove proc_tcp_child_ehash_entries() and use proc_douintvec_minmax()
>
> v2: https://lore.kernel.org/netdev/20220829161920.99409-1-kuniyu@amazon.com/
>   * Drop flock() and UDP stuff
>   * Patch 2
>     * Rename inet_get_hashinfo() to tcp_or_dccp_get_hashinfo() (Eric Dumazet)
>   * Patch 4
>     * Remove unnecessary inet_twsk_purge() calls for unshare()
>     * Factorise inet_twsk_purge() calls (Eric Dumazet)
>   * Patch 5
>     * Change the max bucket size to 16Mi
>     * Use unsigned int for ehash size (Eric Dumazet)
>     * Use GFP_KERNEL_ACCOUNT for the per-netns ehash allocation (Eric Dumazet)
>     * Use current->nsproxy->net_ns for parent netns (Eric Dumazet)
>
> v1: https://lore.kernel.org/netdev/20220826000445.46552-1-kuniyu@amazon.com/
>
SGTM, thanks.
For the whole series:
Reviewed-by: Eric Dumazet <edumazet@google.com>
Thread overview: 11+ messages
2022-09-08 01:10 [PATCH v6 net-next 0/6] tcp: Introduce optional per-netns ehash - Kuniyuki Iwashima
2022-09-08 01:10   [PATCH v6 net-next 1/6] tcp: Clean up some functions - Kuniyuki Iwashima
2022-09-08 01:10   [PATCH v6 net-next 2/6] tcp: Don't allocate tcp_death_row outside of struct netns_ipv4 - Kuniyuki Iwashima
2022-09-08 01:10   [PATCH v6 net-next 3/6] tcp: Set NULL to sk->sk_prot->h.hashinfo - Kuniyuki Iwashima
2022-09-08 01:10   [PATCH v6 net-next 4/6] tcp: Access &tcp_hashinfo via net - Kuniyuki Iwashima
2022-09-08 01:10   [PATCH v6 net-next 5/6] tcp: Save unnecessary inet_twsk_purge() calls - Kuniyuki Iwashima
2022-09-08 01:10   [PATCH v6 net-next 6/6] tcp: Introduce optional per-netns ehash - Kuniyuki Iwashima
2022-09-08 18:13   Eric Dumazet (this message)
2022-09-20 19:00   Re: [PATCH v6 net-next 0/6] tcp: Introduce optional per-netns ehash - patchwork-bot+netdevbpf
2022-10-11 21:46     Eric Dumazet
2022-10-11 21:53       Kuniyuki Iwashima