* [PATCH v2 net 0/2] ipv4: avoid pathological hash tables
@ 2022-01-19 10:04 Eric Dumazet
2022-01-19 10:04 ` [PATCH v2 net 1/2] ipv4: avoid quadratic behavior in netns dismantle Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Eric Dumazet @ 2022-01-19 10:04 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski
Cc: David Ahern, netdev, Eric Dumazet, Eric Dumazet
From: Eric Dumazet <edumazet@google.com>
This series speeds up netns dismantles on hosts
having many active netns, by making sure two hash tables
used for IPV4 fib contains uniformly spread items.
v2: changed second patch to add fib_info_laddrhash_bucket()
for consistency (David Ahern suggestion).
Eric Dumazet (2):
ipv4: avoid quadratic behavior in netns dismantle
ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
net/ipv4/fib_semantics.c | 65 ++++++++++++++++++++--------------------
1 file changed, 32 insertions(+), 33 deletions(-)
--
2.34.1.703.g22d0c6ccf7-goog
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 net 1/2] ipv4: avoid quadratic behavior in netns dismantle
2022-01-19 10:04 [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Eric Dumazet
@ 2022-01-19 10:04 ` Eric Dumazet
2022-01-19 10:04 ` [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys Eric Dumazet
2022-01-19 18:24 ` [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Jakub Kicinski
2 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2022-01-19 10:04 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski
Cc: David Ahern, netdev, Eric Dumazet, Eric Dumazet
From: Eric Dumazet <edumazet@google.com>
net/ipv4/fib_semantics.c uses an hash table of 256 slots,
keyed by device ifindexes: fib_info_devhash[DEVINDEX_HASHSIZE]
Problem is that with network namespaces, devices tend
to use the same ifindex.
lo device for instance has a fixed ifindex of one,
for all network namespaces.
This means that hosts with thousands of netns spend
a lot of time looking at some hash buckets with thousands
of elements, notably at netns dismantle.
Simply add a per netns perturbation (net_hash_mix())
to spread elements more uniformely.
Also change fib_devindex_hashfn() to use more entropy.
Fixes: aa79e66eee5d ("net: Make ifindex generation per-net namespace")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
---
net/ipv4/fib_semantics.c | 36 +++++++++++++++++-------------------
1 file changed, 17 insertions(+), 19 deletions(-)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 45619c005b8dddd7ccd5c7029efa4ed69b6ce1de..9813949da10493de36b9db797b6a5d94fd9bd3b1 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -29,6 +29,7 @@
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/netlink.h>
+#include <linux/hash.h>
#include <net/arp.h>
#include <net/ip.h>
@@ -319,11 +320,15 @@ static inline int nh_comp(struct fib_info *fi, struct fib_info *ofi)
static inline unsigned int fib_devindex_hashfn(unsigned int val)
{
- unsigned int mask = DEVINDEX_HASHSIZE - 1;
+ return hash_32(val, DEVINDEX_HASHBITS);
+}
- return (val ^
- (val >> DEVINDEX_HASHBITS) ^
- (val >> (DEVINDEX_HASHBITS * 2))) & mask;
+static struct hlist_head *
+fib_info_devhash_bucket(const struct net_device *dev)
+{
+ u32 val = net_hash_mix(dev_net(dev)) ^ dev->ifindex;
+
+ return &fib_info_devhash[fib_devindex_hashfn(val)];
}
static unsigned int fib_info_hashfn_1(int init_val, u8 protocol, u8 scope,
@@ -433,12 +438,11 @@ int ip_fib_check_default(__be32 gw, struct net_device *dev)
{
struct hlist_head *head;
struct fib_nh *nh;
- unsigned int hash;
spin_lock(&fib_info_lock);
- hash = fib_devindex_hashfn(dev->ifindex);
- head = &fib_info_devhash[hash];
+ head = fib_info_devhash_bucket(dev);
+
hlist_for_each_entry(nh, head, nh_hash) {
if (nh->fib_nh_dev == dev &&
nh->fib_nh_gw4 == gw &&
@@ -1609,12 +1613,10 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
} else {
change_nexthops(fi) {
struct hlist_head *head;
- unsigned int hash;
if (!nexthop_nh->fib_nh_dev)
continue;
- hash = fib_devindex_hashfn(nexthop_nh->fib_nh_dev->ifindex);
- head = &fib_info_devhash[hash];
+ head = fib_info_devhash_bucket(nexthop_nh->fib_nh_dev);
hlist_add_head(&nexthop_nh->nh_hash, head);
} endfor_nexthops(fi)
}
@@ -1966,8 +1968,7 @@ void fib_nhc_update_mtu(struct fib_nh_common *nhc, u32 new, u32 orig)
void fib_sync_mtu(struct net_device *dev, u32 orig_mtu)
{
- unsigned int hash = fib_devindex_hashfn(dev->ifindex);
- struct hlist_head *head = &fib_info_devhash[hash];
+ struct hlist_head *head = fib_info_devhash_bucket(dev);
struct fib_nh *nh;
hlist_for_each_entry(nh, head, nh_hash) {
@@ -1986,12 +1987,11 @@ void fib_sync_mtu(struct net_device *dev, u32 orig_mtu)
*/
int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force)
{
- int ret = 0;
- int scope = RT_SCOPE_NOWHERE;
+ struct hlist_head *head = fib_info_devhash_bucket(dev);
struct fib_info *prev_fi = NULL;
- unsigned int hash = fib_devindex_hashfn(dev->ifindex);
- struct hlist_head *head = &fib_info_devhash[hash];
+ int scope = RT_SCOPE_NOWHERE;
struct fib_nh *nh;
+ int ret = 0;
if (force)
scope = -1;
@@ -2136,7 +2136,6 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res)
int fib_sync_up(struct net_device *dev, unsigned char nh_flags)
{
struct fib_info *prev_fi;
- unsigned int hash;
struct hlist_head *head;
struct fib_nh *nh;
int ret;
@@ -2152,8 +2151,7 @@ int fib_sync_up(struct net_device *dev, unsigned char nh_flags)
}
prev_fi = NULL;
- hash = fib_devindex_hashfn(dev->ifindex);
- head = &fib_info_devhash[hash];
+ head = fib_info_devhash_bucket(dev);
ret = 0;
hlist_for_each_entry(nh, head, nh_hash) {
--
2.34.1.703.g22d0c6ccf7-goog
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
2022-01-19 10:04 [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Eric Dumazet
2022-01-19 10:04 ` [PATCH v2 net 1/2] ipv4: avoid quadratic behavior in netns dismantle Eric Dumazet
@ 2022-01-19 10:04 ` Eric Dumazet
2022-01-19 15:40 ` David Ahern
2022-01-19 18:24 ` [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Jakub Kicinski
2 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2022-01-19 10:04 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski
Cc: David Ahern, netdev, Eric Dumazet, Eric Dumazet
From: Eric Dumazet <edumazet@google.com>
net/ipv4/fib_semantics.c uses a hash table (fib_info_laddrhash)
in which fib_sync_down_addr() can locate fib_info
based on IPv4 local address.
This hash table is resized based on total number of
hashed fib_info, but the hash function is only
using the local address.
For hosts having many active network namespaces,
all fib_info for loopback devices (IPv4 address 127.0.0.1)
are hashed into a single bucket, making netns dismantles
very slow.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/fib_semantics.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 9813949da10493de36b9db797b6a5d94fd9bd3b1..b4589861b84c6bc4daa7149e078ad63749c7622f 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -52,6 +52,7 @@ static DEFINE_SPINLOCK(fib_info_lock);
static struct hlist_head *fib_info_hash;
static struct hlist_head *fib_info_laddrhash;
static unsigned int fib_info_hash_size;
+static unsigned int fib_info_hash_bits;
static unsigned int fib_info_cnt;
#define DEVINDEX_HASHBITS 8
@@ -1247,13 +1248,13 @@ int fib_check_nh(struct net *net, struct fib_nh *nh, u32 table, u8 scope,
return err;
}
-static inline unsigned int fib_laddr_hashfn(__be32 val)
+static struct hlist_head *
+fib_info_laddrhash_bucket(const struct net *net, __be32 val)
{
- unsigned int mask = (fib_info_hash_size - 1);
+ u32 slot = hash_32(net_hash_mix(net) ^ (__force u32)val,
+ fib_info_hash_bits);
- return ((__force u32)val ^
- ((__force u32)val >> 7) ^
- ((__force u32)val >> 14)) & mask;
+ return &fib_info_laddrhash[slot];
}
static struct hlist_head *fib_info_hash_alloc(int bytes)
@@ -1289,6 +1290,7 @@ static void fib_info_hash_move(struct hlist_head *new_info_hash,
old_info_hash = fib_info_hash;
old_laddrhash = fib_info_laddrhash;
fib_info_hash_size = new_size;
+ fib_info_hash_bits = ilog2(new_size);
for (i = 0; i < old_size; i++) {
struct hlist_head *head = &fib_info_hash[i];
@@ -1306,21 +1308,20 @@ static void fib_info_hash_move(struct hlist_head *new_info_hash,
}
fib_info_hash = new_info_hash;
+ fib_info_laddrhash = new_laddrhash;
for (i = 0; i < old_size; i++) {
- struct hlist_head *lhead = &fib_info_laddrhash[i];
+ struct hlist_head *lhead = &old_laddrhash[i];
struct hlist_node *n;
struct fib_info *fi;
hlist_for_each_entry_safe(fi, n, lhead, fib_lhash) {
struct hlist_head *ldest;
- unsigned int new_hash;
- new_hash = fib_laddr_hashfn(fi->fib_prefsrc);
- ldest = &new_laddrhash[new_hash];
+ ldest = fib_info_laddrhash_bucket(fi->fib_net,
+ fi->fib_prefsrc);
hlist_add_head(&fi->fib_lhash, ldest);
}
}
- fib_info_laddrhash = new_laddrhash;
spin_unlock_bh(&fib_info_lock);
@@ -1605,7 +1606,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
if (fi->fib_prefsrc) {
struct hlist_head *head;
- head = &fib_info_laddrhash[fib_laddr_hashfn(fi->fib_prefsrc)];
+ head = fib_info_laddrhash_bucket(net, fi->fib_prefsrc);
hlist_add_head(&fi->fib_lhash, head);
}
if (fi->nh) {
@@ -1877,16 +1878,16 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
*/
int fib_sync_down_addr(struct net_device *dev, __be32 local)
{
- int ret = 0;
- unsigned int hash = fib_laddr_hashfn(local);
- struct hlist_head *head = &fib_info_laddrhash[hash];
int tb_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN;
struct net *net = dev_net(dev);
+ struct hlist_head *head;
struct fib_info *fi;
+ int ret = 0;
if (!fib_info_laddrhash || local == 0)
return 0;
+ head = fib_info_laddrhash_bucket(net, local);
hlist_for_each_entry(fi, head, fib_lhash) {
if (!net_eq(fi->fib_net, net) ||
fi->fib_tb_id != tb_id)
--
2.34.1.703.g22d0c6ccf7-goog
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
2022-01-19 10:04 ` [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys Eric Dumazet
@ 2022-01-19 15:40 ` David Ahern
0 siblings, 0 replies; 5+ messages in thread
From: David Ahern @ 2022-01-19 15:40 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski
Cc: David Ahern, netdev, Eric Dumazet
On 1/19/22 3:04 AM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> net/ipv4/fib_semantics.c uses a hash table (fib_info_laddrhash)
> in which fib_sync_down_addr() can locate fib_info
> based on IPv4 local address.
>
> This hash table is resized based on total number of
> hashed fib_info, but the hash function is only
> using the local address.
>
> For hosts having many active network namespaces,
> all fib_info for loopback devices (IPv4 address 127.0.0.1)
> are hashed into a single bucket, making netns dismantles
> very slow.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> net/ipv4/fib_semantics.c | 29 +++++++++++++++--------------
> 1 file changed, 15 insertions(+), 14 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 net 0/2] ipv4: avoid pathological hash tables
2022-01-19 10:04 [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Eric Dumazet
2022-01-19 10:04 ` [PATCH v2 net 1/2] ipv4: avoid quadratic behavior in netns dismantle Eric Dumazet
2022-01-19 10:04 ` [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys Eric Dumazet
@ 2022-01-19 18:24 ` Jakub Kicinski
2 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2022-01-19 18:24 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S . Miller, David Ahern, netdev, Eric Dumazet
On Wed, 19 Jan 2022 02:04:11 -0800 Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> This series speeds up netns dismantles on hosts
> having many active netns, by making sure two hash tables
> used for IPV4 fib contains uniformly spread items.
>
> v2: changed second patch to add fib_info_laddrhash_bucket()
> for consistency (David Ahern suggestion).
Applied, thanks! (the bot does not want to respond for some reason)
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-01-19 18:24 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-19 10:04 [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Eric Dumazet
2022-01-19 10:04 ` [PATCH v2 net 1/2] ipv4: avoid quadratic behavior in netns dismantle Eric Dumazet
2022-01-19 10:04 ` [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys Eric Dumazet
2022-01-19 15:40 ` David Ahern
2022-01-19 18:24 ` [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).