netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 net 0/2] ipv4: avoid pathological hash tables
@ 2022-01-19 10:04 Eric Dumazet
  2022-01-19 10:04 ` [PATCH v2 net 1/2] ipv4: avoid quadratic behavior in netns dismantle Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Eric Dumazet @ 2022-01-19 10:04 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: David Ahern, netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

This series speeds up netns dismantles on hosts
having many active netns, by making sure two hash tables
used for IPV4 fib contains uniformly spread items.

v2: changed second patch to add fib_info_laddrhash_bucket()
    for consistency (David Ahern suggestion).

Eric Dumazet (2):
  ipv4: avoid quadratic behavior in netns dismantle
  ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys

 net/ipv4/fib_semantics.c | 65 ++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 33 deletions(-)

-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2 net 1/2] ipv4: avoid quadratic behavior in netns dismantle
  2022-01-19 10:04 [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Eric Dumazet
@ 2022-01-19 10:04 ` Eric Dumazet
  2022-01-19 10:04 ` [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys Eric Dumazet
  2022-01-19 18:24 ` [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Jakub Kicinski
  2 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2022-01-19 10:04 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: David Ahern, netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

net/ipv4/fib_semantics.c uses an hash table of 256 slots,
keyed by device ifindexes: fib_info_devhash[DEVINDEX_HASHSIZE]

Problem is that with network namespaces, devices tend
to use the same ifindex.

lo device for instance has a fixed ifindex of one,
for all network namespaces.

This means that hosts with thousands of netns spend
a lot of time looking at some hash buckets with thousands
of elements, notably at netns dismantle.

Simply add a per netns perturbation (net_hash_mix())
to spread elements more uniformely.

Also change fib_devindex_hashfn() to use more entropy.

Fixes: aa79e66eee5d ("net: Make ifindex generation per-net namespace")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
---
 net/ipv4/fib_semantics.c | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 45619c005b8dddd7ccd5c7029efa4ed69b6ce1de..9813949da10493de36b9db797b6a5d94fd9bd3b1 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -29,6 +29,7 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/netlink.h>
+#include <linux/hash.h>
 
 #include <net/arp.h>
 #include <net/ip.h>
@@ -319,11 +320,15 @@ static inline int nh_comp(struct fib_info *fi, struct fib_info *ofi)
 
 static inline unsigned int fib_devindex_hashfn(unsigned int val)
 {
-	unsigned int mask = DEVINDEX_HASHSIZE - 1;
+	return hash_32(val, DEVINDEX_HASHBITS);
+}
 
-	return (val ^
-		(val >> DEVINDEX_HASHBITS) ^
-		(val >> (DEVINDEX_HASHBITS * 2))) & mask;
+static struct hlist_head *
+fib_info_devhash_bucket(const struct net_device *dev)
+{
+	u32 val = net_hash_mix(dev_net(dev)) ^ dev->ifindex;
+
+	return &fib_info_devhash[fib_devindex_hashfn(val)];
 }
 
 static unsigned int fib_info_hashfn_1(int init_val, u8 protocol, u8 scope,
@@ -433,12 +438,11 @@ int ip_fib_check_default(__be32 gw, struct net_device *dev)
 {
 	struct hlist_head *head;
 	struct fib_nh *nh;
-	unsigned int hash;
 
 	spin_lock(&fib_info_lock);
 
-	hash = fib_devindex_hashfn(dev->ifindex);
-	head = &fib_info_devhash[hash];
+	head = fib_info_devhash_bucket(dev);
+
 	hlist_for_each_entry(nh, head, nh_hash) {
 		if (nh->fib_nh_dev == dev &&
 		    nh->fib_nh_gw4 == gw &&
@@ -1609,12 +1613,10 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
 	} else {
 		change_nexthops(fi) {
 			struct hlist_head *head;
-			unsigned int hash;
 
 			if (!nexthop_nh->fib_nh_dev)
 				continue;
-			hash = fib_devindex_hashfn(nexthop_nh->fib_nh_dev->ifindex);
-			head = &fib_info_devhash[hash];
+			head = fib_info_devhash_bucket(nexthop_nh->fib_nh_dev);
 			hlist_add_head(&nexthop_nh->nh_hash, head);
 		} endfor_nexthops(fi)
 	}
@@ -1966,8 +1968,7 @@ void fib_nhc_update_mtu(struct fib_nh_common *nhc, u32 new, u32 orig)
 
 void fib_sync_mtu(struct net_device *dev, u32 orig_mtu)
 {
-	unsigned int hash = fib_devindex_hashfn(dev->ifindex);
-	struct hlist_head *head = &fib_info_devhash[hash];
+	struct hlist_head *head = fib_info_devhash_bucket(dev);
 	struct fib_nh *nh;
 
 	hlist_for_each_entry(nh, head, nh_hash) {
@@ -1986,12 +1987,11 @@ void fib_sync_mtu(struct net_device *dev, u32 orig_mtu)
  */
 int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force)
 {
-	int ret = 0;
-	int scope = RT_SCOPE_NOWHERE;
+	struct hlist_head *head = fib_info_devhash_bucket(dev);
 	struct fib_info *prev_fi = NULL;
-	unsigned int hash = fib_devindex_hashfn(dev->ifindex);
-	struct hlist_head *head = &fib_info_devhash[hash];
+	int scope = RT_SCOPE_NOWHERE;
 	struct fib_nh *nh;
+	int ret = 0;
 
 	if (force)
 		scope = -1;
@@ -2136,7 +2136,6 @@ static void fib_select_default(const struct flowi4 *flp, struct fib_result *res)
 int fib_sync_up(struct net_device *dev, unsigned char nh_flags)
 {
 	struct fib_info *prev_fi;
-	unsigned int hash;
 	struct hlist_head *head;
 	struct fib_nh *nh;
 	int ret;
@@ -2152,8 +2151,7 @@ int fib_sync_up(struct net_device *dev, unsigned char nh_flags)
 	}
 
 	prev_fi = NULL;
-	hash = fib_devindex_hashfn(dev->ifindex);
-	head = &fib_info_devhash[hash];
+	head = fib_info_devhash_bucket(dev);
 	ret = 0;
 
 	hlist_for_each_entry(nh, head, nh_hash) {
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
  2022-01-19 10:04 [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Eric Dumazet
  2022-01-19 10:04 ` [PATCH v2 net 1/2] ipv4: avoid quadratic behavior in netns dismantle Eric Dumazet
@ 2022-01-19 10:04 ` Eric Dumazet
  2022-01-19 15:40   ` David Ahern
  2022-01-19 18:24 ` [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Jakub Kicinski
  2 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2022-01-19 10:04 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: David Ahern, netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

net/ipv4/fib_semantics.c uses a hash table (fib_info_laddrhash)
in which fib_sync_down_addr() can locate fib_info
based on IPv4 local address.

This hash table is resized based on total number of
hashed fib_info, but the hash function is only
using the local address.

For hosts having many active network namespaces,
all fib_info for loopback devices (IPv4 address 127.0.0.1)
are hashed into a single bucket, making netns dismantles
very slow.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/fib_semantics.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 9813949da10493de36b9db797b6a5d94fd9bd3b1..b4589861b84c6bc4daa7149e078ad63749c7622f 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -52,6 +52,7 @@ static DEFINE_SPINLOCK(fib_info_lock);
 static struct hlist_head *fib_info_hash;
 static struct hlist_head *fib_info_laddrhash;
 static unsigned int fib_info_hash_size;
+static unsigned int fib_info_hash_bits;
 static unsigned int fib_info_cnt;
 
 #define DEVINDEX_HASHBITS 8
@@ -1247,13 +1248,13 @@ int fib_check_nh(struct net *net, struct fib_nh *nh, u32 table, u8 scope,
 	return err;
 }
 
-static inline unsigned int fib_laddr_hashfn(__be32 val)
+static struct hlist_head *
+fib_info_laddrhash_bucket(const struct net *net, __be32 val)
 {
-	unsigned int mask = (fib_info_hash_size - 1);
+	u32 slot = hash_32(net_hash_mix(net) ^ (__force u32)val,
+			   fib_info_hash_bits);
 
-	return ((__force u32)val ^
-		((__force u32)val >> 7) ^
-		((__force u32)val >> 14)) & mask;
+	return &fib_info_laddrhash[slot];
 }
 
 static struct hlist_head *fib_info_hash_alloc(int bytes)
@@ -1289,6 +1290,7 @@ static void fib_info_hash_move(struct hlist_head *new_info_hash,
 	old_info_hash = fib_info_hash;
 	old_laddrhash = fib_info_laddrhash;
 	fib_info_hash_size = new_size;
+	fib_info_hash_bits = ilog2(new_size);
 
 	for (i = 0; i < old_size; i++) {
 		struct hlist_head *head = &fib_info_hash[i];
@@ -1306,21 +1308,20 @@ static void fib_info_hash_move(struct hlist_head *new_info_hash,
 	}
 	fib_info_hash = new_info_hash;
 
+	fib_info_laddrhash = new_laddrhash;
 	for (i = 0; i < old_size; i++) {
-		struct hlist_head *lhead = &fib_info_laddrhash[i];
+		struct hlist_head *lhead = &old_laddrhash[i];
 		struct hlist_node *n;
 		struct fib_info *fi;
 
 		hlist_for_each_entry_safe(fi, n, lhead, fib_lhash) {
 			struct hlist_head *ldest;
-			unsigned int new_hash;
 
-			new_hash = fib_laddr_hashfn(fi->fib_prefsrc);
-			ldest = &new_laddrhash[new_hash];
+			ldest = fib_info_laddrhash_bucket(fi->fib_net,
+							  fi->fib_prefsrc);
 			hlist_add_head(&fi->fib_lhash, ldest);
 		}
 	}
-	fib_info_laddrhash = new_laddrhash;
 
 	spin_unlock_bh(&fib_info_lock);
 
@@ -1605,7 +1606,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg,
 	if (fi->fib_prefsrc) {
 		struct hlist_head *head;
 
-		head = &fib_info_laddrhash[fib_laddr_hashfn(fi->fib_prefsrc)];
+		head = fib_info_laddrhash_bucket(net, fi->fib_prefsrc);
 		hlist_add_head(&fi->fib_lhash, head);
 	}
 	if (fi->nh) {
@@ -1877,16 +1878,16 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
  */
 int fib_sync_down_addr(struct net_device *dev, __be32 local)
 {
-	int ret = 0;
-	unsigned int hash = fib_laddr_hashfn(local);
-	struct hlist_head *head = &fib_info_laddrhash[hash];
 	int tb_id = l3mdev_fib_table(dev) ? : RT_TABLE_MAIN;
 	struct net *net = dev_net(dev);
+	struct hlist_head *head;
 	struct fib_info *fi;
+	int ret = 0;
 
 	if (!fib_info_laddrhash || local == 0)
 		return 0;
 
+	head = fib_info_laddrhash_bucket(net, local);
 	hlist_for_each_entry(fi, head, fib_lhash) {
 		if (!net_eq(fi->fib_net, net) ||
 		    fi->fib_tb_id != tb_id)
-- 
2.34.1.703.g22d0c6ccf7-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
  2022-01-19 10:04 ` [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys Eric Dumazet
@ 2022-01-19 15:40   ` David Ahern
  0 siblings, 0 replies; 5+ messages in thread
From: David Ahern @ 2022-01-19 15:40 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: David Ahern, netdev, Eric Dumazet

On 1/19/22 3:04 AM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> net/ipv4/fib_semantics.c uses a hash table (fib_info_laddrhash)
> in which fib_sync_down_addr() can locate fib_info
> based on IPv4 local address.
> 
> This hash table is resized based on total number of
> hashed fib_info, but the hash function is only
> using the local address.
> 
> For hosts having many active network namespaces,
> all fib_info for loopback devices (IPv4 address 127.0.0.1)
> are hashed into a single bucket, making netns dismantles
> very slow.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv4/fib_semantics.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 

Reviewed-by: David Ahern <dsahern@kernel.org>



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 net 0/2] ipv4: avoid pathological hash tables
  2022-01-19 10:04 [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Eric Dumazet
  2022-01-19 10:04 ` [PATCH v2 net 1/2] ipv4: avoid quadratic behavior in netns dismantle Eric Dumazet
  2022-01-19 10:04 ` [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys Eric Dumazet
@ 2022-01-19 18:24 ` Jakub Kicinski
  2 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2022-01-19 18:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, David Ahern, netdev, Eric Dumazet

On Wed, 19 Jan 2022 02:04:11 -0800 Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> This series speeds up netns dismantles on hosts
> having many active netns, by making sure two hash tables
> used for IPV4 fib contains uniformly spread items.
> 
> v2: changed second patch to add fib_info_laddrhash_bucket()
>     for consistency (David Ahern suggestion).

Applied, thanks! (the bot does not want to respond for some reason)

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-01-19 18:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-01-19 10:04 [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Eric Dumazet
2022-01-19 10:04 ` [PATCH v2 net 1/2] ipv4: avoid quadratic behavior in netns dismantle Eric Dumazet
2022-01-19 10:04 ` [PATCH v2 net 2/2] ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys Eric Dumazet
2022-01-19 15:40   ` David Ahern
2022-01-19 18:24 ` [PATCH v2 net 0/2] ipv4: avoid pathological hash tables Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).