neigh: poor scalability of forced GC when neighbour count exceeds gc

Netdev List
 help / color / mirror / Atom feed

* neigh: poor scalability of forced GC when neighbour count exceeds gc_thresh3
@ 2026-06-18  8:17 Vimal Agrawal
  2026-06-25 10:20 ` [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full Vimal Agrawal
  0 siblings, 1 reply; 4+ messages in thread
From: Vimal Agrawal @ 2026-06-18  8:17 UTC (permalink / raw)
  To: netdev; +Cc: David Ahern, Jakub Kicinski, Vimal Agrawal

While investigating a soft lockup observed during neighbour table
growth, I noticed that neighbour allocation latency increases
significantly once the number of entries exceeds gc_thresh3.

Test setup:
net.ipv6.neigh.default.gc_thresh1 = 16384
net.ipv6.neigh.default.gc_thresh2 = 32768
net.ipv6.neigh.default.gc_thresh3 = 32768

I created approximately 50,000 reachable neighbour entries and
measured time spent in __neigh_create(). Once the table size exceeds
gc_thresh3, neighbour creation latency increases dramatically (in my
testing, individual allocations can take >16 ms). Profiling shows that
most of the time is spent waiting on tbl->lock, typically held by
neigh_forced_gc().

The relevant path is:
static int neigh_forced_gc(struct neigh_table *tbl)
{
        ...
        write_lock_bh(&tbl->lock);
        list_for_each_entry_safe(n, tmp, &tbl->gc_list, gc_list) {
                if (refcount_read(&n->refcnt) == 1) {
                        ...
In my workload, most entries are active/reachable and have refcnt > 1,
so the GC walk scans a large portion of the neighbour table without
reclaiming entries. As a result, the lock can be held for a long
period while traversing the GC list.

Another observation is that once gc_thresh3 is exceeded, every new
neighbour allocation attempts a forced GC:
entries = atomic_inc_return(&tbl->gc_entries) - 1;

if (entries >= gc_thresh3 ||
    (entries >= READ_ONCE(tbl->gc_thresh2) &&
     time_after(now, READ_ONCE(tbl->last_flush) + 5 * HZ))) {
        if (!neigh_forced_gc(tbl) && entries >= gc_thresh3) {
                ...
Unlike the gc_thresh2 case, there is no rate limiting once the table
is already above gc_thresh3. Under sustained neighbour creation this
results in repeated full GC scans, further increasing contention on
tbl->lock.

Has this scalability issue been discussed previously, or is there a
reason why forced GC above gc_thresh3 is intentionally not
rate-limited?
I would be interested in feedback before working on a patch.


Thanks,
Vimal Agrawal

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full
  2026-06-18  8:17 neigh: poor scalability of forced GC when neighbour count exceeds gc_thresh3 Vimal Agrawal
@ 2026-06-25 10:20 ` Vimal Agrawal
  2026-06-25 15:42   ` Jakub Kicinski
  2026-06-25 21:45   ` Kuniyuki Iwashima
  0 siblings, 2 replies; 4+ messages in thread
From: Vimal Agrawal @ 2026-06-25 10:20 UTC (permalink / raw)
  To: netdev; +Cc: kuniyu, edumazet, vimal.agrawal, avimalin

Once the neighbour table exceeds gc_thresh3, neigh_forced_gc() is called
on every allocation attempt with no rate limiting. In workloads with mostly
active/reachable entries, the GC walk traverses a large portion of the
neighbour table without reclaiming entries, holding tbl->lock for an
extended period. This causes severe lock contention and allocation
latencies exceeding 16ms under sustained neighbour creation.

Add a pre-lock check in neigh_forced_gc() to skip the GC run if one was
performed within the last second, avoiding repeated full table scans and
lock acquisitions on the hot allocation path.

Profiling of neigh_create() shows ~3 orders of magnitude latency
improvement with this change.

Link:https://lore.kernel.org/netdev/CALkUMdSCpx_ywYCx_ePLdm6yioO1nQWx7sSM=AEgsq0kywHxTw@mail.gmail.com/
Signed-off-by: Vimal Agrawal <vimal.agrawal@sophos.com>
---
 net/core/neighbour.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 1349c0eedb64..078842db3c5f 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -260,6 +260,9 @@ static int neigh_forced_gc(struct neigh_table *tbl)
 	int shrunk = 0;
 	int loop = 0;

+	if (!time_after(jiffies, READ_ONCE(tbl->last_flush) + HZ))
+		return 0;
+
 	NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);

 	spin_lock_bh(&tbl->lock);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full
  2026-06-25 10:20 ` [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full Vimal Agrawal
@ 2026-06-25 15:42   ` Jakub Kicinski
  2026-06-25 21:45   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 4+ messages in thread
From: Jakub Kicinski @ 2026-06-25 15:42 UTC (permalink / raw)
  To: Vimal Agrawal; +Cc: netdev, kuniyu, edumazet, vimal.agrawal

On Thu, 25 Jun 2026 10:20:20 +0000 Vimal Agrawal wrote:
> Once the neighbour table exceeds gc_thresh3, neigh_forced_gc() is called
> on every allocation attempt with no rate limiting. In workloads with mostly
> active/reachable entries, the GC walk traverses a large portion of the
> neighbour table without reclaiming entries, holding tbl->lock for an
> extended period. This causes severe lock contention and allocation
> latencies exceeding 16ms under sustained neighbour creation.
> 
> Add a pre-lock check in neigh_forced_gc() to skip the GC run if one was
> performed within the last second, avoiding repeated full table scans and
> lock acquisitions on the hot allocation path.
> 
> Profiling of neigh_create() shows ~3 orders of magnitude latency
> improvement with this change.

I'm not an expert on neigh but 1 second seems a little aggressive.
Can you see if 10msec doesn't give us a similar win?

>  net/core/neighbour.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 1349c0eedb64..078842db3c5f 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -260,6 +260,9 @@ static int neigh_forced_gc(struct neigh_table *tbl)
>  	int shrunk = 0;
>  	int loop = 0;
>  
> +	if (!time_after(jiffies, READ_ONCE(tbl->last_flush) + HZ))
> +		return 0;
> +
>  	NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
>  
>  	spin_lock_bh(&tbl->lock);


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full
  2026-06-25 10:20 ` [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full Vimal Agrawal
  2026-06-25 15:42   ` Jakub Kicinski
@ 2026-06-25 21:45   ` Kuniyuki Iwashima
  1 sibling, 0 replies; 4+ messages in thread
From: Kuniyuki Iwashima @ 2026-06-25 21:45 UTC (permalink / raw)
  To: avimalin; +Cc: edumazet, kuniyu, netdev, vimal.agrawal, kuba

From: Vimal Agrawal <avimalin@gmail.com>
Date: Thu, 25 Jun 2026 10:20:20 +0000
> Once the neighbour table exceeds gc_thresh3, neigh_forced_gc() is called
> on every allocation attempt with no rate limiting. In workloads with mostly
> active/reachable entries, the GC walk traverses a large portion of the
> neighbour table without reclaiming entries, holding tbl->lock for an
> extended period. This causes severe lock contention and allocation
> latencies exceeding 16ms under sustained neighbour creation.
> 
> Add a pre-lock check in neigh_forced_gc() to skip the GC run if one was
> performed within the last second, avoiding repeated full table scans and
> lock acquisitions on the hot allocation path.
> 
> Profiling of neigh_create() shows ~3 orders of magnitude latency
> improvement with this change.
> 
> Link:https://lore.kernel.org/netdev/CALkUMdSCpx_ywYCx_ePLdm6yioO1nQWx7sSM=AEgsq0kywHxTw@mail.gmail.com/

From the thread, these look misconfigured.

---8<---
net.ipv6.neigh.default.gc_thresh2 = 32768
net.ipv6.neigh.default.gc_thresh3 = 32768
---8<---

If gc_thresh3 is larger enough, gc_thresh2 will give you 5s
rate limiting.

If the number of active neigh entries constantly exceeds
gc_thresh3, it will be the correct gc_thresh2 for you.

Also, I guess you want a new kernel param for the first
neigh_hash_alloc(), which is currently fixed for 3, which
is too small for some hosts.

50000 entries require neigh_hash_grow() 13 times.

Can you test this on your real workload, starting from
neigh_hash_shift=16 and appropriate gc_thresh2/3 ?

---8<---
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 1349c0eedb64..a75b3750eec9 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1817,6 +1817,22 @@ EXPORT_SYMBOL(neigh_parms_release);
 static struct lock_class_key neigh_table_proxy_queue_class;
 
 static struct neigh_table __rcu *neigh_tables[NEIGH_NR_TABLES] __read_mostly;
+static __initdata unsigned long neigh_hash_shift = 3;
+
+static int __init neigh_set_hash_shift(char *str)
+{
+	ssize_t ret;
+
+	if (!str)
+		return 0;
+
+	ret = kstrtoul(str, 0, &neigh_hash_shift);
+	if (ret)
+		return 0;
+
+	return 1;
+}
+__setup("neigh_hash_shift=", neigh_set_hash_shift);
 
 void neigh_table_init(int index, struct neigh_table *tbl)
 {
@@ -1843,7 +1859,7 @@ void neigh_table_init(int index, struct neigh_table *tbl)
 		panic("cannot create neighbour proc dir entry");
 #endif
 
-	RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(3));
+	RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(neigh_hash_shift));
 
 	phsize = (PNEIGH_HASHMASK + 1) * sizeof(struct pneigh_entry *);
 	tbl->phash_buckets = kzalloc(phsize, GFP_KERNEL);
---8<---



> Signed-off-by: Vimal Agrawal <vimal.agrawal@sophos.com>
> ---
>  net/core/neighbour.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 1349c0eedb64..078842db3c5f 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -260,6 +260,9 @@ static int neigh_forced_gc(struct neigh_table *tbl)
>  	int shrunk = 0;
>  	int loop = 0;
>  
> +	if (!time_after(jiffies, READ_ONCE(tbl->last_flush) + HZ))
> +		return 0;
> +
>  	NEIGH_CACHE_STAT_INC(tbl, forced_gc_runs);
>  
>  	spin_lock_bh(&tbl->lock);
> -- 
> 2.17.1
> v

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-06-25 21:47 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-18  8:17 neigh: poor scalability of forced GC when neighbour count exceeds gc_thresh3 Vimal Agrawal
2026-06-25 10:20 ` [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full Vimal Agrawal
2026-06-25 15:42   ` Jakub Kicinski
2026-06-25 21:45   ` Kuniyuki Iwashima

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox