Netdev List
* neigh: poor scalability of forced GC when neighbour count exceeds gc_thresh3
@ 2026-06-18  8:17 Vimal Agrawal
  2026-06-25 10:20 ` [PATCH net-next] net: neigh: avoid calling neigh_forced_gc on every alloc when table is full Vimal Agrawal
From: Vimal Agrawal @ 2026-06-18  8:17 UTC
  To: netdev; +Cc: David Ahern, Jakub Kicinski, Vimal Agrawal

While investigating a soft lockup observed during neighbour table
growth, I noticed that neighbour allocation latency increases
significantly once the number of entries exceeds gc_thresh3.

Test setup:
net.ipv6.neigh.default.gc_thresh1 = 16384
net.ipv6.neigh.default.gc_thresh2 = 32768
net.ipv6.neigh.default.gc_thresh3 = 32768

I created approximately 50,000 reachable neighbour entries and
measured time spent in __neigh_create(). Once the table size exceeds
gc_thresh3, neighbour creation latency increases dramatically (in my
testing, individual allocations can take >16 ms). Profiling shows that
most of the time is spent waiting on tbl->lock, typically held by
neigh_forced_gc().

The relevant path is:

static int neigh_forced_gc(struct neigh_table *tbl)
{
        ...
        write_lock_bh(&tbl->lock);
        list_for_each_entry_safe(n, tmp, &tbl->gc_list, gc_list) {
                if (refcount_read(&n->refcnt) == 1) {
                        ...

In my workload, most entries are active/reachable and have refcnt > 1,
so the GC walk scans a large portion of the neighbour table without
reclaiming entries. As a result, the lock can be held for a long
period while traversing the GC list.
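
As a rough userspace illustration of the walk described above (not
kernel code): the sketch below models the GC list as a singly linked
list and counts how many entries are visited versus reclaimed. The
names model_neigh, gc_walk_stats and model_forced_gc are hypothetical,
and the model ignores any bounds or early exits the real function may
apply in the parts elided with "..." above.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for a neighbour entry on the GC list. */
struct model_neigh {
        unsigned int refcnt;        /* 1: only the table holds a reference */
        struct model_neigh *next;
};

struct gc_walk_stats {
        size_t visited;             /* entries inspected during the walk */
        size_t reclaimed;           /* entries unlinked (refcnt == 1) */
};

/*
 * Model of the loop quoted above: every entry is visited, but only
 * entries with refcnt == 1 are unlinked. In the kernel this whole walk
 * runs with tbl->lock held for writing; here we only count the work.
 * Unlinked entries are not freed; the caller owns the memory.
 */
static struct gc_walk_stats model_forced_gc(struct model_neigh **head)
{
        struct gc_walk_stats st = { 0, 0 };
        struct model_neigh **link = head;

        while (*link) {
                struct model_neigh *n = *link;

                st.visited++;
                if (n->refcnt == 1) {
                        *link = n->next;    /* unlink the idle entry */
                        st.reclaimed++;
                } else {
                        link = &n->next;    /* in use: skip, keep walking */
                }
        }
        return st;
}
```

With a list in which every entry has refcnt > 1, model_forced_gc()
returns visited equal to the list length and reclaimed equal to 0,
which is the all-cost, no-benefit case seen in this workload.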

Another observation is that once gc_thresh3 is exceeded, every new
neighbour allocation attempts a forced GC:

entries = atomic_inc_return(&tbl->gc_entries) - 1;

if (entries >= gc_thresh3 ||
    (entries >= READ_ONCE(tbl->gc_thresh2) &&
     time_after(now, READ_ONCE(tbl->last_flush) + 5 * HZ))) {
        if (!neigh_forced_gc(tbl) && entries >= gc_thresh3) {
                ...

Unlike the gc_thresh2 case, there is no rate limiting once the table
is already above gc_thresh3. Under sustained neighbour creation this
results in repeated full GC scans, further increasing contention on
tbl->lock.
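
To make the contrast concrete, here is a minimal userspace model of the
trigger condition quoted above, next to one possible rate-limited
variant. This is only a sketch under stated assumptions, not a proposed
patch: MODEL_HZ, model_tbl, the last_forced_gc field and the
min_interval parameter are all hypothetical and do not exist in the
kernel.

```c
#include <stdbool.h>

#define MODEL_HZ 1000UL                 /* assumed tick rate for the model */

struct model_tbl {
        int gc_thresh2;
        int gc_thresh3;
        unsigned long last_flush;       /* timestamp the quoted check compares against */
        unsigned long last_forced_gc;   /* hypothetical: last forced GC attempt */
};

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
        return (long)(b - a) < 0;
}

/* The condition as quoted above: always true once entries >= gc_thresh3. */
static bool should_gc_current(const struct model_tbl *t, int entries,
                              unsigned long now)
{
        return entries >= t->gc_thresh3 ||
               (entries >= t->gc_thresh2 &&
                model_time_after(now, t->last_flush + 5 * MODEL_HZ));
}

/*
 * Hypothetical variant: above gc_thresh3, allow at most one forced GC
 * attempt per min_interval ticks. Records the attempt time when it
 * returns true.
 */
static bool should_gc_limited(struct model_tbl *t, int entries,
                              unsigned long now, unsigned long min_interval)
{
        if (!should_gc_current(t, entries, now))
                return false;
        if (entries >= t->gc_thresh3 &&
            !model_time_after(now, t->last_forced_gc + min_interval))
                return false;           /* attempted too recently, skip */
        t->last_forced_gc = now;
        return true;
}
```

In this model should_gc_current() returns true for every allocation
while entries >= gc_thresh3, whereas should_gc_limited() returns true
at most once per min_interval. What an allocation should do when the
GC attempt is skipped (fail immediately or proceed) is a separate
question that the sketch leaves open.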

Has this scalability issue been discussed previously, or is there a
reason why forced GC above gc_thresh3 is intentionally not
rate-limited?
I would be interested in feedback before working on a patch.


Thanks,
Vimal Agrawal
