From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Stanislav Fomichev <stfomichev@gmail.com>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, ast@kernel.org,
daniel@iogearbox.net, john.fastabend@gmail.com,
martin.lau@linux.dev, Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH bpf-next] bpf: lru: adjust free target to avoid global table starvation
Date: Tue, 17 Jun 2025 16:08:45 -0400 [thread overview]
Message-ID: <6851cb4dcdae7_2f713f294e4@willemb.c.googlers.com.notmuch> (raw)
In-Reply-To: <aFGoUWgo09Gfk-Dt@mini-arch>
Stanislav Fomichev wrote:
> On 06/16, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> >
> > BPF_MAP_TYPE_LRU_HASH can recycle most recent elements well before the
> > map is full, due to percpu reservations and force shrink before
> > neighbor stealing. Once a CPU is unable to borrow from the global map,
> > it will steal a single element from a neighbor once, and from then on
> > each time flush that one element to the global list and immediately
> > recycle it.
> >
> > Batch value LOCAL_FREE_TARGET (128) will exhaust a 10K element map
> > with 79 CPUs. CPU 79 will observe this behavior even while its
> > neighbors hold 78 * 127 + 1 * 15 == 9921 free elements (99%).
> >
> > CPUs need not be active concurrently. The issue can appear with
> > affinity migration, e.g., irqbalance. Each CPU can reserve and then
> > hold onto its 128 elements indefinitely.
> >
> > Avoid global list exhaustion by limiting aggregate percpu caches to
> > half of map size, by adjusting LOCAL_FREE_TARGET based on cpu count.
> > This change has no effect on sufficiently large tables.
>
> The code and rationale look good to me!
Great :)
> There is also
> Documentation/bpf/map_lru_hash_update.dot which mentions
> LOCAL_FREE_TARGET, not sure if it's easy to convey these clamping
> details in there? Or, instead, maybe expand on it in
> Documentation/bpf/map_hash.rst?
Good catch. How about I replace LOCAL_FREE_TARGET with target_free in the
graph, and add something like the following diff to map_hash.rst:
- Attempt to use CPU-local state to batch operations
-- Attempt to fetch free nodes from global lists
+- Attempt to fetch ``target_free`` free nodes from global lists
- Attempt to pull any node from a global list and remove it from the hashmap
- Attempt to pull any node from any CPU's list and remove it from the hashmap
+The number of nodes to borrow from the global list in a batch, ``target_free``,
+depends on the size of the map. A larger batch size reduces lock contention,
+but may also exhaust the global structure. The value is computed at map init
+to avoid exhaustion, by limiting the aggregate reservation by all CPUs to half
+the map size, bounded to a minimum of 1 and a maximum of 128 per batch.
Btw, there is also great documentation on
https://docs.ebpf.io/linux/map-type/BPF_MAP_TYPE_LRU_HASH/. That had a
small error in the order of those Attempt operations above that I
fixed up this week. I'll also update the LOCAL_FREE_TARGET there.
Since it explains the LRU mechanism well, should I link to it as well?
> This <size>/<nrcpu>/2 is a heuristic,
> so maybe we can give some guidance on the recommended fill level for
> small (size/nrcpu < 128) maps?
I don't know if we can suggest a size that works for all cases. It depends on
factors such as the number of CPUs that actively update the map and how
tolerant the workload is of prematurely removed elements.
Thread overview: 7+ messages
2025-06-16 14:38 [PATCH bpf-next] bpf: lru: adjust free target to avoid global table starvation Willem de Bruijn
2025-06-17 17:39 ` Stanislav Fomichev
2025-06-17 20:08 ` Willem de Bruijn [this message]
2025-06-17 20:29 ` Stanislav Fomichev
2025-06-18 13:56 ` Anton Protopopov
2025-06-18 13:55 ` Alexei Starovoitov
2025-06-18 22:03 ` Willem de Bruijn