public inbox for netdev@vger.kernel.org
From: Slava Imameev <slava.imameev@crowdstrike.com>
To: <alexei.starovoitov@gmail.com>
Cc: <ameryhung@gmail.com>, <andrii@kernel.org>, <ast@kernel.org>,
	<bot+bpf-ci@kernel.org>, <bpf@vger.kernel.org>, <clm@meta.com>,
	<daniel@iogearbox.net>, <eddyz87@gmail.com>,
	<ihor.solodrai@linux.dev>, <kernel-team@meta.com>,
	<linux-open-source@crowdstrike.com>, <martin.lau@kernel.org>,
	<memxor@gmail.com>, <netdev@vger.kernel.org>,
	<slava.imameev@crowdstrike.com>, <yonghong.song@linux.dev>
Subject: Re: [PATCH bpf-next v2 2/3] bpf: Use kmalloc_nolock() universally in local storage
Date: Wed, 15 Apr 2026 14:11:50 +1000	[thread overview]
Message-ID: <20260415041150.60473-1-slava.imameev@crowdstrike.com> (raw)
In-Reply-To: <ydbdk5fj3mbjoninyt5lcg5czcrcgghk6ownothijy667zn6h3@eczgskgj43iw>

On Tue, 14 Apr 2026 19:27:00 -0700 Alexei Starovoitov wrote:
> On Mon, Apr 13, 2026 at 01:48:29PM +1000, Slava Imameev wrote:
> > On Fri, 10 Apr 2026 21:39:00 -0700 Alexei Starovoitov wrote:
> > > >
> > > >
> > > > This allows value sizes up to ~65KB. Before this patch, socket and
> > > > inode storage used bpf_map_kzalloc() (backed by regular kmalloc)
> > > > which could handle those large sizes. After this patch, any
> > > > elem_size above KMALLOC_MAX_CACHE_SIZE will silently fail: the map
> > > > creation succeeds via bpf_local_storage_map_alloc_check() but every
> > > > element allocation returns NULL.
> > > >
> > > > Should BPF_LOCAL_STORAGE_MAX_VALUE_SIZE be updated to use
> > > > KMALLOC_MAX_CACHE_SIZE instead of KMALLOC_MAX_SIZE now that all
> > > > storage types go through kmalloc_nolock()?
> > > >
> > > > Slava Imameev raised the same concern for task storage in
> > > > https://lore.kernel.org/bpf/20260410014341.47043-1-slava.imameev@crowdstrike.com/
> > >
> > > Right. Let's update it, but I don't think it's a regression.
> > > On a loaded system kmalloc_large() rarely succeeds for order 2+.
> > > That's why kmalloc_nolock() doesn't attempt to bridge that gap.
> > > One or two contiguous physical pages is the best one can expect.
> > > In early bpf days we picked KMALLOC_MAX_SIZE assuming that
> > > it's a realistic max for kmalloc().
> > > It turned out to be wishful thinking.
> > > kmalloc_large concept should really be removed.
> > > It deceives users into thinking that it's usable.
> >
> > In defense of supporting 8KB-64KB allocations for local
> > storage, we can consider BPF_MAP_TYPE_HASH with BPF_F_NO_PREALLOC
> > as providing similar functionality to replace the missing 8KB-64KB
> > local storage allocation support. However, these map entry
> > allocations can also fail with similar probability since they
> > depend on the same underlying allocator.
> 
> I really hope that 64kb task local storage is not your production code.
> Servers easily have 50k threads. Sometimes more.
> 64k * 50k = 3 Gbytes of memory wasted.
> You need to redesign it from ground up.

This was a research project to replace LRU maps with task
storage. We implemented a garbage collector using a BPF task
iterator to release inactive task allocations. While iterating
over tens of thousands of tasks might be questionable, this was a
proof of concept that, when combined with other measures, could
potentially keep memory pressure in the tens of MBs.

8KB would be sufficient for 99.9% of our allocations, but
sometimes we need 12KB or more. The alternative to task storage
could be BPF_MAP_TYPE_HASH with BPF_F_NO_PREALLOC and a garbage
collector, as we want to reduce dependency on preallocated LRU
maps.


Thread overview: 11+ messages
2026-04-11  1:54 [PATCH bpf-next v2 0/3] Use kmalloc_nolock() universally in BPF local storage Amery Hung
2026-04-11  1:54 ` [PATCH bpf-next v2 1/3] selftests/bpf: Remove kmalloc tracing from local storage create bench Amery Hung
2026-04-11  1:54 ` [PATCH bpf-next v2 2/3] bpf: Use kmalloc_nolock() universally in local storage Amery Hung
2026-04-11  2:36   ` bot+bpf-ci
2026-04-11  4:39     ` Alexei Starovoitov
2026-04-12 19:40       ` Slava Imameev
2026-04-13  3:48       ` Slava Imameev
2026-04-15  2:27         ` Alexei Starovoitov
2026-04-15  4:11           ` Slava Imameev [this message]
2026-04-11  1:54 ` [PATCH bpf-next v2 3/3] bpf: Remove gfp_flags plumbing from bpf_local_storage_update() Amery Hung
2026-04-11  4:30 ` [PATCH bpf-next v2 0/3] Use kmalloc_nolock() universally in BPF local storage patchwork-bot+netdevbpf
