From: Tal Zussman <tz2294@columbia.edu>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org, bpf@vger.kernel.org,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeel.butt@linux.dev>,
Emil Tsalapatis <emil@etsalapatis.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Josh Don <joshdon@google.com>, Greg Thelen <gthelen@google.com>,
david@kernel.org
Subject: [LSF/MM/BPF TOPIC] Upstreaming cache_ext: Custom Page Cache Eviction with eBPF
Date: Wed, 25 Mar 2026 17:15:37 -0400
Message-ID: <0b1293ca-7a1f-4358-bc20-15784452238d@columbia.edu>
Hi all,
This proposal is late, but I've been encouraged by a few people to submit
it, so hopefully it's not *too* late...
I would like to propose a session to discuss cache_ext, a framework that
allows applications to customize page cache eviction policies using eBPF.
This work was published at SOSP'25 [1] and presented at LPC in the eBPF
track [2]. The preliminary code is available at [3].
This topic spans both MM and BPF, so it might fit best as part of the joint
BPF+MM session that Roman and Shakeel have mentioned [4].
Background
----------
The kernel offers two built-in page cache eviction options: Active/Inactive
LRU and MGLRU. Neither is optimal for all workloads, and the existing
customization interfaces (fadvise, madvise, sysctl) are limited and often do
not behave as expected. Applications that need better caching behavior today
are stuck either living with the default policy or implementing their own
userspace caches, which are hard to share across processes and often still
rely on the page cache as a second tier.
What cache_ext does
-------------------
cache_ext uses eBPF struct_ops to let applications define custom eviction
policies that run in the kernel. Inspired by sched_ext, it provides:
- Six policy function hooks: init, evict_folios, folio_added,
folio_accessed, folio_removed, and admit_folio (admission filtering).
- An eviction list API (kfuncs) for creating and manipulating
variable-sized linked lists of folios. Policies can use multiple
lists.
- A batched eviction candidate interface: policies propose up to 32 folios
per eviction request; the kernel validates and evicts them.
- Per-cgroup isolation: policies are attached as per-cgroup struct_ops
programs, so each cgroup can run its own policy without interfering
with the others.
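To make the shape of the interface concrete, here is a minimal userspace
model of the six hooks and the batched candidate interface, with a FIFO
policy plugged in. This is only a sketch: struct policy_ops, the fifo_*
functions, MAX_TRACKED, and the ring-buffer bookkeeping are hypothetical
names invented for illustration, not the cache_ext struct_ops or kfunc API.
The only detail taken from the proposal is the hook list and the
32-candidate batch limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are the cache_ext API. */
#define MAX_CANDIDATES 32   /* batch limit stated in the proposal */
#define MAX_TRACKED    1024 /* arbitrary capacity chosen for this sketch */

struct folio { unsigned long id; };

/* One function pointer per hook named in the proposal. */
struct policy_ops {
	int    (*init)(void);
	size_t (*evict_folios)(struct folio **out, size_t want);
	void   (*folio_added)(struct folio *f);
	void   (*folio_accessed)(struct folio *f);
	void   (*folio_removed)(struct folio *f);
	bool   (*admit_folio)(struct folio *f);
};

/* FIFO state: a ring buffer of folio pointers, oldest entry at fifo_head. */
static struct folio *fifo[MAX_TRACKED];
static size_t fifo_head, fifo_len;

static int fifo_init(void)
{
	fifo_head = fifo_len = 0;
	return 0;
}

static void fifo_added(struct folio *f)
{
	assert(f != NULL);
	if (fifo_len == MAX_TRACKED)
		return; /* sketch only: stop tracking when the ring is full */
	fifo[(fifo_head + fifo_len) % MAX_TRACKED] = f;
	fifo_len++;
}

/* FIFO ignores accesses; an LRU-style policy would reorder here. */
static void fifo_accessed(struct folio *f) { (void)f; }

/* Drop a folio that left the cache for another reason (e.g. truncation). */
static void fifo_removed(struct folio *f)
{
	for (size_t i = 0; i < fifo_len; i++) {
		if (fifo[(fifo_head + i) % MAX_TRACKED] != f)
			continue;
		/* Shift the younger entries down by one slot. */
		for (size_t j = i; j + 1 < fifo_len; j++)
			fifo[(fifo_head + j) % MAX_TRACKED] =
				fifo[(fifo_head + j + 1) % MAX_TRACKED];
		fifo_len--;
		return;
	}
}

/* Admission filter: this policy admits everything. */
static bool fifo_admit(struct folio *f) { (void)f; return true; }

/* Propose the oldest folios, at most MAX_CANDIDATES per request. */
static size_t fifo_evict(struct folio **out, size_t want)
{
	size_t n = 0;

	if (want > MAX_CANDIDATES)
		want = MAX_CANDIDATES;
	while (n < want && fifo_len > 0) {
		out[n++] = fifo[fifo_head];
		fifo_head = (fifo_head + 1) % MAX_TRACKED;
		fifo_len--;
	}
	return n;
}

static const struct policy_ops fifo_policy = {
	.init           = fifo_init,
	.evict_folios   = fifo_evict,
	.folio_added    = fifo_added,
	.folio_accessed = fifo_accessed,
	.folio_removed  = fifo_removed,
	.admit_folio    = fifo_admit,
};
```

In this model the policy pops its candidates as it proposes them. In the
real framework the kernel validates and evicts the proposed folios, so the
bookkeeping would more likely be driven by the folio_removed hook.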
We have implemented eight policies on cache_ext, from simple (FIFO, LFU,
MRU) to sophisticated (S3-FIFO, LHD, MGLRU), as well as application-informed
policies. Our evaluation shows that matching the policy to the workload can
improve throughput by up to 1.7x and reduce P99 latency by up to 58%, and
that, in general, no single policy is best for all workloads.
The kernel changes in our prototype are roughly 2000 lines total, of which
only about 210 lines modify core page cache code, 80 lines touch the
verifier, and 80 lines touch cgroup code. The rest is self-contained
cache_ext functionality (eviction list kfuncs and registry operations),
but much of this can and will be simplified.
Discussion Topics
-----------------
1. Interface design
Right now the page cache is not modularized. cache_ext adds hooks into
the page cache in an ad hoc fashion, inserting struct_ops callbacks at
six points in the page cache. Is this the right abstraction? Are there
page cache events we are missing? A longer-term goal could be a more
systematic modularization of the page cache to make it amenable to
extensibility, but that is a much larger effort -- we would like to
discuss what a practical first step looks like.
2. Relationship with MGLRU
cache_ext is currently built on top of the active/inactive LRU list
infrastructure. Can we instead make use of MGLRU's infrastructure (e.g.,
the access bit scanning)? This also raises the question of whether we can
split MGLRU into reusable infrastructure and policy, so that policies
could build on MGLRU's infrastructure while replacing its policy logic.
3. Eviction list data structures
cache_ext implements eviction lists as kernel-managed linked lists
exposed via kfuncs. Could we use BPF arenas instead, as sched_ext does?
And how would arenas affect the ability to fall back to the kernel's
default policy when a BPF policy misbehaves or fails to propose enough
eviction candidates? Are the eviction interface and data structures
powerful enough as-is?
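For reference in that discussion, the fallback behavior can be modeled in
userspace roughly as follows. This is a sketch under stated assumptions, not
the prototype's code: select_victims(), candidate_ok(), bad_propose(), and
the in_cache/pinned/selected flags are all hypothetical, and the checks the
kernel actually performs on proposed folios may differ. The point is only
the control flow: validate whatever the policy proposes, then top up from
the default list when the policy comes up short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CANDIDATES 32 /* batch limit stated in the proposal */

/* Hypothetical folio model; the flags stand in for real kernel checks. */
struct folio {
	unsigned long id;
	bool in_cache;  /* still present in the page cache */
	bool pinned;    /* cannot be evicted right now */
	bool selected;  /* already chosen in this eviction round */
};

/* Hypothetical signature for a policy's candidate-proposal hook. */
typedef size_t (*propose_fn)(struct folio **out, size_t want);

static bool candidate_ok(const struct folio *f)
{
	return f != NULL && f->in_cache && !f->pinned && !f->selected;
}

/*
 * Collect up to `want` victims: validated policy proposals first, then
 * entries from the default list. Returns the number of victims and stores
 * the number that came from the policy in *from_policy.
 */
static size_t select_victims(propose_fn propose,
			     struct folio **fallback, size_t nr_fallback,
			     struct folio **victims, size_t want,
			     size_t *from_policy)
{
	struct folio *proposed[MAX_CANDIDATES];
	size_t n = 0, got = 0;

	if (want > MAX_CANDIDATES)
		want = MAX_CANDIDATES;
	if (propose)
		got = propose(proposed, want);
	if (got > want)
		got = want; /* do not trust the count a policy reports */

	for (size_t i = 0; i < got; i++) {
		if (!candidate_ok(proposed[i]))
			continue; /* skip invalid or duplicate proposals */
		proposed[i]->selected = true;
		victims[n++] = proposed[i];
	}
	*from_policy = n;

	/* Policy came up short: fill the rest from the default list. */
	for (size_t i = 0; i < nr_fallback && n < want; i++) {
		if (!candidate_ok(fallback[i]))
			continue;
		fallback[i]->selected = true;
		victims[n++] = fallback[i];
	}
	return n;
}

/* Hypothetical misbehaving policy: proposes NULL and a pinned folio. */
static struct folio pinned_folio = { .id = 99, .in_cache = true, .pinned = true };

static size_t bad_propose(struct folio **out, size_t want)
{
	size_t n = 0;

	if (n < want)
		out[n++] = NULL;
	if (n < want)
		out[n++] = &pinned_folio;
	return n;
}
```

With kernel-managed lists, the fallback list and the validity checks both
live on the kernel side. If the lists lived in a BPF arena, the kernel would
need some other source for the fallback candidates, which is the trade-off
raised above.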
4. Path to upstreaming
cache_ext was developed on Linux v6.6. We are currently rebasing onto the
latest kernel and expect to have more progress to report within the next
month. There are a few other issues we plan to fix and clean up along the
way, but in general, what does the path towards upstreaming cache_ext
look like?
5. Future extensions
Beyond file-backed page eviction, there are natural next steps that could
be explored down the line. Prefetching customization has been looked at
before (FetchBPF [5]). Extending cache_ext to cover anonymous memory and
swap decisions has also been mentioned as a natural extension. This
could also have interesting interactions with Shakeel's memcg_ext
proposal [6].
Links
-----
[1] https://doi.org/10.1145/3731569.3764820 (SOSP'25 paper)
[2] https://lpc.events/event/19/contributions/2165/ (LPC talk)
[3] https://github.com/cache-ext/cache_ext
[4] https://lore.kernel.org/lkml/aa9SB6OzocfwL9kO@linux.dev/
[5] https://www.usenix.org/conference/atc24/presentation/cao
[6] https://lore.kernel.org/lkml/20260307182424.2889780-1-shakeel.butt@linux.dev/
Thanks,
Tal