* [LSF/MM/BPF TOPIC] eBPF-driven Data Placement Hint
@ 2026-02-20 7:14 Naohiro Aota
From: Naohiro Aota @ 2026-02-20 7:14 UTC (permalink / raw)
To: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
bpf@vger.kernel.org
Hi all,
In Zoned Block Devices (ZBD), Garbage Collection (GC) efficiency is the
primary determinant of sustained write performance. While current ZBD-aware
filesystems (F2FS, Btrfs, XFS) understand zone semantics, they typically
fill zones "blindly" by appending incoming data as it arrives. When "hot"
"blindly" by appending incoming data as it arrives. When "hot" (frequently
updated) and "cold" (static) data are interleaved within the same zone, the
filesystem must move the cold data to a new zone before the current one can
be reclaimed. This write amplification leads to significant performance
drops across the storage stack.
Existing solutions for data temperature segregation are often static (e.g.,
F2FS file extensions). However, these don't scale well for dynamic,
multi-tenant environments where data lifetime is better predicted by
application context or directory hierarchy.
We propose a unified VFS-level eBPF interface for flexible, programmatic
data steering. By invoking a BPF program (e.g., at file_open), the system
can derive the necessary steering information from the parent directory,
the calling process, or custom metadata, and direct data to a desired zone
or block group.
# Use Cases:
- ZBD GC Efficiency: Programmatically separating short-lived journals and
write-ahead logs (WAL) from colder database workloads into distinct zone
pools, enabling clean zone resets.
- Multi-Tenant Isolation: Using cgroup context to influence, e.g., block
group selection in ext4, preventing "noisy neighbor" fragmentation.
- HDD Performance Zoning: Steering latency-critical metadata to
high-performance areas of the drive while pushing archive data to
lower-tier zones, accounting for hardware-specific geometry.
- Optional: Transparent Compression Hints: Expanding the interface to skip
compression for already-encrypted/compressed formats or choosing between
LZ4 and Zstd based on workload priority.
# Discussion Points:
- The Attachment Point: We currently use a struct_ops approach in XFS, but
a cleaner VFS-level path would be allowing bpf_prog_attach on a directory
FD.
Discussion: Is a new hook in struct file_operations the right way to
"pin" a policy to a subtree? Are there better alternatives?
- The Placement "Contract": Defining a stable struct bpf_placement_hint
that works across XFS, Btrfs, F2FS, and ext4:
struct bpf_placement_hint {
	enum rw_hint i_write_hint;
	u32 stream_id;    /* Generic Stream/Class ID / Block Group hint */
	u64 flags;        /* Behavioral toggles (e.g., ALLOC_FROM_HEAD, NO_COMPRESS) */
	u64 private_data; /* FS-specific context, could have compression hint here? */
};
- Lifecycle and Persistence: BPF programs are runtime-only.
Discussion: Should we keep this as an FD-based, runtime-only attachment,
or is there a need for persistence (e.g., via xattr)?
I look forward to hearing your thoughts.