From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Catherine Hoang <catherine.hoang@oracle.com>, linux-ext4@vger.kernel.org
Subject: Re: [RFC PATCH v2 0/4] remove buffer heads from ext2
Date: Fri, 4 Apr 2025 09:43:22 -0700
Message-ID: <20250404164322.GB6307@frogsfrogsfrogs>
In-Reply-To: <Z-UpSq8jLIUXMf-Z@infradead.org>

On Thu, Mar 27, 2025 at 03:32:42AM -0700, Christoph Hellwig wrote:
> On Tue, Mar 25, 2025 at 06:49:24PM -0700, Catherine Hoang wrote:
> > Hi all,
> >
> > This series is an effort to begin removing buffer heads from ext2.
>
> Why is that desirable?

struct buffer_head is a mishmash of things -- originally it was a landing
place for the old buffer cache, right? So it has the necessary things
like a pointer to a memory page, the disk address, a length, buffer
state flags (uptodate/dirty), and some locks. For filesystem metadata
blocks I think that's all that most filesystems really need. Assuming
that filesystems /never/ want overlapping metadata buffers, I think it's
more efficient to look up buffer objects via an rhashtable instead of
walking the address_space xarray to find a folio, and then walking a
linked list from that folio to find the particular bh.
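
To make the lookup argument concrete, here is a rough userspace sketch of a metadata buffer cache keyed directly on the disk address. It is not kernel code: it uses a small fixed-size chained hash table as a stand-in for rhashtable, and every name in it (struct meta_buf, meta_cache_lookup, meta_cache_get, NBUCKETS) is made up for illustration. The point is only that one hash on the block number reaches the buffer, instead of an xarray walk to a folio followed by a list walk to the bh.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed bucket count; a real rhashtable resizes itself. */
#define NBUCKETS 64

/* Hypothetical metadata buffer: only the fields a metadata cache needs. */
struct meta_buf {
	uint64_t blocknr;	/* disk address, used as the lookup key */
	size_t len;		/* buffer length in bytes */
	unsigned int uptodate:1;
	unsigned int dirty:1;
	void *data;		/* private copy of the block contents */
	struct meta_buf *next;	/* hash chain */
};

struct meta_cache {
	struct meta_buf *buckets[NBUCKETS];
};

/* Multiplicative hash; the top 6 bits select one of the 64 buckets. */
static inline unsigned int meta_hash(uint64_t blocknr)
{
	return (unsigned int)((blocknr * 0x9E3779B97F4A7C15ULL) >> 58);
}

/* Find the buffer for a block number, or NULL if it is not cached. */
struct meta_buf *meta_cache_lookup(struct meta_cache *c, uint64_t blocknr)
{
	struct meta_buf *b;

	for (b = c->buckets[meta_hash(blocknr)]; b; b = b->next)
		if (b->blocknr == blocknr)
			return b;
	return NULL;
}

/*
 * Return the cached buffer for blocknr, creating a zeroed one if needed.
 * An existing buffer is always reused, so one disk address never maps to
 * two buffers. Returns NULL on allocation failure.
 */
struct meta_buf *meta_cache_get(struct meta_cache *c, uint64_t blocknr,
				size_t len)
{
	struct meta_buf *b = meta_cache_lookup(c, blocknr);
	unsigned int h;

	if (b)
		return b;

	b = calloc(1, sizeof(*b));
	if (!b)
		return NULL;
	b->data = calloc(1, len);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->blocknr = blocknr;
	b->len = len;

	h = meta_hash(blocknr);
	b->next = c->buckets[h];
	c->buckets[h] = b;
	return b;
}
```

This leaves out locking, reference counts, eviction, and I/O entirely, and it only prevents duplicate buffers for the same starting block; it does not check for overlapping ranges of different lengths.
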

Unfortunately, it also has a bunch of file mapping state information
(e.g. BH_Delalloc) that isn't needed for caching metadata blocks. All
the confusion that results from the incohesive mixing of these two
use cases goes away by separating out the metadata buffers into a
separate cache and (ha) leaving the filesystems to port the file IO
paths to iomap.

Separating filesystem metadata buffers into a private data structure
instead of using the blockdev pagecache also closes off an entire class
of attack surface where evil userspace can wait for a filesystem to load
a metadata block into memory and validate it; and then scribble on the
pagecache block to cause the filesystem driver to make the wrong
decisions -- look at all the ext4 metadata_csum bugs where syzkaller
discovered that the decision to call the crc32c driver was gated on a
bit in a bufferhead, and setting that bit without having initialized the
crc32c driver would lead to a kernel crash. Nowadays we have
CONFIG_BLK_DEV_WRITE_MOUNTED to shut that down, though it defaults to y
and I think that might actually break leased layout things like pnfs.

So the upsides are: faster lookups, a more cohesive data structure that
only tries to do one thing, and closing attack surfaces.

The downsides: this new buffer cache code still needs an explicit hook
into the dirty pagecache timeout to start its own writeback, its own
shrinker, and some sort of solution for file mapping metadata so that
fsync can flush just those blocks and not the whole cache.

--D