public inbox for linux-ext4@vger.kernel.org
From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Catherine Hoang <catherine.hoang@oracle.com>, linux-ext4@vger.kernel.org
Subject: Re: [RFC PATCH v2 0/4] remove buffer heads from ext2
Date: Fri, 4 Apr 2025 09:43:22 -0700
Message-ID: <20250404164322.GB6307@frogsfrogsfrogs>
In-Reply-To: <Z-UpSq8jLIUXMf-Z@infradead.org>

On Thu, Mar 27, 2025 at 03:32:42AM -0700, Christoph Hellwig wrote:
> On Tue, Mar 25, 2025 at 06:49:24PM -0700, Catherine Hoang wrote:
> > Hi all,
> > 
> > This series is an effort to begin removing buffer heads from ext2. 
> 
> Why is that desirable?

struct buffer_head is a mishmash of things -- originally it was a landing
place for the old buffer cache, right?  So it has the necessary things
like a pointer to a memory page, the disk address, a length, buffer
state flags (uptodate/dirty), and some locks.  For filesystem metadata
blocks I think that's all that most filesystems really need.  Assuming
that filesystems /never/ want overlapping metadata buffers, I think it's
more efficient to look up buffer objects via an rhashtable instead of
walking the address_space xarray to find a folio, and then walking a
linked list from that folio to find the particular bh.
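
To make the lookup argument concrete, here is a minimal userspace sketch
of a metadata buffer cache keyed directly by disk address. Everything in
it is an assumption made for illustration: struct mbuf, mbuf_lookup()
and mbuf_get() are hypothetical names, and a fixed-size chained hash
table stands in for the kernel's rhashtable. It is not code from the
patch series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MBUF_HASH_BITS	6
#define MBUF_HASH_SIZE	(1u << MBUF_HASH_BITS)

/* Hypothetical metadata buffer: memory, disk address, length, state. */
struct mbuf {
	uint64_t	daddr;		/* disk address, the lookup key */
	size_t		length;		/* buffer length in bytes */
	unsigned int	flags;		/* uptodate/dirty state bits */
	void		*data;		/* backing memory */
	struct mbuf	*hash_next;	/* next buffer in the same bucket */
};

/* Stand-in for an rhashtable: fixed number of chained buckets. */
struct mbuf_cache {
	struct mbuf	*buckets[MBUF_HASH_SIZE];
};

static unsigned int mbuf_hash(uint64_t daddr)
{
	/* Fibonacci hashing: multiply by 2^64/phi, keep the top bits. */
	return (unsigned int)((daddr * 0x9E3779B97F4A7C15ull) >>
			      (64 - MBUF_HASH_BITS));
}

/* One hash step plus a short chain walk; no folio or list-of-bh hop. */
static struct mbuf *mbuf_lookup(struct mbuf_cache *c, uint64_t daddr)
{
	struct mbuf *b;

	for (b = c->buckets[mbuf_hash(daddr)]; b; b = b->hash_next)
		if (b->daddr == daddr)
			return b;
	return NULL;
}

/*
 * Return the cached buffer for daddr, or allocate and insert one.
 * Relies on the "no overlapping buffers" assumption: one key, one buffer.
 */
static struct mbuf *mbuf_get(struct mbuf_cache *c, uint64_t daddr,
			     size_t length)
{
	struct mbuf *b = mbuf_lookup(c, daddr);
	unsigned int h;

	assert(length > 0);
	if (b)
		return b;

	b = calloc(1, sizeof(*b));
	if (!b)
		return NULL;
	b->data = calloc(1, length);
	if (!b->data) {
		free(b);
		return NULL;
	}
	b->daddr = daddr;
	b->length = length;

	h = mbuf_hash(daddr);
	b->hash_next = c->buckets[h];
	c->buckets[h] = b;
	return b;
}
```

A caller would zero-initialize a struct mbuf_cache and call mbuf_get()
with a disk address; a second call with the same address returns the
same object. Locking, reference counting and eviction are deliberately
left out of the sketch.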

Unfortunately, it also has a bunch of file mapping state information
(e.g. BH_Delalloc) that isn't needed for caching metadata blocks.  All
the confusion that results from the incohesive mixing of these two
use cases goes away by separating out the metadata buffers into a
separate cache and (ha) leaving the filesystems to port the file IO
paths to iomap.

Separating filesystem metadata buffers into a private data structure
instead of using the blockdev pagecache also closes off an entire class
of attack surface where evil userspace can wait for a filesystem to load
a metadata block into memory and validate it; and then scribble on the
pagecache block to cause the filesystem driver to make the wrong
decisions -- look at all the ext4 metadata_csum bugs where syzkaller
discovered that the decision to call the crc32c driver was gated on a
bit in a bufferhead, and setting that bit without having initialized
the crc32c driver would lead to a kernel crash.  Nowadays we have
CONFIG_BLK_DEV_WRITE_MOUNTED to shut that down, though it defaults to y
and I think that might actually break leased layout things like pNFS.
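
The validate-then-scribble problem can be shown in a few lines of
userspace C. This is only a toy model under stated assumptions: the
"shared" array plays the role of a blockdev pagecache page that
userspace can still write to, struct private_buf and private_buf_load()
are hypothetical names, and toy_csum() is a byte sum standing in for
crc32c, not a real CRC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK_SIZE	16

/* Toy checksum standing in for crc32c; NOT a real CRC. */
static uint8_t toy_csum(const uint8_t *p, size_t len)
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum = (uint8_t)(sum + p[i]);
	return sum;
}

/* Hypothetical private metadata buffer owned by the filesystem. */
struct private_buf {
	uint8_t	data[BLK_SIZE];
	int	verified;	/* nonzero once the checksum matched */
};

/*
 * Copy the block into private memory first, then validate the copy.
 * Later writes to 'shared' cannot change what was validated.
 */
static int private_buf_load(struct private_buf *pb, const uint8_t *shared,
			    uint8_t expected)
{
	assert(pb && shared);
	memcpy(pb->data, shared, BLK_SIZE);
	pb->verified = (toy_csum(pb->data, BLK_SIZE) == expected);
	return pb->verified;
}
```

If a caller loads a block with private_buf_load() and something then
overwrites the shared copy, the shared copy stops matching its checksum
while pb->data is unchanged. A filesystem that kept reading the shared
copy would be acting on unvalidated bytes.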

So the upsides are: faster lookups, a more cohesive data structure that
only tries to do one thing, and closing attack surfaces.

The downsides: this new buffer cache code still needs an explicit hook
into the dirty pagecache timeout to start its own writeback, its own
shrinker, and some sort of solution for file mapping metadata so that
fsync can flush just those blocks and not the whole cache.
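
One possible shape for the fsync requirement, purely as an illustration:
tag each metadata buffer with the inode whose mapping it describes, and
have fsync write back only the dirty buffers carrying that tag. The
names struct meta_buf, MB_DIRTY and flush_owner() are hypothetical, the
"writeback" here just clears the dirty bit, and nothing in the thread
says this is the approach that will be taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB_DIRTY	0x1u

/* Hypothetical metadata buffer tagged with its owning inode. */
struct meta_buf {
	uint64_t	daddr;		/* disk address of the block */
	uint64_t	owner_ino;	/* inode whose mapping this holds */
	unsigned int	flags;		/* MB_DIRTY when it needs writeback */
};

/*
 * Flush only the dirty buffers that belong to 'ino'.
 * Writeback is simulated by clearing MB_DIRTY.
 * Returns the number of buffers flushed.
 */
static size_t flush_owner(struct meta_buf *bufs, size_t n, uint64_t ino)
{
	size_t flushed = 0;
	size_t i;

	assert(bufs || n == 0);
	for (i = 0; i < n; i++) {
		if (bufs[i].owner_ino != ino)
			continue;
		if (!(bufs[i].flags & MB_DIRTY))
			continue;
		bufs[i].flags &= ~MB_DIRTY;
		flushed++;
	}
	return flushed;
}
```

The linear scan keeps the sketch short; a real cache would more likely
keep a per-inode list of dirty buffers so that fsync does not have to
visit unrelated entries at all.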

--D

Thread overview: 11+ messages
2025-03-26  1:49 [RFC PATCH v2 0/4] remove buffer heads from ext2 Catherine Hoang
2025-03-26  1:49 ` [RFC PATCH v2 1/4] ext2: remove buffer heads from superblock Catherine Hoang
2025-03-28 18:24   ` Darrick J. Wong
2025-03-26  1:49 ` [RFC PATCH v2 2/4] ext2: remove buffer heads from group descriptors Catherine Hoang
2025-03-28 18:29   ` Darrick J. Wong
2025-03-26  1:49 ` [RFC PATCH v2 3/4] ext2: remove buffer heads from quota handling Catherine Hoang
2025-03-26  1:49 ` [RFC PATCH v2 4/4] ext2: remove buffer heads from block bitmaps Catherine Hoang
2025-03-27 10:32 ` [RFC PATCH v2 0/4] remove buffer heads from ext2 Christoph Hellwig
2025-04-04 16:43   ` Darrick J. Wong [this message]
2025-04-07  6:25     ` Christoph Hellwig
2025-04-07 16:54       ` Darrick J. Wong
