From: Dave Chinner <david@fromorbit.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>,
Hannes Reinecke <hare@suse.de>,
Pankaj Raghav <p.raghav@samsung.com>,
Keith Busch <kbusch@kernel.org>,
brauner@kernel.org, viro@zeniv.linux.org.uk,
akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, gost.dev@samsung.com
Subject: Re: [RFC 0/4] convert create_page_buffers to create_folio_buffers
Date: Mon, 17 Apr 2023 08:57:04 +1000
Message-ID: <20230416225704.GC447837@dread.disaster.area>
In-Reply-To: <ZDuHEolre/saj8iZ@bombadil.infradead.org>
On Sat, Apr 15, 2023 at 10:26:42PM -0700, Luis Chamberlain wrote:
> > > > Except ... we want to probe a dozen different
> > > > filesystems, and half of them keep their superblock at the same offset
> > > > from the start of the block device. So we do want to keep it cached.
> > > > That's arguing for using the page cache, at least to read it.
> > >
> > > Do we currently share anything from the bdev cache with the fs for this?
> > > Let's say that first block device blocksize in memory.
> >
> > sb_bread() is used by most filesystems, and the buffer cache aliases
> > into the page cache.
>
> I see thanks. I checked what xfs does and its xfs_readsb() uses its own
> xfs_buf_read_uncached(). It ends up calling xfs_buf_submit() and
> xfs_buf_ioapply_map() does it's own submit_bio(). So I'm curious why
> they did that.
XFS has its own metadata address space for caching - it does not
use the block device page cache at all. This is not new; it never
has.
The xfs_buf buffer cache does not use the page cache, either. It
does its own thing, has its own indexing, locking, shrinkers, etc.
IOWs, it does not use the iomap infrastructure at all - iomap is
used by XFS exclusively for data IO.
As for why we use an uncached buffer for the superblock? That's
largely historic: prior to 2007, every modification that did an
allocation or free needed to lock and modify the superblock at
transaction commit. Hence it was always needed in memory and sat on
a critical fast path, so it is kept directly available to callers
that need it without them having to do a cache lookup.
In 2007, lazy superblock counters got rid of the requirement to lock
the superblock buffer in every transaction commit, so the uncached
buffer optimisation hasn't really been needed for the past decade.
But if it ain't broke, don't try to fix it....
> > > > Now, do we want userspace to be able to dd a new superblock into place
> > > > and have the mounted filesystem see it?
> > >
> > > Not sure I follow this. dd a new super block?
> >
> > In userspace, if I run 'dd if=blah of=/dev/sda1 bs=512 count=1 seek=N',
> > I can overwrite the superblock. Do we want filesystems to see that
> > kind of vandalism, or do we want the mounted filesystem to have its
> > own copy of the data and overwrite what userspace wrote the next time it
> > updates the superblock?
>
> Oh, what happens today?
In XFS, it will completely ignore the fact that the superblock got
trashed like this. When the fs goes idle, or the sb is modified for
some other reason, it will relog the in-memory superblock and write
it back to disk, thereby fixing the corruption. i.e. while the
filesystem is mounted, the superblock is _write-only_...
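To make the "write-only" behaviour concrete, here is a toy userspace model of it. This is a sketch only, not XFS code: every name in it (toy_disk, toy_mount, toy_write_sb and friends) is hypothetical, and a memcpy() stands in for relogging the superblock through a transaction.

```c
#include <stdint.h>
#include <string.h>

#define SB_SIZE 512

/* Hypothetical toy types for illustration; not the XFS structures. */
struct toy_disk {
	uint8_t sector0[SB_SIZE];	/* on-disk superblock */
};

struct toy_mount {
	uint8_t incore_sb[SB_SIZE];	/* authoritative copy while mounted */
	struct toy_disk *disk;
};

/* Mount: read the superblock from disk once and keep it in memory. */
void toy_mount_fs(struct toy_mount *mp, struct toy_disk *disk)
{
	mp->disk = disk;
	memcpy(mp->incore_sb, disk->sector0, SB_SIZE);
}

/* Userspace 'dd' over the superblock: only the on-disk copy changes. */
void toy_dd_over_sb(struct toy_disk *disk, uint8_t byte)
{
	memset(disk->sector0, byte, SB_SIZE);
}

/* Writeback: the in-memory copy overwrites whatever is on disk. */
void toy_write_sb(struct toy_mount *mp)
{
	memcpy(mp->disk->sector0, mp->incore_sb, SB_SIZE);
}

/* Returns 1 if the on-disk superblock matches the in-memory one. */
int toy_disk_matches(const struct toy_mount *mp)
{
	return memcmp(mp->disk->sector0, mp->incore_sb, SB_SIZE) == 0;
}
```

Nothing in the model ever reads sector0 after toy_mount_fs(), which is the whole point: the trashed contents are never seen by the mounted filesystem, and the next toy_write_sb() puts the in-memory contents back.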
> > (the trick is that this may not be vandalism, it might be the sysadmin
> > updating the uuid or running some fsck-ish program or trying to update
> > the superblock to support fabulous-new-feature on next mount. does this
> > change the answer?)
If you need to change anything in the superblock while the XFS fs is
mounted, then you have to use ioctls to modify the superblock
contents through the running transaction subsystem. Editing the
block device directly breaks the security model of filesystems that
assume they have exclusive access to the block device whilst the
filesystem is mounted....
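As one example of going through an ioctl rather than the block device, the sketch below sets the filesystem label on a mounted filesystem with the generic FS_IOC_SETFSLABEL ioctl from <linux/fs.h>. The set_fs_label() wrapper and its error convention are hypothetical, made up for this illustration; whether a given filesystem and kernel version support this ioctl is an assumption you need to check.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* FS_IOC_SETFSLABEL, FSLABEL_MAX */

/*
 * Hypothetical helper: ask the mounted filesystem to change its own
 * label. Returns 0 on success or a negative errno value on failure.
 */
int set_fs_label(const char *mountpoint, const char *label)
{
	char buf[FSLABEL_MAX] = { 0 };
	int fd, ret = 0;

	/* Leave room for the terminating NUL. */
	if (strlen(label) >= FSLABEL_MAX)
		return -EINVAL;
	memcpy(buf, label, strlen(label));

	fd = open(mountpoint, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* The filesystem performs the update; we never touch the bdev. */
	if (ioctl(fd, FS_IOC_SETFSLABEL, buf) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

A caller would run something like set_fs_label("/mnt/scratch", "newlabel") with sufficient privileges, where /mnt/scratch is a placeholder mount point. The filesystem may impose a shorter label limit than FSLABEL_MAX and reject the request.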
-Dave.
--
Dave Chinner
david@fromorbit.com