From: "Darrick J. Wong" <djwong@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: Luis Chamberlain <mcgrof@kernel.org>,
Hannes Reinecke <hare@suse.de>,
Pankaj Raghav <p.raghav@samsung.com>,
"kbus >> Keith Busch" <kbusch@kernel.org>,
brauner@kernel.org, viro@zeniv.linux.org.uk,
akpm@linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, gost.dev@samsung.com
Subject: Re: [RFC 0/4] convert create_page_buffers to create_folio_buffers
Date: Mon, 17 Apr 2023 08:40:09 -0700
Message-ID: <20230417154009.GC360881@frogsfrogsfrogs>
In-Reply-To: <ZDwBJVmIN3tLFhXI@casper.infradead.org>

On Sun, Apr 16, 2023 at 03:07:33PM +0100, Matthew Wilcox wrote:
> On Sat, Apr 15, 2023 at 10:26:42PM -0700, Luis Chamberlain wrote:
> > On Sun, Apr 16, 2023 at 04:40:06AM +0100, Matthew Wilcox wrote:
> > > I don't think we
> > > should be overriding the aops, and if we narrow the scope of large folio
> > > support in blockdev to only supporting folio_size == LBA size, it becomes
> > > much more feasible.
> >
> > I'm trying to think of the possible use cases where folio_size != LBA size
> > and I cannot immediately think of any. Yes, there are cases where a
> > filesystem may use a different block size for, say, metadata than for data,
> > but that, I believe, is a side issue, i.e., reads/writes for small metadata
> > would have to be accepted. At least for NVMe we have metadata size as part
> > of the LBA format, but from what I understand no Linux filesystem uses that yet.
>
> NVMe metadata is per-block metadata -- a CRC or similar. Filesystem
> metadata is things like directories, inode tables, free space bitmaps,
> etc.
>
> > struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
> > bool retry)
> > {
> [...]
> > head = NULL;
> > offset = PAGE_SIZE;
> > while ((offset -= size) >= 0) {
> >
> > I see now what you mean about the buffer head being the block size:
> > bh->b_size = size above.
>
> Yes, just changing that to 'offset = page_size(page);' will do the trick.
>
> > > sb_bread() is used by most filesystems, and the buffer cache aliases
> > > into the page cache.
> >
> > I see, thanks. I checked what xfs does and its xfs_readsb() uses its own
> > xfs_buf_read_uncached(). It ends up calling xfs_buf_submit() and
> > xfs_buf_ioapply_map() does its own submit_bio(). So I'm curious why
> > they did that.
>
> IRIX didn't have an sb_bread() ;-)
>
> > > In userspace, if I run 'dd if=blah of=/dev/sda1 bs=512 count=1 seek=N',
> > > I can overwrite the superblock. Do we want filesystems to see that
> > > kind of vandalism, or do we want the mounted filesystem to have its
> > > own copy of the data and overwrite what userspace wrote the next time it
> > > updates the superblock?
> >
> > Oh, what happens today?
>
> Depends on the filesystem, I think? Not really sure, to be honest.

The filesystem driver sees the vandalism, and can very well crash as a
result[1]. In that case it was corrupted journal contents being
replayed, but the same thing would happen if you wrote a malicious
userspace program to set the metadata_csum feature flag in the ondisk
superblock after mounting the fs.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=82201#c4

I've tried to prevent people from writing to mounted block devices in
the past, but did not succeed. If you try to prevent programs from
opening such devices with O_RDWR/O_WRONLY you then break lvm tools which
require that ability even though they don't actually write anything to
the block device. If you make the block device write_iter function
fail, then old e2fsprogs breaks and you get shouted at for breaking
userspace.

Hence I decided to let security researchers find these bugs and control
the design discussion via CVE. That's not correct and it's not smart,
but it preserves some of my sanity.

--D