From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: axboe@kernel.dk, Luis Chamberlain <mcgrof@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
linux-block <linux-block@vger.kernel.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
xfs <linux-xfs@vger.kernel.org>,
Jack Vogel <jack.vogel@oracle.com>
Subject: Re: [RFC[RAP] 1/2] block: fix race between set_blocksize and read paths
Date: Tue, 15 Apr 2025 22:01:44 -0700
Message-ID: <20250416050144.GZ25675@frogsfrogsfrogs>
In-Reply-To: <Z_80_EXzPUiAow2I@infradead.org>

On Tue, Apr 15, 2025 at 09:41:32PM -0700, Christoph Hellwig wrote:
> On Mon, Apr 14, 2025 at 05:14:05PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > With the new large sector size support, it's now the case that
> > set_blocksize can change i_blksize and the folio order in a manner that
> > conflicts with a concurrent reader and causes a kernel crash.
> >
> > Specifically, let's say that udev-worker calls libblkid to detect the
> > labels on a block device. The read call can create an order-0 folio to
> > read the first 4096 bytes from the disk. But then udev is preempted.
> >
> > Next, someone tries to mount an 8k-sectorsize filesystem from the same
> > block device. The filesystem calls set_blocksize, which sets i_blksize to
> > 8192 and the minimum folio order to 1.
> >
> > Now udev resumes, still holding the order-0 folio it allocated. It then
> > tries to schedule a read bio and do_mpage_readahead tries to create
> > bufferheads for the folio. Unfortunately, blocks_per_folio == 0 because
> > the page size is 4096 but the blocksize is 8192 so no bufferheads are
> > attached and the bh walk never sets bdev. We then submit the bio with a
> > NULL block device and crash.
> >
>
> Do we have a testcase for blktests or xfstests for this? The issue is
> subtle and some of the code in the patch looks easy to accidentally
> break again (not the fault of this patch primarily).

It's the same patch as:
https://lore.kernel.org/linux-fsdevel/20250408175125.GL6266@frogsfrogsfrogs/

which is to say, xfs/032 with "while true; do blkid; done" running in
the background to increase the chances of a collision.

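The failure described in the quoted commit message comes down to integer arithmetic. The sketch below is a userspace toy, not kernel code: `toy_bdev`, `toy_blocks_per_folio` and `toy_walk_blocks` are hypothetical names, and computing the block count by plain division is an assumption made for illustration. It shows that a 4096-byte folio holds zero 8192-byte blocks, so a loop over the blocks never runs and the device pointer it was supposed to set stays NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device handle; not a kernel type. */
struct toy_bdev {
	int id;
};

/* Number of whole blocks that fit in one folio (integer division). */
static unsigned int toy_blocks_per_folio(size_t folio_bytes, size_t block_bytes)
{
	assert(block_bytes != 0);	/* guard the division */
	return (unsigned int)(folio_bytes / block_bytes);
}

/*
 * Walk the blocks of a folio and remember the device they map to.
 * With zero blocks per folio the loop body never executes, so the
 * returned pointer is still NULL.
 */
static struct toy_bdev *toy_walk_blocks(struct toy_bdev *dev,
					size_t folio_bytes, size_t block_bytes)
{
	struct toy_bdev *found = NULL;
	unsigned int n = toy_blocks_per_folio(folio_bytes, block_bytes);
	unsigned int i;

	for (i = 0; i < n; i++)
		found = dev;

	return found;
}
```

Calling `toy_walk_blocks(&dev, 4096, 8192)` models the stale order-0 folio meeting the new 8k blocksize; a caller that then uses the result without a NULL check is in the same position as the bio submission in the report.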
> >  	} else {
> > +		inode_lock_shared(bd_inode);
> >  		ret = blkdev_buffered_write(iocb, from);
> > +		inode_unlock_shared(bd_inode);
>
> Does this need a comment why we take i_rwsem?
>
> > +	inode_lock_shared(bd_inode);
> >  	ret = filemap_read(iocb, to, ret);
> > +	inode_unlock_shared(bd_inode);
>
> Same here. Especially as the protection is now heavier than for most
> file systems.

Yeah, somewhere we need a better comment. How about this for
set_blocksize:

	/*
	 * Flush and truncate the pagecache before we reconfigure the
	 * mapping geometry because folio sizes are variable now. If
	 * a reader has already allocated a folio whose size is smaller
	 * than the new min_order but invokes readahead after the new
	 * min_order becomes visible, readahead will think there are
	 * "zero" blocks per folio and crash.
	 */

And then the read/write paths can say something simpler:

	/*
	 * Take i_rwsem and invalidate_lock to avoid racing with a
	 * blocksize change punching out the pagecache.
	 */

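To make the ordering in those two comments concrete, here is a single-threaded userspace model of the locking rule. Everything in it is hypothetical (`toy_rwsem`, `toy_bdev_inode`, `toy_read_begin`, `toy_set_blocksize` are not kernel APIs), and it assumes that the blocksize change holds the lock exclusively while readers hold it shared; the quoted hunks only show the shared side. The lock functions fail instead of sleeping, which a real rwsem would not do, so that the exclusion is observable without threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy reader/writer lock: counts shared holders, flags one exclusive holder. */
struct toy_rwsem {
	int readers;
	bool writer;
};

/* Toy stand-in for the block device inode state discussed above. */
struct toy_bdev_inode {
	struct toy_rwsem rwsem;	/* models i_rwsem */
	size_t blocksize;	/* models i_blksize */
	size_t cached_folios;	/* models the pagecache contents */
};

static bool toy_lock_shared(struct toy_rwsem *s)
{
	if (s->writer)
		return false;	/* a real rwsem would sleep here */
	s->readers++;
	return true;
}

static void toy_unlock_shared(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static bool toy_lock_excl(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;	/* a real rwsem would sleep here */
	s->writer = true;
	return true;
}

static void toy_unlock_excl(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
}

/* Read path: take the lock shared, then populate the cache. */
static bool toy_read_begin(struct toy_bdev_inode *bi)
{
	if (!toy_lock_shared(&bi->rwsem))
		return false;
	bi->cached_folios++;
	return true;
}

static void toy_read_end(struct toy_bdev_inode *bi)
{
	toy_unlock_shared(&bi->rwsem);
}

/*
 * Blocksize change: only proceeds with no reader in flight, drops the
 * cached folios first, and only then changes the geometry.
 */
static bool toy_set_blocksize(struct toy_bdev_inode *bi, size_t new_size)
{
	if (!toy_lock_excl(&bi->rwsem))
		return false;
	bi->cached_folios = 0;
	bi->blocksize = new_size;
	toy_unlock_excl(&bi->rwsem);
	return true;
}
```

In this model a `toy_set_blocksize()` call made between `toy_read_begin()` and `toy_read_end()` is refused, so a reader never sees the geometry change while it still holds a folio sized for the old blocksize.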
> I also wonder if we need locking asserts in some of the write side
> functions that expect the shared inode lock and invalidate lock now?

Probably. Do you have specific places in mind?

--D