From: Christian Brauner <brauner@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@lst.de>,
Jens Axboe <axboe@kernel.dk>,
"Darrick J. Wong" <djwong@kernel.org>,
linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
Subject: Re: [PATCH v2 01/34] bdev: open block device as files
Date: Thu, 14 Mar 2024 17:45:50 +0100 [thread overview]
Message-ID: <20240314-tauben-hinkt-7e0bbd6d35fc@brauner> (raw)
In-Reply-To: <20240314-entbehren-folglich-8c8fef0cd49b@brauner>
On Thu, Mar 14, 2024 at 03:47:52PM +0100, Christian Brauner wrote:
> On Thu, Mar 14, 2024 at 12:10:59PM +0100, Christian Brauner wrote:
> > On Tue, Mar 12, 2024 at 07:32:35PM -0700, Christoph Hellwig wrote:
> > > Now that this is in mainline it seems to cause blktests to crash
> > > nbd/003 with a rather non-obvious oops for me:
> >
> > Ok, will be looking into that next.
>
> Ok, I know what's going on. Basically, fput() on the block device runs
> asynchronously which means that bdev->bd_holder can still be set to @sb
> after it has already been freed. Let me illustrate what I mean:
>
> P1 P2
> mount(sb) fd = open("/dev/nbd", ...)
> -> file = bdev_file_open_by_dev(..., sb, ...)
> bdev->bd_holder = sb;
>
> // Later on:
>
> umount(sb)
> ->kill_block_super(sb)
> |----> [fput() -> deferred via task work]
> -> put_super(sb) -> frees the sb via rcu
> |
> | nbd_ioctl(NBD_CLEAR_SOCK)
> | -> disk_force_media_change()
> | -> bdev_mark_dead()
> | -> fs_mark_dead()
> | // Finds bdev->bd_holder == sb
> |-> file->release::bdev_release() // Tries to get reference to it but it's freed; frees it again
> bdev->bd_holder = NULL;
>
> Two solutions that come to my mind:
>
> [1] Indicate to fput() that this is an internal block devices open and
> thus just close it synchronously. This is fine. First, because the block
> device superblock is never unmounted or anything so there's no risk
> from hanging there for any reason. Second, bdev_release() also ran
> synchronously so any issue that we'd see from calling
> file->f_op->release::bdev_release() we would have seen from
> bdev_release() itself. See [1.1] for a patch.
>
> (2) Take a temporary reference to the holder when opening the block
> device. This is also fine afaict because we differentiate between
> passive and active references on superblocks and what we already do
> in fs_bdev_mark_dead() and friends. This mean we don't have to mess
> around with fput(). See [1.2] for a patch.
>
> (3) Revert and cry. No patch.
>
> Personally, I think (2) is more desirable because we don't lose the
> async property of fput() and we don't need to have a new FMODE_* flag.
> I'd like to do some more testing with this. Thoughts?
>
> [1.1]:
> Signed-off-by: Christian Brauner <brauner@kernel.org>
> ---
> block/bdev.c | 1 +
> fs/file_table.c | 5 +++++
> include/linux/fs.h | 3 +++
> 3 files changed, 9 insertions(+)
>
> diff --git a/block/bdev.c b/block/bdev.c
> index e7adaaf1c219..d0c208a04b04 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -969,6 +969,7 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
> return bdev_file;
> }
> ihold(bdev->bd_inode);
> + bdev_file->f_mode |= FMODE_BLOCKDEV;
>
> ret = bdev_open(bdev, mode, holder, hops, bdev_file);
> if (ret) {
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 4f03beed4737..48d35dd67020 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -473,6 +473,11 @@ void fput(struct file *file)
> if (atomic_long_dec_and_test(&file->f_count)) {
> struct task_struct *task = current;
>
> + if (unlikely((file->f_mode & FMODE_BLOCKDEV))) {
> + __fput(file);
> + return;
> + }
> +
> if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
> file_free(file);
> return;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index d5d5a4ee24f0..ceac9c0316a6 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -121,6 +121,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
> #define FMODE_PWRITE ((__force fmode_t)0x10)
> /* File is opened for execution with sys_execve / sys_uselib */
> #define FMODE_EXEC ((__force fmode_t)0x20)
> +
> +/* File is opened as block device. */
> +#define FMODE_BLOCKDEV ((__force fmode_t)0x100)
> /* 32bit hashes as llseek() offset (for directories) */
> #define FMODE_32BITHASH ((__force fmode_t)0x200)
> /* 64bit hashes as llseek() offset (for directories) */
> --
> 2.43.0
>
> [1.2]:
> Sketched-by: Christian Brauner <brauner@kernel.org>
> ---
> block/bdev.c | 4 ++++
> fs/super.c | 17 +++++++++++++++++
> include/linux/blkdev.h | 3 +++
> 3 files changed, 24 insertions(+)
>
> diff --git a/block/bdev.c b/block/bdev.c
> index e7adaaf1c219..a0d5960dc2b9 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -627,6 +627,8 @@ static void bd_end_claim(struct block_device *bdev, void *holder)
> whole->bd_holder = NULL;
> mutex_unlock(&bdev_lock);
>
> + if (bdev->bd_holder_ops && bdev->bd_holder_ops->put_holder)
> + bdev->bd_holder_ops->put_holder(holder);
That should move into bdev_release() obviously so it mirrors
bdev_open(). Plus, currently it's a nop because we just NULLed
bdev->bd_holder_ops above. But you get the idea, I hope.
> /*
> * If this was the last claim, remove holder link and unblock evpoll if
> * it was a write holder.
> @@ -902,6 +904,8 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
> bdev_file->f_mode |= FMODE_NOWAIT;
> bdev_file->f_mapping = bdev->bd_inode->i_mapping;
> bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
> + if (hops && hops->get_holder)
> + hops->get_holder(holder);
> bdev_file->private_data = holder;
>
> return 0;
> diff --git a/fs/super.c b/fs/super.c
> index ee05ab6b37e7..64dbbdbed93a 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1515,11 +1515,28 @@ static int fs_bdev_thaw(struct block_device *bdev)
> return error;
> }
>
> +static void fs_bdev_super_get(void *data)
> +{
> + struct super_block *sb = data;
> +
> + spin_lock(&sb_lock);
> + sb->s_count++;
> + spin_unlock(&sb_lock);
> +}
> +
> +static void fs_bdev_super_put(void *data)
> +{
> + struct super_block *sb = data;
> + put_super(sb);
> +}
> +
> const struct blk_holder_ops fs_holder_ops = {
> .mark_dead = fs_bdev_mark_dead,
> .sync = fs_bdev_sync,
> .freeze = fs_bdev_freeze,
> .thaw = fs_bdev_thaw,
> + .get_holder = fs_bdev_super_get,
> + .put_holder = fs_bdev_super_put,
> };
> EXPORT_SYMBOL_GPL(fs_holder_ops);
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index f9b87c39cab0..d919e8bcb2c1 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1505,6 +1505,9 @@ struct blk_holder_ops {
> * Thaw the file system mounted on the block device.
> */
> int (*thaw)(struct block_device *bdev);
> +
> + void (*get_holder)(void *holder);
> + void (*put_holder)(void *holder);
> };
>
> /*
> --
> 2.43.0
>
next prev parent reply other threads:[~2024-03-14 16:45 UTC|newest]
Thread overview: 146+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-23 13:26 [PATCH v2 00/34] Open block devices as files Christian Brauner
2024-01-23 13:26 ` [PATCH v2 01/34] bdev: open block device " Christian Brauner
2024-01-29 16:02 ` Christoph Hellwig
2024-02-01 17:08 ` Christian Brauner
2024-02-02 6:43 ` Christoph Hellwig
2024-02-02 11:46 ` Christian Brauner
2024-02-09 11:39 ` Christian Brauner
2024-03-13 2:32 ` Christoph Hellwig
2024-03-14 11:10 ` Christian Brauner
2024-03-14 14:47 ` Christian Brauner
2024-03-14 16:45 ` Christian Brauner [this message]
2024-03-14 16:58 ` Jan Kara
2024-03-15 13:23 ` [PATCH] fs,block: get holder during claim Christian Brauner
2024-03-15 14:28 ` Jan Kara
2024-03-19 16:24 ` remove holder ops Christian Brauner
2024-03-19 17:03 ` Matthew Wilcox
2024-03-19 23:13 ` Christoph Hellwig
2024-03-17 20:53 ` [PATCH] fs,block: get holder during claim Christoph Hellwig
2024-03-18 8:33 ` Christian Brauner
2024-03-18 9:10 ` Yi Zhang
2024-01-23 13:26 ` [PATCH v2 02/34] block/ioctl: port blkdev_bszset() to file Christian Brauner
2024-01-29 16:14 ` Christoph Hellwig
2024-01-31 18:10 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 03/34] block/genhd: port disk_scan_partitions() " Christian Brauner
2024-01-29 16:14 ` Christoph Hellwig
2024-01-31 18:13 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 04/34] md: port block device access " Christian Brauner
2024-01-29 16:14 ` Christoph Hellwig
2024-01-31 18:15 ` Jan Kara
2024-04-15 9:26 ` Ming Lei
2024-04-15 12:35 ` Christian Brauner
2024-04-15 13:56 ` Mike Snitzer
2024-04-15 14:35 ` Ming Lei
2024-04-15 14:53 ` Christian Brauner
2024-04-15 15:11 ` Ming Lei
2024-04-15 15:53 ` Mike Snitzer
2024-04-15 16:22 ` Jan Kara
2024-04-16 0:27 ` Ming Lei
2024-01-23 13:26 ` [PATCH v2 05/34] swap: port block device usage " Christian Brauner
2024-01-29 16:15 ` Christoph Hellwig
2024-01-31 18:16 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 06/34] power: port block device access " Christian Brauner
2024-01-29 16:15 ` Christoph Hellwig
2024-01-31 18:17 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 07/34] xfs: port block device access to files Christian Brauner
2024-01-29 16:17 ` Christoph Hellwig
2024-02-01 14:33 ` Christian Brauner
2024-01-31 18:19 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 08/34] drbd: port block device access to file Christian Brauner
2024-01-31 18:22 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 09/34] pktcdvd: " Christian Brauner
2024-01-31 18:26 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 10/34] rnbd: " Christian Brauner
2024-01-31 18:28 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 11/34] xen: " Christian Brauner
2024-01-31 18:31 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 12/34] zram: " Christian Brauner
2024-01-31 18:32 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 13/34] bcache: port block device access to files Christian Brauner
2024-02-01 9:45 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 14/34] block2mtd: port " Christian Brauner
2024-02-01 9:47 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 15/34] nvme: port block device access to file Christian Brauner
2024-02-01 9:48 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 16/34] s390: " Christian Brauner
2024-02-01 10:11 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 17/34] target: " Christian Brauner
2024-02-01 10:12 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 18/34] bcachefs: " Christian Brauner
2024-02-01 10:13 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 19/34] btrfs: port " Christian Brauner
2024-02-01 10:16 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 20/34] erofs: " Christian Brauner
2024-02-01 10:16 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 21/34] ext4: port block " Christian Brauner
2024-02-01 10:18 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 22/34] f2fs: port block device access to files Christian Brauner
2024-02-01 10:19 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 23/34] jfs: port block device access to file Christian Brauner
2024-02-01 10:19 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 24/34] nfs: port block device access to files Christian Brauner
2024-02-01 10:22 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 25/34] ocfs2: port block device access to file Christian Brauner
2024-02-01 10:22 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 26/34] reiserfs: " Christian Brauner
2024-02-01 10:24 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 27/34] bdev: remove bdev_open_by_path() Christian Brauner
2024-01-29 16:17 ` Christoph Hellwig
2024-02-01 10:24 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 28/34] bdev: make bdev_release() private to block layer Christian Brauner
2024-01-29 16:19 ` Christoph Hellwig
2024-02-01 10:26 ` Jan Kara
2024-02-01 14:48 ` Christian Brauner
2024-01-23 13:26 ` [PATCH v2 29/34] bdev: make struct bdev_handle private to the " Christian Brauner
2024-01-29 16:22 ` Christoph Hellwig
2024-02-01 14:50 ` Christian Brauner
2024-02-01 10:54 ` Jan Kara
2024-02-01 15:07 ` Christian Brauner
2024-02-01 17:42 ` Jan Kara
2024-02-01 11:23 ` Jan Kara
2024-02-01 14:52 ` Christian Brauner
2024-01-23 13:26 ` [PATCH v2 30/34] bdev: remove bdev pointer from struct bdev_handle Christian Brauner
2024-01-29 16:22 ` Christoph Hellwig
2024-02-01 10:57 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 31/34] block: use file->f_op to indicate restricted writes Christian Brauner
2024-01-29 16:49 ` Christoph Hellwig
2024-01-29 17:09 ` [PATCH v2 31/34] block: use file->f_op to indicate restricted writes^[ Christian Brauner
2024-01-30 8:32 ` Christoph Hellwig
2024-01-30 9:11 ` Christian Brauner
2024-02-01 11:08 ` [PATCH v2 31/34] block: use file->f_op to indicate restricted writes Jan Kara
2024-02-01 16:16 ` Christian Brauner
2024-02-01 17:36 ` Jan Kara
2024-02-02 11:45 ` Christian Brauner
2024-02-02 11:51 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 32/34] block: remove bdev_handle completely Christian Brauner
2024-01-29 16:50 ` Christoph Hellwig
2024-02-01 11:20 ` Jan Kara
2024-02-01 16:18 ` Christian Brauner
2024-01-23 13:26 ` [PATCH v2 33/34] block: expose bdev_file_inode() Christian Brauner
2024-02-01 10:09 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 34/34] ext4: rely on sb->f_bdev only Christian Brauner
2024-02-01 11:34 ` Jan Kara
2024-02-01 13:40 ` Christian Brauner
2024-01-29 6:17 ` [PATCH v2 00/34] Open block devices as files Christoph Hellwig
2024-01-29 10:17 ` Christian Brauner
2024-01-29 10:56 ` [PATCH RFC 0/2] fs & block: remove bd_inode Christian Brauner
2024-01-29 10:56 ` [PATCH RFC 1/2] fs & block: remove bdev->bd_inode Christian Brauner
2024-02-20 11:57 ` Yu Kuai
2024-02-21 7:36 ` Christian Brauner
2024-01-29 10:56 ` [PATCH RFC 2/2] fs,drivers: remove bdev_inode() usage outside of block layer and drivers Christian Brauner
2024-01-29 14:37 ` Christoph Hellwig
2024-01-29 15:29 ` Christian Brauner
2024-01-29 15:36 ` Christoph Hellwig
2024-02-19 13:34 ` Yu Kuai
2024-02-19 13:42 ` Yu Kuai
2024-02-05 11:55 ` [PATCH v2 00/34] Open block devices as files Christian Brauner
2024-02-05 14:19 ` Jan Kara
2024-02-06 13:39 ` Christian Brauner
2024-02-06 13:58 ` Jan Kara
2024-02-06 16:10 ` Christian Brauner
2024-03-21 22:17 ` Matthew Wilcox
2024-03-22 3:38 ` Kent Overstreet
2024-03-22 13:56 ` Christian Brauner
2024-03-22 12:31 ` Christian Brauner
2024-03-22 12:40 ` Matthew Wilcox
2024-03-22 13:53 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240314-tauben-hinkt-7e0bbd6d35fc@brauner \
--to=brauner@kernel.org \
--cc=axboe@kernel.dk \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox