From: Christian Brauner <brauner@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@lst.de>,
Jens Axboe <axboe@kernel.dk>,
"Darrick J. Wong" <djwong@kernel.org>,
linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org
Subject: Re: [PATCH v2 01/34] bdev: open block device as files
Date: Thu, 14 Mar 2024 15:47:52 +0100 [thread overview]
Message-ID: <20240314-entbehren-folglich-8c8fef0cd49b@brauner> (raw)
In-Reply-To: <20240314-anfassen-teilnahm-20890c4a22c3@brauner>
On Thu, Mar 14, 2024 at 12:10:59PM +0100, Christian Brauner wrote:
> On Tue, Mar 12, 2024 at 07:32:35PM -0700, Christoph Hellwig wrote:
> > Now that this is in mainline it seems to cause blktests to crash
> > nbd/003 with a rather non-obvious oops for me:
>
> Ok, will be looking into that next.
Ok, I know what's going on. Basically, fput() on the block device runs
asynchronously which means that bdev->bd_holder can still be set to @sb
after it has already been freed. Let me illustrate what I mean:
P1 P2
mount(sb) fd = open("/dev/nbd", ...)
-> file = bdev_file_open_by_dev(..., sb, ...)
bdev->bd_holder = sb;
// Later on:
umount(sb)
->kill_block_super(sb)
|----> [fput() -> deferred via task work]
-> put_super(sb) -> frees the sb via rcu
|
| nbd_ioctl(NBD_CLEAR_SOCK)
| -> disk_force_media_change()
| -> bdev_mark_dead()
| -> fs_mark_dead()
| // Finds bdev->bd_holder == sb
|-> file->release::bdev_release() // Tries to get reference to it but it's freed; frees it again
bdev->bd_holder = NULL;
Two solutions that come to my mind:
[1] Indicate to fput() that this is an internal block devices open and
thus just close it synchronously. This is fine. First, because the block
device superblock is never unmounted or anything so there's no risk
from hanging there for any reason. Second, bdev_release() also ran
synchronously so any issue that we'd see from calling
file->f_op->release::bdev_release() we would have seen from
bdev_release() itself. See [1.1] for a patch.
(2) Take a temporary reference to the holder when opening the block
device. This is also fine afaict because we differentiate between
passive and active references on superblocks and what we already do
in fs_bdev_mark_dead() and friends. This mean we don't have to mess
around with fput(). See [1.2] for a patch.
(3) Revert and cry. No patch.
Personally, I think (2) is more desirable because we don't lose the
async property of fput() and we don't need to have a new FMODE_* flag.
I'd like to do some more testing with this. Thoughts?
[1.1]:
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
block/bdev.c | 1 +
fs/file_table.c | 5 +++++
include/linux/fs.h | 3 +++
3 files changed, 9 insertions(+)
diff --git a/block/bdev.c b/block/bdev.c
index e7adaaf1c219..d0c208a04b04 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -969,6 +969,7 @@ struct file *bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
return bdev_file;
}
ihold(bdev->bd_inode);
+ bdev_file->f_mode |= FMODE_BLOCKDEV;
ret = bdev_open(bdev, mode, holder, hops, bdev_file);
if (ret) {
diff --git a/fs/file_table.c b/fs/file_table.c
index 4f03beed4737..48d35dd67020 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -473,6 +473,11 @@ void fput(struct file *file)
if (atomic_long_dec_and_test(&file->f_count)) {
struct task_struct *task = current;
+ if (unlikely((file->f_mode & FMODE_BLOCKDEV))) {
+ __fput(file);
+ return;
+ }
+
if (unlikely(!(file->f_mode & (FMODE_BACKING | FMODE_OPENED)))) {
file_free(file);
return;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d5d5a4ee24f0..ceac9c0316a6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -121,6 +121,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
#define FMODE_PWRITE ((__force fmode_t)0x10)
/* File is opened for execution with sys_execve / sys_uselib */
#define FMODE_EXEC ((__force fmode_t)0x20)
+
+/* File is opened as block device. */
+#define FMODE_BLOCKDEV ((__force fmode_t)0x100)
/* 32bit hashes as llseek() offset (for directories) */
#define FMODE_32BITHASH ((__force fmode_t)0x200)
/* 64bit hashes as llseek() offset (for directories) */
--
2.43.0
[1.2]:
Sketched-by: Christian Brauner <brauner@kernel.org>
---
block/bdev.c | 4 ++++
fs/super.c | 17 +++++++++++++++++
include/linux/blkdev.h | 3 +++
3 files changed, 24 insertions(+)
diff --git a/block/bdev.c b/block/bdev.c
index e7adaaf1c219..a0d5960dc2b9 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -627,6 +627,8 @@ static void bd_end_claim(struct block_device *bdev, void *holder)
whole->bd_holder = NULL;
mutex_unlock(&bdev_lock);
+ if (bdev->bd_holder_ops && bdev->bd_holder_ops->put_holder)
+ bdev->bd_holder_ops->put_holder(holder);
/*
* If this was the last claim, remove holder link and unblock evpoll if
* it was a write holder.
@@ -902,6 +904,8 @@ int bdev_open(struct block_device *bdev, blk_mode_t mode, void *holder,
bdev_file->f_mode |= FMODE_NOWAIT;
bdev_file->f_mapping = bdev->bd_inode->i_mapping;
bdev_file->f_wb_err = filemap_sample_wb_err(bdev_file->f_mapping);
+ if (hops && hops->get_holder)
+ hops->get_holder(holder);
bdev_file->private_data = holder;
return 0;
diff --git a/fs/super.c b/fs/super.c
index ee05ab6b37e7..64dbbdbed93a 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1515,11 +1515,28 @@ static int fs_bdev_thaw(struct block_device *bdev)
return error;
}
+static void fs_bdev_super_get(void *data)
+{
+ struct super_block *sb = data;
+
+ spin_lock(&sb_lock);
+ sb->s_count++;
+ spin_unlock(&sb_lock);
+}
+
+static void fs_bdev_super_put(void *data)
+{
+ struct super_block *sb = data;
+ put_super(sb);
+}
+
const struct blk_holder_ops fs_holder_ops = {
.mark_dead = fs_bdev_mark_dead,
.sync = fs_bdev_sync,
.freeze = fs_bdev_freeze,
.thaw = fs_bdev_thaw,
+ .get_holder = fs_bdev_super_get,
+ .put_holder = fs_bdev_super_put,
};
EXPORT_SYMBOL_GPL(fs_holder_ops);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f9b87c39cab0..d919e8bcb2c1 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1505,6 +1505,9 @@ struct blk_holder_ops {
* Thaw the file system mounted on the block device.
*/
int (*thaw)(struct block_device *bdev);
+
+ void (*get_holder)(void *holder);
+ void (*put_holder)(void *holder);
};
/*
--
2.43.0
next prev parent reply other threads:[~2024-03-14 14:47 UTC|newest]
Thread overview: 146+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-23 13:26 [PATCH v2 00/34] Open block devices as files Christian Brauner
2024-01-23 13:26 ` [PATCH v2 01/34] bdev: open block device " Christian Brauner
2024-01-29 16:02 ` Christoph Hellwig
2024-02-01 17:08 ` Christian Brauner
2024-02-02 6:43 ` Christoph Hellwig
2024-02-02 11:46 ` Christian Brauner
2024-02-09 11:39 ` Christian Brauner
2024-03-13 2:32 ` Christoph Hellwig
2024-03-14 11:10 ` Christian Brauner
2024-03-14 14:47 ` Christian Brauner [this message]
2024-03-14 16:45 ` Christian Brauner
2024-03-14 16:58 ` Jan Kara
2024-03-15 13:23 ` [PATCH] fs,block: get holder during claim Christian Brauner
2024-03-15 14:28 ` Jan Kara
2024-03-19 16:24 ` remove holder ops Christian Brauner
2024-03-19 17:03 ` Matthew Wilcox
2024-03-19 23:13 ` Christoph Hellwig
2024-03-17 20:53 ` [PATCH] fs,block: get holder during claim Christoph Hellwig
2024-03-18 8:33 ` Christian Brauner
2024-03-18 9:10 ` Yi Zhang
2024-01-23 13:26 ` [PATCH v2 02/34] block/ioctl: port blkdev_bszset() to file Christian Brauner
2024-01-29 16:14 ` Christoph Hellwig
2024-01-31 18:10 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 03/34] block/genhd: port disk_scan_partitions() " Christian Brauner
2024-01-29 16:14 ` Christoph Hellwig
2024-01-31 18:13 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 04/34] md: port block device access " Christian Brauner
2024-01-29 16:14 ` Christoph Hellwig
2024-01-31 18:15 ` Jan Kara
2024-04-15 9:26 ` Ming Lei
2024-04-15 12:35 ` Christian Brauner
2024-04-15 13:56 ` Mike Snitzer
2024-04-15 14:35 ` Ming Lei
2024-04-15 14:53 ` Christian Brauner
2024-04-15 15:11 ` Ming Lei
2024-04-15 15:53 ` Mike Snitzer
2024-04-15 16:22 ` Jan Kara
2024-04-16 0:27 ` Ming Lei
2024-01-23 13:26 ` [PATCH v2 05/34] swap: port block device usage " Christian Brauner
2024-01-29 16:15 ` Christoph Hellwig
2024-01-31 18:16 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 06/34] power: port block device access " Christian Brauner
2024-01-29 16:15 ` Christoph Hellwig
2024-01-31 18:17 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 07/34] xfs: port block device access to files Christian Brauner
2024-01-29 16:17 ` Christoph Hellwig
2024-02-01 14:33 ` Christian Brauner
2024-01-31 18:19 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 08/34] drbd: port block device access to file Christian Brauner
2024-01-31 18:22 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 09/34] pktcdvd: " Christian Brauner
2024-01-31 18:26 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 10/34] rnbd: " Christian Brauner
2024-01-31 18:28 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 11/34] xen: " Christian Brauner
2024-01-31 18:31 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 12/34] zram: " Christian Brauner
2024-01-31 18:32 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 13/34] bcache: port block device access to files Christian Brauner
2024-02-01 9:45 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 14/34] block2mtd: port " Christian Brauner
2024-02-01 9:47 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 15/34] nvme: port block device access to file Christian Brauner
2024-02-01 9:48 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 16/34] s390: " Christian Brauner
2024-02-01 10:11 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 17/34] target: " Christian Brauner
2024-02-01 10:12 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 18/34] bcachefs: " Christian Brauner
2024-02-01 10:13 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 19/34] btrfs: port " Christian Brauner
2024-02-01 10:16 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 20/34] erofs: " Christian Brauner
2024-02-01 10:16 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 21/34] ext4: port block " Christian Brauner
2024-02-01 10:18 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 22/34] f2fs: port block device access to files Christian Brauner
2024-02-01 10:19 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 23/34] jfs: port block device access to file Christian Brauner
2024-02-01 10:19 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 24/34] nfs: port block device access to files Christian Brauner
2024-02-01 10:22 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 25/34] ocfs2: port block device access to file Christian Brauner
2024-02-01 10:22 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 26/34] reiserfs: " Christian Brauner
2024-02-01 10:24 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 27/34] bdev: remove bdev_open_by_path() Christian Brauner
2024-01-29 16:17 ` Christoph Hellwig
2024-02-01 10:24 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 28/34] bdev: make bdev_release() private to block layer Christian Brauner
2024-01-29 16:19 ` Christoph Hellwig
2024-02-01 10:26 ` Jan Kara
2024-02-01 14:48 ` Christian Brauner
2024-01-23 13:26 ` [PATCH v2 29/34] bdev: make struct bdev_handle private to the " Christian Brauner
2024-01-29 16:22 ` Christoph Hellwig
2024-02-01 14:50 ` Christian Brauner
2024-02-01 10:54 ` Jan Kara
2024-02-01 15:07 ` Christian Brauner
2024-02-01 17:42 ` Jan Kara
2024-02-01 11:23 ` Jan Kara
2024-02-01 14:52 ` Christian Brauner
2024-01-23 13:26 ` [PATCH v2 30/34] bdev: remove bdev pointer from struct bdev_handle Christian Brauner
2024-01-29 16:22 ` Christoph Hellwig
2024-02-01 10:57 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 31/34] block: use file->f_op to indicate restricted writes Christian Brauner
2024-01-29 16:49 ` Christoph Hellwig
2024-01-29 17:09 ` [PATCH v2 31/34] block: use file->f_op to indicate restricted writes^[ Christian Brauner
2024-01-30 8:32 ` Christoph Hellwig
2024-01-30 9:11 ` Christian Brauner
2024-02-01 11:08 ` [PATCH v2 31/34] block: use file->f_op to indicate restricted writes Jan Kara
2024-02-01 16:16 ` Christian Brauner
2024-02-01 17:36 ` Jan Kara
2024-02-02 11:45 ` Christian Brauner
2024-02-02 11:51 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 32/34] block: remove bdev_handle completely Christian Brauner
2024-01-29 16:50 ` Christoph Hellwig
2024-02-01 11:20 ` Jan Kara
2024-02-01 16:18 ` Christian Brauner
2024-01-23 13:26 ` [PATCH v2 33/34] block: expose bdev_file_inode() Christian Brauner
2024-02-01 10:09 ` Jan Kara
2024-01-23 13:26 ` [PATCH v2 34/34] ext4: rely on sb->f_bdev only Christian Brauner
2024-02-01 11:34 ` Jan Kara
2024-02-01 13:40 ` Christian Brauner
2024-01-29 6:17 ` [PATCH v2 00/34] Open block devices as files Christoph Hellwig
2024-01-29 10:17 ` Christian Brauner
2024-01-29 10:56 ` [PATCH RFC 0/2] fs & block: remove bd_inode Christian Brauner
2024-01-29 10:56 ` [PATCH RFC 1/2] fs & block: remove bdev->bd_inode Christian Brauner
2024-02-20 11:57 ` Yu Kuai
2024-02-21 7:36 ` Christian Brauner
2024-01-29 10:56 ` [PATCH RFC 2/2] fs,drivers: remove bdev_inode() usage outside of block layer and drivers Christian Brauner
2024-01-29 14:37 ` Christoph Hellwig
2024-01-29 15:29 ` Christian Brauner
2024-01-29 15:36 ` Christoph Hellwig
2024-02-19 13:34 ` Yu Kuai
2024-02-19 13:42 ` Yu Kuai
2024-02-05 11:55 ` [PATCH v2 00/34] Open block devices as files Christian Brauner
2024-02-05 14:19 ` Jan Kara
2024-02-06 13:39 ` Christian Brauner
2024-02-06 13:58 ` Jan Kara
2024-02-06 16:10 ` Christian Brauner
2024-03-21 22:17 ` Matthew Wilcox
2024-03-22 3:38 ` Kent Overstreet
2024-03-22 13:56 ` Christian Brauner
2024-03-22 12:31 ` Christian Brauner
2024-03-22 12:40 ` Matthew Wilcox
2024-03-22 13:53 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240314-entbehren-folglich-8c8fef0cd49b@brauner \
--to=brauner@kernel.org \
--cc=axboe@kernel.dk \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox