* [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle
@ 2023-08-11 11:04 Jan Kara
2023-08-11 11:04 ` [PATCH 20/29] btrfs: Convert to bdev_open_by_path() Jan Kara
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Jan Kara @ 2023-08-11 11:04 UTC (permalink / raw)
To: linux-fsdevel
Cc: linux-block, Christoph Hellwig, Jan Kara, Alasdair Kergon,
Andrew Morton, Anna Schumaker, Chao Yu, Christian Borntraeger,
Darrick J. Wong, Dave Kleikamp, David Sterba, dm-devel, drbd-dev,
Gao Xiang, Jack Wang, Jaegeuk Kim, jfs-discussion, Joern Engel,
Joseph Qi, Kent Overstreet, linux-bcache, linux-btrfs,
linux-erofs, linux-ext4, linux-f2fs-devel, linux-mm, linux-mtd,
linux-nfs, linux-nilfs, linux-nvme, linux-pm, linux-raid,
linux-s390, linux-scsi, linux-xfs, Md. Haris Iqbal, Mike Snitzer,
Minchan Kim, ocfs2-devel, reiserfs-devel, Sergey Senozhatsky,
Song Liu, Sven Schnelle, target-devel, Ted Tso, Trond Myklebust,
xen-devel
Hello,
this is a v2 of the patch series which implements the idea of blkdev_get_by_*()
calls returning bdev_handle which is then passed to blkdev_put() [1]. This
makes the get and put calls for bdevs more obviously matching and allows us to
propagate context from get to put without having to modify all the users
(again!). In particular I need to propagate used open flags to blkdev_put() to
be able to count writeable opens and add support for blocking writes to mounted
block devices. I'll send that series separately.
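To make the get/put pairing described above concrete, here is a minimal userspace sketch, not the kernel code. Every `model_*` name is hypothetical, and the `holder` and `mode` fields are assumptions for illustration (the btrfs patch in this thread only shows the `bdev` member being used). It shows why a handle helps: the open mode is recorded at open time, so the release side can undo the writer count without the caller passing anything extra.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Userspace model only: these are NOT the kernel definitions. */
#define MODEL_OPEN_READ  0x1u
#define MODEL_OPEN_WRITE 0x2u

struct model_bdev {
	int openers;	/* total number of current opens */
	int writers;	/* number of current writeable opens */
};

struct model_bdev_handle {
	struct model_bdev *bdev;
	void *holder;		/* assumed field: who claimed the device */
	unsigned int mode;	/* assumed field: remembered for release */
};

/* Open: returns a handle, or NULL with *err set (the kernel uses ERR_PTR). */
struct model_bdev_handle *model_bdev_open(struct model_bdev *bdev,
					  unsigned int mode, void *holder,
					  int *err)
{
	struct model_bdev_handle *h = malloc(sizeof(*h));

	if (!h) {
		*err = -ENOMEM;
		return NULL;
	}
	h->bdev = bdev;
	h->holder = holder;
	h->mode = mode;
	bdev->openers++;
	if (mode & MODEL_OPEN_WRITE)
		bdev->writers++;
	*err = 0;
	return h;
}

/* Release: everything needed to undo the open lives in the handle. */
void model_bdev_release(struct model_bdev_handle *h)
{
	assert(h && h->bdev);
	if (h->mode & MODEL_OPEN_WRITE)
		h->bdev->writers--;
	h->bdev->openers--;
	free(h);
}
```

In this model a caller keeps the returned handle for the lifetime of the open and reaches the device through `h->bdev`, which mirrors the `bdev_handle->bdev` accesses in the converted code.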
The series is based on Christian's vfs tree as of yesterday as there is quite
some overlap. Patches have passed some reasonable testing - I've tested block
changes, md, dm, bcache, xfs, btrfs, ext4, swap. This obviously doesn't cover
everything so I'd like to ask respective maintainers to review / test their
changes. Thanks! I've pushed out the full branch to:
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git bdev_handle
to ease review / testing.
Changes since v1:
* Rebased on top of current vfs tree
* Renamed final functions to bdev_open_by_*() and bdev_release()
* Fixed detection of exclusive open in blkdev_ioctl() and blkdev_fallocate()
* Fixed swap conversion to properly reinitialize swap_info->bdev_handle
* Fixed xfs conversion to not oops with rtdev without logdev
* Couple other minor fixups
Honza
[1] https://lore.kernel.org/all/ZJGNsVDhZx0Xgs2H@infradead.org
CC: Alasdair Kergon <agk@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Anna Schumaker <anna@kernel.org>
CC: Chao Yu <chao@kernel.org>
CC: Christian Borntraeger <borntraeger@linux.ibm.com>
CC: Coly Li <colyli@suse.de>
CC: "Darrick J. Wong" <djwong@kernel.org>
CC: Dave Kleikamp <shaggy@kernel.org>
CC: David Sterba <dsterba@suse.com>
CC: dm-devel@redhat.com
CC: drbd-dev@lists.linbit.com
CC: Gao Xiang <xiang@kernel.org>
CC: Jack Wang <jinpu.wang@ionos.com>
CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: jfs-discussion@lists.sourceforge.net
CC: Joern Engel <joern@lazybastard.org>
CC: Joseph Qi <joseph.qi@linux.alibaba.com>
CC: Kent Overstreet <kent.overstreet@gmail.com>
CC: linux-bcache@vger.kernel.org
CC: linux-btrfs@vger.kernel.org
CC: linux-erofs@lists.ozlabs.org
CC: <linux-ext4@vger.kernel.org>
CC: linux-f2fs-devel@lists.sourceforge.net
CC: linux-mm@kvack.org
CC: linux-mtd@lists.infradead.org
CC: linux-nfs@vger.kernel.org
CC: linux-nilfs@vger.kernel.org
CC: linux-nvme@lists.infradead.org
CC: linux-pm@vger.kernel.org
CC: linux-raid@vger.kernel.org
CC: linux-s390@vger.kernel.org
CC: linux-scsi@vger.kernel.org
CC: linux-xfs@vger.kernel.org
CC: "Md. Haris Iqbal" <haris.iqbal@ionos.com>
CC: Mike Snitzer <snitzer@kernel.org>
CC: Minchan Kim <minchan@kernel.org>
CC: ocfs2-devel@oss.oracle.com
CC: reiserfs-devel@vger.kernel.org
CC: Sergey Senozhatsky <senozhatsky@chromium.org>
CC: Song Liu <song@kernel.org>
CC: Sven Schnelle <svens@linux.ibm.com>
CC: target-devel@vger.kernel.org
CC: Ted Tso <tytso@mit.edu>
CC: Trond Myklebust <trond.myklebust@hammerspace.com>
CC: xen-devel@lists.xenproject.org
Previous versions:
Link: http://lore.kernel.org/r/20230629165206.383-1-jack@suse.cz # v1
* [PATCH 20/29] btrfs: Convert to bdev_open_by_path()
2023-08-11 11:04 [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle Jan Kara
@ 2023-08-11 11:04 ` Jan Kara
2023-08-11 12:27 ` [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle Christoph Hellwig
2023-08-25 1:58 ` Al Viro
2 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2023-08-11 11:04 UTC (permalink / raw)
To: linux-fsdevel
Cc: linux-block, Christoph Hellwig, Jan Kara, David Sterba,
linux-btrfs
Convert btrfs to use bdev_open_by_path() and pass the handle around. We
also drop the holder from struct btrfs_device as it is now not needed
anymore.
CC: David Sterba <dsterba@suse.com>
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/btrfs/dev-replace.c | 14 +++---
fs/btrfs/ioctl.c | 18 +++----
fs/btrfs/volumes.c | 107 +++++++++++++++++++++--------------------
fs/btrfs/volumes.h | 6 +--
4 files changed, 73 insertions(+), 72 deletions(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 5f10965fd72b..fec013c5f26c 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -247,6 +247,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
{
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
struct btrfs_device *device;
+ struct bdev_handle *bdev_handle;
struct block_device *bdev;
u64 devid = BTRFS_DEV_REPLACE_DEVID;
int ret = 0;
@@ -257,12 +258,13 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
return -EINVAL;
}
- bdev = blkdev_get_by_path(device_path, BLK_OPEN_WRITE,
- fs_info->bdev_holder, NULL);
- if (IS_ERR(bdev)) {
+ bdev_handle = bdev_open_by_path(device_path, BLK_OPEN_WRITE,
+ fs_info->bdev_holder, NULL);
+ if (IS_ERR(bdev_handle)) {
btrfs_err(fs_info, "target device %s is invalid!", device_path);
- return PTR_ERR(bdev);
+ return PTR_ERR(bdev_handle);
}
+ bdev = bdev_handle->bdev;
if (!btrfs_check_device_zone_type(fs_info, bdev)) {
btrfs_err(fs_info,
@@ -313,9 +315,9 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
device->commit_bytes_used = device->bytes_used;
device->fs_info = fs_info;
device->bdev = bdev;
+ device->bdev_handle = bdev_handle;
set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
set_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
- device->holder = fs_info->bdev_holder;
device->dev_stats_valid = 1;
set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
device->fs_devices = fs_devices;
@@ -334,7 +336,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
return 0;
error:
- blkdev_put(bdev, fs_info->bdev_holder);
+ bdev_release(bdev_handle);
return ret;
}
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a18ee7b5a166..b4074191fcc7 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2670,8 +2670,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
struct inode *inode = file_inode(file);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct btrfs_ioctl_vol_args_v2 *vol_args;
- struct block_device *bdev = NULL;
- void *holder;
+ struct bdev_handle *bdev_handle = NULL;
int ret;
bool cancel = false;
@@ -2708,7 +2707,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
goto err_drop;
/* Exclusive operation is now claimed */
- ret = btrfs_rm_device(fs_info, &args, &bdev, &holder);
+ ret = btrfs_rm_device(fs_info, &args, &bdev_handle);
btrfs_exclop_finish(fs_info);
@@ -2722,8 +2721,8 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
}
err_drop:
mnt_drop_write_file(file);
- if (bdev)
- blkdev_put(bdev, holder);
+ if (bdev_handle)
+ bdev_release(bdev_handle);
out:
btrfs_put_dev_args_from_path(&args);
kfree(vol_args);
@@ -2736,8 +2735,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
struct inode *inode = file_inode(file);
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
struct btrfs_ioctl_vol_args *vol_args;
- struct block_device *bdev = NULL;
- void *holder;
+ struct bdev_handle *bdev_handle = NULL;
int ret;
bool cancel = false;
@@ -2764,15 +2762,15 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
ret = exclop_start_or_cancel_reloc(fs_info, BTRFS_EXCLOP_DEV_REMOVE,
cancel);
if (ret == 0) {
- ret = btrfs_rm_device(fs_info, &args, &bdev, &holder);
+ ret = btrfs_rm_device(fs_info, &args, &bdev_handle);
if (!ret)
btrfs_info(fs_info, "disk deleted %s", vol_args->name);
btrfs_exclop_finish(fs_info);
}
mnt_drop_write_file(file);
- if (bdev)
- blkdev_put(bdev, holder);
+ if (bdev_handle)
+ bdev_release(bdev_handle);
out:
btrfs_put_dev_args_from_path(&args);
kfree(vol_args);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e33ed9810f07..465d9b7252b4 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -511,37 +511,39 @@ static struct btrfs_fs_devices *find_fsid_with_metadata_uuid(
static int
btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
- int flush, struct block_device **bdev,
+ int flush, struct bdev_handle **bdev_handle,
struct btrfs_super_block **disk_super)
{
+ struct block_device *bdev;
int ret;
- *bdev = blkdev_get_by_path(device_path, flags, holder, NULL);
+ *bdev_handle = bdev_open_by_path(device_path, flags, holder, NULL);
- if (IS_ERR(*bdev)) {
- ret = PTR_ERR(*bdev);
+ if (IS_ERR(*bdev_handle)) {
+ ret = PTR_ERR(*bdev_handle);
goto error;
}
+ bdev = (*bdev_handle)->bdev;
if (flush)
- sync_blockdev(*bdev);
- ret = set_blocksize(*bdev, BTRFS_BDEV_BLOCKSIZE);
+ sync_blockdev(bdev);
+ ret = set_blocksize(bdev, BTRFS_BDEV_BLOCKSIZE);
if (ret) {
- blkdev_put(*bdev, holder);
+ bdev_release(*bdev_handle);
goto error;
}
- invalidate_bdev(*bdev);
- *disk_super = btrfs_read_dev_super(*bdev);
+ invalidate_bdev(bdev);
+ *disk_super = btrfs_read_dev_super(bdev);
if (IS_ERR(*disk_super)) {
ret = PTR_ERR(*disk_super);
- blkdev_put(*bdev, holder);
+ bdev_release(*bdev_handle);
goto error;
}
return 0;
error:
- *bdev = NULL;
+ *bdev_handle = NULL;
return ret;
}
@@ -613,7 +615,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
struct btrfs_device *device, blk_mode_t flags,
void *holder)
{
- struct block_device *bdev;
+ struct bdev_handle *bdev_handle;
struct btrfs_super_block *disk_super;
u64 devid;
int ret;
@@ -624,7 +626,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
return -EINVAL;
ret = btrfs_get_bdev_and_sb(device->name->str, flags, holder, 1,
- &bdev, &disk_super);
+ &bdev_handle, &disk_super);
if (ret)
return ret;
@@ -648,21 +650,21 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
fs_devices->seeding = true;
} else {
- if (bdev_read_only(bdev))
+ if (bdev_read_only(bdev_handle->bdev))
clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
else
set_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
}
- if (!bdev_nonrot(bdev))
+ if (!bdev_nonrot(bdev_handle->bdev))
fs_devices->rotating = true;
- if (bdev_max_discard_sectors(bdev))
+ if (bdev_max_discard_sectors(bdev_handle->bdev))
fs_devices->discardable = true;
- device->bdev = bdev;
+ device->bdev_handle = bdev_handle;
+ device->bdev = bdev_handle->bdev;
clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
- device->holder = holder;
fs_devices->open_devices++;
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
@@ -676,7 +678,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
error_free_page:
btrfs_release_disk_super(disk_super);
- blkdev_put(bdev, holder);
+ bdev_release(bdev_handle);
return -EINVAL;
}
@@ -1066,9 +1068,10 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
if (device->devid == BTRFS_DEV_REPLACE_DEVID)
continue;
- if (device->bdev) {
- blkdev_put(device->bdev, device->holder);
+ if (device->bdev_handle) {
+ bdev_release(device->bdev_handle);
device->bdev = NULL;
+ device->bdev_handle = NULL;
fs_devices->open_devices--;
}
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
@@ -1113,7 +1116,7 @@ static void btrfs_close_bdev(struct btrfs_device *device)
invalidate_bdev(device->bdev);
}
- blkdev_put(device->bdev, device->holder);
+ bdev_release(device->bdev_handle);
}
static void btrfs_close_one_device(struct btrfs_device *device)
@@ -1361,7 +1364,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path)
struct btrfs_super_block *disk_super;
bool new_device_added = false;
struct btrfs_device *device = NULL;
- struct block_device *bdev;
+ struct bdev_handle *bdev_handle;
u64 bytenr, bytenr_orig;
int ret;
@@ -1384,18 +1387,19 @@ struct btrfs_device *btrfs_scan_one_device(const char *path)
* values temporarily, as the device paths of the fsid are the only
* required information for assembling the volume.
*/
- bdev = blkdev_get_by_path(path, BLK_OPEN_READ, NULL, NULL);
- if (IS_ERR(bdev))
- return ERR_CAST(bdev);
+ bdev_handle = bdev_open_by_path(path, BLK_OPEN_READ, NULL, NULL);
+ if (IS_ERR(bdev_handle))
+ return ERR_CAST(bdev_handle);
bytenr_orig = btrfs_sb_offset(0);
- ret = btrfs_sb_log_location_bdev(bdev, 0, READ, &bytenr);
+ ret = btrfs_sb_log_location_bdev(bdev_handle->bdev, 0, READ, &bytenr);
if (ret) {
device = ERR_PTR(ret);
goto error_bdev_put;
}
- disk_super = btrfs_read_disk_super(bdev, bytenr, bytenr_orig);
+ disk_super = btrfs_read_disk_super(bdev_handle->bdev, bytenr,
+ bytenr_orig);
if (IS_ERR(disk_super)) {
device = ERR_CAST(disk_super);
goto error_bdev_put;
@@ -1408,7 +1412,7 @@ struct btrfs_device *btrfs_scan_one_device(const char *path)
btrfs_release_disk_super(disk_super);
error_bdev_put:
- blkdev_put(bdev, NULL);
+ bdev_release(bdev_handle);
return device;
}
@@ -2093,7 +2097,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info,
int btrfs_rm_device(struct btrfs_fs_info *fs_info,
struct btrfs_dev_lookup_args *args,
- struct block_device **bdev, void **holder)
+ struct bdev_handle **bdev_handle)
{
struct btrfs_trans_handle *trans;
struct btrfs_device *device;
@@ -2202,7 +2206,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
btrfs_assign_next_active_device(device, NULL);
- if (device->bdev) {
+ if (device->bdev_handle) {
cur_devices->open_devices--;
/* remove sysfs entry */
btrfs_sysfs_remove_device(device);
@@ -2218,9 +2222,9 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
* free the device.
*
* We cannot call btrfs_close_bdev() here because we're holding the sb
- * write lock, and blkdev_put() will pull in the ->open_mutex on the
- * block device and it's dependencies. Instead just flush the device
- * and let the caller do the final blkdev_put.
+ * write lock, and bdev_release() will pull in the ->open_mutex on
+ * the block device and its dependencies. Instead just flush the
+ * device and let the caller do the final bdev_release.
*/
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
btrfs_scratch_superblocks(fs_info, device->bdev,
@@ -2231,8 +2235,7 @@ int btrfs_rm_device(struct btrfs_fs_info *fs_info,
}
}
- *bdev = device->bdev;
- *holder = device->holder;
+ *bdev_handle = device->bdev_handle;
synchronize_rcu();
btrfs_free_device(device);
@@ -2369,7 +2372,7 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
const char *path)
{
struct btrfs_super_block *disk_super;
- struct block_device *bdev;
+ struct bdev_handle *bdev_handle;
int ret;
if (!path || !path[0])
@@ -2387,7 +2390,7 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
}
ret = btrfs_get_bdev_and_sb(path, BLK_OPEN_READ, NULL, 0,
- &bdev, &disk_super);
+ &bdev_handle, &disk_super);
if (ret) {
btrfs_put_dev_args_from_path(args);
return ret;
@@ -2400,7 +2403,7 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
else
memcpy(args->fsid, disk_super->fsid, BTRFS_FSID_SIZE);
btrfs_release_disk_super(disk_super);
- blkdev_put(bdev, NULL);
+ bdev_release(bdev_handle);
return 0;
}
@@ -2620,7 +2623,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
struct btrfs_root *root = fs_info->dev_root;
struct btrfs_trans_handle *trans;
struct btrfs_device *device;
- struct block_device *bdev;
+ struct bdev_handle *bdev_handle;
struct super_block *sb = fs_info->sb;
struct btrfs_fs_devices *fs_devices = fs_info->fs_devices;
struct btrfs_fs_devices *seed_devices = NULL;
@@ -2633,12 +2636,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
if (sb_rdonly(sb) && !fs_devices->seeding)
return -EROFS;
- bdev = blkdev_get_by_path(device_path, BLK_OPEN_WRITE,
- fs_info->bdev_holder, NULL);
- if (IS_ERR(bdev))
- return PTR_ERR(bdev);
+ bdev_handle = bdev_open_by_path(device_path, BLK_OPEN_WRITE,
+ fs_info->bdev_holder, NULL);
+ if (IS_ERR(bdev_handle))
+ return PTR_ERR(bdev_handle);
- if (!btrfs_check_device_zone_type(fs_info, bdev)) {
+ if (!btrfs_check_device_zone_type(fs_info, bdev_handle->bdev)) {
ret = -EINVAL;
goto error;
}
@@ -2650,11 +2653,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
locked = true;
}
- sync_blockdev(bdev);
+ sync_blockdev(bdev_handle->bdev);
rcu_read_lock();
list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
- if (device->bdev == bdev) {
+ if (device->bdev == bdev_handle->bdev) {
ret = -EEXIST;
rcu_read_unlock();
goto error;
@@ -2670,7 +2673,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
}
device->fs_info = fs_info;
- device->bdev = bdev;
+ device->bdev_handle = bdev_handle;
+ device->bdev = bdev_handle->bdev;
ret = lookup_bdev(device_path, &device->devt);
if (ret)
goto error_free_device;
@@ -2691,12 +2695,11 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
device->io_align = fs_info->sectorsize;
device->sector_size = fs_info->sectorsize;
device->total_bytes =
- round_down(bdev_nr_bytes(bdev), fs_info->sectorsize);
+ round_down(bdev_nr_bytes(device->bdev), fs_info->sectorsize);
device->disk_total_bytes = device->total_bytes;
device->commit_total_bytes = device->total_bytes;
set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
clear_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state);
- device->holder = fs_info->bdev_holder;
device->dev_stats_valid = 1;
set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
@@ -2732,7 +2735,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
atomic64_add(device->total_bytes, &fs_info->free_chunk_space);
- if (!bdev_nonrot(bdev))
+ if (!bdev_nonrot(device->bdev))
fs_devices->rotating = true;
orig_super_total_bytes = btrfs_super_total_bytes(fs_info->super_copy);
@@ -2854,7 +2857,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
error_free_device:
btrfs_free_device(device);
error:
- blkdev_put(bdev, fs_info->bdev_holder);
+ bdev_release(bdev_handle);
if (locked) {
mutex_unlock(&uuid_mutex);
up_write(&sb->s_umount);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 824161c6dd06..ad00017798b1 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -90,13 +90,11 @@ struct btrfs_device {
u64 generation;
+ struct bdev_handle *bdev_handle;
struct block_device *bdev;
struct btrfs_zoned_device_info *zone_info;
- /* block device holder for blkdev_get/put */
- void *holder;
-
/*
* Device's major-minor number. Must be set even if the device is not
* opened (bdev == NULL), unless the device is missing.
@@ -629,7 +627,7 @@ struct btrfs_device *btrfs_alloc_device(struct btrfs_fs_info *fs_info,
void btrfs_put_dev_args_from_path(struct btrfs_dev_lookup_args *args);
int btrfs_rm_device(struct btrfs_fs_info *fs_info,
struct btrfs_dev_lookup_args *args,
- struct block_device **bdev, void **holder);
+ struct bdev_handle **bdev_handle);
void __exit btrfs_cleanup_fs_uuids(void);
int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len);
int btrfs_grow_device(struct btrfs_trans_handle *trans,
--
2.35.3
* Re: [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle
2023-08-11 11:04 [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle Jan Kara
2023-08-11 11:04 ` [PATCH 20/29] btrfs: Convert to bdev_open_by_path() Jan Kara
@ 2023-08-11 12:27 ` Christoph Hellwig
2023-08-25 1:58 ` Al Viro
2 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2023-08-11 12:27 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, linux-block, Christoph Hellwig, Alasdair Kergon,
Andrew Morton, Anna Schumaker, Chao Yu, Christian Borntraeger,
Darrick J. Wong, Dave Kleikamp, David Sterba, dm-devel, drbd-dev,
Gao Xiang, Jack Wang, Jaegeuk Kim, jfs-discussion, Joern Engel,
Joseph Qi, Kent Overstreet, linux-bcache, linux-btrfs,
linux-erofs, linux-ext4, linux-f2fs-devel, linux-mm, linux-mtd,
linux-nfs, linux-nilfs, linux-nvme, linux-pm, linux-raid,
linux-s390, linux-scsi, linux-xfs, Md. Haris Iqbal, Mike Snitzer,
Minchan Kim, ocfs2-devel, reiserfs-devel, Sergey Senozhatsky,
Song Liu, Sven Schnelle, target-devel, Ted Tso, Trond Myklebust,
xen-devel
Except for a mostly cosmetic nitpick this looks good to me:
Acked-by: Christoph Hellwig <hch@lst.de>
That's not exactly the deep review I'd like to do, but as I'm about to
head out for vacation that's probably as good as it gets.
* Re: [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle
2023-08-11 11:04 [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle Jan Kara
2023-08-11 11:04 ` [PATCH 20/29] btrfs: Convert to bdev_open_by_path() Jan Kara
2023-08-11 12:27 ` [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle Christoph Hellwig
@ 2023-08-25 1:58 ` Al Viro
2023-08-25 13:47 ` Jan Kara
2 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2023-08-25 1:58 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, linux-block, Christoph Hellwig, Alasdair Kergon,
Andrew Morton, Anna Schumaker, Chao Yu, Christian Borntraeger,
Darrick J. Wong, Dave Kleikamp, David Sterba, dm-devel, drbd-dev,
Gao Xiang, Jack Wang, Jaegeuk Kim, jfs-discussion, Joern Engel,
Joseph Qi, Kent Overstreet, linux-bcache, linux-btrfs,
linux-erofs, linux-ext4, linux-f2fs-devel, linux-mm, linux-mtd,
linux-nfs, linux-nilfs, linux-nvme, linux-pm, linux-raid,
linux-s390, linux-scsi, linux-xfs, Md. Haris Iqbal, Mike Snitzer,
Minchan Kim, ocfs2-devel, reiserfs-devel, Sergey Senozhatsky,
Song Liu, Sven Schnelle, target-devel, Ted Tso, Trond Myklebust,
xen-devel
On Fri, Aug 11, 2023 at 01:04:31PM +0200, Jan Kara wrote:
> Hello,
>
> this is a v2 of the patch series which implements the idea of blkdev_get_by_*()
> calls returning bdev_handle which is then passed to blkdev_put() [1]. This
> makes the get and put calls for bdevs more obviously matching and allows us to
> propagate context from get to put without having to modify all the users
> (again!). In particular I need to propagate used open flags to blkdev_put() to
> be able count writeable opens and add support for blocking writes to mounted
> block devices. I'll send that series separately.
>
> The series is based on Christian's vfs tree as of yesterday as there is quite
> some overlap. Patches have passed some reasonable testing - I've tested block
> changes, md, dm, bcache, xfs, btrfs, ext4, swap. This obviously doesn't cover
> everything so I'd like to ask respective maintainers to review / test their
> changes. Thanks! I've pushed out the full branch to:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git bdev_handle
>
> to ease review / testing.
Hmm... Completely Insane Idea(tm): how about turning that thing inside out and
having your bdev_open_by... return an actual opened struct file?
After all, we do that for sockets and pipes just fine and that's a whole lot
hotter area.
Suppose we leave blkdev_open()/blkdev_release() as-is. No need to mess with
what we have for normal opened files for block devices. And have block_open_by_dev()
that would find bdev, etc., the same way yours does, and shove it into an anon file.
Paired with plain fput() - no need to bother with new primitives for closing.
With a helper returning I_BDEV(bdev_file_inode(file)) to get from those to bdev.
NOTE: I'm not suggesting replacing ->s_bdev with struct file * if we do that -
we want that value cached, obviously. Just store both...
Not saying it's a good idea, but... might be interesting to look into.
Comments?
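A rough userspace sketch of the shape suggested above, under loud assumptions: all `model_*` names are hypothetical, and this is not how `struct file`, `fput()` or `I_BDEV()` are implemented in the kernel. It only illustrates the three pieces named in the message: an open that returns a file, a helper that maps the file back to its block device, and a plain put that closes the device when the last reference goes away.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model only; names and layout are assumptions. */
struct model_bdev {
	int openers;
};

struct model_inode {
	struct model_bdev *bdev;	/* stands in for what I_BDEV() would find */
};

struct model_file {
	struct model_inode *inode;
	int refcount;
};

/* Open the device behind @inode and wrap it in a file object. */
struct model_file *model_bdev_file_open(struct model_inode *inode)
{
	struct model_file *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->inode = inode;
	f->refcount = 1;
	inode->bdev->openers++;
	return f;
}

/* Helper to get from the opened file back to the block device. */
struct model_bdev *model_file_bdev(const struct model_file *f)
{
	assert(f && f->inode);
	return f->inode->bdev;
}

/* Drop one reference; the device is closed when the last one is gone. */
void model_fput(struct model_file *f)
{
	assert(f && f->refcount > 0);
	if (--f->refcount == 0) {
		f->inode->bdev->openers--;
		free(f);
	}
}
```

The point of the sketch is that no dedicated close primitive is needed: the generic put drives the device close.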
* Re: [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle
2023-08-25 1:58 ` Al Viro
@ 2023-08-25 13:47 ` Jan Kara
2023-08-26 2:28 ` Al Viro
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Jan Kara @ 2023-08-25 13:47 UTC (permalink / raw)
To: Al Viro
Cc: Jan Kara, linux-fsdevel, linux-block, Christoph Hellwig,
Alasdair Kergon, Andrew Morton, Anna Schumaker, Chao Yu,
Christian Borntraeger, Darrick J. Wong, Dave Kleikamp,
David Sterba, dm-devel, drbd-dev, Gao Xiang, Jack Wang,
Jaegeuk Kim, jfs-discussion, Joern Engel, Joseph Qi,
Kent Overstreet, linux-bcache, linux-btrfs, linux-erofs,
linux-ext4, linux-f2fs-devel, linux-mm, linux-mtd, linux-nfs,
linux-nilfs, linux-nvme, linux-pm, linux-raid, linux-s390,
linux-scsi, linux-xfs, Md. Haris Iqbal, Mike Snitzer, Minchan Kim,
ocfs2-devel, reiserfs-devel, Sergey Senozhatsky, Song Liu,
Sven Schnelle, target-devel, Ted Tso, Trond Myklebust, xen-devel,
Jens Axboe, Christian Brauner
On Fri 25-08-23 02:58:43, Al Viro wrote:
> On Fri, Aug 11, 2023 at 01:04:31PM +0200, Jan Kara wrote:
> > Hello,
> >
> > this is a v2 of the patch series which implements the idea of blkdev_get_by_*()
> > calls returning bdev_handle which is then passed to blkdev_put() [1]. This
> > makes the get and put calls for bdevs more obviously matching and allows us to
> > propagate context from get to put without having to modify all the users
> > (again!). In particular I need to propagate used open flags to blkdev_put() to
> > be able count writeable opens and add support for blocking writes to mounted
> > block devices. I'll send that series separately.
> >
> > The series is based on Christian's vfs tree as of yesterday as there is quite
> > some overlap. Patches have passed some reasonable testing - I've tested block
> > changes, md, dm, bcache, xfs, btrfs, ext4, swap. This obviously doesn't cover
> > everything so I'd like to ask respective maintainers to review / test their
> > changes. Thanks! I've pushed out the full branch to:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git bdev_handle
> >
> > to ease review / testing.
>
> Hmm... Completely Insane Idea(tm): how about turning that thing inside out and
> having your bdev_open_by... return an actual opened struct file?
>
> After all, we do that for sockets and pipes just fine and that's a whole lot
> hotter area.
>
> Suppose we leave blkdev_open()/blkdev_release() as-is. No need to mess with
> what we have for normal opened files for block devices. And have block_open_by_dev()
> that would find bdev, etc., same yours does and shove it into anon file.
>
> Paired with plain fput() - no need to bother with new primitives for closing.
> With a helper returning I_BDEV(bdev_file_inode(file)) to get from those to bdev.
>
> NOTE: I'm not suggesting replacing ->s_bdev with struct file * if we do that -
> we want that value cached, obviously. Just store both...
>
> Not saying it's a good idea, but... might be interesting to look into.
> Comments?
I can see the appeal of not having to introduce the new bdev_handle type
and just using struct file which unifies in-kernel and userspace block
device opens. But I can see downsides too - the last fput() happening from
task work makes me a bit nervous whether it will not break something
somewhere with exclusive bdev opens. Getting from struct file to bdev is
somewhat harder but I guess a helper like F_BDEV() would solve that just
fine.
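The worry about the last fput() can be illustrated with a small userspace model. This is a sketch under assumptions: the `model_*` names are hypothetical and the deferral is reduced to a single pending flag. If the final put only schedules the release, the exclusive claim stays in place until the deferred work runs, so another opener with a different holder can still see the device as busy in between.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace model only; names are assumptions. */
struct model_bdev {
	void *holder;		/* NULL when nobody holds the device exclusively */
};

struct model_file {
	struct model_bdev *bdev;
	int release_pending;	/* set by the last put, cleared by deferred work */
};

/* Exclusive open: fails with -EBUSY if a different holder owns the device. */
int model_open_excl(struct model_file *f, struct model_bdev *bdev, void *holder)
{
	assert(f && bdev && holder);
	if (bdev->holder && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	f->bdev = bdev;
	f->release_pending = 0;
	return 0;
}

/* Last put: only marks the release as pending, the claim is still held. */
void model_fput(struct model_file *f)
{
	assert(f && f->bdev);
	f->release_pending = 1;
}

/* Deferred work: this is where the claim is actually dropped. */
void model_run_task_work(struct model_file *f)
{
	if (f->release_pending) {
		f->bdev->holder = NULL;
		f->release_pending = 0;
	}
}
```

Whether any in-kernel user actually depends on the claim being gone as soon as the put returns is exactly the open question in this message; the model only shows the window.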
So besides my worry about the last fput(), I think this could work and would be
probably a bit nicer than what I have. But before going and redoing the whole
series let me gather some more feedback so that we don't go back and forth.
Christoph, Christian, Jens, any opinion?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle
2023-08-25 13:47 ` Jan Kara
@ 2023-08-26 2:28 ` Al Viro
2023-08-28 14:27 ` Christoph Hellwig
2023-08-28 13:20 ` Christian Brauner
2023-08-28 14:22 ` Christoph Hellwig
2 siblings, 1 reply; 9+ messages in thread
From: Al Viro @ 2023-08-26 2:28 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, linux-block, Christoph Hellwig, Alasdair Kergon,
Andrew Morton, Anna Schumaker, Chao Yu, Christian Borntraeger,
Darrick J. Wong, Dave Kleikamp, David Sterba, dm-devel, drbd-dev,
Gao Xiang, Jack Wang, Jaegeuk Kim, jfs-discussion, Joern Engel,
Joseph Qi, Kent Overstreet, linux-bcache, linux-btrfs,
linux-erofs, linux-ext4, linux-f2fs-devel, linux-mm, linux-mtd,
linux-nfs, linux-nilfs, linux-nvme, linux-pm, linux-raid,
linux-s390, linux-scsi, linux-xfs, Md. Haris Iqbal, Mike Snitzer,
Minchan Kim, ocfs2-devel, reiserfs-devel, Sergey Senozhatsky,
Song Liu, Sven Schnelle, target-devel, Ted Tso, Trond Myklebust,
xen-devel, Jens Axboe, Christian Brauner
On Fri, Aug 25, 2023 at 03:47:56PM +0200, Jan Kara wrote:
> I can see the appeal of not having to introduce the new bdev_handle type
> and just using struct file which unifies in-kernel and userspace block
> device opens. But I can see downsides too - the last fput() happening from
> task work makes me a bit nervous whether it will not break something
> somewhere with exclusive bdev opens. Getting from struct file to bdev is
> somewhat harder but I guess a helper like F_BDEV() would solve that just
> fine.
>
> So besides my last fput() worry about I think this could work and would be
> probably a bit nicer than what I have. But before going and redoing the whole
> series let me gather some more feedback so that we don't go back and forth.
> Christoph, Christian, Jens, any opinion?
Redoing is not an issue - it can be done on top of your series just
as well. Async behaviour of fput() might be, but... need to look
through the actual users; for a lot of them it's perfectly fine.
FWIW, from a cursory look there appears to be a missing primitive: take
an opened bdev (or bdev_handle, with your variant, or opened file if we
go that way eventually) and claim it.
I mean, look at claim_swapfile() for example:
p->bdev = blkdev_get_by_dev(inode->i_rdev,
FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
if (IS_ERR(p->bdev)) {
error = PTR_ERR(p->bdev);
p->bdev = NULL;
return error;
}
p->old_block_size = block_size(p->bdev);
error = set_blocksize(p->bdev, PAGE_SIZE);
if (error < 0)
return error;
we already have the file opened, and we keep it opened all the way until
the swapoff(2); here we have noticed that it's a block device and we
* open the fucker again (by device number), this time claiming
it with our swap_info_struct as holder, to be closed at swapoff(2) time
(just before we close the file)
* flip the block size to PAGE_SIZE, to be reverted at swapoff(2)
time That really looks like it ought to be
* take the opened file, see that it's a block device
* try to claim it with that holder
* on success, flip the block size
with close_filp() in the swapoff(2) (or failure exit path in swapon(2))
doing what it would've done for an O_EXCL opened block device.
The only difference from O_EXCL userland open is that here we would
end up with holder pointing not to struct file in question, but to our
swap_info_struct. It will do the right thing.
This extra open is entirely due to "well, we need to claim it and the
primitive that does that happens to be tied to opening"; feels rather
counter-intuitive.
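As a sketch of what such a primitive could look like, here is a userspace model with hypothetical `model_*` names; it does not correspond to an existing kernel API. Claiming is decoupled from opening: the swapon-like path takes an already opened device, claims it with its own holder, flips the block size, and the swapoff-like path reverts both.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace model only; names are assumptions. */
struct model_bdev {
	void *holder;			/* NULL when unclaimed */
	unsigned int block_size;
};

/* Claim an already opened device; -EBUSY if someone else holds it. */
int model_bdev_claim(struct model_bdev *bdev, void *holder)
{
	assert(bdev && holder);
	if (bdev->holder && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	return 0;
}

/* Explicit counterpart: drop a claim taken by @holder. */
void model_bdev_unclaim(struct model_bdev *bdev, void *holder)
{
	assert(bdev && bdev->holder == holder);
	bdev->holder = NULL;
}

/* swapon-like path: claim, then flip the block size, saving the old one. */
int model_claim_swap(struct model_bdev *bdev, void *swap_info,
		     unsigned int page_size, unsigned int *old_block_size)
{
	int ret = model_bdev_claim(bdev, swap_info);

	if (ret)
		return ret;
	*old_block_size = bdev->block_size;
	bdev->block_size = page_size;
	return 0;
}

/* swapoff-like path: revert the block size, then drop the claim. */
void model_release_swap(struct model_bdev *bdev, void *swap_info,
			unsigned int old_block_size)
{
	bdev->block_size = old_block_size;
	model_bdev_unclaim(bdev, swap_info);
}
```

No second open of the device appears anywhere in this model, which is the property the message is after.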
For that matter, we could add an explicit "unclaim" primitive - might
be easier to follow. That would add another example where that could
be used - in blkdev_bszset() we have an opened block device (it's an
ioctl, after all), we want to change block size and we *really* don't
want to have that happen under a mounted filesystem. So if it's not
opened exclusive, we do a temporary exclusive open of our own and act on
that instead. Might as well go for a temporary claim...
BTW, what happens if two threads call ioctl(fd, BLKBSZSET, &n)
for the same descriptor that happens to have been opened O_EXCL?
Without O_EXCL they would've been unable to claim the sucker at the same
time - the holder we are using is the address of a function argument,
i.e. something that points to kernel stack of the caller. Those would
conflict and we either get set_blocksize() calls fully serialized, or
one of the callers would eat -EBUSY. Not so in "opened with O_EXCL"
case - they can very well overlap and IIRC set_blocksize() does *not*
expect that kind of crap... It's all under CAP_SYS_ADMIN, so it's not
as if it was a meaningful security hole anyway, but it does look fishy.
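
The difference between the two cases can be shown with a small userspace
toy model. This is only a sketch of the behaviour described above, not the
kernel implementation: the toy_* names are hypothetical, the call is split
into begin/end halves so that two callers can be interleaved
deterministically without threads, and only the -EBUSY outcome (not the
fully serialized one) is modelled.

```c
/* Userspace toy model only: the toy_* names are illustrative, not kernel APIs. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_bdev {
	void *holder;			/* current claim owner, NULL if unclaimed */
	unsigned int block_size;
	int active;			/* callers currently changing the block size */
	int max_active;			/* high-water mark of 'active' */
};

static int toy_claim(struct toy_bdev *bdev, void *holder)
{
	if (bdev->holder && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	return 0;
}

static void toy_unclaim(struct toy_bdev *bdev, void *holder)
{
	assert(bdev->holder == holder);
	bdev->holder = NULL;
}

/*
 * First half of a BLKBSZSET-like call. Without O_EXCL the caller takes a
 * temporary claim keyed by an address on its own stack; with O_EXCL the
 * claim step is skipped because the descriptor already holds the device.
 */
static int toy_bszset_begin(struct toy_bdev *bdev, bool opened_excl,
			    void *stack_holder)
{
	if (!opened_excl) {
		int error = toy_claim(bdev, stack_holder);

		if (error)
			return error;
	}
	if (++bdev->active > bdev->max_active)
		bdev->max_active = bdev->active;
	return 0;
}

/* Second half: store the new size and drop the temporary claim, if any. */
static void toy_bszset_end(struct toy_bdev *bdev, bool opened_excl,
			   void *stack_holder, unsigned int size)
{
	bdev->block_size = size;
	bdev->active--;
	if (!opened_excl)
		toy_unclaim(bdev, stack_holder);
}
```

With opened_excl false, a second toy_bszset_begin() issued before the first
caller's toy_bszset_end() returns -EBUSY and max_active stays at 1. With
opened_excl true, both calls get through and max_active reaches 2, which is
the overlap in question.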
* Re: [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle
2023-08-25 13:47 ` Jan Kara
2023-08-26 2:28 ` Al Viro
@ 2023-08-28 13:20 ` Christian Brauner
2023-08-28 14:22 ` Christoph Hellwig
2 siblings, 0 replies; 9+ messages in thread
From: Christian Brauner @ 2023-08-28 13:20 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, linux-fsdevel, linux-block, Christoph Hellwig,
Alasdair Kergon, Andrew Morton, Anna Schumaker, Chao Yu,
Christian Borntraeger, Darrick J. Wong, Dave Kleikamp,
David Sterba, dm-devel, drbd-dev, Gao Xiang, Jack Wang,
Jaegeuk Kim, jfs-discussion, Joern Engel, Joseph Qi,
Kent Overstreet, linux-bcache, linux-btrfs, linux-erofs,
linux-ext4, linux-f2fs-devel, linux-mm, linux-mtd, linux-nfs,
linux-nilfs, linux-nvme, linux-pm, linux-raid, linux-s390,
linux-scsi, linux-xfs, Md. Haris Iqbal, Mike Snitzer, Minchan Kim,
ocfs2-devel, reiserfs-devel, Sergey Senozhatsky, Song Liu,
Sven Schnelle, target-devel, Ted Tso, Trond Myklebust, xen-devel,
Jens Axboe
> So besides my last fput() worry I think this could work and would be
> probably a bit nicer than what I have. But before going and redoing the whole
> series let me gather some more feedback so that we don't go back and forth.
> Christoph, Christian, Jens, any opinion?
I expect I'll be a bit under water for the next few days, but I'll get
back to this. I think not making you redo this whole thing from scratch
is what I'd prefer unless there are really clear advantages. But I don't
want to offer a haphazard opinion in the middle of the merge window.
* Re: [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle
2023-08-25 13:47 ` Jan Kara
2023-08-26 2:28 ` Al Viro
2023-08-28 13:20 ` Christian Brauner
@ 2023-08-28 14:22 ` Christoph Hellwig
2 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2023-08-28 14:22 UTC (permalink / raw)
To: Jan Kara
Cc: Al Viro, linux-fsdevel, linux-block, Christoph Hellwig,
Alasdair Kergon, Andrew Morton, Anna Schumaker, Chao Yu,
Christian Borntraeger, Darrick J. Wong, Dave Kleikamp,
David Sterba, dm-devel, drbd-dev, Gao Xiang, Jack Wang,
Jaegeuk Kim, jfs-discussion, Joern Engel, Joseph Qi,
Kent Overstreet, linux-bcache, linux-btrfs, linux-erofs,
linux-ext4, linux-f2fs-devel, linux-mm, linux-mtd, linux-nfs,
linux-nilfs, linux-nvme, linux-pm, linux-raid, linux-s390,
linux-scsi, linux-xfs, Md. Haris Iqbal, Mike Snitzer, Minchan Kim,
ocfs2-devel, reiserfs-devel, Sergey Senozhatsky, Song Liu,
Sven Schnelle, target-devel, Ted Tso, Trond Myklebust, xen-devel,
Jens Axboe, Christian Brauner
On Fri, Aug 25, 2023 at 03:47:56PM +0200, Jan Kara wrote:
> I can see the appeal of not having to introduce the new bdev_handle type
> and just using struct file which unifies in-kernel and userspace block
> device opens. But I can see downsides too - the last fput() happening from
> task work makes me a bit nervous whether it will not break something
> somewhere with exclusive bdev opens. Getting from struct file to bdev is
> somewhat harder but I guess a helper like F_BDEV() would solve that just
> fine.
>
> So besides my last fput() worry I think this could work and would be
> probably a bit nicer than what I have. But before going and redoing the whole
> series let me gather some more feedback so that we don't go back and forth.
> Christoph, Christian, Jens, any opinion?
I did think about the file a bit. The fact that we'd need something
like an anon_file for the by_dev open was always a huge turn off for
me, but maybe my concern is overblown. Having a struct file would
actually be really useful for a bunch of users.
* Re: [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle
2023-08-26 2:28 ` Al Viro
@ 2023-08-28 14:27 ` Christoph Hellwig
0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2023-08-28 14:27 UTC (permalink / raw)
To: Al Viro
Cc: Jan Kara, linux-fsdevel, linux-block, Christoph Hellwig,
Alasdair Kergon, Andrew Morton, Anna Schumaker, Chao Yu,
Christian Borntraeger, Darrick J. Wong, Dave Kleikamp,
David Sterba, dm-devel, drbd-dev, Gao Xiang, Jack Wang,
Jaegeuk Kim, jfs-discussion, Joern Engel, Joseph Qi,
Kent Overstreet, linux-bcache, linux-btrfs, linux-erofs,
linux-ext4, linux-f2fs-devel, linux-mm, linux-mtd, linux-nfs,
linux-nilfs, linux-nvme, linux-pm, linux-raid, linux-s390,
linux-scsi, linux-xfs, Md. Haris Iqbal, Mike Snitzer, Minchan Kim,
ocfs2-devel, reiserfs-devel, Sergey Senozhatsky, Song Liu,
Sven Schnelle, target-devel, Ted Tso, Trond Myklebust, xen-devel,
Jens Axboe, Christian Brauner
On Sat, Aug 26, 2023 at 03:28:52AM +0100, Al Viro wrote:
> I mean, look at claim_swapfile() for example:
> 	p->bdev = blkdev_get_by_dev(inode->i_rdev,
> 			FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
> 	if (IS_ERR(p->bdev)) {
> 		error = PTR_ERR(p->bdev);
> 		p->bdev = NULL;
> 		return error;
> 	}
> 	p->old_block_size = block_size(p->bdev);
> 	error = set_blocksize(p->bdev, PAGE_SIZE);
> 	if (error < 0)
> 		return error;
> we already have the file opened, and we keep it opened all the way until
> the swapoff(2); here we have noticed that it's a block device and we
> * open the fucker again (by device number), this time claiming
> it with our swap_info_struct as holder, to be closed at swapoff(2) time
> (just before we close the file)
Note that some drivers look at FMODE_EXCL/BLK_OPEN_EXCL in ->open.
These are probably bogus and maybe we want to kill them, but that will
need an audit first.
> BTW, what happens if two threads call ioctl(fd, BLKBSZSET, &n)
> for the same descriptor that happens to have been opened O_EXCL?
> Without O_EXCL they would've been unable to claim the sucker at the same
> time - the holder we are using is the address of a function argument,
> i.e. something that points to kernel stack of the caller. Those would
> conflict and we either get set_blocksize() calls fully serialized, or
> one of the callers would eat -EBUSY. Not so in "opened with O_EXCL"
> case - they can very well overlap and IIRC set_blocksize() does *not*
> expect that kind of crap... It's all under CAP_SYS_ADMIN, so it's not
> as if it was a meaningful security hole anyway, but it does look fishy.
The user gets to keep the pieces. BLKBSZSET is kinda bogus anyway,
as the soft blocksize only matters for buffer_head-like I/O, and
there only for file systems. No idea why anyone would set it manually.