* [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices
@ 2023-11-01 17:43 Jan Kara
2023-11-01 17:43 ` [PATCH 1/7] bcachefs: Convert to bdev_open_by_path() Jan Kara
` (7 more replies)
0 siblings, 8 replies; 23+ messages in thread
From: Jan Kara @ 2023-11-01 17:43 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov,
Jan Kara
Hello!
This is the third version of the patches to add config option to not allow
writing to mounted block devices. The new API for block device opening has been
merged so hopefully this patchset can progress towards being merged. We face
some issues with necessary btrfs changes (review bandwidth) so this series is
modified to enable restricting of writes for all other filesystems. Once btrfs
can merge necessary device scanning changes, enabling the support for
restricting writes for it is trivial.
For motivation why restricting writes to mounted block devices is interesting
see patch 3/7. I've been testing the patches more extensively and I've found
couple of things that get broken by disallowing writes to mounted block
devices:
1) "mount -o loop" gets broken because util-linux keeps the loop device open
read-write when attempting to mount it. Hopefully fixable within util-linux.
2) resize2fs online resizing gets broken because it tries to open the block
device read-write only to call resizing ioctl. Trivial to fix within
e2fsprogs.
3) Online e2label will break because it directly writes to the ext2/3/4
superblock while the FS is mounted to set the new label. Ext4 driver
will have to implement the SETFSLABEL ioctl() and e2label will have
to use it, matching what happens for online labelling of btrfs and
xfs.
Likely there will be other breakage I didn't find yet but overall the breakage
looks minor enough that the option might be useful. Definitely good enough
for syzbot fuzzing and likely good enough for hardening of systems with
more tightened security.
This patch set is based on the VFS tree as of yesterday.
Changes since v2:
* Rebased on top of current VFS tree
* Added missing conversion of bcachefs to new bdev opening API
* Added patch to drop old bdev opening API
* Dropped support for restricting writes to btrfs to avoid patch dependencies
and unblock merging of the patches
Changes since v1:
* Added kernel cmdline argument to toggle whether writing to mounted block
devices is allowed or not
* Fixed handling of partitions
* Limit write blocking only to devices open with explicit BLK_OPEN_BLOCK_WRITES
flag
Honza
Previous versions:
Link: https://lore.kernel.org/all/20230612161614.10302-1-jack@suse.cz #v1
Link: https://lore.kernel.org/all/20230704122727.17096-1-jack@suse.cz #v2
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/7] bcachefs: Convert to bdev_open_by_path()
2023-11-01 17:43 [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jan Kara
@ 2023-11-01 17:43 ` Jan Kara
2023-11-01 19:01 ` Brian Foster
` (2 more replies)
2023-11-01 17:43 ` [PATCH 2/7] block: Remove blkdev_get_by_*() functions Jan Kara
` (6 subsequent siblings)
7 siblings, 3 replies; 23+ messages in thread
From: Jan Kara @ 2023-11-01 17:43 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov,
Jan Kara, Kent Overstreet, Brian Foster, linux-bcachefs
Convert bcachefs to use bdev_open_by_path() and pass the handle around.
CC: Kent Overstreet <kent.overstreet@linux.dev>
CC: Brian Foster <bfoster@redhat.com>
CC: linux-bcachefs@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/bcachefs/super-io.c | 19 ++++++++++---------
fs/bcachefs/super_types.h | 1 +
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
index 332d41e1c0a3..01a32c41a540 100644
--- a/fs/bcachefs/super-io.c
+++ b/fs/bcachefs/super-io.c
@@ -162,8 +162,8 @@ void bch2_sb_field_delete(struct bch_sb_handle *sb,
void bch2_free_super(struct bch_sb_handle *sb)
{
kfree(sb->bio);
- if (!IS_ERR_OR_NULL(sb->bdev))
- blkdev_put(sb->bdev, sb->holder);
+ if (!IS_ERR_OR_NULL(sb->bdev_handle))
+ bdev_release(sb->bdev_handle);
kfree(sb->holder);
kfree(sb->sb);
@@ -685,21 +685,22 @@ int bch2_read_super(const char *path, struct bch_opts *opts,
if (!opt_get(*opts, nochanges))
sb->mode |= BLK_OPEN_WRITE;
- sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
- if (IS_ERR(sb->bdev) &&
- PTR_ERR(sb->bdev) == -EACCES &&
+ sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
+ if (IS_ERR(sb->bdev_handle) &&
+ PTR_ERR(sb->bdev_handle) == -EACCES &&
opt_get(*opts, read_only)) {
sb->mode &= ~BLK_OPEN_WRITE;
- sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
- if (!IS_ERR(sb->bdev))
+ sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
+ if (!IS_ERR(sb->bdev_handle))
opt_set(*opts, nochanges, true);
}
- if (IS_ERR(sb->bdev)) {
- ret = PTR_ERR(sb->bdev);
+ if (IS_ERR(sb->bdev_handle)) {
+ ret = PTR_ERR(sb->bdev_handle);
goto out;
}
+ sb->bdev = sb->bdev_handle->bdev;
ret = bch2_sb_realloc(sb, 0);
if (ret) {
diff --git a/fs/bcachefs/super_types.h b/fs/bcachefs/super_types.h
index 78d6138db62d..b77d8897c9fa 100644
--- a/fs/bcachefs/super_types.h
+++ b/fs/bcachefs/super_types.h
@@ -4,6 +4,7 @@
struct bch_sb_handle {
struct bch_sb *sb;
+ struct bdev_handle *bdev_handle;
struct block_device *bdev;
struct bio *bio;
void *holder;
--
2.35.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 2/7] block: Remove blkdev_get_by_*() functions
2023-11-01 17:43 [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jan Kara
2023-11-01 17:43 ` [PATCH 1/7] bcachefs: Convert to bdev_open_by_path() Jan Kara
@ 2023-11-01 17:43 ` Jan Kara
2023-11-06 14:10 ` Christian Brauner
2023-11-01 17:43 ` [PATCH 3/7] block: Add config option to not allow writing to mounted devices Jan Kara
` (5 subsequent siblings)
7 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2023-11-01 17:43 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov,
Jan Kara, Christoph Hellwig
blkdev_get_by_*() and blkdev_put() functions are now unused. Remove
them.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
---
block/bdev.c | 94 ++++++++++++++----------------------------
include/linux/blkdev.h | 5 ---
2 files changed, 30 insertions(+), 69 deletions(-)
diff --git a/block/bdev.c b/block/bdev.c
index 4a502fb9b814..3f27939e02c6 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -732,7 +732,7 @@ void blkdev_put_no_open(struct block_device *bdev)
}
/**
- * blkdev_get_by_dev - open a block device by device number
+ * bdev_open_by_dev - open a block device by device number
* @dev: device number of block device to open
* @mode: open mode (BLK_OPEN_*)
* @holder: exclusive holder identifier
@@ -744,32 +744,40 @@ void blkdev_put_no_open(struct block_device *bdev)
*
* Use this interface ONLY if you really do not have anything better - i.e. when
* you are behind a truly sucky interface and all you are given is a device
- * number. Everything else should use blkdev_get_by_path().
+ * number. Everything else should use bdev_open_by_path().
*
* CONTEXT:
* Might sleep.
*
* RETURNS:
- * Reference to the block_device on success, ERR_PTR(-errno) on failure.
+ * Handle with a reference to the block_device on success, ERR_PTR(-errno) on
+ * failure.
*/
-struct block_device *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
- const struct blk_holder_ops *hops)
+struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
+ const struct blk_holder_ops *hops)
{
- bool unblock_events = true;
+ struct bdev_handle *handle = kmalloc(sizeof(struct bdev_handle),
+ GFP_KERNEL);
struct block_device *bdev;
+ bool unblock_events = true;
struct gendisk *disk;
int ret;
+ if (!handle)
+ return ERR_PTR(-ENOMEM);
+
ret = devcgroup_check_permission(DEVCG_DEV_BLOCK,
MAJOR(dev), MINOR(dev),
((mode & BLK_OPEN_READ) ? DEVCG_ACC_READ : 0) |
((mode & BLK_OPEN_WRITE) ? DEVCG_ACC_WRITE : 0));
if (ret)
- return ERR_PTR(ret);
+ goto free_handle;
bdev = blkdev_get_no_open(dev);
- if (!bdev)
- return ERR_PTR(-ENXIO);
+ if (!bdev) {
+ ret = -ENXIO;
+ goto free_handle;
+ }
disk = bdev->bd_disk;
if (holder) {
@@ -818,7 +826,10 @@ struct block_device *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
if (unblock_events)
disk_unblock_events(disk);
- return bdev;
+ handle->bdev = bdev;
+ handle->holder = holder;
+ handle->mode = mode;
+ return handle;
put_module:
module_put(disk->fops->owner);
abort_claiming:
@@ -828,34 +839,14 @@ struct block_device *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
disk_unblock_events(disk);
put_blkdev:
blkdev_put_no_open(bdev);
+free_handle:
+ kfree(handle);
return ERR_PTR(ret);
}
-EXPORT_SYMBOL(blkdev_get_by_dev);
-
-struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
- const struct blk_holder_ops *hops)
-{
- struct bdev_handle *handle = kmalloc(sizeof(*handle), GFP_KERNEL);
- struct block_device *bdev;
-
- if (!handle)
- return ERR_PTR(-ENOMEM);
- bdev = blkdev_get_by_dev(dev, mode, holder, hops);
- if (IS_ERR(bdev)) {
- kfree(handle);
- return ERR_CAST(bdev);
- }
- handle->bdev = bdev;
- handle->holder = holder;
- if (holder)
- mode |= BLK_OPEN_EXCL;
- handle->mode = mode;
- return handle;
-}
EXPORT_SYMBOL(bdev_open_by_dev);
/**
- * blkdev_get_by_path - open a block device by name
+ * bdev_open_by_path - open a block device by name
* @path: path to the block device to open
* @mode: open mode (BLK_OPEN_*)
* @holder: exclusive holder identifier
@@ -869,29 +860,9 @@ EXPORT_SYMBOL(bdev_open_by_dev);
* Might sleep.
*
* RETURNS:
- * Reference to the block_device on success, ERR_PTR(-errno) on failure.
+ * Handle with a reference to the block_device on success, ERR_PTR(-errno) on
+ * failure.
*/
-struct block_device *blkdev_get_by_path(const char *path, blk_mode_t mode,
- void *holder, const struct blk_holder_ops *hops)
-{
- struct block_device *bdev;
- dev_t dev;
- int error;
-
- error = lookup_bdev(path, &dev);
- if (error)
- return ERR_PTR(error);
-
- bdev = blkdev_get_by_dev(dev, mode, holder, hops);
- if (!IS_ERR(bdev) && (mode & BLK_OPEN_WRITE) && bdev_read_only(bdev)) {
- blkdev_put(bdev, holder);
- return ERR_PTR(-EACCES);
- }
-
- return bdev;
-}
-EXPORT_SYMBOL(blkdev_get_by_path);
-
struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
void *holder, const struct blk_holder_ops *hops)
{
@@ -914,8 +885,9 @@ struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
}
EXPORT_SYMBOL(bdev_open_by_path);
-void blkdev_put(struct block_device *bdev, void *holder)
+void bdev_release(struct bdev_handle *handle)
{
+ struct block_device *bdev = handle->bdev;
struct gendisk *disk = bdev->bd_disk;
/*
@@ -929,8 +901,8 @@ void blkdev_put(struct block_device *bdev, void *holder)
sync_blockdev(bdev);
mutex_lock(&disk->open_mutex);
- if (holder)
- bd_end_claim(bdev, holder);
+ if (handle->holder)
+ bd_end_claim(bdev, handle->holder);
/*
* Trigger event checking and tell drivers to flush MEDIA_CHANGE
@@ -947,12 +919,6 @@ void blkdev_put(struct block_device *bdev, void *holder)
module_put(disk->fops->owner);
blkdev_put_no_open(bdev);
-}
-EXPORT_SYMBOL(blkdev_put);
-
-void bdev_release(struct bdev_handle *handle)
-{
- blkdev_put(handle->bdev, handle->holder);
kfree(handle);
}
EXPORT_SYMBOL(bdev_release);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index abf71cce785c..7afc10315dd5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1500,10 +1500,6 @@ struct bdev_handle {
blk_mode_t mode;
};
-struct block_device *blkdev_get_by_dev(dev_t dev, blk_mode_t mode, void *holder,
- const struct blk_holder_ops *hops);
-struct block_device *blkdev_get_by_path(const char *path, blk_mode_t mode,
- void *holder, const struct blk_holder_ops *hops);
struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
const struct blk_holder_ops *hops);
struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
@@ -1511,7 +1507,6 @@ struct bdev_handle *bdev_open_by_path(const char *path, blk_mode_t mode,
int bd_prepare_to_claim(struct block_device *bdev, void *holder,
const struct blk_holder_ops *hops);
void bd_abort_claiming(struct block_device *bdev, void *holder);
-void blkdev_put(struct block_device *bdev, void *holder);
void bdev_release(struct bdev_handle *handle);
/* just for blk-cgroup, don't use elsewhere */
--
2.35.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 3/7] block: Add config option to not allow writing to mounted devices
2023-11-01 17:43 [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jan Kara
2023-11-01 17:43 ` [PATCH 1/7] bcachefs: Convert to bdev_open_by_path() Jan Kara
2023-11-01 17:43 ` [PATCH 2/7] block: Remove blkdev_get_by_*() functions Jan Kara
@ 2023-11-01 17:43 ` Jan Kara
2023-11-06 14:47 ` Christian Brauner
2023-12-20 3:26 ` Li Lingfeng
2023-11-01 17:43 ` [PATCH 4/7] btrfs: Do not restrict writes to btrfs devices Jan Kara
` (4 subsequent siblings)
7 siblings, 2 replies; 23+ messages in thread
From: Jan Kara @ 2023-11-01 17:43 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov,
Jan Kara
Writing to mounted devices is dangerous and can lead to filesystem
corruption as well as crashes. Furthermore syzbot comes with more and
more involved examples how to corrupt block device under a mounted
filesystem leading to kernel crashes and reports we can do nothing
about. Add tracking of writers to each block device and a kernel cmdline
argument which controls whether other writeable opens to block devices
open with BLK_OPEN_RESTRICT_WRITES flag are allowed. We will make
filesystems use this flag for used devices.
Note that this effectively only prevents modification of the particular
block device's page cache by other writers. The actual device content
can still be modified by other means - e.g. by issuing direct scsi
commands, by doing writes through devices lower in the storage stack
(e.g. in case loop devices, DM, or MD are involved) etc. But blocking
direct modifications of the block device page cache is enough to give
filesystems a chance to perform data validation when loading data from
the underlying storage and thus prevent kernel crashes.
Syzbot can use this cmdline argument option to avoid uninteresting
crashes. Also users whose userspace setup does not need writing to
mounted block devices can set this option for hardening.
Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
block/Kconfig | 20 +++++++++++++
block/bdev.c | 62 ++++++++++++++++++++++++++++++++++++++-
include/linux/blk_types.h | 1 +
include/linux/blkdev.h | 2 ++
4 files changed, 84 insertions(+), 1 deletion(-)
diff --git a/block/Kconfig b/block/Kconfig
index f1364d1c0d93..ca04b657e058 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -78,6 +78,26 @@ config BLK_DEV_INTEGRITY_T10
select CRC_T10DIF
select CRC64_ROCKSOFT
+config BLK_DEV_WRITE_MOUNTED
+ bool "Allow writing to mounted block devices"
+ default y
+ help
+ When a block device is mounted, writing to its buffer cache is very
+ likely going to cause filesystem corruption. It is also rather easy to
+ crash the kernel in this way since the filesystem has no practical way
+ of detecting these writes to buffer cache and verifying its metadata
+ integrity. However there are some setups that need this capability
+ like running fsck on read-only mounted root device, modifying some
+ features on mounted ext4 filesystem, and similar. If you say N, the
+ kernel will prevent processes from writing to block devices that are
+ mounted by filesystems which provides some more protection from runaway
+ privileged processes and generally makes it much harder to crash
+ filesystem drivers. Note however that this does not prevent
+ underlying device(s) from being modified by other means, e.g. by
+ directly submitting SCSI commands or through access to lower layers of
+ storage stack. If in doubt, say Y. The configuration can be overridden
+ with the bdev_allow_write_mounted boot option.
+
config BLK_DEV_ZONED
bool "Zoned block device support"
select MQ_IOSCHED_DEADLINE
diff --git a/block/bdev.c b/block/bdev.c
index 3f27939e02c6..d75dd7dd2b31 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -30,6 +30,9 @@
#include "../fs/internal.h"
#include "blk.h"
+/* Should we allow writing to mounted block devices? */
+static bool bdev_allow_write_mounted = IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED);
+
struct bdev_inode {
struct block_device bdev;
struct inode vfs_inode;
@@ -730,7 +733,34 @@ void blkdev_put_no_open(struct block_device *bdev)
{
put_device(&bdev->bd_device);
}
-
+
+static bool bdev_writes_blocked(struct block_device *bdev)
+{
+ return bdev->bd_writers == -1;
+}
+
+static void bdev_block_writes(struct block_device *bdev)
+{
+ bdev->bd_writers = -1;
+}
+
+static void bdev_unblock_writes(struct block_device *bdev)
+{
+ bdev->bd_writers = 0;
+}
+
+static bool blkdev_open_compatible(struct block_device *bdev, blk_mode_t mode)
+{
+ if (!bdev_allow_write_mounted) {
+ /* Writes blocked? */
+ if (mode & BLK_OPEN_WRITE && bdev_writes_blocked(bdev))
+ return false;
+ if (mode & BLK_OPEN_RESTRICT_WRITES && bdev->bd_writers > 0)
+ return false;
+ }
+ return true;
+}
+
/**
* bdev_open_by_dev - open a block device by device number
* @dev: device number of block device to open
@@ -773,6 +803,10 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
if (ret)
goto free_handle;
+ /* Blocking writes requires exclusive opener */
+ if (mode & BLK_OPEN_RESTRICT_WRITES && !holder)
+ return ERR_PTR(-EINVAL);
+
bdev = blkdev_get_no_open(dev);
if (!bdev) {
ret = -ENXIO;
@@ -800,12 +834,21 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
goto abort_claiming;
if (!try_module_get(disk->fops->owner))
goto abort_claiming;
+ ret = -EBUSY;
+ if (!blkdev_open_compatible(bdev, mode))
+ goto abort_claiming;
if (bdev_is_partition(bdev))
ret = blkdev_get_part(bdev, mode);
else
ret = blkdev_get_whole(bdev, mode);
if (ret)
goto put_module;
+ if (!bdev_allow_write_mounted) {
+ if (mode & BLK_OPEN_RESTRICT_WRITES)
+ bdev_block_writes(bdev);
+ else if (mode & BLK_OPEN_WRITE)
+ bdev->bd_writers++;
+ }
if (holder) {
bd_finish_claiming(bdev, holder, hops);
@@ -901,6 +944,14 @@ void bdev_release(struct bdev_handle *handle)
sync_blockdev(bdev);
mutex_lock(&disk->open_mutex);
+ if (!bdev_allow_write_mounted) {
+ /* The exclusive opener was blocking writes? Unblock them. */
+ if (handle->mode & BLK_OPEN_RESTRICT_WRITES)
+ bdev_unblock_writes(bdev);
+ else if (handle->mode & BLK_OPEN_WRITE)
+ bdev->bd_writers--;
+ }
+
if (handle->holder)
bd_end_claim(bdev, handle->holder);
@@ -1069,3 +1120,12 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
blkdev_put_no_open(bdev);
}
+
+static int __init setup_bdev_allow_write_mounted(char *str)
+{
+ if (kstrtobool(str, &bdev_allow_write_mounted))
+ pr_warn("Invalid option string for bdev_allow_write_mounted:"
+ " '%s'\n", str);
+ return 1;
+}
+__setup("bdev_allow_write_mounted=", setup_bdev_allow_write_mounted);
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 749203277fee..52e264d5a830 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -66,6 +66,7 @@ struct block_device {
#ifdef CONFIG_FAIL_MAKE_REQUEST
bool bd_make_it_fail;
#endif
+ int bd_writers;
/*
* keep this out-of-line as it's both big and not needed in the fast
* path
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7afc10315dd5..0e0c0186aa32 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -124,6 +124,8 @@ typedef unsigned int __bitwise blk_mode_t;
#define BLK_OPEN_NDELAY ((__force blk_mode_t)(1 << 3))
/* open for "writes" only for ioctls (specialy hack for floppy.c) */
#define BLK_OPEN_WRITE_IOCTL ((__force blk_mode_t)(1 << 4))
+/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
+#define BLK_OPEN_RESTRICT_WRITES ((__force blk_mode_t)(1 << 5))
struct gendisk {
/*
--
2.35.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 4/7] btrfs: Do not restrict writes to btrfs devices
2023-11-01 17:43 [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jan Kara
` (2 preceding siblings ...)
2023-11-01 17:43 ` [PATCH 3/7] block: Add config option to not allow writing to mounted devices Jan Kara
@ 2023-11-01 17:43 ` Jan Kara
2023-11-02 17:13 ` David Sterba
2023-11-01 17:43 ` [PATCH 5/7] fs: Block writes to mounted block devices Jan Kara
` (3 subsequent siblings)
7 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2023-11-01 17:43 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov,
Jan Kara, linux-btrfs, David Sterba, Josef Bacik, Chris Mason
Btrfs device probing code needs adaptation so that it works when writes
are restricted to its mounted devices. Since btrfs maintainer wants to
merge these changes through btrfs tree and there are review bandwidth
issues with that, let's not block all other filesystems and just not
restrict writes to btrfs devices for now.
CC: linux-btrfs@vger.kernel.org
CC: David Sterba <dsterba@suse.com>
CC: Josef Bacik <josef@toxicpanda.com>
CC: Chris Mason <clm@fb.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/btrfs/super.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6ecf78d09694..0ceeb9517177 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1403,6 +1403,8 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
return ERR_PTR(error);
}
+ /* No support for restricting writes to btrfs devices yet... */
+ mode &= ~BLK_OPEN_RESTRICT_WRITES;
/*
* Setup a dummy root and fs_info for test/set super. This is because
* we don't actually fill this stuff out until open_ctree, but we need
--
2.35.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 5/7] fs: Block writes to mounted block devices
2023-11-01 17:43 [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jan Kara
` (3 preceding siblings ...)
2023-11-01 17:43 ` [PATCH 4/7] btrfs: Do not restrict writes to btrfs devices Jan Kara
@ 2023-11-01 17:43 ` Jan Kara
2023-11-06 14:32 ` Christian Brauner
2023-11-01 17:43 ` [PATCH 6/7] xfs: Block writes to log device Jan Kara
` (2 subsequent siblings)
7 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2023-11-01 17:43 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov,
Jan Kara
Ask block layer to block writes to block devices mounted by filesystems.
Signed-off-by: Jan Kara <jack@suse.cz>
---
include/linux/blkdev.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 0e0c0186aa32..9f6c3373f9fc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1494,7 +1494,8 @@ extern const struct blk_holder_ops fs_holder_ops;
* as stored in sb->s_flags.
*/
#define sb_open_mode(flags) \
- (BLK_OPEN_READ | (((flags) & SB_RDONLY) ? 0 : BLK_OPEN_WRITE))
+ (BLK_OPEN_READ | BLK_OPEN_RESTRICT_WRITES | \
+ (((flags) & SB_RDONLY) ? 0 : BLK_OPEN_WRITE))
struct bdev_handle {
struct block_device *bdev;
--
2.35.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 6/7] xfs: Block writes to log device
2023-11-01 17:43 [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jan Kara
` (4 preceding siblings ...)
2023-11-01 17:43 ` [PATCH 5/7] fs: Block writes to mounted block devices Jan Kara
@ 2023-11-01 17:43 ` Jan Kara
2023-11-01 17:43 ` [PATCH 7/7] ext4: Block writes to journal device Jan Kara
2023-11-07 15:32 ` [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jens Axboe
7 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2023-11-01 17:43 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov,
Jan Kara, Darrick J . Wong
Ask block layer to not allow other writers to open block devices used
for xfs log and realtime devices.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/xfs/xfs_super.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 84107d162e41..2f29a6810e6a 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -365,8 +365,9 @@ xfs_blkdev_get(
{
int error = 0;
- *handlep = bdev_open_by_path(name, BLK_OPEN_READ | BLK_OPEN_WRITE,
- mp->m_super, &fs_holder_ops);
+ *handlep = bdev_open_by_path(name,
+ BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES,
+ mp->m_super, &fs_holder_ops);
if (IS_ERR(*handlep)) {
error = PTR_ERR(*handlep);
*handlep = NULL;
--
2.35.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* [PATCH 7/7] ext4: Block writes to journal device
2023-11-01 17:43 [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jan Kara
` (5 preceding siblings ...)
2023-11-01 17:43 ` [PATCH 6/7] xfs: Block writes to log device Jan Kara
@ 2023-11-01 17:43 ` Jan Kara
2023-11-07 15:32 ` [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jens Axboe
7 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2023-11-01 17:43 UTC (permalink / raw)
To: Christian Brauner
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov,
Jan Kara
Ask block layer to not allow other writers to open block device used
for ext4 journal.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/super.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 439e37ac219b..f96398f8aac9 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5855,8 +5855,9 @@ static struct bdev_handle *ext4_get_journal_blkdev(struct super_block *sb,
struct ext4_super_block *es;
int errno;
- bdev_handle = bdev_open_by_dev(j_dev, BLK_OPEN_READ | BLK_OPEN_WRITE,
- sb, &fs_holder_ops);
+ bdev_handle = bdev_open_by_dev(j_dev,
+ BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES,
+ sb, &fs_holder_ops);
if (IS_ERR(bdev_handle)) {
ext4_msg(sb, KERN_ERR,
"failed to open journal device unknown-block(%u,%u) %ld",
--
2.35.3
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] bcachefs: Convert to bdev_open_by_path()
2023-11-01 17:43 ` [PATCH 1/7] bcachefs: Convert to bdev_open_by_path() Jan Kara
@ 2023-11-01 19:01 ` Brian Foster
2023-11-02 1:09 ` Kent Overstreet
2023-11-02 9:55 ` Jan Kara
2023-11-02 1:09 ` Kent Overstreet
2023-11-07 9:28 ` Christian Brauner
2 siblings, 2 replies; 23+ messages in thread
From: Brian Foster @ 2023-11-01 19:01 UTC (permalink / raw)
To: Jan Kara
Cc: Christian Brauner, linux-fsdevel, linux-block, Jens Axboe,
Christoph Hellwig, Kees Cook, syzkaller, Alexander Popov,
linux-xfs, Dmitry Vyukov, Kent Overstreet, linux-bcachefs
On Wed, Nov 01, 2023 at 06:43:06PM +0100, Jan Kara wrote:
> Convert bcachefs to use bdev_open_by_path() and pass the handle around.
>
> CC: Kent Overstreet <kent.overstreet@linux.dev>
> CC: Brian Foster <bfoster@redhat.com>
> CC: linux-bcachefs@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/bcachefs/super-io.c | 19 ++++++++++---------
> fs/bcachefs/super_types.h | 1 +
> 2 files changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
> index 332d41e1c0a3..01a32c41a540 100644
> --- a/fs/bcachefs/super-io.c
> +++ b/fs/bcachefs/super-io.c
...
> @@ -685,21 +685,22 @@ int bch2_read_super(const char *path, struct bch_opts *opts,
> if (!opt_get(*opts, nochanges))
> sb->mode |= BLK_OPEN_WRITE;
>
> - sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> - if (IS_ERR(sb->bdev) &&
> - PTR_ERR(sb->bdev) == -EACCES &&
> + sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> + if (IS_ERR(sb->bdev_handle) &&
> + PTR_ERR(sb->bdev_handle) == -EACCES &&
> opt_get(*opts, read_only)) {
> sb->mode &= ~BLK_OPEN_WRITE;
>
> - sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> - if (!IS_ERR(sb->bdev))
> + sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> + if (!IS_ERR(sb->bdev_handle))
> opt_set(*opts, nochanges, true);
> }
>
> - if (IS_ERR(sb->bdev)) {
> - ret = PTR_ERR(sb->bdev);
> + if (IS_ERR(sb->bdev_handle)) {
> + ret = PTR_ERR(sb->bdev_handle);
> goto out;
> }
> + sb->bdev = sb->bdev_handle->bdev;
>
> ret = bch2_sb_realloc(sb, 0);
> if (ret) {
Hi Jan,
This all seems reasonable to me, but should bcachefs use sb_open_mode()
somewhere in here to involve use of the restrict writes flag in the
first place? It looks like bcachefs sort of open codes bits of
mount_bdev() so it isn't clear the flag would be used anywhere...
Brian
> diff --git a/fs/bcachefs/super_types.h b/fs/bcachefs/super_types.h
> index 78d6138db62d..b77d8897c9fa 100644
> --- a/fs/bcachefs/super_types.h
> +++ b/fs/bcachefs/super_types.h
> @@ -4,6 +4,7 @@
>
> struct bch_sb_handle {
> struct bch_sb *sb;
> + struct bdev_handle *bdev_handle;
> struct block_device *bdev;
> struct bio *bio;
> void *holder;
> --
> 2.35.3
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] bcachefs: Convert to bdev_open_by_path()
2023-11-01 19:01 ` Brian Foster
@ 2023-11-02 1:09 ` Kent Overstreet
2023-11-02 9:55 ` Jan Kara
1 sibling, 0 replies; 23+ messages in thread
From: Kent Overstreet @ 2023-11-02 1:09 UTC (permalink / raw)
To: Brian Foster
Cc: Jan Kara, Christian Brauner, linux-fsdevel, linux-block,
Jens Axboe, Christoph Hellwig, Kees Cook, syzkaller,
Alexander Popov, linux-xfs, Dmitry Vyukov, linux-bcachefs
On Wed, Nov 01, 2023 at 03:01:22PM -0400, Brian Foster wrote:
> On Wed, Nov 01, 2023 at 06:43:06PM +0100, Jan Kara wrote:
> > Convert bcachefs to use bdev_open_by_path() and pass the handle around.
> >
> > CC: Kent Overstreet <kent.overstreet@linux.dev>
> > CC: Brian Foster <bfoster@redhat.com>
> > CC: linux-bcachefs@vger.kernel.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/bcachefs/super-io.c | 19 ++++++++++---------
> > fs/bcachefs/super_types.h | 1 +
> > 2 files changed, 11 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
> > index 332d41e1c0a3..01a32c41a540 100644
> > --- a/fs/bcachefs/super-io.c
> > +++ b/fs/bcachefs/super-io.c
> ...
> > @@ -685,21 +685,22 @@ int bch2_read_super(const char *path, struct bch_opts *opts,
> > if (!opt_get(*opts, nochanges))
> > sb->mode |= BLK_OPEN_WRITE;
> >
> > - sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > - if (IS_ERR(sb->bdev) &&
> > - PTR_ERR(sb->bdev) == -EACCES &&
> > + sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > + if (IS_ERR(sb->bdev_handle) &&
> > + PTR_ERR(sb->bdev_handle) == -EACCES &&
> > opt_get(*opts, read_only)) {
> > sb->mode &= ~BLK_OPEN_WRITE;
> >
> > - sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > - if (!IS_ERR(sb->bdev))
> > + sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > + if (!IS_ERR(sb->bdev_handle))
> > opt_set(*opts, nochanges, true);
> > }
> >
> > - if (IS_ERR(sb->bdev)) {
> > - ret = PTR_ERR(sb->bdev);
> > + if (IS_ERR(sb->bdev_handle)) {
> > + ret = PTR_ERR(sb->bdev_handle);
> > goto out;
> > }
> > + sb->bdev = sb->bdev_handle->bdev;
> >
> > ret = bch2_sb_realloc(sb, 0);
> > if (ret) {
>
> Hi Jan,
>
> This all seems reasonable to me, but should bcachefs use sb_open_mode()
> somewhere in here to involve use of the restrict writes flag in the
> first place? It looks like bcachefs sort of open codes bits of
> mount_bdev() so it isn't clear the flag would be used anywhere...
Yeah, but that should be a separate patch :)
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] bcachefs: Convert to bdev_open_by_path()
2023-11-01 17:43 ` [PATCH 1/7] bcachefs: Convert to bdev_open_by_path() Jan Kara
2023-11-01 19:01 ` Brian Foster
@ 2023-11-02 1:09 ` Kent Overstreet
2023-11-07 9:28 ` Christian Brauner
2 siblings, 0 replies; 23+ messages in thread
From: Kent Overstreet @ 2023-11-02 1:09 UTC (permalink / raw)
To: Jan Kara
Cc: Christian Brauner, linux-fsdevel, linux-block, Jens Axboe,
Christoph Hellwig, Kees Cook, syzkaller, Alexander Popov,
linux-xfs, Dmitry Vyukov, Brian Foster, linux-bcachefs
On Wed, Nov 01, 2023 at 06:43:06PM +0100, Jan Kara wrote:
> Convert bcachefs to use bdev_open_by_path() and pass the handle around.
>
> CC: Kent Overstreet <kent.overstreet@linux.dev>
> CC: Brian Foster <bfoster@redhat.com>
> CC: linux-bcachefs@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] bcachefs: Convert to bdev_open_by_path()
2023-11-01 19:01 ` Brian Foster
2023-11-02 1:09 ` Kent Overstreet
@ 2023-11-02 9:55 ` Jan Kara
2023-11-02 11:58 ` Brian Foster
1 sibling, 1 reply; 23+ messages in thread
From: Jan Kara @ 2023-11-02 9:55 UTC (permalink / raw)
To: Brian Foster
Cc: Jan Kara, Christian Brauner, linux-fsdevel, linux-block,
Jens Axboe, Christoph Hellwig, Kees Cook, syzkaller,
Alexander Popov, linux-xfs, Dmitry Vyukov, Kent Overstreet,
linux-bcachefs
Hi Brian,
On Wed 01-11-23 15:01:22, Brian Foster wrote:
> On Wed, Nov 01, 2023 at 06:43:06PM +0100, Jan Kara wrote:
> > Convert bcachefs to use bdev_open_by_path() and pass the handle around.
> >
> > CC: Kent Overstreet <kent.overstreet@linux.dev>
> > CC: Brian Foster <bfoster@redhat.com>
> > CC: linux-bcachefs@vger.kernel.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > fs/bcachefs/super-io.c | 19 ++++++++++---------
> > fs/bcachefs/super_types.h | 1 +
> > 2 files changed, 11 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
> > index 332d41e1c0a3..01a32c41a540 100644
> > --- a/fs/bcachefs/super-io.c
> > +++ b/fs/bcachefs/super-io.c
> ...
> > @@ -685,21 +685,22 @@ int bch2_read_super(const char *path, struct bch_opts *opts,
> > if (!opt_get(*opts, nochanges))
> > sb->mode |= BLK_OPEN_WRITE;
> >
> > - sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > - if (IS_ERR(sb->bdev) &&
> > - PTR_ERR(sb->bdev) == -EACCES &&
> > + sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > + if (IS_ERR(sb->bdev_handle) &&
> > + PTR_ERR(sb->bdev_handle) == -EACCES &&
> > opt_get(*opts, read_only)) {
> > sb->mode &= ~BLK_OPEN_WRITE;
> >
> > - sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > - if (!IS_ERR(sb->bdev))
> > + sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > + if (!IS_ERR(sb->bdev_handle))
> > opt_set(*opts, nochanges, true);
> > }
> >
> > - if (IS_ERR(sb->bdev)) {
> > - ret = PTR_ERR(sb->bdev);
> > + if (IS_ERR(sb->bdev_handle)) {
> > + ret = PTR_ERR(sb->bdev_handle);
> > goto out;
> > }
> > + sb->bdev = sb->bdev_handle->bdev;
> >
> > ret = bch2_sb_realloc(sb, 0);
> > if (ret) {
>
> This all seems reasonable to me, but should bcachefs use sb_open_mode()
> somewhere in here to involve use of the restrict writes flag in the
> first place? It looks like bcachefs sort of open codes bits of
> mount_bdev() so it isn't clear the flag would be used anywhere...
Yes, so AFAICS bcachefs will not get the write restriction from the changes
in the generic code. Using sb_open_mode() in bcachefs would fix that but
given the 'noexcl' and 'nochanges' mount options it will not be completely
seamless anyway. I guess once the generic changes are in, bcachefs can
decide how exactly it wants to set the BLK_OPEN_RESTRICT_WRITES flag. Or if
you already have an opinion, we can just add the patch to this series.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] bcachefs: Convert to bdev_open_by_path()
2023-11-02 9:55 ` Jan Kara
@ 2023-11-02 11:58 ` Brian Foster
0 siblings, 0 replies; 23+ messages in thread
From: Brian Foster @ 2023-11-02 11:58 UTC (permalink / raw)
To: Jan Kara
Cc: Christian Brauner, linux-fsdevel, linux-block, Jens Axboe,
Christoph Hellwig, Kees Cook, syzkaller, Alexander Popov,
linux-xfs, Dmitry Vyukov, Kent Overstreet, linux-bcachefs
On Thu, Nov 02, 2023 at 10:55:50AM +0100, Jan Kara wrote:
> Hi Brian,
>
> On Wed 01-11-23 15:01:22, Brian Foster wrote:
> > On Wed, Nov 01, 2023 at 06:43:06PM +0100, Jan Kara wrote:
> > > Convert bcachefs to use bdev_open_by_path() and pass the handle around.
> > >
> > > CC: Kent Overstreet <kent.overstreet@linux.dev>
> > > CC: Brian Foster <bfoster@redhat.com>
> > > CC: linux-bcachefs@vger.kernel.org
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > > fs/bcachefs/super-io.c | 19 ++++++++++---------
> > > fs/bcachefs/super_types.h | 1 +
> > > 2 files changed, 11 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/fs/bcachefs/super-io.c b/fs/bcachefs/super-io.c
> > > index 332d41e1c0a3..01a32c41a540 100644
> > > --- a/fs/bcachefs/super-io.c
> > > +++ b/fs/bcachefs/super-io.c
> > ...
> > > @@ -685,21 +685,22 @@ int bch2_read_super(const char *path, struct bch_opts *opts,
> > > if (!opt_get(*opts, nochanges))
> > > sb->mode |= BLK_OPEN_WRITE;
> > >
> > > - sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > > - if (IS_ERR(sb->bdev) &&
> > > - PTR_ERR(sb->bdev) == -EACCES &&
> > > + sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > > + if (IS_ERR(sb->bdev_handle) &&
> > > + PTR_ERR(sb->bdev_handle) == -EACCES &&
> > > opt_get(*opts, read_only)) {
> > > sb->mode &= ~BLK_OPEN_WRITE;
> > >
> > > - sb->bdev = blkdev_get_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > > - if (!IS_ERR(sb->bdev))
> > > + sb->bdev_handle = bdev_open_by_path(path, sb->mode, sb->holder, &bch2_sb_handle_bdev_ops);
> > > + if (!IS_ERR(sb->bdev_handle))
> > > opt_set(*opts, nochanges, true);
> > > }
> > >
> > > - if (IS_ERR(sb->bdev)) {
> > > - ret = PTR_ERR(sb->bdev);
> > > + if (IS_ERR(sb->bdev_handle)) {
> > > + ret = PTR_ERR(sb->bdev_handle);
> > > goto out;
> > > }
> > > + sb->bdev = sb->bdev_handle->bdev;
> > >
> > > ret = bch2_sb_realloc(sb, 0);
> > > if (ret) {
> >
> > This all seems reasonable to me, but should bcachefs use sb_open_mode()
> > somewhere in here to involve use of the restrict writes flag in the
> > first place? It looks like bcachefs sort of open codes bits of
> > mount_bdev() so it isn't clear the flag would be used anywhere...
>
> Yes, so AFAICS bcachefs will not get the write restriction from the changes
> in the generic code. Using sb_open_mode() in bcachefs would fix that but
> given the 'noexcl' and 'nochanges' mount options it will not be completely
> seamless anyway. I guess once the generic changes are in, bcachefs can
> decide how exactly it wants to set the BLK_OPEN_RESTRICT_WRITES flag. Or if
> you already have an opinion, we can just add the patch to this series.
>
Yep, makes sense, and I agree with Kent this should be a separate patch
anyways. Just wanted to make sure I wasn't missing something or
otherwise it was known that bcachefs needs a bit more work to turn this
on... thanks.
Brian
> Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 4/7] btrfs: Do not restrict writes to btrfs devices
2023-11-01 17:43 ` [PATCH 4/7] btrfs: Do not restrict writes to btrfs devices Jan Kara
@ 2023-11-02 17:13 ` David Sterba
0 siblings, 0 replies; 23+ messages in thread
From: David Sterba @ 2023-11-02 17:13 UTC (permalink / raw)
To: Jan Kara
Cc: Christian Brauner, linux-fsdevel, linux-block, Jens Axboe,
Christoph Hellwig, Kees Cook, syzkaller, Alexander Popov,
linux-xfs, Dmitry Vyukov, linux-btrfs, David Sterba, Josef Bacik,
Chris Mason
On Wed, Nov 01, 2023 at 06:43:09PM +0100, Jan Kara wrote:
> Btrfs device probing code needs adaptation so that it works when writes
> are restricted to its mounted devices. Since btrfs maintainer wants to
> merge these changes through btrfs tree and there are review bandwidth
> issues with that, let's not block all other filesystems and just not
> restrict writes to btrfs devices for now.
>
> CC: linux-btrfs@vger.kernel.org
> CC: David Sterba <dsterba@suse.com>
> CC: Josef Bacik <josef@toxicpanda.com>
> CC: Chris Mason <clm@fb.com>
> Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: David Sterba <dsterba@suse.com>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/7] block: Remove blkdev_get_by_*() functions
2023-11-01 17:43 ` [PATCH 2/7] block: Remove blkdev_get_by_*() functions Jan Kara
@ 2023-11-06 14:10 ` Christian Brauner
0 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2023-11-06 14:10 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov,
Christoph Hellwig
On Wed, Nov 01, 2023 at 06:43:07PM +0100, Jan Kara wrote:
> blkdev_get_by_*() and blkdev_put() functions are now unused. Remove
> them.
>
> Acked-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
Yes, very nice!
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 5/7] fs: Block writes to mounted block devices
2023-11-01 17:43 ` [PATCH 5/7] fs: Block writes to mounted block devices Jan Kara
@ 2023-11-06 14:32 ` Christian Brauner
0 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2023-11-06 14:32 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov
On Wed, Nov 01, 2023 at 06:43:10PM +0100, Jan Kara wrote:
> Ask block layer to block writes to block devices mounted by filesystems.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/7] block: Add config option to not allow writing to mounted devices
2023-11-01 17:43 ` [PATCH 3/7] block: Add config option to not allow writing to mounted devices Jan Kara
@ 2023-11-06 14:47 ` Christian Brauner
2023-11-06 15:18 ` Jan Kara
2023-12-20 3:26 ` Li Lingfeng
1 sibling, 1 reply; 23+ messages in thread
From: Christian Brauner @ 2023-11-06 14:47 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov
On Wed, Nov 01, 2023 at 06:43:08PM +0100, Jan Kara wrote:
> Writing to mounted devices is dangerous and can lead to filesystem
> corruption as well as crashes. Furthermore syzbot comes with more and
> more involved examples how to corrupt block device under a mounted
> filesystem leading to kernel crashes and reports we can do nothing
> about. Add tracking of writers to each block device and a kernel cmdline
> argument which controls whether other writeable opens to block devices
> open with BLK_OPEN_RESTRICT_WRITES flag are allowed. We will make
> filesystems use this flag for used devices.
>
> Note that this effectively only prevents modification of the particular
> block device's page cache by other writers. The actual device content
> can still be modified by other means - e.g. by issuing direct scsi
> commands, by doing writes through devices lower in the storage stack
> (e.g. in case loop devices, DM, or MD are involved) etc. But blocking
> direct modifications of the block device page cache is enough to give
> filesystems a chance to perform data validation when loading data from
> the underlying storage and thus prevent kernel crashes.
>
> Syzbot can use this cmdline argument option to avoid uninteresting
> crashes. Also users whose userspace setup does not need writing to
> mounted block devices can set this option for hardening.
>
> Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
A few minor tweaks I would do in-tree. Please see below.
I know it's mostly stylistic that's why I would do it so there's no
resend dance for non-technical reasons.
> block/Kconfig | 20 +++++++++++++
> block/bdev.c | 62 ++++++++++++++++++++++++++++++++++++++-
> include/linux/blk_types.h | 1 +
> include/linux/blkdev.h | 2 ++
> 4 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index f1364d1c0d93..ca04b657e058 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -78,6 +78,26 @@ config BLK_DEV_INTEGRITY_T10
> select CRC_T10DIF
> select CRC64_ROCKSOFT
>
> +config BLK_DEV_WRITE_MOUNTED
> + bool "Allow writing to mounted block devices"
> + default y
Let's hope that this can become the default one day.
> +static void bdev_unblock_writes(struct block_device *bdev)
> +{
> + bdev->bd_writers = 0;
> +}
> +
> +static bool blkdev_open_compatible(struct block_device *bdev, blk_mode_t mode)
I would like to mirror our may_{open,create}() routines here and call
this:
bdev_may_open()
This is a well-known vfs pattern and also easy to understand for block
devs as well.
> @@ -800,12 +834,21 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
> goto abort_claiming;
> if (!try_module_get(disk->fops->owner))
> goto abort_claiming;
> + ret = -EBUSY;
> + if (!blkdev_open_compatible(bdev, mode))
> + goto abort_claiming;
> if (bdev_is_partition(bdev))
> ret = blkdev_get_part(bdev, mode);
> else
> ret = blkdev_get_whole(bdev, mode);
> if (ret)
> goto put_module;
> + if (!bdev_allow_write_mounted) {
> + if (mode & BLK_OPEN_RESTRICT_WRITES)
> + bdev_block_writes(bdev);
> + else if (mode & BLK_OPEN_WRITE)
> + bdev->bd_writers++;
> + }
I would like to move this to a tiny helper for clarity:
static void bdev_claim_write_access(struct block_device *bdev)
{
if (!bdev_allow_write_mounted)
return;
/* Claim exclusive or shared write access to the block device. */
if (mode & BLK_OPEN_RESTRICT_WRITES)
bdev_block_writes(bdev);
else if (mode & BLK_OPEN_WRITE)
bdev->bd_writers++;
}
> if (holder) {
> bd_finish_claiming(bdev, holder, hops);
>
> @@ -901,6 +944,14 @@ void bdev_release(struct bdev_handle *handle)
> sync_blockdev(bdev);
>
> mutex_lock(&disk->open_mutex);
> + if (!bdev_allow_write_mounted) {
> + /* The exclusive opener was blocking writes? Unblock them. */
> + if (handle->mode & BLK_OPEN_RESTRICT_WRITES)
> + bdev_unblock_writes(bdev);
> + else if (handle->mode & BLK_OPEN_WRITE)
> + bdev->bd_writers--;
> + }
static void bdev_yield_write_access(struct block_device *bdev)
{
if (!bdev_allow_write_mounted)
return;
/* Yield exclusive or shared write access. */
if (handle->mode & BLK_OPEN_RESTRICT_WRITES)
bdev_unblock_writes(bdev);
else if (handle->mode & BLK_OPEN_WRITE)
bdev->bd_writers--;
}
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/7] block: Add config option to not allow writing to mounted devices
2023-11-06 14:47 ` Christian Brauner
@ 2023-11-06 15:18 ` Jan Kara
2023-11-06 15:57 ` Christian Brauner
0 siblings, 1 reply; 23+ messages in thread
From: Jan Kara @ 2023-11-06 15:18 UTC (permalink / raw)
To: Christian Brauner
Cc: Jan Kara, linux-fsdevel, linux-block, Jens Axboe,
Christoph Hellwig, Kees Cook, syzkaller, Alexander Popov,
linux-xfs, Dmitry Vyukov
On Mon 06-11-23 15:47:54, Christian Brauner wrote:
> On Wed, Nov 01, 2023 at 06:43:08PM +0100, Jan Kara wrote:
> > Writing to mounted devices is dangerous and can lead to filesystem
> > corruption as well as crashes. Furthermore syzbot comes with more and
> > more involved examples how to corrupt block device under a mounted
> > filesystem leading to kernel crashes and reports we can do nothing
> > about. Add tracking of writers to each block device and a kernel cmdline
> > argument which controls whether other writeable opens to block devices
> > open with BLK_OPEN_RESTRICT_WRITES flag are allowed. We will make
> > filesystems use this flag for used devices.
> >
> > Note that this effectively only prevents modification of the particular
> > block device's page cache by other writers. The actual device content
> > can still be modified by other means - e.g. by issuing direct scsi
> > commands, by doing writes through devices lower in the storage stack
> > (e.g. in case loop devices, DM, or MD are involved) etc. But blocking
> > direct modifications of the block device page cache is enough to give
> > filesystems a chance to perform data validation when loading data from
> > the underlying storage and thus prevent kernel crashes.
> >
> > Syzbot can use this cmdline argument option to avoid uninteresting
> > crashes. Also users whose userspace setup does not need writing to
> > mounted block devices can set this option for hardening.
> >
> > Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
>
> A few minor tweaks I would do in-tree. Please see below.
> I know it's mostly stylistic that's why I would do it so there's no
> resend dance for non-technical reasons.
Whatever works best for you. I agree with the changes but please see my
comments below for some fixes.
> > block/Kconfig | 20 +++++++++++++
> > block/bdev.c | 62 ++++++++++++++++++++++++++++++++++++++-
> > include/linux/blk_types.h | 1 +
> > include/linux/blkdev.h | 2 ++
> > 4 files changed, 84 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/Kconfig b/block/Kconfig
> > index f1364d1c0d93..ca04b657e058 100644
> > --- a/block/Kconfig
> > +++ b/block/Kconfig
> > @@ -78,6 +78,26 @@ config BLK_DEV_INTEGRITY_T10
> > select CRC_T10DIF
> > select CRC64_ROCKSOFT
> >
> > +config BLK_DEV_WRITE_MOUNTED
> > + bool "Allow writing to mounted block devices"
> > + default y
>
> Let's hope that this can become the default one day.
Yes, I'd hope as well but we need some tooling work (util-linux, e2fsprogs)
before that can happen.
> > +static void bdev_unblock_writes(struct block_device *bdev)
> > +{
> > + bdev->bd_writers = 0;
> > +}
> > +
> > +static bool blkdev_open_compatible(struct block_device *bdev, blk_mode_t mode)
>
> I would like to mirror our may_{open,create}() routines here and call
> this:
>
> bdev_may_open()
>
> This is a well-known vfs pattern and also easy to understand for block
> devs as well.
Sure.
> > @@ -800,12 +834,21 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
> > goto abort_claiming;
> > if (!try_module_get(disk->fops->owner))
> > goto abort_claiming;
> > + ret = -EBUSY;
> > + if (!blkdev_open_compatible(bdev, mode))
> > + goto abort_claiming;
> > if (bdev_is_partition(bdev))
> > ret = blkdev_get_part(bdev, mode);
> > else
> > ret = blkdev_get_whole(bdev, mode);
> > if (ret)
> > goto put_module;
> > + if (!bdev_allow_write_mounted) {
> > + if (mode & BLK_OPEN_RESTRICT_WRITES)
> > + bdev_block_writes(bdev);
> > + else if (mode & BLK_OPEN_WRITE)
> > + bdev->bd_writers++;
> > + }
>
> I would like to move this to a tiny helper for clarity:
>
> static void bdev_claim_write_access(struct block_device *bdev)
> {
> if (!bdev_allow_write_mounted)
This should be the other way around.
> return;
>
> /* Claim exclusive or shared write access to the block device. */
> if (mode & BLK_OPEN_RESTRICT_WRITES)
> bdev_block_writes(bdev);
> else if (mode & BLK_OPEN_WRITE)
> bdev->bd_writers++;
> }
>
> > if (holder) {
> > bd_finish_claiming(bdev, holder, hops);
> >
> > @@ -901,6 +944,14 @@ void bdev_release(struct bdev_handle *handle)
> > sync_blockdev(bdev);
> >
> > mutex_lock(&disk->open_mutex);
> > + if (!bdev_allow_write_mounted) {
> > + /* The exclusive opener was blocking writes? Unblock them. */
> > + if (handle->mode & BLK_OPEN_RESTRICT_WRITES)
> > + bdev_unblock_writes(bdev);
> > + else if (handle->mode & BLK_OPEN_WRITE)
> > + bdev->bd_writers--;
> > + }
>
> static void bdev_yield_write_access(struct block_device *bdev)
> {
> if (!bdev_allow_write_mounted)
And this as well.
> return;
>
> /* Yield exclusive or shared write access. */
> if (handle->mode & BLK_OPEN_RESTRICT_WRITES)
> bdev_unblock_writes(bdev);
> else if (handle->mode & BLK_OPEN_WRITE)
> bdev->bd_writers--;
> }
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/7] block: Add config option to not allow writing to mounted devices
2023-11-06 15:18 ` Jan Kara
@ 2023-11-06 15:57 ` Christian Brauner
0 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2023-11-06 15:57 UTC (permalink / raw)
To: Jan Kara
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov
> > Let's hope that this can become the default one day.
>
> Yes, I'd hope as well but we need some tooling work (util-linux, e2fsprogs)
> before that can happen.
Yeah, absolutely. I'm moderately confident we'll have fairly quick
adoption of this though.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/7] bcachefs: Convert to bdev_open_by_path()
2023-11-01 17:43 ` [PATCH 1/7] bcachefs: Convert to bdev_open_by_path() Jan Kara
2023-11-01 19:01 ` Brian Foster
2023-11-02 1:09 ` Kent Overstreet
@ 2023-11-07 9:28 ` Christian Brauner
2 siblings, 0 replies; 23+ messages in thread
From: Christian Brauner @ 2023-11-07 9:28 UTC (permalink / raw)
To: Jan Kara
Cc: Christian Brauner, linux-fsdevel, linux-block, Jens Axboe,
Christoph Hellwig, syzkaller, Alexander Popov, linux-xfs,
Dmitry Vyukov, Kent Overstreet, Brian Foster, linux-bcachefs,
Kees Cook
On Wed, 1 Nov 2023 18:43:06 +0100, Jan Kara wrote:
> Convert bcachefs to use bdev_open_by_path() and pass the handle around.
>
>
Applied to the vfs.super branch of the vfs/vfs.git tree.
Patches in the vfs.super branch should appear in linux-next soon.
Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.
It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.
Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.
tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.super
[1/7] bcachefs: Convert to bdev_open_by_path()
https://git.kernel.org/vfs/vfs/c/8e897399352c
[2/7] block: Remove blkdev_get_by_*() functions
https://git.kernel.org/vfs/vfs/c/1dc2789bf2d9
[3/7] block: Add config option to not allow writing to mounted devices
https://git.kernel.org/vfs/vfs/c/708e8ecda49e
[4/7] btrfs: Do not restrict writes to btrfs devices
https://git.kernel.org/vfs/vfs/c/b6b2f4843264
[5/7] fs: Block writes to mounted block devices
https://git.kernel.org/vfs/vfs/c/48ce483465bb
[6/7] xfs: Block writes to log device
https://git.kernel.org/vfs/vfs/c/dae1e956882c
[7/7] ext4: Block writes to journal device
https://git.kernel.org/vfs/vfs/c/a8a97da12393
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices
2023-11-01 17:43 [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jan Kara
` (6 preceding siblings ...)
2023-11-01 17:43 ` [PATCH 7/7] ext4: Block writes to journal device Jan Kara
@ 2023-11-07 15:32 ` Jens Axboe
7 siblings, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2023-11-07 15:32 UTC (permalink / raw)
To: Jan Kara, Christian Brauner
Cc: linux-fsdevel, linux-block, Christoph Hellwig, Kees Cook,
syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov
On 11/1/23 11:43 AM, Jan Kara wrote:
> Hello!
>
> This is the third version of the patches to add config option to not allow
> writing to mounted block devices. The new API for block device opening has been
> merged so hopefully this patchset can progress towards being merged. We face
> some issues with necessary btrfs changes (review bandwidth) so this series is
> modified to enable restricting of writes for all other filesystems. Once btrfs
> can merge necessary device scanning changes, enabling the support for
> restricting writes for it is trivial.
>
> For motivation why restricting writes to mounted block devices is interesting
> see patch 3/7. I've been testing the patches more extensively and I've found
> couple of things that get broken by disallowing writes to mounted block
> devices:
>
> 1) "mount -o loop" gets broken because util-linux keeps the loop device open
> read-write when attempting to mount it. Hopefully fixable within util-linux.
> 2) resize2fs online resizing gets broken because it tries to open the block
> device read-write only to call resizing ioctl. Trivial to fix within
> e2fsprogs.
> 3) Online e2label will break because it directly writes to the ext2/3/4
> superblock while the FS is mounted to set the new label. Ext4 driver
> will have to implement the SETFSLABEL ioctl() and e2label will have
> to use it, matching what happens for online labelling of btrfs and
> xfs.
>
> Likely there will be other breakage I didn't find yet but overall the breakage
> looks minor enough that the option might be useful. Definitely good enough
> for syzbot fuzzing and likely good enough for hardening of systems with
> more tightened security.
For the series:
Reviewed-by: Jens Axboe <axboe@kernel.dk>
--
Jens Axboe
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/7] block: Add config option to not allow writing to mounted devices
2023-11-01 17:43 ` [PATCH 3/7] block: Add config option to not allow writing to mounted devices Jan Kara
2023-11-06 14:47 ` Christian Brauner
@ 2023-12-20 3:26 ` Li Lingfeng
2023-12-21 12:11 ` Jan Kara
1 sibling, 1 reply; 23+ messages in thread
From: Li Lingfeng @ 2023-12-20 3:26 UTC (permalink / raw)
To: Jan Kara, Christian Brauner
Cc: linux-fsdevel, linux-block, Jens Axboe, Christoph Hellwig,
Kees Cook, syzkaller, Alexander Popov, linux-xfs, Dmitry Vyukov,
yangerkun, yukuai (C), zhangyi (F)
在 2023/11/2 1:43, Jan Kara 写道:
> Writing to mounted devices is dangerous and can lead to filesystem
> corruption as well as crashes. Furthermore syzbot comes with more and
> more involved examples how to corrupt block device under a mounted
> filesystem leading to kernel crashes and reports we can do nothing
> about. Add tracking of writers to each block device and a kernel cmdline
> argument which controls whether other writeable opens to block devices
> open with BLK_OPEN_RESTRICT_WRITES flag are allowed. We will make
> filesystems use this flag for used devices.
>
> Note that this effectively only prevents modification of the particular
> block device's page cache by other writers. The actual device content
> can still be modified by other means - e.g. by issuing direct scsi
> commands, by doing writes through devices lower in the storage stack
> (e.g. in case loop devices, DM, or MD are involved) etc. But blocking
> direct modifications of the block device page cache is enough to give
> filesystems a chance to perform data validation when loading data from
> the underlying storage and thus prevent kernel crashes.
>
> Syzbot can use this cmdline argument option to avoid uninteresting
> crashes. Also users whose userspace setup does not need writing to
> mounted block devices can set this option for hardening.
>
> Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> block/Kconfig | 20 +++++++++++++
> block/bdev.c | 62 ++++++++++++++++++++++++++++++++++++++-
> include/linux/blk_types.h | 1 +
> include/linux/blkdev.h | 2 ++
> 4 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index f1364d1c0d93..ca04b657e058 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -78,6 +78,26 @@ config BLK_DEV_INTEGRITY_T10
> select CRC_T10DIF
> select CRC64_ROCKSOFT
>
> +config BLK_DEV_WRITE_MOUNTED
> + bool "Allow writing to mounted block devices"
> + default y
> + help
> + When a block device is mounted, writing to its buffer cache is very
> + likely going to cause filesystem corruption. It is also rather easy to
> + crash the kernel in this way since the filesystem has no practical way
> + of detecting these writes to buffer cache and verifying its metadata
> + integrity. However there are some setups that need this capability
> + like running fsck on read-only mounted root device, modifying some
> + features on mounted ext4 filesystem, and similar. If you say N, the
> + kernel will prevent processes from writing to block devices that are
> + mounted by filesystems which provides some more protection from runaway
> + privileged processes and generally makes it much harder to crash
> + filesystem drivers. Note however that this does not prevent
> + underlying device(s) from being modified by other means, e.g. by
> + directly submitting SCSI commands or through access to lower layers of
> + storage stack. If in doubt, say Y. The configuration can be overridden
> + with the bdev_allow_write_mounted boot option.
> +
> config BLK_DEV_ZONED
> bool "Zoned block device support"
> select MQ_IOSCHED_DEADLINE
> diff --git a/block/bdev.c b/block/bdev.c
> index 3f27939e02c6..d75dd7dd2b31 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -30,6 +30,9 @@
> #include "../fs/internal.h"
> #include "blk.h"
>
> +/* Should we allow writing to mounted block devices? */
> +static bool bdev_allow_write_mounted = IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED);
> +
> struct bdev_inode {
> struct block_device bdev;
> struct inode vfs_inode;
> @@ -730,7 +733,34 @@ void blkdev_put_no_open(struct block_device *bdev)
> {
> put_device(&bdev->bd_device);
> }
> -
> +
> +static bool bdev_writes_blocked(struct block_device *bdev)
> +{
> + return bdev->bd_writers == -1;
> +}
> +
> +static void bdev_block_writes(struct block_device *bdev)
> +{
> + bdev->bd_writers = -1;
> +}
> +
> +static void bdev_unblock_writes(struct block_device *bdev)
> +{
> + bdev->bd_writers = 0;
> +}
> +
> +static bool blkdev_open_compatible(struct block_device *bdev, blk_mode_t mode)
> +{
> + if (!bdev_allow_write_mounted) {
> + /* Writes blocked? */
> + if (mode & BLK_OPEN_WRITE && bdev_writes_blocked(bdev))
> + return false;
> + if (mode & BLK_OPEN_RESTRICT_WRITES && bdev->bd_writers > 0)
> + return false;
> + }
> + return true;
> +}
> +
> /**
> * bdev_open_by_dev - open a block device by device number
> * @dev: device number of block device to open
> @@ -773,6 +803,10 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
> if (ret)
> goto free_handle;
>
> + /* Blocking writes requires exclusive opener */
> + if (mode & BLK_OPEN_RESTRICT_WRITES && !holder)
> + return ERR_PTR(-EINVAL);
> +
> bdev = blkdev_get_no_open(dev);
> if (!bdev) {
> ret = -ENXIO;
> @@ -800,12 +834,21 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
> goto abort_claiming;
> if (!try_module_get(disk->fops->owner))
> goto abort_claiming;
> + ret = -EBUSY;
> + if (!blkdev_open_compatible(bdev, mode))
> + goto abort_claiming;
> if (bdev_is_partition(bdev))
> ret = blkdev_get_part(bdev, mode);
> else
> ret = blkdev_get_whole(bdev, mode);
> if (ret)
> goto put_module;
> + if (!bdev_allow_write_mounted) {
> + if (mode & BLK_OPEN_RESTRICT_WRITES)
> + bdev_block_writes(bdev);
Hi, Jan
When a partition device is mounted, I think maybe it's better to block
writes on the whole device at same time.
Allowing the whole device to be opened for writing when mounting a
partition device, did you have any special considerations before?
Thanks.
> + else if (mode & BLK_OPEN_WRITE)
> + bdev->bd_writers++;
> + }
> if (holder) {
> bd_finish_claiming(bdev, holder, hops);
>
> @@ -901,6 +944,14 @@ void bdev_release(struct bdev_handle *handle)
> sync_blockdev(bdev);
>
> mutex_lock(&disk->open_mutex);
> + if (!bdev_allow_write_mounted) {
> + /* The exclusive opener was blocking writes? Unblock them. */
> + if (handle->mode & BLK_OPEN_RESTRICT_WRITES)
> + bdev_unblock_writes(bdev);
> + else if (handle->mode & BLK_OPEN_WRITE)
> + bdev->bd_writers--;
> + }
> +
> if (handle->holder)
> bd_end_claim(bdev, handle->holder);
>
> @@ -1069,3 +1120,12 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
>
> blkdev_put_no_open(bdev);
> }
> +
> +static int __init setup_bdev_allow_write_mounted(char *str)
> +{
> + if (kstrtobool(str, &bdev_allow_write_mounted))
> + pr_warn("Invalid option string for bdev_allow_write_mounted:"
> + " '%s'\n", str);
> + return 1;
> +}
> +__setup("bdev_allow_write_mounted=", setup_bdev_allow_write_mounted);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 749203277fee..52e264d5a830 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -66,6 +66,7 @@ struct block_device {
> #ifdef CONFIG_FAIL_MAKE_REQUEST
> bool bd_make_it_fail;
> #endif
> + int bd_writers;
> /*
> * keep this out-of-line as it's both big and not needed in the fast
> * path
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 7afc10315dd5..0e0c0186aa32 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -124,6 +124,8 @@ typedef unsigned int __bitwise blk_mode_t;
> #define BLK_OPEN_NDELAY ((__force blk_mode_t)(1 << 3))
> /* open for "writes" only for ioctls (specialy hack for floppy.c) */
> #define BLK_OPEN_WRITE_IOCTL ((__force blk_mode_t)(1 << 4))
> +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> +#define BLK_OPEN_RESTRICT_WRITES ((__force blk_mode_t)(1 << 5))
>
> struct gendisk {
> /*
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 3/7] block: Add config option to not allow writing to mounted devices
2023-12-20 3:26 ` Li Lingfeng
@ 2023-12-21 12:11 ` Jan Kara
0 siblings, 0 replies; 23+ messages in thread
From: Jan Kara @ 2023-12-21 12:11 UTC (permalink / raw)
To: Li Lingfeng
Cc: Jan Kara, Christian Brauner, linux-fsdevel, linux-block,
Jens Axboe, Christoph Hellwig, Kees Cook, syzkaller,
Alexander Popov, linux-xfs, Dmitry Vyukov, yangerkun, yukuai (C),
zhangyi (F)
On Wed 20-12-23 11:26:38, Li Lingfeng wrote:
> > @@ -773,6 +803,10 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
> > if (ret)
> > goto free_handle;
> > + /* Blocking writes requires exclusive opener */
> > + if (mode & BLK_OPEN_RESTRICT_WRITES && !holder)
> > + return ERR_PTR(-EINVAL);
> > +
> > bdev = blkdev_get_no_open(dev);
> > if (!bdev) {
> > ret = -ENXIO;
> > @@ -800,12 +834,21 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
> > goto abort_claiming;
> > if (!try_module_get(disk->fops->owner))
> > goto abort_claiming;
> > + ret = -EBUSY;
> > + if (!blkdev_open_compatible(bdev, mode))
> > + goto abort_claiming;
> > if (bdev_is_partition(bdev))
> > ret = blkdev_get_part(bdev, mode);
> > else
> > ret = blkdev_get_whole(bdev, mode);
> > if (ret)
> > goto put_module;
> > + if (!bdev_allow_write_mounted) {
> > + if (mode & BLK_OPEN_RESTRICT_WRITES)
> > + bdev_block_writes(bdev);
>
> When a partition device is mounted, I think maybe it's better to block
> writes on the whole device at same time.
>
> Allowing the whole device to be opened for writing when mounting a partition
> device, did you have any special considerations before?
Yes, we were considering this. But the truth is that:
a) It is *very* hard to stop all the possibilities of corrupting data on
the device - e.g. with device mapper / loop device / malicious partition
table you can construct many block devices pointing to the same storage,
you can use e.g. SG_IO to corrupt on disk data etc. So special-casing
partitions is providing little additional benefit.
b) It is difficult to then correctly handle valid cases of multiple
writeably mounted partitions on the same device - you'd need to track used
block numbers for each device which gets difficult in presence of device
mapper etc.
c) To stop filesystem crashes, it is enough to stop modifications of buffer
cache of that one block device. Because filesystems have to validate data
they are loading into buffer cache anyway to handle faulty device, fs
corruption etc.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2023-12-21 12:11 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-01 17:43 [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jan Kara
2023-11-01 17:43 ` [PATCH 1/7] bcachefs: Convert to bdev_open_by_path() Jan Kara
2023-11-01 19:01 ` Brian Foster
2023-11-02 1:09 ` Kent Overstreet
2023-11-02 9:55 ` Jan Kara
2023-11-02 11:58 ` Brian Foster
2023-11-02 1:09 ` Kent Overstreet
2023-11-07 9:28 ` Christian Brauner
2023-11-01 17:43 ` [PATCH 2/7] block: Remove blkdev_get_by_*() functions Jan Kara
2023-11-06 14:10 ` Christian Brauner
2023-11-01 17:43 ` [PATCH 3/7] block: Add config option to not allow writing to mounted devices Jan Kara
2023-11-06 14:47 ` Christian Brauner
2023-11-06 15:18 ` Jan Kara
2023-11-06 15:57 ` Christian Brauner
2023-12-20 3:26 ` Li Lingfeng
2023-12-21 12:11 ` Jan Kara
2023-11-01 17:43 ` [PATCH 4/7] btrfs: Do not restrict writes to btrfs devices Jan Kara
2023-11-02 17:13 ` David Sterba
2023-11-01 17:43 ` [PATCH 5/7] fs: Block writes to mounted block devices Jan Kara
2023-11-06 14:32 ` Christian Brauner
2023-11-01 17:43 ` [PATCH 6/7] xfs: Block writes to log device Jan Kara
2023-11-01 17:43 ` [PATCH 7/7] ext4: Block writes to journal device Jan Kara
2023-11-07 15:32 ` [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).