linux-fsdevel.vger.kernel.org archive mirror
From: Li Lingfeng <lilingfeng@huaweicloud.com>
To: Jan Kara <jack@suse.cz>, Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	Kees Cook <keescook@google.com>,
	syzkaller <syzkaller@googlegroups.com>,
	Alexander Popov <alex.popov@linux.com>,
	linux-xfs@vger.kernel.org, Dmitry Vyukov <dvyukov@google.com>,
	yangerkun <yangerkun@huawei.com>,
	"yukuai (C)" <yukuai3@huawei.com>,
	"zhangyi (F)" <yi.zhang@huawei.com>
Subject: Re: [PATCH 3/7] block: Add config option to not allow writing to mounted devices
Date: Wed, 20 Dec 2023 11:26:38 +0800
Message-ID: <64fdffaa-9a8f-df34-42e7-ccca81e95c3c@huaweicloud.com>
In-Reply-To: <20231101174325.10596-3-jack@suse.cz>


On 2023/11/2 1:43, Jan Kara wrote:
> Writing to mounted devices is dangerous and can lead to filesystem
> corruption as well as crashes. Furthermore, syzbot keeps coming up with
> more and more involved examples of how to corrupt a block device under a
> mounted filesystem, leading to kernel crashes and reports we can do
> nothing about. Add tracking of writers to each block device and a kernel
> cmdline argument which controls whether other writeable opens of block
> devices opened with the BLK_OPEN_RESTRICT_WRITES flag are allowed. We
> will make filesystems use this flag for the devices they use.
>
> Note that this effectively only prevents modification of the particular
> block device's page cache by other writers. The actual device content
> can still be modified by other means - e.g. by issuing direct scsi
> commands, by doing writes through devices lower in the storage stack
> (e.g. in case loop devices, DM, or MD are involved) etc. But blocking
> direct modifications of the block device page cache is enough to give
> filesystems a chance to perform data validation when loading data from
> the underlying storage and thus prevent kernel crashes.
>
> Syzbot can use this cmdline option to avoid uninteresting crashes.
> Also, users whose userspace setup does not need writing to mounted
> block devices can set this option for hardening.
>
> Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>   block/Kconfig             | 20 +++++++++++++
>   block/bdev.c              | 62 ++++++++++++++++++++++++++++++++++++++-
>   include/linux/blk_types.h |  1 +
>   include/linux/blkdev.h    |  2 ++
>   4 files changed, 84 insertions(+), 1 deletion(-)
>
> diff --git a/block/Kconfig b/block/Kconfig
> index f1364d1c0d93..ca04b657e058 100644
> --- a/block/Kconfig
> +++ b/block/Kconfig
> @@ -78,6 +78,26 @@ config BLK_DEV_INTEGRITY_T10
>   	select CRC_T10DIF
>   	select CRC64_ROCKSOFT
>   
> +config BLK_DEV_WRITE_MOUNTED
> +	bool "Allow writing to mounted block devices"
> +	default y
> +	help
> +	When a block device is mounted, writing to its buffer cache is very
> +	likely going to cause filesystem corruption. It is also rather easy to
> +	crash the kernel in this way since the filesystem has no practical way
> +	of detecting these writes to buffer cache and verifying its metadata
> +	integrity. However there are some setups that need this capability
> +	like running fsck on read-only mounted root device, modifying some
> +	features on mounted ext4 filesystem, and similar. If you say N, the
> +	kernel will prevent processes from writing to block devices that are
> +	mounted by filesystems which provides some more protection from runaway
> +	privileged processes and generally makes it much harder to crash
> +	filesystem drivers. Note however that this does not prevent
> +	underlying device(s) from being modified by other means, e.g. by
> +	directly submitting SCSI commands or through access to lower layers of
> +	storage stack. If in doubt, say Y. The configuration can be overridden
> +	with the bdev_allow_write_mounted boot option.
> +
>   config BLK_DEV_ZONED
>   	bool "Zoned block device support"
>   	select MQ_IOSCHED_DEADLINE
> diff --git a/block/bdev.c b/block/bdev.c
> index 3f27939e02c6..d75dd7dd2b31 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -30,6 +30,9 @@
>   #include "../fs/internal.h"
>   #include "blk.h"
>   
> +/* Should we allow writing to mounted block devices? */
> +static bool bdev_allow_write_mounted = IS_ENABLED(CONFIG_BLK_DEV_WRITE_MOUNTED);
> +
>   struct bdev_inode {
>   	struct block_device bdev;
>   	struct inode vfs_inode;
> @@ -730,7 +733,34 @@ void blkdev_put_no_open(struct block_device *bdev)
>   {
>   	put_device(&bdev->bd_device);
>   }
> -	
> +
> +static bool bdev_writes_blocked(struct block_device *bdev)
> +{
> +	return bdev->bd_writers == -1;
> +}
> +
> +static void bdev_block_writes(struct block_device *bdev)
> +{
> +	bdev->bd_writers = -1;
> +}
> +
> +static void bdev_unblock_writes(struct block_device *bdev)
> +{
> +	bdev->bd_writers = 0;
> +}
> +
> +static bool blkdev_open_compatible(struct block_device *bdev, blk_mode_t mode)
> +{
> +	if (!bdev_allow_write_mounted) {
> +		/* Writes blocked? */
> +		if (mode & BLK_OPEN_WRITE && bdev_writes_blocked(bdev))
> +			return false;
> +		if (mode & BLK_OPEN_RESTRICT_WRITES && bdev->bd_writers > 0)
> +			return false;
> +	}
> +	return true;
> +}
> +
>   /**
>    * bdev_open_by_dev - open a block device by device number
>    * @dev: device number of block device to open
> @@ -773,6 +803,10 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
>   	if (ret)
>   		goto free_handle;
>   
> +	/* Blocking writes requires exclusive opener */
> +	if (mode & BLK_OPEN_RESTRICT_WRITES && !holder)
> +		return ERR_PTR(-EINVAL);
> +
>   	bdev = blkdev_get_no_open(dev);
>   	if (!bdev) {
>   		ret = -ENXIO;
> @@ -800,12 +834,21 @@ struct bdev_handle *bdev_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
>   		goto abort_claiming;
>   	if (!try_module_get(disk->fops->owner))
>   		goto abort_claiming;
> +	ret = -EBUSY;
> +	if (!blkdev_open_compatible(bdev, mode))
> +		goto abort_claiming;
>   	if (bdev_is_partition(bdev))
>   		ret = blkdev_get_part(bdev, mode);
>   	else
>   		ret = blkdev_get_whole(bdev, mode);
>   	if (ret)
>   		goto put_module;
> +	if (!bdev_allow_write_mounted) {
> +		if (mode & BLK_OPEN_RESTRICT_WRITES)
> +			bdev_block_writes(bdev);

Hi Jan,

When a partition is mounted, I think it would be better to also block
writes to the whole device at the same time.

Was there a particular reason for still allowing the whole device to be
opened for writing while one of its partitions is mounted?
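
To make the question concrete, here is a minimal userspace sketch (plain
C, not kernel code) of the bd_writers accounting in this hunk. It
assumes that a partition and its whole disk are separate struct
block_device instances, each with its own bd_writers counter, and that a
mount opens the partition with BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES
(the latter comes from a later patch in the series and is only an
assumption here). All model_* names are made up for illustration, and
the BLK_OPEN_WRITE value is my reading of blkdev.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel flags; values assumed to match blkdev.h. */
#define BLK_OPEN_WRITE           (1u << 1)
#define BLK_OPEN_RESTRICT_WRITES (1u << 5)

/* Hypothetical reduced block_device: only the writer accounting. */
struct model_bdev {
	int bd_writers;	/* -1: writes blocked, >0: number of writers */
};

/* Behaves as if booted with bdev_allow_write_mounted=0. */
static bool model_allow_write_mounted = false;

/* Same checks as blkdev_open_compatible() in the quoted patch. */
static bool model_open_compatible(const struct model_bdev *bdev,
				  unsigned int mode)
{
	if (!model_allow_write_mounted) {
		if ((mode & BLK_OPEN_WRITE) && bdev->bd_writers == -1)
			return false;
		if ((mode & BLK_OPEN_RESTRICT_WRITES) && bdev->bd_writers > 0)
			return false;
	}
	return true;
}

/* Returns 0 on success, -1 where the patch would return -EBUSY. */
static int model_open(struct model_bdev *bdev, unsigned int mode)
{
	if (!model_open_compatible(bdev, mode))
		return -1;
	if (!model_allow_write_mounted) {
		if (mode & BLK_OPEN_RESTRICT_WRITES)
			bdev->bd_writers = -1;
		else if (mode & BLK_OPEN_WRITE)
			bdev->bd_writers++;
	}
	return 0;
}

/* Mirrors the accounting added to bdev_release(). */
static void model_release(struct model_bdev *bdev, unsigned int mode)
{
	if (model_allow_write_mounted)
		return;
	if (mode & BLK_OPEN_RESTRICT_WRITES) {
		bdev->bd_writers = 0;
	} else if (mode & BLK_OPEN_WRITE) {
		assert(bdev->bd_writers > 0);
		bdev->bd_writers--;
	}
}
```

In this model, "mounting" a partition object makes a later
model_open(&part, BLK_OPEN_WRITE) fail, but model_open(&whole,
BLK_OPEN_WRITE) on a separate whole-disk object still succeeds, because
only the counter of the device that was actually opened is touched.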

Thanks.

> +		else if (mode & BLK_OPEN_WRITE)
> +			bdev->bd_writers++;
> +	}
>   	if (holder) {
>   		bd_finish_claiming(bdev, holder, hops);
>   
> @@ -901,6 +944,14 @@ void bdev_release(struct bdev_handle *handle)
>   		sync_blockdev(bdev);
>   
>   	mutex_lock(&disk->open_mutex);
> +	if (!bdev_allow_write_mounted) {
> +		/* The exclusive opener was blocking writes? Unblock them. */
> +		if (handle->mode & BLK_OPEN_RESTRICT_WRITES)
> +			bdev_unblock_writes(bdev);
> +		else if (handle->mode & BLK_OPEN_WRITE)
> +			bdev->bd_writers--;
> +	}
> +
>   	if (handle->holder)
>   		bd_end_claim(bdev, handle->holder);
>   
> @@ -1069,3 +1120,12 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
>   
>   	blkdev_put_no_open(bdev);
>   }
> +
> +static int __init setup_bdev_allow_write_mounted(char *str)
> +{
> +	if (kstrtobool(str, &bdev_allow_write_mounted))
> +		pr_warn("Invalid option string for bdev_allow_write_mounted:"
> +			" '%s'\n", str);
> +	return 1;
> +}
> +__setup("bdev_allow_write_mounted=", setup_bdev_allow_write_mounted);
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 749203277fee..52e264d5a830 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -66,6 +66,7 @@ struct block_device {
>   #ifdef CONFIG_FAIL_MAKE_REQUEST
>   	bool			bd_make_it_fail;
>   #endif
> +	int			bd_writers;
>   	/*
>   	 * keep this out-of-line as it's both big and not needed in the fast
>   	 * path
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 7afc10315dd5..0e0c0186aa32 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -124,6 +124,8 @@ typedef unsigned int __bitwise blk_mode_t;
>   #define BLK_OPEN_NDELAY		((__force blk_mode_t)(1 << 3))
>   /* open for "writes" only for ioctls (specialy hack for floppy.c) */
>   #define BLK_OPEN_WRITE_IOCTL	((__force blk_mode_t)(1 << 4))
> +/* open is exclusive wrt all other BLK_OPEN_WRITE opens to the device */
> +#define BLK_OPEN_RESTRICT_WRITES	((__force blk_mode_t)(1 << 5))
>   
>   struct gendisk {
>   	/*
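
For reference, the bdev_allow_write_mounted= boot option added at the
end of the bdev.c hunk is parsed with kstrtobool(). The following is a
rough userspace approximation of which strings that accepts, written
from my understanding of lib/kstrtox.c, so please treat the kernel
source as authoritative. The model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Approximation of kstrtobool(): 0 on success, -1 on unrecognised input. */
static int model_strtobool(const char *s, bool *res)
{
	if (!s)
		return -1;

	switch (s[0]) {
	case 'y': case 'Y': case 't': case 'T': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case 'f': case 'F': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		/* "on" / "off": decided by the second character */
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	default:
		break;
	}
	return -1;
}

/*
 * Mirrors setup_bdev_allow_write_mounted(): on bad input, warn and leave
 * the previous value (the Kconfig default) untouched. Always returns 1,
 * as the __setup handler in the patch does.
 */
static int model_setup(const char *str, bool *flag)
{
	assert(flag != NULL);
	if (model_strtobool(str, flag))
		fprintf(stderr,
			"Invalid option string for bdev_allow_write_mounted: '%s'\n",
			str ? str : "(null)");
	return 1;
}
```

Under these assumptions, booting with bdev_allow_write_mounted=0 (or
"n", "off") enables the write blocking, while an unrecognised value
only produces the warning and keeps the compiled-in default.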



Thread overview: 23+ messages
2023-11-01 17:43 [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jan Kara
2023-11-01 17:43 ` [PATCH 1/7] bcachefs: Convert to bdev_open_by_path() Jan Kara
2023-11-01 19:01   ` Brian Foster
2023-11-02  1:09     ` Kent Overstreet
2023-11-02  9:55     ` Jan Kara
2023-11-02 11:58       ` Brian Foster
2023-11-02  1:09   ` Kent Overstreet
2023-11-07  9:28   ` Christian Brauner
2023-11-01 17:43 ` [PATCH 2/7] block: Remove blkdev_get_by_*() functions Jan Kara
2023-11-06 14:10   ` Christian Brauner
2023-11-01 17:43 ` [PATCH 3/7] block: Add config option to not allow writing to mounted devices Jan Kara
2023-11-06 14:47   ` Christian Brauner
2023-11-06 15:18     ` Jan Kara
2023-11-06 15:57       ` Christian Brauner
2023-12-20  3:26   ` Li Lingfeng [this message]
2023-12-21 12:11     ` Jan Kara
2023-11-01 17:43 ` [PATCH 4/7] btrfs: Do not restrict writes to btrfs devices Jan Kara
2023-11-02 17:13   ` David Sterba
2023-11-01 17:43 ` [PATCH 5/7] fs: Block writes to mounted block devices Jan Kara
2023-11-06 14:32   ` Christian Brauner
2023-11-01 17:43 ` [PATCH 6/7] xfs: Block writes to log device Jan Kara
2023-11-01 17:43 ` [PATCH 7/7] ext4: Block writes to journal device Jan Kara
2023-11-07 15:32 ` [PATCH 0/7 v3] block: Add config option to not allow writing to mounted devices Jens Axboe
