From: Jan Kara <jack@suse.cz>
To: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>, Jan Kara <jack@suse.cz>,
Jens Axboe <axboe@kernel.dk>,
Matthew Wilcox <willy@infradead.org>,
linux-block@vger.kernel.org
Subject: Re: [PATCH 1/2] block: handle BLK_OPEN_RESTRICT_WRITES correctly
Date: Tue, 26 Mar 2024 13:57:15 +0100 [thread overview]
Message-ID: <20240326125715.i23eo6sh5223tdmc@quack3> (raw)
In-Reply-To: <20240323-zielbereich-mittragen-6fdf14876c3e@brauner>
On Sat 23-03-24 17:11:19, Christian Brauner wrote:
> Last kernel release we introduce CONFIG_BLK_DEV_WRITE_MOUNTED. By
> default this option is set. When it is set the long-standing behavior
> of being able to write to mounted block devices is enabled.
>
> But in order to guard against unintended corruption by writing to the
> block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
> off. In that case it isn't possible to write to mounted block devices
> anymore.
>
> A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
> which disallows concurrent BLK_OPEN_WRITE access. When we still had the
> bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
> the mode was passed around. Since we managed to get rid of the bdev
> handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
> on whether the file was opened writable and writes to that block device
> are blocked. That logic doesn't work because we do allow
> BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.
>
> So fix the detection logic. Use O_EXCL as an indicator that
> BLK_OPEN_RESTRICT_WRITES has been requested. We do the exact same thing
> for pidfds where O_EXCL means that this is a pidfd that refers to a
> thread. For userspace open paths O_EXCL will never be retained but for
> internal opens where we open files that are never installed into a file
> descriptor table this is fine.
>
> Note that BLK_OPEN_RESTRICT_WRITES is an internal only flag that cannot
> directly be raised by userspace. It is implicitly raised during
> mounting.
>
> Passes xftests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
> unset.
>
> Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
> Reported-by: Matthew Wilcox <willy@infradead.org>
> Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@casper.infradead.org
> Signed-off-by: Christian Brauner <brauner@kernel.org>
The fix looks correct but admittedly it looks a bit hacky. I'd prefer
storing the needed information in some other flag, preferably one that does
not already have a special meaning with block devices. But FMODE_ space is
exhausted and don't see another easy solution. So I guess:
Reviewed-by: Jan Kara <jack@suse.cz>
Thanks for looking into this!
Honza
> ---
> block/bdev.c | 20 +++++++++++++-------
> 1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/block/bdev.c b/block/bdev.c
> index 7a5f611c3d2e..f819f3086905 100644
> --- a/block/bdev.c
> +++ b/block/bdev.c
> @@ -821,13 +821,12 @@ static void bdev_yield_write_access(struct file *bdev_file)
> return;
>
> bdev = file_bdev(bdev_file);
> - /* Yield exclusive or shared write access. */
> - if (bdev_file->f_mode & FMODE_WRITE) {
> - if (bdev_writes_blocked(bdev))
> - bdev_unblock_writes(bdev);
> - else
> - bdev->bd_writers--;
> - }
> +
> + /* O_EXCL is only set for internal BLK_OPEN_RESTRICT_WRITES. */
> + if (bdev_file->f_flags & O_EXCL)
> + bdev_unblock_writes(bdev);
> + else if (bdev_file->f_mode & FMODE_WRITE)
> + bdev->bd_writers--;
> }
>
> /**
> @@ -946,6 +945,13 @@ static unsigned blk_to_file_flags(blk_mode_t mode)
> else
> WARN_ON_ONCE(true);
>
> + /*
> + * BLK_OPEN_RESTRICT_WRITES is never set from userspace and
> + * O_EXCL is stripped from userspace.
> + */
> + if (mode & BLK_OPEN_RESTRICT_WRITES)
> + flags |= O_EXCL;
> +
> if (mode & BLK_OPEN_NDELAY)
> flags |= O_NDELAY;
>
> --
> 2.43.0
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2024-03-26 12:57 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-23 14:54 [PATCH] block: handle BLK_OPEN_RESTRICT_WRITES correctly Christian Brauner
2024-03-23 15:59 ` Christian Brauner
2024-03-23 16:11 ` [PATCH 1/2] " Christian Brauner
2024-03-23 16:11 ` [PATCH 2/2] [RFC]: block: count BLK_OPEN_RESTRICT_WRITES openers Christian Brauner
2024-03-26 13:24 ` Jan Kara
2024-03-25 11:51 ` [PATCH 1/2] block: handle BLK_OPEN_RESTRICT_WRITES correctly Yu Kuai
2024-03-25 12:04 ` Christian Brauner
2024-03-25 13:52 ` Yu Kuai
2024-03-25 13:54 ` Christian Brauner
2024-03-26 1:32 ` Yu Kuai
2024-03-26 12:57 ` Jan Kara [this message]
2024-03-26 13:17 ` Christian Brauner
2024-03-26 13:31 ` Jan Kara
2024-03-26 15:46 ` [PATCH v2] " Christian Brauner
2024-03-26 17:25 ` Christoph Hellwig
2024-03-26 22:42 ` Jan Kara
2024-03-26 15:47 ` [PATCH 1/2] " Christian Brauner
2024-03-27 12:01 ` Christian Brauner
2024-03-29 4:56 ` Matthew Wilcox
2024-03-29 12:10 ` Christian Brauner
2024-03-29 15:11 ` Christian Brauner
2024-03-29 15:24 ` Christian Brauner
2024-04-03 6:04 ` Christian Brauner
2024-04-03 19:22 ` Matthew Wilcox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240326125715.i23eo6sh5223tdmc@quack3 \
--to=jack@suse.cz \
--cc=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox