From: Al Viro <viro@zeniv.linux.org.uk>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
Alasdair Kergon <agk@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Anna Schumaker <anna@kernel.org>, Chao Yu <chao@kernel.org>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
"Darrick J. Wong" <djwong@kernel.org>,
Dave Kleikamp <shaggy@kernel.org>,
David Sterba <dsterba@suse.com>,
dm-devel@redhat.com, drbd-dev@lists.linbit.com,
Gao Xiang <xiang@kernel.org>, Jack Wang <jinpu.wang@ionos.com>,
Jaegeuk Kim <jaegeuk@kernel.org>,
jfs-discussion@lists.sourceforge.net,
Joern Engel <joern@lazybastard.org>,
Joseph Qi <joseph.qi@linux.alibaba.com>,
Kent Overstreet <kent.overstreet@gmail.com>,
linux-bcache@vger.kernel.org, linux-btrfs@vger.kernel.org,
linux-erofs@lists.ozlabs.org, linux-ext4@vger.kernel.org,
linux-f2fs-devel@lists.sourceforge.net, linux-mm@kvack.org,
linux-mtd@lists.infradead.org, linux-nfs@vger.kernel.org,
linux-nilfs@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-pm@vger.kernel.org, linux-raid@vger.kernel.org,
linux-s390@vger.kernel.org, linux-scsi@vger.kernel.org,
linux-xfs@vger.kernel.org,
"Md. Haris Iqbal" <haris.iqbal@ionos.com>,
Mike Snitzer <snitzer@kernel.org>,
Minchan Kim <minchan@kernel.org>,
ocfs2-devel@oss.oracle.com, reiserfs-devel@vger.kernel.org,
Sergey Senozhatsky <senozhatsky@chromium.org>,
Song Liu <song@kernel.org>, Sven Schnelle <svens@linux.ibm.com>,
target-devel@vger.kernel.org, Ted Tso <tytso@mit.edu>,
Trond Myklebust <trond.myklebust@hammerspace.com>,
xen-devel@lists.xenproject.org, Jens Axboe <axboe@kernel.dk>,
Christian Brauner <brauner@kernel.org>
Subject: Re: [PATCH v2 0/29] block: Make blkdev_get_by_*() return handle
Date: Sat, 26 Aug 2023 03:28:52 +0100
Message-ID: <20230826022852.GO3390869@ZenIV>
In-Reply-To: <20230825134756.o3wpq6bogndukn53@quack3>
On Fri, Aug 25, 2023 at 03:47:56PM +0200, Jan Kara wrote:
> I can see the appeal of not having to introduce the new bdev_handle type
> and just using struct file which unifies in-kernel and userspace block
> device opens. But I can see downsides too - the last fput() happening from
> task work makes me a bit nervous whether it will not break something
> somewhere with exclusive bdev opens. Getting from struct file to bdev is
> somewhat harder but I guess a helper like F_BDEV() would solve that just
> fine.
>
> So besides my last fput() worry about I think this could work and would be
> probably a bit nicer than what I have. But before going and redoing the whole
> series let me gather some more feedback so that we don't go back and forth.
> Christoph, Christian, Jens, any opinion?

Redoing is not an issue - it can be done on top of your series just
as well. Async behaviour of fput() might be, but... need to look
through the actual users; for a lot of them it's perfectly fine.

FWIW, from a cursory look there appears to be a missing primitive: take
an opened bdev (or bdev_handle, with your variant, or opened file if we
go that way eventually) and claim it.

I mean, look at claim_swapfile() for example:
	p->bdev = blkdev_get_by_dev(inode->i_rdev,
			   FMODE_READ | FMODE_WRITE | FMODE_EXCL, p);
	if (IS_ERR(p->bdev)) {
		error = PTR_ERR(p->bdev);
		p->bdev = NULL;
		return error;
	}
	p->old_block_size = block_size(p->bdev);
	error = set_blocksize(p->bdev, PAGE_SIZE);
	if (error < 0)
		return error;
we already have the file opened, and we keep it opened all the way until
the swapoff(2); here we have noticed that it's a block device and we
* open the fucker again (by device number), this time claiming
it with our swap_info_struct as holder, to be closed at swapoff(2) time
(just before we close the file)
* flip the block size to PAGE_SIZE, to be reverted at swapoff(2) time

That really looks like it ought to be
* take the opened file, see that it's a block device
* try to claim it with that holder
* on success, flip the block size
with filp_close() in the swapoff(2) (or failure exit path in swapon(2))
doing what it would've done for an O_EXCL opened block device.

The only difference from O_EXCL userland open is that here we would
end up with holder pointing not to struct file in question, but to our
swap_info_struct. It will do the right thing.

This extra open is entirely due to "well, we need to claim it and the
primitive that does that happens to be tied to opening"; feels rather
counter-intuitive.
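
To make the proposed sequence concrete, here is a rough userspace toy model in C. It is not kernel code: toy_bdev, toy_file, toy_swap_info, toy_claim(), toy_unclaim(), toy_swapon_claim() and toy_swapoff_release() are all names invented for this sketch, and it ignores locking, partitions, whole-disk holders and the fact that the real set_blocksize() can fail.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are real kernel types. */
struct toy_bdev {
	void *holder;		/* NULL when nobody has claimed the device */
	int holders;		/* number of claims taken by that holder */
	unsigned block_size;
};

struct toy_file {
	bool is_blk;		/* models "this file is a block device" */
	struct toy_bdev *bdev;	/* meaningful only if is_blk */
};

struct toy_swap_info {
	struct toy_bdev *bdev;
	unsigned old_block_size;
};

/* Hypothetical "claim an already opened device" primitive. */
static int toy_claim(struct toy_bdev *bdev, void *holder)
{
	assert(holder != NULL);
	if (bdev->holder && bdev->holder != holder)
		return -EBUSY;	/* somebody else owns it */
	bdev->holder = holder;
	bdev->holders++;
	return 0;
}

/* Hypothetical counterpart: drop one claim taken with toy_claim(). */
static void toy_unclaim(struct toy_bdev *bdev, void *holder)
{
	assert(bdev->holder == holder && bdev->holders > 0);
	if (--bdev->holders == 0)
		bdev->holder = NULL;
}

/* The three steps: look at the file, claim it, flip the block size. */
static int toy_swapon_claim(struct toy_swap_info *p, struct toy_file *f,
			    unsigned page_size)
{
	int error;

	if (!f->is_blk)
		return 0;		/* regular file, nothing to claim */
	error = toy_claim(f->bdev, p);	/* the holder is the swap info */
	if (error)
		return error;
	p->bdev = f->bdev;
	p->old_block_size = f->bdev->block_size;
	f->bdev->block_size = page_size;
	return 0;
}

/* What closing the file at swapoff (or on the failure exit) would undo. */
static void toy_swapoff_release(struct toy_swap_info *p)
{
	if (!p->bdev)
		return;
	p->bdev->block_size = p->old_block_size;
	toy_unclaim(p->bdev, p);
	p->bdev = NULL;
}
```

The point of the model is only that no second open appears anywhere: the holder is the swap info, not the file, and a claim attempt by any other holder fails with -EBUSY until the release.
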
For that matter, we could add an explicit "unclaim" primitive - might
be easier to follow. That would add another example where that could
be used - in blkdev_bszset() we have an opened block device (it's an
ioctl, after all), we want to change block size and we *really* don't
want to have that happen under a mounted filesystem. So if it's not
opened exclusive, we do a temporary exclusive open of our own and act on
that instead. Might as well go for a temporary claim...
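
A "temporary claim" version of that ioctl path could look roughly like the toy model below. Again this is userspace C with invented names (toy_bdev, toy_file, toy_claim(), toy_unclaim(), toy_bszset()), not the actual blkdev_bszset(); it assumes a claim primitive that does not exist today and leaves out all locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_bdev {
	void *holder;		/* current exclusive holder, or NULL */
	unsigned block_size;
};

struct toy_file {
	bool excl;		/* models an O_EXCL open of the device */
	struct toy_bdev *bdev;
};

/* Hypothetical claim primitive; no nesting in this simplified model. */
static int toy_claim(struct toy_bdev *bdev, void *holder)
{
	if (bdev->holder)
		return -EBUSY;
	bdev->holder = holder;
	return 0;
}

/* Hypothetical explicit "unclaim". */
static void toy_unclaim(struct toy_bdev *bdev, void *holder)
{
	assert(bdev->holder == holder);
	bdev->holder = NULL;
}

static int toy_bszset(struct toy_file *f, unsigned n)
{
	int cookie;		/* its address is unique to this call */
	int error;

	if (f->excl) {
		/* Already held through this file; act on it directly. */
		f->bdev->block_size = n;
		return 0;
	}
	/* Temporary claim instead of a temporary exclusive open. */
	error = toy_claim(f->bdev, &cookie);
	if (error)
		return error;	/* e.g. held by a mounted filesystem */
	f->bdev->block_size = n;
	toy_unclaim(f->bdev, &cookie);
	return 0;
}
```

With a device that something else has claimed (a mounted filesystem, say), toy_bszset() on a non-exclusive file returns -EBUSY and leaves the block size alone, which is the behaviour the temporary exclusive open is there to provide.
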
BTW, what happens if two threads call ioctl(fd, BLKBSZSET, &n)
for the same descriptor that happens to have been opened O_EXCL?
Without O_EXCL they would've been unable to claim the sucker at the same
time - the holder we are using is the address of a function argument,
i.e. something that points to kernel stack of the caller. Those would
conflict and we either get set_blocksize() calls fully serialized, or
one of the callers would eat -EBUSY. Not so in "opened with O_EXCL"
case - they can very well overlap and IIRC set_blocksize() does *not*
expect that kind of crap... It's all under CAP_SYS_ADMIN, so it's not
as if it was a meaningful security hole anyway, but it does look fishy.
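
The difference between the two cases can be shown with a small toy model. This is a single-threaded userspace C sketch with invented names (toy_bdev, toy_bszset_enter(), toy_bszset_leave()); two interleaved enter/leave pairs stand in for the two threads, and only the -EBUSY outcome of a conflicting claim is modelled, not the "wait and serialize" one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_bdev {
	void *holder;		/* current exclusive holder, or NULL */
	int active;		/* callers currently "inside" set_blocksize() */
	int max_active;		/* highest overlap seen so far */
};

/* Models the start of the ioctl, up to the set_blocksize() call. */
static int toy_bszset_enter(struct toy_bdev *bdev, bool opened_excl,
			    void *stack_holder)
{
	if (!opened_excl) {
		/* The holder is a per-caller stack address, so callers conflict. */
		if (bdev->holder)
			return -EBUSY;
		bdev->holder = stack_holder;
	}
	/* With O_EXCL the file already holds the device: nothing stops us. */
	bdev->active++;
	if (bdev->active > bdev->max_active)
		bdev->max_active = bdev->active;
	return 0;
}

/* Models the end of the ioctl, dropping the temporary claim if any. */
static void toy_bszset_leave(struct toy_bdev *bdev, bool opened_excl,
			     void *stack_holder)
{
	assert(bdev->active > 0);
	bdev->active--;
	if (!opened_excl) {
		assert(bdev->holder == stack_holder);
		bdev->holder = NULL;
	}
}
```

In this model two non-exclusive callers never get past max_active == 1, while two callers going through the same O_EXCL descriptor both enter and max_active reaches 2.
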