From: Demi Marie Obenour <demi@invisiblethingslab.com>
To: James Bottomley <James.Bottomley@hansenpartnership.com>,
    Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
    Linux Block Mailing List <linux-block@vger.kernel.org>,
    Linux Filesystem Mailing List <linux-fsdevel@vger.kernel.org>
Subject: Re: Race-free block device opening
Date: Sat, 7 May 2022 07:35:45 -0400
Message-ID: <YnZZlR7BV/cyn8xS@itl-email>
In-Reply-To: <d134571f381868e1cec74aca905012d8aec9fec8.camel@HansenPartnership.com>
On Wed, Apr 27, 2022 at 09:29:12AM -0400, James Bottomley wrote:
> On Tue, 2022-04-26 at 14:12 -0400, Demi Marie Obenour wrote:
> > Right now, opening block devices in a race-free way is incredibly
> > hard.
>
> Could you be more specific about what the race you're having problems
> with is? What is racing?

If I open /dev/mapper/qubes_dom0-vm--sys--net--private, it is possible
that something has destroyed the corresponding device and created a new
one with the same kernel name, *before* udev has managed to unlink the
device node. As a result, I wind up opening the wrong device.

> > The only reasonable approach I know of is sd_device_new_from_path() +
> > sd_device_open(), and it is only available in systemd git main. It also
> > requires waiting on systemd-udev to have processed udev rules, which
> > can be a bottleneck.
>
> This doesn't actually seem to be in my copy of systemd.
That’s because it is not in any release yet.
> > There are better approaches in various special cases, such as using
> > device-mapper ioctls to check that the device one has opened still
> > has the name and/or UUID one expects. However, none of them works
> > for a plain call to open(2).
>
> Just so we're clear: if you call open on, say /dev/sdb1 and something
> happens to hot unplug and then replug a different device under that
> node, the file descriptor you got at open does *not* point to the new
> node. It points to a dead device responder that errors everything.
>
> The point being once you open() something, the file descriptor is
> guaranteed to point to the same device (or error).

That doesn’t help if the unplug and replug happen before open() resolves
the path, while udev has not yet removed the now-stale symlink: the file
descriptor is then bound to the new device from the very start.

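To make the window concrete: the only identity a plain open() caller can cheaply compare is the device number. This is a minimal sketch with a hypothetical helper, open_and_compare_rdev(), not part of any existing tool. It opens a path and compares fstat() on the descriptor with stat() on the path. In the race described above, both calls resolve to the replacement device, so the check passes even though the wrong device was opened. Major:minor numbers are reused, so a matching dev_t proves nothing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` and report in *same whether the
 * descriptor and the path refer to the same device number (st_rdev).
 * Returns the descriptor, or -1 with errno set on failure.
 *
 * Caveat: major:minor numbers are reused, so *same == true does NOT
 * prove that the descriptor refers to the device the caller meant.
 */
static int open_and_compare_rdev(const char *path, bool *same)
{
    struct stat fd_st, path_st;
    int fd;

    assert(path != NULL && same != NULL);
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    /* Both lookups succeed and agree in the race: they see the new device. */
    if (fstat(fd, &fd_st) < 0 || stat(path, &path_st) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    *same = (fd_st.st_rdev == path_st.st_rdev);
    return fd;
}
```
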
> > A much better approach would be for udev to point its symlinks at
> > "/dev/disk/by-diskseq/$DISKSEQ" for non-partition disk devices, or at
> > "/dev/disk/by-diskseq/${DISKSEQ}p${PARTITION}" for partitions. A
> > filesystem would then be mounted at "/dev/disk/by-diskseq" that
> > provides for race-free opening of these paths. This could be
> > implemented in userspace using FUSE, either with difficulty using the
> > current kernel API, or easily and efficiently using a new kernel API
> > for opening a block device by diskseq + partition. However, I think
> > this should be handled by the Linux kernel itself.
> >
> > What would be necessary to get this into the kernel? I would like to
> > implement this, but I don’t have the time to do so anytime soon. Is
> > anyone else interested in taking this on? I suspect the kernel code
> > needed to implement this would be quite a bit smaller than the FUSE
> > implementation.
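A minimal sketch of the path layout proposed above, assuming a partition number of 0 means the whole disk. diskseq_path() is a hypothetical helper; neither the kernel nor udev is assumed to provide it.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "/dev/disk/by-diskseq/$DISKSEQ" for a whole
 * disk (part == 0) or "/dev/disk/by-diskseq/${DISKSEQ}p${PARTITION}" for
 * a partition. Returns 0 on success, -1 if `buf` is too small.
 */
static int diskseq_path(char *buf, size_t len, uint64_t seq, unsigned part)
{
    int n;

    assert(buf != NULL);
    if (part == 0)
        n = snprintf(buf, len, "/dev/disk/by-diskseq/%" PRIu64, seq);
    else
        n = snprintf(buf, len, "/dev/disk/by-diskseq/%" PRIu64 "p%u",
                     seq, part);
    /* snprintf() returns the length it wanted; treat truncation as failure. */
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

For example, disk sequence 42, partition 3 maps to /dev/disk/by-diskseq/42p3. Since diskseq values are never reused, a given name can only ever refer to one device.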
>
> So it sounds like the problem is you want to be sure that the device
> doesn't change after you've called libblkid to identify it but before
> you call open? If that's so, the way you do this in userspace is to
> call libblkid again after the open. If the before and after id match,
> you're as sure as you can be the open was of the right device.

The devices I am working with are raw-format VM disks that contain
untrusted data. They are identified not by their content, which the VM
has complete control over, but by various sysfs attributes such as
dm/name and dm/uuid. They also need to be passed to interfaces, such as
libvirt and cryptsetup, that only accept device paths.

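Checking those attributes amounts to reading one-line sysfs files. This is a minimal sketch with hypothetical helpers, read_sysfs_attr() and sysfs_attr_equals(), that read a file such as /sys/block/dm-3/dm/name (dm-3 is an example name) and strip the trailing newline. It is only a check, not a fix: if dm-3 is destroyed and recreated between reading the attribute and opening /dev/dm-3, the check passes for the wrong device.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read a one-line sysfs attribute into `buf`,
 * stripping the trailing newline. Returns 0 on success, -1 on error.
 */
static int read_sysfs_attr(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && buf != NULL && len > 0 && len <= INT_MAX);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Hypothetical helper: true if the attribute at `path` equals `expected`. */
static bool sysfs_attr_equals(const char *path, const char *expected)
{
    char buf[256]; /* larger than device-mapper name and UUID limits */

    return read_sysfs_attr(path, buf, sizeof buf) == 0 &&
           strcmp(buf, expected) == 0;
}
```
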
I can work around this in the case of cryptsetup by using the
libcryptsetup library and/or holding a file descriptor open, but neither
of those will work for libvirt since libvirtd is a separate process and
I cannot pass a file descriptor to it. Furthermore, there is no way to
make libvirtd do any post-open() checking on the file descriptor it has
obtained. While I plan to add a workaround in libxl and blkback for
loop and device-mapper devices, it is not reasonable to expect every
userspace tool to do the same.
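For a tool that can do post-open() checking, one option is to compare the disk sequence number of the opened descriptor with the expected one. This is a sketch that assumes Linux 5.15 or later, where the BLKGETDISKSEQ ioctl exists. open_expect_diskseq() is a hypothetical helper and not necessarily what libxl or blkback do. The ioctl is issued on the descriptor, so it reports the device that was actually opened. Because diskseq values are not reused, a match identifies the right device, provided the expected value came from a trustworthy source.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

#ifndef BLKGETDISKSEQ
/* Added to <linux/fs.h> in Linux 5.15; define it for older headers. */
#define BLKGETDISKSEQ _IOR(0x12, 128, __u64)
#endif

/*
 * Hypothetical helper: open `path` and return the descriptor only if the
 * opened block device has disk sequence number `expected`. Otherwise
 * return -1 with errno set (ESTALE on a mismatch).
 */
static int open_expect_diskseq(const char *path, uint64_t expected)
{
    __u64 seq = 0;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    /* Fails (e.g. ENOTTY) for non-block files or pre-5.15 kernels. */
    if (ioctl(fd, BLKGETDISKSEQ, &seq) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (seq != expected) {
        close(fd);
        errno = ESTALE;
        return -1;
    }
    return fd;
}
```

As noted above, this only helps tools that can be modified to do the check; it does nothing for a separate process such as libvirtd that only receives a path.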

The approach I am suggesting avoids this problem entirely, because
/dev/mapper/qubes_dom0-vm--sys--net--private would then be a symlink to a
device node under /dev/disk/by-diskseq/$DISKSEQ. Disk sequence numbers
are never, ever reused. When the device goes away, its device node goes
away too, so any attempt to open the dangling symlink (without
O_PATH|O_NOFOLLOW) fails with ENOENT, as it should.

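Under this scheme, a consumer that wants to know which diskseq it is about to open could readlink(2) the symlink and parse the final path component. This is a minimal sketch of a hypothetical parser, parse_diskseq_name(), for the "$DISKSEQ" and "${DISKSEQ}p${PARTITION}" names proposed above, assuming partition numbers start at 1.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "42" (whole disk, *part = 0) or "42p3"
 * (partition 3 of disk sequence 42). Returns 0 on success, -1 if the
 * name is malformed.
 */
static int parse_diskseq_name(const char *name, uint64_t *seq, unsigned *part)
{
    unsigned long long s;
    unsigned long p;
    char *end;

    assert(name != NULL && seq != NULL && part != NULL);
    /* strtoull() would accept leading spaces and signs; refuse them. */
    if (*name < '0' || *name > '9')
        return -1;
    errno = 0;
    s = strtoull(name, &end, 10);
    if (errno != 0)
        return -1;
    if (*end == '\0') {
        *seq = s;
        *part = 0;
        return 0;
    }
    /* Otherwise expect exactly "p" followed by a decimal partition number. */
    if (*end != 'p' || end[1] < '0' || end[1] > '9')
        return -1;
    errno = 0;
    p = strtoul(end + 1, &end, 10);
    if (errno != 0 || *end != '\0' || p == 0 || p > UINT_MAX)
        return -1;
    *seq = s;
    *part = (unsigned)p;
    return 0;
}
```

The parsed number could then be compared with the result of the BLKGETDISKSEQ ioctl (Linux 5.15+) on the opened descriptor.
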
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

Thread overview: 6+ messages
2022-04-26 18:12 Race-free block device opening Demi Marie Obenour
2022-04-26 18:35 ` Greg Kroah-Hartman
2022-04-26 21:31 ` Demi Marie Obenour
2022-04-26 22:07 ` Demi Marie Obenour
2022-04-27 13:29 ` James Bottomley
2022-05-07 11:35 ` Demi Marie Obenour [this message]