* Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
@ 2026-01-27 18:47 Stefan Hajnoczi
2026-01-27 19:45 ` Paolo Bonzini
2026-01-27 21:06 ` Benjamin Marzinski
0 siblings, 2 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2026-01-27 18:47 UTC (permalink / raw)
To: Benjamin Marzinski, Paolo Bonzini
Cc: qemu-block, Kevin Wolf, Hannes Reinecke, afaria, qemu-devel
Hi Benjamin and Paolo,
I would like to discuss changes to DM-Multipath and qemu-pr-helper to
handle SCSI Persistent Reservations in QEMU without privileged code.
SCSI Persistent Reservations support in QEMU is built on the
qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN and
PERSISTENT RESERVATION OUT commands on behalf of the guest. The
qemu-pr-helper process provides privilege separation for ioctl(SG_IO)'s
CAP_SYS_RAWIO and libmpathpersist's root privileges since the main QEMU
process should not have those privileges.
There are issues with the current approach:
- Privileged code is a security attack surface.
- A bunch of code is required for privilege separation and for management
tools to set up qemu-pr-helper with access to multipathd.
- The interface is SCSI-specific and does not support NVMe.
Several of us have pondered a different approach that I will summarize
here. The <linux/pr.h> ioctl interface provides an alternative to
ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
SCSI and NVMe. Since privileges are not required, there would be no need
for the qemu-pr-helper daemon anymore.
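For illustration, here is a minimal sketch of the ioctl path as seen from an application. It assumes a whole block device (not a partition) opened read-write. The helper names pr_register_key() and pr_reserve_write_exclusive() are made up for this example; the structs, constants, and ioctl numbers come from <linux/pr.h>.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/pr.h>

/*
 * Hypothetical helper: register new_key on the device behind fd.
 * old_key is 0 for a first-time registration.
 * Returns 0 on success and -errno on failure. A positive ioctl return
 * value is passed through unchanged (assumed here to be a
 * device-specific status).
 */
static int pr_register_key(int fd, uint64_t old_key, uint64_t new_key)
{
    struct pr_registration reg = {
        .old_key = old_key,
        .new_key = new_key,
    };
    int ret = ioctl(fd, IOC_PR_REGISTER, &reg);

    return ret < 0 ? -errno : ret;
}

/* Hypothetical helper: take a Write Exclusive reservation with key. */
static int pr_reserve_write_exclusive(int fd, uint64_t key)
{
    struct pr_reservation rsv = {
        .key = key,
        .type = PR_WRITE_EXCLUSIVE,
    };
    int ret = ioctl(fd, IOC_PR_RESERVE, &rsv);

    return ret < 0 ? -errno : ret;
}
```

A caller would open the device (for example open("/dev/sdX", O_RDWR), where the path is a placeholder), register a key, and then reserve with that same key. No SG_IO and no SCSI CDB construction is involved.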
The blocker is that <linux/pr.h> is not usable in multipath
environments. The Linux DM-Multipath driver has an incomplete ioctl
implementation that falls short of what libmpathpersist and multipathd
do in userspace. Kernel changes are necessary to fix this.
My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
to multipathd. That way applications like QEMU can consistently use
<linux/pr.h> across block device types and no longer have to go through
the privileged libmpathpersist interface.
Once DM-Multipath support for <linux/pr.h> is functional, the main QEMU
process can directly invoke the ioctls. qemu-pr-helper will no longer be
needed, eliminating privileged code and simplifying the setup required
by management tools such as libvirt and KubeVirt.
The only loss in functionality that I have identified when switching to
<linux/pr.h> is that qemu-pr-helper supports SCSI TransportIDs for the
PERSISTENT RESERVATION OUT command. This is not supported by
<linux/pr.h>, but I'm not sure how this even works today since the guest
sees a virtual SCSI bus and is unaware of the physical bus or HBA. So
maybe that was never used in the first place?
Does this plan sound good to you?
Benjamin: I can work on the DM-Multipath upcalls if you are busy.
Thanks,
Stefan
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-01-27 18:47 Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h> Stefan Hajnoczi
@ 2026-01-27 19:45 ` Paolo Bonzini
2026-01-28 14:18 ` Stefan Hajnoczi
2026-01-27 21:06 ` Benjamin Marzinski
1 sibling, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2026-01-27 19:45 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Benjamin Marzinski, open list:Block layer core, Kevin Wolf,
Hannes Reinecke, Alberto Faria, qemu-devel
On Tue, Jan 27, 2026, 19:47 Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Several of us have pondered a different approach that I will summarize
> here. The <linux/pr.h> ioctl interface provides an alternative to
> ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> SCSI and NVMe. Since privileges are not required, there would be no need
> for the qemu-pr-helper daemon anymore.
>
Yes, no problem with that. It's easy to extend QEMU with a new pr-manager
subclass that converts SCSI commands to PR ioctls.
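As a rough sketch of what such a conversion could look like for PERSISTENT RESERVE OUT, assuming the standard CDB and 24-byte parameter list layout: struct pr_out_request and pr_out_to_ioctl() are hypothetical names for this example and are not QEMU code. The decoded key and sa_key would then fill old_key/new_key (or key) in the matching <linux/pr.h> struct; which field takes which key should be checked against the kernel documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/pr.h>

#define PERSISTENT_RESERVE_OUT 0x5f
#define PR_OUT_PARAM_LEN       24   /* basic parameter list length */
#define PR_OUT_SPEC_I_PT       0x08 /* byte 20: TransportIDs follow */

/* Hypothetical container for a decoded command; not a QEMU type. */
struct pr_out_request {
    unsigned long ioctl_cmd; /* IOC_PR_* request number */
    uint64_t key;            /* RESERVATION KEY field */
    uint64_t sa_key;         /* SERVICE ACTION RESERVATION KEY field */
    uint32_t type;           /* enum pr_type, 0 if no equivalent */
    uint32_t flags;          /* PR_FL_* */
};

static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Map the SCSI TYPE code (CDB byte 2, low nibble) to enum pr_type. */
static uint32_t scsi_type_to_pr_type(uint8_t scsi_type)
{
    switch (scsi_type) {
    case 0x01: return PR_WRITE_EXCLUSIVE;
    case 0x03: return PR_EXCLUSIVE_ACCESS;
    case 0x05: return PR_WRITE_EXCLUSIVE_REG_ONLY;
    case 0x06: return PR_EXCLUSIVE_ACCESS_REG_ONLY;
    case 0x07: return PR_WRITE_EXCLUSIVE_ALL_REGS;
    case 0x08: return PR_EXCLUSIVE_ACCESS_ALL_REGS;
    default:   return 0;
    }
}

/* Returns 0 on success, -1 if there is no <linux/pr.h> equivalent. */
static int pr_out_to_ioctl(const uint8_t *cdb, const uint8_t *param,
                           size_t param_len, struct pr_out_request *req)
{
    assert(cdb && param && req);

    if (cdb[0] != PERSISTENT_RESERVE_OUT || param_len < PR_OUT_PARAM_LEN) {
        return -1;
    }
    if (param[20] & PR_OUT_SPEC_I_PT) {
        return -1; /* TransportIDs cannot be expressed via <linux/pr.h> */
    }

    req->key = load_be64(&param[0]);
    req->sa_key = load_be64(&param[8]);
    req->type = scsi_type_to_pr_type(cdb[2] & 0x0f);
    req->flags = 0;

    switch (cdb[1] & 0x1f) { /* service action */
    case 0x00: req->ioctl_cmd = IOC_PR_REGISTER; break;
    case 0x01: req->ioctl_cmd = IOC_PR_RESERVE; break;
    case 0x02: req->ioctl_cmd = IOC_PR_RELEASE; break;
    case 0x03: req->ioctl_cmd = IOC_PR_CLEAR; break;
    case 0x04: req->ioctl_cmd = IOC_PR_PREEMPT; break;
    case 0x05: req->ioctl_cmd = IOC_PR_PREEMPT_ABORT; break;
    case 0x06: /* REGISTER AND IGNORE EXISTING KEY */
        req->ioctl_cmd = IOC_PR_REGISTER;
        req->flags = PR_FL_IGNORE_KEY;
        break;
    default:   /* e.g. 0x07 REGISTER AND MOVE has no ioctl */
        return -1;
    }
    return 0;
}
```

A real pr-manager would additionally reject a type of 0 for the service actions that need one, and would translate the ioctl result back into SCSI status and sense data for the guest.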
> My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> to multipathd. That way applications like QEMU can consistently use
> <linux/pr.h> across block device types and no longer have to go through
> the privileged libmpathpersist interface.
>
What do you have in mind for the upcall protocol? Does it need to be done
with multipathd or can it be a separate daemon for privilege separation? I
am not sure if there is any channel between dm-mpath and multipathd that
can be extended (I think it only uses uevent?); maybe it would make sense
to reuse qemu-pr-helper's protocol even.
Paolo
> Once DM-Multipath support for <linux/pr.h> is functional, the main QEMU
> process can directly invoke the ioctls. qemu-pr-helper will no longer be
> needed, eliminating privileged code and simplifying the setup required
> by management tools such as libvirt and KubeVirt.
>
> The only loss in functionality that I have identified when switching to
> <linux/pr.h> is that qemu-pr-helper supports SCSI TransportIDs for the
> PERSISTENT RESERVATION OUT command. This is not supported by
> <linux/pr.h>, but I'm not sure how this even works today since the guest
> sees a virtual SCSI bus and is unaware of the physical bus or HBA. So
> maybe that was never used in the first place?
>
> Does this plan sound good to you?
>
> Benjamin: I can work on the DM-Multipath upcalls if you are busy.
>
> Thanks,
> Stefan
>
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-01-27 18:47 Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h> Stefan Hajnoczi
2026-01-27 19:45 ` Paolo Bonzini
@ 2026-01-27 21:06 ` Benjamin Marzinski
2026-02-03 15:09 ` Stefan Hajnoczi
1 sibling, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2026-01-27 21:06 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Paolo Bonzini, qemu-block, Kevin Wolf, Hannes Reinecke, afaria,
qemu-devel, Mikulas Patocka
On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> Hi Benjamin and Paolo,
> I would like to discuss changes to DM-Multipath and qemu-pr-helper to
> handle SCSI Persistent Reservations in QEMU without privileged code.
>
> SCSI Persistent Reservations support in QEMU is built on the
> qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN and
> PERSISTENT RESERVATION OUT commands on behalf of the guest. The
> qemu-pr-helper process provides privilege separation for ioctl(SG_IO)'s
> CAP_SYS_RAWIO and libmpathpersist's root privileges since the main QEMU
> process should not have those privileges.
>
> There are issues with the current approach:
> - Privileged code is a security attack surface.
> - A bunch of code is required for privilege separation and for management
> tools to set up qemu-pr-helper with access to multipathd.
> - The interface is SCSI-specific and does not support NVMe.
>
> Several of us have pondered a different approach that I will summarize
> here. The <linux/pr.h> ioctl interface provides an alternative to
> ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> SCSI and NVMe. Since privileges are not required, there would be no need
> for the qemu-pr-helper daemon anymore.
>
> The blocker is that <linux/pr.h> is not usable in multipath
> environments. The Linux DM-Multipath driver has an incomplete ioctl
> implementation that falls short of what libmpathpersist and multipathd
> do in userspace. Kernel changes are necessary to fix this.
>
> My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> to multipathd. That way applications like QEMU can consistently use
> <linux/pr.h> across block device types and no longer have to go through
> the privileged libmpathpersist interface.
This would take intercepting the pr commands to multipath devices right
at the start of dm_call_pr(). In order to make some persistent
reservation commands seem atomic, libmpathpersist needs to suspend the
multipath device in certain situations. So device-mapper cannot call
dm_get_live_table(), since this will block suspends. This should be o.k.
Libmpathpersist is designed to handle the possibility that the multipath
device gets reloaded with different paths while it is running. And since
the multipath target is an immutable singleton target, there is no
possibility of it turning into another target type because of a table
reload during suspend.
Also, just to clarify, the kernel code can't interface directly with
multipathd. Most of the code for handling persistent reservations is in
libmpathpersist, which just needs multipathd to do things like make sure
that paths that are added in the future get registered properly. There
would likely need to be some new program (that is just a thin wrapper
around libmpathpersist) which can be called with call_usermodehelper().
> Once DM-Multipath support for <linux/pr.h> is functional, the main QEMU
> process can directly invoke the ioctls. qemu-pr-helper will no longer be
> needed, eliminating privileged code and simplifying the setup required
> by management tools such as libvirt and KubeVirt.
>
> The only loss in functionality that I have identified when switching to
> <linux/pr.h> is that qemu-pr-helper supports SCSI TransportIDs for the
> PERSISTENT RESERVATION OUT command. This is not supported by
> <linux/pr.h>, but I'm not sure how this even works today since the guest
> sees a virtual SCSI bus and is unaware of the physical bus or HBA. So
> maybe that was never used in the first place?
This is fine. Like you said, TransportIDs don't really make sense on a
virtual scsi device on top of a multipath device.
> Does this plan sound good to you?
I'm not sure how well this would go over upstream, but it does seem like
a reasonable plan. Mikulas, do you have any thought about this idea?
-Ben
> Benjamin: I can work on the DM-Multipath upcalls if you are busy.
>
> Thanks,
> Stefan
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-01-27 19:45 ` Paolo Bonzini
@ 2026-01-28 14:18 ` Stefan Hajnoczi
2026-01-28 15:30 ` Hannes Reinecke
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2026-01-28 14:18 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Benjamin Marzinski, open list:Block layer core, Kevin Wolf,
Hannes Reinecke, Alberto Faria, qemu-devel
On Tue, Jan 27, 2026 at 08:45:39PM +0100, Paolo Bonzini wrote:
> On Tue, Jan 27, 2026, 19:47 Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> > Several of us have pondered a different approach that I will summarize
> > here. The <linux/pr.h> ioctl interface provides an alternative to
> > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> > SCSI and NVMe. Since privileges are not required, there would be no need
> > for the qemu-pr-helper daemon anymore.
> >
>
> Yes, no problem with that. It's easy to extend QEMU with a new pr-manager
> subclass that converts SCSI commands to PR ioctls.
Yes. It will be possible to go further than that in the future:
Alberto has been working on QEMU block layer API support for persistent
reservations. When that becomes available, SCSI command parsing can
happen entirely within hw/scsi/scsi-disk.c for scsi-block and scsi-disk.
file-posix.c will then implement the new BlockDriver PR APIs via
<linux/pr.h> ioctls and other block drivers can implement them in
protocol-specific ways (e.g. iSCSI).
> > My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> > to multipathd. That way applications like QEMU can consistently use
> > <linux/pr.h> across block device types and no longer have to go through
> > the privileged libmpathpersist interface.
> >
>
> What do you have in mind for the upcall protocol? Does it need to be done
> with multipathd or can it be a separate daemon for privilege separation? I
> am not sure if there is any channel between dm-mpath and multipathd that
> can be extended (I think it only uses uevent?); maybe it would make sense
> to reuse qemu-pr-helper's protocol even.
I don't have a strong opinion on the protocol. My thought was to do a
traditional upcall with call_usermodehelper() with an execve argv/envp
protocol. That way there is no need to register a file descriptor. The
downside is that this approach is less efficient and more likely to fail
when the host is under memory pressure, but PR operations are not that
frequent.
Stefan
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-01-28 14:18 ` Stefan Hajnoczi
@ 2026-01-28 15:30 ` Hannes Reinecke
2026-01-28 16:13 ` Stefan Hajnoczi
0 siblings, 1 reply; 25+ messages in thread
From: Hannes Reinecke @ 2026-01-28 15:30 UTC (permalink / raw)
To: Stefan Hajnoczi, Paolo Bonzini
Cc: Benjamin Marzinski, open list:Block layer core, Kevin Wolf,
Alberto Faria, qemu-devel
On 1/28/26 15:18, Stefan Hajnoczi wrote:
> On Tue, Jan 27, 2026 at 08:45:39PM +0100, Paolo Bonzini wrote:
>> On Tue, Jan 27, 2026, 19:47 Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>
>>> Several of us have pondered a different approach that I will summarize
>>> here. The <linux/pr.h> ioctl interface provides an alternative to
>>> ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
>>> SCSI and NVMe. Since privileges are not required, there would be no need
>>> for the qemu-pr-helper daemon anymore.
>>>
>>
>> Yes, no problem with that. It's easy to extend QEMU with a new pr-manager
>> subclass that converts SCSI commands to PR ioctls.
>
> Yes. It will be possible to go further than that in the future:
>
> Alberto has been working on QEMU block layer API support for persistent
> reservations. When that becomes available, SCSI command parsing can
> happen entirely within hw/scsi/scsi-disk.c for scsi-block and scsi-disk.
> file-posix.c will then implement the new BlockDriver PR APIs via
> <linux/pr.h> ioctls and other block drivers can implement them in
> protocol-specific ways (e.g. iSCSI).
>
>>> My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
>>> to multipathd. That way applications like QEMU can consistently use
>>> <linux/pr.h> across block device types and no longer have to go through
>>> the privileged libmpathpersist interface.
>>>
>>
>> What do you have in mind for the upcall protocol? Does it need to be done
>> with multipathd or can it be a separate daemon for privilege separation? I
>> am not sure if there is any channel between dm-mpath and multipathd that
>> can be extended (I think it only uses uevent?); maybe it would make sense
>> to reuse qemu-pr-helper's protocol even.
>
> I don't have a strong opinion on the protocol. My thought was to do a
> traditional upcall with call_usermodehelper() with an execve argv/envp
> protocol. That way there is no need to register a file descriptor. The
> downside is that this approach is less efficient and more likely to fail
> when the host is under memory pressure, but PR operations are not that
> frequent.
>
gnaa. call_usermodehelper() is _evil_. It might be executed with any
arbitrary fs context, and you better hope the executable is present
there ...
Maybe look at the handshake daemon. That's solving a very similar issue
which we had for TLS.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-01-28 15:30 ` Hannes Reinecke
@ 2026-01-28 16:13 ` Stefan Hajnoczi
0 siblings, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2026-01-28 16:13 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Paolo Bonzini, Benjamin Marzinski, open list:Block layer core,
Kevin Wolf, Alberto Faria, qemu-devel
On Wed, Jan 28, 2026 at 04:30:51PM +0100, Hannes Reinecke wrote:
> On 1/28/26 15:18, Stefan Hajnoczi wrote:
> > On Tue, Jan 27, 2026 at 08:45:39PM +0100, Paolo Bonzini wrote:
> > > On Tue, Jan 27, 2026, 19:47 Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > > Several of us have pondered a different approach that I will summarize
> > > > here. The <linux/pr.h> ioctl interface provides an alternative to
> > > > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> > > > SCSI and NVMe. Since privileges are not required, there would be no need
> > > > for the qemu-pr-helper daemon anymore.
> > > >
> > >
> > > Yes, no problem with that. It's easy to extend QEMU with a new pr-manager
> > > subclass that converts SCSI commands to PR ioctls.
> >
> > Yes. It will be possible to go further than that in the future:
> >
> > Alberto has been working on QEMU block layer API support for persistent
> > reservations. When that becomes available, SCSI command parsing can
> > happen entirely within hw/scsi/scsi-disk.c for scsi-block and scsi-disk.
> > file-posix.c will then implement the new BlockDriver PR APIs via
> > <linux/pr.h> ioctls and other block drivers can implement them in
> > protocol-specific ways (e.g. iSCSI).
> >
> > > > My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> > > > to multipathd. That way applications like QEMU can consistently use
> > > > <linux/pr.h> across block device types and no longer have to go through
> > > > the privileged libmpathpersist interface.
> > > >
> > >
> > > What do you have in mind for the upcall protocol? Does it need to be done
> > > with multipathd or can it be a separate daemon for privilege separation? I
> > > am not sure if there is any channel between dm-mpath and multipathd that
> > > can be extended (I think it only uses uevent?); maybe it would make sense
> > > to reuse qemu-pr-helper's protocol even.
> >
> > I don't have a strong opinion on the protocol. My thought was to do a
> > traditional upcall with call_usermodehelper() with an execve argv/envp
> > protocol. That way there is no need to register a file descriptor. The
> > downside is that this approach is less efficient and more likely to fail
> > when the host is under memory pressure, but PR operations are not that
> > frequent.
> >
> gnaa. call_usermodehelper() is _evil_. It might be executed with any
> arbitrary fs context, and you better hope the executable is present
> there ...
>
> Maybe look at the handshake daemon. That's solving a very similar issue
> which we had for TLS.
Netlink is the most complex approach. I had a hard time understanding what
was going on from looking at drivers/nvme/host/ and net/handshake/. But if
netlink is the way to do this, I'm sure it can be done.
Stefan
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-01-27 21:06 ` Benjamin Marzinski
@ 2026-02-03 15:09 ` Stefan Hajnoczi
2026-02-03 17:53 ` Benjamin Marzinski
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2026-02-03 15:09 UTC (permalink / raw)
To: Benjamin Marzinski
Cc: Paolo Bonzini, qemu-block, Kevin Wolf, Hannes Reinecke, afaria,
qemu-devel, Mikulas Patocka
On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski wrote:
> On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> > Hi Benjamin and Paolo,
> > I would like to discuss changes to DM-Multipath and qemu-pr-helper to
> > handle SCSI Persistent Reservations in QEMU without privileged code.
> >
> > SCSI Persistent Reservations support in QEMU is built on the
> > qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN and
> > PERSISTENT RESERVATION OUT commands on behalf of the guest. The
> > qemu-pr-helper process provides privilege separation for ioctl(SG_IO)'s
> > CAP_SYS_RAWIO and libmpathpersist's root privileges since the main QEMU
> > process should not have those privileges.
> >
> > There are issues with the current approach:
> > - Privileged code is a security attack surface.
> > - A bunch of code is required for privilege separation and for management
> > tools to set up qemu-pr-helper with access to multipathd.
> > - The interface is SCSI-specific and does not support NVMe.
> >
> > Several of us have pondered a different approach that I will summarize
> > here. The <linux/pr.h> ioctl interface provides an alternative to
> > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> > SCSI and NVMe. Since privileges are not required, there would be no need
> > for the qemu-pr-helper daemon anymore.
> >
> > The blocker is that <linux/pr.h> is not usable in multipath
> > environments. The Linux DM-Multipath driver has an incomplete ioctl
> > implementation that falls short of what libmpathpersist and multipathd
> > do in userspace. Kernel changes are necessary to fix this.
> >
> > My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> > to multipathd. That way applications like QEMU can consistently use
> > <linux/pr.h> across block device types and no longer have to go through
> > the privileged libmpathpersist interface.
>
> This would take intercepting the pr commands to multipath devices right
> at the start of dm_call_pr(). In order to make some persistent
> reservation commands seem atomic, libmpathpersist needs to suspend the
> multipath device in certain situations. So device-mapper cannot call
> dm_get_live_table(), since this will block suspends. This should be o.k.
> Libmpathpersist is designed to handle the possibility that the multipath
> device gets reloaded with different paths while it is running. And since
> the multipath target is an immutable singleton target, there is no
> possibility of it turning into another target type because of a table
> reload during suspend.
>
> Also, just to clarify, the kernel code can't interface directly with
> multipathd. Most of the code for handling persistent reservations is in
> libmpathpersist, which just needs multipathd to do things like make sure
> that paths that are added in the future get registered properly. There
> would likely need to be some new program (that is just a thin wrapper
> around libmpathpersist) which can be called with call_usermodehelper().
Hi everyone,
I'm starting to work on the DM-Multipath changes. Some more details on
how I am approaching this:
- multipath-tools will create multipath device-mapper targets with a new
ctr argument (pr_netlink) when this feature is enabled. When the
feature is disabled, everything remains backwards compatible. With the
pr_netlink ctr argument, the multipath target sends a netlink
multicast group notification instead of handling PR operations (e.g.
IOC_PR_* ioctls) in the kernel.
- There will be a new program in multipath-tools called mpathpersistd
that listens on the netlink multicast group for notifications. The
notification tells it which multipath device has a pending PR
operation. It fetches the PR operation parameters by sending a netlink
message, performs the persistent reservation operation via
libmpathpersist, and then sends a response to the kernel via another
netlink message.
- The multipath device-mapper target completes the PR operation upon
receiving the netlink response.
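To make the three steps above concrete, here is a rough sketch of the daemon-side flow for a single notification. Every name in it (mpath_pr_cmd, mpath_pr_op, mpath_pr_backend, mpath_pr_handle_notify) is hypothetical; no such netlink family or mpathpersistd code exists yet. The netlink socket and libmpathpersist are hidden behind callbacks, and a small in-memory fake stands in for the kernel so the flow can be exercised without either.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical message types for the proposed netlink family. */
enum mpath_pr_cmd {
    MPATH_PR_CMD_NOTIFY,   /* kernel -> multicast group: op is pending */
    MPATH_PR_CMD_GET_OP,   /* daemon -> kernel: fetch op parameters */
    MPATH_PR_CMD_COMPLETE, /* daemon -> kernel: report the result */
};

/* Parameters of one pending operation, mirroring IOC_PR_* arguments. */
struct mpath_pr_op {
    unsigned long ioctl_cmd;
    uint64_t old_key, new_key;
    uint32_t type, flags;
};

/* Hides the netlink socket and libmpathpersist behind callbacks. */
struct mpath_pr_backend {
    int (*get_op)(void *opaque, uint32_t dm_minor, struct mpath_pr_op *op);
    int (*execute)(void *opaque, const struct mpath_pr_op *op);
    int (*complete)(void *opaque, uint32_t dm_minor, int result);
    void *opaque;
};

/* Handle one MPATH_PR_CMD_NOTIFY for the multipath device dm_minor. */
static int mpath_pr_handle_notify(const struct mpath_pr_backend *be,
                                  uint32_t dm_minor)
{
    struct mpath_pr_op op;
    int ret;

    assert(be && be->get_op && be->execute && be->complete);

    ret = be->get_op(be->opaque, dm_minor, &op);
    if (ret < 0) {
        return ret; /* nothing pending for this device */
    }
    ret = be->execute(be->opaque, &op);
    /* Always answer, so the ioctl caller is not left waiting. */
    return be->complete(be->opaque, dm_minor, ret);
}

/* In-memory stand-in for the kernel side, for demonstration only. */
struct fake_kernel {
    struct mpath_pr_op pending;
    int has_pending;
    int exec_result; /* what the fake "libmpathpersist" returns */
    int completed;
    int result;
};

static int fake_get_op(void *opaque, uint32_t dm_minor,
                       struct mpath_pr_op *op)
{
    struct fake_kernel *k = opaque;

    (void)dm_minor;
    if (!k->has_pending) {
        return -ENOENT;
    }
    *op = k->pending;
    return 0;
}

static int fake_execute(void *opaque, const struct mpath_pr_op *op)
{
    struct fake_kernel *k = opaque;

    (void)op; /* a real daemon would call into libmpathpersist here */
    return k->exec_result;
}

static int fake_complete(void *opaque, uint32_t dm_minor, int result)
{
    struct fake_kernel *k = opaque;

    (void)dm_minor;
    k->has_pending = 0;
    k->completed = 1;
    k->result = result;
    return 0;
}
```

The notification itself carries only the device identity in this sketch, and the parameters are fetched in a second step, which matches the description above. If more than one listener joins the multicast group, only the first fetch finds a pending operation.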
I ended up choosing netlink because call_usermodehelper() seems less
appropriate for an operation triggered by untrusted userspace processes.
Your input is welcome. Let me know if a different approach would be
better.
Thanks,
Stefan
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-03 15:09 ` Stefan Hajnoczi
@ 2026-02-03 17:53 ` Benjamin Marzinski
2026-02-03 18:04 ` Stefan Hajnoczi
0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2026-02-03 17:53 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Paolo Bonzini, qemu-block, Kevin Wolf, Hannes Reinecke, afaria,
qemu-devel, Mikulas Patocka, Martin Wilck
On Tue, Feb 03, 2026 at 10:09:39AM -0500, Stefan Hajnoczi wrote:
> On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski wrote:
> > On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> > > Hi Benjamin and Paolo,
> > > I would like to discuss changes to DM-Multipath and qemu-pr-helper to
> > > handle SCSI Persistent Reservations in QEMU without privileged code.
> > >
> > > SCSI Persistent Reservations support in QEMU is built on the
> > > qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN and
> > > PERSISTENT RESERVATION OUT commands on behalf of the guest. The
> > > qemu-pr-helper process provides privilege separation for ioctl(SG_IO)'s
> > > CAP_SYS_RAWIO and libmpathpersist's root privileges since the main QEMU
> > > process should not have those privileges.
> > >
> > > There are issues with the current approach:
> > > - Privileged code is a security attack surface.
> > > - A bunch of code is required for privilege separation and for management
> > > tools to set up qemu-pr-helper with access to multipathd.
> > > - The interface is SCSI-specific and does not support NVMe.
> > >
> > > Several of us have pondered a different approach that I will summarize
> > > here. The <linux/pr.h> ioctl interface provides an alternative to
> > > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> > > SCSI and NVMe. Since privileges are not required, there would be no need
> > > for the qemu-pr-helper daemon anymore.
> > >
> > > The blocker is that <linux/pr.h> is not usable in multipath
> > > environments. The Linux DM-Multipath driver has an incomplete ioctl
> > > implementation that falls short of what libmpathpersist and multipathd
> > > do in userspace. Kernel changes are necessary to fix this.
> > >
> > > My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> > > to multipathd. That way applications like QEMU can consistently use
> > > <linux/pr.h> across block device types and no longer have to go through
> > > the privileged libmpathpersist interface.
> >
> > This would take intercepting the pr commands to multipath devices right
> > at the start of dm_call_pr(). In order to make some persistent
> > reservation commands seem atomic, libmpathpersist needs to suspend the
> > multipath device in certain situations. So device-mapper cannot call
> > dm_get_live_table(), since this will block suspends. This should be o.k.
> > Libmpathpersist is designed to handle the possibility that the multipath
> > device gets reloaded with different paths while it is running. And since
> > the multipath target is an immutable singleton target, there is no
> > possibility of it turning into another target type because of a table
> > reload during suspend.
> >
> > Also, just to clarify, the kernel code can't interface directly with
> > multipathd. Most of the code for handling persistent reservations is in
> > libmpathpersist, which just needs multipathd to do things like make sure
> > that paths that are added in the future get registered properly. There
> > would likely need to be some new program (that is just a thin wrapper
> > around libmpathpersist) which can be called with call_usermodehelper().
Adding Martin Wilck, since he will also be looking at these changes.
> Hi everyone,
> I'm starting to work on the DM-Multipath changes. Some more details on
> how I am approaching this:
>
> - multipath-tools will create multipath device-mapper targets with a new
> ctr argument (pr_netlink) when this feature is enabled. When the
> feature is disabled, everything remains backwards compatible. With the
> pr_netlink ctr argument, the multipath target sends a netlink
> multicast group notification instead of handling PR operations (e.g.
> IOC_PR_* ioctls) in the kernel.
>
> - There will be a new program in multipath-tools called mpathpersistd
> that listens on the netlink multicast group for notifications. The
> notification tells it which multipath device has a pending PR
> operation. It fetches the PR operation parameters by sending a netlink
> message, performs the persistent reservation operation via
> libmpathpersist, and then sends a response to the kernel via another
> netlink message.
>
> - The multipath device-mapper target completes the PR operation upon
> receiving the netlink response.
>
> I ended up choosing netlink because call_usermodehelper() seems less
> appropriate for an operation triggered by untrusted userspace processes.
>
> Your input is welcome. Let me know if a different approach would be
> better.
Is the netlink interface going to be a generic persistent reservation
upcall interface, or is this just for dm multipath? I'm not sure if
there would ever be another user, and I don't have enough experience
with the netlink code to know how ugly it might be to route
communications from different kernel drivers to different userspace
daemons through the same generic netlink family. But if there's not
much extra complexity in building a generic interface, it seems like
it would be preferable to a multipath specific one.
-Ben
> Thanks,
> Stefan
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-03 17:53 ` Benjamin Marzinski
@ 2026-02-03 18:04 ` Stefan Hajnoczi
2026-02-04 13:19 ` Martin Wilck
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2026-02-03 18:04 UTC (permalink / raw)
To: Benjamin Marzinski
Cc: Paolo Bonzini, qemu-block, Kevin Wolf, Hannes Reinecke, afaria,
qemu-devel, Mikulas Patocka, Martin Wilck
On Tue, Feb 03, 2026 at 12:53:12PM -0500, Benjamin Marzinski wrote:
> On Tue, Feb 03, 2026 at 10:09:39AM -0500, Stefan Hajnoczi wrote:
> > On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski wrote:
> > > On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi wrote:
> > > > Hi Benjamin and Paolo,
> > > > I would like to discuss changes to DM-Multipath and qemu-pr-helper to
> > > > handle SCSI Persistent Reservations in QEMU without privileged code.
> > > >
> > > > SCSI Persistent Reservations support in QEMU is built on the
> > > > qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN and
> > > > PERSISTENT RESERVATION OUT commands on behalf of the guest. The
> > > > qemu-pr-helper process provides privilege separation for ioctl(SG_IO)'s
> > > > CAP_SYS_RAWIO and libmpathpersist's root privileges since the main QEMU
> > > > process should not have those privileges.
> > > >
> > > > There are issues with the current approach:
> > > > - Privileged code is a security attack surface.
> > > > - A bunch of code is required for privilege separation and for management
> > > > tools to set up qemu-pr-helper with access to multipathd.
> > > > - The interface is SCSI-specific and does not support NVMe.
> > > >
> > > > Several of us have pondered a different approach that I will summarize
> > > > here. The <linux/pr.h> ioctl interface provides an alternative to
> > > > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It supports both
> > > > SCSI and NVMe. Since privileges are not required, there would be no need
> > > > for the qemu-pr-helper daemon anymore.
> > > >
> > > > The blocker is that <linux/pr.h> is not usable in multipath
> > > > environments. The Linux DM-Multipath driver has an incomplete ioctl
> > > > implementation that falls short of what libmpathpersist and multipathd
> > > > do in userspace. Kernel changes are necessary to fix this.
> > > >
> > > > My suggestion is to implement <linux/pr.h> via upcalls from DM-Multipath
> > > > to multipathd. That way applications like QEMU can consistently use
> > > > <linux/pr.h> across block device types and no longer have to go through
> > > > the privileged libmpathpersist interface.
> > >
> > > This would take intercepting the pr commands to multipath devices right
> > > at the start of dm_call_pr(). In order to make some persistent
> > > reservation commands seem atomic, libmpathpersist needs to suspend the
> > > multipath device in certain situations. So device-mapper cannot call
> > > dm_get_live_table(), since this will block suspends. This should be o.k.
> > > Libmpathpersist is designed to handle the possibility that the multipath
> > > device gets reloaded with different paths while it is running. And since
> > > the multipath target is an immutable singleton target, there is no
> > > possibility of it turning into another target type because of a table
> > > reload during suspend.
> > >
> > > Also, just to clarify, the kernel code can't interface directly with
> > > multipathd. Most of the code for handling persistent reservations is in
> > > libmpathpersist, which just needs multipathd to do things like make sure
> > > that paths that are added in the future get registered properly. There
> > > would likely need to be some new program (that is just a thin wrapper
> > > around libmpathpersist) which can be called with call_usermodehelper().
>
> Adding Martin Wilck, since he will also be looking at these changes.
>
> > Hi everyone,
> > I'm starting to work on the DM-Multipath changes. Some more details on
> > how I am approaching this:
> >
> > - multipath-tools will create multipath device-mapper targets with a new
> > ctr argument (pr_netlink) when this feature is enabled. When the
> > feature is disabled, everything remains backwards compatible. With the
> > pr_netlink ctr argument, the multipath target sends a netlink
> > multicast group notification instead of handling PR operations (e.g.
> > IOC_PR_* ioctls) in the kernel.
> >
> > - There will be a new program in multipath-tools called mpathpersistd
> > that listens on the netlink multicast group for notifications. The
> > notification tells it which multipath device has a pending PR
> > operation. It fetches the PR operation parameters by sending a netlink
> > message, performs the persistent reservation operation via
> > libmpathpersist, and then sends a response to the kernel via another
> > netlink message.
> >
> > - The multipath device-mapper target completes the PR operation upon
> > receiving the netlink response.
> >
> > I ended up choosing netlink because call_usermodehelper() seems less
> > appropriate for an operation triggered by untrusted userspace processes.
> >
> > Your input is welcome. Let me know if a different approach would be
> > better.
>
> Is the netlink interface going to be a generic persistent reservation
> upcall interface, or is this just for dm multipath? I'm not sure if
> there would ever be another user, and I don't have enough experience
> with the netlink code to know how ugly it might be to route
> communications from different kernel drivers to different userspace
> daemons through the same generic netlink family. But if there's not
> much extra complexity in building a generic interface, it seems like
> it would be preferable to a multipath specific one.
It can be generic. The messages will contain the block device
major:minor as well as information to describe <linux/pr.h> requests.
Stefan
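For illustration only, here is a minimal sketch of what such a request record could carry, assuming a fixed binary layout. The PrUpcall name and its field set (major, minor, op, pr_type, old_key, new_key, flags) are hypothetical and are not a proposed ABI; a real generic netlink family would more likely carry these values as typed attributes than as a packed struct.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout, NOT a kernel ABI:
# u32 major, u32 minor, u32 op, u32 pr_type, u64 old_key, u64 new_key,
# u32 flags, u32 padding
_FMT = struct.Struct("=IIIIQQII")


@dataclass(frozen=True)
class PrUpcall:
    """One pending PR request for the block device major:minor."""
    major: int
    minor: int
    op: int        # which <linux/pr.h> operation (numbering is made up here)
    pr_type: int   # reservation type, as in enum pr_type
    old_key: int
    new_key: int
    flags: int = 0

    def pack(self) -> bytes:
        """Serialize to the fixed-size record described above."""
        return _FMT.pack(self.major, self.minor, self.op, self.pr_type,
                         self.old_key, self.new_key, self.flags, 0)

    @classmethod
    def unpack(cls, data: bytes) -> "PrUpcall":
        """Parse a record produced by pack(); the padding word is ignored."""
        major, minor, op, pr_type, old_key, new_key, flags, _pad = _FMT.unpack(data)
        return cls(major, minor, op, pr_type, old_key, new_key, flags)
```

Keying every message on major:minor is what would let one family serve several drivers: a daemon can ignore records for devices it does not manage.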
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-03 18:04 ` Stefan Hajnoczi
@ 2026-02-04 13:19 ` Martin Wilck
2026-02-04 18:32 ` Stefan Hajnoczi
0 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2026-02-04 13:19 UTC (permalink / raw)
To: Stefan Hajnoczi, Benjamin Marzinski
Cc: Paolo Bonzini, qemu-block, Kevin Wolf, Hannes Reinecke, afaria,
qemu-devel, Mikulas Patocka
Hi Stefan,
On Tue, 2026-02-03 at 13:04 -0500, Stefan Hajnoczi wrote:
> On Tue, Feb 03, 2026 at 12:53:12PM -0500, Benjamin Marzinski wrote:
> > On Tue, Feb 03, 2026 at 10:09:39AM -0500, Stefan Hajnoczi wrote:
> > > On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski
> > > wrote:
> > > > On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi
> > > > wrote:
> > > > > Hi Benjamin and Paolo,
> > > > > I would like to discuss changes to DM-Multipath and qemu-pr-
> > > > > helper to
> > > > > handle SCSI Persistent Reservations in QEMU without
> > > > > privileged code.
> > > > >
> > > > > SCSI Persistent Reservations support in QEMU is built on the
> > > > > qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN
> > > > > and
> > > > > PERSISTENT RESERVATION OUT commands on behalf of the guest.
> > > > > The
> > > > > qemu-pr-helper process provides privilege separation for
> > > > > ioctl(SG_IO)'s
> > > > > CAP_SYS_RAWIO and libmpathpersist's root privileges since the
> > > > > main QEMU
> > > > > process should not have those privileges.
> > > > >
> > > > > There are issues with the current approach:
> > > > > - Privileged code is a security attack surface.
> > > > > - A bunch of code is required for privilege separation and
> > > > > for management
> > > > > tools to set up qemu-pr-helper with access to multipathd.
> > > > > - The interface is SCSI-specific and does not support NVMe.
> > > > >
> > > > > Several of us have pondered a different approach that I will
> > > > > summarize
> > > > > here. The <linux/pr.h> ioctl interface provides an
> > > > > alternative to
> > > > > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It
> > > > > supports both
> > > > > SCSI and NVMe. Since privileges are not required, there would
> > > > > be no need
> > > > > for the qemu-pr-helper daemon anymore.
> > > > >
> > > > > The blocker is that <linux/pr.h> is not usable in multipath
> > > > > environments. The Linux DM-Multipath driver has an incomplete
> > > > > ioctl
> > > > > implementation that falls short of what libmpathpersist and
> > > > > multipathd
> > > > > do in userspace. Kernel changes are necessary to fix this.
> > > > >
> > > > > My suggestion is to implement <linux/pr.h> via upcalls from
> > > > > DM-Multipath
> > > > > to multipathd. That way applications like QEMU can
> > > > > consistently use
> > > > > <linux/pr.h> across block device types and no longer have to
> > > > > go through
> > > > > the privileged libmpathpersist interface.
> > > >
> > > > This would take intercepting the pr commands to multipath
> > > > devices right
> > > > at the start of dm_call_pr(). In order to make some persistent
> > > > reservation commands seem atomic, libmpathpersist needs to
> > > > suspend the
> > > > multipath device in certain situations. So device-mapper cannot
> > > > call
> > > > dm_get_live_table(), since this will block suspends. This
> > > > should be o.k.
> > > > Libmpathpersist is designed to handle the possibility that the
> > > > multipath
> > > > device gets reloaded with different paths while it is running.
> > > > And since
> > > > the multipath target is an immutable singleton target, there is
> > > > no
> > > > possibility of it turning into another target type because of a
> > > > table
> > > > reload during suspend.
> > > >
> > > > Also, just to clarify, the kernel code can't interface directly
> > > > with
> > > > multipathd. Most of the code for handling persistent
> > > > reservations is in
> > > > libmpathpersist, which just needs multipathd to do things like
> > > > make sure
> > > > that paths that are added in the future get registered
> > > > properly. There
> > > > would likely need to be some new program (that is just a thin
> > > > wrapper
> > > > around libmpathpersist) which can be called with
> > > > call_usermodehelper().
> >
> > Adding Martin Wilck, since he will also be looking at these
> > changes.
> >
> > > Hi everyone,
> > > I'm starting to work on the DM-Multipath changes. Some more
> > > details on
> > > how I am approaching this:
> > >
> > > - multipath-tools will create multipath device-mapper targets
> > > with a new
> > > ctr argument (pr_netlink) when this feature is enabled. When
> > > the
> > > feature is disabled, everything remains backwards compatible.
> > > With the
> > > pr_netlink ctr argument, the multipath target sends a netlink
> > > multicast group notification instead of handling PR operations
> > > (e.g.
> > > IOC_PR_* ioctls) in the kernel.
> > >
> > > - There will be a new program in multipath-tools called
> > > mpathpersistd
> > > that listens on the netlink multicast group for notifications.
> > > The
> > > notification tells it which multipath device has a pending PR
> > > operation. It fetches the PR operation parameters by sending a
> > > netlink
> > > message, performs the persistent reservation operation via
> > > libmpathpersist, and then sends a response to the kernel via
> > > another
> > > netlink message.
> > >
> > > - The multipath device-mapper target completes the PR operation
> > > upon
> > > receiving the netlink response.
> > >
> > > I ended up choosing netlink because call_usermodehelper() seems
> > > less
> > > appropriate for an operation triggered by untrusted userspace
> > > processes.
> > >
> > > Your input is welcome. Let me know if a different approach would
> > > be
> > > better.
> >
> > Is the netlink interface going to be a generic persistent
> > reservation
> > upcall interface, or is this just for dm multipath? I'm not sure if
> > there would ever be another user, and I don't have enough
> > experience
> > with the netlink code to know how ugly it might be to route
> > communications from different kernel drivers to different userspace
> > daemons through the same generic netlink family. But if there's not
> > much extra complexity in building a generic interface, it seems
> > like
> > it would be preferable to a multipath specific one.
>
> It can be generic. The messages will contain the block device
> major:minor as well as information to describe <linux/pr.h> requests.
So the ioctls will pass through qemu into the kernel, to be intercepted
by the dm-mpath driver, which will use an upcall to have them handled
by mpathpersistd (for the actual command) and multipathd (for the path
registrations).
I don't fully understand the advantage, security and complexity-wise,
of this concept, compared to intercepting them in qemu and using a socket
to talk to mpathpersistd directly. If we did this, we could even
support both generic and SCSI PR commands.
Regards
Martin
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-04 13:19 ` Martin Wilck
@ 2026-02-04 18:32 ` Stefan Hajnoczi
2026-02-04 23:57 ` Hannes Reinecke
2026-02-05 11:52 ` Martin Wilck
0 siblings, 2 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2026-02-04 18:32 UTC (permalink / raw)
To: Martin Wilck
Cc: Benjamin Marzinski, Paolo Bonzini, qemu-block, Kevin Wolf,
Hannes Reinecke, afaria, qemu-devel, Mikulas Patocka
[-- Attachment #1: Type: text/plain, Size: 8286 bytes --]
On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
> Hi Stefan,
>
> On Tue, 2026-02-03 at 13:04 -0500, Stefan Hajnoczi wrote:
> > On Tue, Feb 03, 2026 at 12:53:12PM -0500, Benjamin Marzinski wrote:
> > > On Tue, Feb 03, 2026 at 10:09:39AM -0500, Stefan Hajnoczi wrote:
> > > > On Tue, Jan 27, 2026 at 04:06:03PM -0500, Benjamin Marzinski
> > > > wrote:
> > > > > On Tue, Jan 27, 2026 at 01:47:43PM -0500, Stefan Hajnoczi
> > > > > wrote:
> > > > > > Hi Benjamin and Paolo,
> > > > > > I would like to discuss changes to DM-Multipath and qemu-pr-
> > > > > > helper to
> > > > > > handle SCSI Persistent Reservations in QEMU without
> > > > > > privileged code.
> > > > > >
> > > > > > SCSI Persistent Reservations support in QEMU is built on the
> > > > > > qemu-pr-helper daemon that performs PERSISTENT RESERVATION IN
> > > > > > and
> > > > > > PERSISTENT RESERVATION OUT commands on behalf of the guest.
> > > > > > The
> > > > > > qemu-pr-helper process provides privilege separation for
> > > > > > ioctl(SG_IO)'s
> > > > > > CAP_SYS_RAWIO and libmpathpersist's root privileges since the
> > > > > > main QEMU
> > > > > > process should not have those privileges.
> > > > > >
> > > > > > There are issues with the current approach:
> > > > > > - Privileged code is a security attack surface.
> > > > > > - A bunch of code is required for privilege separation and
> > > > > > for management
> > > > > > tools to set up qemu-pr-helper with access to multipathd.
> > > > > > - The interface is SCSI-specific and does not support NVMe.
> > > > > >
> > > > > > Several of us have pondered a different approach that I will
> > > > > > summarize
> > > > > > here. The <linux/pr.h> ioctl interface provides an
> > > > > > alternative to
> > > > > > ioctl(SG_IO) without the CAP_SYS_RAWIO requirement. It
> > > > > > supports both
> > > > > > SCSI and NVMe. Since privileges are not required, there would
> > > > > > be no need
> > > > > > for the qemu-pr-helper daemon anymore.
> > > > > >
> > > > > > The blocker is that <linux/pr.h> is not usable in multipath
> > > > > > environments. The Linux DM-Multipath driver has an incomplete
> > > > > > ioctl
> > > > > > implementation that falls short of what libmpathpersist and
> > > > > > multipathd
> > > > > > do in userspace. Kernel changes are necessary to fix this.
> > > > > >
> > > > > > My suggestion is to implement <linux/pr.h> via upcalls from
> > > > > > DM-Multipath
> > > > > > to multipathd. That way applications like QEMU can
> > > > > > consistently use
> > > > > > <linux/pr.h> across block device types and no longer have to
> > > > > > go through
> > > > > > the privileged libmpathpersist interface.
> > > > >
> > > > > This would take intercepting the pr commands to multipath
> > > > > devices right
> > > > > at the start of dm_call_pr(). In order to make some persistent
> > > > > reservation commands seem atomic, libmpathpersist needs to
> > > > > suspend the
> > > > > multipath device in certain situations. So device-mapper cannot
> > > > > call
> > > > > dm_get_live_table(), since this will block suspends. This
> > > > > should be o.k.
> > > > > Libmpathpersist is designed to handle the possibility that the
> > > > > multipath
> > > > > device gets reloaded with different paths while it is running.
> > > > > And since
> > > > > the multipath target is an immutable singleton target, there is
> > > > > no
> > > > > possibility of it turning into another target type because of a
> > > > > table
> > > > > reload during suspend.
> > > > >
> > > > > Also, just to clarify, the kernel code can't interface directly
> > > > > with
> > > > > multipathd. Most of the code for handling persistent
> > > > > reservations is in
> > > > > libmpathpersist, which just needs multipathd to do things like
> > > > > make sure
> > > > > that paths that are added in the future get registered
> > > > > properly. There
> > > > > would likely need to be some new program (that is just a thin
> > > > > wrapper
> > > > > around libmpathpersist) which can be called with
> > > > > call_usermodehelper().
> > >
> > > Adding Martin Wilck, since he will also be looking at these
> > > changes.
> > >
> > > > Hi everyone,
> > > > I'm starting to work on the DM-Multipath changes. Some more
> > > > details on
> > > > how I am approaching this:
> > > >
> > > > - multipath-tools will create multipath device-mapper targets
> > > > with a new
> > > > ctr argument (pr_netlink) when this feature is enabled. When
> > > > the
> > > > feature is disabled, everything remains backwards compatible.
> > > > With the
> > > > pr_netlink ctr argument, the multipath target sends a netlink
> > > > multicast group notification instead of handling PR operations
> > > > (e.g.
> > > > IOC_PR_* ioctls) in the kernel.
> > > >
> > > > - There will be a new program in multipath-tools called
> > > > mpathpersistd
> > > > that listens on the netlink multicast group for notifications.
> > > > The
> > > > notification tells it which multipath device has a pending PR
> > > > operation. It fetches the PR operation parameters by sending a
> > > > netlink
> > > > message, performs the persistent reservation operation via
> > > > libmpathpersist, and then sends a response to the kernel via
> > > > another
> > > > netlink message.
> > > >
> > > > - The multipath device-mapper target completes the PR operation
> > > > upon
> > > > receiving the netlink response.
> > > >
> > > > I ended up choosing netlink because call_usermodehelper() seems
> > > > less
> > > > appropriate for an operation triggered by untrusted userspace
> > > > processes.
> > > >
> > > > Your input is welcome. Let me know if a different approach would
> > > > be
> > > > better.
> > >
> > > Is the netlink interface going to be a generic persistent
> > > reservation
> > > upcall interface, or is this just for dm multipath? I'm not sure if
> > > there would ever be another user, and I don't have enough
> > > experience
> > > with the netlink code to know how ugly it might be to route
> > > communications from different kernel drivers to different userspace
> > > daemons through the same generic netlink family. But if there's not
> > > much extra complexity in building a generic interface, it seems
> > > like
> > > it would be preferable to a multipath specific one.
> >
> > It can be generic. The messages will contain the block device
> > major:minor as well as information to describe <linux/pr.h> requests.
>
> So the ioctls will pass through qemu into the kernel, to be intercepted
> by the dm-mpath driver, which will use an upcall to have them handled
> by mpathpersistd (for the actual command) and multipathd (for the path
> registrations).
>
> I don't fully understand the advantage, security and complexity-wise,
> of this concept, compared to intercepting them in qemu and using a socket
> to talk to mpathpersistd directly. If we did this, we could even
> support both generic and SCSI PR commands.
Hi Martin,
The simplification and security benefits are on the application side,
not on the DM-Multipath side, so I can see what you're getting at. From
the DM-Multipath perspective things get a little more complex.
From an application perspective, a single API that works across block
device types (SCSI, NVMe, DM-Multipath) and requires no privileges or
sockets (they are a pain in container environments) is the most
convenient. The <linux/pr.h> ioctl API offers exactly this.
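As a rough illustration of that API, the sketch below issues IOC_PR_REGISTER and IOC_PR_RESERVE from Python. The struct layouts and ioctl numbers mirror <linux/pr.h>; the helper names (pr_register, pr_reserve) are made up for this example, and the request encoding assumes the asm-generic ioctl layout used on x86 and arm64 (some architectures encode the direction bits differently).

```python
import fcntl
import struct

# asm-generic ioctl encoding: dir (2 bits) | size (14) | type (8) | nr (8).
_IOC_WRITE = 1


def _iow(type_char: str, nr: int, size: int) -> int:
    """Equivalent of the C _IOW() macro under the asm-generic layout."""
    return (_IOC_WRITE << 30) | (size << 16) | (ord(type_char) << 8) | nr


# struct pr_registration { __u64 old_key; __u64 new_key; __u32 flags; __u32 __pad; }
PR_REGISTRATION = struct.Struct("=QQII")
# struct pr_reservation { __u64 key; __u32 type; __u32 flags; }
PR_RESERVATION = struct.Struct("=QII")

IOC_PR_REGISTER = _iow("p", 200, PR_REGISTRATION.size)
IOC_PR_RESERVE = _iow("p", 201, PR_RESERVATION.size)
IOC_PR_RELEASE = _iow("p", 202, PR_RESERVATION.size)

PR_WRITE_EXCLUSIVE = 1  # first member of enum pr_type


def pr_register(fd: int, old_key: int, new_key: int, flags: int = 0) -> None:
    """Register new_key on the block device behind fd (old_key=0 for a new registration)."""
    fcntl.ioctl(fd, IOC_PR_REGISTER,
                PR_REGISTRATION.pack(old_key, new_key, flags, 0))


def pr_reserve(fd: int, key: int, pr_type: int = PR_WRITE_EXCLUSIVE,
               flags: int = 0) -> None:
    """Take a reservation of pr_type using an already registered key."""
    fcntl.ioctl(fd, IOC_PR_RESERVE, PR_RESERVATION.pack(key, pr_type, flags))
```

The same two calls are meant to work unchanged on a SCSI disk or an NVMe namespace; the caller only needs a suitable file descriptor for the block device and no SG_IO access.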
Unfortunately, DM-Multipath currently does not fully support
<linux/pr.h>. It sends PR operations down each path, but that is only a
subset of libmpathpersist's logic and multipathd is not kept in sync.
My impression is that libmpathpersist and multipathd logic cannot be
easily moved into the kernel. This is where the upcall idea comes from.
Let's notify multipath-tools from DM-Multipath so it can do its work in
userspace.
Getting back to the application vs DM-Multipath advantages: I think it's
worth simplifying things for applications because there are many
applications and only one DM-Multipath.
Thanks,
Stefan
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-04 18:32 ` Stefan Hajnoczi
@ 2026-02-04 23:57 ` Hannes Reinecke
2026-02-05 1:03 ` Benjamin Marzinski
2026-02-05 11:52 ` Martin Wilck
1 sibling, 1 reply; 25+ messages in thread
From: Hannes Reinecke @ 2026-02-04 23:57 UTC (permalink / raw)
To: Stefan Hajnoczi, Martin Wilck
Cc: Benjamin Marzinski, Paolo Bonzini, qemu-block, Kevin Wolf, afaria,
qemu-devel, Mikulas Patocka
On 2/4/26 19:32, Stefan Hajnoczi wrote:
> On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
>> Hi Stefan,
>>
>> On Tue, 2026-02-03 at 13:04 -0500, Stefan Hajnoczi wrote:
[ .. ]
>>>
>>> It can be generic. The messages will contain the block device
>>> major:minor as well as information to describe <linux/pr.h> requests.
>>
>> So the ioctls will pass through qemu into the kernel, to be intercepted
>> by the dm-mpath driver, which will use an upcall to have them handled
>> by mpathpersistd (for the actual command) and multipathd (for the path
>> registrations).
>>
>> I don't fully understand the advantage, security and complexity-wise,
>> of this concept, compared to intercepting them in qemu and using a socket
>> to talk to mpathpersistd directly. If we did this, we could even
>> support both generic and SCSI PR commands.
>
> Hi Martin,
> The simplification and security benefits are on the application side,
> not on the DM-Multipath side, so I can see what you're getting at. From
> the DM-Multipath perspective things get a little more complex.
>
> From an application perspective, a single API that works across block
> device types (SCSI, NVMe, DM-Multipath) and requires no privileges or
> sockets (they are a pain in container environments) is the most
> convenient. The <linux/pr.h> ioctl API offers exactly this.
>
> Unfortunately, DM-Multipath currently does not fully support
> <linux/pr.h>. It sends PR operations down each path, but that is only a
> subset of libmpathpersist's logic and multipathd is not kept in sync.
>
> My impression is that libmpathpersist and multipathd logic cannot be
> easily moved into the kernel. This is where the upcall idea comes from.
> Let's notify multipath-tools from DM-Multipath so it can do its work in
> userspace.
>
It _might_ be possible by extending the current path-switching
code in the kernel to keep track of PRs. Then we could move the
registration upon path switching, and (ideally) could do away
with upcalls.
Not sure, though, how targets react when having to deal with a
flood of PR commands ...
But maybe worth a try.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-04 23:57 ` Hannes Reinecke
@ 2026-02-05 1:03 ` Benjamin Marzinski
2026-02-05 10:20 ` Martin Wilck
0 siblings, 1 reply; 25+ messages in thread
From: Benjamin Marzinski @ 2026-02-05 1:03 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Stefan Hajnoczi, Martin Wilck, Paolo Bonzini, qemu-block,
Kevin Wolf, afaria, qemu-devel, Mikulas Patocka
On Thu, Feb 05, 2026 at 12:57:38AM +0100, Hannes Reinecke wrote:
> On 2/4/26 19:32, Stefan Hajnoczi wrote:
> > On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
> > > Hi Stefan,
> > >
> > > On Tue, 2026-02-03 at 13:04 -0500, Stefan Hajnoczi wrote:
> [ .. ]
> >>>
> > > > It can be generic. The messages will contain the block device
> > > > major:minor as well as information to describe <linux/pr.h> requests.
> > >
> > > So the ioctls will pass through qemu into the kernel, to be intercepted
> > > by the dm-mpath driver, which will use an upcall to have them handled
> > > by mpathpersistd (for the actual command) and multipathd (for the path
> > > registrations).
> > >
> > > I don't fully understand the advantage, security and complexity-wise,
> > > of this concept, compared to intercepting them in qemu and using a socket
> > > to talk to mpathpersistd directly. If we did this, we could even
> > > support both generic and SCSI PR commands.
> >
> > Hi Martin,
> > The simplification and security benefits are on the application side,
> > not on the DM-Multipath side, so I can see what you're getting at. From
> > the DM-Multipath perspective things get a little more complex.
> >
> > From an application perspective, a single API that works across block
> > device types (SCSI, NVMe, DM-Multipath) and requires no privileges or
> > sockets (they are a pain in container environments) is the most
> > convenient. The <linux/pr.h> ioctl API offers exactly this.
> >
> > Unfortunately, DM-Multipath currently does not fully support
> > <linux/pr.h>. It sends PR operations down each path, but that is only a
> > subset of libmpathpersist's logic and multipathd is not kept in sync.
> >
> > My impression is that libmpathpersist and multipathd logic cannot be
> > easily moved into the kernel. This is where the upcall idea comes from.
> > Let's notify multipath-tools from DM-Multipath so it can do its work in
> > userspace.
> >
> It _might_ be possible by extending the current path-switching
> code in the kernel to keep track of PRs. Then we could move the
> registration upon path switching, and (ideally) could do away
> with upcalls.
> Not sure, though, how targets react when having to deal with a
> flood of PR commands ...
> But maybe worth a try.
Making a multipath device pretend to be a single Persistently Reservable
device involves a lot of ugly workarounds that I'm not really excited to
see in the kernel.
For instance, every time a new path appears or a path that was down when
the device was registered comes up, multipath needs to register that
path. But a preempt could come in while it is doing this (or indeed any
time after multipath registered the other paths). So it has to check
that the registrations are still there on the other paths before
registering the new path, and then check again afterwards to make sure
that there wasn't a preempt during the registration.
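The check, register, check-again sequence can be sketched against an in-memory model. This is only an illustration of the ordering described above, not libmpathpersist's actual code: FakeTarget, register_new_path and the race hook are invented for the example, and undoing the new registration when a preempt is detected is an assumption about the recovery step.

```python
class FakeTarget:
    """In-memory stand-in for a storage target's registration table."""

    def __init__(self):
        self.registrations = {}  # path name -> registered key

    def register(self, path, key):
        self.registrations[path] = key

    def preempt(self, key):
        """Model another host preempting: drop every registration of key."""
        self.registrations = {p: k for p, k in self.registrations.items()
                              if k != key}


def register_new_path(target, new_path, other_paths, key, race=None):
    """Register new_path only if the other paths keep their registration.

    race is a test hook that runs between the first check and the
    registration, to simulate a preempt arriving at the worst moment.
    Returns True if new_path ended up validly registered.
    """
    def others_still_registered():
        return all(target.registrations.get(p) == key for p in other_paths)

    if not others_still_registered():
        return False                 # already preempted, nothing to do
    if race is not None:
        race()
    target.register(new_path, key)
    if not others_still_registered():
        # A preempt slipped in: the new registration is stale, so undo it.
        target.registrations.pop(new_path, None)
        return False
    return True
```

Without the second check, the path that came up late would keep a registration that the preempting host believes it has removed.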
Worse, you can't release a reservation from a path that is down. If
multipath needs to release its reservation, and the path that is holding
it is down, the only solution I could come up with is to suspend the
device so no IO happens. Preempt the reservation to move it to an active
path, which wipes the registrations off all the other paths. Then
reregister all the active paths again, and unsuspend the device.
The failed paths will get reregistered as they come back up.
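The release workaround can be modeled the same way. The sketch below is a toy model under stated assumptions, not the real implementation: FakeMultipath and its fields are invented, every path is assumed to be registered with the same key, and the point at which the release itself is issued (here, right after the preempt) is a guess, since only the suspend, preempt, reregister and unsuspend steps are described above.

```python
class FakeMultipath:
    """Toy model of one multipath device whose paths share a single key."""

    def __init__(self, paths_up):
        self.paths_up = dict(paths_up)   # path name -> True if usable
        self.registered = set(paths_up)  # paths that hold a registration
        self.holder = None               # path that holds the reservation
        self.suspended = False
        self.steps = []                  # actions taken, in order

    def release(self):
        if self.holder is None:
            return
        if self.paths_up[self.holder]:
            # Easy case: the holding path is usable, release through it.
            self.steps.append(f"release via {self.holder}")
            self.holder = None
            return
        active = [p for p, up in self.paths_up.items() if up]
        if not active:
            raise RuntimeError("no usable path to release the reservation")
        self.suspended = True            # no I/O while registrations are in flux
        self.steps.append("suspend")
        try:
            new_holder = active[0]
            # Preempting our own key moves the reservation to new_holder and
            # wipes the registrations on all the other paths.
            self.holder = new_holder
            self.registered = {new_holder}
            self.steps.append(f"preempt via {new_holder}")
            # Assumed ordering: release as soon as a usable path holds it.
            self.steps.append(f"release via {new_holder}")
            self.holder = None
            self.registered.update(active)
            self.steps.append("reregister active paths")
        finally:
            self.suspended = False
            self.steps.append("resume")
```

Paths that are still down stay unregistered in this model; they would be picked up again by the register-on-path-up logic when they return.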
And there's more cases like these. They are, of course, just as doable
in the kernel as in userspace, but it's a lot of persistent reservation
code to put into the multipath target.
-Ben
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke Kernel Storage Architect
> hare@suse.de +49 911 74053 688
> SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
> HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-05 1:03 ` Benjamin Marzinski
@ 2026-02-05 10:20 ` Martin Wilck
0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2026-02-05 10:20 UTC (permalink / raw)
To: Benjamin Marzinski, Hannes Reinecke
Cc: Stefan Hajnoczi, Paolo Bonzini, qemu-block, Kevin Wolf, afaria,
qemu-devel, Mikulas Patocka
On Wed, 2026-02-04 at 20:03 -0500, Benjamin Marzinski wrote:
> On Thu, Feb 05, 2026 at 12:57:38AM +0100, Hannes Reinecke wrote:
> > On 2/4/26 19:32, Stefan Hajnoczi wrote:
> > > On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
> > > > Hi Stefan,
> > > >
> > > > On Tue, 2026-02-03 at 13:04 -0500, Stefan Hajnoczi wrote:
> > [ .. ]
> > >>>
> > > > > It can be generic. The messages will contain the block device
> > > > > major:minor as well as information to describe <linux/pr.h>
> > > > > requests.
> > > >
> > > > So the ioctls will pass through qemu into the kernel, to be
> > > > intercepted
> > > > by the dm-mpath driver, which will use an upcall to have them
> > > > handled
> > > > by mpathpersistd (for the actual command) and multipathd (for
> > > > the path
> > > > registrations).
> > > >
> > > > I don't fully understand the advantage, security and
> > > > complexity-wise,
> > > > of this concept, compared to intercepting them in qemu and using a
> > > > socket
> > > > to talk to mpathpersistd directly. If we did this, we could
> > > > even
> > > > support both generic and SCSI PR commands.
> > >
> > > Hi Martin,
> > > The simplification and security benefits are on the application
> > > side,
> > > not on the DM-Multipath side, so I can see what you're getting
> > > at. From
> > > the DM-Multipath perspective things get a little more complex.
> > >
> > > From an application perspective, a single API that works across
> > > block
> > > device types (SCSI, NVMe, DM-Multipath) and requires no
> > > privileges or
> > > sockets (they are a pain in container environments) is the most
> > > convenient. The <linux/pr.h> ioctl API offers exactly this.
> > >
> > > Unfortunately, DM-Multipath currently does not fully support
> > > <linux/pr.h>. It sends PR operations down each path, but that is
> > > only a
> > > subset of libmpathpersist's logic and multipathd is not kept in
> > > sync.
> > >
> > > My impression is that libmpathpersist and multipathd logic cannot
> > > be
> > > easily moved into the kernel. This is where the upcall idea comes
> > > from.
> > > Let's notify multipath-tools from DM-Multipath so it can do its
> > > work in
> > > userspace.
> > >
> > It _might_ be possible by extending the current path-switching
> > code in the kernel to keep track of PRs. Then we could move the
> > registration upon path switching, and (ideally) could do away
> > with upcalls.
> > Not sure, though, how targets react when having to deal with a
> > flood of PR commands ...
> > But maybe worth a try.
>
> Making a multipath device pretend to be a single Persistently
> Reservable
> device involves a lot of ugly workarounds that I'm not really excited
> to
> see in the kernel.
>
> For instance, every time a new path appears or a path that was down
> when
> the device was registered comes up, multipath needs to register that
> path. But a preempt could come in while it is doing this (or indeed
> any
> time after multipath registered the other paths). So it has to check
> that the registrations are still there on the other paths before
> registering the new path, and then check again afterwards to make
> sure
> that there wasn't a preempt during the registration.
>
> Worse, you can't release a reservation from a path that is down. If
> multipath needs to release its reservation, and the path that is
> holding
> it is down, the only solution I could come up with is to suspend the
> device so no IO happens. Preempt the reservation to move it to an
> active
> path, which wipes the registrations off all the other paths. Then
> reregister all the active paths again, and unsuspend the device.
> The failed paths will get reregistered as they come back up.
>
> And there's more cases like these. They are, of course, just as
> doable
> in the kernel as in userspace, but it's a lot of persistent
> reservation
> code to put into the multipath target.
Having reviewed (or tried to do so) Ben's code for handling the various
corner cases, I agree that we don't want to start all over with this.
Martin
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-04 18:32 ` Stefan Hajnoczi
2026-02-04 23:57 ` Hannes Reinecke
@ 2026-02-05 11:52 ` Martin Wilck
2026-02-05 12:01 ` Daniel P. Berrangé
2026-02-05 14:28 ` Stefan Hajnoczi
1 sibling, 2 replies; 25+ messages in thread
From: Martin Wilck @ 2026-02-05 11:52 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Benjamin Marzinski, Paolo Bonzini, qemu-block, Kevin Wolf,
Hannes Reinecke, afaria, qemu-devel, Mikulas Patocka
On Wed, 2026-02-04 at 13:32 -0500, Stefan Hajnoczi wrote:
> On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
> > Hi Stefan,
> >
> > So the ioctls will pass through qemu into the kernel, to be
> > intercepted
> > by the dm-mpath driver, which will use an upcall to have them
> > handled
> > by mpathpersistd (for the actual command) and multipathd (for the
> > path
> > registrations).
> >
> > I don't fully understand the advantage, security and complexity-
> > wise,
> > of this concept, compared to intercepting them in qemu and using a
> > socket
> > to talk to mpathpersistd directly. If we did this, we could even
> > support both generic and SCSI PR commands.
>
> Hi Martin,
> The simplification and security benefits are on the application side,
> not on the DM-Multipath side, so I can see what you're getting at.
> From
> the DM-Multipath perspective things get a little more complex.
>
> From an application perspective, a single API that works across block
> device types (SCSI, NVMe, DM-Multipath) and requires no privileges or
> sockets (they are a pain in container environments) is the most
> convenient. The <linux/pr.h> ioctl API offers exactly this.
I may be missing something, but AFAICS the PR ioctls require having a
block device open for writing, which requires either root privileges
or a file descriptor previously opened with privileges and forwarded
to another, less privileged process. No?
> Unfortunately, DM-Multipath currently does not fully support
> <linux/pr.h>. It sends PR operations down each path, but that is only
> a
> subset of libmpathpersist's logic and multipathd is not kept in sync.
>
> My impression is that libmpathpersist and multipathd logic cannot be
> easily moved into the kernel. This is where the upcall idea comes
> from.
> Let's notify multipath-tools from DM-Multipath so it can do its work
> in
> userspace.
I agree.
> Getting back to the application vs DM-Multipath advantages: I think
> it's
> worth simplifying things for applications because there are many
> applications and only one DM-Multipath.
TBH, I don't see so many applications. Actually I am having trouble
finding any application at all that uses the generic linux PR
functionality. I haven't even found a basic command line tool that
encapsulates the ioctls, are you aware of one? That would be the first
thing we need, be it only for testing the kernel.
As for applications using SCSI/NVMe PRs, I also don't see many, at
least not in the Linux / open source realm. Actually, qemu is the only
one that immediately comes to my mind. I can imagine that storage
management tools for e.g. OpenStack or Kubernetes would want to use
PRs, but I don't know any details.
Wrt sockets, I'm not sure what's so painful about them. multipathd
recently enabled a pathname socket for qemu-pr-helper in KubeVirt [1].
Thanks,
Martin
[1] https://github.com/opensvc/multipath-tools/issues/111
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-05 11:52 ` Martin Wilck
@ 2026-02-05 12:01 ` Daniel P. Berrangé
2026-02-05 13:39 ` Stefan Hajnoczi
2026-02-05 14:28 ` Stefan Hajnoczi
1 sibling, 1 reply; 25+ messages in thread
From: Daniel P. Berrangé @ 2026-02-05 12:01 UTC (permalink / raw)
To: Martin Wilck
Cc: Stefan Hajnoczi, Benjamin Marzinski, Paolo Bonzini, qemu-block,
Kevin Wolf, Hannes Reinecke, afaria, qemu-devel, Mikulas Patocka
On Thu, Feb 05, 2026 at 12:52:33PM +0100, Martin Wilck wrote:
> On Wed, 2026-02-04 at 13:32 -0500, Stefan Hajnoczi wrote:
> > On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
> > > Hi Stefan,
> > >
> > > So the ioctls will pass through qemu into the kernel, to be
> > > intercepted
> > > by the dm-mpath driver, which will use an upcall to have them
> > > handled
> > > by mpathpersistd (for the actual command) and multipathd (for the
> > > path
> > > registrations).
> > >
> > > I don't fully understand the advantage, security and complexity-
> > > wise,
> > > of this concept, compared to intercepting them in qemu and using a
> > > socket
> > > to talk to mpathpersistd directly. If we did this, we could even
> > > support both generic and SCSI PR commands.
> >
> > Hi Martin,
> > The simplification and security benefits are on the application side,
> > not on the DM-Multipath side, so I can see what you're getting at.
> > From
> > the DM-Multipath perspective things get a little more complex.
> >
> > From an application perspective, a single API that works across block
> > device types (SCSI, NVMe, DM-Multipath) and requires no privileges or
> > sockets (they are a pain in container environments) is the most
> > convenient. The <linux/pr.h> ioctl API offers exactly this.
>
> I may be missing something, but AFAICS the PR ioctls require having a
> block device open for writing, which does either require root
> privileges, or some file descriptor previously opened with privileges
> and forwarded to another, less privileged process. No?
While QEMU is run unprivileged, libvirt will grant QEMU access to any
block devices that have been configured for the guest in question. On Linux,
libvirt will create a new /dev tmpfs populated with the allow-list of
device nodes the guest is permitted to access, with suitable file
permissions, ownership & SELinux labels set.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-05 12:01 ` Daniel P. Berrangé
@ 2026-02-05 13:39 ` Stefan Hajnoczi
2026-02-06 0:03 ` Hannes Reinecke
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2026-02-05 13:39 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Martin Wilck, Benjamin Marzinski, Paolo Bonzini, qemu-block,
Kevin Wolf, Hannes Reinecke, afaria, qemu-devel, Mikulas Patocka
On Thu, Feb 05, 2026 at 12:01:13PM +0000, Daniel P. Berrangé wrote:
> On Thu, Feb 05, 2026 at 12:52:33PM +0100, Martin Wilck wrote:
> > On Wed, 2026-02-04 at 13:32 -0500, Stefan Hajnoczi wrote:
> > > On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
> > > > Hi Stefan,
> > > >
> > > > So the ioctls will pass through qemu into the kernel, to be
> > > > intercepted
> > > > by the dm-mpath driver, which will use an upcall to have them
> > > > handled
> > > > by mpathpersistd (for the actual command) and multipathd (for the
> > > > path
> > > > registrations).
> > > >
> > > > I don't fully understand the advantage, security and complexity-
> > > > wise,
> > > > > of this concept, compared to intercepting them in qemu and using a
> > > > socket
> > > > to talk to mpathpersistd directly. If we did this, we could even
> > > > support both generic and SCSI PR commands.
> > >
> > > Hi Martin,
> > > The simplification and security benefits are on the application side,
> > > not on the DM-Multipath side, so I can see what you're getting at.
> > > From
> > > the DM-Multipath perspective things get a little more complex.
> > >
> > > From an application perspective, a single API that works across block
> > > device types (SCSI, NVMe, DM-Multipath) and requires no privileges or
> > > sockets (they are a pain in container environments) is the most
> > > convenient. The <linux/pr.h> ioctl API offers exactly this.
> >
> > I may be missing something, but AFAICS the PR ioctls require having a
> > block device open for writing, which does either require root
> > privileges, or some file descriptor previously opened with privileges
> > and forwarded to another, less privileged process. No?
>
> While QEMU is run unprivileged, libvirt will grant QEMU access any block
> devices that have been configured for the guest in question. On Linux,
> libvirt will create a new /dev tmpfs populated with the allow-list of
> device nodes the guest is permitted to access, with suitable file
> permissions, ownership & SELinux labels set.
Ultimately something does require privileges to give an unprivileged
application access to a block device. That could be udev rules, it could
be libvirt, etc.
I would say the real distinction is between the privileges needed so the
application can access the block device vs the privileges needed to
perform PR operations. If udev or libvirt has set up block device nodes,
an unprivileged application can open them for read/write access. But it
would require CAP_SYS_RAWIO for SG_IO PR operations on top of that
whereas <linux/pr.h> ioctls do not require that.
Therefore there is a real advantage regarding privileges when using
<linux/pr.h> ioctls vs libmpathpersist.
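To make the distinction concrete, here is a minimal sketch of an
unprivileged process registering a key and taking a reservation through the
<linux/pr.h> ioctls on a device node it is allowed to open read/write. The
struct layouts and ioctl numbers mirror <linux/pr.h>; the helper names are
hypothetical, the _IOW() encoding shown is the asm-generic one used on
x86 and arm64, and treating a positive return value as a device-reported
PR status is an assumption of this sketch.

```python
import ctypes
import os

PR_WRITE_EXCLUSIVE = 1  # enum pr_type in <linux/pr.h>


class PrRegistration(ctypes.Structure):
    """Mirrors struct pr_registration."""
    _fields_ = [("old_key", ctypes.c_uint64),
                ("new_key", ctypes.c_uint64),
                ("flags", ctypes.c_uint32),
                ("_pad", ctypes.c_uint32)]


class PrReservation(ctypes.Structure):
    """Mirrors struct pr_reservation."""
    _fields_ = [("key", ctypes.c_uint64),
                ("type", ctypes.c_uint32),
                ("flags", ctypes.c_uint32)]


def _iow(nr, struct_type):
    # _IOW('p', nr, struct) with the asm-generic bit layout.
    return (1 << 30) | (ctypes.sizeof(struct_type) << 16) | (ord("p") << 8) | nr


IOC_PR_REGISTER = _iow(200, PrRegistration)
IOC_PR_RESERVE = _iow(201, PrReservation)

_libc = ctypes.CDLL(None, use_errno=True)


def _pr_ioctl(fd, request, arg):
    ret = _libc.ioctl(fd, ctypes.c_ulong(request), ctypes.byref(arg))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    if ret > 0:
        # Assumption: positive values carry a device-reported PR status.
        raise RuntimeError(f"PR operation failed with status {ret:#x}")


def register_and_reserve(fd, key):
    """Register `key`, then take a write-exclusive reservation with it.

    `fd` must have been opened read/write; no CAP_SYS_RAWIO is involved.
    """
    _pr_ioctl(fd, IOC_PR_REGISTER, PrRegistration(old_key=0, new_key=key))
    _pr_ioctl(fd, IOC_PR_RESERVE,
              PrReservation(key=key, type=PR_WRITE_EXCLUSIVE))
```

A caller would do something like fd = os.open("/dev/sdX", os.O_RDWR)
followed by register_and_reserve(fd, 0xabc123), where the device path and
key are placeholders.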
Stefan
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-05 11:52 ` Martin Wilck
2026-02-05 12:01 ` Daniel P. Berrangé
@ 2026-02-05 14:28 ` Stefan Hajnoczi
1 sibling, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2026-02-05 14:28 UTC (permalink / raw)
To: Martin Wilck
Cc: Benjamin Marzinski, Paolo Bonzini, qemu-block, Kevin Wolf,
Hannes Reinecke, afaria, qemu-devel, Mikulas Patocka
On Thu, Feb 05, 2026 at 12:52:33PM +0100, Martin Wilck wrote:
> On Wed, 2026-02-04 at 13:32 -0500, Stefan Hajnoczi wrote:
> > On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
> > Getting back to the application vs DM-Multipath advantages: I think
> > it's
> > worth simplifying things for applications because there are many
> > applications and only one DM-Multipath.
>
> TBH, I don't see so many applications. Actually I am having trouble
> finding any application at all that uses the generic linux PR
> functionality. I haven't even found a basic command line tool that
> encapsulates the ioctls, are you aware of one? That would be the first
> thing we need, be it only for testing the kernel.
blkpr(8) is part of util-linux:
https://github.com/util-linux/util-linux/blob/master/sys-utils/blkpr.c
> As for applications using SCSI/NVMe PRs, I also don't see many, at
> least not in the Linux / open source realm. Actually, qemu is the only
> one that immediately comes to my mind. I can imagine that storage
> management tools for e.g. OpenStack or Kubernetes would want to use
> PRs, but I don't know any details.
HA/clustering frameworks as well.
> Wrt sockets, not sure what's so painful about them. multipathd recently
> enabled a pathname sockets for qemu-pr-helper in KubeVirt [1].
This is a good example. KubeVirt has code to:
- Create a pr-helper container image with qemu-pr-helper,
libmpathpersist, and a multipath.conf file.
- Run the pr-helper container with the host's multipathd socket and
/etc/multipath directory passed through.
- Run a multipath-monitor daemon that remounts the multipathd UNIX
domain socket bind mount when it changes (multipathd restart?).
And that's just at the KubeVirt level. Libvirt and QEMU also have code
to make this possible.
That is a lot of setup! All of this goes away if QEMU can use
<linux/pr.h> ioctls instead of libmpathpersist. It's just not needed.
Stefan
>
> Thanks,
> Martin
>
> [1] https://github.com/opensvc/multipath-tools/issues/111
>
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-05 13:39 ` Stefan Hajnoczi
@ 2026-02-06 0:03 ` Hannes Reinecke
2026-02-06 14:08 ` Stefan Hajnoczi
0 siblings, 1 reply; 25+ messages in thread
From: Hannes Reinecke @ 2026-02-06 0:03 UTC (permalink / raw)
To: Stefan Hajnoczi, Daniel P. Berrangé
Cc: Martin Wilck, Benjamin Marzinski, Paolo Bonzini, qemu-block,
Kevin Wolf, afaria, qemu-devel, Mikulas Patocka
On 2/5/26 14:39, Stefan Hajnoczi wrote:
> On Thu, Feb 05, 2026 at 12:01:13PM +0000, Daniel P. Berrangé wrote:
>> On Thu, Feb 05, 2026 at 12:52:33PM +0100, Martin Wilck wrote:
>>> On Wed, 2026-02-04 at 13:32 -0500, Stefan Hajnoczi wrote:
>>>> On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> So the ioctls will pass through qemu into the kernel, to be
>>>>> intercepted
>>>>> by the dm-mpath driver, which will use an upcall to have them
>>>>> handled
>>>>> by mpathpersistd (for the actual command) and multipathd (for the
>>>>> path
>>>>> registrations).
>>>>>
>>>>> I don't fully understand the advantage, security and complexity-
>>>>> wise,
>>>>> of this concept, compared to intercepting them in qemu and using a
>>>>> socket
>>>>> to talk to mpathpersistd directly. If we did this, we could even
>>>>> support both generic and SCSI PR commands.
>>>>
>>>> Hi Martin,
>>>> The simplification and security benefits are on the application side,
>>>> not on the DM-Multipath side, so I can see what you're getting at.
>>>> From
>>>> the DM-Multipath perspective things get a little more complex.
>>>>
>>>> From an application perspective, a single API that works across block
>>>> device types (SCSI, NVMe, DM-Multipath) and requires no privileges or
>>>> sockets (they are a pain in container environments) is the most
>>>> convenient. The <linux/pr.h> ioctl API offers exactly this.
>>>
>>> I may be missing something, but AFAICS the PR ioctls require having a
>>> block device open for writing, which does either require root
>>> privileges, or some file descriptor previously opened with privileges
>>> and forwarded to another, less privileged process. No?
>>
>> While QEMU is run unprivileged, libvirt will grant QEMU access any block
>> devices that have been configured for the guest in question. On Linux,
>> libvirt will create a new /dev tmpfs populated with the allow-list of
>> device nodes the guest is permitted to access, with suitable file
>> permissions, ownership & SELinux labels set.
>
> Ultimately something does require privileges to give an unprivileged
> application access to a block device. That could be udev rules, it could
> be libvirt, etc.
>
> I would say the real distinction is between the privileges needed so the
> application can access the block device vs the privileges needed to
> perform PR operations. If udev or libvirt has set up block device nodes,
> an unprivileged application can open them for read/write access. But it
> would require CAP_SYS_RAWIO for SG_IO PR operations on top of that
> whereas <linux/pr.h> ioctls do not require that.
>
That would make sense, but unfortunately READ KEYS (and READ
RESERVATIONS) require the same privileges as the other
blkpr functions. Might be an idea to change that, though.
> Therefore there is a real advantages regarding privileges when using
> <linux/pr.h> ioctls vs libmpathpersist.
>
There is actually a strong argument here, namely that blkpr is
device-independent. When using it, it doesn't matter whether
you are accessing a SCSI device or an NVMe device; blkpr will
work in both cases.
As such it's a far better choice for generic frameworks
like qemu.
For multipath less so, I agree, as that is pretty much SCSI-only.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-06 0:03 ` Hannes Reinecke
@ 2026-02-06 14:08 ` Stefan Hajnoczi
2026-02-09 12:50 ` Hannes Reinecke
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2026-02-06 14:08 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Daniel P. Berrangé, Martin Wilck, Benjamin Marzinski,
Paolo Bonzini, qemu-block, Kevin Wolf, afaria, qemu-devel,
Mikulas Patocka
On Fri, Feb 06, 2026 at 01:03:18AM +0100, Hannes Reinecke wrote:
> On 2/5/26 14:39, Stefan Hajnoczi wrote:
> > On Thu, Feb 05, 2026 at 12:01:13PM +0000, Daniel P. Berrangé wrote:
> > > On Thu, Feb 05, 2026 at 12:52:33PM +0100, Martin Wilck wrote:
> > > > On Wed, 2026-02-04 at 13:32 -0500, Stefan Hajnoczi wrote:
> > > > > On Wed, Feb 04, 2026 at 02:19:48PM +0100, Martin Wilck wrote:
> > > > > > Hi Stefan,
> > > > > >
> > > > > > So the ioctls will pass through qemu into the kernel, to be
> > > > > > intercepted
> > > > > > by the dm-mpath driver, which will use an upcall to have them
> > > > > > handled
> > > > > > by mpathpersistd (for the actual command) and multipathd (for the
> > > > > > path
> > > > > > registrations).
> > > > > >
> > > > > > I don't fully understand the advantage, security and complexity-
> > > > > > wise,
> > > > > > > of this concept, compared to intercepting them in qemu and using a
> > > > > > socket
> > > > > > to talk to mpathpersistd directly. If we did this, we could even
> > > > > > support both generic and SCSI PR commands.
> > > > >
> > > > > Hi Martin,
> > > > > The simplification and security benefits are on the application side,
> > > > > not on the DM-Multipath side, so I can see what you're getting at.
> > > > > From
> > > > > the DM-Multipath perspective things get a little more complex.
> > > > >
> > > > > From an application perspective, a single API that works across block
> > > > > device types (SCSI, NVMe, DM-Multipath) and requires no privileges or
> > > > > sockets (they are a pain in container environments) is the most
> > > > > convenient. The <linux/pr.h> ioctl API offers exactly this.
> > > >
> > > > I may be missing something, but AFAICS the PR ioctls require having a
> > > > block device open for writing, which does either require root
> > > > privileges, or some file descriptor previously opened with privileges
> > > > and forwarded to another, less privileged process. No?
> > >
> > > While QEMU is run unprivileged, libvirt will grant QEMU access any block
> > > devices that have been configured for the guest in question. On Linux,
> > > libvirt will create a new /dev tmpfs populated with the allow-list of
> > > device nodes the guest is permitted to access, with suitable file
> > > permissions, ownership & SELinux labels set.
> >
> > Ultimately something does require privileges to give an unprivileged
> > application access to a block device. That could be udev rules, it could
> > be libvirt, etc.
> >
> > I would say the real distinction is between the privileges needed so the
> > application can access the block device vs the privileges needed to
> > perform PR operations. If udev or libvirt has set up block device nodes,
> > an unprivileged application can open them for read/write access. But it
> > would require CAP_SYS_RAWIO for SG_IO PR operations on top of that
> > whereas <linux/pr.h> ioctls do not require that.
> >
> That would make sense, but unfortunately READ KEYS (and READ
> RESERVATIONS) requires the same privileges than the other
> blkpr functions. Might be an idea to change that, though.
Making sure I understand:
blkdev_pr_read_keys() and blkdev_pr_read_reservation() in Linux
block/ioctl.c should be adjusted to allow not just BLK_OPEN_WRITE but
also BLK_OPEN_READ?
I think it's okay from a security perspective: if the application can
already read the entire disk, then it's okay for it to read the keys and
reservation information. But I'm not 100% sure...
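To spell out what is being proposed, here is a small model of the open-mode
rule as discussed in this thread. It is a sketch of the policy only, not
the kernel code: the function and operation names are hypothetical, and
other checks the kernel may apply are deliberately left out.

```python
READ_OPS = {"read_keys", "read_reservation"}
WRITE_OPS = {"register", "reserve", "release",
             "preempt", "preempt_abort", "clear"}


def pr_op_allowed(op, open_read, open_write, relaxed_reads):
    """Return True if `op` would be permitted for this open mode.

    relaxed_reads=False models the behaviour described in this thread
    today: every PR operation needs the device opened for writing.
    relaxed_reads=True models the proposal: the two read-only operations
    are also allowed on a descriptor opened just for reading.
    """
    if op not in READ_OPS | WRITE_OPS:
        raise ValueError(f"unknown PR operation: {op}")
    if open_write:
        return True
    return relaxed_reads and open_read and op in READ_OPS
```

With relaxed_reads=True a read-only descriptor can query keys and the
reservation, while anything that changes reservation state still needs
write access.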
In any case, I think this idea is orthogonal to the discussion about
DM-Multipath <linux/pr.h> support. In terms of permission requirements,
<linux/pr.h> already requires fewer permissions (just opening the block
device for write) today than libmpathpersist or ioctl(SG_IO). Or do you
see a scenario where the application wants to open the block device as
root (or CAP_SYS_RAWIO) but read-only, so requiring opening the device
with write permissions is a blocker?
Stefan
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-06 14:08 ` Stefan Hajnoczi
@ 2026-02-09 12:50 ` Hannes Reinecke
2026-02-09 14:23 ` Stefan Hajnoczi
0 siblings, 1 reply; 25+ messages in thread
From: Hannes Reinecke @ 2026-02-09 12:50 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Daniel P. Berrangé, Martin Wilck, Benjamin Marzinski,
Paolo Bonzini, qemu-block, Kevin Wolf, afaria, qemu-devel,
Mikulas Patocka
On 2/6/26 15:08, Stefan Hajnoczi wrote:
> On Fri, Feb 06, 2026 at 01:03:18AM +0100, Hannes Reinecke wrote:
>> On 2/5/26 14:39, Stefan Hajnoczi wrote:
[ .. ]
>> That would make sense, but unfortunately READ KEYS (and READ
>> RESERVATIONS) requires the same privileges than the other
>> blkpr functions. Might be an idea to change that, though.
>
> Making sure I understand:
>
> blkdev_pr_read_keys() and blkdev_pr_read_reservation() in Linux
> block/ioctl.c should be adjusted to allow not just BLK_OPEN_WRITE but
> also BLK_OPEN_READ?
>
Yes.
> I think it's okay from a security perspective: if the application can
> already read the entire disk, then it's okay for it to read the keys and
> reservation information. But I'm not 100% sure...
>
> In any case, I think this idea is orthogonal to the discussion about
> DM-Multipath <linux/pr.h> support. In terms of permission requirements,
> <linux/pr.h> already requires fewer permissions (just opening the block
> device for write) today than libmpathpersist or ioctl(SG_IO). Or do you
> see a scenario where the application wants to open the block device as
> root (or CAP_SYS_RAWIO) but read-only, so requiring opening the device
> with write permissions is a blocker?
>
My concern with opening the block device with BLK_OPEN_WRITE is that
this will trigger udev to 'synthesize' (ie regenerate) an 'add' event
on close, causing 'interesting' effects as this will cascade down
through the udev rule chain, triggering blkid, partition scan, you
name it.
Horrible, horrible, horrible.
Don't do it.
Especially not as you are only interested in reading information,
and not changing the disk state in any way.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-09 12:50 ` Hannes Reinecke
@ 2026-02-09 14:23 ` Stefan Hajnoczi
2026-02-10 10:23 ` Martin Wilck
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2026-02-09 14:23 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Daniel P. Berrangé, Martin Wilck, Benjamin Marzinski,
Paolo Bonzini, qemu-block, Kevin Wolf, afaria, qemu-devel,
Mikulas Patocka
On Mon, Feb 09, 2026 at 01:50:00PM +0100, Hannes Reinecke wrote:
> On 2/6/26 15:08, Stefan Hajnoczi wrote:
> > On Fri, Feb 06, 2026 at 01:03:18AM +0100, Hannes Reinecke wrote:
> > > On 2/5/26 14:39, Stefan Hajnoczi wrote:
> [ .. ]
> > > That would make sense, but unfortunately READ KEYS (and READ
> > > RESERVATIONS) requires the same privileges than the other
> > > blkpr functions. Might be an idea to change that, though.
> >
> > Making sure I understand:
> >
> > blkdev_pr_read_keys() and blkdev_pr_read_reservation() in Linux
> > block/ioctl.c should be adjusted to allow not just BLK_OPEN_WRITE but
> > also BLK_OPEN_READ?
> >
> Yes.
>
> > I think it's okay from a security perspective: if the application can
> > already read the entire disk, then it's okay for it to read the keys and
> > reservation information. But I'm not 100% sure...
> >
> > In any case, I think this idea is orthogonal to the discussion about
> > DM-Multipath <linux/pr.h> support. In terms of permission requirements,
> > <linux/pr.h> already requires fewer permissions (just opening the block
> > device for write) today than libmpathpersist or ioctl(SG_IO). Or do you
> > see a scenario where the application wants to open the block device as
> > root (or CAP_SYS_RAWIO) but read-only, so requiring opening the device
> > with write permissions is a blocker?
> >
>
> My concern with opening the block device with BLK_OPEN_WRITE is that
> this will trigger udev to 'synthesize' (ie regenerate) an 'add' event
> on close, causing 'interesting' effects as this will cascade down
> through the udev rule chain, triggering blkid, partition scan, you
> name it.
> Horrible, horrible, horrible.
> Don't do it.
>
> Especially not as you are only interested in reading information,
> and not changing the disk state in any way.
I see. I will send patches to change blkdev_pr_read_keys() and
blkdev_pr_read_reservation() to require only BLK_OPEN_READ.
Stefan
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-09 14:23 ` Stefan Hajnoczi
@ 2026-02-10 10:23 ` Martin Wilck
2026-02-10 13:59 ` Stefan Hajnoczi
0 siblings, 1 reply; 25+ messages in thread
From: Martin Wilck @ 2026-02-10 10:23 UTC (permalink / raw)
To: Stefan Hajnoczi, Hannes Reinecke
Cc: Daniel P. Berrangé, Benjamin Marzinski, Paolo Bonzini,
qemu-block, Kevin Wolf, afaria, qemu-devel, Mikulas Patocka
On Mon, 2026-02-09 at 09:23 -0500, Stefan Hajnoczi wrote:
> On Mon, Feb 09, 2026 at 01:50:00PM +0100, Hannes Reinecke wrote:
> >
> > My concern with opening the block device with BLK_OPEN_WRITE is
> > that
> > this will trigger udev to 'synthesize' (ie regenerate) an 'add'
> > event
> > on close, causing 'interesting' effects as this will cascade down
> > through the udev rule chain, triggering blkid, partition scan, you
> > name it.
> > Horrible, horrible, horrible.
> > Don't do it.
> >
> > Especially not as you are only interested in reading information,
> > and not changing the disk state in any way.
>
> I see. I will send patches to change blkdev_pr_read_keys()
> blkdev_pr_read_reservation() to require only BLK_OPEN_READ.
Hm, but then if you need to make an actual reservation and call e.g.
blkdev_pr_reserve(), you'll need to re-open the device r/w, and have
the same problem. I'm not sure I understand the issue here ... doesn't
your process have an open fd to the device in question anyway? You just
need to be sure not to _close_ it, as that's what causes the uevents
(udev uses an IN_CLOSE_WRITE inotify(7) event on the device node).
Another option would be to temporarily disable the udev "watch"
property for the device(s) in question.
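The inotify behaviour itself is easy to observe without touching a block
device. The following sketch watches a regular temporary file and reports
which close event the kernel queues for a write-mode open versus a
read-only open. It only demonstrates the IN_CLOSE_WRITE mechanism mentioned
above; udev's reaction to the event is not reproduced, and the helper names
are hypothetical.

```python
import ctypes
import os
import struct
import tempfile

IN_CLOSE_WRITE = 0x00000008    # values from <sys/inotify.h>
IN_CLOSE_NOWRITE = 0x00000010

_libc = ctypes.CDLL(None, use_errno=True)


def close_event_mask(path, open_flags):
    """Open and close `path` with `open_flags`; return the close events seen."""
    ifd = _libc.inotify_init1(os.O_NONBLOCK)
    if ifd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    try:
        wd = _libc.inotify_add_watch(
            ifd, os.fsencode(path), IN_CLOSE_WRITE | IN_CLOSE_NOWRITE)
        if wd < 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))

        os.close(os.open(path, open_flags))   # the open/close under test

        try:
            data = os.read(ifd, 4096)
        except BlockingIOError:
            return 0                           # no event was queued
        mask, offset = 0, 0
        # struct inotify_event: int wd; u32 mask; u32 cookie; u32 len; name[]
        while offset + 16 <= len(data):
            _wd, ev_mask, _cookie, name_len = struct.unpack_from(
                "iIII", data, offset)
            mask |= ev_mask
            offset += 16 + name_len
        return mask
    finally:
        os.close(ifd)


def demo():
    """Return (mask for O_WRONLY close, mask for O_RDONLY close)."""
    with tempfile.NamedTemporaryFile() as tmp:
        return (close_event_mask(tmp.name, os.O_WRONLY),
                close_event_mask(tmp.name, os.O_RDONLY))
```

Closing a descriptor that was opened for writing yields IN_CLOSE_WRITE even
if nothing was written, which is why a long-lived fd avoids the repeated
events while a tool that opens and closes the device per operation does not.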
Regards
Martin
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-10 10:23 ` Martin Wilck
@ 2026-02-10 13:59 ` Stefan Hajnoczi
2026-02-10 14:29 ` Martin Wilck
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2026-02-10 13:59 UTC (permalink / raw)
To: Martin Wilck
Cc: Hannes Reinecke, Daniel P. Berrangé, Benjamin Marzinski,
Paolo Bonzini, qemu-block, Kevin Wolf, afaria, qemu-devel,
Mikulas Patocka
On Tue, Feb 10, 2026 at 11:23:51AM +0100, Martin Wilck wrote:
> On Mon, 2026-02-09 at 09:23 -0500, Stefan Hajnoczi wrote:
> > On Mon, Feb 09, 2026 at 01:50:00PM +0100, Hannes Reinecke wrote:
> > >
> > > My concern with opening the block device with BLK_OPEN_WRITE is
> > > that
> > > this will trigger udev to 'synthesize' (ie regenerate) an 'add'
> > > event
> > > on close, causing 'interesting' effects as this will cascade down
> > > through the udev rule chain, triggering blkid, partition scan, you
> > > name it.
> > > Horrible, horrible, horrible.
> > > Don't do it.
> > >
> > > Especially not as you are only interested in reading information,
> > > and not changing the disk state in any way.
> >
> > I see. I will send patches to change blkdev_pr_read_keys()
> > blkdev_pr_read_reservation() to require only BLK_OPEN_READ.
>
> Hm, but then if you need to make an actual reservation and call e.g.
> blkdev_pr_reserve(), you'll need to re-open the device r/w, and have
> the same problem. I'm not sure I understand the issue here ... doesn't
> your process have an open fd to the device in question anyway? You just
> need to be sure not to _close_ it, as that's what causes the uevents
> (udev uses an IN_CLOSE_WRITE inotify(7) event on the device node).
Shell scripts are the most exposed to this issue because they invoke
blkpr(8) anew each time instead of keeping the fd open between reading
reservation details and performing a reservation operation.
Stefan
* Re: Moving from qemu-pr-helper and libmpathpersist to <linux/pr.h>
2026-02-10 13:59 ` Stefan Hajnoczi
@ 2026-02-10 14:29 ` Martin Wilck
0 siblings, 0 replies; 25+ messages in thread
From: Martin Wilck @ 2026-02-10 14:29 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Hannes Reinecke, Daniel P. Berrangé, Benjamin Marzinski,
Paolo Bonzini, qemu-block, Kevin Wolf, afaria, qemu-devel,
Mikulas Patocka
On Tue, 2026-02-10 at 08:59 -0500, Stefan Hajnoczi wrote:
> On Tue, Feb 10, 2026 at 11:23:51AM +0100, Martin Wilck wrote:
> > On Mon, 2026-02-09 at 09:23 -0500, Stefan Hajnoczi wrote:
> > > On Mon, Feb 09, 2026 at 01:50:00PM +0100, Hannes Reinecke wrote:
> > > >
> > > > My concern with opening the block device with BLK_OPEN_WRITE is
> > > > that
> > > > this will trigger udev to 'synthesize' (ie regenerate) an 'add'
> > > > event
> > > > on close, causing 'interesting' effects as this will cascade
> > > > down
> > > > through the udev rule chain, triggering blkid, partition scan,
> > > > you
> > > > name it.
> > > > Horrible, horrible, horrible.
> > > > Don't do it.
> > > >
> > > > Especially not as you are only interested in reading
> > > > information,
> > > > and not changing the disk state in any way.
> > >
> > > I see. I will send patches to change blkdev_pr_read_keys()
> > > blkdev_pr_read_reservation() to require only BLK_OPEN_READ.
> >
> > Hm, but then if you need to make an actual reservation and call
> > e.g.
> > blkdev_pr_reserve(), you'll need to re-open the device r/w, and
> > have
> > the same problem. I'm not sure I understand the issue here ...
> > doesn't
> > your process have an open fd to the device in question anyway? You
> > just
> > need to be sure not to _close_ it, as that's what causes the
> > uevents
> > (udev uses an IN_CLOSE_WRITE inotify(7) event on the device node).
>
> Shell scripts are the most exposed to this issue because they invoke
> blkpr(8) anew each time instead of keeping the fd open between
> reading
> reservation details and performing a reservation operation.
Right. We shouldn't do that. If we find we have to, we _must_ remove
the "watch" udev property from the devices in question.
Martin