* [RFC] vduse config write support
@ 2024-07-24 3:38 Srivatsa Vaddagiri
2024-07-26 2:37 ` Yongji Xie
2024-07-26 2:47 ` Jason Wang
0 siblings, 2 replies; 14+ messages in thread
From: Srivatsa Vaddagiri @ 2024-07-24 3:38 UTC (permalink / raw)
To: xieyongji, jasowang, stefanha
Cc: virtio-dev, virtualization, quic_svaddagi, quic_mnalajal,
quic_eberman, quic_pheragu, quic_pderrin, quic_cvanscha,
quic_pkondeti, quic_tsoni
Currently vduse does not seem to support configuration space writes
(vduse_vdpa_set_config does nothing). Is there any plan to lift that
limitation? I am aware of the discussions that took place here:
https://patchwork.kernel.org/project/netdevbpf/patch/20210615141331.407-11-xieyongji@bytedance.com/
Perhaps writes can be supported *selectively* without violating safety concerns
expressed in the above email discussion?
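For reference, the no-op mentioned above looks roughly like this (paraphrased
from drivers/vdpa/vdpa_user/vduse_dev.c; exact details may vary by kernel
version):

static void vduse_vdpa_set_config(struct vdpa_device *vdpa, unsigned int offset,
				  const void *buf, unsigned int len)
{
	/* Now we only support read-only configuration space */
}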
We are thinking of using vduse for hypervisor assisted virtio devices, which
may need config write support and hence this question.
To provide more details, we have an untrusted host that spins off a protected
(confidential) guest VM on a Type-1 hypervisor (Gunyah). The VMM in the untrusted host
leads to a couple of issues:
1) Latency of (virtio) register access. The VMM can take too long to respond, with the
VCPU stalled all that while. I think vduse shares a similar concern, which is why
it maintains a cache of configuration registers inside the kernel.
2) For PCI pass-through devices, we are concerned about letting the VMM be in charge of
emulating the complete configuration space (how can the VM defend against invalid
attributes presented for pass-through devices?). I am aware of TDISP, but I think it
may not be available for some of the devices on our platform.
One option we are considering is for the hypervisor to be in charge of virtio-PCI
bus emulation, allowing only select devices (with recognized features) to be
registered on the bus. The VMM would need to register devices/features with the
hypervisor, which would verify them (as per some policy) and present them to the VM on
the virtio-PCI bus it emulates. The protected VM should be shielded from
invalid device configuration information that it may otherwise read from a
compromised VMM.
For virtio devices, the hypervisor would also service most register read/writes
(to address concern #1), which implies it would need to cache a copy of the
device configuration information (similar to vduse).
We think vduse can be leveraged here to initialize the hypervisor cache of
virtio registers. Basically have a vdpa-gunyah driver registered on the vdpa
bus to which vduse devices are bound (rather than virtio-vdpa or vhost-vdpa).
vdpa-gunyah driver can pull configuration information from vduse and pass that
on to hypervisor. It will also help inject IRQ and pass on queue notifications
(using hypervisor specific APIs).
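To illustrate the idea, a minimal sketch of such a driver follows. It is only a
sketch: vdpa-gunyah does not exist upstream, and gunyah_hyp_push_config() stands
in for an unspecified hypervisor-specific API.

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/vdpa.h>

static int vdpa_gunyah_probe(struct vdpa_device *vdpa)
{
	size_t len = vdpa->config->get_config_size(vdpa);
	void *buf = kzalloc(len, GFP_KERNEL);

	if (!buf)
		return -ENOMEM;
	/* Pull the (VDUSE-backed) config space and seed the hypervisor cache */
	vdpa_get_config(vdpa, 0, buf, len);
	/* gunyah_hyp_push_config(buf, len);  -- hypothetical hypervisor call */
	kfree(buf);
	return 0;
}

static void vdpa_gunyah_remove(struct vdpa_device *vdpa)
{
}

static struct vdpa_driver vdpa_gunyah_driver = {
	.driver.name	= "vdpa_gunyah",
	.probe		= vdpa_gunyah_probe,
	.remove		= vdpa_gunyah_remove,
};
module_vdpa_driver(vdpa_gunyah_driver);

MODULE_LICENSE("GPL");

IRQ injection and queue-notification forwarding would hang off the same probe
path, using the hypervisor-specific APIs mentioned above.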
We will however likely need vduse to support configuration writes (guest VM
updating configuration space, for ex: writing to 'events_clear' field in case of
virtio-gpu). Would vduse maintainers be willing to accept config_write support
for select devices/features (as long as the writes don't violate any safety
concerns we may have)?
Thanks
vatsa
* Re: [RFC] vduse config write support
2024-07-24 3:38 [RFC] vduse config write support Srivatsa Vaddagiri
@ 2024-07-26 2:37 ` Yongji Xie
2024-07-26 7:06 ` Srivatsa Vaddagiri
2024-07-26 2:47 ` Jason Wang
1 sibling, 1 reply; 14+ messages in thread
From: Yongji Xie @ 2024-07-26 2:37 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Jason Wang, Stefan Hajnoczi, virtio-dev, virtualization,
quic_mnalajal, quic_eberman, quic_pheragu, quic_pderrin,
quic_cvanscha, quic_pkondeti, quic_tsoni
On Wed, Jul 24, 2024 at 11:38 AM Srivatsa Vaddagiri
<quic_svaddagi@quicinc.com> wrote:
>
> Currently vduse does not seem to support configuration space writes
> (vduse_vdpa_set_config does nothing). Is there any plan to lift that
> limitation? I am aware of the discussions that took place here:
>
The problem is that the current virtio code can't handle a config write failing.
> https://patchwork.kernel.org/project/netdevbpf/patch/20210615141331.407-11-xieyongji@bytedance.com/
>
> Perhaps writes can be supported *selectively* without violating safety concerns
> expressed in the above email discussion?
>
> We are thinking of using vduse for hypervisor assisted virtio devices, which
> may need config write support and hence this question.
>
> To provide more details, we have an untrusted host that spins off a protected
> (confidential) guest VM on a Type-1 hypervisor (Gunyah). The VMM in the untrusted host
> leads to a couple of issues:
>
> 1) Latency of (virtio) register access. The VMM can take too long to respond, with the
> VCPU stalled all that while. I think vduse shares a similar concern, which is why
> it maintains a cache of configuration registers inside the kernel.
>
> 2) For PCI pass-through devices, we are concerned about letting the VMM be in charge of
> emulating the complete configuration space (how can the VM defend against invalid
> attributes presented for pass-through devices?). I am aware of TDISP, but I think it
> may not be available for some of the devices on our platform.
>
> One option we are considering is for the hypervisor to be in charge of virtio-PCI
> bus emulation, allowing only select devices (with recognized features) to be
> registered on the bus. The VMM would need to register devices/features with the
> hypervisor, which would verify them (as per some policy) and present them to the VM on
> the virtio-PCI bus it emulates. The protected VM should be shielded from
> invalid device configuration information that it may otherwise read from a
> compromised VMM.
>
> For virtio devices, the hypervisor would also service most register read/writes
> (to address concern #1), which implies it would need to cache a copy of the
> device configuration information (similar to vduse).
>
> We think vduse can be leveraged here to initialize the hypervisor cache of
> virtio registers. Basically have a vdpa-gunyah driver registered on the vdpa
> bus to which vduse devices are bound (rather than virtio-vdpa or vhost-vdpa).
> vdpa-gunyah driver can pull configuration information from vduse and pass that
> on to hypervisor. It will also help inject IRQ and pass on queue notifications
> (using hypervisor specific APIs).
>
> We will however likely need vduse to support configuration writes (guest VM
> updating configuration space, for ex: writing to 'events_clear' field in case of
> virtio-gpu). Would vduse maintainers be willing to accept config_write support
> for select devices/features (as long as the writes don't violate any safety
> concerns we may have)?
>
It would be easier to support it if the config write just triggers an
async operation on the device side, e.g. a doorbell. That means we
can't ensure that any required internal actions on the device side
triggered by the config write have completed by the time the driver
gets a successful return. But I'm not sure if this is your case.
Thanks,
Yongji
* Re: [RFC] vduse config write support
2024-07-24 3:38 [RFC] vduse config write support Srivatsa Vaddagiri
2024-07-26 2:37 ` Yongji Xie
@ 2024-07-26 2:47 ` Jason Wang
2024-07-26 5:15 ` Michael S. Tsirkin
` (2 more replies)
1 sibling, 3 replies; 14+ messages in thread
From: Jason Wang @ 2024-07-26 2:47 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: xieyongji, stefanha, virtio-dev, virtualization, quic_mnalajal,
quic_eberman, quic_pheragu, quic_pderrin, quic_cvanscha,
quic_pkondeti, quic_tsoni, eperezma, Stefano Garzarella, mst,
Cindy Lu
On Wed, Jul 24, 2024 at 11:45 AM Srivatsa Vaddagiri
<quic_svaddagi@quicinc.com> wrote:
>
> Currently vduse does not seem to support configuration space writes
> (vduse_vdpa_set_config does nothing). Is there any plan to lift that
> limitation? I am aware of the discussions that took place here:
>
> https://patchwork.kernel.org/project/netdevbpf/patch/20210615141331.407-11-xieyongji@bytedance.com/
>
> Perhaps writes can be supported *selectively* without violating safety concerns
> expressed in the above email discussion?
Adding more relevant people here.
It can probably be done case by case. The main reasons for avoiding
config writes are:
1) to prevent buggy/malicious userspace from hanging the kernel driver forever
2) to prevent a buggy/malicious userspace device from breaking the semantics
Basically, it is the traditional trust model where the kernel doesn't
trust userspace.
E.g. the current virtio-blk has the following code:

static ssize_t
cache_type_store(struct device *dev, struct device_attribute *attr,
		 const char *buf, size_t count)
{
	struct gendisk *disk = dev_to_disk(dev);
	struct virtio_blk *vblk = disk->private_data;
	struct virtio_device *vdev = vblk->vdev;
	int i;

	BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
	i = sysfs_match_string(virtblk_cache_types, buf);
	if (i < 0)
		return i;

	virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
	virtblk_update_cache_mode(vdev);
	return count;
}
So basically the question is whether we make the config write a posted
write or a non-posted one.

If we make the vduse config write a posted one, it means vduse doesn't
need to wait for the userspace response. This is much safer, but it
breaks the above code, which looks like it requires non-posted
semantics. We would need to filter out virtio-blk or change the code to do
a read after the write:

while (virtio_cread8(vdev, offsetof(struct virtio_blk_config, wce)) & XXX)
	/* sleep or other */;

But we may suffer from userspace that never updates the wce bit,
or maybe we can have a timeout there.
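For instance, a bounded poll in cache_type_store() might look like the sketch
below (reusing vdev, i and count from the code above; the exact read-back
condition and timeout value are assumptions, not an agreed-on protocol):

	unsigned long timeout = jiffies + msecs_to_jiffies(5000);

	/* wait (bounded) for the device to reflect the posted wce write */
	while (virtio_cread8(vdev, offsetof(struct virtio_blk_config, wce)) != i) {
		if (time_after(jiffies, timeout))
			return -ETIMEDOUT;
		msleep(10);
	}
	return count;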
If we make the vduse config write a non-posted one, it means vduse
needs to wait for the response from the userspace. That satisfies the
above code's assumption, but it needs to deal with buggy userspace,
which might be challenging. Technically, we can have a device
emulation in the kernel, but it looks like overkill for wce (or I don't
know how it could mandate wce for userspace devices).
I feel it might make sense for other devices that only require posted
write semantics.
>
> We are thinking of using vduse for hypervisor assisted virtio devices, which
> may need config write support and hence this question.
>
> To provide more details, we have an untrusted host that spins off a protected
> (confidential) guest VM on a Type-1 hypervisor (Gunyah). The VMM in the untrusted host
> leads to a couple of issues:
>
> 1) Latency of (virtio) register access. The VMM can take too long to respond, with the
> VCPU stalled all that while. I think vduse shares a similar concern, which is why
> it maintains a cache of configuration registers inside the kernel.
Maybe you can give an example for this? We cache the configuration
space for faster access to it.
>
> 2) For PCI pass-through devices, we are concerned about letting the VMM be in charge of
> emulating the complete configuration space (how can the VM defend against invalid
> attributes presented for pass-through devices?)
Virtio driver has been hardened for this, for example:
commit 72b5e8958738aaa453db5149e6ca3bcf416023b9
Author: Jason Wang <jasowang@redhat.com>
Date: Fri Jun 4 13:53:50 2021 +0800
virtio-ring: store DMA metadata in desc_extra for split virtqueue
More hardening work is ongoing.
> I am aware of TDISP, but I think it
> may not be available for some of the devices on our platform.
>
> One option we are considering is for the hypervisor to be in charge of virtio-PCI
> bus emulation, allowing only select devices (with recognized features) to be
> registered on the bus. The VMM would need to register devices/features with the
> hypervisor, which would verify them (as per some policy) and present them to the VM on
> the virtio-PCI bus it emulates. The protected VM should be shielded from
> invalid device configuration information that it may otherwise read from a
> compromised VMM.
>
> For virtio devices, the hypervisor would also service most register read/writes
> (to address concern #1), which implies it would need to cache a copy of the
> device configuration information (similar to vduse).
>
> We think vduse can be leveraged here to initialize the hypervisor cache of
> virtio registers. Basically have a vdpa-gunyah driver registered on the vdpa
> bus to which vduse devices are bound (rather than virtio-vdpa or vhost-vdpa).
> vdpa-gunyah driver can pull configuration information from vduse and pass that
> on to hypervisor. It will also help inject IRQ and pass on queue notifications
> (using hypervisor specific APIs).
Just to make sure I understand the design here, is vdpa-gunyah
expected to have a dedicated uAPI other than vhost-vDPA? I wonder if there is any
reason why vhost-vDPA can't be used here.
>
> We will however likely need vduse to support configuration writes (guest VM
> updating configuration space, for ex: writing to 'events_clear' field in case of
> virtio-gpu). Would vduse maintainers be willing to accept config_write support
> for select devices/features (as long as the writes don't violate any safety
> concerns we may have)?
I think so, looking at virtio_gpu_config_changed_work_func(), the
events_clear seems to be fine to have a posted semantic.
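For reference, that handler (drivers/gpu/drm/virtio/virtgpu_kms.c, abridged
here; details vary by kernel version) only ever writes events_clear and never
reads it back, so nothing in the driver depends on a non-posted completion:

static void virtio_gpu_config_changed_work_func(struct work_struct *work)
{
	struct virtio_gpu_device *vgdev =
		container_of(work, struct virtio_gpu_device,
			     config_changed_work);
	u32 events_read, events_clear = 0;

	virtio_cread_le(vgdev->vdev, struct virtio_gpu_config,
			events_read, &events_read);
	if (events_read & VIRTIO_GPU_EVENT_DISPLAY) {
		/* ... re-probe display info ... */
		events_clear |= VIRTIO_GPU_EVENT_DISPLAY;
	}
	virtio_cwrite_le(vgdev->vdev, struct virtio_gpu_config,
			 events_clear, &events_clear);
}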
Maybe you can post an RFC to support config writing and let's start from there?
Thanks
>
> Thanks
> vatsa
>
* Re: [RFC] vduse config write support
2024-07-26 2:47 ` Jason Wang
@ 2024-07-26 5:15 ` Michael S. Tsirkin
2024-07-29 2:06 ` Jason Wang
2024-07-26 7:03 ` Srivatsa Vaddagiri
2024-07-26 12:42 ` Srivatsa Vaddagiri
2 siblings, 1 reply; 14+ messages in thread
From: Michael S. Tsirkin @ 2024-07-26 5:15 UTC (permalink / raw)
To: Jason Wang
Cc: Srivatsa Vaddagiri, xieyongji, stefanha, virtio-dev,
virtualization, quic_mnalajal, quic_eberman, quic_pheragu,
quic_pderrin, quic_cvanscha, quic_pkondeti, quic_tsoni, eperezma,
Stefano Garzarella, Cindy Lu
On Fri, Jul 26, 2024 at 10:47:59AM +0800, Jason Wang wrote:
> On Wed, Jul 24, 2024 at 11:45 AM Srivatsa Vaddagiri
> <quic_svaddagi@quicinc.com> wrote:
> >
> > Currently vduse does not seem to support configuration space writes
> > (vduse_vdpa_set_config does nothing). Is there any plan to lift that
> > limitation? I am aware of the discussions that took place here:
> >
> > https://patchwork.kernel.org/project/netdevbpf/patch/20210615141331.407-11-xieyongji@bytedance.com/
> >
> > Perhaps writes can be supported *selectively* without violating safety concerns
> > expressed in the above email discussion?
>
> Adding more relevant people here.
>
> It can probably be done case by case. The main reasons for avoiding
> config writes are:
>
> 1) to prevent buggy/malicious userspace from hanging the kernel driver forever
> 2) to prevent a buggy/malicious userspace device from breaking the semantics
>
> Basically, it is the traditional trust model where the kernel doesn't
> trust userspace.
>
> E.g. the current virtio-blk has the following code:
>
> static ssize_t
> cache_type_store(struct device *dev, struct device_attribute *attr,
> 		 const char *buf, size_t count)
> {
> 	struct gendisk *disk = dev_to_disk(dev);
> 	struct virtio_blk *vblk = disk->private_data;
> 	struct virtio_device *vdev = vblk->vdev;
> 	int i;
>
> 	BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
> 	i = sysfs_match_string(virtblk_cache_types, buf);
> 	if (i < 0)
> 		return i;
>
> 	virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
> 	virtblk_update_cache_mode(vdev);
> 	return count;
> }
To be fair, I think if you allow a block device in userspace you have already
allowed said userspace to crash the kernel unless you have
also restricted the filesystems mounted on this device to FUSE.
--
MST
* Re: [RFC] vduse config write support
2024-07-26 2:47 ` Jason Wang
2024-07-26 5:15 ` Michael S. Tsirkin
@ 2024-07-26 7:03 ` Srivatsa Vaddagiri
2024-07-26 7:29 ` Michael S. Tsirkin
2024-07-29 2:16 ` Jason Wang
2024-07-26 12:42 ` Srivatsa Vaddagiri
2 siblings, 2 replies; 14+ messages in thread
From: Srivatsa Vaddagiri @ 2024-07-26 7:03 UTC (permalink / raw)
To: Jason Wang
Cc: xieyongji, stefanha, virtio-dev, virtualization, quic_mnalajal,
quic_eberman, quic_pheragu, quic_pderrin, quic_cvanscha,
quic_pkondeti, quic_tsoni, eperezma, Stefano Garzarella, mst,
Cindy Lu
* Jason Wang <jasowang@redhat.com> [2024-07-26 10:47:59]:
> On Wed, Jul 24, 2024 at 11:45 AM Srivatsa Vaddagiri
> <quic_svaddagi@quicinc.com> wrote:
> >
> > Currently vduse does not seem to support configuration space writes
> > (vduse_vdpa_set_config does nothing). Is there any plan to lift that
> > limitation? I am aware of the discussions that took place here:
> >
> > https://patchwork.kernel.org/project/netdevbpf/patch/20210615141331.407-11-xieyongji@bytedance.com/
> >
> > Perhaps writes can be supported *selectively* without violating safety concerns
> > expressed in the above email discussion?
>
> Adding more relevant people here.
>
> It can probably be done case by case. The main reasons for avoiding
> config writes are:
>
> 1) to prevent buggy/malicious userspace from hanging the kernel driver forever
> 2) to prevent a buggy/malicious userspace device from breaking the semantics
>
> Basically, it is the traditional trust model where the kernel doesn't
> trust userspace.
>
> E.g. the current virtio-blk has the following code:
>
> static ssize_t
> cache_type_store(struct device *dev, struct device_attribute *attr,
> 		 const char *buf, size_t count)
> {
> 	struct gendisk *disk = dev_to_disk(dev);
> 	struct virtio_blk *vblk = disk->private_data;
> 	struct virtio_device *vdev = vblk->vdev;
> 	int i;
>
> 	BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
> 	i = sysfs_match_string(virtblk_cache_types, buf);
> 	if (i < 0)
> 		return i;
>
> 	virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
> 	virtblk_update_cache_mode(vdev);
> 	return count;
> }
>
> So basically the question is whether we make the config write a posted
> write or a non-posted one.
>
> If we make the vduse config write a posted one, it means vduse doesn't
> need to wait for the userspace response. This is much safer, but it
> breaks the above code, which looks like it requires non-posted
> semantics. We would need to filter out virtio-blk or change the code to do
> a read after the write:
>
> while (virtio_cread8(vdev, offsetof(struct virtio_blk_config, wce)) & XXX)
> 	/* sleep or other */;
>
> But we may suffer from userspace that never updates the wce bit,
> or maybe we can have a timeout there.
>
> If we make the vduse config write a non-posted one, it means vduse
> needs to wait for the response from the userspace.
I was thinking only a read that follows a write would need to stall on a response
from userspace, but yes, we will have the same issue of dealing with buggy
userspace. At least for PCI, I think there is a provision to use the surprise-removal
facility, where the guest can be notified of a malfunctioning device. In case
userspace (the vduse daemon) does not respond in time, can the surprise-removal event
be injected by the VMM (the hypervisor in our case)? The guest can decide what to do with it
next.
> That satisfies the
> above code's assumption, but it needs to deal with buggy userspace,
> which might be challenging. Technically, we can have a device
> emulation in the kernel, but it looks like overkill for wce (or I don't
> know how it could mandate wce for userspace devices).
>
> I feel it might make sense for other devices that only require posted
> write semantics.
>
> >
> > We are thinking of using vduse for hypervisor assisted virtio devices, which
> > may need config write support and hence this question.
> >
> > To provide more details, we have an untrusted host that spins off a protected
> > (confidential) guest VM on a Type-1 hypervisor (Gunyah). The VMM in the untrusted host
> > leads to a couple of issues:
> >
> > 1) Latency of (virtio) register access. The VMM can take too long to respond, with the
> > VCPU stalled all that while. I think vduse shares a similar concern, which is why
> > it maintains a cache of configuration registers inside the kernel.
>
> Maybe you can give an example for this? We cache the configuration
> space for faster access to it.
Yes for the same reason of faster access, we wish to have config information
cached in hypervisor. In case of VDUSE,
Guest read -> (VM exit) -> VHOST_VDPA_GET_CONFIG -> vduse_vdpa_get_config
Basically a guest read terminates in host kernel, before resuming.
In our case,
Guest read - (VM exit) - Hyp emulates read - (VM resume)
So a guest read would terminate in hypervisor itself, before it is resumed.
Without this optimization, guest VCPU would have stalled until VMM in host can
emulate it, which can be long, especially a concern when the read is issued in
a hot path (interrupt handler, w/o MSI-X).
> > 2) For PCI pass-through devices, we are concerned about letting the VMM be in charge of
> > emulating the complete configuration space (how can the VM defend against invalid
> > attributes presented for pass-through devices?)
>
> Virtio driver has been hardened for this, for example:
>
> commit 72b5e8958738aaa453db5149e6ca3bcf416023b9
> Author: Jason Wang <jasowang@redhat.com>
> Date: Fri Jun 4 13:53:50 2021 +0800
>
> virtio-ring: store DMA metadata in desc_extra for split virtqueue
>
> More hardening work is ongoing.
Any additional pointers you can share? I will go over them and get back.
> > I am aware of TDISP, but I think it
> > may not be available for some of the devices on our platform.
> >
> > One option we are considering is for the hypervisor to be in charge of virtio-PCI
> > bus emulation, allowing only select devices (with recognized features) to be
> > registered on the bus. The VMM would need to register devices/features with the
> > hypervisor, which would verify them (as per some policy) and present them to the VM on
> > the virtio-PCI bus it emulates. The protected VM should be shielded from
> > invalid device configuration information that it may otherwise read from a
> > compromised VMM.
> >
> > For virtio devices, the hypervisor would also service most register read/writes
> > (to address concern #1), which implies it would need to cache a copy of the
> > device configuration information (similar to vduse).
> >
> > We think vduse can be leveraged here to initialize the hypervisor cache of
> > virtio registers. Basically have a vdpa-gunyah driver registered on the vdpa
> > bus to which vduse devices are bound (rather than virtio-vdpa or vhost-vdpa).
> > vdpa-gunyah driver can pull configuration information from vduse and pass that
> > on to hypervisor. It will also help inject IRQ and pass on queue notifications
> > (using hypervisor specific APIs).
>
> Just to make sure I understand the design here, is vdpa-gunyah
> expected to have a dedicated uAPI other than vhost-vDPA?
I didn't think vdpa-gunyah would need to provide any UAPI. VDUSE daemon
functionality would be inside VMM itself. For example, the virtio-block backend
in VMM would create a VDUSE device and pass on key configuration information to
VDUSE kernel module. The vdpa device created subsequently (vdpa dev add ..)
will be bound to vdpa-gunyah driver, which pulls the configuration information
and passes on to hypervisor, which will emulate all further access from VM.
> I wonder if there is any reason why vhost-vDPA can't be used here.
I think vhost-vDPA will imply that VMM still tracks the device on its PCI/MMIO
transports and will be involved for any register read from VM? We didn't want
VMM to be involved in emulating register access of VM. Instead hyp will emulate
most of those accesses. I need to think more, but I am thinking the VMM will
not even track this device on its PCI/MMIO transports, but rather on a
different, vdpa?, transport.
> > We will however likely need vduse to support configuration writes (guest VM
> > updating configuration space, for ex: writing to 'events_clear' field in case of
> > virtio-gpu). Would vduse maintainers be willing to accept config_write support
> > for select devices/features (as long as the writes don't violate any safety
> > concerns we may have)?
>
> I think so, looking at virtio_gpu_config_changed_work_func(), the
> events_clear seems to be fine to have a posted semantic.
>
> Maybe you can post an RFC to support config writing and let's start from there?
Ok thanks for your feedback!
- vatsa
* Re: [RFC] vduse config write support
2024-07-26 2:37 ` Yongji Xie
@ 2024-07-26 7:06 ` Srivatsa Vaddagiri
0 siblings, 0 replies; 14+ messages in thread
From: Srivatsa Vaddagiri @ 2024-07-26 7:06 UTC (permalink / raw)
To: Yongji Xie
Cc: Jason Wang, Stefan Hajnoczi, virtio-dev, virtualization,
quic_mnalajal, quic_eberman, quic_pheragu, quic_pderrin,
quic_cvanscha, quic_pkondeti, quic_tsoni
* Yongji Xie <xieyongji@bytedance.com> [2024-07-26 10:37:40]:
> On Wed, Jul 24, 2024 at 11:38 AM Srivatsa Vaddagiri
> <quic_svaddagi@quicinc.com> wrote:
> >
> > Currently vduse does not seem to support configuration space writes
> > (vduse_vdpa_set_config does nothing). Is there any plan to lift that
> > limitation? I am aware of the discussions that took place here:
> >
>
> The problem is that the current virtio code can't handle a config write failing.
Ok got it.
> > We will however likely need vduse to support configuration writes (guest VM
> > updating configuration space, for ex: writing to 'events_clear' field in case of
> > virtio-gpu). Would vduse maintainers be willing to accept config_write support
> > for select devices/features (as long as the writes don't violate any safety
> > concerns we may have)?
> >
>
> It would be easier to support it if the config write just triggers an
> async operation on the device side, e.g. a doorbell. That means we
> can't ensure that any required internal actions on the device side
> triggered by the config write have completed by the time the driver
> gets a successful return. But I'm not sure if this is your case.
Yes, a posted write should be fine, as long as the guest issues a read as a fence after
that, which needs to be a sync point. As discussed in my earlier reply, we can
explore injecting a surprise-removal event into the guest when the VDUSE daemon does
not respond within a timeout.
Thanks
vatsa
* Re: [RFC] vduse config write support
2024-07-26 7:03 ` Srivatsa Vaddagiri
@ 2024-07-26 7:29 ` Michael S. Tsirkin
2024-07-29 2:16 ` Jason Wang
1 sibling, 0 replies; 14+ messages in thread
From: Michael S. Tsirkin @ 2024-07-26 7:29 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Jason Wang, xieyongji, stefanha, virtio-dev, virtualization,
quic_mnalajal, quic_eberman, quic_pheragu, quic_pderrin,
quic_cvanscha, quic_pkondeti, quic_tsoni, eperezma,
Stefano Garzarella, Cindy Lu
On Fri, Jul 26, 2024 at 12:33:24PM +0530, Srivatsa Vaddagiri wrote:
> > Maybe you can give an example for this? We cache the configuration
> > space for faster access to it.
>
> Yes for the same reason of faster access, we wish to have config information
> cached in hypervisor. In case of VDUSE,
>
> Guest read -> (VM exit) -> VHOST_VDPA_GET_CONFIG -> vduse_vdpa_get_config
>
> Basically a guest read terminates in host kernel, before resuming.
>
> In our case,
>
> Guest read - (VM exit) - Hyp emulates read - (VM resume)
>
> So a guest read would terminate in hypervisor itself, before it is resumed.
>
> Without this optimization, guest VCPU would have stalled until VMM in host can
> emulate it, which can be long, especially a concern when the read is issued in
> a hot path (interrupt handler, w/o MSI-X).
Instead of optimizing INTx (which is futile; it's a shared interrupt
and will trigger a ton of exits anyway), just enable MSI-X.
* Re: [RFC] vduse config write support
2024-07-26 2:47 ` Jason Wang
2024-07-26 5:15 ` Michael S. Tsirkin
2024-07-26 7:03 ` Srivatsa Vaddagiri
@ 2024-07-26 12:42 ` Srivatsa Vaddagiri
2024-07-30 2:53 ` Jason Wang
2 siblings, 1 reply; 14+ messages in thread
From: Srivatsa Vaddagiri @ 2024-07-26 12:42 UTC (permalink / raw)
To: Jason Wang
Cc: xieyongji, stefanha, virtio-dev, virtualization, quic_mnalajal,
quic_eberman, quic_pheragu, quic_pderrin, quic_cvanscha,
quic_pkondeti, quic_tsoni, eperezma, Stefano Garzarella, mst,
Cindy Lu
* Jason Wang <jasowang@redhat.com> [2024-07-26 10:47:59]:
> > 2) For PCI pass-through devices, we are concerned about letting the VMM be in charge of
> > emulating the complete configuration space (how can the VM defend against invalid
> > attributes presented for pass-through devices?)
>
> Virtio driver has been hardened for this, for example:
>
> commit 72b5e8958738aaa453db5149e6ca3bcf416023b9
> Author: Jason Wang <jasowang@redhat.com>
> Date: Fri Jun 4 13:53:50 2021 +0800
>
> virtio-ring: store DMA metadata in desc_extra for split virtqueue
>
> More hardening work is ongoing.
I think the above change is not sufficient for what we are looking for. In
particular, for pass-through PCI devices, we are concerned that an untrusted
(compromised?) VMM can return invalid attributes when the confidential VM reads
the configuration space. These are PCI devices that may not support TDISP.
Hypervisor, being a trusted entity and controlling the PCI bus emulation,
can ensure that the confidential VM sees valid attributes for all devices
(physical and virtual) that are enumerated on the bus. That's a key reason why
we want hypervisor to emulate access to configuration space of all PCI devices
enumerated by VM. That I think necessitates that hypervisor handle access to
virtio device configuration space as well (even if MSI-X obviates the
performance arguments of hypervisor doing so)!
Thanks
vatsa
* Re: [RFC] vduse config write support
2024-07-26 5:15 ` Michael S. Tsirkin
@ 2024-07-29 2:06 ` Jason Wang
0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2024-07-29 2:06 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Srivatsa Vaddagiri, xieyongji, stefanha, virtio-dev,
virtualization, quic_mnalajal, quic_eberman, quic_pheragu,
quic_pderrin, quic_cvanscha, quic_pkondeti, quic_tsoni, eperezma,
Stefano Garzarella, Cindy Lu
On Fri, Jul 26, 2024 at 1:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Fri, Jul 26, 2024 at 10:47:59AM +0800, Jason Wang wrote:
> > On Wed, Jul 24, 2024 at 11:45 AM Srivatsa Vaddagiri
> > <quic_svaddagi@quicinc.com> wrote:
> > >
> > > Currently vduse does not seem to support configuration space writes
> > > (vduse_vdpa_set_config does nothing). Is there any plan to lift that
> > > limitation? I am aware of the discussions that took place here:
> > >
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20210615141331.407-11-xieyongji@bytedance.com/
> > >
> > > Perhaps writes can be supported *selectively* without violating safety concerns
> > > expressed in the above email discussion?
> >
> > Adding more relevant people here.
> >
> > It can probably be done case by case. The main reasons for avoiding
> > config writes are:
> >
> > 1) to prevent buggy/malicious userspace from hanging the kernel driver forever
> > 2) to prevent a buggy/malicious userspace device from breaking the semantics
> >
> > Basically, it is the traditional trust model where the kernel doesn't
> > trust userspace.
> >
> > E.g. the current virtio-blk has the following code:
> >
> > static ssize_t
> > cache_type_store(struct device *dev, struct device_attribute *attr,
> > 		 const char *buf, size_t count)
> > {
> > 	struct gendisk *disk = dev_to_disk(dev);
> > 	struct virtio_blk *vblk = disk->private_data;
> > 	struct virtio_device *vdev = vblk->vdev;
> > 	int i;
> >
> > 	BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
> > 	i = sysfs_match_string(virtblk_cache_types, buf);
> > 	if (i < 0)
> > 		return i;
> >
> > 	virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
> > 	virtblk_update_cache_mode(vdev);
> > 	return count;
> > }
>
> To be fair, I think if you allow a block device in userspace you have already
> allowed said userspace to crash the kernel unless you have
> also restricted the filesystems mounted on this device to FUSE.
I'm not sure I would go that far here; I think the kernel should be able to
survive a buggy device, no matter whether it is a real one or
something emulated in userspace.

And if we can manage to do that, we should have an extra safety layer
(e.g. we've agreed that we need to use a shadow cvq for VDUSE networking
devices).

Thanks
>
> --
> MST
>
* Re: [RFC] vduse config write support
2024-07-26 7:03 ` Srivatsa Vaddagiri
2024-07-26 7:29 ` Michael S. Tsirkin
@ 2024-07-29 2:16 ` Jason Wang
2024-07-29 6:02 ` Srivatsa Vaddagiri
1 sibling, 1 reply; 14+ messages in thread
From: Jason Wang @ 2024-07-29 2:16 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: xieyongji, stefanha, virtio-dev, virtualization, quic_mnalajal,
quic_eberman, quic_pheragu, quic_pderrin, quic_cvanscha,
quic_pkondeti, quic_tsoni, eperezma, Stefano Garzarella, mst,
Cindy Lu
On Fri, Jul 26, 2024 at 3:04 PM Srivatsa Vaddagiri
<quic_svaddagi@quicinc.com> wrote:
>
> * Jason Wang <jasowang@redhat.com> [2024-07-26 10:47:59]:
>
> > On Wed, Jul 24, 2024 at 11:45 AM Srivatsa Vaddagiri
> > <quic_svaddagi@quicinc.com> wrote:
> > >
> > > Currently vduse does not seem to support configuration space writes
> > > (vduse_vdpa_set_config does nothing). Is there any plan to lift that
> > > limitation? I am aware of the discussions that took place here:
> > >
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20210615141331.407-11-xieyongji@bytedance.com/
> > >
> > > Perhaps writes can be supported *selectively* without violating safety concerns
> > > expressed in the above email discussion?
> >
> > Adding more relevant people here.
> >
> > It can probably be done case by case. The main reasons for avoiding
> > config writes are:
> >
> > 1) to prevent buggy/malicious userspace from hanging the kernel driver forever
> > 2) to prevent a buggy/malicious userspace device from breaking the semantics
> >
> > Basically, it is the traditional trust model where the kernel doesn't
> > trust userspace.
> >
> > E.g. the current virtio-blk has the following code:
> >
> > static ssize_t
> > cache_type_store(struct device *dev, struct device_attribute *attr,
> > 		 const char *buf, size_t count)
> > {
> > 	struct gendisk *disk = dev_to_disk(dev);
> > 	struct virtio_blk *vblk = disk->private_data;
> > 	struct virtio_device *vdev = vblk->vdev;
> > 	int i;
> >
> > 	BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
> > 	i = sysfs_match_string(virtblk_cache_types, buf);
> > 	if (i < 0)
> > 		return i;
> >
> > 	virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
> > 	virtblk_update_cache_mode(vdev);
> > 	return count;
> > }
> >
> > So basically the question is whether we make the config write a posted
> > write or a non-posted one.
> >
> > If we make the vduse config write a posted one, it means vduse doesn't
> > need to wait for the userspace response. This is much safer, but it
> > breaks the above code, which looks like it requires non-posted
> > semantics. We would need to filter out virtio-blk or change the code to do
> > a read after the write:
> >
> > while (virtio_cread8(vdev, offsetof(struct virtio_blk_config, wce)) & XXX)
> > 	/* sleep or other */;
> >
> > But we may suffer from userspace that never updates the wce bit,
> > or maybe we can have a timeout there.
> >
> > If we make the vduse config write a non-posted one, it means vduse
> > needs to wait for the response from the userspace.
>
> I was thinking only a read that follows a write would need to stall on a response
> from userspace, but yes, we will have the same issue of dealing with buggy
> userspace. At least for PCI, I think there is a provision to use the surprise-removal
> facility, where the guest can be notified of a malfunctioning device. In case
> userspace (the vduse daemon) does not respond in time, can the surprise-removal event
> be injected by the VMM (the hypervisor in our case)? The guest can decide what to do with it
> next.
I think it's a good point, we can try to implement this.
>
> > That satisfies the
> > above code's assumption, but it needs to deal with buggy userspace,
> > which might be challenging. Technically, we can have a device
> > emulation in the kernel, but it looks like overkill for wce (or I don't
> > know how it could mandate wce for userspace devices).
> >
> > I feel it might make sense for other devices that only require posted
> > write semantics.
> >
> > >
> > > We are thinking of using vduse for hypervisor assisted virtio devices, which
> > > may need config write support and hence this question.
> > >
> > > To provide more details, we have an untrusted host that spins off a protected
> > > (confidential) guest VM on a Type-1 hypervisor (Gunyah). The VMM in the untrusted host
> > > leads to a couple of issues:
> > >
> > > 1) Latency of (virtio) register access. The VMM can take too long to respond, with the
> > > VCPU stalled all that while. I think vduse shares a similar concern, which is why
> > > it maintains a cache of configuration registers inside the kernel.
> >
> > Maybe you can give an example for this? We cache the configuration
> > space for faster access to it.
>
> Yes for the same reason of faster access, we wish to have config information
> cached in hypervisor. In case of VDUSE,
>
> Guest read -> (VM exit) -> VHOST_VDPA_GET_CONFIG -> vduse_vdpa_get_config
>
> Basically a guest read terminates in host kernel, before resuming.
>
> In our case,
>
> Guest read - (VM exit) - Hyp emulates read - (VM resume)
>
> So a guest read would terminate in hypervisor itself, before it is resumed.
It looks like the only difference so far is that there is userspace
assistance in the case of VDUSE.
>
> Without this optimization, guest VCPU would have stalled until VMM in host can
> emulate it, which can be long, especially a concern when the read is issued in
> a hot path (interrupt handler, w/o MSI-X).
I think I agree with Michael, let's try to use MSI-X here where
there's a lot of existing optimizations in various layers.
>
> > > 2) For PCI pass-through devices, we are concerned about letting the VMM be in charge of
> > > emulating the complete configuration space (how can the VM defend against invalid
> > > attributes presented for pass-through devices?)
> >
> > Virtio driver has been hardened for this, for example:
> >
> > commit 72b5e8958738aaa453db5149e6ca3bcf416023b9
> > Author: Jason Wang <jasowang@redhat.com>
> > Date: Fri Jun 4 13:53:50 2021 +0800
> >
> > virtio-ring: store DMA metadata in desc_extra for split virtqueue
> >
> > More hardening work is ongoing.
>
> Any additional pointers you can share? I will go over them and get back.
Yes, used buffer length and IRQ hardening were reverted as we need to
deal with legacy devices more carefully. A new version will be posted
in the near future.
>
> > > I am aware of TDISP, but I think it
> > > may not be available for some of the devices on our platform.
> > >
> > > One option we are considering is for the hypervisor to be in charge of virtio-PCI
> > > bus emulation, allowing only select devices (with recognized features) to be
> > > registered on the bus. The VMM would need to register devices/features with the
> > > hypervisor, which would verify them (as per some policy) and present them to the VM on
> > > the virtio-PCI bus it emulates. The protected VM should be shielded from
> > > invalid device configuration information that it may otherwise read from a
> > > compromised VMM.
> > >
> > > For virtio devices, the hypervisor would also service most register read/writes
> > > (to address concern #1), which implies it would need to cache a copy of the
> > > device configuration information (similar to vduse).
> > >
> > > We think vduse can be leveraged here to initialize the hypervisor cache of
> > > virtio registers. Basically have a vdpa-gunyah driver registered on the vdpa
> > > bus to which vduse devices are bound (rather than virtio-vdpa or vhost-vdpa).
> > > vdpa-gunyah driver can pull configuration information from vduse and pass that
> > > on to hypervisor. It will also help inject IRQ and pass on queue notifications
> > > (using hypervisor specific APIs).
> >
> > Just to make sure I understand the design here, is vdpa-gunyah
> > expected to have a dedicated uAPI other than vhost-vDPA?
>
> I didn't think vdpa-gunyah would need to provide any UAPI. VDUSE daemon
> functionality would be inside VMM itself. For example, the virtio-block backend
> in VMM would create a VDUSE device and pass on key configuration information to
> VDUSE kernel module. The vdpa device created subsequently (vdpa dev add ..)
> will be bound to vdpa-gunyah driver, which pulls the configuration information
> and passes on to hypervisor, which will emulate all further access from VM.
Ok, so those drivers will talk to the VMM.
>
> > I wonder if there is any reason why vhost-vDPA can't be used here.
>
> I think vhost-vDPA will imply that VMM still tracks the device on its PCI/MMIO
> transports and will be involved for any register read from VM?
Yes, and it is done with the assistance of the userspace.
> We didn't want
> VMM to be involved in emulating register access of VM. Instead hyp will emulate
> most of those accesses. I need to think more, but I am thinking the VMM will
> not even track this device on its PCI/MMIO transports, but rather on a
> different, vdpa?, transport.
Right, you need to emulate a virtio device with an existing transport
anyhow. MMIO might be easy. If you want a PCI transport, it means a
huge amount of device emulation needs to be done by the hypervisor.
>
> > > We will however likely need vduse to support configuration writes (guest VM
> > > updating configuration space, for ex: writing to 'events_clear' field in case of
> > > virtio-gpu). Would vduse maintainers be willing to accept config_write support
> > > for select devices/features (as long as the writes don't violate any safety
> > > concerns we may have)?
> >
> > I think so, looking at virtio_gpu_config_changed_work_func(), the
> > events_clear seems to be fine to have a posted semantic.
> >
> > Maybe you can post an RFC to support config writing and let's start from there?
>
> Ok thanks for your feedback!
>
> - vatsa
>
Thanks
* Re: [RFC] vduse config write support
2024-07-29 2:16 ` Jason Wang
@ 2024-07-29 6:02 ` Srivatsa Vaddagiri
2024-07-30 3:06 ` Jason Wang
0 siblings, 1 reply; 14+ messages in thread
From: Srivatsa Vaddagiri @ 2024-07-29 6:02 UTC (permalink / raw)
To: Jason Wang
Cc: xieyongji, stefanha, virtio-dev, virtualization, quic_mnalajal,
quic_eberman, quic_pheragu, quic_pderrin, quic_cvanscha,
quic_pkondeti, quic_tsoni, eperezma, Stefano Garzarella, mst,
Cindy Lu
* Jason Wang <jasowang@redhat.com> [2024-07-29 10:16:48]:
> > Without this optimization, guest VCPU would have stalled until VMM in host can
> > emulate it, which can be long, especially a concern when the read is issued in
> > a hot path (interrupt handler, w/o MSI-X).
>
> I think I agree with Michael, let's try to use MSI-X here where
> there's a lot of existing optimizations in various layers.
Yes, sure. Even if we implement MSI-X, there is a security angle to why we want a
hypervisor-hosted PCI bus (I have provided details in an earlier reply:
https://lore.kernel.org/virtio-dev/20240726070609.GB723942@quicinc.com/T/#m84455763d6b4d0d3b8df814b3d64e6e48ec12ae3 )
> > > > We will however likely need vduse to support configuration writes (guest VM
> > > > updating configuration space, for ex: writing to 'events_clear' field in case of
> > > > virtio-gpu). Would vduse maintainers be willing to accept config_write support
> > > > for select devices/features (as long as the writes don't violate any safety
> > > > concerns we may have)?
> > >
> > > I think so, looking at virtio_gpu_config_changed_work_func(), the
> > > events_clear seems to be fine to have a posted semantic.
> > >
> > > Maybe you can post an RFC to support config writing and let's start from there?
Does VDUSE support runtime configuration changes (ex: block device capacity
changes)? I am curious how the atomicity of that update is handled, for ex: the
guest reading the config space while a concurrent update is underway (SET_CONFIG).
I think the generation count should help there - but it was not clear to me how
VDUSE handles generation_count reads during such concurrent updates.
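(For context, the driver side already copes with concurrent updates via the
generation retry loop in __virtio_cread_many() in include/linux/virtio_config.h,
abridged below; my question is about how the VDUSE device side keeps the
generation consistent across SET_CONFIG.)

static inline void __virtio_cread_many(struct virtio_device *vdev,
				       unsigned int offset,
				       void *buf, size_t count, size_t bytes)
{
	u32 old, gen = vdev->config->generation ?
		vdev->config->generation(vdev) : 0;
	int i;

	do {
		old = gen;

		/* re-read everything until the generation is stable */
		for (i = 0; i < count; i++)
			vdev->config->get(vdev, offset + bytes * i,
					  buf + i * bytes, bytes);

		gen = vdev->config->generation ?
			vdev->config->generation(vdev) : 0;
	} while (gen != old);
}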
- vatsa
* Re: [RFC] vduse config write support
2024-07-26 12:42 ` Srivatsa Vaddagiri
@ 2024-07-30 2:53 ` Jason Wang
0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2024-07-30 2:53 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: xieyongji, stefanha, virtio-dev, virtualization, quic_mnalajal,
quic_eberman, quic_pheragu, quic_pderrin, quic_cvanscha,
quic_pkondeti, quic_tsoni, eperezma, Stefano Garzarella, mst,
Cindy Lu
On Fri, Jul 26, 2024 at 8:42 PM Srivatsa Vaddagiri
<quic_svaddagi@quicinc.com> wrote:
>
> * Jason Wang <jasowang@redhat.com> [2024-07-26 10:47:59]:
>
> > > 2) For PCI pass-through devices, we are concerned about letting the VMM be in charge of
> > > emulating the complete configuration space (how can the VM defend against invalid
> > > attributes presented for pass-through devices?)
> >
> > Virtio driver has been hardened for this, for example:
> >
> > commit 72b5e8958738aaa453db5149e6ca3bcf416023b9
> > Author: Jason Wang <jasowang@redhat.com>
> > Date: Fri Jun 4 13:53:50 2021 +0800
> >
> > virtio-ring: store DMA metadata in desc_extra for split virtqueue
> >
> > More hardening work is ongoing.
>
> I think the above change is not sufficient for what we are looking for. In
> particular, for pass-through PCI devices, we are concerned that an untrusted
> (compromised?) VMM can return invalid attributes when the confidential VM reads
> the configuration space.
Yes, it is; virtio-specific hardening can't cover this. It might need
help from the PCI core.
> These are PCI devices that may not support TDISP.
> Hypervisor, being a trusted entity and controlling the PCI bus emulation,
> can ensure that the confidential VM sees valid attributes for all devices
> (physical and virtual) that are enumerated on the bus.
I think I need to understand the difference between the hypervisor and the VMM
here, and which one is trusted or not, and why.
> That's a key reason why
> we want hypervisor to emulate access to configuration space of all PCI devices
> enumerated by VM. That I think necessitates that hypervisor handle access to
> virtio device configuration space as well (even if MSI-X obviates the
> performance arguments of hypervisor doing so)!
That should be fine, but it means for PCI you need a lot of emulation
(e.g. how QEMU emulates Q35).
Thanks
>
> Thanks
> vatsa
>
* Re: [RFC] vduse config write support
2024-07-29 6:02 ` Srivatsa Vaddagiri
@ 2024-07-30 3:06 ` Jason Wang
2024-07-30 3:10 ` Jason Wang
0 siblings, 1 reply; 14+ messages in thread
From: Jason Wang @ 2024-07-30 3:06 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: xieyongji, stefanha, virtio-dev, virtualization, quic_mnalajal,
quic_eberman, quic_pheragu, quic_pderrin, quic_cvanscha,
quic_pkondeti, quic_tsoni, eperezma, Stefano Garzarella, mst,
Cindy Lu
On Mon, Jul 29, 2024 at 2:03 PM Srivatsa Vaddagiri
<quic_svaddagi@quicinc.com> wrote:
>
> * Jason Wang <jasowang@redhat.com> [2024-07-29 10:16:48]:
>
> > > Without this optimization, guest VCPU would have stalled until VMM in host can
> > > emulate it, which can be long, especially a concern when the read is issued in
> > > a hot path (interrupt handler, w/o MSI-X).
> >
> > I think I agree with Michael, let's try to use MSI-X here where
> > there's a lot of existing optimizations in various layers.
>
> Yes, sure. Even if we implement MSI-X, there is a security angle to why we want a
> hypervisor-hosted PCI bus (I have provided details in an earlier reply:
>
> https://lore.kernel.org/virtio-dev/20240726070609.GB723942@quicinc.com/T/#m84455763d6b4d0d3b8df814b3d64e6e48ec12ae3 )
>
>
> > > > > We will however likely need vduse to support configuration writes (guest VM
> > > > > updating configuration space, for ex: writing to 'events_clear' field in case of
> > > > > virtio-gpu). Would vduse maintainers be willing to accept config_write support
> > > > > for select devices/features (as long as the writes don't violate any safety
> > > > > concerns we may have)?
> > > >
> > > > I think so, looking at virtio_gpu_config_changed_work_func(), the
> > > > events_clear seems to be fine to have a posted semantic.
> > > >
> > > > Maybe you can post an RFC to support config writing and let's start from there?
>
> Does VDUSE support runtime configuration changes (ex: block device capacity
> changes)?
Looks like it can, it has VDUSE_DEV_INJECT_CONFIG_IRQ.
> I am curious how the atomicity of that update is handled, for ex: the
> guest reading the config space while a concurrent update is underway (SET_CONFIG).
> I think the generation count should help there - but it was not clear to me how
> VDUSE handles generation_count reads during such concurrent updates.
Similar to the above, we are missing a uAPI to do that. Probably just a
uAPI to update the device generation.
Thanks
>
> - vatsa
>
* Re: [RFC] vduse config write support
2024-07-30 3:06 ` Jason Wang
@ 2024-07-30 3:10 ` Jason Wang
0 siblings, 0 replies; 14+ messages in thread
From: Jason Wang @ 2024-07-30 3:10 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: xieyongji, stefanha, virtio-dev, virtualization, quic_mnalajal,
quic_eberman, quic_pheragu, quic_pderrin, quic_cvanscha,
quic_pkondeti, quic_tsoni, eperezma, Stefano Garzarella, mst,
Cindy Lu
On Tue, Jul 30, 2024 at 11:06 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Mon, Jul 29, 2024 at 2:03 PM Srivatsa Vaddagiri
> <quic_svaddagi@quicinc.com> wrote:
> >
> > * Jason Wang <jasowang@redhat.com> [2024-07-29 10:16:48]:
> >
> > > > Without this optimization, guest VCPU would have stalled until VMM in host can
> > > > emulate it, which can be long, especially a concern when the read is issued in
> > > > a hot path (interrupt handler, w/o MSI-X).
> > >
> > > I think I agree with Michael, let's try to use MSI-X here where
> > > there's a lot of existing optimizations in various layers.
> >
> > Yes, sure. Even if we implement MSI-X, there is a security angle to why we want a
> > hypervisor-hosted PCI bus (I have provided details in an earlier reply:
> >
> > https://lore.kernel.org/virtio-dev/20240726070609.GB723942@quicinc.com/T/#m84455763d6b4d0d3b8df814b3d64e6e48ec12ae3 )
> >
> >
> > > > > > We will however likely need vduse to support configuration writes (guest VM
> > > > > > updating configuration space, for ex: writing to 'events_clear' field in case of
> > > > > > virtio-gpu). Would vduse maintainers be willing to accept config_write support
> > > > > > for select devices/features (as long as the writes don't violate any safety
> > > > > > concerns we may have)?
> > > > >
> > > > > I think so, looking at virtio_gpu_config_changed_work_func(), the
> > > > > events_clear seems to be fine to have a posted semantic.
> > > > >
> > > > > Maybe you can post an RFC to support config writing and let's start from there?
> >
> > Does VDUSE support runtime configuration changes (ex: block device capacity
> > changes)?
>
> Looks like it can, it has VDUSE_DEV_INJECT_CONFIG_IRQ.
>
> > I am curious how the atomicity of that update is handled, for ex: the
> > guest reading the config space while a concurrent update is underway (SET_CONFIG).
> > I think the generation count should help there - but it was not clear to me how
> > VDUSE handles generation_count reads during such concurrent updates.
>
> Similar to the above, we are missing a uAPI to do that. Probably just a
> uAPI to update the device generation.
I meant userspace can just do:

ioctl(fd, VDUSE_DEV_INC_GENERATION);
ioctl(fd, VDUSE_DEV_SET_CONFIG);
ioctl(fd, VDUSE_DEV_INC_GENERATION);

and synchronize the device generation increase with
vduse_vdpa_get_generation(). Then we should be fine.
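Putting that together with the existing uAPI, a device daemon's runtime
capacity update might look like the sketch below. VDUSE_DEV_INC_GENERATION is
the hypothetical ioctl proposed above; the other calls and struct
vduse_config_data are in include/uapi/linux/vduse.h (needs <stdint.h>,
<stdlib.h>, <string.h>, <sys/ioctl.h>, <linux/vduse.h>):

/* Sketch: update virtio-blk capacity at runtime and notify the driver */
static int update_capacity(int fd, uint64_t new_capacity_le)
{
	struct vduse_config_data *data;

	data = calloc(1, sizeof(*data) + sizeof(new_capacity_le));
	if (!data)
		return -1;
	data->offset = 0;			/* capacity sits at offset 0 */
	data->length = sizeof(new_capacity_le);
	memcpy(data->buffer, &new_capacity_le, sizeof(new_capacity_le));

	ioctl(fd, VDUSE_DEV_INC_GENERATION);	/* hypothetical */
	ioctl(fd, VDUSE_DEV_SET_CONFIG, data);
	ioctl(fd, VDUSE_DEV_INC_GENERATION);	/* hypothetical */
	ioctl(fd, VDUSE_DEV_INJECT_CONFIG_IRQ);	/* config-change interrupt */

	free(data);
	return 0;
}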
Thanks
>
> Thanks
>
> >
> > - vatsa
> >
Thread overview: 14+ messages
2024-07-24 3:38 [RFC] vduse config write support Srivatsa Vaddagiri
2024-07-26 2:37 ` Yongji Xie
2024-07-26 7:06 ` Srivatsa Vaddagiri
2024-07-26 2:47 ` Jason Wang
2024-07-26 5:15 ` Michael S. Tsirkin
2024-07-29 2:06 ` Jason Wang
2024-07-26 7:03 ` Srivatsa Vaddagiri
2024-07-26 7:29 ` Michael S. Tsirkin
2024-07-29 2:16 ` Jason Wang
2024-07-29 6:02 ` Srivatsa Vaddagiri
2024-07-30 3:06 ` Jason Wang
2024-07-30 3:10 ` Jason Wang
2024-07-26 12:42 ` Srivatsa Vaddagiri
2024-07-30 2:53 ` Jason Wang