* Re: [RFC PATCH v4 00/20] vDPA shadow virtqueue
[not found] <20211001070603.307037-1-eperezma@redhat.com>
@ 2021-10-12 3:59 ` Jason Wang
2021-10-12 4:06 ` Jason Wang
[not found] ` <20211001070603.307037-9-eperezma@redhat.com>
` (10 subsequent siblings)
11 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-12 3:59 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> is intended as a new method of tracking the memory the devices touch
> during a migration process: Instead of relay on vhost device's dirty
> logging capability, SVQ intercepts the VQ dataplane forwarding the
> descriptors between VM and device. This way qemu is the effective
> writer of guests memory, like in qemu's virtio device operation.
>
> When SVQ is enabled qemu offers a new vring to the device to read
> and write into, and also intercepts kicks and calls between the device
> and the guest. Used buffers relay would cause dirty memory being
> tracked, but at this RFC SVQ is not enabled on migration automatically.
>
> It is based on the ideas of DPDK SW assisted LM, in the series of
> DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> not map the shadow vq in guest's VA, but in qemu's.
>
> For qemu to use shadow virtqueues the guest virtio driver must not use
> features like event_idx or indirect descriptors. These limitations will
> be addressed in later series, but they are left out for simplicity at
> the moment.
>
> SVQ needs to be enabled with QMP command:
>
> { "execute": "x-vhost-enable-shadow-vq",
> "arguments": { "name": "dev0", "enable": true } }
>
> This series includes some patches to delete in the final version that
> helps with its testing. The first two of the series freely implements
> the feature to stop the device and be able to retrieve its status. It's
> intended to be used with vp_vpda driver in a nested environment. This
> driver also need modifications to forward the new status bit.
>
> Patches 2-8 prepares the SVQ and QMP command to support guest to host
> notifications forwarding. If the SVQ is enabled with these ones
> applied and the device supports it, that part can be tested in
> isolation (for example, with networking), hopping through SVQ.
>
> Same thing is true with patches 9-13, but with device to guest
> notifications.
>
> The rest of the patches implements the actual buffer forwarding.
>
> Comments are welcome.
Hi Eugenio:
It would be helpful to have a public git repo for us to ease the review.
Thanks
>
> TODO:
> * Event, indirect, packed, and others features of virtio - Waiting for
> confirmation of the big picture.
> * Use already available iova tree to track mappings.
> * To sepparate buffers forwarding in its own AIO context, so we can
> throw more threads to that task and we don't need to stop the main
> event loop.
> * unmap iommu memory. Now the tree can only grow from SVQ enable, but
> it should be fine as long as not a lot of memory is added to the
> guest.
> * Rebase on top of latest qemu (and, hopefully, on top of multiqueue
> vdpa).
> * Some assertions need to be appropiate error handling paths.
> * Proper documentation.
>
> Changes from v3 RFC:
> * Move everything to vhost-vdpa backend. A big change, this allowed
> some cleanup but more code has been added in other places.
> * More use of glib utilities, especially to manage memory.
> v3 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
>
> Changes from v2 RFC:
> * Adding vhost-vdpa devices support
> * Fixed some memory leaks pointed by different comments
> v2 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
>
> Changes from v1 RFC:
> * Use QMP instead of migration to start SVQ mode.
> * Only accepting IOMMU devices, closer behavior with target devices
> (vDPA)
> * Fix invalid masking/unmasking of vhost call fd.
> * Use of proper methods for synchronization.
> * No need to modify VirtIO device code, all of the changes are
> contained in vhost code.
> * Delete superfluous code.
> * An intermediate RFC was sent with only the notifications forwarding
> changes. It can be seen in
> https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> v1 link:
> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
>
> Eugenio Pérez (20):
> virtio: Add VIRTIO_F_QUEUE_STATE
> virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> virtio: Add virtio_queue_is_host_notifier_enabled
> vhost: Make vhost_virtqueue_{start,stop} public
> vhost: Add x-vhost-enable-shadow-vq qmp
> vhost: Add VhostShadowVirtqueue
> vdpa: Register vdpa devices in a list
> vhost: Route guest->host notification through shadow virtqueue
> Add vhost_svq_get_svq_call_notifier
> Add vhost_svq_set_guest_call_notifier
> vdpa: Save call_fd in vhost-vdpa
> vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> vhost: Route host->guest notification through shadow virtqueue
> virtio: Add vhost_shadow_vq_get_vring_addr
> vdpa: Save host and guest features
> vhost: Add vhost_svq_valid_device_features to shadow vq
> vhost: Shadow virtqueue buffers forwarding
> vhost: Add VhostIOVATree
> vhost: Use a tree to store memory mappings
> vdpa: Add custom IOTLB translations to SVQ
>
> Eugenio Pérez (20):
> virtio: Add VIRTIO_F_QUEUE_STATE
> virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
> virtio: Add virtio_queue_is_host_notifier_enabled
> vhost: Make vhost_virtqueue_{start,stop} public
> vhost: Add x-vhost-enable-shadow-vq qmp
> vhost: Add VhostShadowVirtqueue
> vdpa: Register vdpa devices in a list
> vhost: Route guest->host notification through shadow virtqueue
> vdpa: Save call_fd in vhost-vdpa
> vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
> vhost: Route host->guest notification through shadow virtqueue
> virtio: Add vhost_shadow_vq_get_vring_addr
> vdpa: Save host and guest features
> vhost: Add vhost_svq_valid_device_features to shadow vq
> vhost: Shadow virtqueue buffers forwarding
> vhost: Check for device VRING_USED_F_NO_NOTIFY at shadow virtqueue
> kick
> vhost: Use VRING_AVAIL_F_NO_INTERRUPT at device call on shadow
> virtqueue
> vhost: Add VhostIOVATree
> vhost: Use a tree to store memory mappings
> vdpa: Add custom IOTLB translations to SVQ
>
> qapi/net.json | 23 +
> hw/virtio/vhost-iova-tree.h | 40 ++
> hw/virtio/vhost-shadow-virtqueue.h | 37 ++
> hw/virtio/virtio-pci.h | 1 +
> include/hw/virtio/vhost-vdpa.h | 13 +
> include/hw/virtio/vhost.h | 4 +
> include/hw/virtio/virtio.h | 5 +-
> .../standard-headers/linux/virtio_config.h | 5 +
> include/standard-headers/linux/virtio_pci.h | 2 +
> hw/net/virtio-net.c | 6 +-
> hw/virtio/vhost-iova-tree.c | 230 +++++++
> hw/virtio/vhost-shadow-virtqueue.c | 619 ++++++++++++++++++
> hw/virtio/vhost-vdpa.c | 412 +++++++++++-
> hw/virtio/vhost.c | 12 +-
> hw/virtio/virtio-pci.c | 16 +-
> hw/virtio/virtio.c | 5 +
> hw/virtio/meson.build | 2 +-
> hw/virtio/trace-events | 1 +
> 18 files changed, 1413 insertions(+), 20 deletions(-)
> create mode 100644 hw/virtio/vhost-iova-tree.h
> create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
> create mode 100644 hw/virtio/vhost-iova-tree.c
> create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 00/20] vDPA shadow virtqueue
2021-10-12 3:59 ` [RFC PATCH v4 00/20] vDPA shadow virtqueue Jason Wang
@ 2021-10-12 4:06 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-12 4:06 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
On Tue, Oct 12, 2021 at 11:59 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> 在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> > This series enable shadow virtqueue (SVQ) for vhost-vdpa devices. This
> > is intended as a new method of tracking the memory the devices touch
> > during a migration process: Instead of relay on vhost device's dirty
> > logging capability, SVQ intercepts the VQ dataplane forwarding the
> > descriptors between VM and device. This way qemu is the effective
> > writer of guests memory, like in qemu's virtio device operation.
> >
> > When SVQ is enabled qemu offers a new vring to the device to read
> > and write into, and also intercepts kicks and calls between the device
> > and the guest. Used buffers relay would cause dirty memory being
> > tracked, but at this RFC SVQ is not enabled on migration automatically.
> >
> > It is based on the ideas of DPDK SW assisted LM, in the series of
> > DPDK's https://patchwork.dpdk.org/cover/48370/ . However, these does
> > not map the shadow vq in guest's VA, but in qemu's.
> >
> > For qemu to use shadow virtqueues the guest virtio driver must not use
> > features like event_idx or indirect descriptors. These limitations will
> > be addressed in later series, but they are left out for simplicity at
> > the moment.
> >
> > SVQ needs to be enabled with QMP command:
> >
> > { "execute": "x-vhost-enable-shadow-vq",
> > "arguments": { "name": "dev0", "enable": true } }
> >
> > This series includes some patches to delete in the final version that
> > helps with its testing. The first two of the series freely implements
> > the feature to stop the device and be able to retrieve its status. It's
> > intended to be used with vp_vpda driver in a nested environment. This
> > driver also need modifications to forward the new status bit.
> >
> > Patches 2-8 prepares the SVQ and QMP command to support guest to host
> > notifications forwarding. If the SVQ is enabled with these ones
> > applied and the device supports it, that part can be tested in
> > isolation (for example, with networking), hopping through SVQ.
> >
> > Same thing is true with patches 9-13, but with device to guest
> > notifications.
> >
> > The rest of the patches implements the actual buffer forwarding.
> >
> > Comments are welcome.
>
>
> Hi Eugenio:
>
>
> It would be helpful to have a public git repo for us to ease the review.
>
> Thanks
>
Btw, we also need to measure the performance impact of the shadow virtqueue.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 08/20] vhost: Route guest->host notification through shadow virtqueue
[not found] ` <20211001070603.307037-9-eperezma@redhat.com>
@ 2021-10-13 3:27 ` Jason Wang
[not found] ` <CAJaqyWd2joWx3kKz=cJBs4UxZofP7ETkbpg9+cSQSE2MSyBtUg@mail.gmail.com>
0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-13 3:27 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> Shadow virtqueue notifications forwarding is disabled when vhost_dev
> stops, so code flow follows usual cleanup.
>
> Also, host notifiers must be disabled at SVQ start,
Any reason for this?
> and they will not
> start if SVQ has been enabled when device is stopped. This is trivial
> to address, but it is left out for simplicity at this moment.
It looks to me this patch also contains the following logics
1) codes to enable svq
2) codes to let svq to be enabled from QMP.
I think they need to be split out, we may endup with the following
series of patches
1) svq skeleton with enable/disable
2) route host notifier to svq
3) route guest notifier to svq
4) codes to enable svq
5) enable svq via QMP
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> qapi/net.json | 2 +-
> hw/virtio/vhost-shadow-virtqueue.h | 8 ++
> include/hw/virtio/vhost-vdpa.h | 4 +
> hw/virtio/vhost-shadow-virtqueue.c | 138 ++++++++++++++++++++++++++++-
> hw/virtio/vhost-vdpa.c | 116 +++++++++++++++++++++++-
> 5 files changed, 264 insertions(+), 4 deletions(-)
>
> diff --git a/qapi/net.json b/qapi/net.json
> index a2c30fd455..fe546b0e7c 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -88,7 +88,7 @@
> #
> # @enable: true to use the alternate shadow VQ notifications
> #
> -# Returns: Always error, since SVQ is not implemented at the moment.
> +# Returns: Error if failure, or 'no error' for success.
> #
> # Since: 6.2
> #
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 27ac6388fa..237cfceb9c 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -14,6 +14,14 @@
>
> typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>
> +EventNotifier *vhost_svq_get_svq_call_notifier(VhostShadowVirtqueue *svq);
Let's move this function to another patch since it's unrelated to the
guest->host routing.
> +void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> +
> +bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> + VhostShadowVirtqueue *svq);
> +void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> + VhostShadowVirtqueue *svq);
> +
> VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
>
> void vhost_svq_free(VhostShadowVirtqueue *vq);
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 0d565bb5bd..48aae59d8e 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -12,6 +12,8 @@
> #ifndef HW_VIRTIO_VHOST_VDPA_H
> #define HW_VIRTIO_VHOST_VDPA_H
>
> +#include <gmodule.h>
> +
> #include "qemu/queue.h"
> #include "hw/virtio/virtio.h"
>
> @@ -24,6 +26,8 @@ typedef struct vhost_vdpa {
> int device_fd;
> uint32_t msg_type;
> MemoryListener listener;
> + bool shadow_vqs_enabled;
> + GPtrArray *shadow_vqs;
> struct vhost_dev *dev;
> QLIST_ENTRY(vhost_vdpa) entry;
> VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index c4826a1b56..21dc99ab5d 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -9,9 +9,12 @@
>
> #include "qemu/osdep.h"
> #include "hw/virtio/vhost-shadow-virtqueue.h"
> +#include "hw/virtio/vhost.h"
> +
> +#include "standard-headers/linux/vhost_types.h"
>
> #include "qemu/error-report.h"
> -#include "qemu/event_notifier.h"
> +#include "qemu/main-loop.h"
>
> /* Shadow virtqueue to relay notifications */
> typedef struct VhostShadowVirtqueue {
> @@ -19,14 +22,146 @@ typedef struct VhostShadowVirtqueue {
> EventNotifier kick_notifier;
> /* Shadow call notifier, sent to vhost */
> EventNotifier call_notifier;
> +
> + /*
> + * Borrowed virtqueue's guest to host notifier.
> + * To borrow it in this event notifier allows to register on the event
> + * loop and access the associated shadow virtqueue easily. If we use the
> + * VirtQueue, we don't have an easy way to retrieve it.
> + *
> + * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
> + */
> + EventNotifier host_notifier;
> +
> + /* Guest's call notifier, where SVQ calls guest. */
> + EventNotifier guest_call_notifier;
To be consistent, let's simply use "guest_notifier" here.
> +
> + /* Virtio queue shadowing */
> + VirtQueue *vq;
> } VhostShadowVirtqueue;
>
> +/* Forward guest notifications */
> +static void vhost_handle_guest_kick(EventNotifier *n)
> +{
> + VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> + host_notifier);
> +
> + if (unlikely(!event_notifier_test_and_clear(n))) {
> + return;
> + }
Is there a chance that we may stop the processing of available buffers
during the svq enabling? There could be no kick from the guest in this case.
> +
> + event_notifier_set(&svq->kick_notifier);
> +}
> +
> +/*
> + * Obtain the SVQ call notifier, where vhost device notifies SVQ that there
> + * exists pending used buffers.
> + *
> + * @svq Shadow Virtqueue
> + */
> +EventNotifier *vhost_svq_get_svq_call_notifier(VhostShadowVirtqueue *svq)
> +{
> + return &svq->call_notifier;
> +}
> +
> +/*
> + * Set the call notifier for the SVQ to call the guest
> + *
> + * @svq Shadow virtqueue
> + * @call_fd call notifier
> + *
> + * Called on BQL context.
> + */
> +void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
> +{
> + event_notifier_init_fd(&svq->guest_call_notifier, call_fd);
> +}
> +
> +/*
> + * Restore the vhost guest to host notifier, i.e., disables svq effect.
> + */
> +static int vhost_svq_restore_vdev_host_notifier(struct vhost_dev *dev,
> + unsigned vhost_index,
> + VhostShadowVirtqueue *svq)
> +{
> + EventNotifier *vq_host_notifier = virtio_queue_get_host_notifier(svq->vq);
> + struct vhost_vring_file file = {
> + .index = vhost_index,
> + .fd = event_notifier_get_fd(vq_host_notifier),
> + };
> + int r;
> +
> + /* Restore vhost kick */
> + r = dev->vhost_ops->vhost_set_vring_kick(dev, &file);
And remap the notification area if necessary.
> + return r ? -errno : 0;
> +}
> +
> +/*
> + * Start shadow virtqueue operation.
> + * @dev vhost device
> + * @hidx vhost virtqueue index
> + * @svq Shadow Virtqueue
> + */
> +bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> + VhostShadowVirtqueue *svq)
> +{
> + EventNotifier *vq_host_notifier = virtio_queue_get_host_notifier(svq->vq);
> + struct vhost_vring_file file = {
> + .index = dev->vhost_ops->vhost_get_vq_index(dev, dev->vq_index + idx),
> + .fd = event_notifier_get_fd(&svq->kick_notifier),
> + };
> + int r;
> +
> + /* Check that notifications are still going directly to vhost dev */
> + assert(virtio_queue_is_host_notifier_enabled(svq->vq));
> +
> + /*
> + * event_notifier_set_handler already checks for guest's notifications if
> + * they arrive in the switch, so there is no need to explicitely check for
> + * them.
> + */
If this is true, shouldn't we call vhost_set_vring_kick() before the
event_notifier_set_handler()?
Btw, I think we should update the fd if set_vring_kick() was called
after this function?
> + event_notifier_init_fd(&svq->host_notifier,
> + event_notifier_get_fd(vq_host_notifier));
> + event_notifier_set_handler(&svq->host_notifier, vhost_handle_guest_kick);
> +
> + r = dev->vhost_ops->vhost_set_vring_kick(dev, &file);
And we need to stop the notification area mmap.
> + if (unlikely(r != 0)) {
> + error_report("Couldn't set kick fd: %s", strerror(errno));
> + goto err_set_vring_kick;
> + }
> +
> + return true;
> +
> +err_set_vring_kick:
> + event_notifier_set_handler(&svq->host_notifier, NULL);
> +
> + return false;
> +}
> +
> +/*
> + * Stop shadow virtqueue operation.
> + * @dev vhost device
> + * @idx vhost queue index
> + * @svq Shadow Virtqueue
> + */
> +void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> + VhostShadowVirtqueue *svq)
> +{
> + int r = vhost_svq_restore_vdev_host_notifier(dev, idx, svq);
> + if (unlikely(r < 0)) {
> + error_report("Couldn't restore vq kick fd: %s", strerror(-r));
> + }
> +
> + event_notifier_set_handler(&svq->host_notifier, NULL);
> +}
> +
> /*
> * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> * methods and file descriptors.
> */
> VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> {
> + int vq_idx = dev->vq_index + idx;
> g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> int r;
>
> @@ -44,6 +179,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> goto err_init_call_notifier;
> }
>
> + svq->vq = virtio_get_queue(dev->vdev, vq_idx);
> return g_steal_pointer(&svq);
>
> err_init_call_notifier:
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index e0dc7508c3..36c954a779 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -17,6 +17,7 @@
> #include "hw/virtio/vhost.h"
> #include "hw/virtio/vhost-backend.h"
> #include "hw/virtio/virtio-net.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
> #include "hw/virtio/vhost-vdpa.h"
> #include "exec/address-spaces.h"
> #include "qemu/main-loop.h"
> @@ -272,6 +273,16 @@ static void vhost_vdpa_add_status(struct vhost_dev *dev, uint8_t status)
> vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &s);
> }
>
> +/**
> + * Adaptor function to free shadow virtqueue through gpointer
> + *
> + * @svq The Shadow Virtqueue
> + */
> +static void vhost_psvq_free(gpointer svq)
> +{
> + vhost_svq_free(svq);
> +}
> +
> static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> {
> struct vhost_vdpa *v;
> @@ -283,6 +294,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> dev->opaque = opaque ;
> v->listener = vhost_vdpa_memory_listener;
> v->msg_type = VHOST_IOTLB_MSG_V2;
> + v->shadow_vqs = g_ptr_array_new_full(dev->nvqs, vhost_psvq_free);
> QLIST_INSERT_HEAD(&vhost_vdpa_devices, v, entry);
>
> vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> @@ -373,6 +385,17 @@ err:
> return;
> }
>
> +static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
> +{
> + struct vhost_vdpa *v = dev->opaque;
> + size_t idx;
> +
> + for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
> + vhost_svq_stop(dev, idx, g_ptr_array_index(v->shadow_vqs, idx));
> + }
> + g_ptr_array_free(v->shadow_vqs, true);
> +}
> +
> static int vhost_vdpa_cleanup(struct vhost_dev *dev)
> {
> struct vhost_vdpa *v;
> @@ -381,6 +404,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
> trace_vhost_vdpa_cleanup(dev, v);
> vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> memory_listener_unregister(&v->listener);
> + vhost_vdpa_svq_cleanup(dev);
> QLIST_REMOVE(v, entry);
>
> dev->opaque = NULL;
> @@ -557,7 +581,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> if (started) {
> uint8_t status = 0;
> memory_listener_register(&v->listener, &address_space_memory);
> - vhost_vdpa_host_notifiers_init(dev);
> + if (!v->shadow_vqs_enabled) {
> + vhost_vdpa_host_notifiers_init(dev);
> + }
This looks like a trick, why not check and setup shadow_vqs inside:
1) vhost_vdpa_host_notifiers_init()
and
2) vhost_vdpa_set_vring_kick()
> vhost_vdpa_set_vring_ready(dev);
> vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> @@ -663,10 +689,96 @@ static bool vhost_vdpa_force_iommu(struct vhost_dev *dev)
> return true;
> }
>
> +/*
> + * Start shadow virtqueue.
> + */
> +static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
> +{
> + struct vhost_vdpa *v = dev->opaque;
> + VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
> + return vhost_svq_start(dev, idx, svq);
> +}
> +
> +static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> +{
> + struct vhost_dev *hdev = v->dev;
> + unsigned n;
> +
> + if (enable == v->shadow_vqs_enabled) {
> + return hdev->nvqs;
> + }
> +
> + if (enable) {
> + /* Allocate resources */
> + assert(v->shadow_vqs->len == 0);
> + for (n = 0; n < hdev->nvqs; ++n) {
> + VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
> + bool ok;
> +
> + if (unlikely(!svq)) {
> + g_ptr_array_set_size(v->shadow_vqs, 0);
> + return 0;
> + }
> + g_ptr_array_add(v->shadow_vqs, svq);
> +
> + ok = vhost_vdpa_svq_start_vq(hdev, n);
> + if (unlikely(!ok)) {
> + /* Free still not started svqs */
> + g_ptr_array_set_size(v->shadow_vqs, n);
> + enable = false;
> + break;
> + }
> + }
Since there's almost no logic could be shared between enable and
disable. Let's split those logic out into dedicated functions where the
codes looks more easy to be reviewed (e.g have a better error handling etc).
> + }
> +
> + v->shadow_vqs_enabled = enable;
> +
> + if (!enable) {
> + /* Disable all queues or clean up failed start */
> + for (n = 0; n < v->shadow_vqs->len; ++n) {
> + unsigned vq_idx = vhost_vdpa_get_vq_index(hdev, n);
> + VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, n);
> + vhost_svq_stop(hdev, n, svq);
> + vhost_virtqueue_start(hdev, hdev->vdev, &hdev->vqs[n], vq_idx);
> + }
> +
> + /* Resources cleanup */
> + g_ptr_array_set_size(v->shadow_vqs, 0);
> + }
> +
> + return n;
> +}
>
> void qmp_x_vhost_enable_shadow_vq(const char *name, bool enable, Error **errp)
> {
> - error_setg(errp, "Shadow virtqueue still not implemented");
> + struct vhost_vdpa *v;
> + const char *err_cause = NULL;
> + bool r;
> +
> + QLIST_FOREACH(v, &vhost_vdpa_devices, entry) {
> + if (v->dev->vdev && 0 == strcmp(v->dev->vdev->name, name)) {
> + break;
> + }
> + }
I think you can iterate the NetClientStates to ge tthe vhost-vdpa backends.
> +
> + if (!v) {
> + err_cause = "Device not found";
> + goto err;
> + } else if (v->notifier[0].addr) {
> + err_cause = "Device has host notifiers enabled";
I don't get this.
Btw this function should be implemented in an independent patch after
svq is fully functional.
Thanks
> + goto err;
> + }
> +
> + r = vhost_vdpa_enable_svq(v, enable);
> + if (unlikely(!r)) {
> + err_cause = "Error enabling (see monitor)";
> + goto err;
> + }
> +
> +err:
> + if (err_cause) {
> + error_setg(errp, "Can't enable shadow vq on %s: %s", name, err_cause);
> + }
> }
>
> const VhostOps vdpa_ops = {
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 09/20] vdpa: Save call_fd in vhost-vdpa
[not found] ` <20211001070603.307037-10-eperezma@redhat.com>
@ 2021-10-13 3:43 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-13 3:43 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> We need to know it to switch to Shadow VirtQueue.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> include/hw/virtio/vhost-vdpa.h | 2 ++
> hw/virtio/vhost-vdpa.c | 5 +++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 48aae59d8e..fddac248b3 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -30,6 +30,8 @@ typedef struct vhost_vdpa {
> GPtrArray *shadow_vqs;
> struct vhost_dev *dev;
> QLIST_ENTRY(vhost_vdpa) entry;
> + /* File descriptor the device uses to call VM/SVQ */
> + int call_fd[VIRTIO_QUEUE_MAX];
Any reason we don't do this for kick_fd or why
virtio_queue_get_guest_notifier() can't work here? Need a comment or
commit log.
I think we need to have a consistent way to handle both kick and call fd.
Thanks
> VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> } VhostVDPA;
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 36c954a779..57a857444a 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -652,7 +652,12 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> struct vhost_vring_file *file)
> {
> + struct vhost_vdpa *v = dev->opaque;
> + int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
> +
> trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
> +
> + v->call_fd[vdpa_idx] = file->fd;
> return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> }
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 10/20] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
[not found] ` <20211001070603.307037-11-eperezma@redhat.com>
@ 2021-10-13 3:43 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-13 3:43 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> hw/virtio/vhost-vdpa.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 57a857444a..bc34de2439 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -649,16 +649,27 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
> }
>
> +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
> + struct vhost_vring_file *file)
> +{
> + trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
> + return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> +}
> +
> static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> struct vhost_vring_file *file)
> {
> struct vhost_vdpa *v = dev->opaque;
> int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
>
> - trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
> -
> v->call_fd[vdpa_idx] = file->fd;
> - return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
> + if (v->shadow_vqs_enabled) {
> + VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> + vhost_svq_set_guest_call_notifier(svq, file->fd);
> + return 0;
> + } else {
> + return vhost_vdpa_set_vring_dev_call(dev, file);
> + }
I feel like we should do the same for kick fd.
Thanks
> }
>
> static int vhost_vdpa_get_features(struct vhost_dev *dev,
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 11/20] vhost: Route host->guest notification through shadow virtqueue
[not found] ` <20211001070603.307037-12-eperezma@redhat.com>
@ 2021-10-13 3:47 ` Jason Wang
[not found] ` <CAJaqyWfm734HrwTJK71hUQNYVkyDaR8OiqtGro_AX9i_pXfmBQ@mail.gmail.com>
2021-10-13 3:49 ` Jason Wang
1 sibling, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-13 3:47 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> This will make qemu aware of the device used buffers, allowing it to
> write the guest memory with its contents if needed.
>
> Since the use of vhost_virtqueue_start can unmasks and discard call
> events, vhost_virtqueue_start should be modified in one of these ways:
> * Split in two: One of them uses all logic to start a queue with no
> side effects for the guest, and another one tha actually assumes that
> the guest has just started the device. Vdpa should use just the
> former.
> * Actually store and check if the guest notifier is masked, and do it
> conditionally.
> * Left as it is, and duplicate all the logic in vhost-vdpa.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> hw/virtio/vhost-shadow-virtqueue.c | 19 +++++++++++++++
> hw/virtio/vhost-vdpa.c | 38 +++++++++++++++++++++++++++++-
> 2 files changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 21dc99ab5d..3fe129cf63 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -53,6 +53,22 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> event_notifier_set(&svq->kick_notifier);
> }
>
> +/* Forward vhost notifications */
> +static void vhost_svq_handle_call_no_test(EventNotifier *n)
> +{
> + VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> + call_notifier);
> +
> + event_notifier_set(&svq->guest_call_notifier);
> +}
> +
> +static void vhost_svq_handle_call(EventNotifier *n)
> +{
> + if (likely(event_notifier_test_and_clear(n))) {
> + vhost_svq_handle_call_no_test(n);
> + }
> +}
> +
> /*
> * Obtain the SVQ call notifier, where vhost device notifies SVQ that there
> * exists pending used buffers.
> @@ -180,6 +196,8 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> }
>
> svq->vq = virtio_get_queue(dev->vdev, vq_idx);
> + event_notifier_set_handler(&svq->call_notifier,
> + vhost_svq_handle_call);
> return g_steal_pointer(&svq);
>
> err_init_call_notifier:
> @@ -195,6 +213,7 @@ err_init_kick_notifier:
> void vhost_svq_free(VhostShadowVirtqueue *vq)
> {
> event_notifier_cleanup(&vq->kick_notifier);
> + event_notifier_set_handler(&vq->call_notifier, NULL);
> event_notifier_cleanup(&vq->call_notifier);
> g_free(vq);
> }
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index bc34de2439..6c5f4c98b8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -712,13 +712,40 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
> {
> struct vhost_vdpa *v = dev->opaque;
> VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
> - return vhost_svq_start(dev, idx, svq);
> + EventNotifier *vhost_call_notifier = vhost_svq_get_svq_call_notifier(svq);
> + struct vhost_vring_file vhost_call_file = {
> + .index = idx + dev->vq_index,
> + .fd = event_notifier_get_fd(vhost_call_notifier),
> + };
> + int r;
> + bool b;
> +
> + /* Set shadow vq -> guest notifier */
> + assert(v->call_fd[idx]);
We need aovid the asser() here. On which case we can hit this?
> + vhost_svq_set_guest_call_notifier(svq, v->call_fd[idx]);
> +
> + b = vhost_svq_start(dev, idx, svq);
> + if (unlikely(!b)) {
> + return false;
> + }
> +
> + /* Set device -> SVQ notifier */
> + r = vhost_vdpa_set_vring_dev_call(dev, &vhost_call_file);
> + if (unlikely(r)) {
> + error_report("vhost_vdpa_set_vring_call for shadow vq failed");
> + return false;
> + }
Similar to kick, do we need to set_vring_call() before vhost_svq_start()?
> +
> + /* Check for pending calls */
> + event_notifier_set(vhost_call_notifier);
Interesting, can this result spurious interrupt?
> + return true;
> }
>
> static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> {
> struct vhost_dev *hdev = v->dev;
> unsigned n;
> + int r;
>
> if (enable == v->shadow_vqs_enabled) {
> return hdev->nvqs;
> @@ -752,9 +779,18 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> if (!enable) {
> /* Disable all queues or clean up failed start */
> for (n = 0; n < v->shadow_vqs->len; ++n) {
> + struct vhost_vring_file file = {
> + .index = vhost_vdpa_get_vq_index(hdev, n),
> + .fd = v->call_fd[n],
> + };
> +
> + r = vhost_vdpa_set_vring_call(hdev, &file);
> + assert(r == 0);
> +
> unsigned vq_idx = vhost_vdpa_get_vq_index(hdev, n);
> VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, n);
> vhost_svq_stop(hdev, n, svq);
> + /* TODO: This can unmask or override call fd! */
I don't get this comment. Does this mean the current code can't work
with mask_notifiers? If yes, this is something we need to fix.
Thanks
> vhost_virtqueue_start(hdev, hdev->vdev, &hdev->vqs[n], vq_idx);
> }
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 11/20] vhost: Route host->guest notification through shadow virtqueue
[not found] ` <20211001070603.307037-12-eperezma@redhat.com>
2021-10-13 3:47 ` [RFC PATCH v4 11/20] vhost: Route host->guest notification through shadow virtqueue Jason Wang
@ 2021-10-13 3:49 ` Jason Wang
[not found] ` <CAJaqyWcQ314RN7-U1bYqCMXb+-nyhSi3ddqWv90ofFucMbveUw@mail.gmail.com>
1 sibling, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-13 3:49 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> This will make qemu aware of the device used buffers, allowing it to
> write the guest memory with its contents if needed.
>
> Since the use of vhost_virtqueue_start can unmasks and discard call
> events, vhost_virtqueue_start should be modified in one of these ways:
> * Split in two: One of them uses all logic to start a queue with no
> side effects for the guest, and another one tha actually assumes that
> the guest has just started the device. Vdpa should use just the
> former.
> * Actually store and check if the guest notifier is masked, and do it
> conditionally.
> * Left as it is, and duplicate all the logic in vhost-vdpa.
Btw, the log looks not clear. I guess this patch goes for method 3. If
yes, we need explain it and why.
Thanks
>
> Signed-off-by: Eugenio Pérez<eperezma@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 12/20] virtio: Add vhost_shadow_vq_get_vring_addr
[not found] ` <20211001070603.307037-13-eperezma@redhat.com>
@ 2021-10-13 3:54 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-13 3:54 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> It reports the shadow virtqueue address from qemu virtual address space
I think both the title and commit log needs to more tweaks. Looking at
the codes, what id does is actually introduce vring into svq.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> hw/virtio/vhost-shadow-virtqueue.h | 4 +++
> hw/virtio/vhost-shadow-virtqueue.c | 50 ++++++++++++++++++++++++++++++
> 2 files changed, 54 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 237cfceb9c..2df3d117f5 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -16,6 +16,10 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>
> EventNotifier *vhost_svq_get_svq_call_notifier(VhostShadowVirtqueue *svq);
> void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> +void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> + struct vhost_vring_addr *addr);
> +size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
> +size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>
> bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> VhostShadowVirtqueue *svq);
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 3fe129cf63..5c1899f6af 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -18,6 +18,9 @@
>
> /* Shadow virtqueue to relay notifications */
> typedef struct VhostShadowVirtqueue {
> + /* Shadow vring */
> + struct vring vring;
> +
> /* Shadow kick notifier, sent to vhost */
> EventNotifier kick_notifier;
> /* Shadow call notifier, sent to vhost */
> @@ -38,6 +41,9 @@ typedef struct VhostShadowVirtqueue {
>
> /* Virtio queue shadowing */
> VirtQueue *vq;
> +
> + /* Virtio device */
> + VirtIODevice *vdev;
> } VhostShadowVirtqueue;
>
> /* Forward guest notifications */
> @@ -93,6 +99,35 @@ void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
> event_notifier_init_fd(&svq->guest_call_notifier, call_fd);
> }
>
> +/*
> + * Get the shadow vq vring address.
> + * @svq Shadow virtqueue
> + * @addr Destination to store address
> + */
> +void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
> + struct vhost_vring_addr *addr)
> +{
> + addr->desc_user_addr = (uint64_t)svq->vring.desc;
> + addr->avail_user_addr = (uint64_t)svq->vring.avail;
> + addr->used_user_addr = (uint64_t)svq->vring.used;
> +}
> +
> +size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
> +{
> + uint16_t vq_idx = virtio_get_queue_index(svq->vq);
> + size_t desc_size = virtio_queue_get_desc_size(svq->vdev, vq_idx);
> + size_t avail_size = virtio_queue_get_avail_size(svq->vdev, vq_idx);
> +
> + return ROUND_UP(desc_size + avail_size, qemu_real_host_page_size);
Is this round up required by the spec?
> +}
> +
> +size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq)
> +{
> + uint16_t vq_idx = virtio_get_queue_index(svq->vq);
> + size_t used_size = virtio_queue_get_used_size(svq->vdev, vq_idx);
> + return ROUND_UP(used_size, qemu_real_host_page_size);
> +}
> +
> /*
> * Restore the vhost guest to host notifier, i.e., disables svq effect.
> */
> @@ -178,6 +213,10 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> {
> int vq_idx = dev->vq_index + idx;
> + unsigned num = virtio_queue_get_num(dev->vdev, vq_idx);
> + size_t desc_size = virtio_queue_get_desc_size(dev->vdev, vq_idx);
> + size_t driver_size;
> + size_t device_size;
> g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> int r;
>
> @@ -196,6 +235,15 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> }
>
> svq->vq = virtio_get_queue(dev->vdev, vq_idx);
> + svq->vdev = dev->vdev;
> + driver_size = vhost_svq_driver_area_size(svq);
> + device_size = vhost_svq_device_area_size(svq);
> + svq->vring.num = num;
> + svq->vring.desc = qemu_memalign(qemu_real_host_page_size, driver_size);
> + svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
> + memset(svq->vring.desc, 0, driver_size);
Any reason for using the contiguous area for both desc and avail?
Thanks
> + svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> + memset(svq->vring.used, 0, device_size);
> event_notifier_set_handler(&svq->call_notifier,
> vhost_svq_handle_call);
> return g_steal_pointer(&svq);
> @@ -215,5 +263,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
> event_notifier_cleanup(&vq->kick_notifier);
> event_notifier_set_handler(&vq->call_notifier, NULL);
> event_notifier_cleanup(&vq->call_notifier);
> + qemu_vfree(vq->vring.desc);
> + qemu_vfree(vq->vring.used);
> g_free(vq);
> }
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 13/20] vdpa: Save host and guest features
[not found] ` <20211001070603.307037-14-eperezma@redhat.com>
@ 2021-10-13 3:56 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-13 3:56 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> Those are needed for SVQ: Host ones are needed to check if SVQ knows
> how to talk with the device and for feature negotiation, and guest ones
> to know if SVQ can talk with it.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> include/hw/virtio/vhost-vdpa.h | 2 ++
> hw/virtio/vhost-vdpa.c | 31 ++++++++++++++++++++++++++++---
> 2 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index fddac248b3..9044ae694b 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -26,6 +26,8 @@ typedef struct vhost_vdpa {
> int device_fd;
> uint32_t msg_type;
> MemoryListener listener;
> + uint64_t host_features;
> + uint64_t guest_features;
Any reason that we can't use the features stored in VirtioDevice?
Thanks
> bool shadow_vqs_enabled;
> GPtrArray *shadow_vqs;
> struct vhost_dev *dev;
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 6c5f4c98b8..a057e8277d 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -439,10 +439,19 @@ static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
> return 0;
> }
>
> -static int vhost_vdpa_set_features(struct vhost_dev *dev,
> - uint64_t features)
> +/**
> + * Internal set_features() that follows vhost/VirtIO protocol for that
> + */
> +static int vhost_vdpa_backend_set_features(struct vhost_dev *dev,
> + uint64_t features)
> {
> + struct vhost_vdpa *v = dev->opaque;
> +
> int ret;
> + if (v->host_features & BIT_ULL(VIRTIO_F_QUEUE_STATE)) {
> + features |= BIT_ULL(VIRTIO_F_QUEUE_STATE);
> + }
> +
> trace_vhost_vdpa_set_features(dev, features);
> ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> uint8_t status = 0;
> @@ -455,6 +464,17 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> return !(status & VIRTIO_CONFIG_S_FEATURES_OK);
> }
>
> +/**
> + * Exposed vhost set features
> + */
> +static int vhost_vdpa_set_features(struct vhost_dev *dev,
> + uint64_t features)
> +{
> + struct vhost_vdpa *v = dev->opaque;
> + v->guest_features = features;
> + return vhost_vdpa_backend_set_features(dev, features);
> +}
> +
> static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> {
> uint64_t features;
> @@ -673,12 +693,17 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> }
>
> static int vhost_vdpa_get_features(struct vhost_dev *dev,
> - uint64_t *features)
> + uint64_t *features)
> {
> int ret;
>
> ret = vhost_vdpa_call(dev, VHOST_GET_FEATURES, features);
> trace_vhost_vdpa_get_features(dev, *features);
> +
> + if (ret == 0) {
> + struct vhost_vdpa *v = dev->opaque;
> + v->host_features = *features;
> + }
> return ret;
> }
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 15/20] vhost: Shadow virtqueue buffers forwarding
[not found] ` <20211001070603.307037-16-eperezma@redhat.com>
@ 2021-10-13 4:31 ` Jason Wang
[not found] ` <CAJaqyWeaJyxh-tt45wxONzuOLhVt6wO48e2ufZZ3uECHTDofFw@mail.gmail.com>
0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-13 4:31 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> Initial version of shadow virtqueue that actually forward buffers. There
> are no iommu support at the moment, and that will be addressed in future
> patches of this series. Since all vhost-vdpa devices uses forced IOMMU,
> this means that SVQ is not usable at this point of the series on any
> device.
>
> For simplicity it only supports modern devices, that expects vring
> in little endian, with split ring and no event idx or indirect
> descriptors. Support for them will not be added in this series.
>
> It reuses the VirtQueue code for the device part. The driver part is
> based on Linux's virtio_ring driver, but with stripped functionality
> and optimizations so it's easier to review. Later commits add simpler
> ones.
>
> SVQ uses VIRTIO_CONFIG_S_DEVICE_STOPPED to pause the device and
> retrieve its status (next available idx the device was going to
> consume) race-free. It can later reset the device to replace vring
> addresses etc. When SVQ starts qemu can resume consuming the guest's
> driver ring from that state, without notice from the latter.
>
> This status bit VIRTIO_CONFIG_S_DEVICE_STOPPED is currently discussed
> in VirtIO, and is implemented in qemu VirtIO-net devices in previous
> commits.
>
> Removal of _S_DEVICE_STOPPED bit (in other words, resuming the device)
> can be done in the future if an use case arises. At this moment we can
> just rely on reseting the full device.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> qapi/net.json | 2 +-
> hw/virtio/vhost-shadow-virtqueue.c | 237 ++++++++++++++++++++++++++++-
> hw/virtio/vhost-vdpa.c | 109 ++++++++++++-
> 3 files changed, 337 insertions(+), 11 deletions(-)
>
> diff --git a/qapi/net.json b/qapi/net.json
> index fe546b0e7c..1f4a55f2c5 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -86,7 +86,7 @@
> #
> # @name: the device name of the VirtIO device
> #
> -# @enable: true to use the alternate shadow VQ notifications
> +# @enable: true to use the alternate shadow VQ buffers fowarding path
> #
> # Returns: Error if failure, or 'no error' for success.
> #
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 34e159d4fd..df7e6fa3ec 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -10,6 +10,7 @@
> #include "qemu/osdep.h"
> #include "hw/virtio/vhost-shadow-virtqueue.h"
> #include "hw/virtio/vhost.h"
> +#include "hw/virtio/virtio-access.h"
>
> #include "standard-headers/linux/vhost_types.h"
>
> @@ -44,15 +45,135 @@ typedef struct VhostShadowVirtqueue {
>
> /* Virtio device */
> VirtIODevice *vdev;
> +
> + /* Map for returning guest's descriptors */
> + VirtQueueElement **ring_id_maps;
> +
> + /* Next head to expose to device */
> + uint16_t avail_idx_shadow;
> +
> + /* Next free descriptor */
> + uint16_t free_head;
> +
> + /* Last seen used idx */
> + uint16_t shadow_used_idx;
> +
> + /* Next head to consume from device */
> + uint16_t used_idx;
Let's use "last_used_idx" as kernel driver did.
> } VhostShadowVirtqueue;
>
> /* If the device is using some of these, SVQ cannot communicate */
> bool vhost_svq_valid_device_features(uint64_t *dev_features)
> {
> - return true;
> + uint64_t b;
> + bool r = true;
> +
> + for (b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END; ++b) {
> + switch (b) {
> + case VIRTIO_F_NOTIFY_ON_EMPTY:
> + case VIRTIO_F_ANY_LAYOUT:
> + /* SVQ is fine with this feature */
> + continue;
> +
> + case VIRTIO_F_ACCESS_PLATFORM:
> + /* SVQ needs this feature disabled. Can't continue */
So code can explain itself, need a comment to explain why.
> + if (*dev_features & BIT_ULL(b)) {
> + clear_bit(b, dev_features);
> + r = false;
> + }
> + break;
> +
> + case VIRTIO_F_VERSION_1:
> + /* SVQ needs this feature, so can't continue */
A comment to explain why SVQ needs this feature.
> + if (!(*dev_features & BIT_ULL(b))) {
> + set_bit(b, dev_features);
> + r = false;
> + }
> + continue;
> +
> + default:
> + /*
> + * SVQ must disable this feature, let's hope the device is fine
> + * without it.
> + */
> + if (*dev_features & BIT_ULL(b)) {
> + clear_bit(b, dev_features);
> + }
> + }
> + }
> +
> + return r;
> +}
Let's move this to patch 14.
> +
> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> + const struct iovec *iovec,
> + size_t num, bool more_descs, bool write)
> +{
> + uint16_t i = svq->free_head, last = svq->free_head;
> + unsigned n;
> + uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> + vring_desc_t *descs = svq->vring.desc;
> +
> + if (num == 0) {
> + return;
> + }
> +
> + for (n = 0; n < num; n++) {
> + if (more_descs || (n + 1 < num)) {
> + descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> + } else {
> + descs[i].flags = flags;
> + }
> + descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> + descs[i].len = cpu_to_le32(iovec[n].iov_len);
> +
> + last = i;
> + i = cpu_to_le16(descs[i].next);
> + }
> +
> + svq->free_head = le16_to_cpu(descs[last].next);
> +}
> +
> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> + VirtQueueElement *elem)
> +{
> + int head;
> + unsigned avail_idx;
> + vring_avail_t *avail = svq->vring.avail;
> +
> + head = svq->free_head;
> +
> + /* We need some descriptors here */
> + assert(elem->out_num || elem->in_num);
> +
> + vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> + elem->in_num > 0, false);
> + vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> +
> + /*
> + * Put entry in available array (but don't update avail->idx until they
> + * do sync).
> + */
> + avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> + avail->ring[avail_idx] = cpu_to_le16(head);
> + svq->avail_idx_shadow++;
> +
> + /* Update avail index after the descriptor is wrote */
> + smp_wmb();
> + avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> +
> + return head;
> +
> }
>
> -/* Forward guest notifications */
> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> +{
> + unsigned qemu_head = vhost_svq_add_split(svq, elem);
> +
> + svq->ring_id_maps[qemu_head] = elem;
> +}
> +
> +/* Handle guest->device notifications */
> static void vhost_handle_guest_kick(EventNotifier *n)
> {
> VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> @@ -62,7 +183,74 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> return;
> }
>
> - event_notifier_set(&svq->kick_notifier);
> + /* Make available as many buffers as possible */
> + do {
> + if (virtio_queue_get_notification(svq->vq)) {
> + /* No more notifications until process all available */
> + virtio_queue_set_notification(svq->vq, false);
> + }
This can be done outside the loop.
> +
> + while (true) {
> + VirtQueueElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> + if (!elem) {
> + break;
> + }
> +
> + vhost_svq_add(svq, elem);
> + event_notifier_set(&svq->kick_notifier);
> + }
> +
> + virtio_queue_set_notification(svq->vq, true);
I think this can be moved to the end of this function.
Btw, we probably need a quota to make sure the svq is not hogging the
main event loop.
Similar issue could be found in both virtio-net TX (using timer or bh)
and TAP (a quota).
> + } while (!virtio_queue_empty(svq->vq));
> +}
> +
> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> +{
> + if (svq->used_idx != svq->shadow_used_idx) {
> + return true;
> + }
> +
> + /* Get used idx must not be reordered */
> + smp_rmb();
Interesting, we don't do this for kernel drivers. It would be helpful to
explain it more clear by "X must be done before Y".
> + svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
> +
> + return svq->used_idx != svq->shadow_used_idx;
> +}
> +
> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> +{
> + vring_desc_t *descs = svq->vring.desc;
> + const vring_used_t *used = svq->vring.used;
> + vring_used_elem_t used_elem;
> + uint16_t last_used;
> +
> + if (!vhost_svq_more_used(svq)) {
> + return NULL;
> + }
> +
> + last_used = svq->used_idx & (svq->vring.num - 1);
> + used_elem.id = le32_to_cpu(used->ring[last_used].id);
> + used_elem.len = le32_to_cpu(used->ring[last_used].len);
> +
> + svq->used_idx++;
> + if (unlikely(used_elem.id >= svq->vring.num)) {
> + error_report("Device %s says index %u is used", svq->vdev->name,
> + used_elem.id);
> + return NULL;
> + }
> +
> + if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> + error_report(
> + "Device %s says index %u is used, but it was not available",
> + svq->vdev->name, used_elem.id);
> + return NULL;
> + }
> +
> + descs[used_elem.id].next = svq->free_head;
> + svq->free_head = used_elem.id;
> +
> + svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> + return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> }
>
> /* Forward vhost notifications */
> @@ -70,8 +258,26 @@ static void vhost_svq_handle_call_no_test(EventNotifier *n)
> {
> VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> call_notifier);
> -
> - event_notifier_set(&svq->guest_call_notifier);
> + VirtQueue *vq = svq->vq;
> +
> + /* Make as many buffers as possible used. */
> + do {
> + unsigned i = 0;
> +
> + /* TODO: Use VRING_AVAIL_F_NO_INTERRUPT */
> + while (true) {
> + g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> + if (!elem) {
> + break;
> + }
> +
> + assert(i < svq->vring.num);
Let's return error instead of using the assert.
> + virtqueue_fill(vq, elem, elem->len, i++);
> + }
> +
> + virtqueue_flush(vq, i);
> + event_notifier_set(&svq->guest_call_notifier);
> + } while (vhost_svq_more_used(svq));
> }
>
> static void vhost_svq_handle_call(EventNotifier *n)
> @@ -204,12 +410,25 @@ err_set_vring_kick:
> void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> VhostShadowVirtqueue *svq)
> {
> + int i;
> int r = vhost_svq_restore_vdev_host_notifier(dev, idx, svq);
> +
> if (unlikely(r < 0)) {
> error_report("Couldn't restore vq kick fd: %s", strerror(-r));
> }
>
> event_notifier_set_handler(&svq->host_notifier, NULL);
> +
> + for (i = 0; i < svq->vring.num; ++i) {
> + g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
> + /*
> + * Although the doc says we must unpop in order, it's ok to unpop
> + * everything.
> + */
> + if (elem) {
> + virtqueue_unpop(svq->vq, elem, elem->len);
> + }
Will this result some of the "pending" buffers to be submitted multiple
times? If yes, should we wait for all the buffers used instead of doing
the unpop here?
> + }
> }
>
> /*
> @@ -224,7 +443,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> size_t driver_size;
> size_t device_size;
> g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> - int r;
> + int r, i;
>
> r = event_notifier_init(&svq->kick_notifier, 0);
> if (r != 0) {
> @@ -250,6 +469,11 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> memset(svq->vring.desc, 0, driver_size);
> svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> memset(svq->vring.used, 0, device_size);
> + for (i = 0; i < num - 1; i++) {
> + svq->vring.desc[i].next = cpu_to_le16(i + 1);
> + }
> +
> + svq->ring_id_maps = g_new0(VirtQueueElement *, num);
> event_notifier_set_handler(&svq->call_notifier,
> vhost_svq_handle_call);
> return g_steal_pointer(&svq);
> @@ -269,6 +493,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
> event_notifier_cleanup(&vq->kick_notifier);
> event_notifier_set_handler(&vq->call_notifier, NULL);
> event_notifier_cleanup(&vq->call_notifier);
> + g_free(vq->ring_id_maps);
> qemu_vfree(vq->vring.desc);
> qemu_vfree(vq->vring.used);
> g_free(vq);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index a057e8277d..bb7010ddb5 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -19,6 +19,7 @@
> #include "hw/virtio/virtio-net.h"
> #include "hw/virtio/vhost-shadow-virtqueue.h"
> #include "hw/virtio/vhost-vdpa.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
> #include "exec/address-spaces.h"
> #include "qemu/main-loop.h"
> #include "cpu.h"
> @@ -475,6 +476,28 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> return vhost_vdpa_backend_set_features(dev, features);
> }
>
> +/**
> + * Restore guest features to vdpa device
> + */
> +static int vhost_vdpa_set_guest_features(struct vhost_dev *dev)
> +{
> + struct vhost_vdpa *v = dev->opaque;
> + return vhost_vdpa_backend_set_features(dev, v->guest_features);
> +}
> +
> +/**
> + * Set shadow virtqueue supported features
> + */
> +static int vhost_vdpa_set_svq_features(struct vhost_dev *dev)
> +{
> + struct vhost_vdpa *v = dev->opaque;
> + uint64_t features = v->host_features;
> + bool b = vhost_svq_valid_device_features(&features);
> + assert(b);
> +
> + return vhost_vdpa_backend_set_features(dev, features);
> +}
> +
> static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
> {
> uint64_t features;
> @@ -730,6 +753,19 @@ static bool vhost_vdpa_force_iommu(struct vhost_dev *dev)
> return true;
> }
>
> +static int vhost_vdpa_vring_pause(struct vhost_dev *dev)
> +{
> + int r;
> + uint8_t status;
> +
> + vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DEVICE_STOPPED);
> + do {
> + r = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
I guess we'd better add some sleep here.
> + } while (r == 0 && !(status & VIRTIO_CONFIG_S_DEVICE_STOPPED));
> +
> + return 0;
> +}
> +
> /*
> * Start shadow virtqueue.
> */
> @@ -742,9 +778,29 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
> .index = idx + dev->vq_index,
> .fd = event_notifier_get_fd(vhost_call_notifier),
> };
> + struct vhost_vring_addr addr = {
> + .index = idx + dev->vq_index,
> + };
> + struct vhost_vring_state num = {
> + .index = idx + dev->vq_index,
> + .num = virtio_queue_get_num(dev->vdev, idx),
> + };
> int r;
> bool b;
>
> + vhost_svq_get_vring_addr(svq, &addr);
> + r = vhost_vdpa_set_vring_addr(dev, &addr);
> + if (unlikely(r)) {
> + error_report("vhost_set_vring_addr for shadow vq failed");
> + return false;
> + }
> +
> + r = vhost_vdpa_set_vring_num(dev, &num);
> + if (unlikely(r)) {
> + error_report("vhost_vdpa_set_vring_num for shadow vq failed");
> + return false;
> + }
> +
> /* Set shadow vq -> guest notifier */
> assert(v->call_fd[idx]);
> vhost_svq_set_guest_call_notifier(svq, v->call_fd[idx]);
> @@ -781,15 +837,32 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> assert(v->shadow_vqs->len == 0);
> for (n = 0; n < hdev->nvqs; ++n) {
> VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
> - bool ok;
> -
> if (unlikely(!svq)) {
> g_ptr_array_set_size(v->shadow_vqs, 0);
> return 0;
> }
> g_ptr_array_add(v->shadow_vqs, svq);
> + }
> + }
>
> - ok = vhost_vdpa_svq_start_vq(hdev, n);
> + r = vhost_vdpa_vring_pause(hdev);
> + assert(r == 0);
> +
> + if (enable) {
> + for (n = 0; n < v->shadow_vqs->len; ++n) {
> + /* Obtain Virtqueue state */
> + vhost_virtqueue_stop(hdev, hdev->vdev, &hdev->vqs[n], n);
> + }
> + }
> +
> + /* Reset device so it can be configured */
> + r = vhost_vdpa_dev_start(hdev, false);
> + assert(r == 0);
> +
> + if (enable) {
> + int r;
> + for (n = 0; n < v->shadow_vqs->len; ++n) {
> + bool ok = vhost_vdpa_svq_start_vq(hdev, n);
> if (unlikely(!ok)) {
> /* Free still not started svqs */
> g_ptr_array_set_size(v->shadow_vqs, n);
> @@ -797,11 +870,19 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> break;
> }
> }
> +
> + /* Need to ack features to set state in vp_vdpa devices */
vhost_vdpa actually?
> + r = vhost_vdpa_set_svq_features(hdev);
> + if (unlikely(r)) {
> + enable = false;
> + }
> }
>
> v->shadow_vqs_enabled = enable;
>
> if (!enable) {
> + vhost_vdpa_set_guest_features(hdev);
> +
> /* Disable all queues or clean up failed start */
> for (n = 0; n < v->shadow_vqs->len; ++n) {
> struct vhost_vring_file file = {
> @@ -818,7 +899,12 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> /* TODO: This can unmask or override call fd! */
> vhost_virtqueue_start(hdev, hdev->vdev, &hdev->vqs[n], vq_idx);
> }
> + }
>
> + r = vhost_vdpa_dev_start(hdev, true);
> + assert(r == 0);
> +
> + if (!enable) {
> /* Resources cleanup */
> g_ptr_array_set_size(v->shadow_vqs, 0);
> }
> @@ -831,6 +917,7 @@ void qmp_x_vhost_enable_shadow_vq(const char *name, bool enable, Error **errp)
> struct vhost_vdpa *v;
> const char *err_cause = NULL;
> bool r;
> + uint64_t svq_features;
>
> QLIST_FOREACH(v, &vhost_vdpa_devices, entry) {
> if (v->dev->vdev && 0 == strcmp(v->dev->vdev->name, name)) {
> @@ -846,6 +933,20 @@ void qmp_x_vhost_enable_shadow_vq(const char *name, bool enable, Error **errp)
> goto err;
> }
>
> + svq_features = v->host_features;
> + if (!vhost_svq_valid_device_features(&svq_features)) {
> + error_setg(errp,
> + "Can't enable shadow vq on %s: Unexpected feature flags (%lx-%lx)",
> + name, v->host_features, svq_features);
> + return;
> + } else {
> + /* TODO: Check for virtio_vdpa + IOMMU & modern device */
I guess you mean "vhost_vdpa" here. For IOMMU, I guess you mean "vIOMMU"
actually?
Thanks
> + }
> +
> + if (err_cause) {
> + goto err;
> + }
> +
> r = vhost_vdpa_enable_svq(v, enable);
> if (unlikely(!r)) {
> err_cause = "Error enabling (see monitor)";
> @@ -853,7 +954,7 @@ void qmp_x_vhost_enable_shadow_vq(const char *name, bool enable, Error **errp)
> }
>
> err:
> - if (err_cause) {
> + if (errp == NULL && err_cause) {
> error_setg(errp, "Can't enable shadow vq on %s: %s", name, err_cause);
> }
> }
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 16/20] vhost: Check for device VRING_USED_F_NO_NOTIFY at shadow virtqueue kick
[not found] ` <20211001070603.307037-17-eperezma@redhat.com>
@ 2021-10-13 4:35 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-13 4:35 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> hw/virtio/vhost-shadow-virtqueue.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index df7e6fa3ec..775f8d36a0 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -173,6 +173,15 @@ static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> svq->ring_id_maps[qemu_head] = elem;
> }
>
> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> +{
> + /* Make sure we are reading updated device flag */
I guess this would be better:
/* We need to expose available array entries before checking used
* flags. */
(Borrowed from kernel codes).
Thanks
> + smp_mb();
> + if (!(svq->vring.used->flags & VRING_USED_F_NO_NOTIFY)) {
> + event_notifier_set(&svq->kick_notifier);
> + }
> +}
> +
> /* Handle guest->device notifications */
> static void vhost_handle_guest_kick(EventNotifier *n)
> {
> @@ -197,7 +206,7 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> }
>
> vhost_svq_add(svq, elem);
> - event_notifier_set(&svq->kick_notifier);
> + vhost_svq_kick(svq);
> }
>
> virtio_queue_set_notification(svq->vq, true);
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 17/20] vhost: Use VRING_AVAIL_F_NO_INTERRUPT at device call on shadow virtqueue
[not found] ` <20211001070603.307037-18-eperezma@redhat.com>
@ 2021-10-13 4:36 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-13 4:36 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Commit log please.
Thanks
> ---
> hw/virtio/vhost-shadow-virtqueue.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 775f8d36a0..2fd0bab75d 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -60,6 +60,9 @@ typedef struct VhostShadowVirtqueue {
>
> /* Next head to consume from device */
> uint16_t used_idx;
> +
> + /* Cache for the exposed notification flag */
> + bool notification;
> } VhostShadowVirtqueue;
>
> /* If the device is using some of these, SVQ cannot communicate */
> @@ -105,6 +108,24 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> return r;
> }
>
> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> +{
> + uint16_t notification_flag;
> +
> + if (svq->notification == enable) {
> + return;
> + }
> +
> + notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> +
> + svq->notification = enable;
> + if (enable) {
> + svq->vring.avail->flags &= ~notification_flag;
> + } else {
> + svq->vring.avail->flags |= notification_flag;
> + }
> +}
> +
> static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> const struct iovec *iovec,
> size_t num, bool more_descs, bool write)
> @@ -273,7 +294,7 @@ static void vhost_svq_handle_call_no_test(EventNotifier *n)
> do {
> unsigned i = 0;
>
> - /* TODO: Use VRING_AVAIL_F_NO_INTERRUPT */
> + vhost_svq_set_notification(svq, false);
> while (true) {
> g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> if (!elem) {
> @@ -286,6 +307,7 @@ static void vhost_svq_handle_call_no_test(EventNotifier *n)
>
> virtqueue_flush(vq, i);
> event_notifier_set(&svq->guest_call_notifier);
> + vhost_svq_set_notification(svq, true);
> } while (vhost_svq_more_used(svq));
> }
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ
[not found] ` <20211001070603.307037-21-eperezma@redhat.com>
@ 2021-10-13 5:34 ` Jason Wang
[not found] ` <CAJaqyWdEGWFNrxqKxRya=ybRiP0wTZ0aPksBBeOe9KOjOmUnqA@mail.gmail.com>
2021-10-19 9:24 ` Jason Wang
1 sibling, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-13 5:34 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> Use translations added in VhostIOVATree in SVQ.
>
> Now every element needs to store the previous address also, so VirtQueue
> can consume the elements properly. This adds a little overhead per VQ
> element, having to allocate more memory to stash them. As a possible
> optimization, this allocation could be avoided if the descriptor is not
> a chain but a single one, but this is left undone.
>
> TODO: iova range should be queried before, and add logic to fail when
> GPA is outside of its range and memory listener or svq add it.
>
> Signed-off-by: Eugenio Pérez<eperezma@redhat.com>
> ---
> hw/virtio/vhost-shadow-virtqueue.h | 4 +-
> hw/virtio/vhost-shadow-virtqueue.c | 130 ++++++++++++++++++++++++-----
> hw/virtio/vhost-vdpa.c | 40 ++++++++-
> hw/virtio/trace-events | 1 +
> 4 files changed, 152 insertions(+), 23 deletions(-)
Think hard about the whole logic. This is safe since qemu memory map
will fail if guest submits a invalidate IOVA.
Then I wonder if we do something much more simpler:
1) Using qemu VA as IOVA but only maps the VA that belongs to guest
2) Then we don't need any IOVA tree here, what we need is to just map
vring and use qemu VA without any translation
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 08/20] vhost: Route guest->host notification through shadow virtqueue
[not found] ` <CAJaqyWd2joWx3kKz=cJBs4UxZofP7ETkbpg9+cSQSE2MSyBtUg@mail.gmail.com>
@ 2021-10-15 3:45 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-15 3:45 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/14 下午8:00, Eugenio Perez Martin 写道:
> On Wed, Oct 13, 2021 at 5:27 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/10/1 下午3:05, Eugenio Pérez 写道:
>>> Shadow virtqueue notifications forwarding is disabled when vhost_dev
>>> stops, so code flow follows usual cleanup.
>>>
>>> Also, host notifiers must be disabled at SVQ start,
>>
>> Any reason for this?
>>
> It will be addressed in a later series, sorry.
>
>>> and they will not
>>> start if SVQ has been enabled when device is stopped. This is trivial
>>> to address, but it is left out for simplicity at this moment.
>>
>> It looks to me this patch also contains the following logics
>>
>> 1) codes to enable svq
>>
>> 2) codes to let svq to be enabled from QMP.
>>
>> I think they need to be split out,
> I agree that we can split this more, with the code that belongs to SVQ
> and the code that belongs to vhost-vdpa. it will be addressed in
> future series.
>
>> we may endup with the following
>> series of patches
>>
> With "series of patches" do you mean to send every step in a separated
> series? There are odds of having the need of modifying code already
> sent & merged with later ones. If you confirm to me that it is fine, I
> can do it that way for sure.
Sorry for being unclear. I meant it's a sub-series actually of the series.
>
>> 1) svq skeleton with enable/disable
>> 2) route host notifier to svq
>> 3) route guest notifier to svq
>> 4) codes to enable svq
>> 5) enable svq via QMP
>>
> I'm totally fine with that, but there is code that is never called if
> the qmp command is not added. The compiler complains about static
> functions that are not called, making impossible things like bisecting
> through these commits, unless I use attribute((unused)) or similar. Or
> have I missed something?
You're right, then I think we can then:
1) svq skeleton with enable/disable via QMP
2) route host notifier to svq
3) route guest notifier to svq
>
> We could do that way with the code that belongs to SVQ though, since
> all of it is declared in headers. But to delay the "enable svq via
> qmp" to the last one makes debugging harder, as we cannot just enable
> notifications forwarding with no buffers forwarding.
Yes.
>
> If I introduce a change in the notifications code, I can simply go to
> these commits and enable SVQ for notifications. This way I can have an
> idea of what part is failing. A similar logic can be applied to other
> devices than vp_vdpa.
vhost-vdpa actually?
> We would lose it if we
>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>> qapi/net.json | 2 +-
>>> hw/virtio/vhost-shadow-virtqueue.h | 8 ++
>>> include/hw/virtio/vhost-vdpa.h | 4 +
>>> hw/virtio/vhost-shadow-virtqueue.c | 138 ++++++++++++++++++++++++++++-
>>> hw/virtio/vhost-vdpa.c | 116 +++++++++++++++++++++++-
>>> 5 files changed, 264 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/qapi/net.json b/qapi/net.json
>>> index a2c30fd455..fe546b0e7c 100644
>>> --- a/qapi/net.json
>>> +++ b/qapi/net.json
>>> @@ -88,7 +88,7 @@
>>> #
>>> # @enable: true to use the alternate shadow VQ notifications
>>> #
>>> -# Returns: Always error, since SVQ is not implemented at the moment.
>>> +# Returns: Error if failure, or 'no error' for success.
>>> #
>>> # Since: 6.2
>>> #
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>> index 27ac6388fa..237cfceb9c 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>> @@ -14,6 +14,14 @@
>>>
>>> typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>>>
>>> +EventNotifier *vhost_svq_get_svq_call_notifier(VhostShadowVirtqueue *svq);
>>
>> Let's move this function to another patch since it's unrelated to the
>> guest->host routing.
>>
> Right, I missed it while squashing commits and at later reviews.
>
>>> +void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
>>> +
>>> +bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
>>> + VhostShadowVirtqueue *svq);
>>> +void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
>>> + VhostShadowVirtqueue *svq);
>>> +
>>> VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
>>>
>>> void vhost_svq_free(VhostShadowVirtqueue *vq);
>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
>>> index 0d565bb5bd..48aae59d8e 100644
>>> --- a/include/hw/virtio/vhost-vdpa.h
>>> +++ b/include/hw/virtio/vhost-vdpa.h
>>> @@ -12,6 +12,8 @@
>>> #ifndef HW_VIRTIO_VHOST_VDPA_H
>>> #define HW_VIRTIO_VHOST_VDPA_H
>>>
>>> +#include <gmodule.h>
>>> +
>>> #include "qemu/queue.h"
>>> #include "hw/virtio/virtio.h"
>>>
>>> @@ -24,6 +26,8 @@ typedef struct vhost_vdpa {
>>> int device_fd;
>>> uint32_t msg_type;
>>> MemoryListener listener;
>>> + bool shadow_vqs_enabled;
>>> + GPtrArray *shadow_vqs;
>>> struct vhost_dev *dev;
>>> QLIST_ENTRY(vhost_vdpa) entry;
>>> VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index c4826a1b56..21dc99ab5d 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -9,9 +9,12 @@
>>>
>>> #include "qemu/osdep.h"
>>> #include "hw/virtio/vhost-shadow-virtqueue.h"
>>> +#include "hw/virtio/vhost.h"
>>> +
>>> +#include "standard-headers/linux/vhost_types.h"
>>>
>>> #include "qemu/error-report.h"
>>> -#include "qemu/event_notifier.h"
>>> +#include "qemu/main-loop.h"
>>>
>>> /* Shadow virtqueue to relay notifications */
>>> typedef struct VhostShadowVirtqueue {
>>> @@ -19,14 +22,146 @@ typedef struct VhostShadowVirtqueue {
>>> EventNotifier kick_notifier;
>>> /* Shadow call notifier, sent to vhost */
>>> EventNotifier call_notifier;
>>> +
>>> + /*
>>> + * Borrowed virtqueue's guest to host notifier.
>>> + * To borrow it in this event notifier allows to register on the event
>>> + * loop and access the associated shadow virtqueue easily. If we use the
>>> + * VirtQueue, we don't have an easy way to retrieve it.
>>> + *
>>> + * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
>>> + */
>>> + EventNotifier host_notifier;
>>> +
>>> + /* Guest's call notifier, where SVQ calls guest. */
>>> + EventNotifier guest_call_notifier;
>>
>> To be consistent, let's simply use "guest_notifier" here.
>>
> It could be confused when the series adds a guest -> qemu kick
> notifier then. Actually, I think it would be better to rename
> host_notifier to something like host_svq_notifier. Or host_call and
> guest_call, since "notifier" is already in the type, making the name
> to be a little bit "Hungarian notation".
I think that's fine, just need to make sure we have a consistent name
for SVQ notifier.
>
>>> +
>>> + /* Virtio queue shadowing */
>>> + VirtQueue *vq;
>>> } VhostShadowVirtqueue;
>>>
>>> +/* Forward guest notifications */
>>> +static void vhost_handle_guest_kick(EventNotifier *n)
>>> +{
>>> + VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>>> + host_notifier);
>>> +
>>> + if (unlikely(!event_notifier_test_and_clear(n))) {
>>> + return;
>>> + }
>>
>> Is there a chance that we may stop the processing of available buffers
>> during the svq enabling? There could be no kick from the guest in this case.
>>
> Actually, yes, I think you are right. The guest kick eventfd could
> have been consumed by vhost but there may be still pending buffers.
>
> I think it would be better to check for available buffers first, then
> clear the notifier unconditionally, and then re-check and process them
> if any [1].
Looks like I can't find "[1]" anywhere.
>
> However, this problem arises later in the series: At this moment the
> device is not reset and guest's host notifier is not replaced, so
> either vhost/device receives the kick, or SVQ does and forwards it.
>
> Does it make sense to you?
Kind of, so I think we can:
1) As you said, always check available buffers when switching to SVQ
2) alwasy kick the vhost when switching back to vhost
>
>>> +
>>> + event_notifier_set(&svq->kick_notifier);
>>> +}
>>> +
>>> +/*
>>> + * Obtain the SVQ call notifier, where vhost device notifies SVQ that there
>>> + * exists pending used buffers.
>>> + *
>>> + * @svq Shadow Virtqueue
>>> + */
>>> +EventNotifier *vhost_svq_get_svq_call_notifier(VhostShadowVirtqueue *svq)
>>> +{
>>> + return &svq->call_notifier;
>>> +}
>>> +
>>> +/*
>>> + * Set the call notifier for the SVQ to call the guest
>>> + *
>>> + * @svq Shadow virtqueue
>>> + * @call_fd call notifier
>>> + *
>>> + * Called on BQL context.
>>> + */
>>> +void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
>>> +{
>>> + event_notifier_init_fd(&svq->guest_call_notifier, call_fd);
>>> +}
>>> +
>>> +/*
>>> + * Restore the vhost guest to host notifier, i.e., disables svq effect.
>>> + */
>>> +static int vhost_svq_restore_vdev_host_notifier(struct vhost_dev *dev,
>>> + unsigned vhost_index,
>>> + VhostShadowVirtqueue *svq)
>>> +{
>>> + EventNotifier *vq_host_notifier = virtio_queue_get_host_notifier(svq->vq);
>>> + struct vhost_vring_file file = {
>>> + .index = vhost_index,
>>> + .fd = event_notifier_get_fd(vq_host_notifier),
>>> + };
>>> + int r;
>>> +
>>> + /* Restore vhost kick */
>>> + r = dev->vhost_ops->vhost_set_vring_kick(dev, &file);
>>
>> And remap the notification area if necessary.
> Totally right, that step is missed in this series.
>
> However, remapping guest host notifier memory region has no advantages
> over using ioeventfd to perform guest -> SVQ notifications, doesn't
> it? By both methods flow needs to go through the hypervisor kernel.
To be clear, I meant restore the notification area mapping from guest to
device directly. For SVQ, you are right, there's no much value for
bothering notification area map. (Or we can do it in the future).
>
>>
>>> + return r ? -errno : 0;
>>> +}
>>> +
>>> +/*
>>> + * Start shadow virtqueue operation.
>>> + * @dev vhost device
>>> + * @hidx vhost virtqueue index
>>> + * @svq Shadow Virtqueue
>>> + */
>>> +bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
>>> + VhostShadowVirtqueue *svq)
>>> +{
>>> + EventNotifier *vq_host_notifier = virtio_queue_get_host_notifier(svq->vq);
>>> + struct vhost_vring_file file = {
>>> + .index = dev->vhost_ops->vhost_get_vq_index(dev, dev->vq_index + idx),
>>> + .fd = event_notifier_get_fd(&svq->kick_notifier),
>>> + };
>>> + int r;
>>> +
>>> + /* Check that notifications are still going directly to vhost dev */
>>> + assert(virtio_queue_is_host_notifier_enabled(svq->vq));
>>> +
>>> + /*
>>> + * event_notifier_set_handler already checks for guest's notifications if
>>> + * they arrive in the switch, so there is no need to explicitely check for
>>> + * them.
>>> + */
>>
>> If this is true, shouldn't we call vhost_set_vring_kick() before the
>> event_notifier_set_handler()?
>>
> Not at this point of the series, but it could be another solution when
> we need to reset the device and we are unsure if all buffers have been
> read. But I think I prefer the solution exposed in [1] and to
> explicitly call vhost_handle_guest_kick here. Do you think
> differently?
I actually mean if we can end up with this situation since SVQ take over
the host notifier before set_vring_kick().
1) guest kick vq, vhost is running
2) qemu take over the host notifier
3) guest kick vq
4) qemu route host notifier to SVQ
Then the vq will be handled by both SVQ and vhost?
>
>> Btw, I think we should update the fd if set_vring_kick() was called
>> after this function?
>>
> Kind of. This is currently bad in the code, but...
>
> Backend callbacks vhost_ops->vhost_set_vring_kick and
> vhost_ops->vhost_set_vring_addr are only called at
> vhost_virtqueue_start. And they are always called with known data
> already stored in VirtQueue.
This is true for now, but I'd suggest to not depend on this since it:
1) it might be changed in the future
2) we're working at vhost layer and expose API to virtio device, the
code should be robust to handle set_vring_kick() at any time
3) I think we've already handled similar situation of set_vring_call, so
let's be consistent
>
> To avoid storing more state in vhost_vdpa, I think that we should
> avoid duplicating them, but ignore new kick_fd or address in SVQ mode,
> and retrieve them again at the moment the device is (re)started in SVQ
> mode. Qemu already avoids things like allowing the guest to set
> addresses at random time, using the VirtIOPCIProxy to store them.
>
> I also see how duplicating that status could protect vdpa SVQ code
> against future changes to vhost code, but that would make this series
> bigger and more complex with no need at this moment in my opinion.
>
> Do you agree?
Somehow, but consider we can handle set_vring_call(), let's at last make
set_vring_kick() more robust.
>
>>> + event_notifier_init_fd(&svq->host_notifier,
>>> + event_notifier_get_fd(vq_host_notifier));
>>> + event_notifier_set_handler(&svq->host_notifier, vhost_handle_guest_kick);
>>> +
>>> + r = dev->vhost_ops->vhost_set_vring_kick(dev, &file);
>>
>> And we need to stop the notification area mmap.
>>
> Right.
>
>>> + if (unlikely(r != 0)) {
>>> + error_report("Couldn't set kick fd: %s", strerror(errno));
>>> + goto err_set_vring_kick;
>>> + }
>>> +
>>> + return true;
>>> +
>>> +err_set_vring_kick:
>>> + event_notifier_set_handler(&svq->host_notifier, NULL);
>>> +
>>> + return false;
>>> +}
>>> +
>>> +/*
>>> + * Stop shadow virtqueue operation.
>>> + * @dev vhost device
>>> + * @idx vhost queue index
>>> + * @svq Shadow Virtqueue
>>> + */
>>> +void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
>>> + VhostShadowVirtqueue *svq)
>>> +{
>>> + int r = vhost_svq_restore_vdev_host_notifier(dev, idx, svq);
>>> + if (unlikely(r < 0)) {
>>> + error_report("Couldn't restore vq kick fd: %s", strerror(-r));
>>> + }
>>> +
>>> + event_notifier_set_handler(&svq->host_notifier, NULL);
>>> +}
>>> +
>>> /*
>>> * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>>> * methods and file descriptors.
>>> */
>>> VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>>> {
>>> + int vq_idx = dev->vq_index + idx;
>>> g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>>> int r;
>>>
>>> @@ -44,6 +179,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>>> goto err_init_call_notifier;
>>> }
>>>
>>> + svq->vq = virtio_get_queue(dev->vdev, vq_idx);
>>> return g_steal_pointer(&svq);
>>>
>>> err_init_call_notifier:
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index e0dc7508c3..36c954a779 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -17,6 +17,7 @@
>>> #include "hw/virtio/vhost.h"
>>> #include "hw/virtio/vhost-backend.h"
>>> #include "hw/virtio/virtio-net.h"
>>> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>>> #include "hw/virtio/vhost-vdpa.h"
>>> #include "exec/address-spaces.h"
>>> #include "qemu/main-loop.h"
>>> @@ -272,6 +273,16 @@ static void vhost_vdpa_add_status(struct vhost_dev *dev, uint8_t status)
>>> vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &s);
>>> }
>>>
>>> +/**
>>> + * Adaptor function to free shadow virtqueue through gpointer
>>> + *
>>> + * @svq The Shadow Virtqueue
>>> + */
>>> +static void vhost_psvq_free(gpointer svq)
>>> +{
>>> + vhost_svq_free(svq);
>>> +}
>>> +
>>> static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>> {
>>> struct vhost_vdpa *v;
>>> @@ -283,6 +294,7 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>> dev->opaque = opaque ;
>>> v->listener = vhost_vdpa_memory_listener;
>>> v->msg_type = VHOST_IOTLB_MSG_V2;
>>> + v->shadow_vqs = g_ptr_array_new_full(dev->nvqs, vhost_psvq_free);
>>> QLIST_INSERT_HEAD(&vhost_vdpa_devices, v, entry);
>>>
>>> vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>> @@ -373,6 +385,17 @@ err:
>>> return;
>>> }
>>>
>>> +static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
>>> +{
>>> + struct vhost_vdpa *v = dev->opaque;
>>> + size_t idx;
>>> +
>>> + for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
>>> + vhost_svq_stop(dev, idx, g_ptr_array_index(v->shadow_vqs, idx));
>>> + }
>>> + g_ptr_array_free(v->shadow_vqs, true);
>>> +}
>>> +
>>> static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>>> {
>>> struct vhost_vdpa *v;
>>> @@ -381,6 +404,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>>> trace_vhost_vdpa_cleanup(dev, v);
>>> vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>> memory_listener_unregister(&v->listener);
>>> + vhost_vdpa_svq_cleanup(dev);
>>> QLIST_REMOVE(v, entry);
>>>
>>> dev->opaque = NULL;
>>> @@ -557,7 +581,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>> if (started) {
>>> uint8_t status = 0;
>>> memory_listener_register(&v->listener, &address_space_memory);
>>> - vhost_vdpa_host_notifiers_init(dev);
>>> + if (!v->shadow_vqs_enabled) {
>>> + vhost_vdpa_host_notifiers_init(dev);
>>> + }
>>
>> This looks like a trick, why not check and setup shadow_vqs inside:
>>
>> 1) vhost_vdpa_host_notifiers_init()
>>
>> and
>>
>> 2) vhost_vdpa_set_vring_kick()
>>
> Ok I will move the checks there.
>
>>> vhost_vdpa_set_vring_ready(dev);
>>> vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>> vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
>>> @@ -663,10 +689,96 @@ static bool vhost_vdpa_force_iommu(struct vhost_dev *dev)
>>> return true;
>>> }
>>>
>>> +/*
>>> + * Start shadow virtqueue.
>>> + */
>>> +static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
>>> +{
>>> + struct vhost_vdpa *v = dev->opaque;
>>> + VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
>>> + return vhost_svq_start(dev, idx, svq);
>>> +}
>>> +
>>> +static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
>>> +{
>>> + struct vhost_dev *hdev = v->dev;
>>> + unsigned n;
>>> +
>>> + if (enable == v->shadow_vqs_enabled) {
>>> + return hdev->nvqs;
>>> + }
>>> +
>>> + if (enable) {
>>> + /* Allocate resources */
>>> + assert(v->shadow_vqs->len == 0);
>>> + for (n = 0; n < hdev->nvqs; ++n) {
>>> + VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
>>> + bool ok;
>>> +
>>> + if (unlikely(!svq)) {
>>> + g_ptr_array_set_size(v->shadow_vqs, 0);
>>> + return 0;
>>> + }
>>> + g_ptr_array_add(v->shadow_vqs, svq);
>>> +
>>> + ok = vhost_vdpa_svq_start_vq(hdev, n);
>>> + if (unlikely(!ok)) {
>>> + /* Free still not started svqs */
>>> + g_ptr_array_set_size(v->shadow_vqs, n);
>>> + enable = false;
> [2]
>
>>> + break;
>>> + }
>>> + }
>>
>> Since there's almost no logic could be shared between enable and
>> disable. Let's split those logic out into dedicated functions where the
>> codes looks more easy to be reviewed (e.g have a better error handling etc).
>>
> Maybe it could be more clear in the code, but the reused logic is the
> disabling of SVQ and the fallback in case it cannot be enabled with
> [2]. But I'm not against splitting in two different functions if it
> makes review easier.
Ok.
>
>>> + }
>>> +
>>> + v->shadow_vqs_enabled = enable;
>>> +
>>> + if (!enable) {
>>> + /* Disable all queues or clean up failed start */
>>> + for (n = 0; n < v->shadow_vqs->len; ++n) {
>>> + unsigned vq_idx = vhost_vdpa_get_vq_index(hdev, n);
>>> + VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, n);
>>> + vhost_svq_stop(hdev, n, svq);
>>> + vhost_virtqueue_start(hdev, hdev->vdev, &hdev->vqs[n], vq_idx);
>>> + }
>>> +
>>> + /* Resources cleanup */
>>> + g_ptr_array_set_size(v->shadow_vqs, 0);
>>> + }
>>> +
>>> + return n;
>>> +}
>>>
>>> void qmp_x_vhost_enable_shadow_vq(const char *name, bool enable, Error **errp)
>>> {
>>> - error_setg(errp, "Shadow virtqueue still not implemented");
>>> + struct vhost_vdpa *v;
>>> + const char *err_cause = NULL;
>>> + bool r;
>>> +
>>> + QLIST_FOREACH(v, &vhost_vdpa_devices, entry) {
>>> + if (v->dev->vdev && 0 == strcmp(v->dev->vdev->name, name)) {
>>> + break;
>>> + }
>>> + }
>>
>> I think you can iterate the NetClientStates to ge tthe vhost-vdpa backends.
>>
> Right, I missed it.
>
>>> +
>>> + if (!v) {
>>> + err_cause = "Device not found";
>>> + goto err;
>>> + } else if (v->notifier[0].addr) {
>>> + err_cause = "Device has host notifiers enabled";
>>
>> I don't get this.
>>
> At this moment of the series you can enable guest -> SVQ -> 'vdpa
> device' if the device is not using the host notifiers memory region.
> The right solution is to disable it for the guest, and to handle it in
> SVQ. Otherwise, guest kick will bypass SVQ and
>
> It can be done in the same patch, or at least to disable (as unmap)
> them at this moment and handle them in a posterior patch. but for
> prototyping the solution I just ignored it in this series. It will be
> handled some way or another in the next one. I prefer the last one, to
> handle in a different patch, but let me know if you think it is better
> otherwise.
Aha, I see. But I think we need to that in this patch otherwise the we
can route host notifier to SVQ.
Thanks
>
>> Btw this function should be implemented in an independent patch after
>> svq is fully functional.
>>
> (Reasons for that are already commented at the top of this mail :) ).
>
> Thanks!
>
>> Thanks
>>
>>
>>> + goto err;
>>> + }
>>> +
>>> + r = vhost_vdpa_enable_svq(v, enable);
>>> + if (unlikely(!r)) {
>>> + err_cause = "Error enabling (see monitor)";
>>> + goto err;
>>> + }
>>> +
>>> +err:
>>> + if (err_cause) {
>>> + error_setg(errp, "Can't enable shadow vq on %s: %s", name, err_cause);
>>> + }
>>> }
>>>
>>> const VhostOps vdpa_ops = {
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 15/20] vhost: Shadow virtqueue buffers forwarding
[not found] ` <CAJaqyWeaJyxh-tt45wxONzuOLhVt6wO48e2ufZZ3uECHTDofFw@mail.gmail.com>
@ 2021-10-15 4:23 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-15 4:23 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/15 上午1:56, Eugenio Perez Martin 写道:
> On Wed, Oct 13, 2021 at 6:31 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/10/1 下午3:05, Eugenio Pérez 写道:
>>> Initial version of shadow virtqueue that actually forward buffers. There
>>> are no iommu support at the moment, and that will be addressed in future
>>> patches of this series. Since all vhost-vdpa devices uses forced IOMMU,
>>> this means that SVQ is not usable at this point of the series on any
>>> device.
>>>
>>> For simplicity it only supports modern devices, that expects vring
>>> in little endian, with split ring and no event idx or indirect
>>> descriptors. Support for them will not be added in this series.
>>>
>>> It reuses the VirtQueue code for the device part. The driver part is
>>> based on Linux's virtio_ring driver, but with stripped functionality
>>> and optimizations so it's easier to review. Later commits add simpler
>>> ones.
>>>
>>> SVQ uses VIRTIO_CONFIG_S_DEVICE_STOPPED to pause the device and
>>> retrieve its status (next available idx the device was going to
>>> consume) race-free. It can later reset the device to replace vring
>>> addresses etc. When SVQ starts qemu can resume consuming the guest's
>>> driver ring from that state, without notice from the latter.
>>>
>>> This status bit VIRTIO_CONFIG_S_DEVICE_STOPPED is currently discussed
>>> in VirtIO, and is implemented in qemu VirtIO-net devices in previous
>>> commits.
>>>
>>> Removal of _S_DEVICE_STOPPED bit (in other words, resuming the device)
>>> can be done in the future if an use case arises. At this moment we can
>>> just rely on reseting the full device.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>> qapi/net.json | 2 +-
>>> hw/virtio/vhost-shadow-virtqueue.c | 237 ++++++++++++++++++++++++++++-
>>> hw/virtio/vhost-vdpa.c | 109 ++++++++++++-
>>> 3 files changed, 337 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/qapi/net.json b/qapi/net.json
>>> index fe546b0e7c..1f4a55f2c5 100644
>>> --- a/qapi/net.json
>>> +++ b/qapi/net.json
>>> @@ -86,7 +86,7 @@
>>> #
>>> # @name: the device name of the VirtIO device
>>> #
>>> -# @enable: true to use the alternate shadow VQ notifications
>>> +# @enable: true to use the alternate shadow VQ buffers fowarding path
>>> #
>>> # Returns: Error if failure, or 'no error' for success.
>>> #
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index 34e159d4fd..df7e6fa3ec 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -10,6 +10,7 @@
>>> #include "qemu/osdep.h"
>>> #include "hw/virtio/vhost-shadow-virtqueue.h"
>>> #include "hw/virtio/vhost.h"
>>> +#include "hw/virtio/virtio-access.h"
>>>
>>> #include "standard-headers/linux/vhost_types.h"
>>>
>>> @@ -44,15 +45,135 @@ typedef struct VhostShadowVirtqueue {
>>>
>>> /* Virtio device */
>>> VirtIODevice *vdev;
>>> +
>>> + /* Map for returning guest's descriptors */
>>> + VirtQueueElement **ring_id_maps;
>>> +
>>> + /* Next head to expose to device */
>>> + uint16_t avail_idx_shadow;
>>> +
>>> + /* Next free descriptor */
>>> + uint16_t free_head;
>>> +
>>> + /* Last seen used idx */
>>> + uint16_t shadow_used_idx;
>>> +
>>> + /* Next head to consume from device */
>>> + uint16_t used_idx;
>>
>> Let's use "last_used_idx" as kernel driver did.
>>
> Ok I will change it.
>
>>> } VhostShadowVirtqueue;
>>>
>>> /* If the device is using some of these, SVQ cannot communicate */
>>> bool vhost_svq_valid_device_features(uint64_t *dev_features)
>>> {
>>> - return true;
>>> + uint64_t b;
>>> + bool r = true;
>>> +
>>> + for (b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END; ++b) {
>>> + switch (b) {
>>> + case VIRTIO_F_NOTIFY_ON_EMPTY:
>>> + case VIRTIO_F_ANY_LAYOUT:
>>> + /* SVQ is fine with this feature */
>>> + continue;
>>> +
>>> + case VIRTIO_F_ACCESS_PLATFORM:
>>> + /* SVQ needs this feature disabled. Can't continue */
>>
>> So code can explain itself, need a comment to explain why.
>>
> Do you mean that it *doesn't* need a comment to explain why? In that
> case I will delete them.
I meant the comment is duplicated with the code. If you wish, you can
explain why ACCESS_PLATFORM needs to be disabled.
>
>>> + if (*dev_features & BIT_ULL(b)) {
>>> + clear_bit(b, dev_features);
>>> + r = false;
>>> + }
>>> + break;
>>> +
>>> + case VIRTIO_F_VERSION_1:
>>> + /* SVQ needs this feature, so can't continue */
>>
>> A comment to explain why SVQ needs this feature.
>>
> Sure I will add it.
>
>>> + if (!(*dev_features & BIT_ULL(b))) {
>>> + set_bit(b, dev_features);
>>> + r = false;
>>> + }
>>> + continue;
>>> +
>>> + default:
>>> + /*
>>> + * SVQ must disable this feature, let's hope the device is fine
>>> + * without it.
>>> + */
>>> + if (*dev_features & BIT_ULL(b)) {
>>> + clear_bit(b, dev_features);
>>> + }
>>> + }
>>> + }
>>> +
>>> + return r;
>>> +}
>>
>> Let's move this to patch 14.
>>
> I can move it down to 14/20, but then it is not really accurate, since
> notifications forwarding can work with all feature sets. Not like we
> are introducing a regression, but still.
>
> I can always explain that in the patch message though, would that be ok?
I'm afraid this will break bisection. E.g for patch 14, it works for any
features but for patch 15 it doesn't.
>
>>> +
>>> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
>>> + const struct iovec *iovec,
>>> + size_t num, bool more_descs, bool write)
>>> +{
>>> + uint16_t i = svq->free_head, last = svq->free_head;
>>> + unsigned n;
>>> + uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
>>> + vring_desc_t *descs = svq->vring.desc;
>>> +
>>> + if (num == 0) {
>>> + return;
>>> + }
>>> +
>>> + for (n = 0; n < num; n++) {
>>> + if (more_descs || (n + 1 < num)) {
>>> + descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
>>> + } else {
>>> + descs[i].flags = flags;
>>> + }
>>> + descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
>>> + descs[i].len = cpu_to_le32(iovec[n].iov_len);
>>> +
>>> + last = i;
>>> + i = cpu_to_le16(descs[i].next);
>>> + }
>>> +
>>> + svq->free_head = le16_to_cpu(descs[last].next);
>>> +}
>>> +
>>> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
>>> + VirtQueueElement *elem)
>>> +{
>>> + int head;
>>> + unsigned avail_idx;
>>> + vring_avail_t *avail = svq->vring.avail;
>>> +
>>> + head = svq->free_head;
>>> +
>>> + /* We need some descriptors here */
>>> + assert(elem->out_num || elem->in_num);
>>> +
>>> + vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
>>> + elem->in_num > 0, false);
>>> + vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
>>> +
>>> + /*
>>> + * Put entry in available array (but don't update avail->idx until they
>>> + * do sync).
>>> + */
>>> + avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
>>> + avail->ring[avail_idx] = cpu_to_le16(head);
>>> + svq->avail_idx_shadow++;
>>> +
>>> + /* Update avail index after the descriptor is wrote */
>>> + smp_wmb();
>>> + avail->idx = cpu_to_le16(svq->avail_idx_shadow);
>>> +
>>> + return head;
>>> +
>>> }
>>>
>>> -/* Forward guest notifications */
>>> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
>>> +{
>>> + unsigned qemu_head = vhost_svq_add_split(svq, elem);
>>> +
>>> + svq->ring_id_maps[qemu_head] = elem;
>>> +}
>>> +
>>> +/* Handle guest->device notifications */
>>> static void vhost_handle_guest_kick(EventNotifier *n)
>>> {
>>> VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>>> @@ -62,7 +183,74 @@ static void vhost_handle_guest_kick(EventNotifier *n)
>>> return;
>>> }
>>>
>>> - event_notifier_set(&svq->kick_notifier);
>>> + /* Make available as many buffers as possible */
>>> + do {
>>> + if (virtio_queue_get_notification(svq->vq)) {
>>> + /* No more notifications until process all available */
>>> + virtio_queue_set_notification(svq->vq, false);
>>> + }
>>
>> This can be done outside the loop.
>>
> I think it cannot. The intention of doing this way is that we check
> for new available buffers *also after* enabling notifications, so we
> don't miss any of them. It is more or less copied from
> virtio_blk_handle_vq, which also needs to run to completion.
>
> If we need to loop again because there are more available buffers, we
> want to disable notifications again. Or am I missing something?
I think you're right.
>
>>> +
>>> + while (true) {
>>> + VirtQueueElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
>>> + if (!elem) {
>>> + break;
>>> + }
>>> +
>>> + vhost_svq_add(svq, elem);
>>> + event_notifier_set(&svq->kick_notifier);
>>> + }
>>> +
>>> + virtio_queue_set_notification(svq->vq, true);
>>
>> I think this can be moved to the end of this function.
>>
> (Same as previous answer)
>
>> Btw, we probably need a quota to make sure the svq is not hogging the
>> main event loop.
>>
>> Similar issue could be found in both virtio-net TX (using timer or bh)
>> and TAP (a quota).
>>
> I think that virtqueue size is the natural limit to that: since we are
> not making any buffers used in the loop, there is no way that it runs
> more than virtqueue size times. If it does because of an evil/bogus
> guest, virtqueue_pop raises the message "Virtqueue size exceeded" and
> returns NULL, effectively breaking the loop.
>
> Virtio-net tx functions mark each buffer right after making them
> available and use it, so they can hog BQL. But my understanding is
> that is not possible in the SVQ case.
Right.
>
> I can add a comment in the code to make it clearer though.
Yes, please.
>
>>> + } while (!virtio_queue_empty(svq->vq));
>>> +}
>>> +
>>> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
>>> +{
>>> + if (svq->used_idx != svq->shadow_used_idx) {
>>> + return true;
>>> + }
>>> +
>>> + /* Get used idx must not be reordered */
>>> + smp_rmb();
>>
>> Interesting, we don't do this for kernel drivers. It would be helpful to
>> explain it more clear by "X must be done before Y".
>>
> I think this got reordered, it's supposed to be *after* get the used
> idx, so it matches the one in the kernel with the comment "Only get
> used array entries after they have been exposed by host.".
Right.
>
> I will change it for the next series.
Ok.
>
>>> + svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
>>> +
>>> + return svq->used_idx != svq->shadow_used_idx;
>>> +}
>>> +
>>> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
>>> +{
>>> + vring_desc_t *descs = svq->vring.desc;
>>> + const vring_used_t *used = svq->vring.used;
>>> + vring_used_elem_t used_elem;
>>> + uint16_t last_used;
>>> +
>>> + if (!vhost_svq_more_used(svq)) {
>>> + return NULL;
>>> + }
>>> +
>>> + last_used = svq->used_idx & (svq->vring.num - 1);
>>> + used_elem.id = le32_to_cpu(used->ring[last_used].id);
>>> + used_elem.len = le32_to_cpu(used->ring[last_used].len);
>>> +
>>> + svq->used_idx++;
>>> + if (unlikely(used_elem.id >= svq->vring.num)) {
>>> + error_report("Device %s says index %u is used", svq->vdev->name,
>>> + used_elem.id);
>>> + return NULL;
>>> + }
>>> +
>>> + if (unlikely(!svq->ring_id_maps[used_elem.id])) {
>>> + error_report(
>>> + "Device %s says index %u is used, but it was not available",
>>> + svq->vdev->name, used_elem.id);
>>> + return NULL;
>>> + }
>>> +
>>> + descs[used_elem.id].next = svq->free_head;
>>> + svq->free_head = used_elem.id;
>>> +
>>> + svq->ring_id_maps[used_elem.id]->len = used_elem.len;
>>> + return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
>>> }
>>>
>>> /* Forward vhost notifications */
>>> @@ -70,8 +258,26 @@ static void vhost_svq_handle_call_no_test(EventNotifier *n)
>>> {
>>> VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>>> call_notifier);
>>> -
>>> - event_notifier_set(&svq->guest_call_notifier);
>>> + VirtQueue *vq = svq->vq;
>>> +
>>> + /* Make as many buffers as possible used. */
>>> + do {
>>> + unsigned i = 0;
>>> +
>>> + /* TODO: Use VRING_AVAIL_F_NO_INTERRUPT */
>>> + while (true) {
>>> + g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
>>> + if (!elem) {
>>> + break;
>>> + }
>>> +
>>> + assert(i < svq->vring.num);
>>
>> Let's return error instead of using the assert.
>>
> Actually this is a condition that we should never meet: In the case of
> ring overrun, device would try to set used a descriptor that is either
>> vring size *or* should try to overrun some of the already used ones.
> In both cases, elem should be NULL and the loop should break.
>
> So this is a safety net protecting from both, if we have an i >
> svq->vring.num means we are not processing used buffers well anymore,
> and (moreover) this is happening after making used all descriptors.
>
> Taking that into account, should we delete it?
Maybe a warning instead.
>
>>> + virtqueue_fill(vq, elem, elem->len, i++);
>>> + }
>>> +
>>> + virtqueue_flush(vq, i);
>>> + event_notifier_set(&svq->guest_call_notifier);
>>> + } while (vhost_svq_more_used(svq));
>>> }
>>>
>>> static void vhost_svq_handle_call(EventNotifier *n)
>>> @@ -204,12 +410,25 @@ err_set_vring_kick:
>>> void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
>>> VhostShadowVirtqueue *svq)
>>> {
>>> + int i;
>>> int r = vhost_svq_restore_vdev_host_notifier(dev, idx, svq);
>>> +
>>> if (unlikely(r < 0)) {
>>> error_report("Couldn't restore vq kick fd: %s", strerror(-r));
>>> }
>>>
>>> event_notifier_set_handler(&svq->host_notifier, NULL);
>>> +
>>> + for (i = 0; i < svq->vring.num; ++i) {
>>> + g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
>>> + /*
>>> + * Although the doc says we must unpop in order, it's ok to unpop
>>> + * everything.
>>> + */
>>> + if (elem) {
>>> + virtqueue_unpop(svq->vq, elem, elem->len);
>>> + }
>>
>> Will this result some of the "pending" buffers to be submitted multiple
>> times? If yes, should we wait for all the buffers used instead of doing
>> the unpop here?
>>
> Do you mean to call virtqueue_unpop with the same elem (or elem.id)
> multiple times? That should never happen, because elem.id should be
> the position in the ring_id_maps. Also, unpop() should just unmap the
> element and never sync again.
>
> Maybe it is way clearer to call virtqueue_detach_element here directly.
I meant basically for the buffers that were consumed by the device but
not made used. In this case if we unpop here. It will be processed by
the device later via vhost-vdpa again?
This is probably fine for net but I'm not sure it works for other
devices. Another way is to wait until all the consumed buffer used.
>
>>> + }
>>> }
>>>
>>> /*
>>> @@ -224,7 +443,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>>> size_t driver_size;
>>> size_t device_size;
>>> g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>>> - int r;
>>> + int r, i;
>>>
>>> r = event_notifier_init(&svq->kick_notifier, 0);
>>> if (r != 0) {
>>> @@ -250,6 +469,11 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>>> memset(svq->vring.desc, 0, driver_size);
>>> svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
>>> memset(svq->vring.used, 0, device_size);
>>> + for (i = 0; i < num - 1; i++) {
>>> + svq->vring.desc[i].next = cpu_to_le16(i + 1);
>>> + }
>>> +
>>> + svq->ring_id_maps = g_new0(VirtQueueElement *, num);
>>> event_notifier_set_handler(&svq->call_notifier,
>>> vhost_svq_handle_call);
>>> return g_steal_pointer(&svq);
>>> @@ -269,6 +493,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
>>> event_notifier_cleanup(&vq->kick_notifier);
>>> event_notifier_set_handler(&vq->call_notifier, NULL);
>>> event_notifier_cleanup(&vq->call_notifier);
>>> + g_free(vq->ring_id_maps);
>>> qemu_vfree(vq->vring.desc);
>>> qemu_vfree(vq->vring.used);
>>> g_free(vq);
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index a057e8277d..bb7010ddb5 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -19,6 +19,7 @@
>>> #include "hw/virtio/virtio-net.h"
>>> #include "hw/virtio/vhost-shadow-virtqueue.h"
>>> #include "hw/virtio/vhost-vdpa.h"
>>> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>>> #include "exec/address-spaces.h"
>>> #include "qemu/main-loop.h"
>>> #include "cpu.h"
>>> @@ -475,6 +476,28 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
>>> return vhost_vdpa_backend_set_features(dev, features);
>>> }
>>>
>>> +/**
>>> + * Restore guest features to vdpa device
>>> + */
>>> +static int vhost_vdpa_set_guest_features(struct vhost_dev *dev)
>>> +{
>>> + struct vhost_vdpa *v = dev->opaque;
>>> + return vhost_vdpa_backend_set_features(dev, v->guest_features);
>>> +}
>>> +
>>> +/**
>>> + * Set shadow virtqueue supported features
>>> + */
>>> +static int vhost_vdpa_set_svq_features(struct vhost_dev *dev)
>>> +{
>>> + struct vhost_vdpa *v = dev->opaque;
>>> + uint64_t features = v->host_features;
>>> + bool b = vhost_svq_valid_device_features(&features);
>>> + assert(b);
>>> +
>>> + return vhost_vdpa_backend_set_features(dev, features);
>>> +}
>>> +
>>> static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>>> {
>>> uint64_t features;
>>> @@ -730,6 +753,19 @@ static bool vhost_vdpa_force_iommu(struct vhost_dev *dev)
>>> return true;
>>> }
>>>
>>> +static int vhost_vdpa_vring_pause(struct vhost_dev *dev)
>>> +{
>>> + int r;
>>> + uint8_t status;
>>> +
>>> + vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DEVICE_STOPPED);
>>> + do {
>>> + r = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
>>
>> I guess we'd better add some sleep here.
>>
> If the final version still contains the call, I will add the sleep. At
> the moment I think it's better if we stop the device by a vdpa ioctl.
Ok, so the idea is to sleep in the ioctl?
>
>>> + } while (r == 0 && !(status & VIRTIO_CONFIG_S_DEVICE_STOPPED));
>>> +
>>> + return 0;
>>> +}
>>> +
>>> /*
>>> * Start shadow virtqueue.
>>> */
>>> @@ -742,9 +778,29 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
>>> .index = idx + dev->vq_index,
>>> .fd = event_notifier_get_fd(vhost_call_notifier),
>>> };
>>> + struct vhost_vring_addr addr = {
>>> + .index = idx + dev->vq_index,
>>> + };
>>> + struct vhost_vring_state num = {
>>> + .index = idx + dev->vq_index,
>>> + .num = virtio_queue_get_num(dev->vdev, idx),
>>> + };
>>> int r;
>>> bool b;
>>>
>>> + vhost_svq_get_vring_addr(svq, &addr);
>>> + r = vhost_vdpa_set_vring_addr(dev, &addr);
>>> + if (unlikely(r)) {
>>> + error_report("vhost_set_vring_addr for shadow vq failed");
>>> + return false;
>>> + }
>>> +
>>> + r = vhost_vdpa_set_vring_num(dev, &num);
>>> + if (unlikely(r)) {
>>> + error_report("vhost_vdpa_set_vring_num for shadow vq failed");
>>> + return false;
>>> + }
>>> +
>>> /* Set shadow vq -> guest notifier */
>>> assert(v->call_fd[idx]);
>>> vhost_svq_set_guest_call_notifier(svq, v->call_fd[idx]);
>>> @@ -781,15 +837,32 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
>>> assert(v->shadow_vqs->len == 0);
>>> for (n = 0; n < hdev->nvqs; ++n) {
>>> VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
>>> - bool ok;
>>> -
>>> if (unlikely(!svq)) {
>>> g_ptr_array_set_size(v->shadow_vqs, 0);
>>> return 0;
>>> }
>>> g_ptr_array_add(v->shadow_vqs, svq);
>>> + }
>>> + }
>>>
>>> - ok = vhost_vdpa_svq_start_vq(hdev, n);
>>> + r = vhost_vdpa_vring_pause(hdev);
>>> + assert(r == 0);
>>> +
>>> + if (enable) {
>>> + for (n = 0; n < v->shadow_vqs->len; ++n) {
>>> + /* Obtain Virtqueue state */
>>> + vhost_virtqueue_stop(hdev, hdev->vdev, &hdev->vqs[n], n);
>>> + }
>>> + }
>>> +
>>> + /* Reset device so it can be configured */
>>> + r = vhost_vdpa_dev_start(hdev, false);
>>> + assert(r == 0);
>>> +
>>> + if (enable) {
>>> + int r;
>>> + for (n = 0; n < v->shadow_vqs->len; ++n) {
>>> + bool ok = vhost_vdpa_svq_start_vq(hdev, n);
>>> if (unlikely(!ok)) {
>>> /* Free still not started svqs */
>>> g_ptr_array_set_size(v->shadow_vqs, n);
>>> @@ -797,11 +870,19 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
>>> break;
>>> }
>>> }
>>> +
>>> + /* Need to ack features to set state in vp_vdpa devices */
>>
>> vhost_vdpa actually?
>>
> Yes, what a mistake!
>
>>> + r = vhost_vdpa_set_svq_features(hdev);
>>> + if (unlikely(r)) {
>>> + enable = false;
>>> + }
>>> }
>>>
>>> v->shadow_vqs_enabled = enable;
>>>
>>> if (!enable) {
>>> + vhost_vdpa_set_guest_features(hdev);
>>> +
>>> /* Disable all queues or clean up failed start */
>>> for (n = 0; n < v->shadow_vqs->len; ++n) {
>>> struct vhost_vring_file file = {
>>> @@ -818,7 +899,12 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
>>> /* TODO: This can unmask or override call fd! */
>>> vhost_virtqueue_start(hdev, hdev->vdev, &hdev->vqs[n], vq_idx);
>>> }
>>> + }
>>>
>>> + r = vhost_vdpa_dev_start(hdev, true);
>>> + assert(r == 0);
>>> +
>>> + if (!enable) {
>>> /* Resources cleanup */
>>> g_ptr_array_set_size(v->shadow_vqs, 0);
>>> }
>>> @@ -831,6 +917,7 @@ void qmp_x_vhost_enable_shadow_vq(const char *name, bool enable, Error **errp)
>>> struct vhost_vdpa *v;
>>> const char *err_cause = NULL;
>>> bool r;
>>> + uint64_t svq_features;
>>>
>>> QLIST_FOREACH(v, &vhost_vdpa_devices, entry) {
>>> if (v->dev->vdev && 0 == strcmp(v->dev->vdev->name, name)) {
>>> @@ -846,6 +933,20 @@ void qmp_x_vhost_enable_shadow_vq(const char *name, bool enable, Error **errp)
>>> goto err;
>>> }
>>>
>>> + svq_features = v->host_features;
>>> + if (!vhost_svq_valid_device_features(&svq_features)) {
>>> + error_setg(errp,
>>> + "Can't enable shadow vq on %s: Unexpected feature flags (%lx-%lx)",
>>> + name, v->host_features, svq_features);
>>> + return;
>>> + } else {
>>> + /* TODO: Check for virtio_vdpa + IOMMU & modern device */
>>
>> I guess you mean "vhost_vdpa" here.
> Yes, a similar mistake in less than 50 lines :).
>
>> For IOMMU, I guess you mean "vIOMMU"
>> actually?
>>
> This comment is out of date and inherited from the vhost version,
> where only the IOMMU version was developed, so it will be deleted in
> the next series. I think it makes little sense to check vIOMMU if we
> stick with vDPA since it still does not support it, but we could make
> the check here for sure.
Right.
Thanks
>
> Thanks!
>
>> Thanks
>>
>>
>>> + }
>>> +
>>> + if (err_cause) {
>>> + goto err;
>>> + }
>>> +
>>> r = vhost_vdpa_enable_svq(v, enable);
>>> if (unlikely(!r)) {
>>> err_cause = "Error enabling (see monitor)";
>>> @@ -853,7 +954,7 @@ void qmp_x_vhost_enable_shadow_vq(const char *name, bool enable, Error **errp)
>>> }
>>>
>>> err:
>>> - if (err_cause) {
>>> + if (errp == NULL && err_cause) {
>>> error_setg(errp, "Can't enable shadow vq on %s: %s", name, err_cause);
>>> }
>>> }
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 11/20] vhost: Route host->guest notification through shadow virtqueue
[not found] ` <CAJaqyWcQ314RN7-U1bYqCMXb+-nyhSi3ddqWv90ofFucMbveUw@mail.gmail.com>
@ 2021-10-15 4:24 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-15 4:24 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/14 下午11:58, Eugenio Perez Martin 写道:
> On Wed, Oct 13, 2021 at 5:49 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/10/1 下午3:05, Eugenio Pérez 写道:
>>> This will make qemu aware of the device used buffers, allowing it to
>>> write the guest memory with its contents if needed.
>>>
>>> Since the use of vhost_virtqueue_start can unmasks and discard call
>>> events, vhost_virtqueue_start should be modified in one of these ways:
>>> * Split in two: One of them uses all logic to start a queue with no
>>> side effects for the guest, and another one tha actually assumes that
>>> the guest has just started the device. Vdpa should use just the
>>> former.
>>> * Actually store and check if the guest notifier is masked, and do it
>>> conditionally.
>>> * Left as it is, and duplicate all the logic in vhost-vdpa.
>>
>> Btw, the log looks not clear. I guess this patch goes for method 3. If
>> yes, we need explain it and why.
>>
>> Thanks
>>
> Sorry about being unclear. This commit log (and code) just exposes the
> problem and the solutions I came up with but does nothing to solve it.
> I'm actually going for method 3 for the next series but I'm open to
> doing it differently.
That's fine, but need to doc that method 3 is something that is done in
the patch.
Thanks
>
>>> Signed-off-by: Eugenio Pérez<eperezma@redhat.com>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 11/20] vhost: Route host->guest notification through shadow virtqueue
[not found] ` <CAJaqyWfm734HrwTJK71hUQNYVkyDaR8OiqtGro_AX9i_pXfmBQ@mail.gmail.com>
@ 2021-10-15 4:42 ` Jason Wang
[not found] ` <CAJaqyWcO9oaGsRe-oMNbmHx7G4Mw0vZfc+7WYQ23+SteoFVn4Q@mail.gmail.com>
0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-15 4:42 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/15 上午12:39, Eugenio Perez Martin 写道:
> On Wed, Oct 13, 2021 at 5:47 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> 在 2021/10/1 下午3:05, Eugenio Pérez 写道:
>>> This will make qemu aware of the device used buffers, allowing it to
>>> write the guest memory with its contents if needed.
>>>
>>> Since the use of vhost_virtqueue_start can unmasks and discard call
>>> events, vhost_virtqueue_start should be modified in one of these ways:
>>> * Split in two: One of them uses all logic to start a queue with no
>>> side effects for the guest, and another one tha actually assumes that
>>> the guest has just started the device. Vdpa should use just the
>>> former.
>>> * Actually store and check if the guest notifier is masked, and do it
>>> conditionally.
>>> * Left as it is, and duplicate all the logic in vhost-vdpa.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>> hw/virtio/vhost-shadow-virtqueue.c | 19 +++++++++++++++
>>> hw/virtio/vhost-vdpa.c | 38 +++++++++++++++++++++++++++++-
>>> 2 files changed, 56 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index 21dc99ab5d..3fe129cf63 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -53,6 +53,22 @@ static void vhost_handle_guest_kick(EventNotifier *n)
>>> event_notifier_set(&svq->kick_notifier);
>>> }
>>>
>>> +/* Forward vhost notifications */
>>> +static void vhost_svq_handle_call_no_test(EventNotifier *n)
>>> +{
>>> + VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>>> + call_notifier);
>>> +
>>> + event_notifier_set(&svq->guest_call_notifier);
>>> +}
>>> +
>>> +static void vhost_svq_handle_call(EventNotifier *n)
>>> +{
>>> + if (likely(event_notifier_test_and_clear(n))) {
>>> + vhost_svq_handle_call_no_test(n);
>>> + }
>>> +}
>>> +
>>> /*
>>> * Obtain the SVQ call notifier, where vhost device notifies SVQ that there
>>> * exists pending used buffers.
>>> @@ -180,6 +196,8 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>>> }
>>>
>>> svq->vq = virtio_get_queue(dev->vdev, vq_idx);
>>> + event_notifier_set_handler(&svq->call_notifier,
>>> + vhost_svq_handle_call);
>>> return g_steal_pointer(&svq);
>>>
>>> err_init_call_notifier:
>>> @@ -195,6 +213,7 @@ err_init_kick_notifier:
>>> void vhost_svq_free(VhostShadowVirtqueue *vq)
>>> {
>>> event_notifier_cleanup(&vq->kick_notifier);
>>> + event_notifier_set_handler(&vq->call_notifier, NULL);
>>> event_notifier_cleanup(&vq->call_notifier);
>>> g_free(vq);
>>> }
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index bc34de2439..6c5f4c98b8 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -712,13 +712,40 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
>>> {
>>> struct vhost_vdpa *v = dev->opaque;
>>> VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
>>> - return vhost_svq_start(dev, idx, svq);
>>> + EventNotifier *vhost_call_notifier = vhost_svq_get_svq_call_notifier(svq);
>>> + struct vhost_vring_file vhost_call_file = {
>>> + .index = idx + dev->vq_index,
>>> + .fd = event_notifier_get_fd(vhost_call_notifier),
>>> + };
>>> + int r;
>>> + bool b;
>>> +
>>> + /* Set shadow vq -> guest notifier */
>>> + assert(v->call_fd[idx]);
>>
>> We need aovid the asser() here. On which case we can hit this?
>>
> I would say that there is no way we can actually hit it, so let's remove it.
>
>>> + vhost_svq_set_guest_call_notifier(svq, v->call_fd[idx]);
>>> +
>>> + b = vhost_svq_start(dev, idx, svq);
>>> + if (unlikely(!b)) {
>>> + return false;
>>> + }
>>> +
>>> + /* Set device -> SVQ notifier */
>>> + r = vhost_vdpa_set_vring_dev_call(dev, &vhost_call_file);
>>> + if (unlikely(r)) {
>>> + error_report("vhost_vdpa_set_vring_call for shadow vq failed");
>>> + return false;
>>> + }
>>
>> Similar to kick, do we need to set_vring_call() before vhost_svq_start()?
>>
> It should not matter at this moment because the device should not be
> started at this point and device calls should not run
> vhost_svq_handle_call until BQL is released.
Yes, we stop virtqueue before.
>
> The "logic" of doing it after is to make clear that svq must be fully
> initialized before processing device calls, even in the case that we
> extract SVQ in its own iothread or similar. But this could be done
> before vhost_svq_start for sure.
>
>>> +
>>> + /* Check for pending calls */
>>> + event_notifier_set(vhost_call_notifier);
>>
>> Interesting, can this result spurious interrupt?
>>
> This actually "queues" a vhost_svq_handle_call after the BQL release,
> where the device should be fully reset. In that regard, if there are
> no used descriptors there will not be an irq raised to the guest. Does
> that answer the question? Or have I missed something?
Yes, please explain this in the comment.
>
>>> + return true;
>>> }
>>>
>>> static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
>>> {
>>> struct vhost_dev *hdev = v->dev;
>>> unsigned n;
>>> + int r;
>>>
>>> if (enable == v->shadow_vqs_enabled) {
>>> return hdev->nvqs;
>>> @@ -752,9 +779,18 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
>>> if (!enable) {
>>> /* Disable all queues or clean up failed start */
>>> for (n = 0; n < v->shadow_vqs->len; ++n) {
>>> + struct vhost_vring_file file = {
>>> + .index = vhost_vdpa_get_vq_index(hdev, n),
>>> + .fd = v->call_fd[n],
>>> + };
>>> +
>>> + r = vhost_vdpa_set_vring_call(hdev, &file);
>>> + assert(r == 0);
>>> +
>>> unsigned vq_idx = vhost_vdpa_get_vq_index(hdev, n);
>>> VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, n);
>>> vhost_svq_stop(hdev, n, svq);
>>> + /* TODO: This can unmask or override call fd! */
>>
>> I don't get this comment. Does this mean the current code can't work
>> with mask_notifiers? If yes, this is something we need to fix.
>>
> Yes, but it will be addressed in the next series. I should have
> explained it bette here, sorry :).
Ok.
Thanks
>
> Thanks!
>
>> Thanks
>>
>>
>>> vhost_virtqueue_start(hdev, hdev->vdev, &hdev->vqs[n], vq_idx);
>>> }
>>>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ
[not found] ` <CAJaqyWdEGWFNrxqKxRya=ybRiP0wTZ0aPksBBeOe9KOjOmUnqA@mail.gmail.com>
@ 2021-10-15 7:37 ` Jason Wang
[not found] ` <CAJaqyWf7pFiw2twq9BPyr9fOJFa9ZpSMcbnoknOfC_pbuUWkmg@mail.gmail.com>
0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-15 7:37 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake
On Fri, Oct 15, 2021 at 3:28 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Oct 13, 2021 at 7:34 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> > > Use translations added in VhostIOVATree in SVQ.
> > >
> > > Now every element needs to store the previous address also, so VirtQueue
> > > can consume the elements properly. This adds a little overhead per VQ
> > > element, having to allocate more memory to stash them. As a possible
> > > optimization, this allocation could be avoided if the descriptor is not
> > > a chain but a single one, but this is left undone.
> > >
> > > TODO: iova range should be queried before, and add logic to fail when
> > > GPA is outside of its range and memory listener or svq add it.
> > >
> > > Signed-off-by: Eugenio Pérez<eperezma@redhat.com>
> > > ---
> > > hw/virtio/vhost-shadow-virtqueue.h | 4 +-
> > > hw/virtio/vhost-shadow-virtqueue.c | 130 ++++++++++++++++++++++++-----
> > > hw/virtio/vhost-vdpa.c | 40 ++++++++-
> > > hw/virtio/trace-events | 1 +
> > > 4 files changed, 152 insertions(+), 23 deletions(-)
> >
> >
> > Think hard about the whole logic. This is safe since qemu memory map
> > will fail if guest submits a invalidate IOVA.
> >
>
> Can you expand on this? What you mean is that VirtQueue already
> protects SVQ code if the guest sets an invalid buffer address (GPA),
> isn't it?
Yes.
>
> > Then I wonder if we do something much more simpler:
> >
> > 1) Using qemu VA as IOVA but only maps the VA that belongs to guest
> > 2) Then we don't need any IOVA tree here, what we need is to just map
> > vring and use qemu VA without any translation
> >
>
> That would be great, but either qemu's SVQ vring or guest translated
> buffers address (in qemu VA form) were already in high addresses,
> outside of the device's iova range (in my test).
You're right. I miss that and that's why we need e.g iova tree and allocator.
What I proposed only makes sense when shared virtual memory (SVA) is
implemented. In the case of SVA, the valid iova range should be the
full VA range.
>
> I didn't try remapping tricks to make them fit in the range, but I
> think it does complicate the solution relatively fast if there was
> already memory in that range owned by qemu before enabling SVQ:
>
> * Guest memory must be contiguous in VA address space, but it "must"
> support hotplug/unplug (although vDPA currently pins it). Hotplug
> memory could always overlap with SVQ vring, so we would need to move
> it.
> * Duplicating mapped memory for writing? (Not sure if guest memory is
> actually movable in qemu).
> * Indirect descriptors will need to allocate and free memory more or
> less frequently, increasing the possibility of overlapping.
I'm not sure I get the problem, but overlapping is not an issue since
we're using VA.
>
> If we can move guest memory,
I'm not sure we can do this or it looks very tricky.
> however, I can see how we can track it in
> a tree *but* mark when the tree is 1:1 with qemu's VA, so buffers
> forwarding does not take the translation penalty. When guest memory
> cannot be map 1:1, we can resort to tree, and come back to 1:1
> translation if the offending tree node(s) get deleted.
>
> However I think this puts the solution a little bit farther than
> "starting simple" :).
>
> Does it make sense?
Yes. So I think I will review the IOVA tree codes and get back to you.
THanks
>
> Thanks!
>
> > Thanks
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ
[not found] ` <CAJaqyWf7pFiw2twq9BPyr9fOJFa9ZpSMcbnoknOfC_pbuUWkmg@mail.gmail.com>
@ 2021-10-15 8:37 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-15 8:37 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake
On Fri, Oct 15, 2021 at 4:21 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Fri, Oct 15, 2021 at 9:37 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Fri, Oct 15, 2021 at 3:28 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Oct 13, 2021 at 7:34 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > 在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> > > > > Use translations added in VhostIOVATree in SVQ.
> > > > >
> > > > > Now every element needs to store the previous address also, so VirtQueue
> > > > > can consume the elements properly. This adds a little overhead per VQ
> > > > > element, having to allocate more memory to stash them. As a possible
> > > > > optimization, this allocation could be avoided if the descriptor is not
> > > > > a chain but a single one, but this is left undone.
> > > > >
> > > > > TODO: iova range should be queried before, and add logic to fail when
> > > > > GPA is outside of its range and memory listener or svq add it.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez<eperezma@redhat.com>
> > > > > ---
> > > > > hw/virtio/vhost-shadow-virtqueue.h | 4 +-
> > > > > hw/virtio/vhost-shadow-virtqueue.c | 130 ++++++++++++++++++++++++-----
> > > > > hw/virtio/vhost-vdpa.c | 40 ++++++++-
> > > > > hw/virtio/trace-events | 1 +
> > > > > 4 files changed, 152 insertions(+), 23 deletions(-)
> > > >
> > > >
> > > > Think hard about the whole logic. This is safe since qemu memory map
> > > > will fail if guest submits a invalidate IOVA.
> > > >
> > >
> > > Can you expand on this? What you mean is that VirtQueue already
> > > protects SVQ code if the guest sets an invalid buffer address (GPA),
> > > isn't it?
> >
> > Yes.
> >
> > >
> > > > Then I wonder if we do something much more simpler:
> > > >
> > > > 1) Using qemu VA as IOVA but only maps the VA that belongs to guest
> > > > 2) Then we don't need any IOVA tree here, what we need is to just map
> > > > vring and use qemu VA without any translation
> > > >
> > >
> > > That would be great, but either qemu's SVQ vring or guest translated
> > > buffers address (in qemu VA form) were already in high addresses,
> > > outside of the device's iova range (in my test).
> >
> > You're right. I miss that and that's why we need e.g iova tree and allocator.
> >
> > What I proposed only makes sense when shared virtual memory (SVA) is
> > implemented. In the case of SVA, the valid iova range should be the
> > full VA range.
> >
> > >
> > > I didn't try remapping tricks to make them fit in the range, but I
> > > think it does complicate the solution relatively fast if there was
> > > already memory in that range owned by qemu before enabling SVQ:
> > >
> > > * Guest memory must be contiguous in VA address space, but it "must"
> > > support hotplug/unplug (although vDPA currently pins it). Hotplug
> > > memory could always overlap with SVQ vring, so we would need to move
> > > it.
> > > * Duplicating mapped memory for writing? (Not sure if guest memory is
> > > actually movable in qemu).
> > > * Indirect descriptors will need to allocate and free memory more or
> > > less frequently, increasing the possibility of overlapping.
> >
> > I'm not sure I get the problem, but overlapping is not an issue since
> > we're using VA.
> >
>
> It's basically the same (potential) problem of DPDK's SVQ: IOVA Range
> goes from 0 to X. That means that both GPA and SVQ must be in IOVA
> range. As an example, we put GPA at the beginning of the range, that
> grows upwards when memory is hot plugged, and SVQ vrings that grows
> downwards when devices are added or set in SVQ mode.
Yes, but this is not the case if we're using VA.
>
> Even without both space fragmentation problems, we could reach a point
> where both will take the same address, and we would need to go to the
> tree.
>
> But since we are able to detect those situations, I can see how we can
> work in two modes as an optimization: 1:1 when they don't overlap, and
> fragmented tree where it does. But I don't think it's a good idea to
> include it from the beginning, and I'm not sure if that is worth it
> without measuring the tree translation cost first.
>
> > >
> > > If we can move guest memory,
> >
> > I'm not sure we can do this or it looks very tricky.
> >
>
> Just thinking out loud here, but maybe we could map all memory and
> play with remap_file_pages [1] a little bit for that.
The problem is that there's no guarantee that it will always succeed.
So let's start with the current dedicated IOVA address space. We can
do optimization on top anyhow.
>
> > > however, I can see how we can track it in
> > > a tree *but* mark when the tree is 1:1 with qemu's VA, so buffers
> > > forwarding does not take the translation penalty. When guest memory
> > > cannot be map 1:1, we can resort to tree, and come back to 1:1
> > > translation if the offending tree node(s) get deleted.
> > >
> > > However I think this puts the solution a little bit farther than
> > > "starting simple" :).
> > >
> > > Does it make sense?
> >
> > Yes. So I think I will review the IOVA tree codes and get back to you.
> >
>
> Looking forward to it :).
Thanks
>
> Thanks!
>
> [1] https://linux.die.net/man/2/remap_file_pages
>
> > THanks
> >
> > >
> > > Thanks!
> > >
> > > > Thanks
> > > >
> > >
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 18/20] vhost: Add VhostIOVATree
[not found] ` <20211001070603.307037-19-eperezma@redhat.com>
@ 2021-10-19 8:32 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-19 8:32 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> This tree is able to look for a translated address from an IOVA address.
>
> At first glance is similar to util/iova-tree. However, SVQ working on
> devices with limited IOVA space need more capabilities, like allocating
> IOVA chunks or perform reverse translations (qemu addresses to iova).
I don't see any reverse translation is used in the shadow code. Or
anything I missed?
>
> The allocation capability, as "assign a free IOVA address to this chunk
> of memory in qemu's address space" allows shadow virtqueue to create a
> new address space that is not restricted by guest's addressable one, so
> we can allocate shadow vqs vrings outside of its reachability, nor
> qemu's one. At the moment, the allocation is just done growing, not
> allowing deletion.
>
> A different name could be used, but ordered searchable array is a
> little bit long though.
>
> It duplicates the array so it can search efficiently both directions,
> and it will signal overlap if iova or the translated address is
> present in it's each array.
>
> Use of array will be changed to util-iova-tree in future series.
Adding Peter.
It looks to me the only thing miseed is the iova allocator. And it looks
to me it's better to decouple the allocator from the iova tree.
Then we had:
1) initialize iova range
2) iova = iova_alloc(size)
3) built the iova tree map
4) buffer forwarding
5) iova_free(size)
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> hw/virtio/vhost-iova-tree.h | 40 +++++++
> hw/virtio/vhost-iova-tree.c | 230 ++++++++++++++++++++++++++++++++++++
> hw/virtio/meson.build | 2 +-
> 3 files changed, 271 insertions(+), 1 deletion(-)
> create mode 100644 hw/virtio/vhost-iova-tree.h
> create mode 100644 hw/virtio/vhost-iova-tree.c
>
> diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
> new file mode 100644
> index 0000000000..d163a88905
> --- /dev/null
> +++ b/hw/virtio/vhost-iova-tree.h
> @@ -0,0 +1,40 @@
> +/*
> + * vhost software live migration ring
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
> +#define HW_VIRTIO_VHOST_IOVA_TREE_H
> +
> +#include "exec/memory.h"
> +
> +typedef struct VhostDMAMap {
> + void *translated_addr;
> + hwaddr iova;
> + hwaddr size; /* Inclusive */
> + IOMMUAccessFlags perm;
> +} VhostDMAMap;
> +
> +typedef enum VhostDMAMapNewRC {
> + VHOST_DMA_MAP_NO_SPACE = -3,
> + VHOST_DMA_MAP_OVERLAP = -2,
> + VHOST_DMA_MAP_INVALID = -1,
> + VHOST_DMA_MAP_OK = 0,
> +} VhostDMAMapNewRC;
> +
> +typedef struct VhostIOVATree VhostIOVATree;
> +
> +VhostIOVATree *vhost_iova_tree_new(void);
> +void vhost_iova_tree_unref(VhostIOVATree *iova_rm);
> +G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_unref);
> +
> +const VhostDMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_rm,
> + const VhostDMAMap *map);
> +VhostDMAMapNewRC vhost_iova_tree_alloc(VhostIOVATree *iova_rm,
> + VhostDMAMap *map);
> +
> +#endif
> diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
> new file mode 100644
> index 0000000000..c284e27607
> --- /dev/null
> +++ b/hw/virtio/vhost-iova-tree.c
> @@ -0,0 +1,230 @@
> +/*
> + * vhost software live migration ring
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "vhost-iova-tree.h"
> +
> +#define G_ARRAY_NOT_ZERO_TERMINATED false
> +#define G_ARRAY_NOT_CLEAR_ON_ALLOC false
> +
> +#define iova_min qemu_real_host_page_size
> +
> +/**
> + * VhostIOVATree, able to:
> + * - Translate iova address
> + * - Reverse translate iova address (from translated to iova)
> + * - Allocate IOVA regions for translated range (potentially slow operation)
> + *
> + * Note that it cannot remove nodes.
> + */
> +struct VhostIOVATree {
> + /* Ordered array of reverse translations, IOVA address to qemu memory. */
> + GArray *iova_taddr_map;
> +
> + /*
> + * Ordered array of translations from qemu virtual memory address to iova
> + */
> + GArray *taddr_iova_map;
> +};
Any reason for using GArray? Is it faster?
> +
> +/**
> + * Inserts an element after an existing one in garray.
> + *
> + * @array The array
> + * @prev_elem The previous element of array of NULL if prepending
> + * @map The DMA map
> + *
> + * It provides the aditional advantage of being type safe over
> + * g_array_insert_val, which accepts a reference pointer instead of a value
> + * with no complains.
> + */
> +static void vhost_iova_tree_insert_after(GArray *array,
> + const VhostDMAMap *prev_elem,
> + const VhostDMAMap *map)
> +{
> + size_t pos;
> +
> + if (!prev_elem) {
> + pos = 0;
> + } else {
> + pos = prev_elem - &g_array_index(array, typeof(*prev_elem), 0) + 1;
> + }
> +
> + g_array_insert_val(array, pos, *map);
> +}
> +
> +static gint vhost_iova_tree_cmp_taddr(gconstpointer a, gconstpointer b)
> +{
> + const VhostDMAMap *m1 = a, *m2 = b;
> +
> + if (m1->translated_addr > m2->translated_addr + m2->size) {
> + return 1;
> + }
> +
> + if (m1->translated_addr + m1->size < m2->translated_addr) {
> + return -1;
> + }
> +
> + /* Overlapped */
> + return 0;
> +}
> +
> +/**
> + * Find the previous node to a given iova
> + *
> + * @array The ascending ordered-by-translated-addr array of VhostDMAMap
> + * @map The map to insert
> + * @prev Returned location of the previous map
> + *
> + * Return VHOST_DMA_MAP_OK if everything went well, or VHOST_DMA_MAP_OVERLAP if
> + * it already exists. It is ok to use this function to check if a given range
> + * exists, but it will use a linear search.
> + *
> + * TODO: We can use bsearch to locate the entry if we save the state in the
> + * needle, knowing that the needle is always the first argument to
> + * compare_func.
> + */
> +static VhostDMAMapNewRC vhost_iova_tree_find_prev(const GArray *array,
> + GCompareFunc compare_func,
> + const VhostDMAMap *map,
> + const VhostDMAMap **prev)
> +{
> + size_t i;
> + int r;
> +
> + *prev = NULL;
> + for (i = 0; i < array->len; ++i) {
> + r = compare_func(map, &g_array_index(array, typeof(*map), i));
> + if (r == 0) {
> + return VHOST_DMA_MAP_OVERLAP;
> + }
> + if (r < 0) {
> + return VHOST_DMA_MAP_OK;
> + }
> +
> + *prev = &g_array_index(array, typeof(**prev), i);
> + }
> +
> + return VHOST_DMA_MAP_OK;
> +}
> +
> +/**
> + * Create a new IOVA tree
> + *
> + * Returns the new IOVA tree
> + */
> +VhostIOVATree *vhost_iova_tree_new(void)
> +{
So I think it needs to be initialized with the range we get from
get_iova_range().
Thanks
> + VhostIOVATree *tree = g_new(VhostIOVATree, 1);
> + tree->iova_taddr_map = g_array_new(G_ARRAY_NOT_ZERO_TERMINATED,
> + G_ARRAY_NOT_CLEAR_ON_ALLOC,
> + sizeof(VhostDMAMap));
> + tree->taddr_iova_map = g_array_new(G_ARRAY_NOT_ZERO_TERMINATED,
> + G_ARRAY_NOT_CLEAR_ON_ALLOC,
> + sizeof(VhostDMAMap));
> + return tree;
> +}
> +
> +/**
> + * Destroy an IOVA tree
> + *
> + * @tree The iova tree
> + */
> +void vhost_iova_tree_unref(VhostIOVATree *tree)
> +{
> + g_array_unref(g_steal_pointer(&tree->iova_taddr_map));
> + g_array_unref(g_steal_pointer(&tree->taddr_iova_map));
> +}
> +
> +/**
> + * Find the IOVA address stored from a memory address
> + *
> + * @tree The iova tree
> + * @map The map with the memory address
> + *
> + * Return the stored mapping, or NULL if not found.
> + */
> +const VhostDMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
> + const VhostDMAMap *map)
> +{
> + /*
> + * This can be replaced with g_array_binary_search (Since glib 2.62) when
> + * that version become common enough.
> + */
> + return bsearch(map, tree->taddr_iova_map->data, tree->taddr_iova_map->len,
> + sizeof(*map), vhost_iova_tree_cmp_taddr);
> +}
> +
> +static bool vhost_iova_tree_find_iova_hole(const GArray *iova_map,
> + const VhostDMAMap *map,
> + const VhostDMAMap **prev_elem)
> +{
> + size_t i;
> + hwaddr iova = iova_min;
> +
> + *prev_elem = NULL;
> + for (i = 0; i < iova_map->len; i++) {
> + const VhostDMAMap *next = &g_array_index(iova_map, typeof(*next), i);
> + hwaddr hole_end = next->iova;
> + if (map->size < hole_end - iova) {
> + return true;
> + }
> +
> + iova = next->iova + next->size + 1;
> + *prev_elem = next;
> + }
> +
> + return ((hwaddr)-1 - iova) > iova_map->len;
> +}
> +
> +/**
> + * Allocate a new mapping
> + *
> + * @tree The iova tree
> + * @map The iova map
> + *
> + * Returns:
> + * - VHOST_DMA_MAP_OK if the map fits in the container
> + * - VHOST_DMA_MAP_INVALID if the map does not make sense (like size overflow)
> + * - VHOST_DMA_MAP_OVERLAP if the tree already contains that map
> + * - VHOST_DMA_MAP_NO_SPACE if iova_rm cannot allocate more space.
> + *
> + * It returns assignated iova in map->iova if return value is VHOST_DMA_MAP_OK.
> + */
> +VhostDMAMapNewRC vhost_iova_tree_alloc(VhostIOVATree *tree,
> + VhostDMAMap *map)
> +{
> + const VhostDMAMap *qemu_prev, *iova_prev;
> + int find_prev_rc;
> + bool fit;
> +
> + if (map->translated_addr + map->size < map->translated_addr ||
> + map->iova + map->size < map->iova || map->perm == IOMMU_NONE) {
> + return VHOST_DMA_MAP_INVALID;
> + }
> +
> + /* Search for a hole in iova space big enough */
> + fit = vhost_iova_tree_find_iova_hole(tree->iova_taddr_map, map,
> + &iova_prev);
> + if (!fit) {
> + return VHOST_DMA_MAP_NO_SPACE;
> + }
> +
> + map->iova = iova_prev ? (iova_prev->iova + iova_prev->size) + 1 : iova_min;
> + find_prev_rc = vhost_iova_tree_find_prev(tree->taddr_iova_map,
> + vhost_iova_tree_cmp_taddr, map,
> + &qemu_prev);
> + if (find_prev_rc == VHOST_DMA_MAP_OVERLAP) {
> + return VHOST_DMA_MAP_OVERLAP;
> + }
> +
> + vhost_iova_tree_insert_after(tree->iova_taddr_map, iova_prev, map);
> + vhost_iova_tree_insert_after(tree->taddr_iova_map, qemu_prev, map);
> + return VHOST_DMA_MAP_OK;
> +}
> diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> index 8b5a0225fe..cb306b83c6 100644
> --- a/hw/virtio/meson.build
> +++ b/hw/virtio/meson.build
> @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
>
> virtio_ss = ss.source_set()
> virtio_ss.add(files('virtio.c'))
> -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
> +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
> virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
> virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
> virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ
[not found] ` <20211001070603.307037-21-eperezma@redhat.com>
2021-10-13 5:34 ` [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ Jason Wang
@ 2021-10-19 9:24 ` Jason Wang
[not found] ` <CAJaqyWcRcm9rwuTqJHS0FmuMrXpoCvF34TzXKQmxXTfZssZ-jA@mail.gmail.com>
1 sibling, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-19 9:24 UTC (permalink / raw)
To: Eugenio Pérez, qemu-devel
Cc: Parav Pandit, Michael S. Tsirkin, Markus Armbruster,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> Use translations added in VhostIOVATree in SVQ.
>
> Now every element needs to store the previous address also, so VirtQueue
> can consume the elements properly. This adds a little overhead per VQ
> element, having to allocate more memory to stash them. As a possible
> optimization, this allocation could be avoided if the descriptor is not
> a chain but a single one, but this is left undone.
>
> TODO: iova range should be queried before, and add logic to fail when
> GPA is outside of its range and memory listener or svq add it.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
> hw/virtio/vhost-shadow-virtqueue.h | 4 +-
> hw/virtio/vhost-shadow-virtqueue.c | 130 ++++++++++++++++++++++++-----
> hw/virtio/vhost-vdpa.c | 40 ++++++++-
> hw/virtio/trace-events | 1 +
> 4 files changed, 152 insertions(+), 23 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index b7baa424a7..a0e6b5267a 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -11,6 +11,7 @@
> #define VHOST_SHADOW_VIRTQUEUE_H
>
> #include "hw/virtio/vhost.h"
> +#include "hw/virtio/vhost-iova-tree.h"
>
> typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>
> @@ -28,7 +29,8 @@ bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> VhostShadowVirtqueue *svq);
>
> -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
> +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> + VhostIOVATree *iova_map);
>
> void vhost_svq_free(VhostShadowVirtqueue *vq);
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 2fd0bab75d..9db538547e 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -11,12 +11,19 @@
> #include "hw/virtio/vhost-shadow-virtqueue.h"
> #include "hw/virtio/vhost.h"
> #include "hw/virtio/virtio-access.h"
> +#include "hw/virtio/vhost-iova-tree.h"
>
> #include "standard-headers/linux/vhost_types.h"
>
> #include "qemu/error-report.h"
> #include "qemu/main-loop.h"
>
> +typedef struct SVQElement {
> + VirtQueueElement elem;
> + void **in_sg_stash;
> + void **out_sg_stash;
> +} SVQElement;
> +
> /* Shadow virtqueue to relay notifications */
> typedef struct VhostShadowVirtqueue {
> /* Shadow vring */
> @@ -46,8 +53,11 @@ typedef struct VhostShadowVirtqueue {
> /* Virtio device */
> VirtIODevice *vdev;
>
> + /* IOVA mapping if used */
> + VhostIOVATree *iova_map;
> +
> /* Map for returning guest's descriptors */
> - VirtQueueElement **ring_id_maps;
> + SVQElement **ring_id_maps;
>
> /* Next head to expose to device */
> uint16_t avail_idx_shadow;
> @@ -79,13 +89,6 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> continue;
>
> case VIRTIO_F_ACCESS_PLATFORM:
> - /* SVQ needs this feature disabled. Can't continue */
> - if (*dev_features & BIT_ULL(b)) {
> - clear_bit(b, dev_features);
> - r = false;
> - }
> - break;
> -
> case VIRTIO_F_VERSION_1:
> /* SVQ needs this feature, so can't continue */
> if (!(*dev_features & BIT_ULL(b))) {
> @@ -126,6 +129,64 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> }
> }
>
> +static void vhost_svq_stash_addr(void ***stash, const struct iovec *iov,
> + size_t num)
> +{
> + size_t i;
> +
> + if (num == 0) {
> + return;
> + }
> +
> + *stash = g_new(void *, num);
> + for (i = 0; i < num; ++i) {
> + (*stash)[i] = iov[i].iov_base;
> + }
> +}
> +
> +static void vhost_svq_unstash_addr(void **stash, struct iovec *iov, size_t num)
> +{
> + size_t i;
> +
> + if (num == 0) {
> + return;
> + }
> +
> + for (i = 0; i < num; ++i) {
> + iov[i].iov_base = stash[i];
> + }
> + g_free(stash);
> +}
> +
> +static void vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> + struct iovec *iovec, size_t num)
> +{
> + size_t i;
> +
> + for (i = 0; i < num; ++i) {
> + VhostDMAMap needle = {
> + .translated_addr = iovec[i].iov_base,
> + .size = iovec[i].iov_len,
> + };
> + size_t off;
> +
> + const VhostDMAMap *map = vhost_iova_tree_find_iova(svq->iova_map,
> + &needle);
Is it possible that we end up with more than one maps here?
> + /*
> + * Map cannot be NULL since iova map contains all guest space and
> + * qemu already has a physical address mapped
> + */
> + assert(map);
> +
> + /*
> + * Map->iova chunk size is ignored. What to do if descriptor
> + * (addr, size) does not fit is delegated to the device.
> + */
> + off = needle.translated_addr - map->translated_addr;
> + iovec[i].iov_base = (void *)(map->iova + off);
> + }
> +}
> +
> static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> const struct iovec *iovec,
> size_t num, bool more_descs, bool write)
> @@ -156,8 +217,9 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> }
>
> static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> - VirtQueueElement *elem)
> + SVQElement *svq_elem)
> {
> + VirtQueueElement *elem = &svq_elem->elem;
> int head;
> unsigned avail_idx;
> vring_avail_t *avail = svq->vring.avail;
> @@ -167,6 +229,12 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> /* We need some descriptors here */
> assert(elem->out_num || elem->in_num);
>
> + vhost_svq_stash_addr(&svq_elem->in_sg_stash, elem->in_sg, elem->in_num);
> + vhost_svq_stash_addr(&svq_elem->out_sg_stash, elem->out_sg, elem->out_num);
I wonder if we can solve the trick like stash and unstash with a
dedicated sgs in svq_elem, instead of reusing the elem.
Thanks
> +
> + vhost_svq_translate_addr(svq, elem->in_sg, elem->in_num);
> + vhost_svq_translate_addr(svq, elem->out_sg, elem->out_num);
> +
> vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> elem->in_num > 0, false);
> vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> @@ -187,7 +255,7 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
>
> }
>
> -static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> +static void vhost_svq_add(VhostShadowVirtqueue *svq, SVQElement *elem)
> {
> unsigned qemu_head = vhost_svq_add_split(svq, elem);
>
> @@ -221,7 +289,7 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> }
>
> while (true) {
> - VirtQueueElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> + SVQElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> if (!elem) {
> break;
> }
> @@ -247,7 +315,7 @@ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> return svq->used_idx != svq->shadow_used_idx;
> }
>
> -static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> +static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> {
> vring_desc_t *descs = svq->vring.desc;
> const vring_used_t *used = svq->vring.used;
> @@ -279,7 +347,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> descs[used_elem.id].next = svq->free_head;
> svq->free_head = used_elem.id;
>
> - svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> + svq->ring_id_maps[used_elem.id]->elem.len = used_elem.len;
> return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> }
>
> @@ -296,12 +364,19 @@ static void vhost_svq_handle_call_no_test(EventNotifier *n)
>
> vhost_svq_set_notification(svq, false);
> while (true) {
> - g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> - if (!elem) {
> + g_autofree SVQElement *svq_elem = vhost_svq_get_buf(svq);
> + VirtQueueElement *elem;
> + if (!svq_elem) {
> break;
> }
>
> assert(i < svq->vring.num);
> + elem = &svq_elem->elem;
> +
> + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> + elem->in_num);
> + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> + elem->out_num);
> virtqueue_fill(vq, elem, elem->len, i++);
> }
>
> @@ -451,14 +526,24 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> event_notifier_set_handler(&svq->host_notifier, NULL);
>
> for (i = 0; i < svq->vring.num; ++i) {
> - g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
> + g_autofree SVQElement *svq_elem = svq->ring_id_maps[i];
> + VirtQueueElement *elem;
> +
> + if (!svq_elem) {
> + continue;
> + }
> +
> + elem = &svq_elem->elem;
> + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> + elem->in_num);
> + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> + elem->out_num);
> +
> /*
> * Although the doc says we must unpop in order, it's ok to unpop
> * everything.
> */
> - if (elem) {
> - virtqueue_unpop(svq->vq, elem, elem->len);
> - }
> + virtqueue_unpop(svq->vq, elem, elem->len);
> }
> }
>
> @@ -466,7 +551,8 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> * methods and file descriptors.
> */
> -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> + VhostIOVATree *iova_map)
> {
> int vq_idx = dev->vq_index + idx;
> unsigned num = virtio_queue_get_num(dev->vdev, vq_idx);
> @@ -500,11 +586,13 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> memset(svq->vring.desc, 0, driver_size);
> svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> memset(svq->vring.used, 0, device_size);
> + svq->iova_map = iova_map;
> +
> for (i = 0; i < num - 1; i++) {
> svq->vring.desc[i].next = cpu_to_le16(i + 1);
> }
>
> - svq->ring_id_maps = g_new0(VirtQueueElement *, num);
> + svq->ring_id_maps = g_new0(SVQElement *, num);
> event_notifier_set_handler(&svq->call_notifier,
> vhost_svq_handle_call);
> return g_steal_pointer(&svq);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index a9c680b487..f5a12fee9d 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -176,6 +176,18 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> vaddr, section->readonly);
>
> llsize = int128_sub(llend, int128_make64(iova));
> + if (v->shadow_vqs_enabled) {
> + VhostDMAMap mem_region = {
> + .translated_addr = vaddr,
> + .size = int128_get64(llsize) - 1,
> + .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
> + };
> +
> + int r = vhost_iova_tree_alloc(v->iova_map, &mem_region);
> + assert(r == VHOST_DMA_MAP_OK);
> +
> + iova = mem_region.iova;
> + }
>
> ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> vaddr, section->readonly);
> @@ -754,6 +766,23 @@ static bool vhost_vdpa_force_iommu(struct vhost_dev *dev)
> return true;
> }
>
> +static int vhost_vdpa_get_iova_range(struct vhost_dev *dev,
> + hwaddr *first, hwaddr *last)
> +{
> + int ret;
> + struct vhost_vdpa_iova_range range;
> +
> + ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_IOVA_RANGE, &range);
> + if (ret != 0) {
> + return ret;
> + }
> +
> + *first = range.first;
> + *last = range.last;
> + trace_vhost_vdpa_get_iova_range(dev, *first, *last);
> + return ret;
> +}
> +
> /**
> * Maps QEMU vaddr memory to device in a suitable way for shadow virtqueue:
> * - It always reference qemu memory address, not guest's memory.
> @@ -881,6 +910,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
> static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> {
> struct vhost_dev *hdev = v->dev;
> + hwaddr iova_first, iova_last;
> unsigned n;
> int r;
>
> @@ -894,7 +924,7 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> /* Allocate resources */
> assert(v->shadow_vqs->len == 0);
> for (n = 0; n < hdev->nvqs; ++n) {
> - VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
> + VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n, v->iova_map);
> if (unlikely(!svq)) {
> g_ptr_array_set_size(v->shadow_vqs, 0);
> return 0;
> @@ -903,6 +933,8 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> }
> }
>
> + r = vhost_vdpa_get_iova_range(hdev, &iova_first, &iova_last);
> + assert(r == 0);
> r = vhost_vdpa_vring_pause(hdev);
> assert(r == 0);
>
> @@ -913,6 +945,12 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> }
> }
>
> + memory_listener_unregister(&v->listener);
> + if (vhost_vdpa_dma_unmap(v, iova_first,
> + (iova_last - iova_first) & TARGET_PAGE_MASK)) {
> + error_report("Fail to invalidate device iotlb");
> + }
> +
> /* Reset device so it can be configured */
> r = vhost_vdpa_dev_start(hdev, false);
> assert(r == 0);
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 8ed19e9d0c..650e521e35 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -52,6 +52,7 @@ vhost_vdpa_set_vring_call(void *dev, unsigned int index, int fd) "dev: %p index:
> vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRIx64
> vhost_vdpa_set_owner(void *dev) "dev: %p"
> vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
> +vhost_vdpa_get_iova_range(void *dev, uint64_t first, uint64_t last) "dev: %p first: 0x%"PRIx64" last: 0x%"PRIx64
>
> # virtio.c
> virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 11/20] vhost: Route host->guest notification through shadow virtqueue
[not found] ` <CAJaqyWcO9oaGsRe-oMNbmHx7G4Mw0vZfc+7WYQ23+SteoFVn4Q@mail.gmail.com>
@ 2021-10-20 2:01 ` Jason Wang
0 siblings, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-20 2:01 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake
On Tue, Oct 19, 2021 at 4:40 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Fri, Oct 15, 2021 at 6:42 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2021/10/15 上午12:39, Eugenio Perez Martin 写道:
> > > On Wed, Oct 13, 2021 at 5:47 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> 在 2021/10/1 下午3:05, Eugenio Pérez 写道:
> > >>> This will make qemu aware of the device used buffers, allowing it to
> > >>> write the guest memory with its contents if needed.
> > >>>
> > >>> Since the use of vhost_virtqueue_start can unmasks and discard call
> > >>> events, vhost_virtqueue_start should be modified in one of these ways:
> > >>> * Split in two: One of them uses all logic to start a queue with no
> > >>> side effects for the guest, and another one tha actually assumes that
> > >>> the guest has just started the device. Vdpa should use just the
> > >>> former.
> > >>> * Actually store and check if the guest notifier is masked, and do it
> > >>> conditionally.
> > >>> * Left as it is, and duplicate all the logic in vhost-vdpa.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>> ---
> > >>> hw/virtio/vhost-shadow-virtqueue.c | 19 +++++++++++++++
> > >>> hw/virtio/vhost-vdpa.c | 38 +++++++++++++++++++++++++++++-
> > >>> 2 files changed, 56 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > >>> index 21dc99ab5d..3fe129cf63 100644
> > >>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > >>> @@ -53,6 +53,22 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> > >>> event_notifier_set(&svq->kick_notifier);
> > >>> }
> > >>>
> > >>> +/* Forward vhost notifications */
> > >>> +static void vhost_svq_handle_call_no_test(EventNotifier *n)
> > >>> +{
> > >>> + VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > >>> + call_notifier);
> > >>> +
> > >>> + event_notifier_set(&svq->guest_call_notifier);
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_handle_call(EventNotifier *n)
> > >>> +{
> > >>> + if (likely(event_notifier_test_and_clear(n))) {
> > >>> + vhost_svq_handle_call_no_test(n);
> > >>> + }
> > >>> +}
> > >>> +
> > >>> /*
> > >>> * Obtain the SVQ call notifier, where vhost device notifies SVQ that there
> > >>> * exists pending used buffers.
> > >>> @@ -180,6 +196,8 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > >>> }
> > >>>
> > >>> svq->vq = virtio_get_queue(dev->vdev, vq_idx);
> > >>> + event_notifier_set_handler(&svq->call_notifier,
> > >>> + vhost_svq_handle_call);
> > >>> return g_steal_pointer(&svq);
> > >>>
> > >>> err_init_call_notifier:
> > >>> @@ -195,6 +213,7 @@ err_init_kick_notifier:
> > >>> void vhost_svq_free(VhostShadowVirtqueue *vq)
> > >>> {
> > >>> event_notifier_cleanup(&vq->kick_notifier);
> > >>> + event_notifier_set_handler(&vq->call_notifier, NULL);
> > >>> event_notifier_cleanup(&vq->call_notifier);
> > >>> g_free(vq);
> > >>> }
> > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>> index bc34de2439..6c5f4c98b8 100644
> > >>> --- a/hw/virtio/vhost-vdpa.c
> > >>> +++ b/hw/virtio/vhost-vdpa.c
> > >>> @@ -712,13 +712,40 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
> > >>> {
> > >>> struct vhost_vdpa *v = dev->opaque;
> > >>> VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
> > >>> - return vhost_svq_start(dev, idx, svq);
> > >>> + EventNotifier *vhost_call_notifier = vhost_svq_get_svq_call_notifier(svq);
> > >>> + struct vhost_vring_file vhost_call_file = {
> > >>> + .index = idx + dev->vq_index,
> > >>> + .fd = event_notifier_get_fd(vhost_call_notifier),
> > >>> + };
> > >>> + int r;
> > >>> + bool b;
> > >>> +
> > >>> + /* Set shadow vq -> guest notifier */
> > >>> + assert(v->call_fd[idx]);
> > >>
> > >> We need aovid the asser() here. On which case we can hit this?
> > >>
> > > I would say that there is no way we can actually hit it, so let's remove it.
> > >
> > >>> + vhost_svq_set_guest_call_notifier(svq, v->call_fd[idx]);
> > >>> +
> > >>> + b = vhost_svq_start(dev, idx, svq);
> > >>> + if (unlikely(!b)) {
> > >>> + return false;
> > >>> + }
> > >>> +
> > >>> + /* Set device -> SVQ notifier */
> > >>> + r = vhost_vdpa_set_vring_dev_call(dev, &vhost_call_file);
> > >>> + if (unlikely(r)) {
> > >>> + error_report("vhost_vdpa_set_vring_call for shadow vq failed");
> > >>> + return false;
> > >>> + }
> > >>
> > >> Similar to kick, do we need to set_vring_call() before vhost_svq_start()?
> > >>
> > > It should not matter at this moment because the device should not be
> > > started at this point and device calls should not run
> > > vhost_svq_handle_call until BQL is released.
> >
> >
> > Yes, we stop virtqueue before.
> >
> >
> > >
> > > The "logic" of doing it after is to make clear that svq must be fully
> > > initialized before processing device calls, even in the case that we
> > > extract SVQ in its own iothread or similar. But this could be done
> > > before vhost_svq_start for sure.
> > >
> > >>> +
> > >>> + /* Check for pending calls */
> > >>> + event_notifier_set(vhost_call_notifier);
> > >>
> > >> Interesting, can this result spurious interrupt?
> > >>
> > > This actually "queues" a vhost_svq_handle_call after the BQL release,
> > > where the device should be fully reset. In that regard, if there are
> > > no used descriptors there will not be an irq raised to the guest. Does
> > > that answer the question? Or have I missed something?
> >
> >
> > Yes, please explain this in the comment.
> >
>
> I'm reviewing this again, and actually I think I was wrong in solving the issue.
>
> Since at this point the device is being configured, there is no chance
> that we had a missing call notification here: A previous kick is
> needed for the device to generate any calls, and these cannot be
> processed.
>
> What is not solved in this series is that we could have pending used
> buffers in vdpa device stopping SVQ, but queuing a check for that is
> not going to solve anything, since SVQ vring would be already
> destroyed:
>
> * vdpa device marks N > 0 buffers as used, and calls.
> * Before processing them, SVQ stop is called. SVQ have not processed
> these, and cleans them, making this event_notifier_set useless.
>
> So this would require a few changes. Mainly, instead of queueing a
> check for used, these need to be checked before svq cleaning. After
> that, obtain the VQ state (is not obtained in the stop at the moment,
> trusting in guest's used idx) and run a last
> vhost_svq_handle_call_no_test while the device is paused.
It looks to me what's really important is that SVQ needs to
drain/forwared used buffers after vdpa is stopped. Then we should be
fine.
>
> Thanks!
>
> >
> > >
> > >>> + return true;
> > >>> }
> > >>>
> > >>> static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > >>> {
> > >>> struct vhost_dev *hdev = v->dev;
> > >>> unsigned n;
> > >>> + int r;
> > >>>
> > >>> if (enable == v->shadow_vqs_enabled) {
> > >>> return hdev->nvqs;
> > >>> @@ -752,9 +779,18 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > >>> if (!enable) {
> > >>> /* Disable all queues or clean up failed start */
> > >>> for (n = 0; n < v->shadow_vqs->len; ++n) {
> > >>> + struct vhost_vring_file file = {
> > >>> + .index = vhost_vdpa_get_vq_index(hdev, n),
> > >>> + .fd = v->call_fd[n],
> > >>> + };
> > >>> +
> > >>> + r = vhost_vdpa_set_vring_call(hdev, &file);
> > >>> + assert(r == 0);
> > >>> +
> > >>> unsigned vq_idx = vhost_vdpa_get_vq_index(hdev, n);
> > >>> VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, n);
> > >>> vhost_svq_stop(hdev, n, svq);
> > >>> + /* TODO: This can unmask or override call fd! */
> > >>
> > >> I don't get this comment. Does this mean the current code can't work
> > >> with mask_notifiers? If yes, this is something we need to fix.
> > >>
> > > Yes, but it will be addressed in the next series. I should have
> > > explained it bette here, sorry :).
> >
> >
> > Ok.
> >
> > Thanks
> >
> >
> > >
> > > Thanks!
> > >
> > >> Thanks
> > >>
> > >>
> > >>> vhost_virtqueue_start(hdev, hdev->vdev, &hdev->vqs[n], vq_idx);
> > >>> }
> > >>>
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ
[not found] ` <CAJaqyWcRcm9rwuTqJHS0FmuMrXpoCvF34TzXKQmxXTfZssZ-jA@mail.gmail.com>
@ 2021-10-20 2:02 ` Jason Wang
2021-10-20 2:07 ` Jason Wang
0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-20 2:02 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
On Tue, Oct 19, 2021 at 6:29 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Oct 19, 2021 at 11:25 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> > > Use translations added in VhostIOVATree in SVQ.
> > >
> > > Now every element needs to store the previous address also, so VirtQueue
> > > can consume the elements properly. This adds a little overhead per VQ
> > > element, having to allocate more memory to stash them. As a possible
> > > optimization, this allocation could be avoided if the descriptor is not
> > > a chain but a single one, but this is left undone.
> > >
> > > TODO: iova range should be queried before, and add logic to fail when
> > > GPA is outside of its range and memory listener or svq add it.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > > hw/virtio/vhost-shadow-virtqueue.h | 4 +-
> > > hw/virtio/vhost-shadow-virtqueue.c | 130 ++++++++++++++++++++++++-----
> > > hw/virtio/vhost-vdpa.c | 40 ++++++++-
> > > hw/virtio/trace-events | 1 +
> > > 4 files changed, 152 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > index b7baa424a7..a0e6b5267a 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > @@ -11,6 +11,7 @@
> > > #define VHOST_SHADOW_VIRTQUEUE_H
> > >
> > > #include "hw/virtio/vhost.h"
> > > +#include "hw/virtio/vhost-iova-tree.h"
> > >
> > > typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > >
> > > @@ -28,7 +29,8 @@ bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> > > void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > VhostShadowVirtqueue *svq);
> > >
> > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
> > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > + VhostIOVATree *iova_map);
> > >
> > > void vhost_svq_free(VhostShadowVirtqueue *vq);
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > index 2fd0bab75d..9db538547e 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > @@ -11,12 +11,19 @@
> > > #include "hw/virtio/vhost-shadow-virtqueue.h"
> > > #include "hw/virtio/vhost.h"
> > > #include "hw/virtio/virtio-access.h"
> > > +#include "hw/virtio/vhost-iova-tree.h"
> > >
> > > #include "standard-headers/linux/vhost_types.h"
> > >
> > > #include "qemu/error-report.h"
> > > #include "qemu/main-loop.h"
> > >
> > > +typedef struct SVQElement {
> > > + VirtQueueElement elem;
> > > + void **in_sg_stash;
> > > + void **out_sg_stash;
> > > +} SVQElement;
> > > +
> > > /* Shadow virtqueue to relay notifications */
> > > typedef struct VhostShadowVirtqueue {
> > > /* Shadow vring */
> > > @@ -46,8 +53,11 @@ typedef struct VhostShadowVirtqueue {
> > > /* Virtio device */
> > > VirtIODevice *vdev;
> > >
> > > + /* IOVA mapping if used */
> > > + VhostIOVATree *iova_map;
> > > +
> > > /* Map for returning guest's descriptors */
> > > - VirtQueueElement **ring_id_maps;
> > > + SVQElement **ring_id_maps;
> > >
> > > /* Next head to expose to device */
> > > uint16_t avail_idx_shadow;
> > > @@ -79,13 +89,6 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > > continue;
> > >
> > > case VIRTIO_F_ACCESS_PLATFORM:
> > > - /* SVQ needs this feature disabled. Can't continue */
> > > - if (*dev_features & BIT_ULL(b)) {
> > > - clear_bit(b, dev_features);
> > > - r = false;
> > > - }
> > > - break;
> > > -
> > > case VIRTIO_F_VERSION_1:
> > > /* SVQ needs this feature, so can't continue */
> > > if (!(*dev_features & BIT_ULL(b))) {
> > > @@ -126,6 +129,64 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> > > }
> > > }
> > >
> > > +static void vhost_svq_stash_addr(void ***stash, const struct iovec *iov,
> > > + size_t num)
> > > +{
> > > + size_t i;
> > > +
> > > + if (num == 0) {
> > > + return;
> > > + }
> > > +
> > > + *stash = g_new(void *, num);
> > > + for (i = 0; i < num; ++i) {
> > > + (*stash)[i] = iov[i].iov_base;
> > > + }
> > > +}
> > > +
> > > +static void vhost_svq_unstash_addr(void **stash, struct iovec *iov, size_t num)
> > > +{
> > > + size_t i;
> > > +
> > > + if (num == 0) {
> > > + return;
> > > + }
> > > +
> > > + for (i = 0; i < num; ++i) {
> > > + iov[i].iov_base = stash[i];
> > > + }
> > > + g_free(stash);
> > > +}
> > > +
> > > +static void vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> > > + struct iovec *iovec, size_t num)
> > > +{
> > > + size_t i;
> > > +
> > > + for (i = 0; i < num; ++i) {
> > > + VhostDMAMap needle = {
> > > + .translated_addr = iovec[i].iov_base,
> > > + .size = iovec[i].iov_len,
> > > + };
> > > + size_t off;
> > > +
> > > + const VhostDMAMap *map = vhost_iova_tree_find_iova(svq->iova_map,
> > > + &needle);
> >
> >
> > Is it possible that we end up with more than one maps here?
> >
>
> Actually it is possible, since there is no guarantee that one
> descriptor (or indirect descriptor) maps exactly to one iov. It could
> map to many if qemu vaddr is not contiguous but GPA + size is. This is
> something that must be fixed for the next revision, so thanks for
> pointing it out!
>
> Taking that into account, the condition that svq vring avail_idx -
> used_idx was always less or equal than guest's vring avail_idx -
> used_idx is not true anymore. Checking for that before adding buffers
> to SVQ is the easy part, but how could we recover in that case?
>
> I think that the easy solution is to check for more available buffers
> unconditionally at the end of vhost_svq_handle_call, which handles the
> SVQ used and is supposed to make more room for available buffers. So
> vhost_handle_guest_kick would not check if eventfd is set or not
> anymore.
>
> Would that make sense?
Yes, I think it should work.
Thanks
>
> Thanks!
>
> >
> > > + /*
> > > + * Map cannot be NULL since iova map contains all guest space and
> > > + * qemu already has a physical address mapped
> > > + */
> > > + assert(map);
> > > +
> > > + /*
> > > + * Map->iova chunk size is ignored. What to do if descriptor
> > > + * (addr, size) does not fit is delegated to the device.
> > > + */
> > > + off = needle.translated_addr - map->translated_addr;
> > > + iovec[i].iov_base = (void *)(map->iova + off);
> > > + }
> > > +}
> > > +
> > > static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > > const struct iovec *iovec,
> > > size_t num, bool more_descs, bool write)
> > > @@ -156,8 +217,9 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > > }
> > >
> > > static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > - VirtQueueElement *elem)
> > > + SVQElement *svq_elem)
> > > {
> > > + VirtQueueElement *elem = &svq_elem->elem;
> > > int head;
> > > unsigned avail_idx;
> > > vring_avail_t *avail = svq->vring.avail;
> > > @@ -167,6 +229,12 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > /* We need some descriptors here */
> > > assert(elem->out_num || elem->in_num);
> > >
> > > + vhost_svq_stash_addr(&svq_elem->in_sg_stash, elem->in_sg, elem->in_num);
> > > + vhost_svq_stash_addr(&svq_elem->out_sg_stash, elem->out_sg, elem->out_num);
> >
> >
> > I wonder if we can solve the trick like stash and unstash with a
> > dedicated sgs in svq_elem, instead of reusing the elem.
> >
>
> Actually yes, it would be way simpler to use a new sgs array in
> svq_elem. I will change that.
>
> Thanks!
>
> > Thanks
> >
> >
> > > +
> > > + vhost_svq_translate_addr(svq, elem->in_sg, elem->in_num);
> > > + vhost_svq_translate_addr(svq, elem->out_sg, elem->out_num);
> > > +
> > > vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > > elem->in_num > 0, false);
> > > vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > > @@ -187,7 +255,7 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > >
> > > }
> > >
> > > -static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > > +static void vhost_svq_add(VhostShadowVirtqueue *svq, SVQElement *elem)
> > > {
> > > unsigned qemu_head = vhost_svq_add_split(svq, elem);
> > >
> > > @@ -221,7 +289,7 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> > > }
> > >
> > > while (true) {
> > > - VirtQueueElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > > + SVQElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > > if (!elem) {
> > > break;
> > > }
> > > @@ -247,7 +315,7 @@ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> > > return svq->used_idx != svq->shadow_used_idx;
> > > }
> > >
> > > -static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > +static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > {
> > > vring_desc_t *descs = svq->vring.desc;
> > > const vring_used_t *used = svq->vring.used;
> > > @@ -279,7 +347,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > descs[used_elem.id].next = svq->free_head;
> > > svq->free_head = used_elem.id;
> > >
> > > - svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> > > + svq->ring_id_maps[used_elem.id]->elem.len = used_elem.len;
> > > return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> > > }
> > >
> > > @@ -296,12 +364,19 @@ static void vhost_svq_handle_call_no_test(EventNotifier *n)
> > >
> > > vhost_svq_set_notification(svq, false);
> > > while (true) {
> > > - g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> > > - if (!elem) {
> > > + g_autofree SVQElement *svq_elem = vhost_svq_get_buf(svq);
> > > + VirtQueueElement *elem;
> > > + if (!svq_elem) {
> > > break;
> > > }
> > >
> > > assert(i < svq->vring.num);
> > > + elem = &svq_elem->elem;
> > > +
> > > + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> > > + elem->in_num);
> > > + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> > > + elem->out_num);
> > > virtqueue_fill(vq, elem, elem->len, i++);
> > > }
> > >
> > > @@ -451,14 +526,24 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > event_notifier_set_handler(&svq->host_notifier, NULL);
> > >
> > > for (i = 0; i < svq->vring.num; ++i) {
> > > - g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
> > > + g_autofree SVQElement *svq_elem = svq->ring_id_maps[i];
> > > + VirtQueueElement *elem;
> > > +
> > > + if (!svq_elem) {
> > > + continue;
> > > + }
> > > +
> > > + elem = &svq_elem->elem;
> > > + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> > > + elem->in_num);
> > > + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> > > + elem->out_num);
> > > +
> > > /*
> > > * Although the doc says we must unpop in order, it's ok to unpop
> > > * everything.
> > > */
> > > - if (elem) {
> > > - virtqueue_unpop(svq->vq, elem, elem->len);
> > > - }
> > > + virtqueue_unpop(svq->vq, elem, elem->len);
> > > }
> > > }
> > >
> > > @@ -466,7 +551,8 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> > > * methods and file descriptors.
> > > */
> > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > + VhostIOVATree *iova_map)
> > > {
> > > int vq_idx = dev->vq_index + idx;
> > > unsigned num = virtio_queue_get_num(dev->vdev, vq_idx);
> > > @@ -500,11 +586,13 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > memset(svq->vring.desc, 0, driver_size);
> > > svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> > > memset(svq->vring.used, 0, device_size);
> > > + svq->iova_map = iova_map;
> > > +
> > > for (i = 0; i < num - 1; i++) {
> > > svq->vring.desc[i].next = cpu_to_le16(i + 1);
> > > }
> > >
> > > - svq->ring_id_maps = g_new0(VirtQueueElement *, num);
> > > + svq->ring_id_maps = g_new0(SVQElement *, num);
> > > event_notifier_set_handler(&svq->call_notifier,
> > > vhost_svq_handle_call);
> > > return g_steal_pointer(&svq);
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index a9c680b487..f5a12fee9d 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -176,6 +176,18 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > > vaddr, section->readonly);
> > >
> > > llsize = int128_sub(llend, int128_make64(iova));
> > > + if (v->shadow_vqs_enabled) {
> > > + VhostDMAMap mem_region = {
> > > + .translated_addr = vaddr,
> > > + .size = int128_get64(llsize) - 1,
> > > + .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
> > > + };
> > > +
> > > + int r = vhost_iova_tree_alloc(v->iova_map, &mem_region);
> > > + assert(r == VHOST_DMA_MAP_OK);
> > > +
> > > + iova = mem_region.iova;
> > > + }
> > >
> > > ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> > > vaddr, section->readonly);
> > > @@ -754,6 +766,23 @@ static bool vhost_vdpa_force_iommu(struct vhost_dev *dev)
> > > return true;
> > > }
> > >
> > > +static int vhost_vdpa_get_iova_range(struct vhost_dev *dev,
> > > + hwaddr *first, hwaddr *last)
> > > +{
> > > + int ret;
> > > + struct vhost_vdpa_iova_range range;
> > > +
> > > + ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_IOVA_RANGE, &range);
> > > + if (ret != 0) {
> > > + return ret;
> > > + }
> > > +
> > > + *first = range.first;
> > > + *last = range.last;
> > > + trace_vhost_vdpa_get_iova_range(dev, *first, *last);
> > > + return ret;
> > > +}
> > > +
> > > /**
> > > * Maps QEMU vaddr memory to device in a suitable way for shadow virtqueue:
> > > * - It always reference qemu memory address, not guest's memory.
> > > @@ -881,6 +910,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
> > > static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > {
> > > struct vhost_dev *hdev = v->dev;
> > > + hwaddr iova_first, iova_last;
> > > unsigned n;
> > > int r;
> > >
> > > @@ -894,7 +924,7 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > /* Allocate resources */
> > > assert(v->shadow_vqs->len == 0);
> > > for (n = 0; n < hdev->nvqs; ++n) {
> > > - VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
> > > + VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n, v->iova_map);
> > > if (unlikely(!svq)) {
> > > g_ptr_array_set_size(v->shadow_vqs, 0);
> > > return 0;
> > > @@ -903,6 +933,8 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > }
> > > }
> > >
> > > + r = vhost_vdpa_get_iova_range(hdev, &iova_first, &iova_last);
> > > + assert(r == 0);
> > > r = vhost_vdpa_vring_pause(hdev);
> > > assert(r == 0);
> > >
> > > @@ -913,6 +945,12 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > }
> > > }
> > >
> > > + memory_listener_unregister(&v->listener);
> > > + if (vhost_vdpa_dma_unmap(v, iova_first,
> > > + (iova_last - iova_first) & TARGET_PAGE_MASK)) {
> > > + error_report("Fail to invalidate device iotlb");
> > > + }
> > > +
> > > /* Reset device so it can be configured */
> > > r = vhost_vdpa_dev_start(hdev, false);
> > > assert(r == 0);
> > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > index 8ed19e9d0c..650e521e35 100644
> > > --- a/hw/virtio/trace-events
> > > +++ b/hw/virtio/trace-events
> > > @@ -52,6 +52,7 @@ vhost_vdpa_set_vring_call(void *dev, unsigned int index, int fd) "dev: %p index:
> > > vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRIx64
> > > vhost_vdpa_set_owner(void *dev) "dev: %p"
> > > vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
> > > +vhost_vdpa_get_iova_range(void *dev, uint64_t first, uint64_t last) "dev: %p first: 0x%"PRIx64" last: 0x%"PRIx64
> > >
> > > # virtio.c
> > > virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ
2021-10-20 2:02 ` Jason Wang
@ 2021-10-20 2:07 ` Jason Wang
[not found] ` <CAJaqyWe6R_32Se75XF3+NUZyiWr+cLYQ_86LExmom-vCRT9G0g@mail.gmail.com>
0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-20 2:07 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
On Wed, Oct 20, 2021 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Oct 19, 2021 at 6:29 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Tue, Oct 19, 2021 at 11:25 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > 在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> > > > Use translations added in VhostIOVATree in SVQ.
> > > >
> > > > Now every element needs to store the previous address also, so VirtQueue
> > > > can consume the elements properly. This adds a little overhead per VQ
> > > > element, having to allocate more memory to stash them. As a possible
> > > > optimization, this allocation could be avoided if the descriptor is not
> > > > a chain but a single one, but this is left undone.
> > > >
> > > > TODO: iova range should be queried before, and add logic to fail when
> > > > GPA is outside of its range and memory listener or svq add it.
> > > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > ---
> > > > hw/virtio/vhost-shadow-virtqueue.h | 4 +-
> > > > hw/virtio/vhost-shadow-virtqueue.c | 130 ++++++++++++++++++++++++-----
> > > > hw/virtio/vhost-vdpa.c | 40 ++++++++-
> > > > hw/virtio/trace-events | 1 +
> > > > 4 files changed, 152 insertions(+), 23 deletions(-)
> > > >
> > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > index b7baa424a7..a0e6b5267a 100644
> > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > @@ -11,6 +11,7 @@
> > > > #define VHOST_SHADOW_VIRTQUEUE_H
> > > >
> > > > #include "hw/virtio/vhost.h"
> > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > >
> > > > typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > >
> > > > @@ -28,7 +29,8 @@ bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> > > > void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > VhostShadowVirtqueue *svq);
> > > >
> > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
> > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > > + VhostIOVATree *iova_map);
> > > >
> > > > void vhost_svq_free(VhostShadowVirtqueue *vq);
> > > >
> > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > index 2fd0bab75d..9db538547e 100644
> > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > @@ -11,12 +11,19 @@
> > > > #include "hw/virtio/vhost-shadow-virtqueue.h"
> > > > #include "hw/virtio/vhost.h"
> > > > #include "hw/virtio/virtio-access.h"
> > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > >
> > > > #include "standard-headers/linux/vhost_types.h"
> > > >
> > > > #include "qemu/error-report.h"
> > > > #include "qemu/main-loop.h"
> > > >
> > > > +typedef struct SVQElement {
> > > > + VirtQueueElement elem;
> > > > + void **in_sg_stash;
> > > > + void **out_sg_stash;
> > > > +} SVQElement;
> > > > +
> > > > /* Shadow virtqueue to relay notifications */
> > > > typedef struct VhostShadowVirtqueue {
> > > > /* Shadow vring */
> > > > @@ -46,8 +53,11 @@ typedef struct VhostShadowVirtqueue {
> > > > /* Virtio device */
> > > > VirtIODevice *vdev;
> > > >
> > > > + /* IOVA mapping if used */
> > > > + VhostIOVATree *iova_map;
> > > > +
> > > > /* Map for returning guest's descriptors */
> > > > - VirtQueueElement **ring_id_maps;
> > > > + SVQElement **ring_id_maps;
> > > >
> > > > /* Next head to expose to device */
> > > > uint16_t avail_idx_shadow;
> > > > @@ -79,13 +89,6 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > > > continue;
> > > >
> > > > case VIRTIO_F_ACCESS_PLATFORM:
> > > > - /* SVQ needs this feature disabled. Can't continue */
> > > > - if (*dev_features & BIT_ULL(b)) {
> > > > - clear_bit(b, dev_features);
> > > > - r = false;
> > > > - }
> > > > - break;
> > > > -
> > > > case VIRTIO_F_VERSION_1:
> > > > /* SVQ needs this feature, so can't continue */
> > > > if (!(*dev_features & BIT_ULL(b))) {
> > > > @@ -126,6 +129,64 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> > > > }
> > > > }
> > > >
> > > > +static void vhost_svq_stash_addr(void ***stash, const struct iovec *iov,
> > > > + size_t num)
> > > > +{
> > > > + size_t i;
> > > > +
> > > > + if (num == 0) {
> > > > + return;
> > > > + }
> > > > +
> > > > + *stash = g_new(void *, num);
> > > > + for (i = 0; i < num; ++i) {
> > > > + (*stash)[i] = iov[i].iov_base;
> > > > + }
> > > > +}
> > > > +
> > > > +static void vhost_svq_unstash_addr(void **stash, struct iovec *iov, size_t num)
> > > > +{
> > > > + size_t i;
> > > > +
> > > > + if (num == 0) {
> > > > + return;
> > > > + }
> > > > +
> > > > + for (i = 0; i < num; ++i) {
> > > > + iov[i].iov_base = stash[i];
> > > > + }
> > > > + g_free(stash);
> > > > +}
> > > > +
> > > > +static void vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> > > > + struct iovec *iovec, size_t num)
> > > > +{
> > > > + size_t i;
> > > > +
> > > > + for (i = 0; i < num; ++i) {
> > > > + VhostDMAMap needle = {
> > > > + .translated_addr = iovec[i].iov_base,
> > > > + .size = iovec[i].iov_len,
> > > > + };
> > > > + size_t off;
> > > > +
> > > > + const VhostDMAMap *map = vhost_iova_tree_find_iova(svq->iova_map,
> > > > + &needle);
> > >
> > >
> > > Is it possible that we end up with more than one maps here?
> > >
> >
> > Actually it is possible, since there is no guarantee that one
> > descriptor (or indirect descriptor) maps exactly to one iov. It could
> > map to many if qemu vaddr is not contiguous but GPA + size is. This is
> > something that must be fixed for the next revision, so thanks for
> > pointing it out!
> >
> > Taking that into account, the condition that svq vring avail_idx -
> > used_idx was always less or equal than guest's vring avail_idx -
> > used_idx is not true anymore. Checking for that before adding buffers
> > to SVQ is the easy part, but how could we recover in that case?
> >
> > I think that the easy solution is to check for more available buffers
> > unconditionally at the end of vhost_svq_handle_call, which handles the
> > SVQ used and is supposed to make more room for available buffers. So
> > vhost_handle_guest_kick would not check if eventfd is set or not
> > anymore.
> >
> > Would that make sense?
>
> Yes, I think it should work.
Btw, I wonder how to handle indirect descriptors. SVQ doesn't use
indirect descriptors for now, but it looks like a must otherwise we
may end up SVQ is full before VQ.
It looks to me an easy way is to always use indirect descriptors if #sg >= 2?
Thanks
>
> Thanks
>
> >
> > Thanks!
> >
> > >
> > > > + /*
> > > > + * Map cannot be NULL since iova map contains all guest space and
> > > > + * qemu already has a physical address mapped
> > > > + */
> > > > + assert(map);
> > > > +
> > > > + /*
> > > > + * Map->iova chunk size is ignored. What to do if descriptor
> > > > + * (addr, size) does not fit is delegated to the device.
> > > > + */
> > > > + off = needle.translated_addr - map->translated_addr;
> > > > + iovec[i].iov_base = (void *)(map->iova + off);
> > > > + }
> > > > +}
> > > > +
> > > > static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > > > const struct iovec *iovec,
> > > > size_t num, bool more_descs, bool write)
> > > > @@ -156,8 +217,9 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > > > }
> > > >
> > > > static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > > - VirtQueueElement *elem)
> > > > + SVQElement *svq_elem)
> > > > {
> > > > + VirtQueueElement *elem = &svq_elem->elem;
> > > > int head;
> > > > unsigned avail_idx;
> > > > vring_avail_t *avail = svq->vring.avail;
> > > > @@ -167,6 +229,12 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > > /* We need some descriptors here */
> > > > assert(elem->out_num || elem->in_num);
> > > >
> > > > + vhost_svq_stash_addr(&svq_elem->in_sg_stash, elem->in_sg, elem->in_num);
> > > > + vhost_svq_stash_addr(&svq_elem->out_sg_stash, elem->out_sg, elem->out_num);
> > >
> > >
> > > I wonder if we can solve the trick like stash and unstash with a
> > > dedicated sgs in svq_elem, instead of reusing the elem.
> > >
> >
> > Actually yes, it would be way simpler to use a new sgs array in
> > svq_elem. I will change that.
> >
> > Thanks!
> >
> > > Thanks
> > >
> > >
> > > > +
> > > > + vhost_svq_translate_addr(svq, elem->in_sg, elem->in_num);
> > > > + vhost_svq_translate_addr(svq, elem->out_sg, elem->out_num);
> > > > +
> > > > vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > > > elem->in_num > 0, false);
> > > > vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > > > @@ -187,7 +255,7 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > >
> > > > }
> > > >
> > > > -static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > > > +static void vhost_svq_add(VhostShadowVirtqueue *svq, SVQElement *elem)
> > > > {
> > > > unsigned qemu_head = vhost_svq_add_split(svq, elem);
> > > >
> > > > @@ -221,7 +289,7 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> > > > }
> > > >
> > > > while (true) {
> > > > - VirtQueueElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > > > + SVQElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > > > if (!elem) {
> > > > break;
> > > > }
> > > > @@ -247,7 +315,7 @@ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> > > > return svq->used_idx != svq->shadow_used_idx;
> > > > }
> > > >
> > > > -static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > +static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > {
> > > > vring_desc_t *descs = svq->vring.desc;
> > > > const vring_used_t *used = svq->vring.used;
> > > > @@ -279,7 +347,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > descs[used_elem.id].next = svq->free_head;
> > > > svq->free_head = used_elem.id;
> > > >
> > > > - svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> > > > + svq->ring_id_maps[used_elem.id]->elem.len = used_elem.len;
> > > > return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> > > > }
> > > >
> > > > @@ -296,12 +364,19 @@ static void vhost_svq_handle_call_no_test(EventNotifier *n)
> > > >
> > > > vhost_svq_set_notification(svq, false);
> > > > while (true) {
> > > > - g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> > > > - if (!elem) {
> > > > + g_autofree SVQElement *svq_elem = vhost_svq_get_buf(svq);
> > > > + VirtQueueElement *elem;
> > > > + if (!svq_elem) {
> > > > break;
> > > > }
> > > >
> > > > assert(i < svq->vring.num);
> > > > + elem = &svq_elem->elem;
> > > > +
> > > > + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> > > > + elem->in_num);
> > > > + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> > > > + elem->out_num);
> > > > virtqueue_fill(vq, elem, elem->len, i++);
> > > > }
> > > >
> > > > @@ -451,14 +526,24 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > event_notifier_set_handler(&svq->host_notifier, NULL);
> > > >
> > > > for (i = 0; i < svq->vring.num; ++i) {
> > > > - g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
> > > > + g_autofree SVQElement *svq_elem = svq->ring_id_maps[i];
> > > > + VirtQueueElement *elem;
> > > > +
> > > > + if (!svq_elem) {
> > > > + continue;
> > > > + }
> > > > +
> > > > + elem = &svq_elem->elem;
> > > > + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> > > > + elem->in_num);
> > > > + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> > > > + elem->out_num);
> > > > +
> > > > /*
> > > > * Although the doc says we must unpop in order, it's ok to unpop
> > > > * everything.
> > > > */
> > > > - if (elem) {
> > > > - virtqueue_unpop(svq->vq, elem, elem->len);
> > > > - }
> > > > + virtqueue_unpop(svq->vq, elem, elem->len);
> > > > }
> > > > }
> > > >
> > > > @@ -466,7 +551,8 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> > > > * methods and file descriptors.
> > > > */
> > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > > + VhostIOVATree *iova_map)
> > > > {
> > > > int vq_idx = dev->vq_index + idx;
> > > > unsigned num = virtio_queue_get_num(dev->vdev, vq_idx);
> > > > @@ -500,11 +586,13 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > memset(svq->vring.desc, 0, driver_size);
> > > > svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> > > > memset(svq->vring.used, 0, device_size);
> > > > + svq->iova_map = iova_map;
> > > > +
> > > > for (i = 0; i < num - 1; i++) {
> > > > svq->vring.desc[i].next = cpu_to_le16(i + 1);
> > > > }
> > > >
> > > > - svq->ring_id_maps = g_new0(VirtQueueElement *, num);
> > > > + svq->ring_id_maps = g_new0(SVQElement *, num);
> > > > event_notifier_set_handler(&svq->call_notifier,
> > > > vhost_svq_handle_call);
> > > > return g_steal_pointer(&svq);
> > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > index a9c680b487..f5a12fee9d 100644
> > > > --- a/hw/virtio/vhost-vdpa.c
> > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > @@ -176,6 +176,18 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > > > vaddr, section->readonly);
> > > >
> > > > llsize = int128_sub(llend, int128_make64(iova));
> > > > + if (v->shadow_vqs_enabled) {
> > > > + VhostDMAMap mem_region = {
> > > > + .translated_addr = vaddr,
> > > > + .size = int128_get64(llsize) - 1,
> > > > + .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
> > > > + };
> > > > +
> > > > + int r = vhost_iova_tree_alloc(v->iova_map, &mem_region);
> > > > + assert(r == VHOST_DMA_MAP_OK);
> > > > +
> > > > + iova = mem_region.iova;
> > > > + }
> > > >
> > > > ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> > > > vaddr, section->readonly);
> > > > @@ -754,6 +766,23 @@ static bool vhost_vdpa_force_iommu(struct vhost_dev *dev)
> > > > return true;
> > > > }
> > > >
> > > > +static int vhost_vdpa_get_iova_range(struct vhost_dev *dev,
> > > > + hwaddr *first, hwaddr *last)
> > > > +{
> > > > + int ret;
> > > > + struct vhost_vdpa_iova_range range;
> > > > +
> > > > + ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_IOVA_RANGE, &range);
> > > > + if (ret != 0) {
> > > > + return ret;
> > > > + }
> > > > +
> > > > + *first = range.first;
> > > > + *last = range.last;
> > > > + trace_vhost_vdpa_get_iova_range(dev, *first, *last);
> > > > + return ret;
> > > > +}
> > > > +
> > > > /**
> > > > * Maps QEMU vaddr memory to device in a suitable way for shadow virtqueue:
> > > > * - It always reference qemu memory address, not guest's memory.
> > > > @@ -881,6 +910,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
> > > > static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > {
> > > > struct vhost_dev *hdev = v->dev;
> > > > + hwaddr iova_first, iova_last;
> > > > unsigned n;
> > > > int r;
> > > >
> > > > @@ -894,7 +924,7 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > /* Allocate resources */
> > > > assert(v->shadow_vqs->len == 0);
> > > > for (n = 0; n < hdev->nvqs; ++n) {
> > > > - VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
> > > > + VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n, v->iova_map);
> > > > if (unlikely(!svq)) {
> > > > g_ptr_array_set_size(v->shadow_vqs, 0);
> > > > return 0;
> > > > @@ -903,6 +933,8 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > }
> > > > }
> > > >
> > > > + r = vhost_vdpa_get_iova_range(hdev, &iova_first, &iova_last);
> > > > + assert(r == 0);
> > > > r = vhost_vdpa_vring_pause(hdev);
> > > > assert(r == 0);
> > > >
> > > > @@ -913,6 +945,12 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > }
> > > > }
> > > >
> > > > + memory_listener_unregister(&v->listener);
> > > > + if (vhost_vdpa_dma_unmap(v, iova_first,
> > > > + (iova_last - iova_first) & TARGET_PAGE_MASK)) {
> > > > + error_report("Fail to invalidate device iotlb");
> > > > + }
> > > > +
> > > > /* Reset device so it can be configured */
> > > > r = vhost_vdpa_dev_start(hdev, false);
> > > > assert(r == 0);
> > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > > index 8ed19e9d0c..650e521e35 100644
> > > > --- a/hw/virtio/trace-events
> > > > +++ b/hw/virtio/trace-events
> > > > @@ -52,6 +52,7 @@ vhost_vdpa_set_vring_call(void *dev, unsigned int index, int fd) "dev: %p index:
> > > > vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRIx64
> > > > vhost_vdpa_set_owner(void *dev) "dev: %p"
> > > > vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
> > > > +vhost_vdpa_get_iova_range(void *dev, uint64_t first, uint64_t last) "dev: %p first: 0x%"PRIx64" last: 0x%"PRIx64
> > > >
> > > > # virtio.c
> > > > virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > >
> >
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ
[not found] ` <CAJaqyWe6R_32Se75XF3+NUZyiWr+cLYQ_86LExmom-vCRT9G0g@mail.gmail.com>
@ 2021-10-20 9:03 ` Jason Wang
[not found] ` <CAJaqyWd9LjpA5w2f1s+pNmdNjYPvcbJgPqY+Qv1fWb+6LPPzAg@mail.gmail.com>
0 siblings, 1 reply; 27+ messages in thread
From: Jason Wang @ 2021-10-20 9:03 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
On Wed, Oct 20, 2021 at 2:52 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Oct 20, 2021 at 4:07 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Oct 20, 2021 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Tue, Oct 19, 2021 at 6:29 PM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Tue, Oct 19, 2021 at 11:25 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > >
> > > > > 在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> > > > > > Use translations added in VhostIOVATree in SVQ.
> > > > > >
> > > > > > Now every element needs to store the previous address also, so VirtQueue
> > > > > > can consume the elements properly. This adds a little overhead per VQ
> > > > > > element, having to allocate more memory to stash them. As a possible
> > > > > > optimization, this allocation could be avoided if the descriptor is not
> > > > > > a chain but a single one, but this is left undone.
> > > > > >
> > > > > > TODO: iova range should be queried before, and add logic to fail when
> > > > > > GPA is outside of its range and memory listener or svq add it.
> > > > > >
> > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > ---
> > > > > > hw/virtio/vhost-shadow-virtqueue.h | 4 +-
> > > > > > hw/virtio/vhost-shadow-virtqueue.c | 130 ++++++++++++++++++++++++-----
> > > > > > hw/virtio/vhost-vdpa.c | 40 ++++++++-
> > > > > > hw/virtio/trace-events | 1 +
> > > > > > 4 files changed, 152 insertions(+), 23 deletions(-)
> > > > > >
> > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > index b7baa424a7..a0e6b5267a 100644
> > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > @@ -11,6 +11,7 @@
> > > > > > #define VHOST_SHADOW_VIRTQUEUE_H
> > > > > >
> > > > > > #include "hw/virtio/vhost.h"
> > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > >
> > > > > > typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > > >
> > > > > > @@ -28,7 +29,8 @@ bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> > > > > > void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > > VhostShadowVirtqueue *svq);
> > > > > >
> > > > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
> > > > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > > > > + VhostIOVATree *iova_map);
> > > > > >
> > > > > > void vhost_svq_free(VhostShadowVirtqueue *vq);
> > > > > >
> > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > index 2fd0bab75d..9db538547e 100644
> > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > @@ -11,12 +11,19 @@
> > > > > > #include "hw/virtio/vhost-shadow-virtqueue.h"
> > > > > > #include "hw/virtio/vhost.h"
> > > > > > #include "hw/virtio/virtio-access.h"
> > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > >
> > > > > > #include "standard-headers/linux/vhost_types.h"
> > > > > >
> > > > > > #include "qemu/error-report.h"
> > > > > > #include "qemu/main-loop.h"
> > > > > >
> > > > > > +typedef struct SVQElement {
> > > > > > + VirtQueueElement elem;
> > > > > > + void **in_sg_stash;
> > > > > > + void **out_sg_stash;
> > > > > > +} SVQElement;
> > > > > > +
> > > > > > /* Shadow virtqueue to relay notifications */
> > > > > > typedef struct VhostShadowVirtqueue {
> > > > > > /* Shadow vring */
> > > > > > @@ -46,8 +53,11 @@ typedef struct VhostShadowVirtqueue {
> > > > > > /* Virtio device */
> > > > > > VirtIODevice *vdev;
> > > > > >
> > > > > > + /* IOVA mapping if used */
> > > > > > + VhostIOVATree *iova_map;
> > > > > > +
> > > > > > /* Map for returning guest's descriptors */
> > > > > > - VirtQueueElement **ring_id_maps;
> > > > > > + SVQElement **ring_id_maps;
> > > > > >
> > > > > > /* Next head to expose to device */
> > > > > > uint16_t avail_idx_shadow;
> > > > > > @@ -79,13 +89,6 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > > > > > continue;
> > > > > >
> > > > > > case VIRTIO_F_ACCESS_PLATFORM:
> > > > > > - /* SVQ needs this feature disabled. Can't continue */
> > > > > > - if (*dev_features & BIT_ULL(b)) {
> > > > > > - clear_bit(b, dev_features);
> > > > > > - r = false;
> > > > > > - }
> > > > > > - break;
> > > > > > -
> > > > > > case VIRTIO_F_VERSION_1:
> > > > > > /* SVQ needs this feature, so can't continue */
> > > > > > if (!(*dev_features & BIT_ULL(b))) {
> > > > > > @@ -126,6 +129,64 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > +static void vhost_svq_stash_addr(void ***stash, const struct iovec *iov,
> > > > > > + size_t num)
> > > > > > +{
> > > > > > + size_t i;
> > > > > > +
> > > > > > + if (num == 0) {
> > > > > > + return;
> > > > > > + }
> > > > > > +
> > > > > > + *stash = g_new(void *, num);
> > > > > > + for (i = 0; i < num; ++i) {
> > > > > > + (*stash)[i] = iov[i].iov_base;
> > > > > > + }
> > > > > > +}
> > > > > > +
> > > > > > +static void vhost_svq_unstash_addr(void **stash, struct iovec *iov, size_t num)
> > > > > > +{
> > > > > > + size_t i;
> > > > > > +
> > > > > > + if (num == 0) {
> > > > > > + return;
> > > > > > + }
> > > > > > +
> > > > > > + for (i = 0; i < num; ++i) {
> > > > > > + iov[i].iov_base = stash[i];
> > > > > > + }
> > > > > > + g_free(stash);
> > > > > > +}
> > > > > > +
> > > > > > +static void vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> > > > > > + struct iovec *iovec, size_t num)
> > > > > > +{
> > > > > > + size_t i;
> > > > > > +
> > > > > > + for (i = 0; i < num; ++i) {
> > > > > > + VhostDMAMap needle = {
> > > > > > + .translated_addr = iovec[i].iov_base,
> > > > > > + .size = iovec[i].iov_len,
> > > > > > + };
> > > > > > + size_t off;
> > > > > > +
> > > > > > + const VhostDMAMap *map = vhost_iova_tree_find_iova(svq->iova_map,
> > > > > > + &needle);
> > > > >
> > > > >
> > > > > Is it possible that we end up with more than one maps here?
> > > > >
> > > >
> > > > Actually it is possible, since there is no guarantee that one
> > > > descriptor (or indirect descriptor) maps exactly to one iov. It could
> > > > map to many if qemu vaddr is not contiguous but GPA + size is. This is
> > > > something that must be fixed for the next revision, so thanks for
> > > > pointing it out!
> > > >
> > > > Taking that into account, the condition that svq vring avail_idx -
> > > > used_idx was always less or equal than guest's vring avail_idx -
> > > > used_idx is not true anymore. Checking for that before adding buffers
> > > > to SVQ is the easy part, but how could we recover in that case?
> > > >
> > > > I think that the easy solution is to check for more available buffers
> > > > unconditionally at the end of vhost_svq_handle_call, which handles the
> > > > SVQ used and is supposed to make more room for available buffers. So
> > > > vhost_handle_guest_kick would not check if eventfd is set or not
> > > > anymore.
> > > >
> > > > Would that make sense?
> > >
> > > Yes, I think it should work.
> >
> > Btw, I wonder how to handle indirect descriptors. SVQ doesn't use
> > indirect descriptors for now, but it looks like a must otherwise we
> > may end up SVQ is full before VQ.
> >
>
> We can get to that situation without indirect too, if a single
> descriptor maps to more than one sg buffer. The next revision is going
> to control that too.
>
> > It looks to me an easy way is to always use indirect descriptors if #sg >= 2?
> >
>
> I will use that, but that does not solve the case where a descriptor
> maps to > 1 different buffers in qemu vaddr.
Right, so we need to deal with the case when SVQ is out of space.
> So I think that some
> check after marking descriptors as used is a must somehow.
I thought it should be before processing the available buffer? It's
the guest driver that make sure there's sufficient space for used
ring?
Thanks
>
>
> > Thanks
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks!
> > > >
> > > > >
> > > > > > + /*
> > > > > > + * Map cannot be NULL since iova map contains all guest space and
> > > > > > + * qemu already has a physical address mapped
> > > > > > + */
> > > > > > + assert(map);
> > > > > > +
> > > > > > + /*
> > > > > > + * Map->iova chunk size is ignored. What to do if descriptor
> > > > > > + * (addr, size) does not fit is delegated to the device.
> > > > > > + */
> > > > > > + off = needle.translated_addr - map->translated_addr;
> > > > > > + iovec[i].iov_base = (void *)(map->iova + off);
> > > > > > + }
> > > > > > +}
> > > > > > +
> > > > > > static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > > > > > const struct iovec *iovec,
> > > > > > size_t num, bool more_descs, bool write)
> > > > > > @@ -156,8 +217,9 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > > > > > }
> > > > > >
> > > > > > static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > > > > - VirtQueueElement *elem)
> > > > > > + SVQElement *svq_elem)
> > > > > > {
> > > > > > + VirtQueueElement *elem = &svq_elem->elem;
> > > > > > int head;
> > > > > > unsigned avail_idx;
> > > > > > vring_avail_t *avail = svq->vring.avail;
> > > > > > @@ -167,6 +229,12 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > > > > /* We need some descriptors here */
> > > > > > assert(elem->out_num || elem->in_num);
> > > > > >
> > > > > > + vhost_svq_stash_addr(&svq_elem->in_sg_stash, elem->in_sg, elem->in_num);
> > > > > > + vhost_svq_stash_addr(&svq_elem->out_sg_stash, elem->out_sg, elem->out_num);
> > > > >
> > > > >
> > > > > I wonder if we can solve the trick like stash and unstash with a
> > > > > dedicated sgs in svq_elem, instead of reusing the elem.
> > > > >
> > > >
> > > > Actually yes, it would be way simpler to use a new sgs array in
> > > > svq_elem. I will change that.
> > > >
> > > > Thanks!
> > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > > +
> > > > > > + vhost_svq_translate_addr(svq, elem->in_sg, elem->in_num);
> > > > > > + vhost_svq_translate_addr(svq, elem->out_sg, elem->out_num);
> > > > > > +
> > > > > > vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > > > > > elem->in_num > 0, false);
> > > > > > vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > > > > > @@ -187,7 +255,7 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > > > >
> > > > > > }
> > > > > >
> > > > > > -static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > > > > > +static void vhost_svq_add(VhostShadowVirtqueue *svq, SVQElement *elem)
> > > > > > {
> > > > > > unsigned qemu_head = vhost_svq_add_split(svq, elem);
> > > > > >
> > > > > > @@ -221,7 +289,7 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> > > > > > }
> > > > > >
> > > > > > while (true) {
> > > > > > - VirtQueueElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > > > > > + SVQElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > > > > > if (!elem) {
> > > > > > break;
> > > > > > }
> > > > > > @@ -247,7 +315,7 @@ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> > > > > > return svq->used_idx != svq->shadow_used_idx;
> > > > > > }
> > > > > >
> > > > > > -static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > > > +static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > > > {
> > > > > > vring_desc_t *descs = svq->vring.desc;
> > > > > > const vring_used_t *used = svq->vring.used;
> > > > > > @@ -279,7 +347,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > > > descs[used_elem.id].next = svq->free_head;
> > > > > > svq->free_head = used_elem.id;
> > > > > >
> > > > > > - svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> > > > > > + svq->ring_id_maps[used_elem.id]->elem.len = used_elem.len;
> > > > > > return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> > > > > > }
> > > > > >
> > > > > > @@ -296,12 +364,19 @@ static void vhost_svq_handle_call_no_test(EventNotifier *n)
> > > > > >
> > > > > > vhost_svq_set_notification(svq, false);
> > > > > > while (true) {
> > > > > > - g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> > > > > > - if (!elem) {
> > > > > > + g_autofree SVQElement *svq_elem = vhost_svq_get_buf(svq);
> > > > > > + VirtQueueElement *elem;
> > > > > > + if (!svq_elem) {
> > > > > > break;
> > > > > > }
> > > > > >
> > > > > > assert(i < svq->vring.num);
> > > > > > + elem = &svq_elem->elem;
> > > > > > +
> > > > > > + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> > > > > > + elem->in_num);
> > > > > > + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> > > > > > + elem->out_num);
> > > > > > virtqueue_fill(vq, elem, elem->len, i++);
> > > > > > }
> > > > > >
> > > > > > @@ -451,14 +526,24 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > > event_notifier_set_handler(&svq->host_notifier, NULL);
> > > > > >
> > > > > > for (i = 0; i < svq->vring.num; ++i) {
> > > > > > - g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
> > > > > > + g_autofree SVQElement *svq_elem = svq->ring_id_maps[i];
> > > > > > + VirtQueueElement *elem;
> > > > > > +
> > > > > > + if (!svq_elem) {
> > > > > > + continue;
> > > > > > + }
> > > > > > +
> > > > > > + elem = &svq_elem->elem;
> > > > > > + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> > > > > > + elem->in_num);
> > > > > > + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> > > > > > + elem->out_num);
> > > > > > +
> > > > > > /*
> > > > > > * Although the doc says we must unpop in order, it's ok to unpop
> > > > > > * everything.
> > > > > > */
> > > > > > - if (elem) {
> > > > > > - virtqueue_unpop(svq->vq, elem, elem->len);
> > > > > > - }
> > > > > > + virtqueue_unpop(svq->vq, elem, elem->len);
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > @@ -466,7 +551,8 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > > * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> > > > > > * methods and file descriptors.
> > > > > > */
> > > > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > > > > + VhostIOVATree *iova_map)
> > > > > > {
> > > > > > int vq_idx = dev->vq_index + idx;
> > > > > > unsigned num = virtio_queue_get_num(dev->vdev, vq_idx);
> > > > > > @@ -500,11 +586,13 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > > > memset(svq->vring.desc, 0, driver_size);
> > > > > > svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> > > > > > memset(svq->vring.used, 0, device_size);
> > > > > > + svq->iova_map = iova_map;
> > > > > > +
> > > > > > for (i = 0; i < num - 1; i++) {
> > > > > > svq->vring.desc[i].next = cpu_to_le16(i + 1);
> > > > > > }
> > > > > >
> > > > > > - svq->ring_id_maps = g_new0(VirtQueueElement *, num);
> > > > > > + svq->ring_id_maps = g_new0(SVQElement *, num);
> > > > > > event_notifier_set_handler(&svq->call_notifier,
> > > > > > vhost_svq_handle_call);
> > > > > > return g_steal_pointer(&svq);
> > > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > > > index a9c680b487..f5a12fee9d 100644
> > > > > > --- a/hw/virtio/vhost-vdpa.c
> > > > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > > > @@ -176,6 +176,18 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > > > > > vaddr, section->readonly);
> > > > > >
> > > > > > llsize = int128_sub(llend, int128_make64(iova));
> > > > > > + if (v->shadow_vqs_enabled) {
> > > > > > + VhostDMAMap mem_region = {
> > > > > > + .translated_addr = vaddr,
> > > > > > + .size = int128_get64(llsize) - 1,
> > > > > > + .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
> > > > > > + };
> > > > > > +
> > > > > > + int r = vhost_iova_tree_alloc(v->iova_map, &mem_region);
> > > > > > + assert(r == VHOST_DMA_MAP_OK);
> > > > > > +
> > > > > > + iova = mem_region.iova;
> > > > > > + }
> > > > > >
> > > > > > ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> > > > > > vaddr, section->readonly);
> > > > > > @@ -754,6 +766,23 @@ static bool vhost_vdpa_force_iommu(struct vhost_dev *dev)
> > > > > > return true;
> > > > > > }
> > > > > >
> > > > > > +static int vhost_vdpa_get_iova_range(struct vhost_dev *dev,
> > > > > > + hwaddr *first, hwaddr *last)
> > > > > > +{
> > > > > > + int ret;
> > > > > > + struct vhost_vdpa_iova_range range;
> > > > > > +
> > > > > > + ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_IOVA_RANGE, &range);
> > > > > > + if (ret != 0) {
> > > > > > + return ret;
> > > > > > + }
> > > > > > +
> > > > > > + *first = range.first;
> > > > > > + *last = range.last;
> > > > > > + trace_vhost_vdpa_get_iova_range(dev, *first, *last);
> > > > > > + return ret;
> > > > > > +}
> > > > > > +
> > > > > > /**
> > > > > > * Maps QEMU vaddr memory to device in a suitable way for shadow virtqueue:
> > > > > > * - It always reference qemu memory address, not guest's memory.
> > > > > > @@ -881,6 +910,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
> > > > > > static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > {
> > > > > > struct vhost_dev *hdev = v->dev;
> > > > > > + hwaddr iova_first, iova_last;
> > > > > > unsigned n;
> > > > > > int r;
> > > > > >
> > > > > > @@ -894,7 +924,7 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > /* Allocate resources */
> > > > > > assert(v->shadow_vqs->len == 0);
> > > > > > for (n = 0; n < hdev->nvqs; ++n) {
> > > > > > - VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
> > > > > > + VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n, v->iova_map);
> > > > > > if (unlikely(!svq)) {
> > > > > > g_ptr_array_set_size(v->shadow_vqs, 0);
> > > > > > return 0;
> > > > > > @@ -903,6 +933,8 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > + r = vhost_vdpa_get_iova_range(hdev, &iova_first, &iova_last);
> > > > > > + assert(r == 0);
> > > > > > r = vhost_vdpa_vring_pause(hdev);
> > > > > > assert(r == 0);
> > > > > >
> > > > > > @@ -913,6 +945,12 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > + memory_listener_unregister(&v->listener);
> > > > > > + if (vhost_vdpa_dma_unmap(v, iova_first,
> > > > > > + (iova_last - iova_first) & TARGET_PAGE_MASK)) {
> > > > > > + error_report("Fail to invalidate device iotlb");
> > > > > > + }
> > > > > > +
> > > > > > /* Reset device so it can be configured */
> > > > > > r = vhost_vdpa_dev_start(hdev, false);
> > > > > > assert(r == 0);
> > > > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > > > > index 8ed19e9d0c..650e521e35 100644
> > > > > > --- a/hw/virtio/trace-events
> > > > > > +++ b/hw/virtio/trace-events
> > > > > > @@ -52,6 +52,7 @@ vhost_vdpa_set_vring_call(void *dev, unsigned int index, int fd) "dev: %p index:
> > > > > > vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRIx64
> > > > > > vhost_vdpa_set_owner(void *dev) "dev: %p"
> > > > > > vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
> > > > > > +vhost_vdpa_get_iova_range(void *dev, uint64_t first, uint64_t last) "dev: %p first: 0x%"PRIx64" last: 0x%"PRIx64
> > > > > >
> > > > > > # virtio.c
> > > > > > virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > > > >
> > > >
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ
[not found] ` <CAJaqyWd9LjpA5w2f1s+pNmdNjYPvcbJgPqY+Qv1fWb+6LPPzAg@mail.gmail.com>
@ 2021-10-21 2:38 ` Jason Wang
2021-10-26 4:32 ` Jason Wang
1 sibling, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-21 2:38 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
On Wed, Oct 20, 2021 at 7:57 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Oct 20, 2021 at 11:03 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Oct 20, 2021 at 2:52 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Oct 20, 2021 at 4:07 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 20, 2021 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Tue, Oct 19, 2021 at 6:29 PM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Oct 19, 2021 at 11:25 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > 在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> > > > > > > > Use translations added in VhostIOVATree in SVQ.
> > > > > > > >
> > > > > > > > Now every element needs to store the previous address also, so VirtQueue
> > > > > > > > can consume the elements properly. This adds a little overhead per VQ
> > > > > > > > element, having to allocate more memory to stash them. As a possible
> > > > > > > > optimization, this allocation could be avoided if the descriptor is not
> > > > > > > > a chain but a single one, but this is left undone.
> > > > > > > >
> > > > > > > > TODO: iova range should be queried before, and add logic to fail when
> > > > > > > > GPA is outside of its range and memory listener or svq add it.
> > > > > > > >
> > > > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > > > ---
> > > > > > > > hw/virtio/vhost-shadow-virtqueue.h | 4 +-
> > > > > > > > hw/virtio/vhost-shadow-virtqueue.c | 130 ++++++++++++++++++++++++-----
> > > > > > > > hw/virtio/vhost-vdpa.c | 40 ++++++++-
> > > > > > > > hw/virtio/trace-events | 1 +
> > > > > > > > 4 files changed, 152 insertions(+), 23 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > index b7baa424a7..a0e6b5267a 100644
> > > > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > @@ -11,6 +11,7 @@
> > > > > > > > #define VHOST_SHADOW_VIRTQUEUE_H
> > > > > > > >
> > > > > > > > #include "hw/virtio/vhost.h"
> > > > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > > > >
> > > > > > > > typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > > > > >
> > > > > > > > @@ -28,7 +29,8 @@ bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> > > > > > > > void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > > > > VhostShadowVirtqueue *svq);
> > > > > > > >
> > > > > > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
> > > > > > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > > > > > > + VhostIOVATree *iova_map);
> > > > > > > >
> > > > > > > > void vhost_svq_free(VhostShadowVirtqueue *vq);
> > > > > > > >
> > > > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > index 2fd0bab75d..9db538547e 100644
> > > > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > @@ -11,12 +11,19 @@
> > > > > > > > #include "hw/virtio/vhost-shadow-virtqueue.h"
> > > > > > > > #include "hw/virtio/vhost.h"
> > > > > > > > #include "hw/virtio/virtio-access.h"
> > > > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > > > >
> > > > > > > > #include "standard-headers/linux/vhost_types.h"
> > > > > > > >
> > > > > > > > #include "qemu/error-report.h"
> > > > > > > > #include "qemu/main-loop.h"
> > > > > > > >
> > > > > > > > +typedef struct SVQElement {
> > > > > > > > + VirtQueueElement elem;
> > > > > > > > + void **in_sg_stash;
> > > > > > > > + void **out_sg_stash;
> > > > > > > > +} SVQElement;
> > > > > > > > +
> > > > > > > > /* Shadow virtqueue to relay notifications */
> > > > > > > > typedef struct VhostShadowVirtqueue {
> > > > > > > > /* Shadow vring */
> > > > > > > > @@ -46,8 +53,11 @@ typedef struct VhostShadowVirtqueue {
> > > > > > > > /* Virtio device */
> > > > > > > > VirtIODevice *vdev;
> > > > > > > >
> > > > > > > > + /* IOVA mapping if used */
> > > > > > > > + VhostIOVATree *iova_map;
> > > > > > > > +
> > > > > > > > /* Map for returning guest's descriptors */
> > > > > > > > - VirtQueueElement **ring_id_maps;
> > > > > > > > + SVQElement **ring_id_maps;
> > > > > > > >
> > > > > > > > /* Next head to expose to device */
> > > > > > > > uint16_t avail_idx_shadow;
> > > > > > > > @@ -79,13 +89,6 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > > > > > > > continue;
> > > > > > > >
> > > > > > > > case VIRTIO_F_ACCESS_PLATFORM:
> > > > > > > > - /* SVQ needs this feature disabled. Can't continue */
> > > > > > > > - if (*dev_features & BIT_ULL(b)) {
> > > > > > > > - clear_bit(b, dev_features);
> > > > > > > > - r = false;
> > > > > > > > - }
> > > > > > > > - break;
> > > > > > > > -
> > > > > > > > case VIRTIO_F_VERSION_1:
> > > > > > > > /* SVQ needs this feature, so can't continue */
> > > > > > > > if (!(*dev_features & BIT_ULL(b))) {
> > > > > > > > @@ -126,6 +129,64 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> > > > > > > > }
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static void vhost_svq_stash_addr(void ***stash, const struct iovec *iov,
> > > > > > > > + size_t num)
> > > > > > > > +{
> > > > > > > > + size_t i;
> > > > > > > > +
> > > > > > > > + if (num == 0) {
> > > > > > > > + return;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + *stash = g_new(void *, num);
> > > > > > > > + for (i = 0; i < num; ++i) {
> > > > > > > > + (*stash)[i] = iov[i].iov_base;
> > > > > > > > + }
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vhost_svq_unstash_addr(void **stash, struct iovec *iov, size_t num)
> > > > > > > > +{
> > > > > > > > + size_t i;
> > > > > > > > +
> > > > > > > > + if (num == 0) {
> > > > > > > > + return;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + for (i = 0; i < num; ++i) {
> > > > > > > > + iov[i].iov_base = stash[i];
> > > > > > > > + }
> > > > > > > > + g_free(stash);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> > > > > > > > + struct iovec *iovec, size_t num)
> > > > > > > > +{
> > > > > > > > + size_t i;
> > > > > > > > +
> > > > > > > > + for (i = 0; i < num; ++i) {
> > > > > > > > + VhostDMAMap needle = {
> > > > > > > > + .translated_addr = iovec[i].iov_base,
> > > > > > > > + .size = iovec[i].iov_len,
> > > > > > > > + };
> > > > > > > > + size_t off;
> > > > > > > > +
> > > > > > > > + const VhostDMAMap *map = vhost_iova_tree_find_iova(svq->iova_map,
> > > > > > > > + &needle);
> > > > > > >
> > > > > > >
> > > > > > > Is it possible that we end up with more than one maps here?
> > > > > > >
> > > > > >
> > > > > > Actually it is possible, since there is no guarantee that one
> > > > > > descriptor (or indirect descriptor) maps exactly to one iov. It could
> > > > > > map to many if qemu vaddr is not contiguous but GPA + size is. This is
> > > > > > something that must be fixed for the next revision, so thanks for
> > > > > > pointing it out!
> > > > > >
> > > > > > Taking that into account, the condition that svq vring avail_idx -
> > > > > > used_idx was always less or equal than guest's vring avail_idx -
> > > > > > used_idx is not true anymore. Checking for that before adding buffers
> > > > > > to SVQ is the easy part, but how could we recover in that case?
> > > > > >
> > > > > > I think that the easy solution is to check for more available buffers
> > > > > > unconditionally at the end of vhost_svq_handle_call, which handles the
> > > > > > SVQ used and is supposed to make more room for available buffers. So
> > > > > > vhost_handle_guest_kick would not check if eventfd is set or not
> > > > > > anymore.
> > > > > >
> > > > > > Would that make sense?
> > > > >
> > > > > Yes, I think it should work.
> > > >
> > > > Btw, I wonder how to handle indirect descriptors. SVQ doesn't use
> > > > indirect descriptors for now, but it looks like a must otherwise we
> > > > may end up SVQ is full before VQ.
> > > >
> > >
> > > We can get to that situation without indirect too, if a single
> > > descriptor maps to more than one sg buffer. The next revision is going
> > > to control that too.
> > >
> > > > It looks to me an easy way is to always use indirect descriptors if #sg >= 2?
> > > >
> > >
> > > I will use that, but that does not solve the case where a descriptor
> > > maps to > 1 different buffers in qemu vaddr.
> >
> > Right, so we need to deal with the case when SVQ is out of space.
> >
> >
> > > So I think that some
> > > check after marking descriptors as used is a must somehow.
> >
> > I thought it should be before processing the available buffer?
>
> Yes, I meant after that. Somehow, because I include checking the
> number of sg buffers as "processing". :).
>
> > It's
> > the guest driver that make sure there's sufficient space for used
> > ring?
> >
>
> (I think we are talking the same with different words, but just in
> case I will develop the idea here with an example).
>
> The guest is able to check if there is enough space in the SVQ's
> vring, but not in the device's vring. As an example of this, imagine
> that a guest makes available a GPA contiguous buffer of 64K, one
> descriptor. However, this memory is divided into 16 chunks of 4K in
> qemu's VA space. Imagine that at this moment there are only eight
> slots free in each vring, and that neither communication is using
> indirect descriptors.
>
> The guest only needs 1 descriptor available to make that buffer
> available, so it will add to avail ring. But SVQ needs 16 chained
> descriptors, so the buffer is not going to reach the device until it
> makes at least 8 more descriptors as used. SVQ checked for the amount
> of available room, as you said, but it cannot forward the available
> one.
>
> Since the guest already sent kick when it made the descriptor
> available, we need another mechanism to know when we have all the
> needed free slots in the SVQ vring. And that's what I meant with the
> check after marking some buffers as available.
>
> I still think it is not worth it to protect the forwarding methods of
> hogging BQL, since there must be a limit sooner or later, but it is
> something that is worth putting on the table again. But this requires
> changes for the next version for sure.
Ok.
>
> I can think in more scenarios, like guest making available an indirect
> descriptor of vq size that needs to be splitted in even more sgs. Qemu
> already does not support more than 1024 sgs buffers in VirtQueue, but
> a driver (as SVQ) must *not* create an indirect descriptor chain
> longer than the Queue Size. Should we always increase vq size to 1024
> always? I think these are highly unlikely, but again these concerns
> must be at least commented here.
>
> Does it make sense?
Right. So I think the SVQ codes should be ready to handle all those cases.
Thanks
>
> Thanks!
>
> > Thanks
> >
> > >
> > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > >
> > > > > > > > + /*
> > > > > > > > + * Map cannot be NULL since iova map contains all guest space and
> > > > > > > > + * qemu already has a physical address mapped
> > > > > > > > + */
> > > > > > > > + assert(map);
> > > > > > > > +
> > > > > > > > + /*
> > > > > > > > + * Map->iova chunk size is ignored. What to do if descriptor
> > > > > > > > + * (addr, size) does not fit is delegated to the device.
> > > > > > > > + */
> > > > > > > > + off = needle.translated_addr - map->translated_addr;
> > > > > > > > + iovec[i].iov_base = (void *)(map->iova + off);
> > > > > > > > + }
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > > > > > > > const struct iovec *iovec,
> > > > > > > > size_t num, bool more_descs, bool write)
> > > > > > > > @@ -156,8 +217,9 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > > > > > > > }
> > > > > > > >
> > > > > > > > static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > > > > > > - VirtQueueElement *elem)
> > > > > > > > + SVQElement *svq_elem)
> > > > > > > > {
> > > > > > > > + VirtQueueElement *elem = &svq_elem->elem;
> > > > > > > > int head;
> > > > > > > > unsigned avail_idx;
> > > > > > > > vring_avail_t *avail = svq->vring.avail;
> > > > > > > > @@ -167,6 +229,12 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > > > > > > /* We need some descriptors here */
> > > > > > > > assert(elem->out_num || elem->in_num);
> > > > > > > >
> > > > > > > > + vhost_svq_stash_addr(&svq_elem->in_sg_stash, elem->in_sg, elem->in_num);
> > > > > > > > + vhost_svq_stash_addr(&svq_elem->out_sg_stash, elem->out_sg, elem->out_num);
> > > > > > >
> > > > > > >
> > > > > > > I wonder if we can solve the trick like stash and unstash with a
> > > > > > > dedicated sgs in svq_elem, instead of reusing the elem.
> > > > > > >
> > > > > >
> > > > > > Actually yes, it would be way simpler to use a new sgs array in
> > > > > > svq_elem. I will change that.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > > > +
> > > > > > > > + vhost_svq_translate_addr(svq, elem->in_sg, elem->in_num);
> > > > > > > > + vhost_svq_translate_addr(svq, elem->out_sg, elem->out_num);
> > > > > > > > +
> > > > > > > > vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > > > > > > > elem->in_num > 0, false);
> > > > > > > > vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > > > > > > > @@ -187,7 +255,7 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > > > > > >
> > > > > > > > }
> > > > > > > >
> > > > > > > > -static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > > > > > > > +static void vhost_svq_add(VhostShadowVirtqueue *svq, SVQElement *elem)
> > > > > > > > {
> > > > > > > > unsigned qemu_head = vhost_svq_add_split(svq, elem);
> > > > > > > >
> > > > > > > > @@ -221,7 +289,7 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> > > > > > > > }
> > > > > > > >
> > > > > > > > while (true) {
> > > > > > > > - VirtQueueElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > > > > > > > + SVQElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > > > > > > > if (!elem) {
> > > > > > > > break;
> > > > > > > > }
> > > > > > > > @@ -247,7 +315,7 @@ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> > > > > > > > return svq->used_idx != svq->shadow_used_idx;
> > > > > > > > }
> > > > > > > >
> > > > > > > > -static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > > > > > +static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > > > > > {
> > > > > > > > vring_desc_t *descs = svq->vring.desc;
> > > > > > > > const vring_used_t *used = svq->vring.used;
> > > > > > > > @@ -279,7 +347,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > > > > > descs[used_elem.id].next = svq->free_head;
> > > > > > > > svq->free_head = used_elem.id;
> > > > > > > >
> > > > > > > > - svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> > > > > > > > + svq->ring_id_maps[used_elem.id]->elem.len = used_elem.len;
> > > > > > > > return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> > > > > > > > }
> > > > > > > >
> > > > > > > > @@ -296,12 +364,19 @@ static void vhost_svq_handle_call_no_test(EventNotifier *n)
> > > > > > > >
> > > > > > > > vhost_svq_set_notification(svq, false);
> > > > > > > > while (true) {
> > > > > > > > - g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> > > > > > > > - if (!elem) {
> > > > > > > > + g_autofree SVQElement *svq_elem = vhost_svq_get_buf(svq);
> > > > > > > > + VirtQueueElement *elem;
> > > > > > > > + if (!svq_elem) {
> > > > > > > > break;
> > > > > > > > }
> > > > > > > >
> > > > > > > > assert(i < svq->vring.num);
> > > > > > > > + elem = &svq_elem->elem;
> > > > > > > > +
> > > > > > > > + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> > > > > > > > + elem->in_num);
> > > > > > > > + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> > > > > > > > + elem->out_num);
> > > > > > > > virtqueue_fill(vq, elem, elem->len, i++);
> > > > > > > > }
> > > > > > > >
> > > > > > > > @@ -451,14 +526,24 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > > > > event_notifier_set_handler(&svq->host_notifier, NULL);
> > > > > > > >
> > > > > > > > for (i = 0; i < svq->vring.num; ++i) {
> > > > > > > > - g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
> > > > > > > > + g_autofree SVQElement *svq_elem = svq->ring_id_maps[i];
> > > > > > > > + VirtQueueElement *elem;
> > > > > > > > +
> > > > > > > > + if (!svq_elem) {
> > > > > > > > + continue;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + elem = &svq_elem->elem;
> > > > > > > > + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> > > > > > > > + elem->in_num);
> > > > > > > > + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> > > > > > > > + elem->out_num);
> > > > > > > > +
> > > > > > > > /*
> > > > > > > > * Although the doc says we must unpop in order, it's ok to unpop
> > > > > > > > * everything.
> > > > > > > > */
> > > > > > > > - if (elem) {
> > > > > > > > - virtqueue_unpop(svq->vq, elem, elem->len);
> > > > > > > > - }
> > > > > > > > + virtqueue_unpop(svq->vq, elem, elem->len);
> > > > > > > > }
> > > > > > > > }
> > > > > > > >
> > > > > > > > @@ -466,7 +551,8 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > > > > * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> > > > > > > > * methods and file descriptors.
> > > > > > > > */
> > > > > > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > > > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > > > > > > + VhostIOVATree *iova_map)
> > > > > > > > {
> > > > > > > > int vq_idx = dev->vq_index + idx;
> > > > > > > > unsigned num = virtio_queue_get_num(dev->vdev, vq_idx);
> > > > > > > > @@ -500,11 +586,13 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > > > > > memset(svq->vring.desc, 0, driver_size);
> > > > > > > > svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> > > > > > > > memset(svq->vring.used, 0, device_size);
> > > > > > > > + svq->iova_map = iova_map;
> > > > > > > > +
> > > > > > > > for (i = 0; i < num - 1; i++) {
> > > > > > > > svq->vring.desc[i].next = cpu_to_le16(i + 1);
> > > > > > > > }
> > > > > > > >
> > > > > > > > - svq->ring_id_maps = g_new0(VirtQueueElement *, num);
> > > > > > > > + svq->ring_id_maps = g_new0(SVQElement *, num);
> > > > > > > > event_notifier_set_handler(&svq->call_notifier,
> > > > > > > > vhost_svq_handle_call);
> > > > > > > > return g_steal_pointer(&svq);
> > > > > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > > > > > index a9c680b487..f5a12fee9d 100644
> > > > > > > > --- a/hw/virtio/vhost-vdpa.c
> > > > > > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > > > > > @@ -176,6 +176,18 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > > > > > > > vaddr, section->readonly);
> > > > > > > >
> > > > > > > > llsize = int128_sub(llend, int128_make64(iova));
> > > > > > > > + if (v->shadow_vqs_enabled) {
> > > > > > > > + VhostDMAMap mem_region = {
> > > > > > > > + .translated_addr = vaddr,
> > > > > > > > + .size = int128_get64(llsize) - 1,
> > > > > > > > + .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
> > > > > > > > + };
> > > > > > > > +
> > > > > > > > + int r = vhost_iova_tree_alloc(v->iova_map, &mem_region);
> > > > > > > > + assert(r == VHOST_DMA_MAP_OK);
> > > > > > > > +
> > > > > > > > + iova = mem_region.iova;
> > > > > > > > + }
> > > > > > > >
> > > > > > > > ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> > > > > > > > vaddr, section->readonly);
> > > > > > > > @@ -754,6 +766,23 @@ static bool vhost_vdpa_force_iommu(struct vhost_dev *dev)
> > > > > > > > return true;
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static int vhost_vdpa_get_iova_range(struct vhost_dev *dev,
> > > > > > > > + hwaddr *first, hwaddr *last)
> > > > > > > > +{
> > > > > > > > + int ret;
> > > > > > > > + struct vhost_vdpa_iova_range range;
> > > > > > > > +
> > > > > > > > + ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_IOVA_RANGE, &range);
> > > > > > > > + if (ret != 0) {
> > > > > > > > + return ret;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + *first = range.first;
> > > > > > > > + *last = range.last;
> > > > > > > > + trace_vhost_vdpa_get_iova_range(dev, *first, *last);
> > > > > > > > + return ret;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > /**
> > > > > > > > * Maps QEMU vaddr memory to device in a suitable way for shadow virtqueue:
> > > > > > > > * - It always reference qemu memory address, not guest's memory.
> > > > > > > > @@ -881,6 +910,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
> > > > > > > > static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > > > {
> > > > > > > > struct vhost_dev *hdev = v->dev;
> > > > > > > > + hwaddr iova_first, iova_last;
> > > > > > > > unsigned n;
> > > > > > > > int r;
> > > > > > > >
> > > > > > > > @@ -894,7 +924,7 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > > > /* Allocate resources */
> > > > > > > > assert(v->shadow_vqs->len == 0);
> > > > > > > > for (n = 0; n < hdev->nvqs; ++n) {
> > > > > > > > - VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
> > > > > > > > + VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n, v->iova_map);
> > > > > > > > if (unlikely(!svq)) {
> > > > > > > > g_ptr_array_set_size(v->shadow_vqs, 0);
> > > > > > > > return 0;
> > > > > > > > @@ -903,6 +933,8 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > > > }
> > > > > > > > }
> > > > > > > >
> > > > > > > > + r = vhost_vdpa_get_iova_range(hdev, &iova_first, &iova_last);
> > > > > > > > + assert(r == 0);
> > > > > > > > r = vhost_vdpa_vring_pause(hdev);
> > > > > > > > assert(r == 0);
> > > > > > > >
> > > > > > > > @@ -913,6 +945,12 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > > > }
> > > > > > > > }
> > > > > > > >
> > > > > > > > + memory_listener_unregister(&v->listener);
> > > > > > > > + if (vhost_vdpa_dma_unmap(v, iova_first,
> > > > > > > > + (iova_last - iova_first) & TARGET_PAGE_MASK)) {
> > > > > > > > + error_report("Fail to invalidate device iotlb");
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > /* Reset device so it can be configured */
> > > > > > > > r = vhost_vdpa_dev_start(hdev, false);
> > > > > > > > assert(r == 0);
> > > > > > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > > > > > > index 8ed19e9d0c..650e521e35 100644
> > > > > > > > --- a/hw/virtio/trace-events
> > > > > > > > +++ b/hw/virtio/trace-events
> > > > > > > > @@ -52,6 +52,7 @@ vhost_vdpa_set_vring_call(void *dev, unsigned int index, int fd) "dev: %p index:
> > > > > > > > vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRIx64
> > > > > > > > vhost_vdpa_set_owner(void *dev) "dev: %p"
> > > > > > > > vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
> > > > > > > > +vhost_vdpa_get_iova_range(void *dev, uint64_t first, uint64_t last) "dev: %p first: 0x%"PRIx64" last: 0x%"PRIx64
> > > > > > > >
> > > > > > > > # virtio.c
> > > > > > > > virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > > > > > >
> > > > > >
> > > >
> > >
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ
[not found] ` <CAJaqyWd9LjpA5w2f1s+pNmdNjYPvcbJgPqY+Qv1fWb+6LPPzAg@mail.gmail.com>
2021-10-21 2:38 ` Jason Wang
@ 2021-10-26 4:32 ` Jason Wang
1 sibling, 0 replies; 27+ messages in thread
From: Jason Wang @ 2021-10-26 4:32 UTC (permalink / raw)
To: Eugenio Perez Martin
Cc: Parav Pandit, Markus Armbruster, Michael S. Tsirkin, qemu-level,
virtualization, Harpreet Singh Anand, Xiao W Wang,
Stefan Hajnoczi, Eli Cohen, Eric Blake, Michael Lilja
On Wed, Oct 20, 2021 at 7:57 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Oct 20, 2021 at 11:03 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Oct 20, 2021 at 2:52 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Oct 20, 2021 at 4:07 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Wed, Oct 20, 2021 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Tue, Oct 19, 2021 at 6:29 PM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Oct 19, 2021 at 11:25 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > 在 2021/10/1 下午3:06, Eugenio Pérez 写道:
> > > > > > > > Use translations added in VhostIOVATree in SVQ.
> > > > > > > >
> > > > > > > > Now every element needs to store the previous address also, so VirtQueue
> > > > > > > > can consume the elements properly. This adds a little overhead per VQ
> > > > > > > > element, having to allocate more memory to stash them. As a possible
> > > > > > > > optimization, this allocation could be avoided if the descriptor is not
> > > > > > > > a chain but a single one, but this is left undone.
> > > > > > > >
> > > > > > > > TODO: iova range should be queried before, and add logic to fail when
> > > > > > > > GPA is outside of its range and memory listener or svq add it.
> > > > > > > >
> > > > > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > > > > ---
> > > > > > > > hw/virtio/vhost-shadow-virtqueue.h | 4 +-
> > > > > > > > hw/virtio/vhost-shadow-virtqueue.c | 130 ++++++++++++++++++++++++-----
> > > > > > > > hw/virtio/vhost-vdpa.c | 40 ++++++++-
> > > > > > > > hw/virtio/trace-events | 1 +
> > > > > > > > 4 files changed, 152 insertions(+), 23 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > index b7baa424a7..a0e6b5267a 100644
> > > > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > > > > @@ -11,6 +11,7 @@
> > > > > > > > #define VHOST_SHADOW_VIRTQUEUE_H
> > > > > > > >
> > > > > > > > #include "hw/virtio/vhost.h"
> > > > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > > > >
> > > > > > > > typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > > > > >
> > > > > > > > @@ -28,7 +29,8 @@ bool vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> > > > > > > > void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > > > > VhostShadowVirtqueue *svq);
> > > > > > > >
> > > > > > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx);
> > > > > > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > > > > > > + VhostIOVATree *iova_map);
> > > > > > > >
> > > > > > > > void vhost_svq_free(VhostShadowVirtqueue *vq);
> > > > > > > >
> > > > > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > index 2fd0bab75d..9db538547e 100644
> > > > > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > > > > @@ -11,12 +11,19 @@
> > > > > > > > #include "hw/virtio/vhost-shadow-virtqueue.h"
> > > > > > > > #include "hw/virtio/vhost.h"
> > > > > > > > #include "hw/virtio/virtio-access.h"
> > > > > > > > +#include "hw/virtio/vhost-iova-tree.h"
> > > > > > > >
> > > > > > > > #include "standard-headers/linux/vhost_types.h"
> > > > > > > >
> > > > > > > > #include "qemu/error-report.h"
> > > > > > > > #include "qemu/main-loop.h"
> > > > > > > >
> > > > > > > > +typedef struct SVQElement {
> > > > > > > > + VirtQueueElement elem;
> > > > > > > > + void **in_sg_stash;
> > > > > > > > + void **out_sg_stash;
> > > > > > > > +} SVQElement;
> > > > > > > > +
> > > > > > > > /* Shadow virtqueue to relay notifications */
> > > > > > > > typedef struct VhostShadowVirtqueue {
> > > > > > > > /* Shadow vring */
> > > > > > > > @@ -46,8 +53,11 @@ typedef struct VhostShadowVirtqueue {
> > > > > > > > /* Virtio device */
> > > > > > > > VirtIODevice *vdev;
> > > > > > > >
> > > > > > > > + /* IOVA mapping if used */
> > > > > > > > + VhostIOVATree *iova_map;
> > > > > > > > +
> > > > > > > > /* Map for returning guest's descriptors */
> > > > > > > > - VirtQueueElement **ring_id_maps;
> > > > > > > > + SVQElement **ring_id_maps;
> > > > > > > >
> > > > > > > > /* Next head to expose to device */
> > > > > > > > uint16_t avail_idx_shadow;
> > > > > > > > @@ -79,13 +89,6 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > > > > > > > continue;
> > > > > > > >
> > > > > > > > case VIRTIO_F_ACCESS_PLATFORM:
> > > > > > > > - /* SVQ needs this feature disabled. Can't continue */
> > > > > > > > - if (*dev_features & BIT_ULL(b)) {
> > > > > > > > - clear_bit(b, dev_features);
> > > > > > > > - r = false;
> > > > > > > > - }
> > > > > > > > - break;
> > > > > > > > -
> > > > > > > > case VIRTIO_F_VERSION_1:
> > > > > > > > /* SVQ needs this feature, so can't continue */
> > > > > > > > if (!(*dev_features & BIT_ULL(b))) {
> > > > > > > > @@ -126,6 +129,64 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> > > > > > > > }
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static void vhost_svq_stash_addr(void ***stash, const struct iovec *iov,
> > > > > > > > + size_t num)
> > > > > > > > +{
> > > > > > > > + size_t i;
> > > > > > > > +
> > > > > > > > + if (num == 0) {
> > > > > > > > + return;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + *stash = g_new(void *, num);
> > > > > > > > + for (i = 0; i < num; ++i) {
> > > > > > > > + (*stash)[i] = iov[i].iov_base;
> > > > > > > > + }
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vhost_svq_unstash_addr(void **stash, struct iovec *iov, size_t num)
> > > > > > > > +{
> > > > > > > > + size_t i;
> > > > > > > > +
> > > > > > > > + if (num == 0) {
> > > > > > > > + return;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + for (i = 0; i < num; ++i) {
> > > > > > > > + iov[i].iov_base = stash[i];
> > > > > > > > + }
> > > > > > > > + g_free(stash);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> > > > > > > > + struct iovec *iovec, size_t num)
> > > > > > > > +{
> > > > > > > > + size_t i;
> > > > > > > > +
> > > > > > > > + for (i = 0; i < num; ++i) {
> > > > > > > > + VhostDMAMap needle = {
> > > > > > > > + .translated_addr = iovec[i].iov_base,
> > > > > > > > + .size = iovec[i].iov_len,
> > > > > > > > + };
> > > > > > > > + size_t off;
> > > > > > > > +
> > > > > > > > + const VhostDMAMap *map = vhost_iova_tree_find_iova(svq->iova_map,
> > > > > > > > + &needle);
> > > > > > >
> > > > > > >
> > > > > > > Is it possible that we end up with more than one maps here?
> > > > > > >
> > > > > >
> > > > > > Actually it is possible, since there is no guarantee that one
> > > > > > descriptor (or indirect descriptor) maps exactly to one iov. It could
> > > > > > map to many if qemu vaddr is not contiguous but GPA + size is. This is
> > > > > > something that must be fixed for the next revision, so thanks for
> > > > > > pointing it out!
> > > > > >
> > > > > > Taking that into account, the condition that svq vring avail_idx -
> > > > > > used_idx was always less or equal than guest's vring avail_idx -
> > > > > > used_idx is not true anymore. Checking for that before adding buffers
> > > > > > to SVQ is the easy part, but how could we recover in that case?
> > > > > >
> > > > > > I think that the easy solution is to check for more available buffers
> > > > > > unconditionally at the end of vhost_svq_handle_call, which handles the
> > > > > > SVQ used and is supposed to make more room for available buffers. So
> > > > > > vhost_handle_guest_kick would not check if eventfd is set or not
> > > > > > anymore.
> > > > > >
> > > > > > Would that make sense?
> > > > >
> > > > > Yes, I think it should work.
> > > >
> > > > Btw, I wonder how to handle indirect descriptors. SVQ doesn't use
> > > > indirect descriptors for now, but it looks like a must otherwise we
> > > > may end up SVQ is full before VQ.
> > > >
> > >
> > > We can get to that situation without indirect too, if a single
> > > descriptor maps to more than one sg buffer. The next revision is going
> > > to control that too.
> > >
> > > > It looks to me an easy way is to always use indirect descriptors if #sg >= 2?
> > > >
> > >
> > > I will use that, but that does not solve the case where a descriptor
> > > maps to > 1 different buffers in qemu vaddr.
> >
> > Right, so we need to deal with the case when SVQ is out of space.
> >
> >
> > > So I think that some
> > > check after marking descriptors as used is a must somehow.
> >
> > I thought it should be before processing the available buffer?
>
> Yes, I meant after that. Somehow, because I include checking the
> number of sg buffers as "processing". :).
>
> > It's
> > the guest driver that make sure there's sufficient space for used
> > ring?
> >
>
> (I think we are talking the same with different words, but just in
> case I will develop the idea here with an example).
>
> The guest is able to check if there is enough space in the SVQ's
> vring, but not in the device's vring. As an example of this, imagine
> that a guest makes available a GPA contiguous buffer of 64K, one
> descriptor. However, this memory is divided into 16 chunks of 4K in
> qemu's VA space. Imagine that at this moment there are only eight
> slots free in each vring, and that neither communication is using
> indirect descriptors.
>
> The guest only needs 1 descriptor available to make that buffer
> available, so it will add to avail ring. But SVQ needs 16 chained
> descriptors, so the buffer is not going to reach the device until it
> makes at least 8 more descriptors as used. SVQ checked for the amount
> of available room, as you said, but it cannot forward the available
> one.
>
> Since the guest already sent kick when it made the descriptor
> available, we need another mechanism to know when we have all the
> needed free slots in the SVQ vring. And that's what I meant with the
> check after marking some buffers as available.
>
> I still think it is not worth it to protect the forwarding methods of
> hogging BQL, since there must be a limit sooner or later, but it is
> something that is worth putting on the table again. But this requires
> changes for the next version for sure.
>
> I can think in more scenarios, like guest making available an indirect
> descriptor of vq size that needs to be splitted in even more sgs. Qemu
> already does not support more than 1024 sgs buffers in VirtQueue, but
> a driver (as SVQ) must *not* create an indirect descriptor chain
> longer than the Queue Size. Should we always increase vq size to 1024
> always? I think these are highly unlikely, but again these concerns
> must be at least commented here.
>
> Does it make sense?
Makes a lot of sense. It's better to make the code robust without any
assumption on both host and guest configuration.
Thanks
>
> Thanks!
>
> > Thanks
> >
> > >
> > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > >
> > > > > > > > + /*
> > > > > > > > + * Map cannot be NULL since iova map contains all guest space and
> > > > > > > > + * qemu already has a physical address mapped
> > > > > > > > + */
> > > > > > > > + assert(map);
> > > > > > > > +
> > > > > > > > + /*
> > > > > > > > + * Map->iova chunk size is ignored. What to do if descriptor
> > > > > > > > + * (addr, size) does not fit is delegated to the device.
> > > > > > > > + */
> > > > > > > > + off = needle.translated_addr - map->translated_addr;
> > > > > > > > + iovec[i].iov_base = (void *)(map->iova + off);
> > > > > > > > + }
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > > > > > > > const struct iovec *iovec,
> > > > > > > > size_t num, bool more_descs, bool write)
> > > > > > > > @@ -156,8 +217,9 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > > > > > > > }
> > > > > > > >
> > > > > > > > static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > > > > > > - VirtQueueElement *elem)
> > > > > > > > + SVQElement *svq_elem)
> > > > > > > > {
> > > > > > > > + VirtQueueElement *elem = &svq_elem->elem;
> > > > > > > > int head;
> > > > > > > > unsigned avail_idx;
> > > > > > > > vring_avail_t *avail = svq->vring.avail;
> > > > > > > > @@ -167,6 +229,12 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > > > > > > /* We need some descriptors here */
> > > > > > > > assert(elem->out_num || elem->in_num);
> > > > > > > >
> > > > > > > > + vhost_svq_stash_addr(&svq_elem->in_sg_stash, elem->in_sg, elem->in_num);
> > > > > > > > + vhost_svq_stash_addr(&svq_elem->out_sg_stash, elem->out_sg, elem->out_num);
> > > > > > >
> > > > > > >
> > > > > > > I wonder if we can solve the trick like stash and unstash with a
> > > > > > > dedicated sgs in svq_elem, instead of reusing the elem.
> > > > > > >
> > > > > >
> > > > > > Actually yes, it would be way simpler to use a new sgs array in
> > > > > > svq_elem. I will change that.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > >
> > > > > > > > +
> > > > > > > > + vhost_svq_translate_addr(svq, elem->in_sg, elem->in_num);
> > > > > > > > + vhost_svq_translate_addr(svq, elem->out_sg, elem->out_num);
> > > > > > > > +
> > > > > > > > vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > > > > > > > elem->in_num > 0, false);
> > > > > > > > vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > > > > > > > @@ -187,7 +255,7 @@ static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > > > > > > >
> > > > > > > > }
> > > > > > > >
> > > > > > > > -static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > > > > > > > +static void vhost_svq_add(VhostShadowVirtqueue *svq, SVQElement *elem)
> > > > > > > > {
> > > > > > > > unsigned qemu_head = vhost_svq_add_split(svq, elem);
> > > > > > > >
> > > > > > > > @@ -221,7 +289,7 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> > > > > > > > }
> > > > > > > >
> > > > > > > > while (true) {
> > > > > > > > - VirtQueueElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > > > > > > > + SVQElement *elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > > > > > > > if (!elem) {
> > > > > > > > break;
> > > > > > > > }
> > > > > > > > @@ -247,7 +315,7 @@ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> > > > > > > > return svq->used_idx != svq->shadow_used_idx;
> > > > > > > > }
> > > > > > > >
> > > > > > > > -static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > > > > > +static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > > > > > {
> > > > > > > > vring_desc_t *descs = svq->vring.desc;
> > > > > > > > const vring_used_t *used = svq->vring.used;
> > > > > > > > @@ -279,7 +347,7 @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > > > > > > > descs[used_elem.id].next = svq->free_head;
> > > > > > > > svq->free_head = used_elem.id;
> > > > > > > >
> > > > > > > > - svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> > > > > > > > + svq->ring_id_maps[used_elem.id]->elem.len = used_elem.len;
> > > > > > > > return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> > > > > > > > }
> > > > > > > >
> > > > > > > > @@ -296,12 +364,19 @@ static void vhost_svq_handle_call_no_test(EventNotifier *n)
> > > > > > > >
> > > > > > > > vhost_svq_set_notification(svq, false);
> > > > > > > > while (true) {
> > > > > > > > - g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> > > > > > > > - if (!elem) {
> > > > > > > > + g_autofree SVQElement *svq_elem = vhost_svq_get_buf(svq);
> > > > > > > > + VirtQueueElement *elem;
> > > > > > > > + if (!svq_elem) {
> > > > > > > > break;
> > > > > > > > }
> > > > > > > >
> > > > > > > > assert(i < svq->vring.num);
> > > > > > > > + elem = &svq_elem->elem;
> > > > > > > > +
> > > > > > > > + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> > > > > > > > + elem->in_num);
> > > > > > > > + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> > > > > > > > + elem->out_num);
> > > > > > > > virtqueue_fill(vq, elem, elem->len, i++);
> > > > > > > > }
> > > > > > > >
> > > > > > > > @@ -451,14 +526,24 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > > > > event_notifier_set_handler(&svq->host_notifier, NULL);
> > > > > > > >
> > > > > > > > for (i = 0; i < svq->vring.num; ++i) {
> > > > > > > > - g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
> > > > > > > > + g_autofree SVQElement *svq_elem = svq->ring_id_maps[i];
> > > > > > > > + VirtQueueElement *elem;
> > > > > > > > +
> > > > > > > > + if (!svq_elem) {
> > > > > > > > + continue;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + elem = &svq_elem->elem;
> > > > > > > > + vhost_svq_unstash_addr(svq_elem->in_sg_stash, elem->in_sg,
> > > > > > > > + elem->in_num);
> > > > > > > > + vhost_svq_unstash_addr(svq_elem->out_sg_stash, elem->out_sg,
> > > > > > > > + elem->out_num);
> > > > > > > > +
> > > > > > > > /*
> > > > > > > > * Although the doc says we must unpop in order, it's ok to unpop
> > > > > > > > * everything.
> > > > > > > > */
> > > > > > > > - if (elem) {
> > > > > > > > - virtqueue_unpop(svq->vq, elem, elem->len);
> > > > > > > > - }
> > > > > > > > + virtqueue_unpop(svq->vq, elem, elem->len);
> > > > > > > > }
> > > > > > > > }
> > > > > > > >
> > > > > > > > @@ -466,7 +551,8 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > > > > * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> > > > > > > > * methods and file descriptors.
> > > > > > > > */
> > > > > > > > -VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > > > > > +VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx,
> > > > > > > > + VhostIOVATree *iova_map)
> > > > > > > > {
> > > > > > > > int vq_idx = dev->vq_index + idx;
> > > > > > > > unsigned num = virtio_queue_get_num(dev->vdev, vq_idx);
> > > > > > > > @@ -500,11 +586,13 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > > > > > memset(svq->vring.desc, 0, driver_size);
> > > > > > > > svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> > > > > > > > memset(svq->vring.used, 0, device_size);
> > > > > > > > + svq->iova_map = iova_map;
> > > > > > > > +
> > > > > > > > for (i = 0; i < num - 1; i++) {
> > > > > > > > svq->vring.desc[i].next = cpu_to_le16(i + 1);
> > > > > > > > }
> > > > > > > >
> > > > > > > > - svq->ring_id_maps = g_new0(VirtQueueElement *, num);
> > > > > > > > + svq->ring_id_maps = g_new0(SVQElement *, num);
> > > > > > > > event_notifier_set_handler(&svq->call_notifier,
> > > > > > > > vhost_svq_handle_call);
> > > > > > > > return g_steal_pointer(&svq);
> > > > > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > > > > > index a9c680b487..f5a12fee9d 100644
> > > > > > > > --- a/hw/virtio/vhost-vdpa.c
> > > > > > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > > > > > @@ -176,6 +176,18 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
> > > > > > > > vaddr, section->readonly);
> > > > > > > >
> > > > > > > > llsize = int128_sub(llend, int128_make64(iova));
> > > > > > > > + if (v->shadow_vqs_enabled) {
> > > > > > > > + VhostDMAMap mem_region = {
> > > > > > > > + .translated_addr = vaddr,
> > > > > > > > + .size = int128_get64(llsize) - 1,
> > > > > > > > + .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
> > > > > > > > + };
> > > > > > > > +
> > > > > > > > + int r = vhost_iova_tree_alloc(v->iova_map, &mem_region);
> > > > > > > > + assert(r == VHOST_DMA_MAP_OK);
> > > > > > > > +
> > > > > > > > + iova = mem_region.iova;
> > > > > > > > + }
> > > > > > > >
> > > > > > > > ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> > > > > > > > vaddr, section->readonly);
> > > > > > > > @@ -754,6 +766,23 @@ static bool vhost_vdpa_force_iommu(struct vhost_dev *dev)
> > > > > > > > return true;
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static int vhost_vdpa_get_iova_range(struct vhost_dev *dev,
> > > > > > > > + hwaddr *first, hwaddr *last)
> > > > > > > > +{
> > > > > > > > + int ret;
> > > > > > > > + struct vhost_vdpa_iova_range range;
> > > > > > > > +
> > > > > > > > + ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_IOVA_RANGE, &range);
> > > > > > > > + if (ret != 0) {
> > > > > > > > + return ret;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + *first = range.first;
> > > > > > > > + *last = range.last;
> > > > > > > > + trace_vhost_vdpa_get_iova_range(dev, *first, *last);
> > > > > > > > + return ret;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > /**
> > > > > > > > * Maps QEMU vaddr memory to device in a suitable way for shadow virtqueue:
> > > > > > > > * - It always reference qemu memory address, not guest's memory.
> > > > > > > > @@ -881,6 +910,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx)
> > > > > > > > static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > > > {
> > > > > > > > struct vhost_dev *hdev = v->dev;
> > > > > > > > + hwaddr iova_first, iova_last;
> > > > > > > > unsigned n;
> > > > > > > > int r;
> > > > > > > >
> > > > > > > > @@ -894,7 +924,7 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > > > /* Allocate resources */
> > > > > > > > assert(v->shadow_vqs->len == 0);
> > > > > > > > for (n = 0; n < hdev->nvqs; ++n) {
> > > > > > > > - VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n);
> > > > > > > > + VhostShadowVirtqueue *svq = vhost_svq_new(hdev, n, v->iova_map);
> > > > > > > > if (unlikely(!svq)) {
> > > > > > > > g_ptr_array_set_size(v->shadow_vqs, 0);
> > > > > > > > return 0;
> > > > > > > > @@ -903,6 +933,8 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > > > }
> > > > > > > > }
> > > > > > > >
> > > > > > > > + r = vhost_vdpa_get_iova_range(hdev, &iova_first, &iova_last);
> > > > > > > > + assert(r == 0);
> > > > > > > > r = vhost_vdpa_vring_pause(hdev);
> > > > > > > > assert(r == 0);
> > > > > > > >
> > > > > > > > @@ -913,6 +945,12 @@ static unsigned vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable)
> > > > > > > > }
> > > > > > > > }
> > > > > > > >
> > > > > > > > + memory_listener_unregister(&v->listener);
> > > > > > > > + if (vhost_vdpa_dma_unmap(v, iova_first,
> > > > > > > > + (iova_last - iova_first) & TARGET_PAGE_MASK)) {
> > > > > > > > + error_report("Fail to invalidate device iotlb");
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > /* Reset device so it can be configured */
> > > > > > > > r = vhost_vdpa_dev_start(hdev, false);
> > > > > > > > assert(r == 0);
> > > > > > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > > > > > > index 8ed19e9d0c..650e521e35 100644
> > > > > > > > --- a/hw/virtio/trace-events
> > > > > > > > +++ b/hw/virtio/trace-events
> > > > > > > > @@ -52,6 +52,7 @@ vhost_vdpa_set_vring_call(void *dev, unsigned int index, int fd) "dev: %p index:
> > > > > > > > vhost_vdpa_get_features(void *dev, uint64_t features) "dev: %p features: 0x%"PRIx64
> > > > > > > > vhost_vdpa_set_owner(void *dev) "dev: %p"
> > > > > > > > vhost_vdpa_vq_get_addr(void *dev, void *vq, uint64_t desc_user_addr, uint64_t avail_user_addr, uint64_t used_user_addr) "dev: %p vq: %p desc_user_addr: 0x%"PRIx64" avail_user_addr: 0x%"PRIx64" used_user_addr: 0x%"PRIx64
> > > > > > > > +vhost_vdpa_get_iova_range(void *dev, uint64_t first, uint64_t last) "dev: %p first: 0x%"PRIx64" last: 0x%"PRIx64
> > > > > > > >
> > > > > > > > # virtio.c
> > > > > > > > virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > > > > > >
> > > > > >
> > > >
> > >
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2021-10-26 4:33 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20211001070603.307037-1-eperezma@redhat.com>
2021-10-12 3:59 ` [RFC PATCH v4 00/20] vDPA shadow virtqueue Jason Wang
2021-10-12 4:06 ` Jason Wang
[not found] ` <20211001070603.307037-9-eperezma@redhat.com>
2021-10-13 3:27 ` [RFC PATCH v4 08/20] vhost: Route guest->host notification through " Jason Wang
[not found] ` <CAJaqyWd2joWx3kKz=cJBs4UxZofP7ETkbpg9+cSQSE2MSyBtUg@mail.gmail.com>
2021-10-15 3:45 ` Jason Wang
[not found] ` <20211001070603.307037-10-eperezma@redhat.com>
2021-10-13 3:43 ` [RFC PATCH v4 09/20] vdpa: Save call_fd in vhost-vdpa Jason Wang
[not found] ` <20211001070603.307037-11-eperezma@redhat.com>
2021-10-13 3:43 ` [RFC PATCH v4 10/20] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call Jason Wang
[not found] ` <20211001070603.307037-12-eperezma@redhat.com>
2021-10-13 3:47 ` [RFC PATCH v4 11/20] vhost: Route host->guest notification through shadow virtqueue Jason Wang
[not found] ` <CAJaqyWfm734HrwTJK71hUQNYVkyDaR8OiqtGro_AX9i_pXfmBQ@mail.gmail.com>
2021-10-15 4:42 ` Jason Wang
[not found] ` <CAJaqyWcO9oaGsRe-oMNbmHx7G4Mw0vZfc+7WYQ23+SteoFVn4Q@mail.gmail.com>
2021-10-20 2:01 ` Jason Wang
2021-10-13 3:49 ` Jason Wang
[not found] ` <CAJaqyWcQ314RN7-U1bYqCMXb+-nyhSi3ddqWv90ofFucMbveUw@mail.gmail.com>
2021-10-15 4:24 ` Jason Wang
[not found] ` <20211001070603.307037-13-eperezma@redhat.com>
2021-10-13 3:54 ` [RFC PATCH v4 12/20] virtio: Add vhost_shadow_vq_get_vring_addr Jason Wang
[not found] ` <20211001070603.307037-14-eperezma@redhat.com>
2021-10-13 3:56 ` [RFC PATCH v4 13/20] vdpa: Save host and guest features Jason Wang
[not found] ` <20211001070603.307037-16-eperezma@redhat.com>
2021-10-13 4:31 ` [RFC PATCH v4 15/20] vhost: Shadow virtqueue buffers forwarding Jason Wang
[not found] ` <CAJaqyWeaJyxh-tt45wxONzuOLhVt6wO48e2ufZZ3uECHTDofFw@mail.gmail.com>
2021-10-15 4:23 ` Jason Wang
[not found] ` <20211001070603.307037-17-eperezma@redhat.com>
2021-10-13 4:35 ` [RFC PATCH v4 16/20] vhost: Check for device VRING_USED_F_NO_NOTIFY at shadow virtqueue kick Jason Wang
[not found] ` <20211001070603.307037-18-eperezma@redhat.com>
2021-10-13 4:36 ` [RFC PATCH v4 17/20] vhost: Use VRING_AVAIL_F_NO_INTERRUPT at device call on shadow virtqueue Jason Wang
[not found] ` <20211001070603.307037-21-eperezma@redhat.com>
2021-10-13 5:34 ` [RFC PATCH v4 20/20] vdpa: Add custom IOTLB translations to SVQ Jason Wang
[not found] ` <CAJaqyWdEGWFNrxqKxRya=ybRiP0wTZ0aPksBBeOe9KOjOmUnqA@mail.gmail.com>
2021-10-15 7:37 ` Jason Wang
[not found] ` <CAJaqyWf7pFiw2twq9BPyr9fOJFa9ZpSMcbnoknOfC_pbuUWkmg@mail.gmail.com>
2021-10-15 8:37 ` Jason Wang
2021-10-19 9:24 ` Jason Wang
[not found] ` <CAJaqyWcRcm9rwuTqJHS0FmuMrXpoCvF34TzXKQmxXTfZssZ-jA@mail.gmail.com>
2021-10-20 2:02 ` Jason Wang
2021-10-20 2:07 ` Jason Wang
[not found] ` <CAJaqyWe6R_32Se75XF3+NUZyiWr+cLYQ_86LExmom-vCRT9G0g@mail.gmail.com>
2021-10-20 9:03 ` Jason Wang
[not found] ` <CAJaqyWd9LjpA5w2f1s+pNmdNjYPvcbJgPqY+Qv1fWb+6LPPzAg@mail.gmail.com>
2021-10-21 2:38 ` Jason Wang
2021-10-26 4:32 ` Jason Wang
[not found] ` <20211001070603.307037-19-eperezma@redhat.com>
2021-10-19 8:32 ` [RFC PATCH v4 18/20] vhost: Add VhostIOVATree Jason Wang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).