virtualization.lists.linux-foundation.org archive mirror
* Re: [RFC PATCH v5 01/26] util: Make some iova_tree parameters const
       [not found] ` <20211029183525.1776416-2-eperezma@redhat.com>
@ 2021-10-31 18:59   ` Juan Quintela
  0 siblings, 0 replies; 20+ messages in thread
From: Juan Quintela @ 2021-10-31 18:59 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

Eugenio Pérez <eperezma@redhat.com> wrote:
> Per qemu's guidelines: unless a pointer is used to modify the pointed-to
> storage, give it the "const" attribute.
>
> In the particular case of iova_tree_find, this allows us to enforce what
> its comment requests, since the compiler will complain if the
> const-qualified returned pointer is modified or freed.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> Acked-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

This patch can go in already; whose tree should it go through?

Later, Juan.
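
For context, the signature change the commit message describes amounts to
roughly the following (a sketch against include/qemu/iova-tree.h; the
authoritative prototypes are in the patch itself):

/* Before (sketch): */
DMAMap *iova_tree_find(IOVATree *tree, DMAMap *map);

/* After (sketch): the tree and the returned mapping are const-qualified,
 * so modifying or freeing the result is a compile-time error. */
const DMAMap *iova_tree_find(const IOVATree *tree, const DMAMap *map);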


* Re: [RFC PATCH v5 00/26] vDPA shadow virtqueue
       [not found] <20211029183525.1776416-1-eperezma@redhat.com>
       [not found] ` <20211029183525.1776416-2-eperezma@redhat.com>
@ 2021-11-02  4:25 ` Jason Wang
       [not found] ` <20211029183525.1776416-4-eperezma@redhat.com>
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Jason Wang @ 2021-11-02  4:25 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin,
	Richard Henderson, Stefan Hajnoczi, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Eduardo Habkost


On 2021/10/30 2:34 AM, Eugenio Pérez wrote:
> This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. It
> is intended as a new method of tracking the memory the devices touch
> during a migration process: instead of relying on the vhost device's
> dirty logging capability, SVQ intercepts the VQ dataplane, forwarding
> the descriptors between VM and device. This way qemu is the effective
> writer of the guest's memory, just as in qemu's emulated virtio device
> operation.
>
> When SVQ is enabled, qemu offers a new virtual address space to the
> device to read and write into, and it maps the new vrings and the guest
> memory in it. SVQ also intercepts kicks and calls between the device
> and the guest. Relaying used buffers causes the dirty memory to be
> tracked, but in this RFC SVQ is not yet enabled automatically on
> migration.
>
> Thanks to being a buffer relay system, SVQ can also be used to bridge
> devices and drivers with different capabilities, such as devices that
> only support packed vrings (not split) and old guests whose drivers
> lack packed support.
>
> It is based on the ideas of DPDK's SW-assisted LM, in the series at
> https://patchwork.dpdk.org/cover/48370/ . However, that series does
> not map the shadow vq in the guest's VA, but in qemu's.
>
> For qemu to use shadow virtqueues the guest virtio driver must not use
> features like event_idx.
>
> SVQ needs to be enabled with the QMP command:
>
> { "execute": "x-vhost-set-shadow-vq",
>        "arguments": { "name": "vhost-vdpa0", "enable": true } }
>
> This series includes some patches, to be deleted in the final version,
> that help with its testing. The first two of the series have been sent
> separately but have not been included in qemu's main branch yet.
>
> The two after them add the feature to stop the device and to set and
> get its status. They are intended to be used with the vp_vdpa driver in
> a nested environment, so they are also external to this series. The
> vp_vdpa driver also needs modifications to forward the new status bit;
> these will be proposed separately.
>
> Patches 5-12 prepare the SVQ and the QMP command to support
> guest-to-host notification forwarding. If SVQ is enabled with these
> applied and the device supports it, that part can be tested in
> isolation (for example, with networking), hopping through SVQ.
>
> The same is true of patches 13-17, but for device-to-guest
> notifications.
>
> Based on them, patches 18 to 22 implement the actual buffer
> forwarding, using some features already introduced in the previous
> ones. However, they need a host device with no iommu, something that
> is not available at the moment.
>
> The last part of the series makes proper use of the host iommu, so the
> driver can access this newly created virtual address space.
>
> Comments are welcome.


I think we need to do some benchmarks to see the performance impact.

Thanks
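
For reference, the QMP command quoted above could be exercised from
qemu's scripts/qmp/qmp-shell roughly as follows (the monitor socket path
is hypothetical):

$ ./scripts/qmp/qmp-shell /tmp/qmp.sock
(QEMU) x-vhost-set-shadow-vq name=vhost-vdpa0 enable=true
{"return": {}}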


>
> TODO:
> * Event idx, indirect descriptors, packed, and other virtio features.
> * Separate buffer forwarding into its own AIO context, so we can
>    throw more threads at that task and don't need to stop the main
>    event loop.
> * Support multiqueue virtio-net vdpa.
> * Proper documentation.
>
> Changes from v4 RFC:
> * Support for allocating / freeing iova ranges in the IOVA tree,
>    extending the already present iova-tree for that.
> * Proper validation of guest features. Now SVQ can negotiate a
>    different set of features with the device when enabled.
> * Support for host notifier memory regions.
> * Handling of SVQ full queue in case the guest's descriptors span
>    different memory regions (qemu's VA chunks).
> * Flush pending used buffers at the end of SVQ operation.
> * The QMP command now looks up by NetClientState name. Other devices
>    will need to implement their own way to enable vdpa.
> * Rename the QMP command to "set", so it looks more like a mode switch.
> * Better use of the qemu error system.
> * Turn a few assertions into proper error-handling paths.
> * Add more documentation.
> * Less coupling of virtio / vhost, which could cause friction on changes.
> * Addressed many other small comments and small fixes.
>
> Changes from v3 RFC:
>    * Move everything to vhost-vdpa backend. A big change, this allowed
>      some cleanup but more code has been added in other places.
>    * More use of glib utilities, especially to manage memory.
> v3 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
>
> Changes from v2 RFC:
>    * Adding vhost-vdpa devices support
>    * Fixed some memory leaks pointed by different comments
> v2 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
>
> Changes from v1 RFC:
>    * Use QMP instead of migration to start SVQ mode.
>    * Only accepting IOMMU devices, for behavior closer to the target
>      devices (vDPA)
>    * Fix invalid masking/unmasking of vhost call fd.
>    * Use of proper methods for synchronization.
>    * No need to modify VirtIO device code, all of the changes are
>      contained in vhost code.
>    * Delete superfluous code.
>    * An intermediate RFC was sent with only the notifications forwarding
>      changes. It can be seen in
>      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> v1 link:
> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
>
> Eugenio Pérez (20):
>        virtio: Add VIRTIO_F_QUEUE_STATE
>        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>        virtio: Add virtio_queue_is_host_notifier_enabled
>        vhost: Make vhost_virtqueue_{start,stop} public
>        vhost: Add x-vhost-enable-shadow-vq qmp
>        vhost: Add VhostShadowVirtqueue
>        vdpa: Register vdpa devices in a list
>        vhost: Route guest->host notification through shadow virtqueue
>        Add vhost_svq_get_svq_call_notifier
>        Add vhost_svq_set_guest_call_notifier
>        vdpa: Save call_fd in vhost-vdpa
>        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>        vhost: Route host->guest notification through shadow virtqueue
>        virtio: Add vhost_shadow_vq_get_vring_addr
>        vdpa: Save host and guest features
>        vhost: Add vhost_svq_valid_device_features to shadow vq
>        vhost: Shadow virtqueue buffers forwarding
>        vhost: Add VhostIOVATree
>        vhost: Use a tree to store memory mappings
>        vdpa: Add custom IOTLB translations to SVQ
>
> Eugenio Pérez (26):
>    util: Make some iova_tree parameters const
>    vhost: Fix last queue index of devices with no cvq
>    virtio: Add VIRTIO_F_QUEUE_STATE
>    virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>    vhost: Add x-vhost-set-shadow-vq qmp
>    vhost: Add VhostShadowVirtqueue
>    vdpa: Save kick_fd in vhost-vdpa
>    vdpa: Add vhost_svq_get_dev_kick_notifier
>    vdpa: Add vhost_svq_set_svq_kick_fd
>    vhost: Add Shadow VirtQueue kick forwarding capabilities
>    vhost: Handle host notifiers in SVQ
>    vhost: Route guest->host notification through shadow virtqueue
>    Add vhost_svq_get_svq_call_notifier
>    Add vhost_svq_set_guest_call_notifier
>    vdpa: Save call_fd in vhost-vdpa
>    vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>    vhost: Route host->guest notification through shadow virtqueue
>    virtio: Add vhost_shadow_vq_get_vring_addr
>    vdpa: ack VIRTIO_F_QUEUE_STATE if device supports it
>    vhost: Add vhost_svq_valid_device_features to shadow vq
>    vhost: Add vhost_svq_valid_guest_features to shadow vq
>    vhost: Shadow virtqueue buffers forwarding
>    util: Add iova_tree_alloc
>    vhost: Add VhostIOVATree
>    vhost: Use a tree to store memory mappings
>    vdpa: Add custom IOTLB translations to SVQ
>
>   qapi/net.json                                 |  20 +
>   hw/virtio/vhost-iova-tree.h                   |  27 +
>   hw/virtio/vhost-shadow-virtqueue.h            |  44 ++
>   hw/virtio/virtio-pci.h                        |   1 +
>   include/hw/virtio/vhost-vdpa.h                |  12 +
>   include/hw/virtio/virtio.h                    |   4 +-
>   include/qemu/iova-tree.h                      |  25 +-
>   .../standard-headers/linux/virtio_config.h    |   5 +
>   include/standard-headers/linux/virtio_pci.h   |   2 +
>   hw/i386/intel_iommu.c                         |   2 +-
>   hw/net/vhost_net.c                            |   2 +-
>   hw/net/virtio-net.c                           |   6 +-
>   hw/virtio/vhost-iova-tree.c                   | 157 ++++
>   hw/virtio/vhost-shadow-virtqueue.c            | 746 ++++++++++++++++++
>   hw/virtio/vhost-vdpa.c                        | 437 +++++++++-
>   hw/virtio/virtio-pci.c                        |  16 +-
>   net/vhost-vdpa.c                              |  28 +
>   util/iova-tree.c                              | 151 +++-
>   hw/virtio/meson.build                         |   2 +-
>   19 files changed, 1664 insertions(+), 23 deletions(-)
>   create mode 100644 hw/virtio/vhost-iova-tree.h
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>   create mode 100644 hw/virtio/vhost-iova-tree.c
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>


* Re: [RFC PATCH v5 03/26] virtio: Add VIRTIO_F_QUEUE_STATE
       [not found] ` <20211029183525.1776416-4-eperezma@redhat.com>
@ 2021-11-02  4:57   ` Jason Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Wang @ 2021-11-02  4:57 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

On Sat, Oct 30, 2021 at 2:36 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> Implementation of RFC of device state capability:
> https://lists.oasis-open.org/archives/virtio-comment/202012/msg00005.html

Considering this will still take time to be done, we need to think of a
way to proceed without it.

Thanks



>
> With this capability, the vdpa device can reset its index so it can
> start consuming from the guest after disabling the shadow virtqueue
> (SVQ), even with a state other than 0.
>
> The use case is to test SVQ with virtio-pci vdpa (vp_vdpa) with nested
> virtualization: spawn an L0 qemu with a virtio-net device, use the
> vp_vdpa driver to handle it in the guest, and then spawn an L1 qemu
> using that vdpa device. When the L1 qemu calls the device to set a new
> state through the vdpa ioctl, vp_vdpa should set each queue's state
> through virtio's VIRTIO_PCI_COMMON_Q_AVAIL_STATE.
>
> Since this is only for testing vhost-vdpa, it's added here before
> proposing it to the kernel. No effort is made to check that the device
> can actually change its state or its layout, or whether the device
> supports changing state at all. These checks will be added in the future.
>
> Also, a modified version of vp_vdpa that allows setting these in the
> PCI config is needed.
>
> TODO: Check for feature enabled and split in virtio pci config
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/virtio/virtio-pci.h                         | 1 +
>  include/hw/virtio/virtio.h                     | 4 +++-
>  include/standard-headers/linux/virtio_config.h | 3 +++
>  include/standard-headers/linux/virtio_pci.h    | 2 ++
>  hw/virtio/virtio-pci.c                         | 9 +++++++++
>  5 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> index 2446dcd9ae..019badbd7c 100644
> --- a/hw/virtio/virtio-pci.h
> +++ b/hw/virtio/virtio-pci.h
> @@ -120,6 +120,7 @@ typedef struct VirtIOPCIQueue {
>    uint32_t desc[2];
>    uint32_t avail[2];
>    uint32_t used[2];
> +  uint16_t state;
>  } VirtIOPCIQueue;
>
>  struct VirtIOPCIProxy {
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 8bab9cfb75..5fe575b8f0 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -289,7 +289,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
>      DEFINE_PROP_BIT64("iommu_platform", _state, _field, \
>                        VIRTIO_F_IOMMU_PLATFORM, false), \
>      DEFINE_PROP_BIT64("packed", _state, _field, \
> -                      VIRTIO_F_RING_PACKED, false)
> +                      VIRTIO_F_RING_PACKED, false), \
> +    DEFINE_PROP_BIT64("save_restore_q_state", _state, _field, \
> +                      VIRTIO_F_QUEUE_STATE, true)
>
>  hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
>  bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
> diff --git a/include/standard-headers/linux/virtio_config.h b/include/standard-headers/linux/virtio_config.h
> index 22e3a85f67..59fad3eb45 100644
> --- a/include/standard-headers/linux/virtio_config.h
> +++ b/include/standard-headers/linux/virtio_config.h
> @@ -90,4 +90,7 @@
>   * Does the device support Single Root I/O Virtualization?
>   */
>  #define VIRTIO_F_SR_IOV                        37
> +
> +/* Device support save and restore virtqueue state */
> +#define VIRTIO_F_QUEUE_STATE            40
>  #endif /* _LINUX_VIRTIO_CONFIG_H */
> diff --git a/include/standard-headers/linux/virtio_pci.h b/include/standard-headers/linux/virtio_pci.h
> index db7a8e2fcb..c8d9802a87 100644
> --- a/include/standard-headers/linux/virtio_pci.h
> +++ b/include/standard-headers/linux/virtio_pci.h
> @@ -164,6 +164,7 @@ struct virtio_pci_common_cfg {
>         uint32_t queue_avail_hi;                /* read-write */
>         uint32_t queue_used_lo;         /* read-write */
>         uint32_t queue_used_hi;         /* read-write */
> +       uint16_t queue_avail_state;     /* read-write */
>  };
>
>  /* Fields in VIRTIO_PCI_CAP_PCI_CFG: */
> @@ -202,6 +203,7 @@ struct virtio_pci_cfg_cap {
>  #define VIRTIO_PCI_COMMON_Q_AVAILHI    44
>  #define VIRTIO_PCI_COMMON_Q_USEDLO     48
>  #define VIRTIO_PCI_COMMON_Q_USEDHI     52
> +#define VIRTIO_PCI_COMMON_Q_AVAIL_STATE        56
>
>  #endif /* VIRTIO_PCI_NO_MODERN */
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 750aa47ec1..d7bb549033 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1244,6 +1244,9 @@ static uint64_t virtio_pci_common_read(void *opaque, hwaddr addr,
>      case VIRTIO_PCI_COMMON_Q_USEDHI:
>          val = proxy->vqs[vdev->queue_sel].used[1];
>          break;
> +    case VIRTIO_PCI_COMMON_Q_AVAIL_STATE:
> +        val = virtio_queue_get_last_avail_idx(vdev, vdev->queue_sel);
> +        break;
>      default:
>          val = 0;
>      }
> @@ -1330,6 +1333,8 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
>                         proxy->vqs[vdev->queue_sel].avail[0],
>                         ((uint64_t)proxy->vqs[vdev->queue_sel].used[1]) << 32 |
>                         proxy->vqs[vdev->queue_sel].used[0]);
> +            virtio_queue_set_last_avail_idx(vdev, vdev->queue_sel,
> +                        proxy->vqs[vdev->queue_sel].state);
>              proxy->vqs[vdev->queue_sel].enabled = 1;
>          } else {
>              virtio_error(vdev, "wrong value for queue_enable %"PRIx64, val);
> @@ -1353,6 +1358,9 @@ static void virtio_pci_common_write(void *opaque, hwaddr addr,
>      case VIRTIO_PCI_COMMON_Q_USEDHI:
>          proxy->vqs[vdev->queue_sel].used[1] = val;
>          break;
> +    case VIRTIO_PCI_COMMON_Q_AVAIL_STATE:
> +        proxy->vqs[vdev->queue_sel].state = val;
> +        break;
>      default:
>          break;
>      }
> @@ -1951,6 +1959,7 @@ static void virtio_pci_reset(DeviceState *qdev)
>          proxy->vqs[i].desc[0] = proxy->vqs[i].desc[1] = 0;
>          proxy->vqs[i].avail[0] = proxy->vqs[i].avail[1] = 0;
>          proxy->vqs[i].used[0] = proxy->vqs[i].used[1] = 0;
> +        proxy->vqs[i].state = 0;
>      }
>
>      if (pci_is_express(dev)) {
> --
> 2.27.0
>
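
As a rough illustration of the vp_vdpa side the commit message assumes
(a modified driver forwarding a vdpa set_vq_state call to the new
register), something like the sketch below could work. The helper
vp_vdpa_common_cfg() is hypothetical, and this is not part of any posted
patch:

/* Hypothetical vp_vdpa sketch: forward the avail index received through
 * the vdpa ioctl to the out-of-spec VIRTIO_PCI_COMMON_Q_AVAIL_STATE
 * register added by this patch. */
static int vp_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 qid,
                                const struct vdpa_vq_state *state)
{
    void __iomem *cfg = vp_vdpa_common_cfg(vdpa);    /* hypothetical helper */

    iowrite16(qid, cfg + VIRTIO_PCI_COMMON_Q_SELECT);
    iowrite16(state->avail_index, cfg + VIRTIO_PCI_COMMON_Q_AVAIL_STATE);
    return 0;
}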


* Re: [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features to shadow vq
       [not found] ` <20211029183525.1776416-22-eperezma@redhat.com>
@ 2021-11-02  5:25   ` Jason Wang
       [not found]     ` <CAJaqyWfdZwW82cTXwfdyfcLZUyKyw7R2wcwF2SM0wwTcVmZ_yQ@mail.gmail.com>
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Wang @ 2021-11-02  5:25 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

On Sat, Oct 30, 2021 at 2:44 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>
> This allows testing whether the guest has acknowledged an invalid
> transport feature for SVQ. That would include a packed vq layout or
> event_idx, where the VirtIO device needs help from SVQ.
>
> It is not needed at this moment, but since SVQ will not re-negotiate
> features with the guest, a failure to acknowledge them is fatal
> for SVQ.
>

It's not clear to me why we need this. Maybe you can give me an
example. E.g., isn't it sufficient to filter out devices with
event_idx?

Thanks
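
For what it's worth, the simpler filtering Jason mentions could be as
small as masking the bit out of the features SVQ offers (the placement
is an assumption, not code from this series):

/* Assumed placement: wherever SVQ filters the device features it
 * exposes to the guest. */
dev_features &= ~BIT_ULL(VIRTIO_RING_F_EVENT_IDX);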

> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/virtio/vhost-shadow-virtqueue.h | 1 +
>  hw/virtio/vhost-shadow-virtqueue.c | 6 ++++++
>  2 files changed, 7 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 946b2c6295..ac55588009 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -16,6 +16,7 @@
>  typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>
>  bool vhost_svq_valid_device_features(uint64_t *features);
> +bool vhost_svq_valid_guest_features(uint64_t *features);
>
>  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>  void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 6e0508a231..cb9ffcb015 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -62,6 +62,12 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
>      return true;
>  }
>
> +/* If the guest is using some of these, SVQ cannot communicate */
> +bool vhost_svq_valid_guest_features(uint64_t *guest_features)
> +{
> +    return true;
> +}
> +
>  /* Forward guest notifications */
>  static void vhost_handle_guest_kick(EventNotifier *n)
>  {
> --
> 2.27.0
>


* Re: [RFC PATCH v5 12/26] vhost: Route guest->host notification through shadow virtqueue
       [not found] ` <20211029183525.1776416-13-eperezma@redhat.com>
@ 2021-11-02  5:36   ` Jason Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Wang @ 2021-11-02  5:36 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin,
	Richard Henderson, Stefan Hajnoczi, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Eduardo Habkost


On 2021/10/30 2:35 AM, Eugenio Pérez wrote:
> +/**
> + * Enable or disable shadow virtqueue in a vhost vdpa device.
> + *
> + * This function is idempotent, to call it many times with the same value for
> + * enable_svq will simply return success.
> + *
> + * @v       Vhost vdpa device
> + * @enable  True to set SVQ mode
> + * @errp    Error pointer
> + */
> +void vhost_vdpa_enable_svq(struct vhost_vdpa *v, bool enable, Error **errp)
> +{


What happens if vhost_vdpa is not started when we try to enable svq?
Another note is that the vhost device could be stopped and started
after svq is enabled/disabled. We need to deal with those cases.

Thanks


* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
       [not found] ` <20211029183525.1776416-24-eperezma@redhat.com>
@ 2021-11-02  6:35   ` Jason Wang
       [not found]     ` <CAJaqyWeWCwFfznbPVjcOYQeFYeDmFJy9ViWx4bQnTzGp73O8Ww@mail.gmail.com>
  2021-11-23  6:56   ` Peter Xu
  2022-01-27  8:57   ` Peter Xu
  2 siblings, 1 reply; 20+ messages in thread
From: Jason Wang @ 2021-11-02  6:35 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin,
	Richard Henderson, Stefan Hajnoczi, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Eduardo Habkost


On 2021/10/30 2:35 AM, Eugenio Pérez wrote:
> This iova tree function allows looking for a hole in the allocated
> regions and returns a totally new translation for a given translated
> address.
>
> Its usage is mainly to allow devices to access qemu's address space,
> remapping the guest's address space into a new iova space to which
> qemu can add chunks of addresses.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/qemu/iova-tree.h |  17 +++++
>   util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 156 insertions(+)
>
> diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> index 8249edd764..33f9b2e13f 100644
> --- a/include/qemu/iova-tree.h
> +++ b/include/qemu/iova-tree.h
> @@ -29,6 +29,7 @@
>   #define  IOVA_OK           (0)
>   #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
>   #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> +#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */


I think we need a better name than "NOMEM", since it actually
means there's no sufficient hole for the range?


>   
>   typedef struct IOVATree IOVATree;
>   typedef struct DMAMap {
> @@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
>    */
>   void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
>   
> +/**
> + * iova_tree_alloc:
> + *
> + * @tree: the iova tree to allocate from
> + * @map: the new map (as translated addr & size) to allocate in iova region
> + * @iova_begin: the minimum address of the allocation
> + * @iova_end: the maximum addressable direction of the allocation
> + *
> + * Allocates a new region of a given size, between iova_min and iova_max.
> + *
> + * Return: Same as iova_tree_insert, but cannot overlap and can be out of
> + * free contiguous range. Caller can get the assigned iova in map->iova.
> + */
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_end);
> +


"iova_tree_alloc_map" seems better.


>   /**
>    * iova_tree_destroy:
>    *
> diff --git a/util/iova-tree.c b/util/iova-tree.c
> index 23ea35b7a4..27c921c4e2 100644
> --- a/util/iova-tree.c
> +++ b/util/iova-tree.c
> @@ -16,6 +16,36 @@ struct IOVATree {
>       GTree *tree;
>   };
>   
> +/* Args to pass to iova_tree_alloc foreach function. */
> +struct IOVATreeAllocArgs {
> +    /* Size of the desired allocation */
> +    size_t new_size;
> +
> +    /* The minimum address allowed in the allocation */
> +    hwaddr iova_begin;
> +
> +    /* The last addressable allowed in the allocation */
> +    hwaddr iova_last;
> +
> +    /* Next-to-last iterated map; can be NULL on the first node */
> +    const DMAMap *hole_left;
> +
> +    /* Last iterated map */
> +    const DMAMap *hole_right;


Any reason we can't move those into the IOVATree structure? It could
simplify a lot of things.
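
That is, something like this sketch (field names made up here):

struct IOVATree {
    GTree *tree;

    /* Cached hole-iteration state for allocation (sketch) */
    const DMAMap *alloc_hole_left;
    const DMAMap *alloc_hole_right;
};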


> +};
> +
> +/**
> + * Iterate args to the next hole
> + *
> + * @args  The alloc arguments
> + * @next  The next mapping in the tree. Can be NULL to signal the last one
> + */
> +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> +                                         const DMAMap *next) {
> +    args->hole_left = args->hole_right;
> +    args->hole_right = next;
> +}
> +
>   static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
>   {
>       const DMAMap *m1 = a, *m2 = b;
> @@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
>       return IOVA_OK;
>   }
>   
> +/**
> + * Try to accommodate a map of size ret->size in a hole between
> + * max(end(hole_left), iova_start) and min(start(hole_right), iova_last).
> + *
> + * @args Arguments to allocation
> + */
> +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> +{
> +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> +    uint64_t hole_start, hole_last;
> +
> +    if (right && right->iova + right->size < args->iova_begin) {
> +        return false;
> +    }
> +
> +    if (left && left->iova > args->iova_last) {
> +        return false;
> +    }
> +
> +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
> +
> +    if (hole_last - hole_start > args->new_size) {
> +        /* We found a valid hole. */
> +        return true;
> +    }
> +
> +    /* Keep iterating */
> +    return false;
> +}
> +
> +/**
> + * For each dma node in the tree, check whether there is a hole between
> + * the node and its previous node (or the minimum allowed iova address).
> + *
> + * @key   Node iterating
> + * @value Node iterating
> + * @pargs Struct to communicate with the outside world
> + *
> + * Return: false to keep iterating, true if needs break.
> + */
> +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> +                                         gpointer pargs)
> +{
> +    struct IOVATreeAllocArgs *args = pargs;
> +    DMAMap *node = value;
> +
> +    assert(key == value);
> +
> +    iova_tree_alloc_args_iterate(args, node);
> +    if (args->hole_left && args->hole_left->iova > args->iova_last) {
> +        return true;
> +    }
> +
> +    if (iova_tree_alloc_map_in_hole(args)) {
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    struct IOVATreeAllocArgs args = {
> +        .new_size = map->size,
> +        .iova_begin = iova_begin,
> +        .iova_last = iova_last,
> +    };
> +
> +    if (iova_begin == 0) {
> +        /* Some devices do not like addr 0 */
> +        iova_begin += qemu_real_host_page_size;
> +    }
> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * Assuming low iova_begin, so no need to do a binary search to
> +     * locate the first node.
> +     *
> +     * TODO: We can improve the search speed if we save the beginning and the
> +     * end of holes, so we don't iterate over the previous saved ones.
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> +     * code a lot.


To tell the truth, the code in iova_tree_alloc_traverse() is hard to
review. I think it would be easier to use first/next/last. What we
really need is to calculate the hole between two ranges, with handmade
first and last nodes.

Thanks
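
Concretely, a sketch of the first/next style suggested above, assuming
glib >= 2.68 for g_tree_node_first()/g_tree_node_next() and the same
DMAMap conventions as the patch (size is the inclusive span, so the
first free address after a map is iova + size + 1). An untested
illustration, not a drop-in replacement:

static int iova_tree_alloc_first_next(IOVATree *tree, DMAMap *map,
                                      hwaddr iova_begin, hwaddr iova_last)
{
    hwaddr hole_start = iova_begin;
    GTreeNode *node;

    for (node = g_tree_node_first(tree->tree); node;
         node = g_tree_node_next(node)) {
        const DMAMap *m = g_tree_node_value(node);

        if (m->iova > iova_last) {
            break;
        }
        /* The hole before this mapping is [hole_start, m->iova) */
        if (m->iova > hole_start && m->iova - hole_start > map->size) {
            goto found;
        }
        hole_start = MAX(hole_start, m->iova + m->size + 1);
    }

    /* Check the trailing hole, [hole_start, iova_last] */
    if (hole_start >= iova_last || iova_last - hole_start <= map->size) {
        return IOVA_ERR_NOMEM;
    }

found:
    map->iova = hole_start;
    return iova_tree_insert(tree, map);
}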


> +     *
> +     */
> +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> +    if (!iova_tree_alloc_map_in_hole(&args)) {
> +        /*
> +         * 2nd try: the last iteration left args->hole_right as the last
> +         * DMAMap, but the (right, end) hole needs to be checked too
> +         */
> +        iova_tree_alloc_args_iterate(&args, NULL);
> +        if (!iova_tree_alloc_map_in_hole(&args)) {
> +            return IOVA_ERR_NOMEM;
> +        }
> +    }
> +
> +    map->iova = MAX(iova_begin,
> +                    args.hole_left ?
> +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> +    return iova_tree_insert(tree, map);
> +}
> +
>   void iova_tree_destroy(IOVATree *tree)
>   {
>       g_tree_destroy(tree->tree);


* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
       [not found] ` <20211029183525.1776416-3-eperezma@redhat.com>
@ 2021-11-02  7:25   ` Juan Quintela
  2021-11-02  7:32     ` Michael S. Tsirkin
  2021-11-02  7:40   ` Juan Quintela
  1 sibling, 1 reply; 20+ messages in thread
From: Juan Quintela @ 2021-11-02  7:25 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

Eugenio Pérez <eperezma@redhat.com> wrote:
> The -1 assumes that all devices with no cvq have an spare vq allocated
> for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
> case, and the device may have a pair number of queues.
                                  ^^^^
even

I know, I know, I am Spanish myself O:-)

> To fix this, just resort to the lower even number of queues.

I don't understand what you're trying to achieve here.

> Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the
> virtio device")
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  hw/net/vhost_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index 0d888f29a6..edf56a597f 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -330,7 +330,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>      NetClientState *peer;
>  
>      if (!cvq) {
> -        last_index -= 1;
> +        last_index &= ~1ULL;
>      }

As far as I can see, that is a nop. last_index is defined as an int.

$ cat kk.c
#include <stdio.h>

int main(void)
{
	int i = 7;
	i &= -1ULL;
	printf("%d\n", i);
	i = 8;
	i &= -1ULL;
	printf("%d\n", i);
	i = 0;
	i &= -1ULL;
	printf("%d\n", i);
	i = -2;
	i &= -1ULL;
	printf("%d\n", i);
	return 0;
}
$ ./kk
7
8
0
-2


* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
  2021-11-02  7:25   ` [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq Juan Quintela
@ 2021-11-02  7:32     ` Michael S. Tsirkin
  2021-11-02  7:39       ` Juan Quintela
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2021-11-02  7:32 UTC (permalink / raw)
  To: Juan Quintela
  Cc: Laurent Vivier, Eduardo Habkost, Richard Henderson, qemu-devel,
	Markus Armbruster, Eugenio Pérez, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

On Tue, Nov 02, 2021 at 08:25:27AM +0100, Juan Quintela wrote:
> Eugenio Pérez <eperezma@redhat.com> wrote:
> > The -1 assumes that all devices with no cvq have an spare vq allocated
> > for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
> > case, and the device may have a pair number of queues.
>                                   ^^^^
> even
> 
> I know, I know, I am Spanish myself O:-)

Nobody expects the Spanish ;)

> > To fix this, just resort to the lower even number of queues.
> 
> I don't understand what you try to achieve here.
> 
> > Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the
> > virtio device")
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  hw/net/vhost_net.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index 0d888f29a6..edf56a597f 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -330,7 +330,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >      NetClientState *peer;
> >  
> >      if (!cvq) {
> > -        last_index -= 1;
> > +        last_index &= ~1ULL;
> >      }
> 
> As far as I can see, that is a nop. last_index is defined as an int.
> 
> $ cat kk.c
> #include <stdio.h>
> 
> int main(void)
> {
> 	int i = 7;
> 	i &= -1ULL;

Stefano's patch has ~1ULL, not -1ULL here.

> 	printf("%d\n", i);
> 	i = 8;
> 	i &= -1ULL;
> 	printf("%d\n", i);
> 	i = 0;
> 	i &= -1ULL;
> 	printf("%d\n", i);
> 	i = -2;
> 	i &= -1ULL;
> 	printf("%d\n", i);
> 	return 0;
> }
> $ ./kk
> 7
> 8
> 0
> -2
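
For completeness, the same demo with ~1ULL (what the patch actually
uses, clearing bit 0) is not a nop:

#include <stdio.h>

int main(void)
{
	int i = 7;
	i &= ~1ULL;
	printf("%d\n", i);
	i = 8;
	i &= ~1ULL;
	printf("%d\n", i);
	return 0;
}
$ ./kk
6
8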


* Re: [RFC PATCH v5 05/26] vhost: Add x-vhost-set-shadow-vq qmp
       [not found] ` <20211029183525.1776416-6-eperezma@redhat.com>
@ 2021-11-02  7:36   ` Juan Quintela
  0 siblings, 0 replies; 20+ messages in thread
From: Juan Quintela @ 2021-11-02  7:36 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

Eugenio Pérez <eperezma@redhat.com> wrote:
> Command to set shadow virtqueue mode.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

You need to take care of:

  [PATCH v2 0/9] "Configurable policy for handling unstable interfaces",
  from Markus Armbruster.

When that hits the tree, you need to drop the x- prefix and mark the
command as unstable.

Later, Juan.


* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
  2021-11-02  7:32     ` Michael S. Tsirkin
@ 2021-11-02  7:39       ` Juan Quintela
  0 siblings, 0 replies; 20+ messages in thread
From: Juan Quintela @ 2021-11-02  7:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Laurent Vivier, Eduardo Habkost, Richard Henderson, qemu-devel,
	Markus Armbruster, Eugenio Pérez, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Nov 02, 2021 at 08:25:27AM +0100, Juan Quintela wrote:
>> Eugenio Pérez <eperezma@redhat.com> wrote:
>> > The -1 assumes that all devices with no cvq have an spare vq allocated
>> > for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
>> > case, and the device may have a pair number of queues.
>>                                   ^^^^
>> even
>> 
>> I know, I know, I am Spanish myself O:-)
>
> Nobody expects the Spanish ;)

O:-)

>> int main(void)
>> {
>> 	int i = 7;
>> 	i &= -1ULL;
>
> Stefano's patch has ~1ULL , not -1ULL here.
>

Stupid eyes.

Thanks, Juan.


* Re: [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq
       [not found] ` <20211029183525.1776416-3-eperezma@redhat.com>
  2021-11-02  7:25   ` [RFC PATCH v5 02/26] vhost: Fix last queue index of devices with no cvq Juan Quintela
@ 2021-11-02  7:40   ` Juan Quintela
  1 sibling, 0 replies; 20+ messages in thread
From: Juan Quintela @ 2021-11-02  7:40 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

Eugenio Pérez <eperezma@redhat.com> wrote:
> The -1 assumes that all devices with no cvq have an spare vq allocated
> for them, but with no offer of VIRTIO_NET_F_CTRL_VQ. This may not be the
> case, and the device may have a pair number of queues.
>
> To fix this, just resort to the lower even number of queues.
>
> Fixes: 049eb15b5fc9 ("vhost: record the last virtqueue index for the
> virtio device")
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>


* Re: [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ
       [not found] ` <20211029183525.1776416-12-eperezma@redhat.com>
@ 2021-11-02  7:54   ` Jason Wang
       [not found]     ` <CAJaqyWe56+wzXgdQp4nbGxhrSU4tPU+SkgTBUa=wSB5nSbtwuw@mail.gmail.com>
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Wang @ 2021-11-02  7:54 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin,
	Richard Henderson, Stefan Hajnoczi, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Eduardo Habkost


On 2021/10/30 2:35 AM, Eugenio Pérez wrote:
> If the device supports host notifiers, this saves one jump (through the
> kernel) when delivering SVQ notifications to it.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>   hw/virtio/vhost-shadow-virtqueue.c | 23 ++++++++++++++++++++++-
>   2 files changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 30ab9643b9..eb0a54f954 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   const EventNotifier *vhost_svq_get_dev_kick_notifier(
>                                                 const VhostShadowVirtqueue *svq);
> +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr);
> +
>   void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
>                        VhostShadowVirtqueue *svq, int svq_kick_fd);
>   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index fda60d11db..e3dcc039b6 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -29,6 +29,12 @@ typedef struct VhostShadowVirtqueue {
>        * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
>        */
>       EventNotifier svq_kick;
> +
> +    /* Device's host notifier memory region. NULL means no region */
> +    void *host_notifier_mr;
> +
> +    /* Virtio queue shadowing */
> +    VirtQueue *vq;
>   } VhostShadowVirtqueue;
>   
>   /**
> @@ -50,7 +56,20 @@ static void vhost_handle_guest_kick(EventNotifier *n)
>           return;
>       }
>   
> -    event_notifier_set(&svq->hdev_kick);
> +    if (svq->host_notifier_mr) {
> +        uint16_t *mr = svq->host_notifier_mr;
> +        *mr = virtio_get_queue_index(svq->vq);


Do we need barriers around the possible MMIO here?

To avoid that complicated stuff, I'd rather simply go with the eventfd path.

Note mmio and eventfd are not mutually exclusive.
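
A minimal sketch of the write ordering in question, assuming the svq
fields from the patch above (whether an smp barrier is enough for a
real device doorbell is exactly the open question here):

if (svq->host_notifier_mr) {
    uint16_t *db = svq->host_notifier_mr;

    smp_wmb();    /* make ring updates visible before ringing the doorbell */
    *db = virtio_get_queue_index(svq->vq);
} else {
    event_notifier_set(&svq->hdev_kick);
}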

Thanks


> +    } else {
> +        event_notifier_set(&svq->hdev_kick);
> +    }
> +}
> +
> +/*
> + * Set the device's memory region notifier. addr == NULL clears it.
> + */
> +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> +{
> +    svq->host_notifier_mr = addr;
>   }
>   
>   /**
> @@ -134,6 +153,7 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
>    */
>   VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>   {
> +    int vq_idx = dev->vq_index + idx;
>       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>       int r;
>   
> @@ -151,6 +171,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>           goto err_init_hdev_call;
>       }
>   
> +    svq->vq = virtio_get_queue(dev->vdev, vq_idx);
>       return g_steal_pointer(&svq);
>   
>   err_init_hdev_call:


* Re: [RFC PATCH v5 22/26] vhost: Shadow virtqueue buffers forwarding
       [not found] ` <20211029183525.1776416-23-eperezma@redhat.com>
@ 2021-11-02  7:59   ` Jason Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Wang @ 2021-11-02  7:59 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Michael S. Tsirkin,
	Richard Henderson, Stefan Hajnoczi, Markus Armbruster,
	Harpreet Singh Anand, Xiao W Wang, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Eduardo Habkost


On 2021/10/30 2:35 AM, Eugenio Pérez wrote:
> Initial version of the shadow virtqueue that actually forwards buffers.
> There is no iommu support at the moment; that will be addressed in
> future patches of this series. Since all vhost-vdpa devices use forced
> IOMMU, this means that SVQ is not usable on any device at this point of
> the series.
>
> For simplicity it only supports modern devices, which expect the vring
> in little endian, with a split ring and no event idx or indirect
> descriptors. Support for them will not be added in this series.
>
> It reuses the VirtQueue code for the device part. The driver part is
> based on Linux's virtio_ring driver, but with functionality stripped
> and optimizations removed so it's easier to review. Later commits add
> simpler ones.
>
> However, forwarding buffers has some particular quirks: one of the most
> unexpected is that a guest's buffer can span more than one descriptor in
> SVQ. While this is handled gracefully by qemu's emulated virtio devices,
> it may cause an unexpected SVQ queue full condition. This patch also
> solves that by checking for the condition at both the guest's kicks and
> the device's calls. The code may be more elegant in the future if the
> SVQ code runs in its own iocontext.
>
> Note that vhost_vdpa_get_vq_state trusts the device to write its status
> to used_idx at pause(), finishing all in-flight descriptors. This may
> not be enough for complex devices, but other developments, like the
> usage of inflight_fd on top of this solution, may extend it in the future.
>
> In particular, SVQ trusts it to recover the guest's virtqueue at start,
> and to mark as used the latest descriptors used by the device in the
> meantime.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   qapi/net.json                      |   5 +-
>   hw/virtio/vhost-shadow-virtqueue.c | 400 +++++++++++++++++++++++++++--
>   hw/virtio/vhost-vdpa.c             | 144 ++++++++++-
>   3 files changed, 521 insertions(+), 28 deletions(-)
>
> diff --git a/qapi/net.json b/qapi/net.json
> index fca2f6ebca..1c6d3b2179 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -84,12 +84,9 @@
>   #
>   # Use vhost shadow virtqueue.
>   #
> -# SVQ can just forward notifications between the device and the guest at this
> -# moment. This will expand in future changes.
> -#
>   # @name: the device name of the VirtIO device
>   #
> -# @set: true to use the alternate shadow VQ notifications
> +# @set: true to use the alternate shadow VQ
>   #
>   # Since: 6.2
>   #
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index cb9ffcb015..ad1b2342be 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -9,6 +9,9 @@
>   
>   #include "qemu/osdep.h"
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
> +#include "hw/virtio/vhost.h"
> +#include "hw/virtio/virtio-access.h"
> +
>   #include "standard-headers/linux/vhost_types.h"
>   
>   #include "qemu/error-report.h"
> @@ -45,6 +48,27 @@ typedef struct VhostShadowVirtqueue {
>   
>       /* Virtio device */
>       VirtIODevice *vdev;
> +
> +    /* Map for returning guest's descriptors */
> +    VirtQueueElement **ring_id_maps;
> +
> +    /* Next VirtQueue element that guest made available */
> +    VirtQueueElement *next_guest_avail_elem;
> +
> +    /* Next head to expose to device */
> +    uint16_t avail_idx_shadow;
> +
> +    /* Next free descriptor */
> +    uint16_t free_head;
> +
> +    /* Last seen used idx */
> +    uint16_t shadow_used_idx;
> +
> +    /* Next head to consume from device */
> +    uint16_t last_used_idx;
> +
> +    /* Cache for the exposed notification flag */
> +    bool notification;
>   } VhostShadowVirtqueue;
>   
>   /**
> @@ -56,25 +80,174 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>       return &svq->hdev_kick;
>   }
>   
> -/* If the device is using some of these, SVQ cannot communicate */
> +/**
> + * VirtIO transport device feature acknowledge
> + *
> + * @dev_features  The device features. If success, the acknowledged features.
> + *
> + * Returns true if SVQ can go with a subset of these, false otherwise.
> + */
>   bool vhost_svq_valid_device_features(uint64_t *dev_features)
>   {
> -    return true;
> +    uint64_t b;
> +    bool r = true;
> +
> +    for (b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END; ++b) {
> +        switch (b) {
> +        case VIRTIO_F_NOTIFY_ON_EMPTY:
> +        case VIRTIO_F_ANY_LAYOUT:
> +            continue;
> +
> +        case VIRTIO_F_ACCESS_PLATFORM:
> +            /* SVQ does not know how to translate addresses */
> +            if (*dev_features & BIT_ULL(b)) {
> +                clear_bit(b, dev_features);
> +                r = false;
> +            }
> +            break;
> +
> +        case VIRTIO_F_VERSION_1:
> +            /* SVQ trusts that the guest vring is little endian */
> +            if (!(*dev_features & BIT_ULL(b))) {
> +                set_bit(b, dev_features);
> +                r = false;
> +            }
> +            continue;
> +
> +        default:
> +            if (*dev_features & BIT_ULL(b)) {
> +                clear_bit(b, dev_features);
> +            }
> +        }
> +    }
> +
> +    return r;
>   }
>   
> -/* If the guest is using some of these, SVQ cannot communicate */
> +/**
> + * Check the guest's acknowledged features.
> + *
> + * @guest_features  The guest's acknowledged features
> + *
> + * Returns true if SVQ can handle them, false otherwise.
> + */
>   bool vhost_svq_valid_guest_features(uint64_t *guest_features)
>   {
> -    return true;
> +    static const uint64_t transport = MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
> +                            VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
> +
> +    /* These transport features are handled by VirtQueue */
> +    static const uint64_t valid = (BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) |
> +                                   BIT_ULL(VIRTIO_F_VERSION_1));
> +
> +    /* We are only interested in transport-related feature bits */
> +    uint64_t guest_transport_features = (*guest_features) & transport;
> +
> +    *guest_features &= (valid | ~transport);
> +    return !(guest_transport_features & (transport ^ valid));
>   }
>   
> -/* Forward guest notifications */
> -static void vhost_handle_guest_kick(EventNotifier *n)
> +/**
> + * Number of descriptors that SVQ can make available from the guest.
> + *
> + * @svq   The svq
> + */
> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
>   {
> -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> -                                             svq_kick);
> +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
> +}
>   
> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> +{
> +    uint16_t notification_flag;
> +
> +    if (svq->notification == enable) {
> +        return;
> +    }
> +
> +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> +
> +    svq->notification = enable;
> +    if (enable) {
> +        svq->vring.avail->flags &= ~notification_flag;
> +    } else {
> +        svq->vring.avail->flags |= notification_flag;
> +    }
> +}
> +
> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> +                                    const struct iovec *iovec,
> +                                    size_t num, bool more_descs, bool write)
> +{
> +    uint16_t i = svq->free_head, last = svq->free_head;
> +    unsigned n;
> +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> +    vring_desc_t *descs = svq->vring.desc;
> +
> +    if (num == 0) {
> +        return;
> +    }
> +
> +    for (n = 0; n < num; n++) {
> +        if (more_descs || (n + 1 < num)) {
> +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> +        } else {
> +            descs[i].flags = flags;
> +        }
> +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> +
> +        last = i;
> +        i = cpu_to_le16(descs[i].next);
> +    }
> +
> +    svq->free_head = le16_to_cpu(descs[last].next);
> +}
> +
> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> +                                    VirtQueueElement *elem)
> +{
> +    int head;
> +    unsigned avail_idx;
> +    vring_avail_t *avail = svq->vring.avail;
> +
> +    head = svq->free_head;
> +
> +    /* We need some descriptors here */
> +    assert(elem->out_num || elem->in_num);
> +
> +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> +                            elem->in_num > 0, false);
> +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> +
> +    /*
> +     * Put entry in available array (but don't update avail->idx until they
> +     * do sync).
> +     */
> +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> +    avail->ring[avail_idx] = cpu_to_le16(head);
> +    svq->avail_idx_shadow++;
> +
> +    /* Update avail index after the descriptor is written */
> +    smp_wmb();


A question: since we may talk to real hardware, is smp_wmb()
sufficient in this case, or do we need to honor VIRTIO_F_ORDER_PLATFORM?
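
For reference, Linux's virtio_ring draws exactly this distinction
through its weak_barriers flag; a transport that negotiates
VIRTIO_F_ORDER_PLATFORM must use the stronger DMA barrier (a sketch
after include/linux/virtio_ring.h):

static inline void virtio_wmb(bool weak_barriers)
{
	if (weak_barriers)
		virt_wmb();	/* smp barrier: enough for a software peer */
	else
		dma_wmb();	/* dma barrier: needed for real hardware */
}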


> +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> +
> +    return head;
> +
> +}
> +
> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> +{
> +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> +
> +    svq->ring_id_maps[qemu_head] = elem;
> +}
> +
> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> +{
> +    /* We need to expose available array entries before checking used flags */
> +    smp_mb();
> +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
>           return;
>       }
>   
> @@ -86,25 +259,188 @@ static void vhost_handle_guest_kick(EventNotifier *n)
>       }
>   }
>   
> -/*
> - * Set the device's memory region notifier. addr = NULL clear it.
> +/**
> + * Forward available buffers.
> + *
> + * @svq Shadow VirtQueue
> + *
> + * Note that this function does not guarantee that all guest's available
> + * buffers are available to the device in SVQ avail ring. The guest may have
> + * exposed a GPA / GIOVA congiuous buffer, but it may not be contiguous in qemu
> + * vaddr.
> + *
> + * If that happens, guest's kick notifications will be disabled until device
> + * makes some buffers used.
>    */
> -void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>   {
> -    svq->host_notifier_mr = addr;
> +    /* Clear event notifier */
> +    event_notifier_test_and_clear(&svq->svq_kick);
> +
> +    /* Make available as many buffers as possible */
> +    do {
> +        if (virtio_queue_get_notification(svq->vq)) {
> +            virtio_queue_set_notification(svq->vq, false);
> +        }
> +
> +        while (true) {
> +            VirtQueueElement *elem;
> +
> +            if (svq->next_guest_avail_elem) {
> +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
> +            } else {
> +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
> +            }
> +
> +            if (!elem) {
> +                break;
> +            }
> +
> +            if (elem->out_num + elem->in_num >
> +                vhost_svq_available_slots(svq)) {
> +                /*
> +                 * This condition is possible since a contiguous buffer in GPA
> +                 * does not imply a contiguous buffer in qemu's VA
> +                 * scatter-gather segments. If that happen, the buffer exposed
> +                 * to the device needs to be a chain of descriptors at this
> +                 * moment.
> +                 *
> +                 * SVQ cannot hold more available buffers if we are here:
> +                 * queue the current guest descriptor and ignore further kicks
> +                 * until some elements are used.
> +                 */


I wonder what the advantage of tracking the pending elem like this is.
It looks to me like we could simply rewind last_avail_idx in this case?
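
A sketch of that alternative, assuming qemu's existing virtqueue_unpop()
(which rewinds last_avail_idx so the element is popped again on the next
kick); untested:

if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
    /* Hand the element back to the guest VirtQueue instead of stashing it */
    virtqueue_unpop(svq->vq, elem, 0);
    g_free(elem);
    return;
}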


> +                svq->next_guest_avail_elem = elem;
> +                return;
> +            }
> +
> +            vhost_svq_add(svq, elem);
> +            vhost_svq_kick(svq);
> +        }
> +
> +        virtio_queue_set_notification(svq->vq, true);
> +    } while (!virtio_queue_empty(svq->vq));
> +}
> +
> +/**
> + * Handle guest's kick.
> + *
> + * @n guest kick event notifier, the one that guest set to notify svq.
> + */
> +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
> +{
> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> +                                             svq_kick);
> +    vhost_handle_guest_kick(svq);
> +}
> +
> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> +{
> +    if (svq->last_used_idx != svq->shadow_used_idx) {
> +        return true;
> +    }
> +
> +    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
> +
> +    return svq->last_used_idx != svq->shadow_used_idx;
> +}
> +
> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> +{
> +    vring_desc_t *descs = svq->vring.desc;
> +    const vring_used_t *used = svq->vring.used;
> +    vring_used_elem_t used_elem;
> +    uint16_t last_used;
> +
> +    if (!vhost_svq_more_used(svq)) {
> +        return NULL;
> +    }
> +
> +    /* Only get used array entries after they have been exposed by dev */
> +    smp_rmb();
> +    last_used = svq->last_used_idx & (svq->vring.num - 1);
> +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> +
> +    svq->last_used_idx++;
> +    if (unlikely(used_elem.id >= svq->vring.num)) {
> +        error_report("Device %s says index %u is used", svq->vdev->name,
> +                     used_elem.id);
> +        return NULL;
> +    }
> +
> +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> +        error_report(
> +            "Device %s says index %u is used, but it was not available",
> +            svq->vdev->name, used_elem.id);
> +        return NULL;
> +    }
> +
> +    descs[used_elem.id].next = svq->free_head;
> +    svq->free_head = used_elem.id;
> +
> +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
>   }
>   
> -/* Forward vhost notifications */
> +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> +                            bool check_for_avail_queue)
> +{
> +    VirtQueue *vq = svq->vq;
> +
> +    /* Make as many buffers as possible used. */
> +    do {
> +        unsigned i = 0;
> +
> +        vhost_svq_set_notification(svq, false);
> +        while (true) {
> +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> +            if (!elem) {
> +                break;
> +            }
> +
> +            if (unlikely(i >= svq->vring.num)) {
> +                virtio_error(svq->vdev,
> +                         "More than %u used buffers obtained in a %u size SVQ",
> +                         i, svq->vring.num);
> +                virtqueue_fill(vq, elem, elem->len, i);
> +                virtqueue_flush(vq, i);
> +                i = 0;
> +            }
> +            virtqueue_fill(vq, elem, elem->len, i++);
> +        }
> +
> +        virtqueue_flush(vq, i);
> +        event_notifier_set(&svq->svq_call);
> +
> +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
> +            /*
> +             * Avail ring was full when vhost_svq_flush was called, so it's a
> +             * good moment to make more descriptors available if possible
> +             */
> +            vhost_handle_guest_kick(svq);
> +        }
> +
> +        vhost_svq_set_notification(svq, true);
> +    } while (vhost_svq_more_used(svq));


So this actually doesn't make sure all the buffers were processed by the
device? Is this intended? (I see it is called from vhost_svq_stop().)

Note that it means some buffers might not be submitted to the device 
after migration?


> +}
> +
> +/**
> + * Forward used buffers.
> + *
> + * @n hdev call event notifier, the one that device set to notify svq.
> + *
> + * Note that we are not making any buffers available in the loop, so there is
> + * no way that it runs more than virtqueue size times.
> + */
>   static void vhost_svq_handle_call(EventNotifier *n)
>   {
>       VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>                                                hdev_call);
>   
> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> -        return;
> -    }
> +    /* Clear event notifier */
> +    event_notifier_test_and_clear(n);
>   
> -    event_notifier_set(&svq->svq_call);
> +    vhost_svq_flush(svq, true);
>   }
>   
>   /*
> @@ -132,6 +468,14 @@ void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)
>       event_notifier_init_fd(&svq->svq_call, call_fd);
>   }
>   
> +/*
> + * Set the device's memory region notifier. addr = NULL clear it.
> + */
> +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> +{
> +    svq->host_notifier_mr = addr;
> +}
> +
>   /*
>    * Get the shadow vq vring address.
>    * @svq Shadow virtqueue
> @@ -185,7 +529,8 @@ static void vhost_svq_set_svq_kick_fd_internal(VhostShadowVirtqueue *svq,
>        * need to explicitly check for them.
>        */
>       event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> +    event_notifier_set_handler(&svq->svq_kick,
> +                               vhost_handle_guest_kick_notifier);
>   
>       /*
>        * !check_old means that we are starting SVQ, taking the descriptor from
> @@ -233,7 +578,16 @@ void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
>   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
>                       VhostShadowVirtqueue *svq)
>   {
> +    unsigned i;
>       event_notifier_set_handler(&svq->svq_kick, NULL);
> +    vhost_svq_flush(svq, false);
> +
> +    for (i = 0; i < svq->vring.num; ++i) {
> +        g_autofree VirtQueueElement *elem = svq->ring_id_maps[i];
> +        if (elem) {
> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> +        }
> +    }
>   }
>   
>   /*
> @@ -248,7 +602,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>       size_t driver_size;
>       size_t device_size;
>       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> -    int r;
> +    int r, i;
>   
>       r = event_notifier_init(&svq->hdev_kick, 0);
>       if (r != 0) {
> @@ -274,6 +628,11 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
>       memset(svq->vring.desc, 0, driver_size);
>       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
>       memset(svq->vring.used, 0, device_size);
> +    for (i = 0; i < num - 1; i++) {
> +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> +    }
> +
> +    svq->ring_id_maps = g_new0(VirtQueueElement *, num);
>       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>       return g_steal_pointer(&svq);
>   
> @@ -292,6 +651,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
>       event_notifier_cleanup(&vq->hdev_kick);
>       event_notifier_set_handler(&vq->hdev_call, NULL);
>       event_notifier_cleanup(&vq->hdev_call);
> +    g_free(vq->ring_id_maps);
>       qemu_vfree(vq->vring.desc);
>       qemu_vfree(vq->vring.used);
>       g_free(vq);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index fc8396ba8a..e1c55e43e7 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -19,6 +19,7 @@
>   #include "hw/virtio/virtio-net.h"
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
>   #include "hw/virtio/vhost-vdpa.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>   #include "exec/address-spaces.h"
>   #include "qemu/main-loop.h"
>   #include "cpu.h"
> @@ -821,6 +822,19 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>       return true;
>   }
>   
> +static int vhost_vdpa_vring_pause(struct vhost_dev *dev)
> +{
> +    int r;
> +    uint8_t status;
> +
> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DEVICE_STOPPED);
> +    do {
> +        r = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> +    } while (r == 0 && !(status & VIRTIO_CONFIG_S_DEVICE_STOPPED));
> +
> +    return 0;
> +}
> +
>   /*
>    * Start or stop a shadow virtqueue in a vdpa device
>    *
> @@ -844,7 +858,14 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           .index = vq_index,
>       };
>       struct vhost_vring_file vhost_call_file = {
> -        .index = idx + dev->vq_index,
> +        .index = vq_index,
> +    };
> +    struct vhost_vring_addr addr = {
> +        .index = vq_index,
> +    };
> +    struct vhost_vring_state num = {
> +        .index = vq_index,
> +        .num = virtio_queue_get_num(dev->vdev, vq_index),
>       };
>       int r;
>   
> @@ -852,6 +873,7 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           const EventNotifier *vhost_kick = vhost_svq_get_dev_kick_notifier(svq);
>           const EventNotifier *vhost_call = vhost_svq_get_svq_call_notifier(svq);
>   
> +        vhost_svq_get_vring_addr(svq, &addr);
>           if (n->addr) {
>               r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
>                                                     false);
> @@ -870,8 +892,20 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           vhost_kick_file.fd = event_notifier_get_fd(vhost_kick);
>           vhost_call_file.fd = event_notifier_get_fd(vhost_call);
>       } else {
> +        struct vhost_vring_state state = {
> +            .index = vq_index,
> +        };
> +
>           vhost_svq_stop(dev, idx, svq);
>   
> +        state.num = virtio_queue_get_last_avail_idx(dev->vdev, idx);
> +        r = vhost_vdpa_set_vring_base(dev, &state);
> +        if (unlikely(r)) {
> +            error_setg_errno(errp, -r, "vhost_set_vring_base failed");
> +            return false;
> +        }
> +
> +        vhost_vdpa_vq_get_addr(dev, &addr, &dev->vqs[idx]);
>           if (n->addr) {
>               r = virtio_queue_set_host_notifier_mr(dev->vdev, idx, &n->mr,
>                                                     true);
> @@ -885,6 +919,17 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>           vhost_call_file.fd = v->call_fd[idx];
>       }
>   
> +    r = vhost_vdpa_set_vring_addr(dev, &addr);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, -r, "vhost_set_vring_addr failed");
> +        return false;
> +    }
> +    r = vhost_vdpa_set_vring_num(dev, &num);
> +    if (unlikely(r)) {
> +        error_setg_errno(errp, -r, "vhost_set_vring_num failed");
> +        return false;
> +    }
> +
>       r = vhost_vdpa_set_vring_dev_kick(dev, &vhost_kick_file);
>       if (unlikely(r)) {
>           error_setg_errno(errp, -r, "vhost_vdpa_set_vring_kick failed");
> @@ -899,6 +944,50 @@ static bool vhost_vdpa_svq_start_vq(struct vhost_dev *dev, unsigned idx,
>       return true;
>   }
>   
> +static void vhost_vdpa_get_vq_state(struct vhost_dev *dev, unsigned idx)
> +{
> +    struct VirtIODevice *vdev = dev->vdev;
> +
> +    virtio_queue_restore_last_avail_idx(vdev, idx);
> +    virtio_queue_invalidate_signalled_used(vdev, idx);
> +    virtio_queue_update_used_idx(vdev, idx);
> +}


Do we need to change vhost_vdpa_get_vring_base() to return 
vq->last_avail_idx as well?

Thanks


* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
       [not found]     ` <CAJaqyWeWCwFfznbPVjcOYQeFYeDmFJy9ViWx4bQnTzGp73O8Ww@mail.gmail.com>
@ 2021-11-03  3:10       ` Jason Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Wang @ 2021-11-03  3:10 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

On Tue, Nov 2, 2021 at 4:29 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Tue, Nov 2, 2021 at 7:35 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> > > This iova tree function allows it to look for a hole in allocated
> > > regions and return a totally new translation for a given translated
> > > address.
> > >
> Its usage is mainly to allow devices to access qemu address space,
> > > remapping guest's one into a new iova space where qemu can add chunks of
> > > addresses.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >   include/qemu/iova-tree.h |  17 +++++
> > >   util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
> > >   2 files changed, 156 insertions(+)
> > >
> > > diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> > > index 8249edd764..33f9b2e13f 100644
> > > --- a/include/qemu/iova-tree.h
> > > +++ b/include/qemu/iova-tree.h
> > > @@ -29,6 +29,7 @@
> > >   #define  IOVA_OK           (0)
> > >   #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
> > >   #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> > > +#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
> >
> >
> > I think we need a better name than "NOMEM", since it actually
> > means there's no sufficiently large hole for the range?
> >
>
> Actually, yes. I'm totally fine with changing it, but "the
> inspiration" is that ENOMEM is also the error that malloc sets in
> errno when not enough contiguous virtual memory can be allocated.

Ok, then I think it's fine.

>
> What would be a more descriptive name?
>
> >
> > >
> > >   typedef struct IOVATree IOVATree;
> > >   typedef struct DMAMap {
> > > @@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
> > >    */
> > >   void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
> > >
> > > +/**
> > > + * iova_tree_alloc:
> > > + *
> > > + * @tree: the iova tree to allocate from
> > > + * @map: the new map (as translated addr & size) to allocate in iova region
> > > + * @iova_begin: the minimum address of the allocation
> > > + * @iova_end: the maximum address of the allocation
> > > + *
> > > + * Allocates a new region of a given size, between iova_begin and iova_end.
> > > + *
> > > + * Return: Same as iova_tree_insert. The new mapping will not overlap any
> > > + * existing one, and the call fails with IOVA_ERR_NOMEM if there is no free
> > > + * contiguous range big enough. Caller can get the assigned iova in map->iova.
> > > + */
> > > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > > +                    hwaddr iova_end);
> > > +
> >
> >
> > "iova_tree_alloc_map" seems better.
> >
>
> Right, I changed in vhost but I forgot to change here.
>
> >
> > >   /**
> > >    * iova_tree_destroy:
> > >    *
> > > diff --git a/util/iova-tree.c b/util/iova-tree.c
> > > index 23ea35b7a4..27c921c4e2 100644
> > > --- a/util/iova-tree.c
> > > +++ b/util/iova-tree.c
> > > @@ -16,6 +16,36 @@ struct IOVATree {
> > >       GTree *tree;
> > >   };
> > >
> > > +/* Args to pass to iova_tree_alloc foreach function. */
> > > +struct IOVATreeAllocArgs {
> > > +    /* Size of the desired allocation */
> > > +    size_t new_size;
> > > +
> > > +    /* The minimum address allowed in the allocation */
> > > +    hwaddr iova_begin;
> > > +
> > > +    /* The last address allowed in the allocation */
> > > +    hwaddr iova_last;
> > > +
> > > +    /* Second-to-last iterated map, can be NULL in the first node */
> > > +    const DMAMap *hole_left;
> > > +
> > > +    /* Last iterated map */
> > > +    const DMAMap *hole_right;
> >
> >
> > Any reason we can't move those to the IOVATree structure? It could
> > simplify a lot of things.
> >
>
> I can move them for the next version for sure, but then it needs to be
> clear enough that these fields are alloc arguments.

Sure.

>
> >
> > > +};
> > > +
> > > +/**
> > > + * Iterate args to tne next hole
>
> s/tne/the/
>
> > > + *
> > > + * @args  The alloc arguments
> > > + * @next  The next mapping in the tree. Can be NULL to signal the last one
> > > + */
> > > +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> > > +                                         const DMAMap *next) {
> > > +    args->hole_left = args->hole_right;
> > > +    args->hole_right = next;
> > > +}
> > > +
> > >   static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
> > >   {
> > >       const DMAMap *m1 = a, *m2 = b;
> > > @@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
> > >       return IOVA_OK;
> > >   }
> > >
> > > +/**
> > > + * Try to accommodate a map of size ret->size in a hole between
> > > + * max(end(hole_left), iova_start).
> > > + *
> > > + * @args Arguments to allocation
> > > + */
> > > +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> > > +{
> > > +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> > > +    uint64_t hole_start, hole_last;
> > > +
> > > +    if (right && right->iova + right->size < args->iova_begin) {
> > > +        return false;
> > > +    }
> > > +
> > > +    if (left && left->iova > args->iova_last) {
> > > +        return false;
> > > +    }
> > > +
> > > +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> > > +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
> > > +
> > > +    if (hole_last - hole_start > args->new_size) {
> > > +        /* We found a valid hole. */
> > > +        return true;
> > > +    }
> > > +
> > > +    /* Keep iterating */
> > > +    return false;
> > > +}
> > > +
> > > +/**
> > > + * Foreach dma node in the tree, compare if there is a hole with its previous
> > > + * node (or minimum iova address allowed) and the node.
> > > + *
> > > + * @key   Node iterating
> > > + * @value Node iterating
> > > + * @pargs Struct to communicate with the outside world
> > > + *
> > > + * Return: false to keep iterating, true if it needs to break.
> > > + */
> > > +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> > > +                                         gpointer pargs)
> > > +{
> > > +    struct IOVATreeAllocArgs *args = pargs;
> > > +    DMAMap *node = value;
> > > +
> > > +    assert(key == value);
> > > +
> > > +    iova_tree_alloc_args_iterate(args, node);
> > > +    if (args->hole_left && args->hole_left->iova > args->iova_last) {
> > > +        return true;
> > > +    }
> > > +
> > > +    if (iova_tree_alloc_map_in_hole(args)) {
> > > +        return true;
> > > +    }
> > > +
> > > +    return false;
> > > +}
> > > +
> > > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> > > +                    hwaddr iova_last)
> > > +{
> > > +    struct IOVATreeAllocArgs args = {
> > > +        .new_size = map->size,
> > > +        .iova_begin = iova_begin,
> > > +        .iova_last = iova_last,
> > > +    };
> > > +
> > > +    if (iova_begin == 0) {
> > > +        /* Some devices do not like addr 0 */
> > > +        iova_begin += qemu_real_host_page_size;
> > > +    }
> > > +
> > > +    assert(iova_begin < iova_last);
> > > +
> > > +    /*
> > > +     * Find a valid hole for the mapping
> > > +     *
> > > +     * Assuming low iova_begin, so no need to do a binary search to
> > > +     * locate the first node.
> > > +     *
> > > +     * TODO: We can improve the search speed if we save the beginning and the
> > > +     * end of holes, so we don't iterate over the previously saved ones.
> > > +     *
> > > +     * TODO: Replace all this with g_tree_node_first/next/last when available
> > > +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> > > +     * code a lot.
> >
> >
> > To tell the truth, the code in iova_tree_alloc_traverse() is hard to
> > review. I think it would be easier to use first/next/last. What we
> > really need is to calculate the hole between two ranges, with handmade
> > first and last entries.
> >
>
> I totally agree on that, but we don't have first/next/last in GTree
> until glib 2.68. Can we raise the minimum version required?

I'm not sure, but I guess it's better not to. But I wonder if something
like the following would be simpler?

DMAMap first = {
    .iova = iova_begin,
    .size = 0,
};

DMAMap *previous = &first;
DMAMap *this;

static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
                                         gpointer pargs)
{
    struct IOVATreeAllocArgs *args = pargs;
    hwaddr start = previous->iova + previous->size;
    this = value;

    if (this->iova - start >= args->new_size)
        return true;

    previous = this;
    return false;
}

And we need to deal with the iova_end as you did.

Thanks

>
> Another possibility that comes to my mind is to either have a list /
> tree of free regions, or directly a custom allocator for this.
>
> > Thanks
> >
> >
> > > +     *
> > > +     */
> > > +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> > > +    if (!iova_tree_alloc_map_in_hole(&args)) {
> > > +        /*
> > > +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> > > +         * (right, end) hole needs to be checked too
> > > +         */
> > > +        iova_tree_alloc_args_iterate(&args, NULL);
> > > +        if (!iova_tree_alloc_map_in_hole(&args)) {
> > > +            return IOVA_ERR_NOMEM;
> > > +        }
> > > +    }
> > > +
> > > +    map->iova = MAX(iova_begin,
> > > +                    args.hole_left ?
> > > +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> > > +    return iova_tree_insert(tree, map);
> > > +}
> > > +
> > >   void iova_tree_destroy(IOVATree *tree)
> > >   {
> > >       g_tree_destroy(tree->tree);
> >
>


* Re: [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features to shadow vq
       [not found]     ` <CAJaqyWfdZwW82cTXwfdyfcLZUyKyw7R2wcwF2SM0wwTcVmZ_yQ@mail.gmail.com>
@ 2021-11-03  3:18       ` Jason Wang
       [not found]         ` <CAJaqyWcDrEyFX8xvGmePdA5-tN88JuPV+=GU26+d=u-kSO6gsQ@mail.gmail.com>
  0 siblings, 1 reply; 20+ messages in thread
From: Jason Wang @ 2021-11-03  3:18 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

On Tue, Nov 2, 2021 at 4:10 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Tue, Nov 2, 2021 at 6:26 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Sat, Oct 30, 2021 at 2:44 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > >
> > > This allows it to test if the guest has acknowledged an invalid transport
> > > feature for SVQ. This will include packed vq layout or event_idx,
> > > where the VirtIO device needs help from SVQ.
> > >
> > > It is not needed at this moment, but since SVQ will not re-negotiate
> > > features with the guest, a failure to acknowledge them is fatal
> > > for SVQ.
> > >
> >
> > It's not clear to me why we need this. Maybe you can give me an
> > example. E.g., isn't it sufficient to filter out the device with
> > event_idx?
> >
>
> If the guest did negotiate _F_EVENT_IDX, it expects to be notified
> only when the device has marked a specific number of descriptors as used.
>
> If we use VirtQueue notification, the VirtQueue code handles it
> transparently. But if we want to be able to change the guest VQ's
> call_fd, we cannot use VirtQueue's, so this needs to be handled by SVQ
> code. And that is still not implemented.
>
> Of course in the event_idx case we could just ignore it and notify for
> every used descriptor, but it doesn't seem polite to me :). I will develop
> event_idx on top of this, either exposing the needed pieces in
> VirtQueue (I prefer this) or rolling our own in SVQ.

Yes, but what I meant is, we can fail the SVQ enabling if the device
supports event_idx. Then we're sure guests won't negotiate event_idx.

Thanks

>
> Same reasoning can be applied to unknown transport features.
>
> Thanks!
>
> > Thanks
> >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > ---
> > >  hw/virtio/vhost-shadow-virtqueue.h | 1 +
> > >  hw/virtio/vhost-shadow-virtqueue.c | 6 ++++++
> > >  2 files changed, 7 insertions(+)
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > index 946b2c6295..ac55588009 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > @@ -16,6 +16,7 @@
> > >  typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > >
> > >  bool vhost_svq_valid_device_features(uint64_t *features);
> > > +bool vhost_svq_valid_guest_features(uint64_t *features);
> > >
> > >  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> > >  void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > index 6e0508a231..cb9ffcb015 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > @@ -62,6 +62,12 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > >      return true;
> > >  }
> > >
> > > +/* If the guest is using some of these, SVQ cannot communicate */
> > > +bool vhost_svq_valid_guest_features(uint64_t *guest_features)
> > > +{
> > > +    return true;
> > > +}
> > > +
> > >  /* Forward guest notifications */
> > >  static void vhost_handle_guest_kick(EventNotifier *n)
> > >  {
> > > --
> > > 2.27.0
> > >
> >
>


* Re: [RFC PATCH v5 11/26] vhost: Handle host notifiers in SVQ
       [not found]         ` <CAJaqyWd4DQwRSL5StCft+3-uq12TW5x1o4DN_YW97D0JzOr2XQ@mail.gmail.com>
@ 2021-11-04  2:31           ` Jason Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Wang @ 2021-11-04  2:31 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

On Wed, Nov 3, 2021 at 3:40 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Nov 3, 2021 at 3:56 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Nov 2, 2021 at 4:47 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Nov 2, 2021 at 8:55 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > 在 2021/10/30 上午2:35, Eugenio Pérez 写道:
> > > > > If device supports host notifiers, this makes one jump less (kernel) to
> > > > > deliver SVQ notifications to it.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > >   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
> > > > >   hw/virtio/vhost-shadow-virtqueue.c | 23 ++++++++++++++++++++++-
> > > > >   2 files changed, 24 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > index 30ab9643b9..eb0a54f954 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > @@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > >   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> > > > >   const EventNotifier *vhost_svq_get_dev_kick_notifier(
> > > > >                                                 const VhostShadowVirtqueue *svq);
> > > > > +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr);
> > > > > +
> > > > >   void vhost_svq_start(struct vhost_dev *dev, unsigned idx,
> > > > >                        VhostShadowVirtqueue *svq, int svq_kick_fd);
> > > > >   void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > index fda60d11db..e3dcc039b6 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > @@ -29,6 +29,12 @@ typedef struct VhostShadowVirtqueue {
> > > > >        * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
> > > > >        */
> > > > >       EventNotifier svq_kick;
> > > > > +
> > > > > +    /* Device's host notifier memory region. NULL means no region */
> > > > > +    void *host_notifier_mr;
> > > > > +
> > > > > +    /* Virtio queue shadowing */
> > > > > +    VirtQueue *vq;
> > > > >   } VhostShadowVirtqueue;
> > > > >
> > > > >   /**
> > > > > @@ -50,7 +56,20 @@ static void vhost_handle_guest_kick(EventNotifier *n)
> > > > >           return;
> > > > >       }
> > > > >
> > > > > -    event_notifier_set(&svq->hdev_kick);
> > > > > +    if (svq->host_notifier_mr) {
> > > > > +        uint16_t *mr = svq->host_notifier_mr;
> > > > > +        *mr = virtio_get_queue_index(svq->vq);
> > > >
> > > >
> > > > Do we need barriers around the possible MMIO here?
> > >
> > > That's right, I missed them.
> > >
> > > >
> > > > To avoid those complicated stuff, I'd rather simply go with eventfd path.
> > > >
> > > > Note mmio and eventfd are not mutually exclusive.
> > >
> > > Actually we cannot ignore them since they are set in the guest. If SVQ
> > > does nothing about them, the guest's notification will travel directly
> > > to the vdpa device, and SVQ cannot intercept them.
> > >
> > > Taking that into account, it's actually fewer changes to move them to
> > > SVQ (like in this series) than to disable them (like in previous
> > > series). But we can go with disabling them for sure.
> >
> > I think we can simply disable the memory region for the doorbell, then
> > qemu/kvm will do all the rest for us.
> >
> > If we want to add barriers it would be a lot of architecture specific
> > instructions which looks like a burden for us to maintain in Qemu.
> >
> > So if we disable the memory region, KVM will fallback to the eventfd,
> > then qemu can intercept and we can simply relay it via kickfd. This
> > looks easier to maintain.
> >
> > Thanks
> >
>
> Any reason to go off-list? :).

Hit the wrong button :(

Adding the list back.

>
> I'm fine doing it that way, but it seems to me there must be a way
> since VFIO, UIO, etc would have the same issues. The worst case would
> be that these accesses are resolved through a syscall or similar. How
> does DPDK solve it?

I guess it has per-arch assembly, etc.

> Probably with specific asm as you say...

We can go this way, but to speed up the merging, I'd go with eventfd
first to avoid dependencies. And we can do that in the future as a
performance optimization.

Thanks

>
> Thanks!
>
>
> > >
> > > Thanks!
> > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > > > +    } else {
> > > > > +        event_notifier_set(&svq->hdev_kick);
> > > > > +    }
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Set the device's memory region notifier. addr = NULL clear it.
> > > > > + */
> > > > > +void vhost_svq_set_host_mr_notifier(VhostShadowVirtqueue *svq, void *addr)
> > > > > +{
> > > > > +    svq->host_notifier_mr = addr;
> > > > >   }
> > > > >
> > > > >   /**
> > > > > @@ -134,6 +153,7 @@ void vhost_svq_stop(struct vhost_dev *dev, unsigned idx,
> > > > >    */
> > > > >   VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > >   {
> > > > > +    int vq_idx = dev->vq_index + idx;
> > > > >       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> > > > >       int r;
> > > > >
> > > > > @@ -151,6 +171,7 @@ VhostShadowVirtqueue *vhost_svq_new(struct vhost_dev *dev, int idx)
> > > > >           goto err_init_hdev_call;
> > > > >       }
> > > > >
> > > > > +    svq->vq = virtio_get_queue(dev->vdev, vq_idx);
> > > > >       return g_steal_pointer(&svq);
> > > > >
> > > > >   err_init_hdev_call:
> > > >
> > >
> >
>


* Re: [RFC PATCH v5 21/26] vhost: Add vhost_svq_valid_guest_features to shadow vq
       [not found]         ` <CAJaqyWcDrEyFX8xvGmePdA5-tN88JuPV+=GU26+d=u-kSO6gsQ@mail.gmail.com>
@ 2021-11-04  2:34           ` Jason Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Jason Wang @ 2021-11-04  2:34 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

On Wed, Nov 3, 2021 at 3:44 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Nov 3, 2021 at 4:18 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Nov 2, 2021 at 4:10 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Nov 2, 2021 at 6:26 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Sat, Oct 30, 2021 at 2:44 AM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > >
> > > > > This allows it to test if the guest has acknowledged an invalid transport
> > > > > feature for SVQ. This will include packed vq layout or event_idx,
> > > > > where the VirtIO device needs help from SVQ.
> > > > >
> > > > > It is not needed at this moment, but since SVQ will not re-negotiate
> > > > > features with the guest, a failure to acknowledge them is fatal
> > > > > for SVQ.
> > > > >
> > > >
> > > > It's not clear to me why we need this. Maybe you can give me an
> > > > example. E.g., isn't it sufficient to filter out the device with
> > > > event_idx?
> > > >
> > >
> > > If the guest did negotiate _F_EVENT_IDX, it expects to be notified
> > > only when the device has marked a specific number of descriptors as used.
> > >
> > > If we use VirtQueue notification, the VirtQueue code handles it
> > > transparently. But if we want to be able to change the guest VQ's
> > > call_fd, we cannot use VirtQueue's, so this needs to be handled by SVQ
> > > code. And that is still not implemented.
> > >
> > > Of course in the event_idx case we could just ignore it and notify for
> > > every used descriptor, but it doesn't seem polite to me :). I will develop
> > > event_idx on top of this, either exposing the needed pieces in
> > > VirtQueue (I prefer this) or rolling our own in SVQ.
> >
> > Yes, but what I meant is, we can fail the SVQ enabling if the device
> > supports event_idx. Then we're sure guests won't negotiate event_idx.
> >
>
> We can go that way for sure, but then we leave out the scenario where
> the device supports event_idx but the guest has not acked it. This is
> a valid scenario for SVQ to work in.

If SVQ supports event_idx in the future, we can remove it from the
blacklist. But I think it is simpler to let SVQ use the same
features as the guest. So in this case SVQ won't use the event index.

Thanks

>
> Thanks!
>
> > Thanks
> >
> > >
> > > Same reasoning can be applied to unknown transport features.
> > >
> > > Thanks!
> > >
> > > > Thanks
> > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > ---
> > > > >  hw/virtio/vhost-shadow-virtqueue.h | 1 +
> > > > >  hw/virtio/vhost-shadow-virtqueue.c | 6 ++++++
> > > > >  2 files changed, 7 insertions(+)
> > > > >
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > index 946b2c6295..ac55588009 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > > @@ -16,6 +16,7 @@
> > > > >  typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> > > > >
> > > > >  bool vhost_svq_valid_device_features(uint64_t *features);
> > > > > +bool vhost_svq_valid_guest_features(uint64_t *features);
> > > > >
> > > > >  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
> > > > >  void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
> > > > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > index 6e0508a231..cb9ffcb015 100644
> > > > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > > @@ -62,6 +62,12 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
> > > > >      return true;
> > > > >  }
> > > > >
> > > > > +/* If the guest is using some of these, SVQ cannot communicate */
> > > > > +bool vhost_svq_valid_guest_features(uint64_t *guest_features)
> > > > > +{
> > > > > +    return true;
> > > > > +}
> > > > > +
> > > > >  /* Forward guest notifications */
> > > > >  static void vhost_handle_guest_kick(EventNotifier *n)
> > > > >  {
> > > > > --
> > > > > 2.27.0
> > > > >
> > > >
> > >
> >
>


* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
       [not found] ` <20211029183525.1776416-24-eperezma@redhat.com>
  2021-11-02  6:35   ` [RFC PATCH v5 23/26] util: Add iova_tree_alloc Jason Wang
@ 2021-11-23  6:56   ` Peter Xu
  2022-01-27  8:57   ` Peter Xu
  2 siblings, 0 replies; 20+ messages in thread
From: Peter Xu @ 2021-11-23  6:56 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

Hi, Eugenio,

Sorry for the super late response.

On Fri, Oct 29, 2021 at 08:35:22PM +0200, Eugenio Pérez wrote:

[...]

> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    struct IOVATreeAllocArgs args = {
> +        .new_size = map->size,
> +        .iova_begin = iova_begin,
> +        .iova_last = iova_last,
> +    };
> +
> +    if (iova_begin == 0) {
> +        /* Some devices do not like addr 0 */
> +        iova_begin += qemu_real_host_page_size;
> +    }

Any explanation of why zero is not welcome?

It would be great if we could move this out of iova-tree.c, because that
doesn't look like a good place to, e.g., reference qemu_real_host_page_size
anyway. The caller can simply pass in qemu_real_host_page_size as iova_begin
when needed (and I highly doubt it'll be a must for all iova-tree users...)

> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * Assuming low iova_begin, so no need to do a binary search to
> +     * locate the first node.
> +     *
> +     * TODO: We can improve the search speed if we save the beginning and the
> +     * end of holes, so we don't iterate over the previous saved ones.
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> +     * code a lot.
> +     *
> +     */
> +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> +    if (!iova_tree_alloc_map_in_hole(&args)) {
> +        /*
> +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> +         * (right, end) hole needs to be checked too
> +         */
> +        iova_tree_alloc_args_iterate(&args, NULL);
> +        if (!iova_tree_alloc_map_in_hole(&args)) {
> +            return IOVA_ERR_NOMEM;
> +        }
> +    }
> +
> +    map->iova = MAX(iova_begin,
> +                    args.hole_left ?
> +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> +    return iova_tree_insert(tree, map);
> +}

Re the algorithm - I totally agree Jason's version is much better.

Thanks,

-- 
Peter Xu


* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
       [not found] ` <20211029183525.1776416-24-eperezma@redhat.com>
  2021-11-02  6:35   ` [RFC PATCH v5 23/26] util: Add iova_tree_alloc Jason Wang
  2021-11-23  6:56   ` Peter Xu
@ 2022-01-27  8:57   ` Peter Xu
       [not found]     ` <CAJaqyWd52cXWHnLsgo=kP2Z7=VG6YKtxFGhTe7OamfYcZxhz6w@mail.gmail.com>
  2 siblings, 1 reply; 20+ messages in thread
From: Peter Xu @ 2022-01-27  8:57 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

On Fri, Oct 29, 2021 at 08:35:22PM +0200, Eugenio Pérez wrote:
> This iova tree function allows it to look for a hole in allocated
> regions and return a totally new translation for a given translated
> address.
> 
> Its usage is mainly to allow devices to access qemu address space,
> remapping guest's one into a new iova space where qemu can add chunks of
> addresses.
> 
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>  include/qemu/iova-tree.h |  17 +++++
>  util/iova-tree.c         | 139 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 156 insertions(+)
> 
> diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> index 8249edd764..33f9b2e13f 100644
> --- a/include/qemu/iova-tree.h
> +++ b/include/qemu/iova-tree.h
> @@ -29,6 +29,7 @@
>  #define  IOVA_OK           (0)
>  #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
>  #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
> +#define  IOVA_ERR_NOMEM    (-3) /* Cannot allocate */
>  
>  typedef struct IOVATree IOVATree;
>  typedef struct DMAMap {
> @@ -119,6 +120,22 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, hwaddr iova);
>   */
>  void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
>  
> +/**
> + * iova_tree_alloc:
> + *
> + * @tree: the iova tree to allocate from
> + * @map: the new map (as translated addr & size) to allocate in iova region
> + * @iova_begin: the minimum address of the allocation
> + * @iova_end: the maximum address of the allocation
> + *
> + * Allocates a new region of a given size, between iova_begin and iova_end.
> + *
> + * Return: Same as iova_tree_insert. The new mapping will not overlap any
> + * existing one, and the call fails with IOVA_ERR_NOMEM if there is no free
> + * contiguous range big enough. Caller can get the assigned iova in map->iova.
> + */
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_end);
> +
>  /**
>   * iova_tree_destroy:
>   *
> diff --git a/util/iova-tree.c b/util/iova-tree.c
> index 23ea35b7a4..27c921c4e2 100644
> --- a/util/iova-tree.c
> +++ b/util/iova-tree.c
> @@ -16,6 +16,36 @@ struct IOVATree {
>      GTree *tree;
>  };
>  
> +/* Args to pass to iova_tree_alloc foreach function. */
> +struct IOVATreeAllocArgs {
> +    /* Size of the desired allocation */
> +    size_t new_size;
> +
> +    /* The minimum address allowed in the allocation */
> +    hwaddr iova_begin;
> +
> +    /* The last address allowed in the allocation */
> +    hwaddr iova_last;
> +
> +    /* Second-to-last iterated map, can be NULL in the first node */
> +    const DMAMap *hole_left;
> +
> +    /* Last iterated map */
> +    const DMAMap *hole_right;

I slightly prefer having two more fields to cache the result:

       /* If found, we fill in the IOVA here */
       hwaddr iova_result;
       /* Whether have we found a valid IOVA */
       bool   iova_found;

IMHO they'll help readability.  More below.

> +};
> +
> +/**
> + * Iterate args to tne next hole
> + *
> + * @args  The alloc arguments
> + * @next  The next mapping in the tree. Can be NULL to signal the last one
> + */
> +static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
> +                                         const DMAMap *next) {
> +    args->hole_left = args->hole_right;
> +    args->hole_right = next;
> +}
> +
>  static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
>  {
>      const DMAMap *m1 = a, *m2 = b;
> @@ -107,6 +137,115 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
>      return IOVA_OK;
>  }
>  
> +/**
> + * Try to accomodate a map of size ret->size in a hole between
> + * max(end(hole_left), iova_start).

I think this function needs the most comments, and the above sentence doesn't
sound quite right... My try...

/*
 * Try to find an unallocated IOVA range between LEFT and RIGHT elements.
 *
 * There're three cases:
 *
 * (1) When LEFT==NULL, RIGHT must be non-NULL and it means we're iterating at
 *     the 1st element.
 *
 * (2) When RIGHT==NULL, LEFT must be non-NULL and it means we're iterating at
 *     the last element.
 *
 * (3) When both LEFT and RIGHT are non-NULL, this is the most common case,
 *     we'll try to find a hole between LEFT and RIGHT mapping.
 */

> + *
> + * @args Arguments to allocation
> + */
> +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> +{
> +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> +    uint64_t hole_start, hole_last;
> +
> +    if (right && right->iova + right->size < args->iova_begin) {
> +        return false;
> +    }
> +
> +    if (left && left->iova > args->iova_last) {
> +        return false;
> +    }
> +
> +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);

I assume these values should always be inclusive, hence

s/right->iova/right->iova + 1/

?

> +
> +    if (hole_last - hole_start > args->new_size) {
> +        /* We found a valid hole. */

IMHO it's cleaner if we simply set:

           args->iova_result = hole_start;

here, before stopping the iteration.

> +        return true;
> +    }
> +
> +    /* Keep iterating */
> +    return false;
> +}
> +
> +/**
> + * Foreach dma node in the tree, compare if there is a hole with its previous
> + * node (or minimum iova address allowed) and the node.
> + *
> + * @key   Node iterating
> + * @value Node iterating
> + * @pargs Struct to communicate with the outside world
> + *
> + * Return: false to keep iterating, true if it needs to break.
> + */
> +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> +                                         gpointer pargs)
> +{
> +    struct IOVATreeAllocArgs *args = pargs;
> +    DMAMap *node = value;
> +
> +    assert(key == value);
> +
> +    iova_tree_alloc_args_iterate(args, node);
> +    if (args->hole_left && args->hole_left->iova > args->iova_last) {

IMHO this check is redundant and can be dropped, as it's already done in
iova_tree_alloc_map_in_hole().

> +        return true;
> +    }
> +
> +    if (iova_tree_alloc_map_in_hole(args)) {
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    struct IOVATreeAllocArgs args = {
> +        .new_size = map->size,
> +        .iova_begin = iova_begin,
> +        .iova_last = iova_last,
> +    };
> +
> +    if (iova_begin == 0) {
> +        /* Some devices do not like addr 0 */
> +        iova_begin += qemu_real_host_page_size;
> +    }

(This should be dropped as the new version goes)

> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * Assuming low iova_begin, so no need to do a binary search to
> +     * locate the first node.

We could also mention something like this here:

        *
        * The traversing will cover all the possible holes but except the last
        * hole starting from the last element.  We need to handle it separately
        * below.
        *

> +     *
> +     * TODO: We can improve the search speed if we save the beginning and the
> +     * end of holes, so we don't iterate over the previous saved ones.
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). To do it with g_tree_foreach complicates the
> +     * code a lot.
> +     *
> +     */
> +    g_tree_foreach(tree->tree, iova_tree_alloc_traverse, &args);
> +    if (!iova_tree_alloc_map_in_hole(&args)) {

With iova_found, here it could be (hopefully) more readable:

       if (!args.iova_found) {
           /* If we failed to find a hole in 0..N-1 entries, try the last one */
           iova_tree_alloc_args_iterate(&args, NULL);
           iova_tree_alloc_map_in_hole(&args);
           if (!args.iova_found) {
               return IOVA_ERR_NOMEM;
           }
       }

       map->iova = args.iova_result;
       ...

Thanks,

> +        /*
> +         * 2nd try: Last iteration left args->right as the last DMAMap. But
> +         * (right, end) hole needs to be checked too
> +         */
> +        iova_tree_alloc_args_iterate(&args, NULL);
> +        if (!iova_tree_alloc_map_in_hole(&args)) {
> +            return IOVA_ERR_NOMEM;
> +        }
> +    }
> +
> +    map->iova = MAX(iova_begin,
> +                    args.hole_left ?
> +                    args.hole_left->iova + args.hole_left->size + 1 : 0);
> +    return iova_tree_insert(tree, map);
> +}
> +
>  void iova_tree_destroy(IOVATree *tree)
>  {
>      g_tree_destroy(tree->tree);
> -- 
> 2.27.0
> 

-- 
Peter Xu


* Re: [RFC PATCH v5 23/26] util: Add iova_tree_alloc
       [not found]     ` <CAJaqyWd52cXWHnLsgo=kP2Z7=VG6YKtxFGhTe7OamfYcZxhz6w@mail.gmail.com>
@ 2022-01-27 11:25       ` Peter Xu
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Xu @ 2022-01-27 11:25 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Eduardo Habkost, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Markus Armbruster, Stefan Hajnoczi,
	Xiao W Wang, Harpreet Singh Anand, Eli Cohen, Paolo Bonzini,
	Eric Blake, virtualization, Parav Pandit

On Thu, Jan 27, 2022 at 11:09:44AM +0100, Eugenio Perez Martin wrote:
> > > +/**
> > > + * Try to accommodate a map of size ret->size in a hole between
> > > + * max(end(hole_left), iova_start).
> >
> > I think this function needs the most comments, and the above sentence doesn't
> > sound quite right... My try...
> >
> > /*
> >  * Try to find an unallocated IOVA range between LEFT and RIGHT elements.
> >  *
> >  * There're three cases:
> >  *
> >  * (1) When LEFT==NULL, RIGHT must be non-NULL and it means we're iterating at
> >  *     the 1st element.
> >  *
> >  * (2) When RIGHT==NULL, LEFT must be non-NULL and it means we're iterating at
> >  *     the last element.
> >  *
> >  * (3) When both LEFT and RIGHT are non-NULL, this is the most common case,
> >  *     we'll try to find a hole between LEFT and RIGHT mapping.
> >  */
> >
> 
> This is also called with left == NULL and right == NULL for the first
> allocation with an empty tree. This allows iova_tree_alloc to have the
> same code path whether the tree is empty or not.
> 
> But I can add the use cases in the doc for sure.

Ah, right.

> 
> > > + *
> > > + * @args Arguments to allocation
> > > + */
> > > +static bool iova_tree_alloc_map_in_hole(const struct IOVATreeAllocArgs *args)
> > > +{
> > > +    const DMAMap *left = args->hole_left, *right = args->hole_right;
> > > +    uint64_t hole_start, hole_last;
> > > +
> > > +    if (right && right->iova + right->size < args->iova_begin) {
> > > +        return false;
> > > +    }
> > > +
> > > +    if (left && left->iova > args->iova_last) {
> > > +        return false;
> > > +    }
> > > +
> > > +    hole_start = MAX(left ? left->iova + left->size + 1 : 0, args->iova_begin);
> > > +    hole_last = MIN(right ? right->iova : HWADDR_MAX, args->iova_last);
> >
> > I assume these values should always be inclusive, hence
> >
> > s/right->iova/right->iova + 1/
> >
> > ?
> >
> 
> Right, it is confusing the way it's written. But I think it should be
> right->iova - 1 in any case, to make it the hole's last element,
> shouldn't it?

I was thinking "-1" but I failed to make it coherent with the thought when
typing... Heh.

> 
> Would it work better to rename the variable hole_last to hole_end? If
> not, we have a special case in the second allocation when iova_begin == 0:
> - We successfully allocate a DMAMap of size N. By the way the
> algorithm works, it starts at 0, so [0, N] is allocated.

If we're always talking about inclusive ranges, shouldn't it be [0, N-1]?

> - We try to allocate a second one of size M. At the first iteration,
> "right" is the previously allocated DMAMap.
> Using the -1 trick we get hole_end == HWADDR_MAX.

I'm not sure I get the point, but both names look fine to me.  As long as we
use inclusive ranges, then hole_end/last will be limited to HWADDR_MAX.

> > > +static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
> > > +                                         gpointer pargs)
> > > +{
> > > +    struct IOVATreeAllocArgs *args = pargs;
> > > +    DMAMap *node = value;
> > > +
> > > +    assert(key == value);
> > > +
> > > +    iova_tree_alloc_args_iterate(args, node);
> > > +    if (args->hole_left && args->hole_left->iova > args->iova_last) {
> >
> > IMHO this check is redundant and can be dropped, as it's already done in
> > iova_tree_alloc_map_in_hole().
> >
> 
> Assuming we add "iova_found" to IOVATreeAllocArgs as you propose,
> iova_tree_alloc_map_in_hole returns true if we are able to allocate a
> DMAMap entry, so no more iterations are needed. But if it returns
> false, it simply means that the DMAMap cannot be allocated between
> left (or iova_begin) and right (iova_end). It doesn't tell you whether
> you can keep iterating or not. In other words, false == keep iterating
> if you can.
> 
> This other check signals the end of the available hole, and avoids
> iterating beyond iova_last in the (unlikely?) case we have more nodes
> to iterate beyond that.
> 
> I'll try to make it more explicit.

Makes sense.  Comment works.

Thanks,

-- 
Peter Xu

