virtualization.lists.linux-foundation.org archive mirror
* Re: [PATCH 21/31] util: Add iova_tree_alloc
       [not found] ` <20220121202733.404989-22-eperezma@redhat.com>
@ 2022-01-24  4:32   ` Peter Xu
       [not found]     ` <CAJaqyWf--wbNZz5ZzbpixD9op_fO5fV01kbYXzG097c_NkqYrw@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Xu @ 2022-01-24  4:32 UTC (permalink / raw)
  To: Eugenio Pérez
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-devel, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Fri, Jan 21, 2022 at 09:27:23PM +0100, Eugenio Pérez wrote:
> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> +                    hwaddr iova_last)
> +{
> +    const DMAMapInternal *last, *i;
> +
> +    assert(iova_begin < iova_last);
> +
> +    /*
> +     * Find a valid hole for the mapping
> +     *
> +     * TODO: Replace all this with g_tree_node_first/next/last when available
> +     * (from glib since 2.68). Using a separated QTAILQ complicates code.
> +     *
> +     * Try to allocate first at the end of the list.
> +     */
> +    last = QTAILQ_LAST(&tree->list);
> +    if (iova_tree_alloc_map_in_hole(last, NULL, iova_begin, iova_last,
> +                                    map->size)) {
> +        goto alloc;
> +    }
> +
> +    /* Look for inner hole */
> +    last = NULL;
> +    for (i = QTAILQ_FIRST(&tree->list); i;
> +         last = i, i = QTAILQ_NEXT(i, entry)) {
> +        if (iova_tree_alloc_map_in_hole(last, i, iova_begin, iova_last,
> +                                        map->size)) {
> +            goto alloc;
> +        }
> +    }
> +
> +    return IOVA_ERR_NOMEM;
> +
> +alloc:
> +    map->iova = last ? last->map.iova + last->map.size + 1 : iova_begin;
> +    return iova_tree_insert(tree, map);
> +}

Hi, Eugenio,

Have you tried with what Jason suggested previously?

  https://lore.kernel.org/qemu-devel/CACGkMEtZAPd9xQTP_R4w296N_Qz7VuV1FLnb544fEVoYO0of+g@mail.gmail.com/

That solution still sounds very sensible to me even without the newly
introduced list in the previous two patches.

IMHO we could move "DMAMap *previous, *this" into the IOVATreeAllocArgs*
structure that is passed into the traverse func though, so it'll naturally work
with threading.
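
Something like the below, just as a sketch (the field and callback names are
invented here, and DMAMap's inclusive-size bookkeeping is simplified):

    typedef struct IOVATreeAllocArgs {
        /* Inputs: requested size and lower bound of the usable range */
        hwaddr new_size;
        hwaddr iova_begin;
        /* State: last mapping visited, instead of a global or a list */
        const DMAMap *prev;
        /* Output */
        hwaddr result_iova;
        bool found;
    } IOVATreeAllocArgs;

    /* g_tree_foreach() visits the nodes in ascending key order */
    static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
                                             gpointer data)
    {
        IOVATreeAllocArgs *args = data;
        const DMAMap *this = value;
        hwaddr hole_start = args->prev ?
            args->prev->iova + args->prev->size + 1 : args->iova_begin;

        if (this->iova >= hole_start &&
            this->iova - hole_start >= args->new_size) {
            args->result_iova = hole_start;
            args->found = true;
            return TRUE;    /* hole found, stop traversing */
        }

        args->prev = this;
        return FALSE;       /* keep going */
    }

The caller would still check the tail hole (past the last mapping) once
g_tree_foreach() returns without a hit, but all the state lives in the args
struct, not in the tree.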

Or is there any blocker for it?

Thanks,

-- 
Peter Xu


* Re: [PATCH 21/31] util: Add iova_tree_alloc
       [not found]     ` <CAJaqyWf--wbNZz5ZzbpixD9op_fO5fV01kbYXzG097c_NkqYrw@mail.gmail.com>
@ 2022-01-24 11:07       ` Peter Xu
       [not found]         ` <CAJaqyWcdpTr2X4VuAN2NLmpviCjDoAaY269+VQGZ7-F6myOhSw@mail.gmail.com>
  2022-01-30  5:06       ` Jason Wang
  1 sibling, 1 reply; 52+ messages in thread
From: Peter Xu @ 2022-01-24 11:07 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Mon, Jan 24, 2022 at 10:20:55AM +0100, Eugenio Perez Martin wrote:
> On Mon, Jan 24, 2022 at 5:33 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Jan 21, 2022 at 09:27:23PM +0100, Eugenio Pérez wrote:
> > > +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> 
> I forgot to s/iova_tree_alloc/iova_tree_alloc_map/ here.
> 
> > > +                    hwaddr iova_last)
> > > +{
> > > +    const DMAMapInternal *last, *i;
> > > +
> > > +    assert(iova_begin < iova_last);
> > > +
> > > +    /*
> > > +     * Find a valid hole for the mapping
> > > +     *
> > > +     * TODO: Replace all this with g_tree_node_first/next/last when available
> > > +     * (from glib since 2.68). Using a separated QTAILQ complicates code.
> > > +     *
> > > +     * Try to allocate first at the end of the list.
> > > +     */
> > > +    last = QTAILQ_LAST(&tree->list);
> > > +    if (iova_tree_alloc_map_in_hole(last, NULL, iova_begin, iova_last,
> > > +                                    map->size)) {
> > > +        goto alloc;
> > > +    }
> > > +
> > > +    /* Look for inner hole */
> > > +    last = NULL;
> > > +    for (i = QTAILQ_FIRST(&tree->list); i;
> > > +         last = i, i = QTAILQ_NEXT(i, entry)) {
> > > +        if (iova_tree_alloc_map_in_hole(last, i, iova_begin, iova_last,
> > > +                                        map->size)) {
> > > +            goto alloc;
> > > +        }
> > > +    }
> > > +
> > > +    return IOVA_ERR_NOMEM;
> > > +
> > > +alloc:
> > > +    map->iova = last ? last->map.iova + last->map.size + 1 : iova_begin;
> > > +    return iova_tree_insert(tree, map);
> > > +}
> >
> > Hi, Eugenio,
> >
> > Have you tried with what Jason suggested previously?
> >
> >   https://lore.kernel.org/qemu-devel/CACGkMEtZAPd9xQTP_R4w296N_Qz7VuV1FLnb544fEVoYO0of+g@mail.gmail.com/
> >
> > That solution still sounds very sensible to me even without the newly
> > introduced list in the previous two patches.
> >
> > IMHO we could move "DMAMap *previous, *this" into the IOVATreeAllocArgs*
> > structure that is passed into the traverse func though, so it'll naturally work
> > with threading.
> >
> > Or is there any blocker for it?
> >
> 
> Hi Peter,
> 
> I can try that solution again, but the main problem was the special
> cases at the beginning and the end.
> 
> For the function to locate a hole, starting with DMAMap first = {.iova = 0,
> .size = 0} means that it cannot count address 0 as part of a hole.
> 
> In other words, with that algorithm, if the only valid hole is [0, N)
> and we try to allocate a block of size N, it would fail.
> 
> The same happens with iova_end, although in practice it seems that the
> IOMMU hardware's iova upper limit is never UINT64_MAX.
> 
> Maybe we could treat .size = 0 as a special case? It seems cleaner to me
> either to build the list (but insert needs to take the list into account)
> or to explicitly state that prev == NULL means to use iova_first.

Sounds good to me.  I didn't mean to copy-paste Jason's code, but IMHO what
Jason wanted to show is the general concept - IOW, the fundamental idea (to me)
is that the tree will be traversed in order, hence maintaining another list
structure is redundant.

> 
> Another solution that comes to mind: to handle both exceptions outside
> of the traverse function, and skip the first iteration with something
> like:
> 
> if (prev == NULL) {
>   prev = this;
>   return false; /* continue */
> }
> 
> So the traverse callback has far fewer code paths. Would it work for
> you if I send a separate RFC, apart from SVQ, only to validate this?

Sure. :-)

If you want, imho you can also attach the patch when you reply, so the
discussion context won't be lost either.

-- 
Peter Xu


* Re: [PATCH 21/31] util: Add iova_tree_alloc
       [not found]         ` <CAJaqyWcdpTr2X4VuAN2NLmpviCjDoAaY269+VQGZ7-F6myOhSw@mail.gmail.com>
@ 2022-01-27  8:06           ` Peter Xu
       [not found]             ` <CAJaqyWczZ7C_vbwugyN9bEgOVuRokGqVMb_g5UK_R4F8O+qKOA@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Xu @ 2022-01-27  8:06 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Tue, Jan 25, 2022 at 10:40:01AM +0100, Eugenio Perez Martin wrote:
> So I think that the first step to remove complexity from the old one
> is to remove iova_begin and iova_end.
> 
> As Jason points out, removing iova_end is easier. It has the drawback
> of having to traverse all of the list beyond iova_end, but a well-formed
> iova tree should contain no nodes there. If the guest can manipulate it,
> it's only hurting itself by adding nodes to it.
> 
> It's possible to extract the check for hole_right (or this in Jason's
> proposal) as a special case too.
> 
> But removing the iova_begin parameter is more complicated. We cannot
> know if it's a valid hole without knowing iova_begin, and we cannot
> resume traversing. Could we assume iova_begin will always be 0? I
> think not: the vdpa device can return anything through a syscall.

Frankly I don't know what syscall you're talking about, but after a second
thought, and after I went back and re-read your previous version more carefully
(the one without the list), I think it works in general.  I should have tried
harder when reviewing the first time!

I mean this one:

https://lore.kernel.org/qemu-devel/20211029183525.1776416-24-eperezma@redhat.com/

Though this time I have some comments on the details.

Personally I like that one (probably with some amendments to the old version)
more than the current list-based approach.  But I'd like to know your thoughts
too (including Jason's).  I'll comment further in that thread soon.

Thanks,

-- 
Peter Xu


* Re: [PATCH 21/31] util: Add iova_tree_alloc
       [not found]             ` <CAJaqyWczZ7C_vbwugyN9bEgOVuRokGqVMb_g5UK_R4F8O+qKOA@mail.gmail.com>
@ 2022-01-28  3:57               ` Peter Xu
  2022-01-28  5:55                 ` Jason Wang
  0 siblings, 1 reply; 52+ messages in thread
From: Peter Xu @ 2022-01-28  3:57 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Thu, Jan 27, 2022 at 10:24:27AM +0100, Eugenio Perez Martin wrote:
> On Thu, Jan 27, 2022 at 9:06 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Tue, Jan 25, 2022 at 10:40:01AM +0100, Eugenio Perez Martin wrote:
> > > So I think that the first step to remove complexity from the old one
> > > is to remove iova_begin and iova_end.
> > >
> > > As Jason points out, removing iova_end is easier. It has the drawback
> > > of having to traverse all of the list beyond iova_end, but a well-formed
> > > iova tree should contain no nodes there. If the guest can manipulate it,
> > > it's only hurting itself by adding nodes to it.
> > >
> > > It's possible to extract the check for hole_right (or this in Jason's
> > > proposal) as a special case too.
> > >
> > > But removing the iova_begin parameter is more complicated. We cannot
> > > know if it's a valid hole without knowing iova_begin, and we cannot
> > > resume traversing. Could we assume iova_begin will always be 0? I
> > > think not: the vdpa device can return anything through a syscall.
> >
> > Frankly I don't know what syscall you're talking about,
> 
> I meant VHOST_VDPA_GET_IOVA_RANGE, which allows qemu to know the valid
> range of iova addresses. We get a pair of uint64_t values from it that
> indicate the minimum and maximum iova address the device (or iommu)
> supports.
> 
> We must allocate iova ranges within that address range, which
> complicates this algorithm a little bit. Since the SVQ iova addresses
> are not GPA, qemu needs extra code to be able to allocate and free
> them, creating a new custom iova address space.
> 
> Please let me know if you want more details or if you prefer me to
> give more context in the patch message.

That's good enough, thanks.
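
For the record the uapi side is tiny (struct from linux/vhost_types.h, ioctl
from linux/vhost.h; the usage below is only a sketch, with device_fd being
the /dev/vhost-vdpa-N descriptor and error handling omitted):

    struct vhost_vdpa_iova_range {
        __u64 first;    /* first valid iova address */
        __u64 last;     /* last valid iova address, inclusive */
    };

    struct vhost_vdpa_iova_range range;
    if (ioctl(device_fd, VHOST_VDPA_GET_IOVA_RANGE, &range) == 0) {
        /* every allocation must land inside [range.first, range.last] */
    }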

> 
> > I mean this one:
> >
> > https://lore.kernel.org/qemu-devel/20211029183525.1776416-24-eperezma@redhat.com/
> >
> > Though this time I have some comments on the details.
> >
> > Personally I like that one (probably with some amendments to the old version)
> > more than the current list-based approach.  But I'd like to know your thoughts
> > too (including Jason's).  I'll comment further in that thread soon.
> >
> 
> Sure, I'm fine with whatever solution we choose, but I'm just running
> out of ideas to simplify it. Reading your suggestions on the old RFC now.
> 
> Overall I feel the list-based one is both more convenient and easier to
> delete when qemu raises the minimal glib version, but it adds a lot
> more code.
> 
> It could add less code with these less elegant changes:
> * If we just put the list entry in the DMAMap itself, although it
> exposes unneeded implementation details.
> * We force the iova tree either to be an allocation-based or an
> insertion-based, but not both. In other words, you can only either use
> iova_tree_alloc or iova_tree_insert on the same tree.

Yeah, I just noticed yesterday that there's no easy choice here.  Let's go
with either way; it shouldn't block the rest of the code.  It'll be good if
Jason or Michael share their preferences too.

> 
> I have a few tests to check the algorithms, but they are not in the
> qemu test format. I will post them so we all can understand better
> what is expected from this.

Sure.  Thanks.

-- 
Peter Xu


* Re: [PATCH 21/31] util: Add iova_tree_alloc
  2022-01-28  3:57               ` Peter Xu
@ 2022-01-28  5:55                 ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-01-28  5:55 UTC (permalink / raw)
  To: Peter Xu, Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/28 11:57 AM, Peter Xu wrote:
> On Thu, Jan 27, 2022 at 10:24:27AM +0100, Eugenio Perez Martin wrote:
>> On Thu, Jan 27, 2022 at 9:06 AM Peter Xu <peterx@redhat.com> wrote:
>>> On Tue, Jan 25, 2022 at 10:40:01AM +0100, Eugenio Perez Martin wrote:
>>>> So I think that the first step to remove complexity from the old one
>>>> is to remove iova_begin and iova_end.
>>>>
>>>> As Jason points out, removing iova_end is easier. It has the drawback
>>>> of having to traverse all of the list beyond iova_end, but a well-formed
>>>> iova tree should contain no nodes there. If the guest can manipulate it,
>>>> it's only hurting itself by adding nodes to it.
>>>>
>>>> It's possible to extract the check for hole_right (or this in Jason's
>>>> proposal) as a special case too.
>>>>
>>>> But removing the iova_begin parameter is more complicated. We cannot
>>>> know if it's a valid hole without knowing iova_begin, and we cannot
>>>> resume traversing. Could we assume iova_begin will always be 0? I
>>>> think not: the vdpa device can return anything through a syscall.
>>> Frankly I don't know what syscall you're talking about,
>> I meant VHOST_VDPA_GET_IOVA_RANGE, which allows qemu to know the valid
>> range of iova addresses. We get a pair of uint64_t values from it that
>> indicate the minimum and maximum iova address the device (or iommu)
>> supports.
>>
>> We must allocate iova ranges within that address range, which
>> complicates this algorithm a little bit. Since the SVQ iova addresses
>> are not GPA, qemu needs extra code to be able to allocate and free
>> them, creating a new custom iova address space.
>>
>> Please let me know if you want more details or if you prefer me to
>> give more context in the patch message.
> That's good enough, thanks.
>
>>> I mean this one:
>>>
>>> https://lore.kernel.org/qemu-devel/20211029183525.1776416-24-eperezma@redhat.com/
>>>
>>> Though this time I have some comments on the details.
>>>
>>> Personally I like that one (probably with some amendments to the old version)
>>> more than the current list-based approach.  But I'd like to know your thoughts
>>> too (including Jason's).  I'll comment further in that thread soon.
>>>
>> Sure, I'm fine with whatever solution we choose, but I'm just running
>> out of ideas to simplify it. Reading your suggestions on the old RFC now.
>>
>> Overall I feel the list-based one is both more convenient and easier to
>> delete when qemu raises the minimal glib version, but it adds a lot
>> more code.
>>
>> It could add less code with these less elegant changes:
>> * If we just put the list entry in the DMAMap itself, although it
>> exposes unneeded implementation details.
>> * We force the iova tree either to be an allocation-based or an
>> insertion-based, but not both. In other words, you can only either use
>> iova_tree_alloc or iova_tree_insert on the same tree.


This seems like an odd API, I must say :(


> Yeah, I just noticed yesterday that there's no easy choice here.  Let's go
> with either way; it shouldn't block the rest of the code.  It'll be good if
> Jason or Michael share their preferences too.


(Haven't gone through the code deeply)

I wonder, how about just copy-pasting gtree_node_first|last()? A quick
Google search told me it's not complicated.
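
(Alternatively, to avoid copying glib internals at all, "first" can be
emulated with g_tree_foreach(), which already walks in key order.  A rough
sketch, with made-up names:)

    static gboolean take_first_cb(gpointer key, gpointer value, gpointer data)
    {
        *(gpointer *)data = value;
        return TRUE;    /* stop right after the smallest key */
    }

    static gpointer tree_first_value(GTree *tree)
    {
        gpointer value = NULL;
        g_tree_foreach(tree, take_first_cb, &value);
        return value;
    }

Getting the last node this way is O(n), though, so copy-pasting may still be
the better fit for g_tree_node_last().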

Thanks


>
>> I have a few tests to check the algorithms, but they are not in the
>> qemu test format. I will post them so we all can understand better
>> what is expected from this.
> Sure.  Thanks.
>


* Re: [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions
       [not found] ` <20220121202733.404989-2-eperezma@redhat.com>
@ 2022-01-28  5:59   ` Jason Wang
       [not found]     ` <CAJaqyWffGzYv2+HufFZzzBPtu5z3_vaKh4evGXqj7hqTB0WU3A@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-28  5:59 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> vhost_vdpa_set_features and vhost_vdpa_init need to use
> vhost_vdpa_get_features in svq mode.
>
> vhost_vdpa_dev_start needs to use almost all _set_ functions:
> vhost_vdpa_set_vring_dev_kick, vhost_vdpa_set_vring_dev_call,
> vhost_vdpa_set_dev_vring_base and vhost_vdpa_set_dev_vring_num.
>
> No functional change intended.


Is it related (a must) to the SVQ code?

Thanks


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 164 ++++++++++++++++++++---------------------
>   1 file changed, 82 insertions(+), 82 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 04ea43704f..6c10a7f05f 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -342,41 +342,6 @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
>       return v->index != 0;
>   }
>   
> -static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> -{
> -    struct vhost_vdpa *v;
> -    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> -    trace_vhost_vdpa_init(dev, opaque);
> -    int ret;
> -
> -    /*
> -     * Similar to VFIO, we end up pinning all guest memory and have to
> -     * disable discarding of RAM.
> -     */
> -    ret = ram_block_discard_disable(true);
> -    if (ret) {
> -        error_report("Cannot set discarding of RAM broken");
> -        return ret;
> -    }
> -
> -    v = opaque;
> -    v->dev = dev;
> -    dev->opaque =  opaque ;
> -    v->listener = vhost_vdpa_memory_listener;
> -    v->msg_type = VHOST_IOTLB_MSG_V2;
> -
> -    vhost_vdpa_get_iova_range(v);
> -
> -    if (vhost_vdpa_one_time_request(dev)) {
> -        return 0;
> -    }
> -
> -    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> -                               VIRTIO_CONFIG_S_DRIVER);
> -
> -    return 0;
> -}
> -
>   static void vhost_vdpa_host_notifier_uninit(struct vhost_dev *dev,
>                                               int queue_index)
>   {
> @@ -506,24 +471,6 @@ static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
>       return 0;
>   }
>   
> -static int vhost_vdpa_set_features(struct vhost_dev *dev,
> -                                   uint64_t features)
> -{
> -    int ret;
> -
> -    if (vhost_vdpa_one_time_request(dev)) {
> -        return 0;
> -    }
> -
> -    trace_vhost_vdpa_set_features(dev, features);
> -    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> -    if (ret) {
> -        return ret;
> -    }
> -
> -    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> -}
> -
>   static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>   {
>       uint64_t features;
> @@ -646,35 +593,6 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
>       return ret;
>    }
>   
> -static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> -{
> -    struct vhost_vdpa *v = dev->opaque;
> -    trace_vhost_vdpa_dev_start(dev, started);
> -
> -    if (started) {
> -        vhost_vdpa_host_notifiers_init(dev);
> -        vhost_vdpa_set_vring_ready(dev);
> -    } else {
> -        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> -    }
> -
> -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> -        return 0;
> -    }
> -
> -    if (started) {
> -        memory_listener_register(&v->listener, &address_space_memory);
> -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> -    } else {
> -        vhost_vdpa_reset_device(dev);
> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> -                                   VIRTIO_CONFIG_S_DRIVER);
> -        memory_listener_unregister(&v->listener);
> -
> -        return 0;
> -    }
> -}
> -
>   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
>                                        struct vhost_log *log)
>   {
> @@ -735,6 +653,35 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>   }
>   
> +static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    trace_vhost_vdpa_dev_start(dev, started);
> +
> +    if (started) {
> +        vhost_vdpa_host_notifiers_init(dev);
> +        vhost_vdpa_set_vring_ready(dev);
> +    } else {
> +        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> +    }
> +
> +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> +        return 0;
> +    }
> +
> +    if (started) {
> +        memory_listener_register(&v->listener, &address_space_memory);
> +        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +    } else {
> +        vhost_vdpa_reset_device(dev);
> +        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +                                   VIRTIO_CONFIG_S_DRIVER);
> +        memory_listener_unregister(&v->listener);
> +
> +        return 0;
> +    }
> +}
> +
>   static int vhost_vdpa_get_features(struct vhost_dev *dev,
>                                        uint64_t *features)
>   {
> @@ -745,6 +692,24 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
>       return ret;
>   }
>   
> +static int vhost_vdpa_set_features(struct vhost_dev *dev,
> +                                   uint64_t features)
> +{
> +    int ret;
> +
> +    if (vhost_vdpa_one_time_request(dev)) {
> +        return 0;
> +    }
> +
> +    trace_vhost_vdpa_set_features(dev, features);
> +    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> +}
> +
>   static int vhost_vdpa_set_owner(struct vhost_dev *dev)
>   {
>       if (vhost_vdpa_one_time_request(dev)) {
> @@ -772,6 +737,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>       return true;
>   }
>   
> +static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
> +{
> +    struct vhost_vdpa *v;
> +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> +    trace_vhost_vdpa_init(dev, opaque);
> +    int ret;
> +
> +    /*
> +     * Similar to VFIO, we end up pinning all guest memory and have to
> +     * disable discarding of RAM.
> +     */
> +    ret = ram_block_discard_disable(true);
> +    if (ret) {
> +        error_report("Cannot set discarding of RAM broken");
> +        return ret;
> +    }
> +
> +    v = opaque;
> +    v->dev = dev;
> +    dev->opaque =  opaque ;
> +    v->listener = vhost_vdpa_memory_listener;
> +    v->msg_type = VHOST_IOTLB_MSG_V2;
> +
> +    vhost_vdpa_get_iova_range(v);
> +
> +    if (vhost_vdpa_one_time_request(dev)) {
> +        return 0;
> +    }
> +
> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> +                               VIRTIO_CONFIG_S_DRIVER);
> +
> +    return 0;
> +}
> +
>   const VhostOps vdpa_ops = {
>           .backend_type = VHOST_BACKEND_TYPE_VDPA,
>           .vhost_backend_init = vhost_vdpa_init,


* Re: [PATCH 02/31] vhost: Add VhostShadowVirtqueue
       [not found] ` <20220121202733.404989-3-eperezma@redhat.com>
@ 2022-01-28  6:00   ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-01-28  6:00 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Vhost shadow virtqueue (SVQ) is an intermediate jump for virtqueue
> notifications and buffers, allowing qemu to track them. While qemu is
> forwarding the buffers and virtqueue changes, it is able to commit the
> memory that is being dirtied, the same way regular qemu VirtIO devices
> do.
>
> This commit only exposes basic SVQ allocation and freeing. The next patches
> of the series add functionality like notification and buffer forwarding.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h | 21 ++++++++++
>   hw/virtio/vhost-shadow-virtqueue.c | 64 ++++++++++++++++++++++++++++++
>   hw/virtio/meson.build              |  2 +-
>   3 files changed, 86 insertions(+), 1 deletion(-)
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> new file mode 100644
> index 0000000000..61ea112002
> --- /dev/null
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -0,0 +1,21 @@
> +/*
> + * vhost shadow virtqueue
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef VHOST_SHADOW_VIRTQUEUE_H
> +#define VHOST_SHADOW_VIRTQUEUE_H
> +
> +#include "hw/virtio/vhost.h"
> +
> +typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
> +
> +VhostShadowVirtqueue *vhost_svq_new(void);
> +
> +void vhost_svq_free(VhostShadowVirtqueue *vq);
> +
> +#endif
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> new file mode 100644
> index 0000000000..5ee7b401cb
> --- /dev/null
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -0,0 +1,64 @@
> +/*
> + * vhost shadow virtqueue
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
> +
> +#include "qemu/error-report.h"
> +#include "qemu/event_notifier.h"
> +
> +/* Shadow virtqueue to relay notifications */
> +typedef struct VhostShadowVirtqueue {
> +    /* Shadow kick notifier, sent to vhost */
> +    EventNotifier hdev_kick;
> +    /* Shadow call notifier, sent to vhost */
> +    EventNotifier hdev_call;
> +} VhostShadowVirtqueue;
> +
> +/**
> + * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> + * methods and file descriptors.
> + */
> +VhostShadowVirtqueue *vhost_svq_new(void)
> +{
> +    g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> +    int r;
> +
> +    r = event_notifier_init(&svq->hdev_kick, 0);
> +    if (r != 0) {
> +        error_report("Couldn't create kick event notifier: %s",
> +                     strerror(errno));
> +        goto err_init_hdev_kick;
> +    }
> +
> +    r = event_notifier_init(&svq->hdev_call, 0);
> +    if (r != 0) {
> +        error_report("Couldn't create call event notifier: %s",
> +                     strerror(errno));
> +        goto err_init_hdev_call;
> +    }
> +
> +    return g_steal_pointer(&svq);
> +
> +err_init_hdev_call:
> +    event_notifier_cleanup(&svq->hdev_kick);
> +
> +err_init_hdev_kick:
> +    return NULL;
> +}
> +
> +/**
> + * Free the resources of the shadow virtqueue.
> + */
> +void vhost_svq_free(VhostShadowVirtqueue *vq)
> +{
> +    event_notifier_cleanup(&vq->hdev_kick);
> +    event_notifier_cleanup(&vq->hdev_call);
> +    g_free(vq);
> +}
> diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> index 521f7d64a8..2dc87613bc 100644
> --- a/hw/virtio/meson.build
> +++ b/hw/virtio/meson.build
> @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
>   
>   virtio_ss = ss.source_set()
>   virtio_ss.add(files('virtio.c'))
> -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c'))
> +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))


I wonder if we need a dedicated config option for shadow virtqueue.
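
(If we do, something along these lines should be enough; just a sketch
assuming the usual hw/virtio/Kconfig + meson.build pattern, with a made-up
symbol name:)

    config VHOST_SHADOW_VIRTQUEUE
        bool
        depends on VHOST

and in hw/virtio/meson.build:

    virtio_ss.add(when: 'CONFIG_VHOST_SHADOW_VIRTQUEUE',
                  if_true: files('vhost-shadow-virtqueue.c'))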

Thanks


>   virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
>   virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
>   virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))


* Re: [PATCH 00/31] vDPA shadow virtqueue
       [not found] <20220121202733.404989-1-eperezma@redhat.com>
                   ` (2 preceding siblings ...)
       [not found] ` <20220121202733.404989-3-eperezma@redhat.com>
@ 2022-01-28  6:02 ` Jason Wang
       [not found]   ` <CAJaqyWfWxQSJc3YMpF6g7VwZBN_ab0Z+1nXgWH1sg+uBaOYgBQ@mail.gmail.com>
       [not found] ` <20220121202733.404989-4-eperezma@redhat.com>
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-28  6:02 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. This
> is intended as a new method of tracking the memory the devices touch
> during a migration process: instead of relying on the vhost device's dirty
> logging capability, SVQ intercepts the VQ dataplane, forwarding the
> descriptors between VM and device. This way qemu is the effective
> writer of the guest's memory, like in qemu's emulated virtio device
> operation.
>
> When SVQ is enabled, qemu offers a new virtual address space to the
> device to read and write into, and it maps new vrings and the guest
> memory in it. SVQ also intercepts kicks and calls between the device
> and the guest. Relaying used buffers causes the dirty memory to be
> tracked, but as of this RFC SVQ is not enabled on migration automatically.
>
> Thanks to being a buffer relay system, SVQ can also be used to
> connect devices and drivers with different capabilities, like
> devices that only support packed vrings (not split) with old guests
> whose drivers have no packed support.
>
> It is based on the ideas of DPDK SW-assisted LM, in the series at
> https://patchwork.dpdk.org/cover/48370/ . However, this series does
> not map the shadow vq in the guest's VA, but in qemu's.
>
> This version of SVQ is limited in the number of features it can use with
> the guest and device, because this series is already very big otherwise.
> Features like indirect descriptors or event_idx will be addressed in
> future series.
>
> SVQ needs to be enabled with cmdline parameter x-svq, like:
>
> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true
>
> In this version it cannot be enabled or disabled at runtime. Further
> series will remove this limitation and will enable it only during
> migration.
>
> Some patches are intentionally very small to ease review, but they can
> be squashed if preferred.
>
> Patches 1-10 prepare the SVQ and QEMU to support both guest-to-device
> and device-to-guest notification forwarding, with the extra qemu hop.
> That part can be tested in isolation if the cmdline change is reproduced.
>
> Patches from 11 to 18 implement the actual buffer forwarding, but with
> no IOMMU support. It requires a vdpa device capable of addressing all
> qemu vaddr.
>
> Patches 19 to 23 add the iommu support, so a device with address
> range limitations can access SVQ through the new virtual address space
> created for it.
>
> The rest of the series adds the last pieces needed for migration.
>
> Comments are welcome.


I wonder about the performance impact, so performance numbers are more
than welcome.

Thanks


>
> TODO:
> * Event, indirect, packed, and other features of virtio.
> * To separate buffer forwarding into its own AIO context, so we can
>    throw more threads at that task and we don't need to stop the main
>    event loop.
> * Support virtio-net control vq.
> * Proper documentation.
>
> Changes from v5 RFC:
> * Remove dynamic enablement of SVQ, making it less dependent on the device.
> * Enable live migration if SVQ is enabled.
> * Fix SVQ when driver reset.
> * Comments addressed, especially in the iova area.
> * Rebase on latest master, adding multiqueue support (but no networking
>    control vq processing).
> v5 link:
> https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg07250.html
>
> Changes from v4 RFC:
> * Support of allocating / freeing iova ranges in the IOVA tree. Extending
>    the already present iova-tree for that.
> * Proper validation of guest features. Now SVQ can negotiate a
>    different set of features with the device when enabled.
> * Support for host notifier memory regions
> * Handling of SVQ full queue in case the guest's descriptors span
>    across different memory regions (qemu's VA chunks).
> * Flush pending used buffers at end of SVQ operation.
> * QMP command now looks up by NetClientState name. Other devices will need
>    to implement their own way to enable vdpa.
> * Rename QMP command to set, so it looks more like a way of working
> * Better use of qemu error system
> * Turn a few assertions into proper error-handling paths.
> * Add more documentation
> * Less coupling of virtio / vhost, which could cause friction on changes
> * Addressed many other small comments and small fixes.
>
> Changes from v3 RFC:
>    * Move everything to vhost-vdpa backend. A big change, this allowed
>      some cleanup but more code has been added in other places.
>    * More use of glib utilities, especially to manage memory.
> v3 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
>
> Changes from v2 RFC:
>    * Adding vhost-vdpa devices support
>    * Fixed some memory leaks pointed by different comments
> v2 link:
> https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
>
> Changes from v1 RFC:
>    * Use QMP instead of migration to start SVQ mode.
>    * Only accepting IOMMU devices, closer behavior with target devices
>      (vDPA)
>    * Fix invalid masking/unmasking of vhost call fd.
>    * Use of proper methods for synchronization.
>    * No need to modify VirtIO device code, all of the changes are
>      contained in vhost code.
>    * Delete superfluous code.
>    * An intermediate RFC was sent with only the notifications forwarding
>      changes. It can be seen in
>      https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
> v1 link:
> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
>
> Eugenio Pérez (20):
>        virtio: Add VIRTIO_F_QUEUE_STATE
>        virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>        virtio: Add virtio_queue_is_host_notifier_enabled
>        vhost: Make vhost_virtqueue_{start,stop} public
>        vhost: Add x-vhost-enable-shadow-vq qmp
>        vhost: Add VhostShadowVirtqueue
>        vdpa: Register vdpa devices in a list
>        vhost: Route guest->host notification through shadow virtqueue
>        Add vhost_svq_get_svq_call_notifier
>        Add vhost_svq_set_guest_call_notifier
>        vdpa: Save call_fd in vhost-vdpa
>        vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>        vhost: Route host->guest notification through shadow virtqueue
>        virtio: Add vhost_shadow_vq_get_vring_addr
>        vdpa: Save host and guest features
>        vhost: Add vhost_svq_valid_device_features to shadow vq
>        vhost: Shadow virtqueue buffers forwarding
>        vhost: Add VhostIOVATree
>        vhost: Use a tree to store memory mappings
>        vdpa: Add custom IOTLB translations to SVQ
>
> Eugenio Pérez (31):
>    vdpa: Reorder virtio/vhost-vdpa.c functions
>    vhost: Add VhostShadowVirtqueue
>    vdpa: Add vhost_svq_get_dev_kick_notifier
>    vdpa: Add vhost_svq_set_svq_kick_fd
>    vhost: Add Shadow VirtQueue kick forwarding capabilities
>    vhost: Route guest->host notification through shadow virtqueue
>    vhost: Add vhost_svq_get_svq_call_notifier
>    vhost: Add vhost_svq_set_guest_call_notifier
>    vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>    vhost: Route host->guest notification through shadow virtqueue
>    vhost: Add vhost_svq_valid_device_features to shadow vq
>    vhost: Add vhost_svq_valid_guest_features to shadow vq
>    vhost: Add vhost_svq_ack_guest_features to shadow vq
>    virtio: Add vhost_shadow_vq_get_vring_addr
>    vdpa: Add vhost_svq_get_num
>    vhost: pass queue index to vhost_vq_get_addr
>    vdpa: adapt vhost_ops callbacks to svq
>    vhost: Shadow virtqueue buffers forwarding
>    utils: Add internal DMAMap to iova-tree
>    util: Store DMA entries in a list
>    util: Add iova_tree_alloc
>    vhost: Add VhostIOVATree
>    vdpa: Add custom IOTLB translations to SVQ
>    vhost: Add vhost_svq_get_last_used_idx
>    vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
>    vdpa: Clear VHOST_VRING_F_LOG at vhost_vdpa_set_vring_addr in SVQ
>    vdpa: Never set log_base addr if SVQ is enabled
>    vdpa: Expose VHOST_F_LOG_ALL on SVQ
>    vdpa: Make ncs autofree
>    vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c
>    vdpa: Add x-svq to NetdevVhostVDPAOptions
>
>   qapi/net.json                      |   5 +-
>   hw/virtio/vhost-iova-tree.h        |  27 +
>   hw/virtio/vhost-shadow-virtqueue.h |  46 ++
>   include/hw/virtio/vhost-vdpa.h     |   7 +
>   include/qemu/iova-tree.h           |  17 +
>   hw/virtio/vhost-iova-tree.c        | 157 ++++++
>   hw/virtio/vhost-shadow-virtqueue.c | 761 +++++++++++++++++++++++++++++
>   hw/virtio/vhost-vdpa.c             | 740 ++++++++++++++++++++++++----
>   hw/virtio/vhost.c                  |   6 +-
>   net/vhost-vdpa.c                   |  58 ++-
>   util/iova-tree.c                   | 161 +++++-
>   hw/virtio/meson.build              |   2 +-
>   12 files changed, 1852 insertions(+), 135 deletions(-)
>   create mode 100644 hw/virtio/vhost-iova-tree.h
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>   create mode 100644 hw/virtio/vhost-iova-tree.c
>   create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>


* Re: [PATCH 03/31] vdpa: Add vhost_svq_get_dev_kick_notifier
       [not found] ` <20220121202733.404989-4-eperezma@redhat.com>
@ 2022-01-28  6:03   ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-01-28  6:03 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> It is needed so that vhost-vdpa knows the device's kick event fd.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  4 ++++
>   hw/virtio/vhost-shadow-virtqueue.c | 10 +++++++++-
>   2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 61ea112002..400effd9f2 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -11,9 +11,13 @@
>   #define VHOST_SHADOW_VIRTQUEUE_H
>   
>   #include "hw/virtio/vhost.h"
> +#include "qemu/event_notifier.h"


Let's move this part to patch 2.

Thanks


>   
>   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   
> +const EventNotifier *vhost_svq_get_dev_kick_notifier(
> +                                              const VhostShadowVirtqueue *svq);
> +
>   VhostShadowVirtqueue *vhost_svq_new(void);
>   
>   void vhost_svq_free(VhostShadowVirtqueue *vq);
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 5ee7b401cb..bd87110073 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -11,7 +11,6 @@
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
>   
>   #include "qemu/error-report.h"
> -#include "qemu/event_notifier.h"
>   
>   /* Shadow virtqueue to relay notifications */
>   typedef struct VhostShadowVirtqueue {
> @@ -21,6 +20,15 @@ typedef struct VhostShadowVirtqueue {
>       EventNotifier hdev_call;
>   } VhostShadowVirtqueue;
>   
> +/**
> + * The notifier that SVQ will use to notify the device.
> + */
> +const EventNotifier *vhost_svq_get_dev_kick_notifier(
> +                                               const VhostShadowVirtqueue *svq)
> +{
> +    return &svq->hdev_kick;
> +}
> +
>   /**
>    * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>    * methods and file descriptors.


* Re: [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd
       [not found] ` <20220121202733.404989-5-eperezma@redhat.com>
@ 2022-01-28  6:29   ` Jason Wang
       [not found]     ` <CAJaqyWc7fbgN-W7y3=iFqHsJzj+1Mg0cuwSu+my=62nu9vGOqA@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-28  6:29 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This function allows the vhost-vdpa backend to override kick_fd.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  1 +
>   hw/virtio/vhost-shadow-virtqueue.c | 45 ++++++++++++++++++++++++++++++
>   2 files changed, 46 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 400effd9f2..a56ecfc09d 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -15,6 +15,7 @@
>   
>   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   
> +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   const EventNotifier *vhost_svq_get_dev_kick_notifier(
>                                                 const VhostShadowVirtqueue *svq);
>   
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index bd87110073..21534bc94d 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -11,6 +11,7 @@
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
>   
>   #include "qemu/error-report.h"
> +#include "qemu/main-loop.h"
>   
>   /* Shadow virtqueue to relay notifications */
>   typedef struct VhostShadowVirtqueue {
> @@ -18,8 +19,20 @@ typedef struct VhostShadowVirtqueue {
>       EventNotifier hdev_kick;
>       /* Shadow call notifier, sent to vhost */
>       EventNotifier hdev_call;
> +
> +    /*
> +     * Borrowed virtqueue's guest to host notifier.
> +     * To borrow it in this event notifier allows to register on the event
> +     * loop and access the associated shadow virtqueue easily. If we use the
> +     * VirtQueue, we don't have an easy way to retrieve it.
> +     *
> +     * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
> +     */
> +    EventNotifier svq_kick;
>   } VhostShadowVirtqueue;
>   
> +#define INVALID_SVQ_KICK_FD -1
> +
>   /**
>    * The notifier that SVQ will use to notify the device.
>    */
> @@ -29,6 +42,35 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>       return &svq->hdev_kick;
>   }
>   
> +/**
> + * Set a new file descriptor for the guest to kick SVQ and notify for avail
> + *
> + * @svq          The svq
> + * @svq_kick_fd  The new svq kick fd
> + */
> +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> +{
> +    EventNotifier tmp;
> +    bool check_old = INVALID_SVQ_KICK_FD !=
> +                     event_notifier_get_fd(&svq->svq_kick);
> +
> +    if (check_old) {
> +        event_notifier_set_handler(&svq->svq_kick, NULL);
> +        event_notifier_init_fd(&tmp, event_notifier_get_fd(&svq->svq_kick));
> +    }


It looks to me like we don't do similar things in vhost-net. Any reason for
caring about the old svq_kick?


> +
> +    /*
> +     * event_notifier_set_handler already checks for guest's notifications if
> +     * they arrive to the new file descriptor in the switch, so there is no
> +     * need to explicitely check for them.
> +     */
> +    event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> +
> +    if (!check_old || event_notifier_test_and_clear(&tmp)) {
> +        event_notifier_set(&svq->hdev_kick);


Any reason we need to kick the device directly here?

Thanks


> +    }
> +}
> +
>   /**
>    * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>    * methods and file descriptors.
> @@ -52,6 +94,9 @@ VhostShadowVirtqueue *vhost_svq_new(void)
>           goto err_init_hdev_call;
>       }
>   
> +    /* Placeholder descriptor, it should be deleted at set_kick_fd */
> +    event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> +
>       return g_steal_pointer(&svq);
>   
>   err_init_hdev_call:


* Re: [PATCH 05/31] vhost: Add Shadow VirtQueue kick forwarding capabilities
       [not found] ` <20220121202733.404989-6-eperezma@redhat.com>
@ 2022-01-28  6:32   ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-01-28  6:32 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> In this mode no buffer forwarding will be performed in SVQ: Qemu
> will just forward the guest's kicks to the device.
>
> Also, host notifiers must be disabled at SVQ start, and they will not
> start if SVQ has been enabled when the device is stopped. This will be
> addressed in the next patches.


We need to disable host_notifier_mr as well, otherwise the guest may touch
the hardware doorbell directly without going through the eventfd.


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>   hw/virtio/vhost-shadow-virtqueue.c | 27 ++++++++++++++++++++++++++-
>   2 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index a56ecfc09d..4c583a9171 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -19,6 +19,8 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   const EventNotifier *vhost_svq_get_dev_kick_notifier(
>                                                 const VhostShadowVirtqueue *svq);
>   
> +void vhost_svq_stop(VhostShadowVirtqueue *svq);
> +
>   VhostShadowVirtqueue *vhost_svq_new(void);
>   
>   void vhost_svq_free(VhostShadowVirtqueue *vq);
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 21534bc94d..8991f0b3c3 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -42,11 +42,26 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>       return &svq->hdev_kick;
>   }
>   
> +/* Forward guest notifications */
> +static void vhost_handle_guest_kick(EventNotifier *n)
> +{
> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> +                                             svq_kick);
> +
> +    if (unlikely(!event_notifier_test_and_clear(n))) {
> +        return;
> +    }
> +
> +    event_notifier_set(&svq->hdev_kick);
> +}
> +
>   /**
>    * Set a new file descriptor for the guest to kick SVQ and notify for avail
>    *
>    * @svq          The svq
> - * @svq_kick_fd  The new svq kick fd
> + * @svq_kick_fd  The svq kick fd
> + *
> + * Note that SVQ will never close the old file descriptor.
>    */
>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>   {
> @@ -65,12 +80,22 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>        * need to explicitly check for them.
>        */
>       event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> +    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
>   
>       if (!check_old || event_notifier_test_and_clear(&tmp)) {
>           event_notifier_set(&svq->hdev_kick);
>       }
>   }
>   
> +/**
> + * Stop shadow virtqueue operation.
> + * @svq Shadow Virtqueue
> + */
> +void vhost_svq_stop(VhostShadowVirtqueue *svq)
> +{
> +    event_notifier_set_handler(&svq->svq_kick, NULL);
> +}


This function is not used in the patch.

Thanks


> +
>   /**
>    * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>    * methods and file descriptors.


* Re: [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue
       [not found] ` <20220121202733.404989-7-eperezma@redhat.com>
@ 2022-01-28  6:56   ` Jason Wang
       [not found]     ` <CAJaqyWeRbmwW80q3q52nFw=iz1xcPRFviFaRHo0nzXpEb+3m3A@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-28  6:56 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> At this moment no buffer forwarding will be performed in SVQ mode: Qemu
> just forwards the guest's kicks to the device. This commit also sets up
> SVQs in the vhost device.
>
> Host memory notifier regions are left out for simplicity, and they will
> not be addressed in this series.


I wonder if it's better to squash this into patch 5, since that gives us
full guest->host forwarding.


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   include/hw/virtio/vhost-vdpa.h |   4 ++
>   hw/virtio/vhost-vdpa.c         | 122 ++++++++++++++++++++++++++++++++-
>   2 files changed, 124 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 3ce79a646d..009a9f3b6b 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -12,6 +12,8 @@
>   #ifndef HW_VIRTIO_VHOST_VDPA_H
>   #define HW_VIRTIO_VHOST_VDPA_H
>   
> +#include <gmodule.h>
> +
>   #include "hw/virtio/virtio.h"
>   #include "standard-headers/linux/vhost_types.h"
>   
> @@ -27,6 +29,8 @@ typedef struct vhost_vdpa {
>       bool iotlb_batch_begin_sent;
>       MemoryListener listener;
>       struct vhost_vdpa_iova_range iova_range;
> +    bool shadow_vqs_enabled;
> +    GPtrArray *shadow_vqs;
>       struct vhost_dev *dev;
>       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>   } VhostVDPA;
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 6c10a7f05f..18de14f0fb 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -17,12 +17,14 @@
>   #include "hw/virtio/vhost.h"
>   #include "hw/virtio/vhost-backend.h"
>   #include "hw/virtio/virtio-net.h"
> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>   #include "hw/virtio/vhost-vdpa.h"
>   #include "exec/address-spaces.h"
>   #include "qemu/main-loop.h"
>   #include "cpu.h"
>   #include "trace.h"
>   #include "qemu-common.h"
> +#include "qapi/error.h"
>   
>   /*
>    * Return one past the end of the end of section. Be careful with uint64_t
> @@ -409,8 +411,14 @@ err:
>   
>   static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
>   {
> +    struct vhost_vdpa *v = dev->opaque;
>       int i;
>   
> +    if (v->shadow_vqs_enabled) {
> +        /* SVQ is not compatible with host notifiers mr */


I guess there should be a TODO or FIXME here.


> +        return;
> +    }
> +
>       for (i = dev->vq_index; i < dev->vq_index + dev->nvqs; i++) {
>           if (vhost_vdpa_host_notifier_init(dev, i)) {
>               goto err;
> @@ -424,6 +432,17 @@ err:
>       return;
>   }
>   
> +static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    size_t idx;
> +
> +    for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
> +        vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
> +    }
> +    g_ptr_array_free(v->shadow_vqs, true);
> +}
> +
>   static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>   {
>       struct vhost_vdpa *v;
> @@ -432,6 +451,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>       trace_vhost_vdpa_cleanup(dev, v);
>       vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>       memory_listener_unregister(&v->listener);
> +    vhost_vdpa_svq_cleanup(dev);
>   
>       dev->opaque = NULL;
>       ram_block_discard_disable(false);
> @@ -507,9 +527,15 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
>   
>   static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>   {
> +    struct vhost_vdpa *v = dev->opaque;
>       int ret;
>       uint8_t status = 0;
>   
> +    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> +        vhost_svq_stop(svq);
> +    }
> +
>       ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>       trace_vhost_vdpa_reset_device(dev, status);
>       return ret;
> @@ -639,13 +665,28 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>       return ret;
>   }
>   
> -static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> -                                       struct vhost_vring_file *file)
> +static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
> +                                         struct vhost_vring_file *file)
>   {
>       trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
>   }
>   
> +static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
> +                                       struct vhost_vring_file *file)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
> +
> +    if (v->shadow_vqs_enabled) {
> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> +        vhost_svq_set_svq_kick_fd(svq, file->fd);
> +        return 0;
> +    } else {
> +        return vhost_vdpa_set_vring_dev_kick(dev, file);
> +    }
> +}
> +
>   static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>                                          struct vhost_vring_file *file)
>   {
> @@ -653,6 +694,33 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>   }
>   
> +/**
> + * Set shadow virtqueue descriptors to the device
> + *
> + * @dev   The vhost device model
> + * @svq   The shadow virtqueue
> + * @idx   The index of the virtqueue in the vhost device
> + */
> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> +                                VhostShadowVirtqueue *svq,
> +                                unsigned idx)
> +{
> +    struct vhost_vring_file file = {
> +        .index = dev->vq_index + idx,
> +    };
> +    const EventNotifier *event_notifier;
> +    int r;
> +
> +    event_notifier = vhost_svq_get_dev_kick_notifier(svq);


A question: any reason for making VhostShadowVirtqueue private? If we 
exported it in the .h we wouldn't need helpers like 
vhost_svq_get_dev_kick_notifier() to access its members.

Note that vhost_dev is a public structure.
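
Something like the following, as a sketch only (member comments abridged;
the full field list would simply move over from vhost-shadow-virtqueue.c):

    /* In vhost-shadow-virtqueue.h, so callers can reach members directly */
    typedef struct VhostShadowVirtqueue {
        /* Shadow kick notifier, sent to vhost device */
        EventNotifier hdev_kick;
        /* Shadow call notifier, the one vhost device writes to signal SVQ */
        EventNotifier hdev_call;
        /* Remaining members stay as they are in vhost-shadow-virtqueue.c */
    } VhostShadowVirtqueue;

Then vhost-vdpa.c could just do event_notifier_get_fd(&svq->hdev_kick)
with no accessor in between.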


> +    file.fd = event_notifier_get_fd(event_notifier);
> +    r = vhost_vdpa_set_vring_dev_kick(dev, &file);
> +    if (unlikely(r != 0)) {
> +        error_report("Can't set device kick fd (%d)", -r);
> +    }


I wonder whether we can generalize the logic here and in 
vhost_vdpa_set_vring_kick(). There's nothing vDPA-specific except the 
vhost_ops->set_vring_kick() call.
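
To make the suggestion concrete, a sketch of the generic part in
hw/virtio/vhost.c; vhost_dev_has_svq() and vhost_dev_get_svq() are made-up
accessors that don't exist in this series:

    static int vhost_virtqueue_set_vring_kick(struct vhost_dev *dev,
                                              struct vhost_vring_file *file)
    {
        if (vhost_dev_has_svq(dev)) {                    /* hypothetical */
            VhostShadowVirtqueue *svq = vhost_dev_get_svq(dev, file->index);

            /* SVQ intercepts the guest's kick fd; the device fd is untouched */
            vhost_svq_set_svq_kick_fd(svq, file->fd);
            return 0;
        }

        /* Only this call is actually backend-specific */
        return dev->vhost_ops->vhost_set_vring_kick(dev, file);
    }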


> +
> +    return r == 0;
> +}
> +
>   static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> @@ -660,6 +728,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>   
>       if (started) {
>           vhost_vdpa_host_notifiers_init(dev);
> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> +            bool ok = vhost_vdpa_svq_setup(dev, svq, i);
> +            if (unlikely(!ok)) {
> +                return -1;
> +            }
> +        }
>           vhost_vdpa_set_vring_ready(dev);
>       } else {
>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> @@ -737,6 +812,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>       return true;
>   }
>   
> +/**
> + * Adaptor function to free shadow virtqueue through gpointer
> + *
> + * @svq   The Shadow Virtqueue
> + */
> +static void vhost_psvq_free(gpointer svq)
> +{
> +    vhost_svq_free(svq);
> +}


Any reason for such indirection? Can we simply use vhost_svq_free()?

Thanks


> +
> +static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
> +                               Error **errp)
> +{
> +    size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
> +    g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
> +                                                           vhost_psvq_free);
> +    if (!v->shadow_vqs_enabled) {
> +        goto out;
> +    }
> +
> +    for (unsigned n = 0; n < hdev->nvqs; ++n) {
> +        VhostShadowVirtqueue *svq = vhost_svq_new();
> +
> +        if (unlikely(!svq)) {
> +            error_setg(errp, "Cannot create svq %u", n);
> +            return -1;
> +        }
> +        g_ptr_array_add(v->shadow_vqs, svq);
> +    }
> +
> +out:
> +    v->shadow_vqs = g_steal_pointer(&shadow_vqs);
> +    return 0;
> +}
> +
>   static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>   {
>       struct vhost_vdpa *v;
> @@ -759,6 +869,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>       dev->opaque =  opaque ;
>       v->listener = vhost_vdpa_memory_listener;
>       v->msg_type = VHOST_IOTLB_MSG_V2;
> +    ret = vhost_vdpa_init_svq(dev, v, errp);
> +    if (ret) {
> +        goto err;
> +    }
>   
>       vhost_vdpa_get_iova_range(v);
>   
> @@ -770,6 +884,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>                                  VIRTIO_CONFIG_S_DRIVER);
>   
>       return 0;
> +
> +err:
> +    ram_block_discard_disable(false);
> +    return ret;
>   }
>   
>   const VhostOps vdpa_ops = {


* Re: [PATCH 07/31] vhost: dd vhost_svq_get_svq_call_notifier
       [not found] ` <20220121202733.404989-8-eperezma@redhat.com>
@ 2022-01-29  7:57   ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-01-29  7:57 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This allows the vhost-vdpa device to retrieve the device -> SVQ call eventfd.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


What did 'dd' mean in the title?

Thanks


> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>   hw/virtio/vhost-shadow-virtqueue.c | 12 ++++++++++++
>   2 files changed, 14 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 4c583a9171..a78234b52b 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -18,6 +18,8 @@ typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   const EventNotifier *vhost_svq_get_dev_kick_notifier(
>                                                 const VhostShadowVirtqueue *svq);
> +const EventNotifier *vhost_svq_get_svq_call_notifier(
> +                                              const VhostShadowVirtqueue *svq);
>   
>   void vhost_svq_stop(VhostShadowVirtqueue *svq);
>   
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 8991f0b3c3..25fcdf16ec 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -55,6 +55,18 @@ static void vhost_handle_guest_kick(EventNotifier *n)
>       event_notifier_set(&svq->hdev_kick);
>   }
>   
> +/**
> + * Obtain the SVQ call notifier, where vhost device notifies SVQ that there
> + * exists pending used buffers.
> + *
> + * @svq Shadow Virtqueue
> + */
> +const EventNotifier *vhost_svq_get_svq_call_notifier(
> +                                               const VhostShadowVirtqueue *svq)
> +{
> +    return &svq->hdev_call;
> +}
> +
>   /**
>    * Set a new file descriptor for the guest to kick SVQ and notify for avail
>    *


* Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
       [not found] ` <20220121202733.404989-10-eperezma@redhat.com>
@ 2022-01-29  8:05   ` Jason Wang
       [not found]     ` <CAJaqyWda5sBw9VGBrz8g60OJ07Eeq45RRYu9vwgOPZFwten9rw@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-29  8:05 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
>   1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 18de14f0fb..029f98feee 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>       }
>   }
>   
> -static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> -                                       struct vhost_vring_file *file)
> +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
> +                                         struct vhost_vring_file *file)
>   {
>       trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>   }
>   
> +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> +                                     struct vhost_vring_file *file)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    if (v->shadow_vqs_enabled) {
> +        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
> +
> +        vhost_svq_set_guest_call_notifier(svq, file->fd);


Two questions here (I had similar questions for the vring kick):

1) Any reason we set up the eventfd for vhost-vdpa in 
vhost_vdpa_svq_setup() and not here?

2) The call could be disabled by using -1 as the fd; I don't see any 
code to deal with that.
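
For (2), a sketch of what handling the disabled case could look like;
note the call_fd_disabled flag is made up, it is not in the patch:

    void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq,
                                           int call_fd)
    {
        if (call_fd == -1) {
            /*
             * Guest call is masked: remember it so vhost_svq_flush() can
             * skip event_notifier_set() until a valid fd is installed.
             */
            svq->call_fd_disabled = true;    /* hypothetical member */
            return;
        }

        svq->call_fd_disabled = false;
        event_notifier_init_fd(&svq->svq_call, call_fd);
    }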

Thanks


> +        return 0;
> +    } else {
> +        return vhost_vdpa_set_vring_dev_call(dev, file);
> +    }
> +}
> +
>   /**
>    * Set shadow virtqueue descriptors to the device
>    *


* Re: [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq
       [not found] ` <20220121202733.404989-12-eperezma@redhat.com>
@ 2022-01-29  8:11   ` Jason Wang
       [not found]     ` <CAJaqyWfaf0RG9AzW4ktH2L3wyfOGuSk=rNm-j7xRkpdfVvkY-g@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-29  8:11 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This allows SVQ to negotiate features with the device. For the device,
> SVQ is a driver. While this function needs to bypass all non-transport
> features, it needs to disable the features that SVQ does not support
> when forwarding buffers. This includes packed vq layout, indirect
> descriptors or event idx.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>   hw/virtio/vhost-shadow-virtqueue.c | 44 ++++++++++++++++++++++++++++++
>   hw/virtio/vhost-vdpa.c             | 21 ++++++++++++++
>   3 files changed, 67 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index c9ffa11fce..d963867a04 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -15,6 +15,8 @@
>   
>   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   
> +bool vhost_svq_valid_device_features(uint64_t *features);
> +
>   void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>   void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
>   const EventNotifier *vhost_svq_get_dev_kick_notifier(
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 9619c8082c..51442b3dbf 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -45,6 +45,50 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>       return &svq->hdev_kick;
>   }
>   
> +/**
> + * Validate the transport device features that SVQ can use with the device
> + *
> + * @dev_features  The device features. If success, the acknowledged features.
> + *
> + * Returns true if SVQ can go with a subset of these, false otherwise.
> + */
> +bool vhost_svq_valid_device_features(uint64_t *dev_features)
> +{
> +    bool r = true;
> +
> +    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
> +         ++b) {
> +        switch (b) {
> +        case VIRTIO_F_NOTIFY_ON_EMPTY:
> +        case VIRTIO_F_ANY_LAYOUT:
> +            continue;
> +
> +        case VIRTIO_F_ACCESS_PLATFORM:
> +            /* SVQ does not know how to translate addresses */


I may be missing something, but is there any reason we need to disable 
ACCESS_PLATFORM? I'd expect the vring helpers we use for the shadow 
virtqueue to deal with a vIOMMU perfectly.


> +            if (*dev_features & BIT_ULL(b)) {
> +                clear_bit(b, dev_features);
> +                r = false;
> +            }
> +            break;
> +
> +        case VIRTIO_F_VERSION_1:


I had the same question here.

Thanks


> +            /* SVQ trust that guest vring is little endian */
> +            if (!(*dev_features & BIT_ULL(b))) {
> +                set_bit(b, dev_features);
> +                r = false;
> +            }
> +            continue;
> +
> +        default:
> +            if (*dev_features & BIT_ULL(b)) {
> +                clear_bit(b, dev_features);
> +            }
> +        }
> +    }
> +
> +    return r;
> +}
> +
>   /* Forward guest notifications */
>   static void vhost_handle_guest_kick(EventNotifier *n)
>   {
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index bdb45c8808..9d801cf907 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -855,10 +855,31 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>       size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
>       g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
>                                                              vhost_psvq_free);
> +    uint64_t dev_features;
> +    uint64_t svq_features;
> +    int r;
> +    bool ok;
> +
>       if (!v->shadow_vqs_enabled) {
>           goto out;
>       }
>   
> +    r = vhost_vdpa_get_features(hdev, &dev_features);
> +    if (r != 0) {
> +        error_setg(errp, "Can't get vdpa device features, got (%d)", r);
> +        return r;
> +    }
> +
> +    svq_features = dev_features;
> +    ok = vhost_svq_valid_device_features(&svq_features);
> +    if (unlikely(!ok)) {
> +        error_setg(errp,
> +            "SVQ Invalid device feature flags, offer: 0x%"PRIx64", ok: 0x%"PRIx64,
> +            hdev->features, svq_features);
> +        return -1;
> +    }
> +
> +    shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
>       for (unsigned n = 0; n < hdev->nvqs; ++n) {
>           VhostShadowVirtqueue *svq = vhost_svq_new();
>   


* Re: [PATCH 15/31] vdpa: Add vhost_svq_get_num
       [not found] ` <20220121202733.404989-16-eperezma@redhat.com>
@ 2022-01-29  8:14   ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-01-29  8:14 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This reports the guest-visible effective SVQ length, not the device's.


I think we need to explain whether there could be a case where the SVQ 
size is not equal to the device queue size.

Thanks


>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h | 1 +
>   hw/virtio/vhost-shadow-virtqueue.c | 5 +++++
>   2 files changed, 6 insertions(+)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 3521e8094d..035207a469 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -29,6 +29,7 @@ const EventNotifier *vhost_svq_get_svq_call_notifier(
>                                                 const VhostShadowVirtqueue *svq);
>   void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
>                                 struct vhost_vring_addr *addr);
> +uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
>   size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
>   size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>   
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 0f2c2403ff..f129ec8395 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -212,6 +212,11 @@ void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
>       addr->used_user_addr = (uint64_t)svq->vring.used;
>   }
>   
> +uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq)
> +{
> +    return svq->vring.num;
> +}
> +
>   size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
>   {
>       size_t desc_size = sizeof(vring_desc_t) * svq->vring.num;


* Re: [PATCH 16/31] vhost: pass queue index to vhost_vq_get_addr
       [not found] ` <20220121202733.404989-17-eperezma@redhat.com>
@ 2022-01-29  8:20   ` Jason Wang
       [not found]     ` <CAJaqyWexu=VroHQxmtJDQm=iu1va-s1VGR8hqGOreG0SOisjYg@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-29  8:20 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Doing it that way allows the vhost backend to know what address to return.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 7b03efccec..64b955ba0c 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -798,9 +798,10 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
>                                       struct vhost_virtqueue *vq,
>                                       unsigned idx, bool enable_log)
>   {
> -    struct vhost_vring_addr addr;
> +    struct vhost_vring_addr addr = {
> +        .index = idx,
> +    };
>       int r;
> -    memset(&addr, 0, sizeof(struct vhost_vring_addr));
>   
>       if (dev->vhost_ops->vhost_vq_get_addr) {
>           r = dev->vhost_ops->vhost_vq_get_addr(dev, &addr, vq);
> @@ -813,7 +814,6 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
>           addr.avail_user_addr = (uint64_t)(unsigned long)vq->avail;
>           addr.used_user_addr = (uint64_t)(unsigned long)vq->used;
>       }


I'm a bit lost in the logic above; any reason we need to call 
vhost_vq_get_addr() :) ?

Thanks


> -    addr.index = idx;
>       addr.log_guest_addr = vq->used_phys;
>       addr.flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0;
>       r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);


* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
       [not found] ` <20220121202733.404989-18-eperezma@redhat.com>
@ 2022-01-30  4:03   ` Jason Wang
       [not found]     ` <CAJaqyWdRKZp6CwnE+HAr0JALhSRh-trJbZ01kddnLTuRX_tMKQ@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-30  4:03 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> First half of the buffer forwarding part, preparing the vhost-vdpa
> callbacks so that SVQ can be offered. QEMU cannot enable it at this
> moment, so this is effectively dead code, but splitting it out helps to
> reduce patch size.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |   2 +-
>   hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
>   hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
>   3 files changed, 143 insertions(+), 13 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 035207a469..39aef5ffdf 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>   
>   void vhost_svq_stop(VhostShadowVirtqueue *svq);
>   
> -VhostShadowVirtqueue *vhost_svq_new(void);
> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
>   
>   void vhost_svq_free(VhostShadowVirtqueue *vq);
>   
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index f129ec8395..7c168075d7 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>   /**
>    * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>    * methods and file descriptors.
> + *
> + * @qsize Shadow VirtQueue size
> + *
> + * Returns the new virtqueue or NULL.
> + *
> + * In case of error, reason is reported through error_report.
>    */
> -VhostShadowVirtqueue *vhost_svq_new(void)
> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>   {
> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
> +    size_t device_size, driver_size;
>       g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>       int r;
>   
> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
>       /* Placeholder descriptor, it should be deleted at set_kick_fd */
>       event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
>   
> +    svq->vring.num = qsize;


I wonder if this is the best approach. E.g. some hardware can support up 
to a 32K queue size, so this will probably end up with:

1) SVQ uses a 32K queue size
2) the hardware queue uses 256

Or SVQ can stick to 256, but will this cause trouble if we want 
to add event idx support?


> +    driver_size = vhost_svq_driver_area_size(svq);
> +    device_size = vhost_svq_device_area_size(svq);
> +    svq->vring.desc = qemu_memalign(qemu_real_host_page_size, driver_size);
> +    svq->vring.avail = (void *)((char *)svq->vring.desc + desc_size);
> +    memset(svq->vring.desc, 0, driver_size);
> +    svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> +    memset(svq->vring.used, 0, device_size);
> +
>       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>       return g_steal_pointer(&svq);
>   
> @@ -318,5 +335,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
>       event_notifier_cleanup(&vq->hdev_kick);
>       event_notifier_set_handler(&vq->hdev_call, NULL);
>       event_notifier_cleanup(&vq->hdev_call);
> +    qemu_vfree(vq->vring.desc);
> +    qemu_vfree(vq->vring.used);
>       g_free(vq);
>   }
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 9d801cf907..53e14bafa0 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -641,20 +641,52 @@ static int vhost_vdpa_set_vring_addr(struct vhost_dev *dev,
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
>   }
>   
> -static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
> -                                      struct vhost_vring_state *ring)
> +static int vhost_vdpa_set_dev_vring_num(struct vhost_dev *dev,
> +                                        struct vhost_vring_state *ring)
>   {
>       trace_vhost_vdpa_set_vring_num(dev, ring->index, ring->num);
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_NUM, ring);
>   }
>   
> -static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
> -                                       struct vhost_vring_state *ring)
> +static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
> +                                    struct vhost_vring_state *ring)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    if (v->shadow_vqs_enabled) {
> +        /*
> +         * Vring num was set at device start. SVQ num is handled by VirtQueue
> +         * code
> +         */
> +        return 0;
> +    }
> +
> +    return vhost_vdpa_set_dev_vring_num(dev, ring);
> +}
> +
> +static int vhost_vdpa_set_dev_vring_base(struct vhost_dev *dev,
> +                                         struct vhost_vring_state *ring)
>   {
>       trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
>       return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
>   }
>   
> +static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
> +                                     struct vhost_vring_state *ring)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    if (v->shadow_vqs_enabled) {
> +        /*
> +         * Vring base was set at device start. SVQ base is handled by VirtQueue
> +         * code
> +         */
> +        return 0;
> +    }
> +
> +    return vhost_vdpa_set_dev_vring_base(dev, ring);
> +}
> +
>   static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>                                          struct vhost_vring_state *ring)
>   {
> @@ -784,8 +816,8 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>       }
>   }
>   
> -static int vhost_vdpa_get_features(struct vhost_dev *dev,
> -                                     uint64_t *features)
> +static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,
> +                                       uint64_t *features)
>   {
>       int ret;
>   
> @@ -794,15 +826,64 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
>       return ret;
>   }
>   
> +static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    int ret = vhost_vdpa_get_dev_features(dev, features);
> +
> +    if (ret == 0 && v->shadow_vqs_enabled) {
> +        /* Filter only features that SVQ can offer to guest */
> +        vhost_svq_valid_guest_features(features);
> +    }


Sorry if I've asked this before, but I think it's sufficient to filter 
out the device features that we don't support during vhost 
initialization and fail there. Any reason we need to do it again here?
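
In other words, something like this sketch, where svq_features would be a
hypothetical mask validated and cached once by vhost_vdpa_init_svq():

    static int vhost_vdpa_get_features(struct vhost_dev *dev,
                                       uint64_t *features)
    {
        struct vhost_vdpa *v = dev->opaque;
        int ret = vhost_vdpa_get_dev_features(dev, features);

        if (ret == 0 && v->shadow_vqs_enabled) {
            /* Mask was already validated at init, no second pass needed */
            *features &= v->svq_features;    /* hypothetical cached field */
        }

        return ret;
    }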


> +
> +    return ret;
> +}
> +
>   static int vhost_vdpa_set_features(struct vhost_dev *dev,
>                                      uint64_t features)
>   {
> +    struct vhost_vdpa *v = dev->opaque;
>       int ret;
>   
>       if (vhost_vdpa_one_time_request(dev)) {
>           return 0;
>       }
>   
> +    if (v->shadow_vqs_enabled) {
> +        uint64_t dev_features, svq_features, acked_features;
> +        bool ok;
> +
> +        ret = vhost_vdpa_get_dev_features(dev, &dev_features);
> +        if (ret != 0) {
> +            error_report("Can't get vdpa device features, got (%d)", ret);
> +            return ret;
> +        }
> +
> +        svq_features = dev_features;
> +        ok = vhost_svq_valid_device_features(&svq_features);
> +        if (unlikely(!ok)) {
> +            error_report("SVQ Invalid device feature flags, offer: 0x%"
> +                         PRIx64", ok: 0x%"PRIx64, dev->features, svq_features);
> +            return -1;
> +        }
> +
> +        ok = vhost_svq_valid_guest_features(&features);
> +        if (unlikely(!ok)) {
> +            error_report(
> +                "Invalid guest acked feature flag, acked: 0x%"
> +                PRIx64", ok: 0x%"PRIx64, dev->acked_features, features);
> +            return -1;
> +        }
> +
> +        ok = vhost_svq_ack_guest_features(svq_features, features,
> +                                          &acked_features);
> +        if (unlikely(!ok)) {
> +            return -1;
> +        }
> +
> +        features = acked_features;
> +    }
> +
>       trace_vhost_vdpa_set_features(dev, features);
>       ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
>       if (ret) {
> @@ -822,13 +903,31 @@ static int vhost_vdpa_set_owner(struct vhost_dev *dev)
>       return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
>   }
>   
> -static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
> -                    struct vhost_vring_addr *addr, struct vhost_virtqueue *vq)
> +static void vhost_vdpa_vq_get_guest_addr(struct vhost_vring_addr *addr,
> +                                         struct vhost_virtqueue *vq)
>   {
> -    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
>       addr->desc_user_addr = (uint64_t)(unsigned long)vq->desc_phys;
>       addr->avail_user_addr = (uint64_t)(unsigned long)vq->avail_phys;
>       addr->used_user_addr = (uint64_t)(unsigned long)vq->used_phys;
> +}
> +
> +static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
> +                                  struct vhost_vring_addr *addr,
> +                                  struct vhost_virtqueue *vq)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +
> +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
> +
> +    if (v->shadow_vqs_enabled) {
> +        int idx = vhost_vdpa_get_vq_index(dev, addr->index);
> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
> +
> +        vhost_svq_get_vring_addr(svq, addr);
> +    } else {
> +        vhost_vdpa_vq_get_guest_addr(addr, vq);
> +    }
> +
>       trace_vhost_vdpa_vq_get_addr(dev, vq, addr->desc_user_addr,
>                                    addr->avail_user_addr, addr->used_user_addr);
>       return 0;
> @@ -849,6 +948,12 @@ static void vhost_psvq_free(gpointer svq)
>       vhost_svq_free(svq);
>   }
>   
> +static int vhost_vdpa_get_max_queue_size(struct vhost_dev *dev,
> +                                         uint16_t *qsize)
> +{
> +    return vhost_vdpa_call(dev, VHOST_VDPA_GET_VRING_NUM, qsize);
> +}
> +
>   static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>                                  Error **errp)
>   {
> @@ -857,6 +962,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>                                                              vhost_psvq_free);
>       uint64_t dev_features;
>       uint64_t svq_features;
> +    uint16_t qsize;
>       int r;
>       bool ok;
>   
> @@ -864,7 +970,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>           goto out;
>       }
>   
> -    r = vhost_vdpa_get_features(hdev, &dev_features);
> +    r = vhost_vdpa_get_dev_features(hdev, &dev_features);
>       if (r != 0) {
>           error_setg(errp, "Can't get vdpa device features, got (%d)", r);
>           return r;
> @@ -879,9 +985,14 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>           return -1;
>       }
>   
> +    r = vhost_vdpa_get_max_queue_size(hdev, &qsize);
> +    if (unlikely(r)) {
> +        qsize = 256;
> +    }


Should we fail instead of having a "default" value here?
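
I.e., a sketch of failing instead:

    r = vhost_vdpa_get_max_queue_size(hdev, &qsize);
    if (unlikely(r)) {
        error_setg_errno(errp, -r, "Cannot get VHOST_VDPA_GET_VRING_NUM");
        return r;
    }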

Thanks


> +
>       shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
>       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> -        VhostShadowVirtqueue *svq = vhost_svq_new();
> +        VhostShadowVirtqueue *svq = vhost_svq_new(qsize);
>   
>           if (unlikely(!svq)) {
>               error_setg(errp, "Cannot create svq %u", n);


* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
       [not found] ` <20220121202733.404989-19-eperezma@redhat.com>
@ 2022-01-30  4:42   ` Jason Wang
       [not found]     ` <CAJaqyWdDax2+e3ZUEYyYNe5xAL=Oocu+72n89ygayrzYrQz2Yw@mail.gmail.com>
  2022-01-30  6:46   ` Jason Wang
  1 sibling, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-30  4:42 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Initial version of the shadow virtqueue that actually forwards buffers.
> There is no IOMMU support at the moment, and that will be addressed in
> future patches of this series. Since all vhost-vdpa devices use a forced
> IOMMU, this means that SVQ is not usable at this point of the series on
> any device.
>
> For simplicity it only supports modern devices, which expect a vring in
> little endian, with a split ring and no event idx or indirect
> descriptors. Support for those will not be added in this series.
>
> It reuses the VirtQueue code for the device part. The driver part is
> based on Linux's virtio_ring driver, but with stripped functionality
> and optimizations so it's easier to review.
>
> However, forwarding buffers has some particular pieces: one of the most
> unexpected ones is that a guest's buffer can span more than one
> descriptor in SVQ. While this is handled gracefully by qemu's emulated
> virtio devices, it may cause an unexpected SVQ queue full condition.
> This patch also solves that by checking for the condition at both guest
> kicks and device calls. The code may be more elegant in the future if
> the SVQ code runs in its own iocontext.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |   2 +
>   hw/virtio/vhost-shadow-virtqueue.c | 365 ++++++++++++++++++++++++++++-
>   hw/virtio/vhost-vdpa.c             | 111 ++++++++-
>   3 files changed, 462 insertions(+), 16 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 39aef5ffdf..19c934af49 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
>   size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
>   size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>   
> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> +                     VirtQueue *vq);
>   void vhost_svq_stop(VhostShadowVirtqueue *svq);
>   
>   VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index 7c168075d7..a1a404f68f 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -9,6 +9,8 @@
>   
>   #include "qemu/osdep.h"
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
> +#include "hw/virtio/vhost.h"
> +#include "hw/virtio/virtio-access.h"
>   #include "standard-headers/linux/vhost_types.h"
>   
>   #include "qemu/error-report.h"
> @@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {
>   
>       /* Guest's call notifier, where SVQ calls guest. */
>       EventNotifier svq_call;
> +
> +    /* Virtio queue shadowing */
> +    VirtQueue *vq;
> +
> +    /* Virtio device */
> +    VirtIODevice *vdev;
> +
> +    /* Map for returning guest's descriptors */
> +    VirtQueueElement **ring_id_maps;
> +
> +    /* Next VirtQueue element that guest made available */
> +    VirtQueueElement *next_guest_avail_elem;
> +
> +    /* Next head to expose to device */
> +    uint16_t avail_idx_shadow;
> +
> +    /* Next free descriptor */
> +    uint16_t free_head;
> +
> +    /* Last seen used idx */
> +    uint16_t shadow_used_idx;
> +
> +    /* Next head to consume from device */
> +    uint16_t last_used_idx;
> +
> +    /* Cache for the exposed notification flag */
> +    bool notification;
>   } VhostShadowVirtqueue;
>   
>   #define INVALID_SVQ_KICK_FD -1
> @@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t dev_features,
>       return true;
>   }
>   
> -/* Forward guest notifications */
> -static void vhost_handle_guest_kick(EventNotifier *n)
> +/**
> + * Number of descriptors that SVQ can make available from the guest.
> + *
> + * @svq   The svq
> + */
> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
>   {
> -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> -                                             svq_kick);
> +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
> +}
> +
> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> +{
> +    uint16_t notification_flag;
>   
> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> +    if (svq->notification == enable) {
> +        return;
> +    }
> +
> +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> +
> +    svq->notification = enable;
> +    if (enable) {
> +        svq->vring.avail->flags &= ~notification_flag;
> +    } else {
> +        svq->vring.avail->flags |= notification_flag;
> +    }
> +}
> +
> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> +                                    const struct iovec *iovec,
> +                                    size_t num, bool more_descs, bool write)
> +{
> +    uint16_t i = svq->free_head, last = svq->free_head;
> +    unsigned n;
> +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> +    vring_desc_t *descs = svq->vring.desc;
> +
> +    if (num == 0) {
> +        return;
> +    }
> +
> +    for (n = 0; n < num; n++) {
> +        if (more_descs || (n + 1 < num)) {
> +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> +        } else {
> +            descs[i].flags = flags;
> +        }
> +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> +
> +        last = i;
> +        i = cpu_to_le16(descs[i].next);
> +    }
> +
> +    svq->free_head = le16_to_cpu(descs[last].next);
> +}
> +
> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> +                                    VirtQueueElement *elem)
> +{
> +    int head;
> +    unsigned avail_idx;
> +    vring_avail_t *avail = svq->vring.avail;
> +
> +    head = svq->free_head;
> +
> +    /* We need some descriptors here */
> +    assert(elem->out_num || elem->in_num);


Looks like this could be triggered by the guest; we need to fail instead 
of asserting here.
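
A sketch of the failing variant, assuming vhost_svq_add() grows a boolean
return value that vhost_handle_guest_kick() checks before continuing:

    static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
    {
        unsigned qemu_head;

        if (unlikely(elem->out_num == 0 && elem->in_num == 0)) {
            /* Guest-controlled input: report the error, don't abort qemu */
            virtio_error(svq->vdev, "Guest element with no descriptors");
            return false;
        }

        qemu_head = vhost_svq_add_split(svq, elem);
        svq->ring_id_maps[qemu_head] = elem;
        return true;
    }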


> +
> +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> +                            elem->in_num > 0, false);
> +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> +
> +    /*
> +     * Put entry in available array (but don't update avail->idx until they
> +     * do sync).
> +     */
> +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> +    avail->ring[avail_idx] = cpu_to_le16(head);
> +    svq->avail_idx_shadow++;
> +
> +    /* Update avail index after the descriptor is wrote */
> +    smp_wmb();
> +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> +
> +    return head;
> +}
> +
> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> +{
> +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> +
> +    svq->ring_id_maps[qemu_head] = elem;
> +}
> +
> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> +{
> +    /* We need to expose available array entries before checking used flags */
> +    smp_mb();
> +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
>           return;
>       }
>   
>       event_notifier_set(&svq->hdev_kick);
>   }
>   
> -/* Forward vhost notifications */
> +/**
> + * Forward available buffers.
> + *
> + * @svq Shadow VirtQueue
> + *
> + * Note that this function does not guarantee that all guest's available
> + * buffers are available to the device in SVQ avail ring. The guest may have
> + * exposed a GPA / GIOVA congiuous buffer, but it may not be contiguous in qemu
> + * vaddr.
> + *
> + * If that happens, guest's kick notifications will be disabled until device
> + * makes some buffers used.
> + */
> +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> +{
> +    /* Clear event notifier */
> +    event_notifier_test_and_clear(&svq->svq_kick);
> +
> +    /* Make available as many buffers as possible */
> +    do {
> +        if (virtio_queue_get_notification(svq->vq)) {
> +            virtio_queue_set_notification(svq->vq, false);


This looks like an optimization that should belong to 
virtio_queue_set_notification() itself.
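
Roughly like this sketch against hw/virtio/virtio.c, reusing the existing
vq->notification field that virtio_queue_get_notification() already returns:

    void virtio_queue_set_notification(VirtQueue *vq, int enable)
    {
        if (vq->notification == enable) {
            /* Nothing changes, skip the vring flags/used_event update */
            return;
        }

        vq->notification = enable;

        if (!vq->vring.desc) {
            return;
        }

        if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
            virtio_queue_packed_set_notification(vq, enable);
        } else {
            virtio_queue_split_set_notification(vq, enable);
        }
    }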


> +        }
> +
> +        while (true) {
> +            VirtQueueElement *elem;
> +
> +            if (svq->next_guest_avail_elem) {
> +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
> +            } else {
> +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
> +            }
> +
> +            if (!elem) {
> +                break;
> +            }
> +
> +            if (elem->out_num + elem->in_num >
> +                vhost_svq_available_slots(svq)) {
> +                /*
> +                 * This condition is possible since a contiguous buffer in GPA
> +                 * does not imply a contiguous buffer in qemu's VA
> + * scatter-gather segments. If that happens, the buffer exposed
> +                 * to the device needs to be a chain of descriptors at this
> +                 * moment.
> +                 *
> +                 * SVQ cannot hold more available buffers if we are here:
> +                 * queue the current guest descriptor and ignore further kicks
> +                 * until some elements are used.
> +                 */
> +                svq->next_guest_avail_elem = elem;
> +                return;
> +            }
> +
> +            vhost_svq_add(svq, elem);
> +            vhost_svq_kick(svq);
> +        }
> +
> +        virtio_queue_set_notification(svq->vq, true);
> +    } while (!virtio_queue_empty(svq->vq));
> +}
> +
> +/**
> + * Handle guest's kick.
> + *
> + * @n guest kick event notifier, the one that guest set to notify svq.
> + */
> +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
> +{
> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> +                                             svq_kick);
> +    vhost_handle_guest_kick(svq);
> +}
> +
> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> +{
> +    if (svq->last_used_idx != svq->shadow_used_idx) {
> +        return true;
> +    }
> +
> +    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
> +
> +    return svq->last_used_idx != svq->shadow_used_idx;
> +}
> +
> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> +{
> +    vring_desc_t *descs = svq->vring.desc;
> +    const vring_used_t *used = svq->vring.used;
> +    vring_used_elem_t used_elem;
> +    uint16_t last_used;
> +
> +    if (!vhost_svq_more_used(svq)) {
> +        return NULL;
> +    }
> +
> +    /* Only get used array entries after they have been exposed by dev */
> +    smp_rmb();
> +    last_used = svq->last_used_idx & (svq->vring.num - 1);
> +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> +
> +    svq->last_used_idx++;
> +    if (unlikely(used_elem.id >= svq->vring.num)) {
> +        error_report("Device %s says index %u is used", svq->vdev->name,
> +                     used_elem.id);
> +        return NULL;
> +    }
> +
> +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> +        error_report(
> +            "Device %s says index %u is used, but it was not available",
> +            svq->vdev->name, used_elem.id);
> +        return NULL;
> +    }
> +
> +    descs[used_elem.id].next = svq->free_head;
> +    svq->free_head = used_elem.id;
> +
> +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> +}
> +
> +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> +                            bool check_for_avail_queue)
> +{
> +    VirtQueue *vq = svq->vq;
> +
> +    /* Make as many buffers as possible used. */
> +    do {
> +        unsigned i = 0;
> +
> +        vhost_svq_set_notification(svq, false);
> +        while (true) {
> +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> +            if (!elem) {
> +                break;
> +            }
> +
> +            if (unlikely(i >= svq->vring.num)) {
> +                virtio_error(svq->vdev,
> +                         "More than %u used buffers obtained in a %u size SVQ",
> +                         i, svq->vring.num);
> +                virtqueue_fill(vq, elem, elem->len, i);
> +                virtqueue_flush(vq, i);


Let's simply use virtqueue_push() here?


> +                i = 0;


Do we need to bail out here?
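
Combining both points, the error path could collapse to something like:

            if (unlikely(i >= svq->vring.num)) {
                virtio_error(svq->vdev,
                         "More than %u used buffers obtained in a %u size SVQ",
                         i, svq->vring.num);
                /* virtqueue_push() == fill + flush of a single element */
                virtqueue_push(vq, elem, elem->len);
                /* The device is broken at this point, stop the loop here */
                break;
            }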


> +            }
> +            virtqueue_fill(vq, elem, elem->len, i++);
> +        }
> +
> +        virtqueue_flush(vq, i);
> +        event_notifier_set(&svq->svq_call);
> +
> +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
> +            /*
> +             * Avail ring was full when vhost_svq_flush was called, so it's a
> +             * good moment to make more descriptors available if possible
> +             */
> +            vhost_handle_guest_kick(svq);


Wouldn't it be better to have a check similar to the one vhost_handle_guest_kick() does?

             if (elem->out_num + elem->in_num >
                 vhost_svq_available_slots(svq)) {


> +        }
> +
> +        vhost_svq_set_notification(svq, true);


Isn't an mb() needed here? Otherwise we may lose a call (where 
vhost_svq_more_used() is run before vhost_svq_set_notification()).
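
I.e., the classic enable/re-check race; a sketch of the fix:

        vhost_svq_set_notification(svq, true);

        /*
         * Order the used->flags store before re-reading used->idx; without
         * a full barrier the device could add a buffer and skip the call
         * right in between, and the notification would be lost.
         */
        smp_mb();
    } while (vhost_svq_more_used(svq));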


> +    } while (vhost_svq_more_used(svq));
> +}
> +
> +/**
> + * Forward used buffers.
> + *
> + * @n hdev call event notifier, the one that device set to notify svq.
> + *
> + * Note that we are not making any buffers available in the loop, there is no
> + * way that it runs more than virtqueue size times.
> + */
>   static void vhost_svq_handle_call(EventNotifier *n)
>   {
>       VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>                                                hdev_call);
>   
> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> -        return;
> -    }
> +    /* Clear event notifier */
> +    event_notifier_test_and_clear(n);


Any reason that we remove the above check?


>   
> -    event_notifier_set(&svq->svq_call);
> +    vhost_svq_flush(svq, true);
>   }
>   
>   /**
> @@ -258,13 +551,38 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>        * need to explicitly check for them.
>        */
>       event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> +    event_notifier_set_handler(&svq->svq_kick,
> +                               vhost_handle_guest_kick_notifier);
>   
>       if (!check_old || event_notifier_test_and_clear(&tmp)) {
>           event_notifier_set(&svq->hdev_kick);
>       }
>   }
>   
> +/**
> + * Start shadow virtqueue operation.
> + *
> + * @svq Shadow Virtqueue
> + * @vdev        VirtIO device
> + * @vq          Virtqueue to shadow
> + */
> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> +                     VirtQueue *vq)
> +{
> +    svq->next_guest_avail_elem = NULL;
> +    svq->avail_idx_shadow = 0;
> +    svq->shadow_used_idx = 0;
> +    svq->last_used_idx = 0;
> +    svq->vdev = vdev;
> +    svq->vq = vq;
> +
> +    memset(svq->vring.avail, 0, sizeof(*svq->vring.avail));
> +    memset(svq->vring.used, 0, sizeof(*svq->vring.used));
> +    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> +    }
> +}
> +
>   /**
>    * Stop shadow virtqueue operation.
>    * @svq Shadow Virtqueue
> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>   void vhost_svq_stop(VhostShadowVirtqueue *svq)
>   {
>       event_notifier_set_handler(&svq->svq_kick, NULL);
> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> +
> +    if (!svq->vq) {
> +        return;
> +    }
> +
> +    /* Send all pending used descriptors to guest */
> +    vhost_svq_flush(svq, false);
> +
> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> +        g_autofree VirtQueueElement *elem = NULL;
> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> +        if (elem) {
> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> +        }
> +    }
> +
> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> +    if (next_avail_elem) {
> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> +                                 next_avail_elem->len);
> +    }
>   }
>   
>   /**
> @@ -316,7 +656,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>       memset(svq->vring.desc, 0, driver_size);
>       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
>       memset(svq->vring.used, 0, device_size);
> -
> +    svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
>       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>       return g_steal_pointer(&svq);
>   
> @@ -335,6 +675,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
>       event_notifier_cleanup(&vq->hdev_kick);
>       event_notifier_set_handler(&vq->hdev_call, NULL);
>       event_notifier_cleanup(&vq->hdev_call);
> +    g_free(vq->ring_id_maps);
>       qemu_vfree(vq->vring.desc);
>       qemu_vfree(vq->vring.used);
>       g_free(vq);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 53e14bafa0..0e5c00ed7e 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -752,9 +752,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>    * Note that this function does not rewind kick file descriptor if cannot set
>    * call one.
>    */
> -static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> -                                VhostShadowVirtqueue *svq,
> -                                unsigned idx)
> +static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
> +                                  VhostShadowVirtqueue *svq,
> +                                  unsigned idx)
>   {
>       struct vhost_vring_file file = {
>           .index = dev->vq_index + idx,
> @@ -767,7 +767,7 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>       r = vhost_vdpa_set_vring_dev_kick(dev, &file);
>       if (unlikely(r != 0)) {
>           error_report("Can't set device kick fd (%d)", -r);
> -        return false;
> +        return r;
>       }
>   
>       event_notifier = vhost_svq_get_svq_call_notifier(svq);
> @@ -777,6 +777,99 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>           error_report("Can't set device call fd (%d)", -r);
>       }
>   
> +    return r;
> +}
> +
> +/**
> + * Unmap SVQ area in the device
> + */
> +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
> +                                      hwaddr size)
> +{
> +    int r;
> +
> +    size = ROUND_UP(size, qemu_real_host_page_size);
> +    r = vhost_vdpa_dma_unmap(v, iova, size);
> +    return r == 0;
> +}
> +
> +static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> +                                       const VhostShadowVirtqueue *svq)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    struct vhost_vring_addr svq_addr;
> +    size_t device_size = vhost_svq_device_area_size(svq);
> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> +    bool ok;
> +
> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> +
> +    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +
> +    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
> +}
> +
> +/**
> + * Map shadow virtqueue rings in device
> + *
> + * @dev   The vhost device
> + * @svq   The shadow virtqueue
> + */
> +static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
> +                                     const VhostShadowVirtqueue *svq)
> +{
> +    struct vhost_vdpa *v = dev->opaque;
> +    struct vhost_vring_addr svq_addr;
> +    size_t device_size = vhost_svq_device_area_size(svq);
> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> +    int r;
> +
> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> +
> +    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
> +                           (void *)svq_addr.desc_user_addr, true);
> +    if (unlikely(r != 0)) {
> +        return false;
> +    }
> +
> +    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
> +                           (void *)svq_addr.used_user_addr, false);


Do we need to unmap the driver area if we fail here?
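
I.e., a possible rollback (sketch, reusing the helpers from this patch):

    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
                           (void *)svq_addr.used_user_addr, false);
    if (unlikely(r != 0)) {
        /* Undo the driver area mapping so no stale entry is left behind */
        vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
        return false;
    }

    return true;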

Thanks


> +    return r == 0;
> +}
> +
> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> +                                VhostShadowVirtqueue *svq,
> +                                unsigned idx)
> +{
> +    uint16_t vq_index = dev->vq_index + idx;
> +    struct vhost_vring_state s = {
> +        .index = vq_index,
> +    };
> +    int r;
> +    bool ok;
> +
> +    r = vhost_vdpa_set_dev_vring_base(dev, &s);
> +    if (unlikely(r)) {
> +        error_report("Can't set vring base (%d)", r);
> +        return false;
> +    }
> +
> +    s.num = vhost_svq_get_num(svq);
> +    r = vhost_vdpa_set_dev_vring_num(dev, &s);
> +    if (unlikely(r)) {
> +        error_report("Can't set vring num (%d)", r);
> +        return false;
> +    }
> +
> +    ok = vhost_vdpa_svq_map_rings(dev, svq);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +
> +    r = vhost_vdpa_svq_set_fds(dev, svq, idx);
>       return r == 0;
>   }
>   
> @@ -788,14 +881,24 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>       if (started) {
>           vhost_vdpa_host_notifiers_init(dev);
>           for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> +            VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
>               VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>               bool ok = vhost_vdpa_svq_setup(dev, svq, i);
>               if (unlikely(!ok)) {
>                   return -1;
>               }
> +            vhost_svq_start(svq, dev->vdev, vq);
>           }
>           vhost_vdpa_set_vring_ready(dev);
>       } else {
> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
> +                                                          i);
> +            bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
> +            if (unlikely(!ok)) {
> +                return -1;
> +            }
> +        }
>           vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>       }
>   


* Re: [PATCH 21/31] util: Add iova_tree_alloc
       [not found]     ` <CAJaqyWf--wbNZz5ZzbpixD9op_fO5fV01kbYXzG097c_NkqYrw@mail.gmail.com>
  2022-01-24 11:07       ` Peter Xu
@ 2022-01-30  5:06       ` Jason Wang
  1 sibling, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-01-30  5:06 UTC (permalink / raw)
  To: Eugenio Perez Martin, Peter Xu
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/24 5:20 PM, Eugenio Perez Martin wrote:
> On Mon, Jan 24, 2022 at 5:33 AM Peter Xu <peterx@redhat.com> wrote:
>> On Fri, Jan 21, 2022 at 09:27:23PM +0100, Eugenio Pérez wrote:
>>> +int iova_tree_alloc(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
> I forgot to s/iova_tree_alloc/iova_tree_alloc_map/ here.
>
>>> +                    hwaddr iova_last)
>>> +{
>>> +    const DMAMapInternal *last, *i;
>>> +
>>> +    assert(iova_begin < iova_last);
>>> +
>>> +    /*
>>> +     * Find a valid hole for the mapping
>>> +     *
>>> +     * TODO: Replace all this with g_tree_node_first/next/last when available
>>> +     * (from glib since 2.68). Using a separated QTAILQ complicates code.
>>> +     *
>>> +     * Try to allocate first at the end of the list.
>>> +     */
>>> +    last = QTAILQ_LAST(&tree->list);
>>> +    if (iova_tree_alloc_map_in_hole(last, NULL, iova_begin, iova_last,
>>> +                                    map->size)) {
>>> +        goto alloc;
>>> +    }
>>> +
>>> +    /* Look for inner hole */
>>> +    last = NULL;
>>> +    for (i = QTAILQ_FIRST(&tree->list); i;
>>> +         last = i, i = QTAILQ_NEXT(i, entry)) {
>>> +        if (iova_tree_alloc_map_in_hole(last, i, iova_begin, iova_last,
>>> +                                        map->size)) {
>>> +            goto alloc;
>>> +        }
>>> +    }
>>> +
>>> +    return IOVA_ERR_NOMEM;
>>> +
>>> +alloc:
>>> +    map->iova = last ? last->map.iova + last->map.size + 1 : iova_begin;
>>> +    return iova_tree_insert(tree, map);
>>> +}
>> Hi, Eugenio,
>>
>> Have you tried with what Jason suggested previously?
>>
>>    https://lore.kernel.org/qemu-devel/CACGkMEtZAPd9xQTP_R4w296N_Qz7VuV1FLnb544fEVoYO0of+g@mail.gmail.com/
>>
>> That solution still sounds very sensible to me even without the newly
>> introduced list in previous two patches.
>>
>> IMHO we could move "DMAMap *previous, *this" into the IOVATreeAllocArgs*
>> stucture that was passed into the traverse func though, so it'll naturally work
>> with threading.
>>
>> Or is there any blocker for it?
>>
> Hi Peter,
>
> I can try that solution again, but the main problem was the special
> cases at the beginning and the end of the range.
>
> For the function to locate a hole, a sentinel DMAMap first = {.iova = 0,
> .size = 0} means that address 0 cannot be counted as part of the hole.
>
> In other words, with that algorithm, if the only valid hole is [0, N)
> and we try to allocate a block of size N, it would fail.
>
> The same happens with iova_end, although in practice the IOMMU
> hardware iova upper limit never seems to be UINT64_MAX.
>
> Maybe we could treat .size = 0 as a special case?


Yes, the pseudo-code I pasted is just to show the idea of using
g_tree_foreach() instead of introducing new auxiliary data structures.
That will simplify both the code and the review.
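
(For illustration, a minimal sketch of that traversal in C -- the struct
layout here is illustrative, not the actual patch, and the inclusive
.size accounting of DMAMap is glossed over:)

typedef struct IOVATreeAllocArgs {
    const DMAMap *prev;  /* last mapping visited, NULL on the first call */
    hwaddr iova_begin;   /* first usable iova */
    hwaddr new_size;     /* size of the mapping we want to place */
    hwaddr iova_result;  /* allocated iova, valid only if iova_found */
    bool iova_found;
} IOVATreeAllocArgs;

static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
                                         gpointer data)
{
    IOVATreeAllocArgs *args = data;
    const DMAMap *this = value;
    /* The hole starts right after the previous mapping, or at iova_begin */
    hwaddr hole_start = args->prev ?
                        args->prev->iova + args->prev->size + 1 :
                        args->iova_begin;

    if (this->iova > hole_start &&
        this->iova - hole_start >= args->new_size) {
        args->iova_found = true;
        args->iova_result = hole_start;
        return TRUE;     /* hole found, stop the traversal */
    }

    args->prev = this;
    return FALSE;        /* keep iterating */
}

The caller would still need to check the tail hole after the last mapping
(up to iova_last) if the traversal finds nothing.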

Down the road, we may start from an iova range specified during the
creation of the iova tree. E.g. for vtd, it's the GAW; for vhost-vdpa,
it's the one that we get from VHOST_VDPA_GET_IOVA_RANGE.

Thanks


> I see it cleaner either
> to build the list (but insert needs to take the list into account) or
> to explicitly tell that prev == NULL means to use iova_first.
>
> Another solution that comes to my mind: to add both exceptions outside
> of the traverse function, and skip the first iteration with something
> like:
>
> if (prev == NULL) {
>    prev = this;
>    return false; /* continue */
> }
>
> So the traverse callback has far fewer code paths. Would it work for
> you if I send a separate RFC, apart from SVQ, only to validate this?
>
> Thanks!
>
>> Thanks,
>> --
>> Peter Xu
>>


* Re: [PATCH 22/31] vhost: Add VhostIOVATree
       [not found] ` <20220121202733.404989-23-eperezma@redhat.com>
@ 2022-01-30  5:21   ` Jason Wang
       [not found]     ` <CAJaqyWePW6hJKAm7nk+syqmXAgdTQSTtuv9jACu_+hgbg2bRHg@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-30  5:21 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> This tree is able to look for a translated address from an IOVA address.
>
> At first glance it is similar to util/iova-tree. However, SVQ working on
> devices with limited IOVA space needs more capabilities,


So does the IOVA tree (e.g. l2 vtd can only work in the range of the GAW
and outside the RMRRs).


>   like allocating
> IOVA chunks or performing reverse translations (qemu addresses to iova).


This looks like a general request as well. So I wonder if we can simply 
extend iova tree instead.
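
(i.e., something along these lines in util/iova-tree itself -- hypothetical
prototypes, names are not final:)

/* Reverse lookup: find the mapping containing a translated address */
const DMAMap *iova_tree_find_iova(const IOVATree *tree, const DMAMap *map);

/* Allocate a free slot in [iova_begin, iova_last] for map->size bytes */
int iova_tree_alloc_map(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
                        hwaddr iova_last);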

Thanks


>
> The allocation capability, as "assign a free IOVA address to this chunk
> of memory in qemu's address space" allows shadow virtqueue to create a
> new address space that is not restricted by guest's addressable one, so
> we can allocate shadow vqs vrings outside of it.
>
> It duplicates the tree so it can search efficiently in both directions,
> and it will signal overlap if iova or the translated address is
> present in any tree.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-iova-tree.h |  27 +++++++
>   hw/virtio/vhost-iova-tree.c | 157 ++++++++++++++++++++++++++++++++++++
>   hw/virtio/meson.build       |   2 +-
>   3 files changed, 185 insertions(+), 1 deletion(-)
>   create mode 100644 hw/virtio/vhost-iova-tree.h
>   create mode 100644 hw/virtio/vhost-iova-tree.c
>
> diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
> new file mode 100644
> index 0000000000..610394eaf1
> --- /dev/null
> +++ b/hw/virtio/vhost-iova-tree.h
> @@ -0,0 +1,27 @@
> +/*
> + * vhost software live migration ring
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
> +#define HW_VIRTIO_VHOST_IOVA_TREE_H
> +
> +#include "qemu/iova-tree.h"
> +#include "exec/memory.h"
> +
> +typedef struct VhostIOVATree VhostIOVATree;
> +
> +VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
> +void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
> +G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
> +
> +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
> +                                        const DMAMap *map);
> +int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
> +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map);
> +
> +#endif
> diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
> new file mode 100644
> index 0000000000..0021dbaf54
> --- /dev/null
> +++ b/hw/virtio/vhost-iova-tree.c
> @@ -0,0 +1,157 @@
> +/*
> + * vhost software live migration ring
> + *
> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/iova-tree.h"
> +#include "vhost-iova-tree.h"
> +
> +#define iova_min_addr qemu_real_host_page_size
> +
> +/**
> + * VhostIOVATree, able to:
> + * - Translate iova address
> + * - Reverse translate iova address (from translated to iova)
> + * - Allocate IOVA regions for translated range (potentially slow operation)
> + *
> + * Note that it cannot remove nodes.
> + */
> +struct VhostIOVATree {
> +    /* First addressable iova address in the device */
> +    uint64_t iova_first;
> +
> +    /* Last addressable iova address in the device */
> +    uint64_t iova_last;
> +
> +    /* IOVA address to qemu memory maps. */
> +    IOVATree *iova_taddr_map;
> +
> +    /* QEMU virtual memory address to iova maps */
> +    GTree *taddr_iova_map;
> +};
> +
> +static gint vhost_iova_tree_cmp_taddr(gconstpointer a, gconstpointer b,
> +                                      gpointer data)
> +{
> +    const DMAMap *m1 = a, *m2 = b;
> +
> +    if (m1->translated_addr > m2->translated_addr + m2->size) {
> +        return 1;
> +    }
> +
> +    if (m1->translated_addr + m1->size < m2->translated_addr) {
> +        return -1;
> +    }
> +
> +    /* Overlapped */
> +    return 0;
> +}
> +
> +/**
> + * Create a new IOVA tree
> + *
> + * Returns the new IOVA tree
> + */
> +VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
> +{
> +    VhostIOVATree *tree = g_new(VhostIOVATree, 1);
> +
> +    /* Some devices do not like 0 addresses */
> +    tree->iova_first = MAX(iova_first, iova_min_addr);
> +    tree->iova_last = iova_last;
> +
> +    tree->iova_taddr_map = iova_tree_new();
> +    tree->taddr_iova_map = g_tree_new_full(vhost_iova_tree_cmp_taddr, NULL,
> +                                           NULL, g_free);
> +    return tree;
> +}
> +
> +/**
> + * Delete an iova tree
> + */
> +void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
> +{
> +    iova_tree_destroy(iova_tree->iova_taddr_map);
> +    g_tree_unref(iova_tree->taddr_iova_map);
> +    g_free(iova_tree);
> +}
> +
> +/**
> + * Find the IOVA address stored from a memory address
> + *
> + * @tree     The iova tree
> + * @map      The map with the memory address
> + *
> + * Return the stored mapping, or NULL if not found.
> + */
> +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
> +                                        const DMAMap *map)
> +{
> +    return g_tree_lookup(tree->taddr_iova_map, map);
> +}
> +
> +/**
> + * Allocate a new mapping
> + *
> + * @tree  The iova tree
> + * @map   The iova map
> + *
> + * Returns:
> + * - IOVA_OK if the map fits in the container
> + * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
> + * - IOVA_ERR_OVERLAP if the tree already contains that map
> + * - IOVA_ERR_NOMEM if tree cannot allocate more space.
> + *
> + * It returns the assigned iova in map->iova if the return value is IOVA_OK.
> + */
> +int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
> +{
> +    /* Some vhost devices do not like addr 0. Skip the first page */
> +    hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size;
> +    DMAMap *new;
> +    int r;
> +
> +    if (map->translated_addr + map->size < map->translated_addr ||
> +        map->perm == IOMMU_NONE) {
> +        return IOVA_ERR_INVALID;
> +    }
> +
> +    /* Check for collisions in translated addresses */
> +    if (vhost_iova_tree_find_iova(tree, map)) {
> +        return IOVA_ERR_OVERLAP;
> +    }
> +
> +    /* Allocate a node in IOVA address */
> +    r = iova_tree_alloc(tree->iova_taddr_map, map, iova_first,
> +                        tree->iova_last);
> +    if (r != IOVA_OK) {
> +        return r;
> +    }
> +
> +    /* Allocate node in qemu -> iova translations */
> +    new = g_malloc(sizeof(*new));
> +    memcpy(new, map, sizeof(*new));
> +    g_tree_insert(tree->taddr_iova_map, new, new);
> +    return IOVA_OK;
> +}
> +
> +/**
> + * Remove existing mappings from iova tree
> + *
> + * @param  iova_tree  The vhost iova tree
> + * @param  map        The map to remove
> + */
> +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map)
> +{
> +    const DMAMap *overlap;
> +
> +    iova_tree_remove(iova_tree->iova_taddr_map, map);
> +    while ((overlap = vhost_iova_tree_find_iova(iova_tree, map))) {
> +        g_tree_remove(iova_tree->taddr_iova_map, overlap);
> +    }
> +}
> diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
> index 2dc87613bc..6047670804 100644
> --- a/hw/virtio/meson.build
> +++ b/hw/virtio/meson.build
> @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
>   
>   virtio_ss = ss.source_set()
>   virtio_ss.add(files('virtio.c'))
> -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
> +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
>   virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
>   virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
>   virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))


* Re: [PATCH 23/31] vdpa: Add custom IOTLB translations to SVQ
       [not found] ` <20220121202733.404989-24-eperezma@redhat.com>
@ 2022-01-30  5:57   ` Jason Wang
       [not found]     ` <CAJaqyWe1zH8bfaoxTyz_RXH=0q+Yk9H7QyUffaRB1fCV9oVLZQ@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-30  5:57 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Use translations added in VhostIOVATree in SVQ.
>
> Only introduce usage here, not allocation and deallocation. As with
> previous patches, we use the dead code paths of shadow_vqs_enabled to
> avoid committing too many changes at once. These paths are impossible to
> take at the moment.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-shadow-virtqueue.h |   3 +-
>   include/hw/virtio/vhost-vdpa.h     |   3 +
>   hw/virtio/vhost-shadow-virtqueue.c | 111 ++++++++++++++++----
>   hw/virtio/vhost-vdpa.c             | 161 +++++++++++++++++++++++++----
>   4 files changed, 238 insertions(+), 40 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 19c934af49..c6f67d6f76 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -12,6 +12,7 @@
>   
>   #include "hw/virtio/vhost.h"
>   #include "qemu/event_notifier.h"
> +#include "hw/virtio/vhost-iova-tree.h"
>   
>   typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>   
> @@ -37,7 +38,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>                        VirtQueue *vq);
>   void vhost_svq_stop(VhostShadowVirtqueue *svq);
>   
> -VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize, VhostIOVATree *iova_map);
>   
>   void vhost_svq_free(VhostShadowVirtqueue *vq);
>   
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 009a9f3b6b..cd2388b3be 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -14,6 +14,7 @@
>   
>   #include <gmodule.h>
>   
> +#include "hw/virtio/vhost-iova-tree.h"
>   #include "hw/virtio/virtio.h"
>   #include "standard-headers/linux/vhost_types.h"
>   
> @@ -30,6 +31,8 @@ typedef struct vhost_vdpa {
>       MemoryListener listener;
>       struct vhost_vdpa_iova_range iova_range;
>       bool shadow_vqs_enabled;
> +    /* IOVA mapping used by Shadow Virtqueue */
> +    VhostIOVATree *iova_tree;
>       GPtrArray *shadow_vqs;
>       struct vhost_dev *dev;
>       VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index a1a404f68f..c7888eb8cf 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -11,6 +11,7 @@
>   #include "hw/virtio/vhost-shadow-virtqueue.h"
>   #include "hw/virtio/vhost.h"
>   #include "hw/virtio/virtio-access.h"
> +#include "hw/virtio/vhost-iova-tree.h"
>   #include "standard-headers/linux/vhost_types.h"
>   
>   #include "qemu/error-report.h"
> @@ -45,6 +46,9 @@ typedef struct VhostShadowVirtqueue {
>       /* Virtio device */
>       VirtIODevice *vdev;
>   
> +    /* IOVA mapping */
> +    VhostIOVATree *iova_tree;
> +
>       /* Map for returning guest's descriptors */
>       VirtQueueElement **ring_id_maps;
>   
> @@ -97,13 +101,7 @@ bool vhost_svq_valid_device_features(uint64_t *dev_features)
>               continue;
>   
>           case VIRTIO_F_ACCESS_PLATFORM:
> -            /* SVQ does not know how to translate addresses */
> -            if (*dev_features & BIT_ULL(b)) {
> -                clear_bit(b, dev_features);
> -                r = false;
> -            }
> -            break;
> -
> +            /* SVQ trusts the host's IOMMU to translate addresses */
>           case VIRTIO_F_VERSION_1:
>               /* SVQ trusts that the guest vring is little endian */
>               if (!(*dev_features & BIT_ULL(b))) {
> @@ -205,7 +203,55 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
>       }
>   }
>   
> +/**
> + * Translate addresses between qemu's virtual address and SVQ IOVA
> + *
> + * @svq    Shadow VirtQueue
> + * @addrs  Destination array of translated IOVA addresses
> + * @iovec  Source qemu's VA addresses
> + * @num    Length of iovec and minimum length of addrs
> + */
> +static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
> +                                     void **addrs, const struct iovec *iovec,
> +                                     size_t num)
> +{
> +    size_t i;
> +
> +    if (num == 0) {
> +        return true;
> +    }
> +
> +    for (i = 0; i < num; ++i) {
> +        DMAMap needle = {
> +            .translated_addr = (hwaddr)iovec[i].iov_base,
> +            .size = iovec[i].iov_len,
> +        };
> +        size_t off;
> +
> +        const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_tree, &needle);
> +        /*
> +         * Map cannot be NULL since iova map contains all guest space and
> +         * qemu already has a physical address mapped
> +         */
> +        if (unlikely(!map)) {
> +            error_report("Invalid address 0x%"HWADDR_PRIx" given by guest",
> +                         needle.translated_addr);


This can be triggered by the guest, so we need to use error_report_once() or log_guest_error() etc.
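
(e.g., rate-limited, assuming error_report_once() is acceptable here:)

        if (unlikely(!map)) {
            error_report_once("Invalid address 0x%"HWADDR_PRIx
                              " given by guest", needle.translated_addr);
            return false;
        }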


> +            return false;
> +        }
> +
> +        /*
> +         * Map->iova chunk size is ignored. What to do if descriptor
> +         * (addr, size) does not fit is delegated to the device.
> +         */


I think we need to at least check the size and fail if it doesn't
match here. Or is it possible that we have a buffer that crosses two
memory regions?
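
(Roughly, as a sketch -- the real check must honor DMAMap's inclusive
.size convention:)

        off = needle.translated_addr - map->translated_addr;
        if (unlikely(map->size - off < needle.size)) {
            /* The descriptor does not fit in a single mapped region */
            error_report_once("Guest buffer crosses a mapping boundary");
            return false;
        }
        addrs[i] = (void *)(map->iova + off);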


> +        off = needle.translated_addr - map->translated_addr;
> +        addrs[i] = (void *)(map->iova + off);
> +    }
> +
> +    return true;
> +}
> +
>   static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> +                                    void * const *vaddr_sg,
>                                       const struct iovec *iovec,
>                                       size_t num, bool more_descs, bool write)
>   {
> @@ -224,7 +270,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
>           } else {
>               descs[i].flags = flags;
>           }
> -        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> +        descs[i].addr = cpu_to_le64((hwaddr)vaddr_sg[n]);
>           descs[i].len = cpu_to_le32(iovec[n].iov_len);
>   
>           last = i;
> @@ -234,42 +280,60 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
>       svq->free_head = le16_to_cpu(descs[last].next);
>   }
>   
> -static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> -                                    VirtQueueElement *elem)
> +static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> +                                VirtQueueElement *elem,
> +                                unsigned *head)


I'd suggest making it return bool in the patch that introduces this
function.


>   {
> -    int head;
>       unsigned avail_idx;
>       vring_avail_t *avail = svq->vring.avail;
> +    bool ok;
> +    g_autofree void **sgs = g_new(void *, MAX(elem->out_num, elem->in_num));
>   
> -    head = svq->free_head;
> +    *head = svq->free_head;
>   
>       /* We need some descriptors here */
>       assert(elem->out_num || elem->in_num);
>   
> -    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> +    ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +    vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
>                               elem->in_num > 0, false);
> -    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> +
> +
> +    ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +
> +    vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
>   
>       /*
>        * Put entry in available array (but don't update avail->idx until they
>        * do sync).
>        */
>       avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> -    avail->ring[avail_idx] = cpu_to_le16(head);
> +    avail->ring[avail_idx] = cpu_to_le16(*head);
>       svq->avail_idx_shadow++;
>   
>       /* Update avail index after the descriptor is written */
>       smp_wmb();
>       avail->idx = cpu_to_le16(svq->avail_idx_shadow);
>   
> -    return head;
> +    return true;
>   }
>   
> -static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> +static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
>   {
> -    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> +    unsigned qemu_head;
> +    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
>   
>       svq->ring_id_maps[qemu_head] = elem;
> +    return true;
>   }
>   
>   static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> @@ -309,6 +373,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>   
>           while (true) {
>               VirtQueueElement *elem;
> +            bool ok;
>   
>               if (svq->next_guest_avail_elem) {
>                   elem = g_steal_pointer(&svq->next_guest_avail_elem);
> @@ -337,7 +402,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>                   return;
>               }
>   
> -            vhost_svq_add(svq, elem);
> +            ok = vhost_svq_add(svq, elem);
> +            if (unlikely(!ok)) {
> +                /* VQ is broken, just return and ignore any other kicks */
> +                return;
> +            }
>               vhost_svq_kick(svq);
>           }
>   
> @@ -619,12 +688,13 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>    * methods and file descriptors.
>    *
>    * @qsize Shadow VirtQueue size
> + * @iova_tree Tree to perform descriptors translations
>    *
>    * Returns the new virtqueue or NULL.
>    *
>    * In case of error, reason is reported through error_report.
>    */
> -VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize, VhostIOVATree *iova_tree)
>   {
>       size_t desc_size = sizeof(vring_desc_t) * qsize;
>       size_t device_size, driver_size;
> @@ -656,6 +726,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>       memset(svq->vring.desc, 0, driver_size);
>       svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
>       memset(svq->vring.used, 0, device_size);
> +    svq->iova_tree = iova_tree;
>       svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
>       event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>       return g_steal_pointer(&svq);
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 0e5c00ed7e..276a559649 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -209,6 +209,18 @@ static void vhost_vdpa_listener_region_add(MemoryListener *listener,
>                                            vaddr, section->readonly);
>   
>       llsize = int128_sub(llend, int128_make64(iova));
> +    if (v->shadow_vqs_enabled) {
> +        DMAMap mem_region = {
> +            .translated_addr = (hwaddr)vaddr,
> +            .size = int128_get64(llsize) - 1,
> +            .perm = IOMMU_ACCESS_FLAG(true, section->readonly),
> +        };
> +
> +        int r = vhost_iova_tree_map_alloc(v->iova_tree, &mem_region);
> +        assert(r == IOVA_OK);


It's better to fail or warn here.
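
(e.g., a sketch along the lines of:)

        int r = vhost_iova_tree_map_alloc(v->iova_tree, &mem_region);
        if (unlikely(r != IOVA_OK)) {
            error_report("Can't allocate a mapping (%d)", r);
            return;
        }

        iova = mem_region.iova;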


> +
> +        iova = mem_region.iova;
> +    }
>   
>       vhost_vdpa_iotlb_batch_begin_once(v);
>       ret = vhost_vdpa_dma_map(v, iova, int128_get64(llsize),
> @@ -261,6 +273,20 @@ static void vhost_vdpa_listener_region_del(MemoryListener *listener,
>   
>       llsize = int128_sub(llend, int128_make64(iova));
>   
> +    if (v->shadow_vqs_enabled) {
> +        const DMAMap *result;
> +        const void *vaddr = memory_region_get_ram_ptr(section->mr) +
> +            section->offset_within_region +
> +            (iova - section->offset_within_address_space);
> +        DMAMap mem_region = {
> +            .translated_addr = (hwaddr)vaddr,
> +            .size = int128_get64(llsize) - 1,
> +        };
> +
> +        result = vhost_iova_tree_find_iova(v->iova_tree, &mem_region);
> +        iova = result->iova;
> +        vhost_iova_tree_remove(v->iova_tree, &mem_region);
> +    }
>       vhost_vdpa_iotlb_batch_begin_once(v);
>       ret = vhost_vdpa_dma_unmap(v, iova, int128_get64(llsize));
>       if (ret) {
> @@ -783,33 +809,70 @@ static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
>   /**
>    * Unmap SVQ area in the device
>    */
> -static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
> -                                      hwaddr size)
> +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v,
> +                                      const DMAMap *needle)
>   {
> +    const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
> +    hwaddr size;
>       int r;
>   
> -    size = ROUND_UP(size, qemu_real_host_page_size);
> -    r = vhost_vdpa_dma_unmap(v, iova, size);
> +    if (unlikely(!result)) {
> +        error_report("Unable to find SVQ address to unmap");
> +        return false;
> +    }
> +
> +    size = ROUND_UP(result->size, qemu_real_host_page_size);
> +    r = vhost_vdpa_dma_unmap(v, result->iova, size);
>       return r == 0;
>   }
>   
>   static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
>                                          const VhostShadowVirtqueue *svq)
>   {
> +    DMAMap needle;
>       struct vhost_vdpa *v = dev->opaque;
>       struct vhost_vring_addr svq_addr;
> -    size_t device_size = vhost_svq_device_area_size(svq);
> -    size_t driver_size = vhost_svq_driver_area_size(svq);
>       bool ok;
>   
>       vhost_svq_get_vring_addr(svq, &svq_addr);
>   
> -    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
> +    needle = (DMAMap) {
> +        .translated_addr = svq_addr.desc_user_addr,
> +    };
> +    ok = vhost_vdpa_svq_unmap_ring(v, &needle);
>       if (unlikely(!ok)) {
>           return false;
>       }
>   
> -    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
> +    needle = (DMAMap) {
> +        .translated_addr = svq_addr.used_user_addr,
> +    };
> +    return vhost_vdpa_svq_unmap_ring(v, &needle);
> +}
> +
> +/**
> + * Map SVQ area in the device
> + *
> + * @v          Vhost-vdpa device
> + * @needle     The area to search iova
> + * @readonly   Permissions of the area
> + */
> +static bool vhost_vdpa_svq_map_ring(struct vhost_vdpa *v, const DMAMap *needle,
> +                                    bool readonly)
> +{
> +    hwaddr off;
> +    const DMAMap *result = vhost_iova_tree_find_iova(v->iova_tree, needle);
> +    int r;
> +
> +    if (unlikely(!result)) {
> +        error_report("Can't locate SVQ ring");
> +        return false;
> +    }
> +
> +    off = needle->translated_addr - result->translated_addr;
> +    r = vhost_vdpa_dma_map(v, result->iova + off, needle->size,
> +                           (void *)needle->translated_addr, readonly);
> +    return r == 0;
>   }
>   
>   /**
> @@ -821,23 +884,29 @@ static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
>   static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
>                                        const VhostShadowVirtqueue *svq)
>   {
> +    DMAMap needle;
>       struct vhost_vdpa *v = dev->opaque;
>       struct vhost_vring_addr svq_addr;
>       size_t device_size = vhost_svq_device_area_size(svq);
>       size_t driver_size = vhost_svq_driver_area_size(svq);
> -    int r;
> +    bool ok;
>   
>       vhost_svq_get_vring_addr(svq, &svq_addr);
>   
> -    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
> -                           (void *)svq_addr.desc_user_addr, true);
> -    if (unlikely(r != 0)) {
> +    needle = (DMAMap) {
> +        .translated_addr = svq_addr.desc_user_addr,
> +        .size = driver_size,
> +    };
> +    ok = vhost_vdpa_svq_map_ring(v, &needle, true);
> +    if (unlikely(!ok)) {
>           return false;
>       }
>   
> -    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
> -                           (void *)svq_addr.used_user_addr, false);
> -    return r == 0;
> +    needle = (DMAMap) {
> +        .translated_addr = svq_addr.used_user_addr,
> +        .size = device_size,
> +    };
> +    return vhost_vdpa_svq_map_ring(v, &needle, false);
>   }
>   
>   static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> @@ -1006,6 +1075,23 @@ static int vhost_vdpa_set_owner(struct vhost_dev *dev)
>       return vhost_vdpa_call(dev, VHOST_SET_OWNER, NULL);
>   }
>   
> +static bool vhost_vdpa_svq_get_vq_region(struct vhost_vdpa *v,
> +                                         unsigned long long addr,
> +                                         uint64_t *iova_addr)
> +{
> +    const DMAMap needle = {
> +        .translated_addr = addr,
> +    };
> +    const DMAMap *translation = vhost_iova_tree_find_iova(v->iova_tree,
> +                                                          &needle);
> +    if (!translation) {
> +        return false;
> +    }
> +
> +    *iova_addr = translation->iova + (addr - translation->translated_addr);
> +    return true;
> +}
> +
>   static void vhost_vdpa_vq_get_guest_addr(struct vhost_vring_addr *addr,
>                                            struct vhost_virtqueue *vq)
>   {
> @@ -1023,10 +1109,23 @@ static int vhost_vdpa_vq_get_addr(struct vhost_dev *dev,
>       assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
>   
>       if (v->shadow_vqs_enabled) {
> +        struct vhost_vring_addr svq_addr;
>           int idx = vhost_vdpa_get_vq_index(dev, addr->index);
>           VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, idx);
>   
> -        vhost_svq_get_vring_addr(svq, addr);
> +        vhost_svq_get_vring_addr(svq, &svq_addr);
> +        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.desc_user_addr,
> +                                          &addr->desc_user_addr)) {
> +            return -1;
> +        }
> +        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.avail_user_addr,
> +                                          &addr->avail_user_addr)) {
> +            return -1;
> +        }
> +        if (!vhost_vdpa_svq_get_vq_region(v, svq_addr.used_user_addr,
> +                                          &addr->used_user_addr)) {
> +            return -1;
> +        }
>       } else {
>           vhost_vdpa_vq_get_guest_addr(addr, vq);
>       }
> @@ -1095,13 +1194,37 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>   
>       shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
>       for (unsigned n = 0; n < hdev->nvqs; ++n) {
> -        VhostShadowVirtqueue *svq = vhost_svq_new(qsize);
> -
> +        DMAMap device_region, driver_region;
> +        struct vhost_vring_addr addr;
> +        VhostShadowVirtqueue *svq = vhost_svq_new(qsize, v->iova_tree);
>           if (unlikely(!svq)) {
>               error_setg(errp, "Cannot create svq %u", n);
>               return -1;
>           }
> -        g_ptr_array_add(v->shadow_vqs, svq);
> +
> +        vhost_svq_get_vring_addr(svq, &addr);
> +        driver_region = (DMAMap) {
> +            .translated_addr = (hwaddr)addr.desc_user_addr,
> +
> +            /*
> +             * DMAMAp.size include the last byte included in the range, while
> +             * sizeof marks one past it. Substract one byte to make them match.
> +             */
> +            .size = vhost_svq_driver_area_size(svq) - 1,
> +            .perm = VHOST_ACCESS_RO,
> +        };
> +        device_region = (DMAMap) {
> +            .translated_addr = (hwaddr)addr.used_user_addr,
> +            .size = vhost_svq_device_area_size(svq) - 1,
> +            .perm = VHOST_ACCESS_RW,
> +        };
> +
> +        r = vhost_iova_tree_map_alloc(v->iova_tree, &driver_region);
> +        assert(r == IOVA_OK);


Let's fail instead of assert here.
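
(For instance, a sketch:)

        r = vhost_iova_tree_map_alloc(v->iova_tree, &driver_region);
        if (unlikely(r != IOVA_OK)) {
            error_setg(errp, "Cannot allocate iova for svq %u driver area", n);
            return -1;
        }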

Thanks


> +        r = vhost_iova_tree_map_alloc(v->iova_tree, &device_region);
> +        assert(r == IOVA_OK);
> +
> +        g_ptr_array_add(shadow_vqs, svq);
>       }
>   
>   out:


* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
       [not found] ` <20220121202733.404989-19-eperezma@redhat.com>
  2022-01-30  4:42   ` [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding Jason Wang
@ 2022-01-30  6:46   ` Jason Wang
       [not found]     ` <CAJaqyWfF01k3LntM7RLEmFcej=EY2d4+2MARKXPptQ2J7VnB9A@mail.gmail.com>
  1 sibling, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-30  6:46 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>   void vhost_svq_stop(VhostShadowVirtqueue *svq)
>   {
>       event_notifier_set_handler(&svq->svq_kick, NULL);
> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> +
> +    if (!svq->vq) {
> +        return;
> +    }
> +
> +    /* Send all pending used descriptors to guest */
> +    vhost_svq_flush(svq, false);


Do we need to wait for all the pending descriptors to be completed here?
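
(If so, maybe something like this before detaching the elements -- the
svq->last_used_idx field here is hypothetical, standing for whatever
tracks the device's used index:)

    /* Wait for the device to use all in-flight descriptors */
    while (svq->last_used_idx != svq->avail_idx_shadow) {
        vhost_svq_flush(svq, false);
    }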

Thanks


> +
> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> +        g_autofree VirtQueueElement *elem = NULL;
> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> +        if (elem) {
> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> +        }
> +    }
> +
> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> +    if (next_avail_elem) {
> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> +                                 next_avail_elem->len);
> +    }
>   }


* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
       [not found] ` <20220121202733.404989-29-eperezma@redhat.com>
@ 2022-01-30  6:50   ` Jason Wang
       [not found]     ` <CAJaqyWdBLU+maEhByepzeH7iwLmqUba0rRb8PM4VwBy2P8Vtow@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-01-30  6:50 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> SVQ is able to log the dirty bits by itself, so let's use it to not
> block migration.
>
> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> enabled. Even if the device supports it, the reports would be nonsense
> because SVQ memory is in the qemu region.
>
> The log region is still allocated. Future changes might skip that, but
> this series is already long enough.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index fb0a338baa..75090d65e8 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
>       if (ret == 0 && v->shadow_vqs_enabled) {
>           /* Filter only features that SVQ can offer to guest */
>           vhost_svq_valid_guest_features(features);
> +
> +        /* Add SVQ logging capabilities */
> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
>       }
>   
>       return ret;
> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
>   
>       if (v->shadow_vqs_enabled) {
>           uint64_t dev_features, svq_features, acked_features;
> +        uint8_t status = 0;
>           bool ok;
>   
> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> +        if (unlikely(ret)) {
> +            return ret;
> +        }
> +
> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> +            /*
> +             * vhost is trying to enable or disable _F_LOG, and the device
> +             * would report wrong dirty pages. SVQ handles it.
> +             */


I fail to understand this comment; I'd think there's no way to disable
dirty page tracking for SVQ.

Thanks


> +            return 0;
> +        }
> +
> +        /* We must not ack _F_LOG if SVQ is enabled */
> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
> +
>           ret = vhost_vdpa_get_dev_features(dev, &dev_features);
>           if (ret != 0) {
>               error_report("Can't get vdpa device features, got (%d)", ret);


* Re: [PATCH 29/31] vdpa: Make ncs autofree
       [not found] ` <20220121202733.404989-30-eperezma@redhat.com>
@ 2022-01-30  6:51   ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-01-30  6:51 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Simplifying memory management.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>


To reduce the size of this series, this can be sent as a separate patch,
if I'm not wrong.

Thanks


> ---
>   net/vhost-vdpa.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 4125d13118..4befba5cc7 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -264,7 +264,8 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>   {
>       const NetdevVhostVDPAOptions *opts;
>       int vdpa_device_fd;
> -    NetClientState **ncs, *nc;
> +    g_autofree NetClientState **ncs = NULL;
> +    NetClientState *nc;
>       int queue_pairs, i, has_cvq = 0;
>   
>       assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> @@ -302,7 +303,6 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>               goto err;
>       }
>   
> -    g_free(ncs);
>       return 0;
>   
>   err:
> @@ -310,7 +310,6 @@ err:
>           qemu_del_net_client(ncs[0]);
>       }
>       qemu_close(vdpa_device_fd);
> -    g_free(ncs);
>   
>       return -1;
>   }


* Re: [PATCH 30/31] vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c
       [not found] ` <20220121202733.404989-31-eperezma@redhat.com>
@ 2022-01-30  6:53   ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-01-30  6:53 UTC (permalink / raw)
  To: Eugenio Pérez, qemu-devel
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, Markus Armbruster, Gautam Dawar,
	virtualization, Eduardo Habkost, Harpreet Singh Anand,
	Xiao W Wang, Stefan Hajnoczi, Eli Cohen, Paolo Bonzini,
	Zhu Lingshan, Eric Blake


On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> Since it's a device property, it can be done in net/. This helps SVQ to
> allocate the rings in vdpa device initialization, rather than delaying
> that.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> ---
>   hw/virtio/vhost-vdpa.c | 15 ---------------
>   net/vhost-vdpa.c       | 32 ++++++++++++++++++++++++--------


I don't understand this, since we will support devices other than net?


>   2 files changed, 24 insertions(+), 23 deletions(-)
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 75090d65e8..2491c05d29 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -350,19 +350,6 @@ static int vhost_vdpa_add_status(struct vhost_dev *dev, uint8_t status)
>       return 0;
>   }
>   
> -static void vhost_vdpa_get_iova_range(struct vhost_vdpa *v)
> -{
> -    int ret = vhost_vdpa_call(v->dev, VHOST_VDPA_GET_IOVA_RANGE,
> -                              &v->iova_range);
> -    if (ret != 0) {
> -        v->iova_range.first = 0;
> -        v->iova_range.last = UINT64_MAX;
> -    }
> -
> -    trace_vhost_vdpa_get_iova_range(v->dev, v->iova_range.first,
> -                                    v->iova_range.last);
> -}


Let's just export this instead?
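
(i.e., drop the static and declare it, roughly:)

/* include/hw/virtio/vhost-vdpa.h, hypothetical declaration */
void vhost_vdpa_get_iova_range(struct vhost_vdpa *v);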

Thanks


> -
>   static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
>   {
>       struct vhost_vdpa *v = dev->opaque;
> @@ -1295,8 +1282,6 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>           goto err;
>       }
>   
> -    vhost_vdpa_get_iova_range(v);
> -
>       if (vhost_vdpa_one_time_request(dev)) {
>           return 0;
>       }
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 4befba5cc7..cc9cecf8d1 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -22,6 +22,7 @@
>   #include <sys/ioctl.h>
>   #include <err.h>
>   #include "standard-headers/linux/virtio_net.h"
> +#include "standard-headers/linux/vhost_types.h"
>   #include "monitor/monitor.h"
>   #include "hw/virtio/vhost.h"
>   
> @@ -187,13 +188,25 @@ static NetClientInfo net_vhost_vdpa_info = {
>           .check_peer_type = vhost_vdpa_check_peer_type,
>   };
>   
> +static void vhost_vdpa_get_iova_range(int fd,
> +                                      struct vhost_vdpa_iova_range *iova_range)
> +{
> +    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
> +
> +    if (ret < 0) {
> +        iova_range->first = 0;
> +        iova_range->last = UINT64_MAX;
> +    }
> +}
> +
>   static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
> -                                           const char *device,
> -                                           const char *name,
> -                                           int vdpa_device_fd,
> -                                           int queue_pair_index,
> -                                           int nvqs,
> -                                           bool is_datapath)
> +                                       const char *device,
> +                                       const char *name,
> +                                       int vdpa_device_fd,
> +                                       int queue_pair_index,
> +                                       int nvqs,
> +                                       bool is_datapath,
> +                                       struct vhost_vdpa_iova_range iova_range)
>   {
>       NetClientState *nc = NULL;
>       VhostVDPAState *s;
> @@ -211,6 +224,7 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
>   
>       s->vhost_vdpa.device_fd = vdpa_device_fd;
>       s->vhost_vdpa.index = queue_pair_index;
> +    s->vhost_vdpa.iova_range = iova_range;
>       ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
>       if (ret) {
>           qemu_del_net_client(nc);
> @@ -267,6 +281,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>       g_autofree NetClientState **ncs = NULL;
>       NetClientState *nc;
>       int queue_pairs, i, has_cvq = 0;
> +    struct vhost_vdpa_iova_range iova_range;
>   
>       assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
>       opts = &netdev->u.vhost_vdpa;
> @@ -286,19 +301,20 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
>           qemu_close(vdpa_device_fd);
>           return queue_pairs;
>       }
> +    vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
>   
>       ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>   
>       for (i = 0; i < queue_pairs; i++) {
>           ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                     vdpa_device_fd, i, 2, true);
> +                                     vdpa_device_fd, i, 2, true, iova_range);
>           if (!ncs[i])
>               goto err;
>       }
>   
>       if (has_cvq) {
>           nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> -                                 vdpa_device_fd, i, 1, false);
> +                                 vdpa_device_fd, i, 1, false, iova_range);
>           if (!nc)
>               goto err;
>       }


* Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
       [not found]     ` <CAJaqyWda5sBw9VGBrz8g60OJ07Eeq45RRYu9vwgOPZFwten9rw@mail.gmail.com>
@ 2022-02-08  3:23       ` Jason Wang
       [not found]         ` <CAJaqyWeisXmZ9+xw2Rj50K7aKx4khNZZjLZEz4MY97B9pQQm3w@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-08  3:23 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/31 11:34 PM, Eugenio Perez Martin wrote:
> On Sat, Jan 29, 2022 at 9:06 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
>>>    1 file changed, 18 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 18de14f0fb..029f98feee 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>>        }
>>>    }
>>>
>>> -static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>> -                                       struct vhost_vring_file *file)
>>> +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
>>> +                                         struct vhost_vring_file *file)
>>>    {
>>>        trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
>>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>>>    }
>>>
>>> +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>> +                                     struct vhost_vring_file *file)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +
>>> +    if (v->shadow_vqs_enabled) {
>>> +        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
>>> +
>>> +        vhost_svq_set_guest_call_notifier(svq, file->fd);
>>
>> Two questions here (had similar questions for vring kick):
>>
>> 1) Any reason that we set up the eventfd for vhost-vdpa in
>> vhost_vdpa_svq_setup() and not here?
>>
> I'm not sure what you mean.
>
> The guest->SVQ call and kick fds are set here and at
> vhost_vdpa_set_vring_kick. The event notifier handler of the guest ->
> SVQ kick_fd is set at vhost_vdpa_set_vring_kick /
> vhost_svq_set_svq_kick_fd. The guest -> SVQ call fd has no event
> notifier handler since we don't poll it.
>
> On the other hand, the connection SVQ <-> device uses the same fds
> from the beginning to the end, and they will not change with, for
> example, call fd masking. That's why it's setup from
> vhost_vdpa_svq_setup. Delaying to vhost_vdpa_set_vring_call would make
> us add way more logic there.


More logic in the generic shadow vq code but less code in the vhost-vdpa
specific code, I think.

E.g. we can move the kick fd setup logic from vhost_vdpa_svq_set_fds() to
here.
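
(Something along this shape, reusing the SVQ's device kick notifier --
sketch only:)

    if (v->shadow_vqs_enabled) {
        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
        struct vhost_vring_file dev_file = {
            .index = file->index,
            .fd = event_notifier_get_fd(vhost_svq_get_dev_kick_notifier(svq)),
        };

        vhost_svq_set_svq_kick_fd(svq, file->fd);   /* guest -> SVQ */
        /* SVQ -> device */
        return vhost_vdpa_set_vring_dev_kick(dev, &dev_file);
    }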

Thanks


>
>> 2) The call could be disabled by using -1 as the fd, but I don't see any
>> code to deal with that.
>>
> Right, I didn't take that into account. vhost-kernel also takes -1 as
> the kick_fd to unbind, so SVQ can be reworked to take that into account
> for sure.
>
> Thanks!
>
>> Thanks
>>
>>
>>> +        return 0;
>>> +    } else {
>>> +        return vhost_vdpa_set_vring_dev_call(dev, file);
>>> +    }
>>> +}
>>> +
>>>    /**
>>>     * Set shadow virtqueue descriptors to the device
>>>     *


* Re: [PATCH 11/31] vhost: Add vhost_svq_valid_device_features to shadow vq
       [not found]       ` <CAJaqyWc6BqJBDcUE36AQ=bgWjJYkyMo1ZYxRwmc5ZgGj4T-pVg@mail.gmail.com>
@ 2022-02-08  3:37         ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-08  3:37 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 6:57 PM, Eugenio Perez Martin wrote:
> On Mon, Jan 31, 2022 at 4:49 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
>> On Sat, Jan 29, 2022 at 9:11 AM Jason Wang <jasowang@redhat.com> wrote:
>>>
>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>>> This allows SVQ to negotiate features with the device. For the device,
>>>> SVQ is a driver. While this function needs to bypass all non-transport
>>>> features, it needs to disable the features that SVQ does not support
>>>> when forwarding buffers. This includes packed vq layout, indirect
>>>> descriptors or event idx.
>>>>
>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>> ---
>>>>    hw/virtio/vhost-shadow-virtqueue.h |  2 ++
>>>>    hw/virtio/vhost-shadow-virtqueue.c | 44 ++++++++++++++++++++++++++++++
>>>>    hw/virtio/vhost-vdpa.c             | 21 ++++++++++++++
>>>>    3 files changed, 67 insertions(+)
>>>>
>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>>> index c9ffa11fce..d963867a04 100644
>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>>> @@ -15,6 +15,8 @@
>>>>
>>>>    typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>>>>
>>>> +bool vhost_svq_valid_device_features(uint64_t *features);
>>>> +
>>>>    void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>>>>    void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
>>>>    const EventNotifier *vhost_svq_get_dev_kick_notifier(
>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>>> index 9619c8082c..51442b3dbf 100644
>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>>> @@ -45,6 +45,50 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>>>>        return &svq->hdev_kick;
>>>>    }
>>>>
>>>> +/**
>>>> + * Validate the transport device features that SVQ can use with the device
>>>> + *
>>>> + * @dev_features  The device features. If success, the acknowledged features.
>>>> + *
>>>> + * Returns true if SVQ can go with a subset of these, false otherwise.
>>>> + */
>>>> +bool vhost_svq_valid_device_features(uint64_t *dev_features)
>>>> +{
>>>> +    bool r = true;
>>>> +
>>>> +    for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
>>>> +         ++b) {
>>>> +        switch (b) {
>>>> +        case VIRTIO_F_NOTIFY_ON_EMPTY:
>>>> +        case VIRTIO_F_ANY_LAYOUT:
>>>> +            continue;
>>>> +
>>>> +        case VIRTIO_F_ACCESS_PLATFORM:
>>>> +            /* SVQ does not know how to translate addresses */
>>>
>>> I may have missed something, but is there any reason we need to disable
>>> ACCESS_PLATFORM? I'd expect the vring helpers we use for the shadow
>>> virtqueue can deal with vIOMMU perfectly.
>>>
>> This function is validating the SVQ <-> device communication features,
>> which may or may not be the same as guest <-> SVQ. These feature flags
>> are valid for guest <-> SVQ communication, the same as with the indirect
>> descriptors one.
>>
>> Having said that, there is a point in the series where
>> VIRTIO_F_ACCESS_PLATFORM is actually mandatory, so I think we could
>> use the latter addition of x-svq cmdline parameter and delay the
>> feature validations where it makes more sense.
>>
>>>> +            if (*dev_features & BIT_ULL(b)) {
>>>> +                clear_bit(b, dev_features);
>>>> +                r = false;
>>>> +            }
>>>> +            break;
>>>> +
>>>> +        case VIRTIO_F_VERSION_1:
>>>
>>> I had the same question here.
>>>
>> For VERSION_1 it's easier to assume that the guest is little endian at
>> some points, but we could try harder to support both endiannesses if
>> needed.
>>
> Re-thinking the SVQ feature isolation stuff for this first iteration
> based on your comments.
>
> Maybe it's easier to simply fail if the device does not *match* the
> expected feature set, and add all of the "feature isolation" later.
> While a lot of guest <-> SVQ communication details are already solved
> for free with qemu's VirtQueue (indirect, packed, ...), we may
> simplify this series in particular and add the support for it later.
>
> For example, at this moment it would be valid for the device to export
> the indirect descriptors feature flag, and SVQ would simply forward that
> feature offering to the guest. So the guest <-> SVQ communication could
> use indirect descriptors (qemu's VirtQueue code handles them for free),
> but SVQ would not acknowledge them to the device. As a side note,
> negotiating it would actually have been harmless, but that's not the
> case for packed vq.
>
> So maybe for the v2 we can simply force the device to just export the
> strictly needed features and nothing else with qemu cmdline, and then
> enable the feature negotiation isolation for each side of SVQ?


Yes, that's exactly my point.
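
For reference, a strict-match check could be as small as the following
sketch; the SVQ_REQUIRED_FEATURES / SVQ_FORBIDDEN_FEATURES masks are
hypothetical names, not from the series:

    /* Hypothetical masks: VERSION_1 is required; packed, indirect and
     * event idx are not supported by this SVQ iteration. */
    #define SVQ_REQUIRED_FEATURES   BIT_ULL(VIRTIO_F_VERSION_1)
    #define SVQ_FORBIDDEN_FEATURES  (BIT_ULL(VIRTIO_F_RING_PACKED) | \
                                     BIT_ULL(VIRTIO_RING_F_INDIRECT_DESC) | \
                                     BIT_ULL(VIRTIO_RING_F_EVENT_IDX))

    static bool vhost_svq_features_match(uint64_t dev_features)
    {
        /* Fail unless the device offers the required bits and none of
         * the forbidden ones. */
        return (dev_features & SVQ_REQUIRED_FEATURES) ==
                   SVQ_REQUIRED_FEATURES &&
               !(dev_features & SVQ_FORBIDDEN_FEATURES);
    }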

Thanks


>
> Thanks!
>
>
>> Thanks!
>>
>>> Thanks
>>>
>>>
>>>> +            /* SVQ trust that guest vring is little endian */
>>>> +            if (!(*dev_features & BIT_ULL(b))) {
>>>> +                set_bit(b, dev_features);
>>>> +                r = false;
>>>> +            }
>>>> +            continue;
>>>> +
>>>> +        default:
>>>> +            if (*dev_features & BIT_ULL(b)) {
>>>> +                clear_bit(b, dev_features);
>>>> +            }
>>>> +        }
>>>> +    }
>>>> +
>>>> +    return r;
>>>> +}
>>>> +
>>>>    /* Forward guest notifications */
>>>>    static void vhost_handle_guest_kick(EventNotifier *n)
>>>>    {
>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>> index bdb45c8808..9d801cf907 100644
>>>> --- a/hw/virtio/vhost-vdpa.c
>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>> @@ -855,10 +855,31 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>>>>        size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
>>>>        g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
>>>>                                                               vhost_psvq_free);
>>>> +    uint64_t dev_features;
>>>> +    uint64_t svq_features;
>>>> +    int r;
>>>> +    bool ok;
>>>> +
>>>>        if (!v->shadow_vqs_enabled) {
>>>>            goto out;
>>>>        }
>>>>
>>>> +    r = vhost_vdpa_get_features(hdev, &dev_features);
>>>> +    if (r != 0) {
>>>> +        error_setg(errp, "Can't get vdpa device features, got (%d)", r);
>>>> +        return r;
>>>> +    }
>>>> +
>>>> +    svq_features = dev_features;
>>>> +    ok = vhost_svq_valid_device_features(&svq_features);
>>>> +    if (unlikely(!ok)) {
>>>> +        error_setg(errp,
>>>> +            "SVQ Invalid device feature flags, offer: 0x%"PRIx64", ok: 0x%"PRIx64,
>>>> +            hdev->features, svq_features);
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_psvq_free);
>>>>        for (unsigned n = 0; n < hdev->nvqs; ++n) {
>>>>            VhostShadowVirtqueue *svq = vhost_svq_new();
>>>>


* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
       [not found]     ` <CAJaqyWdRKZp6CwnE+HAr0JALhSRh-trJbZ01kddnLTuRX_tMKQ@mail.gmail.com>
@ 2022-02-08  3:57       ` Jason Wang
       [not found]         ` <CAJaqyWfEEg2PKgxBAFwYhF9LD1oDtwVYXSjHHnCbstT3dvL2GA@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-08  3:57 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote:
> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> First half of the buffer forwarding part, preparing the vhost-vdpa
>>> callbacks to offer SVQ. QEMU cannot enable it yet, so this is
>>> effectively dead code at the moment, but it helps to reduce the patch
>>> size.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-shadow-virtqueue.h |   2 +-
>>>    hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
>>>    hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
>>>    3 files changed, 143 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>> index 035207a469..39aef5ffdf 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>>>
>>>    void vhost_svq_stop(VhostShadowVirtqueue *svq);
>>>
>>> -VhostShadowVirtqueue *vhost_svq_new(void);
>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
>>>
>>>    void vhost_svq_free(VhostShadowVirtqueue *vq);
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index f129ec8395..7c168075d7 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>    /**
>>>     * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>>>     * methods and file descriptors.
>>> + *
>>> + * @qsize Shadow VirtQueue size
>>> + *
>>> + * Returns the new virtqueue or NULL.
>>> + *
>>> + * In case of error, reason is reported through error_report.
>>>     */
>>> -VhostShadowVirtqueue *vhost_svq_new(void)
>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>>>    {
>>> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
>>> +    size_t device_size, driver_size;
>>>        g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>>>        int r;
>>>
>>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
>>>        /* Placeholder descriptor, it should be deleted at set_kick_fd */
>>>        event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
>>>
>>> +    svq->vring.num = qsize;
>>
>> I wonder if this is the best approach. E.g. some hardware can support
>> up to a 32K queue size. So this will probably end up with:
>>
>> 1) SVQ uses a 32K queue size
>> 2) hardware queue uses 256
>>
> In that case SVQ's vring queue size will be 32K and the guest's vring
> can negotiate any number with SVQ equal to or less than 32K,


Sorry for being unclear; what I meant is actually:

1) SVQ uses 32K queue size

2) guest vq uses 256

This looks like a burden that needs extra logic and may hurt
performance.

And this can lead other interesting situation:

1) SVQ uses 256

2) guest vq uses 1024

Where a lot of more SVQ logic is needed.


> including 256.
> Is that what you mean?


I mean, it looks to me that the logic would be much simpler if we just
allocate the shadow virtqueue with the size the guest can see (the guest
vring).

Then we don't need to think about whether the difference in queue sizes
can have any side effects.
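
A minimal sketch of that sizing, reusing the vhost_svq_new(qsize)
constructor from this patch (the loop context is illustrative):

    /* Size each SVQ from the guest-visible vring instead of a fixed
     * device maximum; virtio_queue_get_num() is the existing helper. */
    for (unsigned n = 0; n < hdev->nvqs; ++n) {
        uint16_t qsize = virtio_queue_get_num(hdev->vdev,
                                              hdev->vq_index + n);
        g_ptr_array_add(shadow_vqs, vhost_svq_new(qsize));
    }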


>
> If by hardware queues you mean the guest's vring, I'm not sure why it
> is "probably 256". I'd say that in that case, with the virtio-net
> kernel driver, the ring size will be the same as the one the device
> exports, for example, won't it?
>
> The implementation should support any combination of sizes, but the
> ring size exposed to the guest is never bigger than the hardware one.
>
>> ? Or SVQ can stick to 256, but will this cause trouble if we want
>> to add event index support?
>>
> I think we should not have any problem with event idx. If you mean
> that the guest could mark more buffers available than the SVQ vring's
> size, that should not happen, because there must be fewer entries in
> the guest than in SVQ.
>
> But if I understood you correctly, a similar situation could happen if
> a guest's contiguous buffer is scattered across many of qemu's VA
> chunks. Even if that happened, the situation should be ok too: SVQ
> knows the guest's avail idx and, if SVQ is full, it will continue
> forwarding avail buffers when the device uses more buffers.
>
> Does that make sense to you?


Yes.

Thanks


* Re: [PATCH 16/31] vhost: pass queue index to vhost_vq_get_addr
       [not found]     ` <CAJaqyWexu=VroHQxmtJDQm=iu1va-s1VGR8hqGOreG0SOisjYg@mail.gmail.com>
@ 2022-02-08  6:58       ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-08  6:58 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 1:44 AM, Eugenio Perez Martin wrote:
> On Sat, Jan 29, 2022 at 9:20 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> Doing it that way allows the vhost backend to know what address to return.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost.c | 6 +++---
>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>> index 7b03efccec..64b955ba0c 100644
>>> --- a/hw/virtio/vhost.c
>>> +++ b/hw/virtio/vhost.c
>>> @@ -798,9 +798,10 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
>>>                                        struct vhost_virtqueue *vq,
>>>                                        unsigned idx, bool enable_log)
>>>    {
>>> -    struct vhost_vring_addr addr;
>>> +    struct vhost_vring_addr addr = {
>>> +        .index = idx,
>>> +    };
>>>        int r;
>>> -    memset(&addr, 0, sizeof(struct vhost_vring_addr));
>>>
>>>        if (dev->vhost_ops->vhost_vq_get_addr) {
>>>            r = dev->vhost_ops->vhost_vq_get_addr(dev, &addr, vq);
>>> @@ -813,7 +814,6 @@ static int vhost_virtqueue_set_addr(struct vhost_dev *dev,
>>>            addr.avail_user_addr = (uint64_t)(unsigned long)vq->avail;
>>>            addr.used_user_addr = (uint64_t)(unsigned long)vq->used;
>>>        }
>>
>> I'm a bit lost in the logic above; any reason we need to call
>> vhost_vq_get_addr() :) ?
>>
> It's the way vhost_virtqueue_set_addr works if the backend has a
> vhost_vq_get_addr operation (currently, only vhost-vdpa). vhost first
> asks the backend for the address and then sets it.


Right, it's because vhost-vdpa doesn't use VA but GPA. I'm not sure it's
worth a dedicated vhost_op, but considering we introduce the shadow
virtqueue stuff, it should be ok now.

(In the future, we may consider generalizing the non-vhost-vdpa-specific
stuff into VhostShadowVirtqueue; then we can get rid of this vhost_op.)


>
> Previously, the index was not needed because all the information was
> in vhost_virtqueue. However, extracting the queue index from
> vhost_virtqueue is tricky, so I think it's easier to simply have that
> information in the request, similar to get_base or get_num when asking
> the vdpa device. We can extract the index from vq - dev->vqs or
> something similar if that's preferred.


It looks odd for the caller to pass the index considering
vhost_virtqueue is already passed. So I think we need to deduce it from
vhost_virtqueue, as you mentioned here.
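
A sketch of that deduction, assuming vq points into the dev->vqs array
(as it does for vhost_virtqueue_set_addr()'s callers):

    /* Deduce the index from the vhost_virtqueue pointer itself; add
     * dev->vq_index if an absolute index is needed. */
    static unsigned vhost_vq_index(const struct vhost_dev *dev,
                                   const struct vhost_virtqueue *vq)
    {
        return vq - dev->vqs;
    }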

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>> -    addr.index = idx;
>>>        addr.log_guest_addr = vq->used_phys;
>>>        addr.flags = enable_log ? (1 << VHOST_VRING_F_LOG) : 0;
>>>        r = dev->vhost_ops->vhost_set_vring_addr(dev, &addr);


* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
       [not found]     ` <CAJaqyWdDax2+e3ZUEYyYNe5xAL=Oocu+72n89ygayrzYrQz2Yw@mail.gmail.com>
@ 2022-02-08  8:11       ` Jason Wang
       [not found]         ` <CAJaqyWfRWexq7jrCkJrPzLB4g_fK42pE8BarMhZwKNYtNXi7XA@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-08  8:11 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/2 1:08 AM, Eugenio Perez Martin wrote:
> On Sun, Jan 30, 2022 at 5:43 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> Initial version of the shadow virtqueue that actually forwards buffers. There
>>> is no iommu support at the moment, and that will be addressed in future
>>> patches of this series. Since all vhost-vdpa devices use forced IOMMU,
>>> this means that SVQ is not usable at this point of the series on any
>>> device.
>>>
>>> For simplicity it only supports modern devices, which expect the vring
>>> in little endian, with a split ring and no event idx or indirect
>>> descriptors. Support for them will not be added in this series.
>>>
>>> It reuses the VirtQueue code for the device part. The driver part is
>>> based on Linux's virtio_ring driver, but with stripped functionality
>>> and optimizations so it's easier to review.
>>>
>>> However, forwarding buffers has some particular quirks: one of the most
>>> unexpected ones is that a guest's buffer can expand across more than
>>> one descriptor in SVQ. While this is handled gracefully by qemu's
>>> emulated virtio devices, it may cause an unexpected SVQ queue-full
>>> condition. This patch also solves it by checking for this condition at
>>> both guest's kicks and device's calls. The code may be more elegant in
>>> the future if the SVQ code runs in its own iocontext.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-shadow-virtqueue.h |   2 +
>>>    hw/virtio/vhost-shadow-virtqueue.c | 365 ++++++++++++++++++++++++++++-
>>>    hw/virtio/vhost-vdpa.c             | 111 ++++++++-
>>>    3 files changed, 462 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>> index 39aef5ffdf..19c934af49 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>> @@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
>>>    size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
>>>    size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>>>
>>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>>> +                     VirtQueue *vq);
>>>    void vhost_svq_stop(VhostShadowVirtqueue *svq);
>>>
>>>    VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index 7c168075d7..a1a404f68f 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -9,6 +9,8 @@
>>>
>>>    #include "qemu/osdep.h"
>>>    #include "hw/virtio/vhost-shadow-virtqueue.h"
>>> +#include "hw/virtio/vhost.h"
>>> +#include "hw/virtio/virtio-access.h"
>>>    #include "standard-headers/linux/vhost_types.h"
>>>
>>>    #include "qemu/error-report.h"
>>> @@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {
>>>
>>>        /* Guest's call notifier, where SVQ calls guest. */
>>>        EventNotifier svq_call;
>>> +
>>> +    /* Virtio queue shadowing */
>>> +    VirtQueue *vq;
>>> +
>>> +    /* Virtio device */
>>> +    VirtIODevice *vdev;
>>> +
>>> +    /* Map for returning guest's descriptors */
>>> +    VirtQueueElement **ring_id_maps;
>>> +
>>> +    /* Next VirtQueue element that guest made available */
>>> +    VirtQueueElement *next_guest_avail_elem;
>>> +
>>> +    /* Next head to expose to device */
>>> +    uint16_t avail_idx_shadow;
>>> +
>>> +    /* Next free descriptor */
>>> +    uint16_t free_head;
>>> +
>>> +    /* Last seen used idx */
>>> +    uint16_t shadow_used_idx;
>>> +
>>> +    /* Next head to consume from device */
>>> +    uint16_t last_used_idx;
>>> +
>>> +    /* Cache for the exposed notification flag */
>>> +    bool notification;
>>>    } VhostShadowVirtqueue;
>>>
>>>    #define INVALID_SVQ_KICK_FD -1
>>> @@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t dev_features,
>>>        return true;
>>>    }
>>>
>>> -/* Forward guest notifications */
>>> -static void vhost_handle_guest_kick(EventNotifier *n)
>>> +/**
>>> + * Number of descriptors that SVQ can make available from the guest.
>>> + *
>>> + * @svq   The svq
>>> + */
>>> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
>>>    {
>>> -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>>> -                                             svq_kick);
>>> +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
>>> +}
>>> +
>>> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
>>> +{
>>> +    uint16_t notification_flag;
>>>
>>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
>>> +    if (svq->notification == enable) {
>>> +        return;
>>> +    }
>>> +
>>> +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
>>> +
>>> +    svq->notification = enable;
>>> +    if (enable) {
>>> +        svq->vring.avail->flags &= ~notification_flag;
>>> +    } else {
>>> +        svq->vring.avail->flags |= notification_flag;
>>> +    }
>>> +}
>>> +
>>> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
>>> +                                    const struct iovec *iovec,
>>> +                                    size_t num, bool more_descs, bool write)
>>> +{
>>> +    uint16_t i = svq->free_head, last = svq->free_head;
>>> +    unsigned n;
>>> +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
>>> +    vring_desc_t *descs = svq->vring.desc;
>>> +
>>> +    if (num == 0) {
>>> +        return;
>>> +    }
>>> +
>>> +    for (n = 0; n < num; n++) {
>>> +        if (more_descs || (n + 1 < num)) {
>>> +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
>>> +        } else {
>>> +            descs[i].flags = flags;
>>> +        }
>>> +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
>>> +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
>>> +
>>> +        last = i;
>>> +        i = cpu_to_le16(descs[i].next);
>>> +    }
>>> +
>>> +    svq->free_head = le16_to_cpu(descs[last].next);
>>> +}
>>> +
>>> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
>>> +                                    VirtQueueElement *elem)
>>> +{
>>> +    int head;
>>> +    unsigned avail_idx;
>>> +    vring_avail_t *avail = svq->vring.avail;
>>> +
>>> +    head = svq->free_head;
>>> +
>>> +    /* We need some descriptors here */
>>> +    assert(elem->out_num || elem->in_num);
>>
>> Looks like this could be triggered by the guest; we need to fail
>> instead of assert here.
>>
> My understanding was that virtqueue_pop already sanitized that case,
> but I'm not able to find where now. I will recheck and, in case it
> doesn't, I will change it to a failure.
>
>>> +
>>> +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
>>> +                            elem->in_num > 0, false);
>>> +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
>>> +
>>> +    /*
>>> +     * Put entry in available array (but don't update avail->idx until they
>>> +     * do sync).
>>> +     */
>>> +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
>>> +    avail->ring[avail_idx] = cpu_to_le16(head);
>>> +    svq->avail_idx_shadow++;
>>> +
>>> +    /* Update avail index after the descriptor is written */
>>> +    smp_wmb();
>>> +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
>>> +
>>> +    return head;
>>> +}
>>> +
>>> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
>>> +{
>>> +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
>>> +
>>> +    svq->ring_id_maps[qemu_head] = elem;
>>> +}
>>> +
>>> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
>>> +{
>>> +    /* We need to expose available array entries before checking used flags */
>>> +    smp_mb();
>>> +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
>>>            return;
>>>        }
>>>
>>>        event_notifier_set(&svq->hdev_kick);
>>>    }
>>>
>>> -/* Forward vhost notifications */
>>> +/**
>>> + * Forward available buffers.
>>> + *
>>> + * @svq Shadow VirtQueue
>>> + *
>>> + * Note that this function does not guarantee that all guest's available
>>> + * buffers are available to the device in SVQ avail ring. The guest may have
>>> + * exposed a GPA / GIOVA contiguous buffer, but it may not be contiguous in qemu
>>> + * vaddr.
>>> + *
>>> + * If that happens, guest's kick notifications will be disabled until device
>>> + * makes some buffers used.
>>> + */
>>> +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
>>> +{
>>> +    /* Clear event notifier */
>>> +    event_notifier_test_and_clear(&svq->svq_kick);
>>> +
>>> +    /* Make available as many buffers as possible */
>>> +    do {
>>> +        if (virtio_queue_get_notification(svq->vq)) {
>>> +            virtio_queue_set_notification(svq->vq, false);
>>
>> This looks like an optimization that should belong to
>> virtio_queue_set_notification() itself.
>>
> Sure, we can move it.
>
>>> +        }
>>> +
>>> +        while (true) {
>>> +            VirtQueueElement *elem;
>>> +
>>> +            if (svq->next_guest_avail_elem) {
>>> +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
>>> +            } else {
>>> +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
>>> +            }
>>> +
>>> +            if (!elem) {
>>> +                break;
>>> +            }
>>> +
>>> +            if (elem->out_num + elem->in_num >
>>> +                vhost_svq_available_slots(svq)) {
>>> +                /*
>>> +                 * This condition is possible since a contiguous buffer in GPA
>>> +                 * does not imply a contiguous buffer in qemu's VA
>>> +                 * scatter-gather segments. If that happens, the buffer exposed
>>> +                 * to the device needs to be a chain of descriptors at this
>>> +                 * moment.
>>> +                 *
>>> +                 * SVQ cannot hold more available buffers if we are here:
>>> +                 * queue the current guest descriptor and ignore further kicks
>>> +                 * until some elements are used.
>>> +                 */
>>> +                svq->next_guest_avail_elem = elem;
>>> +                return;
>>> +            }
>>> +
>>> +            vhost_svq_add(svq, elem);
>>> +            vhost_svq_kick(svq);
>>> +        }
>>> +
>>> +        virtio_queue_set_notification(svq->vq, true);
>>> +    } while (!virtio_queue_empty(svq->vq));
>>> +}
>>> +
>>> +/**
>>> + * Handle guest's kick.
>>> + *
>>> + * @n guest kick event notifier, the one that guest set to notify svq.
>>> + */
>>> +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
>>> +{
>>> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>>> +                                             svq_kick);
>>> +    vhost_handle_guest_kick(svq);
>>> +}
>>> +
>>> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
>>> +{
>>> +    if (svq->last_used_idx != svq->shadow_used_idx) {
>>> +        return true;
>>> +    }
>>> +
>>> +    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
>>> +
>>> +    return svq->last_used_idx != svq->shadow_used_idx;
>>> +}
>>> +
>>> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
>>> +{
>>> +    vring_desc_t *descs = svq->vring.desc;
>>> +    const vring_used_t *used = svq->vring.used;
>>> +    vring_used_elem_t used_elem;
>>> +    uint16_t last_used;
>>> +
>>> +    if (!vhost_svq_more_used(svq)) {
>>> +        return NULL;
>>> +    }
>>> +
>>> +    /* Only get used array entries after they have been exposed by dev */
>>> +    smp_rmb();
>>> +    last_used = svq->last_used_idx & (svq->vring.num - 1);
>>> +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
>>> +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
>>> +
>>> +    svq->last_used_idx++;
>>> +    if (unlikely(used_elem.id >= svq->vring.num)) {
>>> +        error_report("Device %s says index %u is used", svq->vdev->name,
>>> +                     used_elem.id);
>>> +        return NULL;
>>> +    }
>>> +
>>> +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
>>> +        error_report(
>>> +            "Device %s says index %u is used, but it was not available",
>>> +            svq->vdev->name, used_elem.id);
>>> +        return NULL;
>>> +    }
>>> +
>>> +    descs[used_elem.id].next = svq->free_head;
>>> +    svq->free_head = used_elem.id;
>>> +
>>> +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
>>> +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
>>> +}
>>> +
>>> +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
>>> +                            bool check_for_avail_queue)
>>> +{
>>> +    VirtQueue *vq = svq->vq;
>>> +
>>> +    /* Make as many buffers as possible used. */
>>> +    do {
>>> +        unsigned i = 0;
>>> +
>>> +        vhost_svq_set_notification(svq, false);
>>> +        while (true) {
>>> +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
>>> +            if (!elem) {
>>> +                break;
>>> +            }
>>> +
>>> +            if (unlikely(i >= svq->vring.num)) {
>>> +                virtio_error(svq->vdev,
>>> +                         "More than %u used buffers obtained in a %u size SVQ",
>>> +                         i, svq->vring.num);
>>> +                virtqueue_fill(vq, elem, elem->len, i);
>>> +                virtqueue_flush(vq, i);
>>
>> Let's simply use virtqueue_push() here?
>>
> virtqueue_push supports filling and flushing only one element, instead
> of a batch. I'm fine with either, but I think the fewer updates to the
> used idx, the better.


Fine.


>
>>> +                i = 0;
>>
>> Do we need to bail out here?
>>
> Yes I guess we can simply return.
>
>>> +            }
>>> +            virtqueue_fill(vq, elem, elem->len, i++);
>>> +        }
>>> +
>>> +        virtqueue_flush(vq, i);
>>> +        event_notifier_set(&svq->svq_call);
>>> +
>>> +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
>>> +            /*
>>> +             * Avail ring was full when vhost_svq_flush was called, so it's a
>>> +             * good moment to make more descriptors available if possible
>>> +             */
>>> +            vhost_handle_guest_kick(svq);
>>
>> Is it better to have a similar check to the one vhost_handle_guest_kick() did?
>>
>>               if (elem->out_num + elem->in_num >
>>                   vhost_svq_available_slots(svq)) {
>>
> It will be duplicated when we call vhost_handle_guest_kick, won't it?


Right, I mis-read the code.


>
>>> +        }
>>> +
>>> +        vhost_svq_set_notification(svq, true);
>>
>> Is a mb() needed here? Otherwise we may lose a call here (where
>> vhost_svq_more_used() is run before vhost_svq_set_notification()).
>>
> I'm confused then; I thought you said this is just a hint, so there was
> no need? [1]. I think the memory barrier is needed too.


Yes, it's a hint but:

1) When we disable the notification: since the notification disable is
just a hint, the device can still raise an interrupt, so the ordering is
meaningless and a memory barrier is not necessary (the
vhost_svq_set_notification(svq, false) case).

2) When we enable the notification: though it's a hint, the device can
choose to implement it by enabling the interrupt. In this case, the
notification enable should be done before checking the used ring.
Otherwise, the check for more used buffers might happen before the
notification is enabled:

1) driver checks for more used buffers
2) device adds more used buffers but sends no notification
3) driver enables the notification, and the notification is lost
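
In code terms, the missing piece is a barrier between enabling the
notification and the final recheck. A sketch against the flush loop
above (the "forward used buffers" body is elided as a comment):

    do {
        vhost_svq_set_notification(svq, false);
        /* ... forward all currently used buffers to the guest ... */
        vhost_svq_set_notification(svq, true);
        /*
         * Make the VRING_AVAIL_F_NO_INTERRUPT clear visible before
         * re-reading used->idx, so case 3) above cannot lose a
         * notification.
         */
        smp_mb();
    } while (vhost_svq_more_used(svq));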


>>> +    } while (vhost_svq_more_used(svq));
>>> +}
>>> +
>>> +/**
>>> + * Forward used buffers.
>>> + *
>>> + * @n hdev call event notifier, the one that device set to notify svq.
>>> + *
>>> + * Note that we are not making any buffers available in the loop, there is no
>>> + * way that it runs more than virtqueue size times.
>>> + */
>>>    static void vhost_svq_handle_call(EventNotifier *n)
>>>    {
>>>        VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
>>>                                                 hdev_call);
>>>
>>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
>>> -        return;
>>> -    }
>>> +    /* Clear event notifier */
>>> +    event_notifier_test_and_clear(n);
>>
>> Any reason that we remove the above check?
>>
> This comes from the previous versions, where this made sure we missed
> no used buffers in the process of switching to SVQ mode.


I'm not sure I get it. Even for the switching case, isn't it safer to
handle the flush unconditionally?

Thanks


>
> If we enable SVQ from the beginning, I think we can rely on getting all
> the device's used buffer notifications, so let me think about it a
> little bit and I may move to checking the eventfd.
>
>>> -    event_notifier_set(&svq->svq_call);
>>> +    vhost_svq_flush(svq, true);
>>>    }
>>>
>>>    /**
>>> @@ -258,13 +551,38 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>>      * need to explicitly check for them.
>>>         */
>>>        event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
>>> -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
>>> +    event_notifier_set_handler(&svq->svq_kick,
>>> +                               vhost_handle_guest_kick_notifier);
>>>
>>>        if (!check_old || event_notifier_test_and_clear(&tmp)) {
>>>            event_notifier_set(&svq->hdev_kick);
>>>        }
>>>    }
>>>
>>> +/**
>>> + * Start shadow virtqueue operation.
>>> + *
>>> + * @svq Shadow Virtqueue
>>> + * @vdev        VirtIO device
>>> + * @vq          Virtqueue to shadow
>>> + */
>>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
>>> +                     VirtQueue *vq)
>>> +{
>>> +    svq->next_guest_avail_elem = NULL;
>>> +    svq->avail_idx_shadow = 0;
>>> +    svq->shadow_used_idx = 0;
>>> +    svq->last_used_idx = 0;
>>> +    svq->vdev = vdev;
>>> +    svq->vq = vq;
>>> +
>>> +    memset(svq->vring.avail, 0, sizeof(*svq->vring.avail));
>>> +    memset(svq->vring.used, 0, sizeof(*svq->vring.avail));
>>> +    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
>>> +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
>>> +    }
>>> +}
>>> +
>>>    /**
>>>     * Stop shadow virtqueue operation.
>>>     * @svq Shadow Virtqueue
>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>>    void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>    {
>>>        event_notifier_set_handler(&svq->svq_kick, NULL);
>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
>>> +
>>> +    if (!svq->vq) {
>>> +        return;
>>> +    }
>>> +
>>> +    /* Send all pending used descriptors to guest */
>>> +    vhost_svq_flush(svq, false);
>>> +
>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
>>> +        g_autofree VirtQueueElement *elem = NULL;
>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
>>> +        if (elem) {
>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
>>> +        }
>>> +    }
>>> +
>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
>>> +    if (next_avail_elem) {
>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
>>> +                                 next_avail_elem->len);
>>> +    }
>>>    }
>>>
>>>    /**
>>> @@ -316,7 +656,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>>>        memset(svq->vring.desc, 0, driver_size);
>>>        svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
>>>        memset(svq->vring.used, 0, device_size);
>>> -
>>> +    svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
>>>        event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
>>>        return g_steal_pointer(&svq);
>>>
>>> @@ -335,6 +675,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
>>>        event_notifier_cleanup(&vq->hdev_kick);
>>>        event_notifier_set_handler(&vq->hdev_call, NULL);
>>>        event_notifier_cleanup(&vq->hdev_call);
>>> +    g_free(vq->ring_id_maps);
>>>        qemu_vfree(vq->vring.desc);
>>>        qemu_vfree(vq->vring.used);
>>>        g_free(vq);
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 53e14bafa0..0e5c00ed7e 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -752,9 +752,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>     * Note that this function does not rewind kick file descriptor if cannot set
>>>     * call one.
>>>     */
>>> -static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>>> -                                VhostShadowVirtqueue *svq,
>>> -                                unsigned idx)
>>> +static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
>>> +                                  VhostShadowVirtqueue *svq,
>>> +                                  unsigned idx)
>>>    {
>>>        struct vhost_vring_file file = {
>>>            .index = dev->vq_index + idx,
>>> @@ -767,7 +767,7 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>>>        r = vhost_vdpa_set_vring_dev_kick(dev, &file);
>>>        if (unlikely(r != 0)) {
>>>            error_report("Can't set device kick fd (%d)", -r);
>>> -        return false;
>>> +        return r;
>>>        }
>>>
>>>        event_notifier = vhost_svq_get_svq_call_notifier(svq);
>>> @@ -777,6 +777,99 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>>>            error_report("Can't set device call fd (%d)", -r);
>>>        }
>>>
>>> +    return r;
>>> +}
>>> +
>>> +/**
>>> + * Unmap SVQ area in the device
>>> + */
>>> +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
>>> +                                      hwaddr size)
>>> +{
>>> +    int r;
>>> +
>>> +    size = ROUND_UP(size, qemu_real_host_page_size);
>>> +    r = vhost_vdpa_dma_unmap(v, iova, size);
>>> +    return r == 0;
>>> +}
>>> +
>>> +static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
>>> +                                       const VhostShadowVirtqueue *svq)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    struct vhost_vring_addr svq_addr;
>>> +    size_t device_size = vhost_svq_device_area_size(svq);
>>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
>>> +    bool ok;
>>> +
>>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
>>> +
>>> +    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
>>> +    if (unlikely(!ok)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
>>> +}
>>> +
>>> +/**
>>> + * Map shadow virtqueue rings in device
>>> + *
>>> + * @dev   The vhost device
>>> + * @svq   The shadow virtqueue
>>> + */
>>> +static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
>>> +                                     const VhostShadowVirtqueue *svq)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    struct vhost_vring_addr svq_addr;
>>> +    size_t device_size = vhost_svq_device_area_size(svq);
>>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
>>> +    int r;
>>> +
>>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
>>> +
>>> +    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
>>> +                           (void *)svq_addr.desc_user_addr, true);
>>> +    if (unlikely(r != 0)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
>>> +                           (void *)svq_addr.used_user_addr, false);
>>
>> Do we need to unmap the driver area if we fail here?
>>
> Yes, this used to rely on unmapping them when disabling SVQ. Now I
> think we need to unmap here, as you say.
>
> Thanks!
>
> [1] https://lists.linuxfoundation.org/pipermail/virtualization/2021-March/053322.html
>
>> Thanks
>>
>>
>>> +    return r == 0;
>>> +}
>>> +
>>> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>>> +                                VhostShadowVirtqueue *svq,
>>> +                                unsigned idx)
>>> +{
>>> +    uint16_t vq_index = dev->vq_index + idx;
>>> +    struct vhost_vring_state s = {
>>> +        .index = vq_index,
>>> +    };
>>> +    int r;
>>> +    bool ok;
>>> +
>>> +    r = vhost_vdpa_set_dev_vring_base(dev, &s);
>>> +    if (unlikely(r)) {
>>> +        error_report("Can't set vring base (%d)", r);
>>> +        return false;
>>> +    }
>>> +
>>> +    s.num = vhost_svq_get_num(svq);
>>> +    r = vhost_vdpa_set_dev_vring_num(dev, &s);
>>> +    if (unlikely(r)) {
>>> +        error_report("Can't set vring num (%d)", r);
>>> +        return false;
>>> +    }
>>> +
>>> +    ok = vhost_vdpa_svq_map_rings(dev, svq);
>>> +    if (unlikely(!ok)) {
>>> +        return false;
>>> +    }
>>> +
>>> +    r = vhost_vdpa_svq_set_fds(dev, svq, idx);
>>>        return r == 0;
>>>    }
>>>
>>> @@ -788,14 +881,24 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>        if (started) {
>>>            vhost_vdpa_host_notifiers_init(dev);
>>>            for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>>> +            VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
>>>                VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>>>                bool ok = vhost_vdpa_svq_setup(dev, svq, i);
>>>                if (unlikely(!ok)) {
>>>                    return -1;
>>>                }
>>> +            vhost_svq_start(svq, dev->vdev, vq);
>>>            }
>>>            vhost_vdpa_set_vring_ready(dev);
>>>        } else {
>>> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>>> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
>>> +                                                          i);
>>> +            bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
>>> +            if (unlikely(!ok)) {
>>> +                return -1;
>>> +            }
>>> +        }
>>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>>        }
>>>


* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
       [not found]     ` <CAJaqyWfF01k3LntM7RLEmFcej=EY2d4+2MARKXPptQ2J7VnB9A@mail.gmail.com>
@ 2022-02-08  8:15       ` Jason Wang
       [not found]         ` <CAJaqyWedqtzRW=ur7upchneSc-oOkvkr3FUph_BfphV3zTmnkw@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-08  8:15 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 7:25 PM, Eugenio Perez Martin wrote:
> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>>    void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>    {
>>>        event_notifier_set_handler(&svq->svq_kick, NULL);
>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
>>> +
>>> +    if (!svq->vq) {
>>> +        return;
>>> +    }
>>> +
>>> +    /* Send all pending used descriptors to guest */
>>> +    vhost_svq_flush(svq, false);
>>
>> Do we need to wait for all the pending descriptors to be completed here?
>>
> No, this function does not wait, it only completes the forwarding of
> the *used* descriptors.
>
> The best example is the net rx queue in my opinion. This call will
> check SVQ's vring used_idx and will forward the last used descriptors
> if any, but all available descriptors will remain as available for
> qemu's VQ code.
>
> To skip it would miss those last rx descriptors in migration.
>
> Thanks!


So this is probably not the best place to ask. It's more about the
inflight descriptors, so it should be TX instead of RX.

I can imagine that in the last phase of migration we should stop the
vhost-vDPA device before calling vhost_svq_stop(). Then we should be
fine regardless of inflight descriptors.
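
A sketch of that ordering; the quiesce step is hypothetical, since
vhost-vDPA has no per-ring stop at this point of the series:

    /* Last phase of migration: make sure the device is no longer using
     * buffers before draining and stopping each SVQ. */
    vhost_vdpa_quiesce_device(dev);    /* hypothetical helper */
    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
        vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, i));
    }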

Thanks


>
>> Thanks
>>
>>
>>> +
>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
>>> +        g_autofree VirtQueueElement *elem = NULL;
>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
>>> +        if (elem) {
>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
>>> +        }
>>> +    }
>>> +
>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
>>> +    if (next_avail_elem) {
>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
>>> +                                 next_avail_elem->len);
>>> +    }
>>>    }


* Re: [PATCH 22/31] vhost: Add VhostIOVATree
       [not found]     ` <CAJaqyWePW6hJKAm7nk+syqmXAgdTQSTtuv9jACu_+hgbg2bRHg@mail.gmail.com>
@ 2022-02-08  8:17       ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-08  8:17 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/2 1:27 AM, Eugenio Perez Martin wrote:
> On Sun, Jan 30, 2022 at 6:21 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> This tree is able to look for a translated address from an IOVA address.
>>>
>>> At first glance it is similar to util/iova-tree. However, SVQ working on
>>> devices with limited IOVA space needs more capabilities,
>>
>> So does the IOVA tree (e.g. the vtd l2 can only work in the range of
>> GAW and without RMRRs).
>>
>>
>>>    like allocating
>>> IOVA chunks or performing reverse translations (qemu addresses to iova).
>>
>> This looks like a general request as well. So I wonder if we can simply
>> extend iova tree instead.
>>
> While both are true, I don't see code that performs allocations or
> qemu vaddr to iova translations. But if the changes can be integrated
> into iova-tree that would be great for sure.
>
> The main drawback I see is the need to maintain two trees instead of
> one for users of iova-tree. While complexity does not grow, it doubles
> the amount of work needed.


If you care about performance, we can disable the reverse mapping
during allocation. vIOMMU users won't notice any performance penalty.
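
A sketch of what that could look like, making the reverse tree optional
at construction time (the use_reverse parameter is hypothetical):

    VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last,
                                       bool use_reverse)
    {
        VhostIOVATree *tree = g_new(VhostIOVATree, 1);

        tree->iova_first = MAX(iova_first, iova_min_addr);
        tree->iova_last = iova_last;
        tree->iova_taddr_map = iova_tree_new();
        /* vIOMMU users can skip the reverse (taddr -> iova) tree */
        tree->taddr_iova_map = use_reverse
            ? g_tree_new_full(vhost_iova_tree_cmp_taddr, NULL, NULL, g_free)
            : NULL;
        return tree;
    }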

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>> The allocation capability, as in "assign a free IOVA address to this
>>> chunk of memory in qemu's address space", allows the shadow virtqueue to
>>> create a new address space that is not restricted by the guest's
>>> addressable one, so we can allocate the shadow vqs' vrings outside of it.
>>>
>>> It duplicates the tree so it can search efficiently in both directions,
>>> and it will signal an overlap if the iova or the translated address is
>>> present in either tree.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-iova-tree.h |  27 +++++++
>>>    hw/virtio/vhost-iova-tree.c | 157 ++++++++++++++++++++++++++++++++++++
>>>    hw/virtio/meson.build       |   2 +-
>>>    3 files changed, 185 insertions(+), 1 deletion(-)
>>>    create mode 100644 hw/virtio/vhost-iova-tree.h
>>>    create mode 100644 hw/virtio/vhost-iova-tree.c
>>>
>>> diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
>>> new file mode 100644
>>> index 0000000000..610394eaf1
>>> --- /dev/null
>>> +++ b/hw/virtio/vhost-iova-tree.h
>>> @@ -0,0 +1,27 @@
>>> +/*
>>> + * vhost software live migration ring
>>> + *
>>> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
>>> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
>>> + *
>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>> + */
>>> +
>>> +#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
>>> +#define HW_VIRTIO_VHOST_IOVA_TREE_H
>>> +
>>> +#include "qemu/iova-tree.h"
>>> +#include "exec/memory.h"
>>> +
>>> +typedef struct VhostIOVATree VhostIOVATree;
>>> +
>>> +VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
>>> +void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
>>> +G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
>>> +
>>> +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
>>> +                                        const DMAMap *map);
>>> +int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
>>> +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map);
>>> +
>>> +#endif
>>> diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
>>> new file mode 100644
>>> index 0000000000..0021dbaf54
>>> --- /dev/null
>>> +++ b/hw/virtio/vhost-iova-tree.c
>>> @@ -0,0 +1,157 @@
>>> +/*
>>> + * vhost software live migration ring
>>> + *
>>> + * SPDX-FileCopyrightText: Red Hat, Inc. 2021
>>> + * SPDX-FileContributor: Author: Eugenio Pérez <eperezma@redhat.com>
>>> + *
>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qemu/iova-tree.h"
>>> +#include "vhost-iova-tree.h"
>>> +
>>> +#define iova_min_addr qemu_real_host_page_size
>>> +
>>> +/**
>>> + * VhostIOVATree, able to:
>>> + * - Translate iova address
>>> + * - Reverse translate iova address (from translated to iova)
>>> + * - Allocate IOVA regions for translated range (potentially slow operation)
>>> + *
>>> + * Note that it cannot remove nodes.
>>> + */
>>> +struct VhostIOVATree {
>>> +    /* First addressable iova address in the device */
>>> +    uint64_t iova_first;
>>> +
>>> +    /* Last addressable iova address in the device */
>>> +    uint64_t iova_last;
>>> +
>>> +    /* IOVA address to qemu memory maps. */
>>> +    IOVATree *iova_taddr_map;
>>> +
>>> +    /* QEMU virtual memory address to iova maps */
>>> +    GTree *taddr_iova_map;
>>> +};
>>> +
>>> +static gint vhost_iova_tree_cmp_taddr(gconstpointer a, gconstpointer b,
>>> +                                      gpointer data)
>>> +{
>>> +    const DMAMap *m1 = a, *m2 = b;
>>> +
>>> +    if (m1->translated_addr > m2->translated_addr + m2->size) {
>>> +        return 1;
>>> +    }
>>> +
>>> +    if (m1->translated_addr + m1->size < m2->translated_addr) {
>>> +        return -1;
>>> +    }
>>> +
>>> +    /* Overlapped */
>>> +    return 0;
>>> +}
>>> +
>>> +/**
>>> + * Create a new IOVA tree
>>> + *
>>> + * Returns the new IOVA tree
>>> + */
>>> +VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
>>> +{
>>> +    VhostIOVATree *tree = g_new(VhostIOVATree, 1);
>>> +
>>> +    /* Some devices do not like 0 addresses */
>>> +    tree->iova_first = MAX(iova_first, iova_min_addr);
>>> +    tree->iova_last = iova_last;
>>> +
>>> +    tree->iova_taddr_map = iova_tree_new();
>>> +    tree->taddr_iova_map = g_tree_new_full(vhost_iova_tree_cmp_taddr, NULL,
>>> +                                           NULL, g_free);
>>> +    return tree;
>>> +}
>>> +
>>> +/**
>>> + * Delete an iova tree
>>> + */
>>> +void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
>>> +{
>>> +    iova_tree_destroy(iova_tree->iova_taddr_map);
>>> +    g_tree_unref(iova_tree->taddr_iova_map);
>>> +    g_free(iova_tree);
>>> +}
>>> +
>>> +/**
>>> + * Find the IOVA address stored from a memory address
>>> + *
>>> + * @tree     The iova tree
>>> + * @map      The map with the memory address
>>> + *
>>> + * Return the stored mapping, or NULL if not found.
>>> + */
>>> +const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
>>> +                                        const DMAMap *map)
>>> +{
>>> +    return g_tree_lookup(tree->taddr_iova_map, map);
>>> +}
>>> +
>>> +/**
>>> + * Allocate a new mapping
>>> + *
>>> + * @tree  The iova tree
>>> + * @map   The iova map
>>> + *
>>> + * Returns:
>>> + * - IOVA_OK if the map fits in the container
>>> + * - IOVA_ERR_INVALID if the map does not make sense (like size overflow)
>>> + * - IOVA_ERR_OVERLAP if the tree already contains that map
>>> + * - IOVA_ERR_NOMEM if tree cannot allocate more space.
>>> + *
>>> + * It returns the assigned iova in map->iova if the return value is IOVA_OK.
>>> + */
>>> +int vhost_iova_tree_map_alloc(VhostIOVATree *tree, DMAMap *map)
>>> +{
>>> +    /* Some vhost devices do not like addr 0. Skip the first page */
>>> +    hwaddr iova_first = tree->iova_first ?: qemu_real_host_page_size;
>>> +    DMAMap *new;
>>> +    int r;
>>> +
>>> +    if (map->translated_addr + map->size < map->translated_addr ||
>>> +        map->perm == IOMMU_NONE) {
>>> +        return IOVA_ERR_INVALID;
>>> +    }
>>> +
>>> +    /* Check for collisions in translated addresses */
>>> +    if (vhost_iova_tree_find_iova(tree, map)) {
>>> +        return IOVA_ERR_OVERLAP;
>>> +    }
>>> +
>>> +    /* Allocate a node in IOVA address */
>>> +    r = iova_tree_alloc(tree->iova_taddr_map, map, iova_first,
>>> +                        tree->iova_last);
>>> +    if (r != IOVA_OK) {
>>> +        return r;
>>> +    }
>>> +
>>> +    /* Allocate node in qemu -> iova translations */
>>> +    new = g_malloc(sizeof(*new));
>>> +    memcpy(new, map, sizeof(*new));
>>> +    g_tree_insert(tree->taddr_iova_map, new, new);
>>> +    return IOVA_OK;
>>> +}
>>> +
>>> +/**
>>> + * Remove existing mappings from iova tree
>>> + *
>>> + * @param  iova_tree  The vhost iova tree
>>> + * @param  map        The map to remove
>>> + */
>>> +void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map)
>>> +{
>>> +    const DMAMap *overlap;
>>> +
>>> +    iova_tree_remove(iova_tree->iova_taddr_map, map);
>>> +    while ((overlap = vhost_iova_tree_find_iova(iova_tree, map))) {
>>> +        g_tree_remove(iova_tree->taddr_iova_map, overlap);
>>> +    }
>>> +}
>>> diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
>>> index 2dc87613bc..6047670804 100644
>>> --- a/hw/virtio/meson.build
>>> +++ b/hw/virtio/meson.build
>>> @@ -11,7 +11,7 @@ softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
>>>
>>>    virtio_ss = ss.source_set()
>>>    virtio_ss.add(files('virtio.c'))
>>> -virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c'))
>>> +virtio_ss.add(when: 'CONFIG_VHOST', if_true: files('vhost.c', 'vhost-backend.c', 'vhost-shadow-virtqueue.c', 'vhost-iova-tree.c'))
>>>    virtio_ss.add(when: 'CONFIG_VHOST_USER', if_true: files('vhost-user.c'))
>>>    virtio_ss.add(when: 'CONFIG_VHOST_VDPA', if_true: files('vhost-vdpa.c'))
>>>    virtio_ss.add(when: 'CONFIG_VIRTIO_BALLOON', if_true: files('virtio-balloon.c'))


* Re: [PATCH 23/31] vdpa: Add custom IOTLB translations to SVQ
       [not found]     ` <CAJaqyWe1zH8bfaoxTyz_RXH=0q+Yk9H7QyUffaRB1fCV9oVLZQ@mail.gmail.com>
@ 2022-02-08  8:19       ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-08  8:19 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 3:11 AM, Eugenio Perez Martin wrote:
>>> +            return false;
>>> +        }
>>> +
>>> +        /*
>>> +         * Map->iova chunk size is ignored. What to do if descriptor
>>> +         * (addr, size) does not fit is delegated to the device.
>>> +         */
>> I think we need to at least check the size and fail if the size doesn't
>> match here. Or is it possible that we have a buffer that may cross two
>> memory regions?
>>
> It should be impossible, since both iova_tree and VirtQueue should be
> in sync regarding memory region updates. If a VirtQueue buffer
> crosses many memory regions, the iovec has more entries.
>
> I can add a return false, but I'm not able to trigger that situation
> even with a malformed driver.
>

Ok, but it won't harm to add a warn here.
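
Something like this sketch would do; needed_size is a hypothetical name
for the remaining descriptor length, and the +1 is because DMAMap sizes
are inclusive in util/iova-tree:

    /* Warn and fail if the descriptor does not fit the translated chunk */
    if (unlikely(needed_size > map->size + 1)) {
        error_report("%s: descriptor does not fit the translated IOVA chunk",
                     __func__);
        return false;
    }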

Thanks


* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
       [not found]     ` <CAJaqyWdBLU+maEhByepzeH7iwLmqUba0rRb8PM4VwBy2P8Vtow@mail.gmail.com>
@ 2022-02-08  8:25       ` Jason Wang
       [not found]         ` <CAJaqyWcvWjPas0=xp+U-c-kG+e6k73jg=C4phFD7S-tZY=niSQ@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-08  8:25 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/1 7:45 PM, Eugenio Perez Martin wrote:
> On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> SVQ is able to log the dirty bits by itself, so let's use it to not
>>> block migration.
>>>
>>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
>>> enabled. Even if the device supports it, the reports would be nonsense
>>> because SVQ memory is in the qemu region.
>>>
>>> The log region is still allocated. Future changes might skip that, but
>>> this series is already long enough.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
>>>    1 file changed, 20 insertions(+)
>>>
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index fb0a338baa..75090d65e8 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
>>>        if (ret == 0 && v->shadow_vqs_enabled) {
>>>            /* Filter only features that SVQ can offer to guest */
>>>            vhost_svq_valid_guest_features(features);
>>> +
>>> +        /* Add SVQ logging capabilities */
>>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
>>>        }
>>>
>>>        return ret;
>>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
>>>
>>>        if (v->shadow_vqs_enabled) {
>>>            uint64_t dev_features, svq_features, acked_features;
>>> +        uint8_t status = 0;
>>>            bool ok;
>>>
>>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
>>> +        if (unlikely(ret)) {
>>> +            return ret;
>>> +        }
>>> +
>>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
>>> +            /*
>>> +             * vhost is trying to enable or disable _F_LOG, and the device
>>> +             * would report wrong dirty pages. SVQ handles it.
>>> +             */
>>
>> I fail to understand this comment, I'd think there's no way to disable
>> dirty page tracking for SVQ.
>>
> vhost_log_global_{start,stop} are called at the beginning and end of
> migration. To inform the device that it should start logging, they set
> or clear VHOST_F_LOG_ALL at vhost_dev_set_log.


Yes, but for SVQ, we can't disable dirty page tracking, can we? The
only thing we can do is ignore or filter out F_LOG_ALL and pretend it
is enabled or disabled.


>
> While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> vhost does not block migration. Maybe we need to look for another way
> to do this?


I'm fine with filtering since it's much simpler, but I fail to
understand why we need to check DRIVER_OK.

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>> +            return 0;
>>> +        }
>>> +
>>> +        /* We must not ack _F_LOG if SVQ is enabled */
>>> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
>>> +
>>>            ret = vhost_vdpa_get_dev_features(dev, &dev_features);
>>>            if (ret != 0) {
>>>                error_report("Can't get vdpa device features, got (%d)", ret);


* Re: [PATCH 00/31] vDPA shadow virtqueue
       [not found]   ` <CAJaqyWfWxQSJc3YMpF6g7VwZBN_ab0Z+1nXgWH1sg+uBaOYgBQ@mail.gmail.com>
@ 2022-02-08  8:27     ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-08  8:27 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/31 5:15 PM, Eugenio Perez Martin wrote:
> On Fri, Jan 28, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. This
>>> is intended as a new method of tracking the memory the devices touch
>>> during a migration process: instead of relying on the vhost device's
>>> dirty logging capability, SVQ intercepts the VQ dataplane, forwarding
>>> the descriptors between VM and device. This way qemu is the effective
>>> writer of the guest's memory, like in qemu's emulated virtio device
>>> operation.
>>>
>>> When SVQ is enabled qemu offers a new virtual address space to the
>>> device to read and write into, and it maps new vrings and the guest
>>> memory in it. SVQ also intercepts kicks and calls between the device
>>> and the guest. Relaying used buffers causes the dirty memory to be
>>> tracked, but in this RFC SVQ is not enabled automatically on migration.
>>>
>>> Since it is a buffer relay system, SVQ can also be used to bridge
>>> devices and drivers with different capabilities, like devices that
>>> only support packed vrings and old guests whose drivers only support
>>> split vrings.
>>>
>>> It is based on the ideas of DPDK SW assisted LM, in the series of
>>> DPDK's https://patchwork.dpdk.org/cover/48370/ . However, this series
>>> does not map the shadow vq in the guest's VA, but in qemu's.
>>>
>>> This version of SVQ is limited in the set of features it can use with
>>> the guest and device, because this series is already very big otherwise.
>>> Features like indirect or event_idx will be addressed in future series.
>>>
>>> SVQ needs to be enabled with cmdline parameter x-svq, like:
>>>
>>> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true
>>>
>>> In this version it cannot be enabled or disabled at runtime. Further
>>> series will remove this limitation and will enable it only during
>>> migration.
>>>
>>> Some patches are intentionally very small to ease review, but they can
>>> be squashed if preferred.
>>>
>>> Patches 1-10 prepare the SVQ and QEMU to support both guest to device
>>> and device to guest notification forwarding, with the extra qemu hop.
>>> That part can be tested in isolation if the cmdline change is reproduced.
>>>
>>> Patches from 11 to 18 implement the actual buffer forwarding, but with
>>> no IOMMU support. It requires a vdpa device capable of addressing all
>>> of qemu's vaddr space.
>>>
>>> Patches 19 to 23 add the iommu support, so devices with address
>>> range limitations can access SVQ through the newly created virtual
>>> address space.
>>>
>>> The rest of the series adds the last pieces needed for migration.
>>>
>>> Comments are welcome.
>>
>> I wonder about the performance impact. So performance numbers are more
>> than welcome.
>>
> Sure, I'll do it for the next revision. Since this one brings a decent
> amount of changes, I chose to collect the feedback first.


A simple single TCP_STREAM netperf test should be sufficient to give
some basic understanding of the performance impact.
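
For the record, something along these lines would be enough as a
baseline; the guest IP is a placeholder, and the interesting comparison
is the same run with and without x-svq=true:

    # in the guest
    netserver

    # in the host, towards the guest
    netperf -H 192.168.1.100 -t TCP_STREAM -l 60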

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>> TODO:
>>> * Event, indirect, packed, and other features of virtio.
>>> * To separate buffers forwarding in its own AIO context, so we can
>>>     throw more threads to that task and we don't need to stop the main
>>>     event loop.
>>> * Support virtio-net control vq.
>>> * Proper documentation.
>>>
>>> Changes from v5 RFC:
>>> * Remove dynamic enablement of SVQ, making less dependent of the device.
>>> * Enable live migration if SVQ is enabled.
>>> * Fix SVQ when driver reset.
>>> * Comments addressed, specially in the iova area.
>>> * Rebase on latest master, adding multiqueue support (but no networking
>>>     control vq processing).
>>> v5 link:
>>> https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg07250.html
>>>
>>> Changes from v4 RFC:
>>> * Support of allocating / freeing iova ranges in IOVA tree. Extending
>>>     already present iova-tree for that.
>>> * Proper validation of guest features. Now SVQ can negotiate a
>>>     different set of features with the device when enabled.
>>> * Support of host notifiers memory regions
>>> * Handling of SVQ full queue in case guest's descriptors span to
>>>     different memory regions (qemu's VA chunks).
>>> * Flush pending used buffers at end of SVQ operation.
>>> * QMP command now looks up by NetClientState name. Other devices will need
>>>     to implement their own way to enable vdpa.
>>> * Rename QMP command to set, so it looks more like a way of working
>>> * Better use of qemu error system
>>> * Turn a few assertions into proper error-handling paths.
>>> * Add more documentation
>>> * Less coupling of virtio / vhost, that could cause friction on changes
>>> * Addressed many other small comments and small fixes.
>>>
>>> Changes from v3 RFC:
>>>     * Move everything to vhost-vdpa backend. A big change, this allowed
>>>       some cleanup but more code has been added in other places.
>>>     * More use of glib utilities, especially to manage memory.
>>> v3 link:
>>> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06032.html
>>>
>>> Changes from v2 RFC:
>>>     * Adding vhost-vdpa devices support
>>>     * Fixed some memory leaks pointed by different comments
>>> v2 link:
>>> https://lists.nongnu.org/archive/html/qemu-devel/2021-03/msg05600.html
>>>
>>> Changes from v1 RFC:
>>>     * Use QMP instead of migration to start SVQ mode.
>>>     * Only accepting IOMMU devices, closer behavior with target devices
>>>       (vDPA)
>>>     * Fix invalid masking/unmasking of vhost call fd.
>>>     * Use of proper methods for synchronization.
>>>     * No need to modify VirtIO device code, all of the changes are
>>>       contained in vhost code.
>>>     * Delete superfluous code.
>>>     * An intermediate RFC was sent with only the notifications forwarding
>>>       changes. It can be seen in
>>>       https://patchew.org/QEMU/20210129205415.876290-1-eperezma@redhat.com/
>>> v1 link:
>>> https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05372.html
>>>
>>> Eugenio Pérez (20):
>>>         virtio: Add VIRTIO_F_QUEUE_STATE
>>>         virtio-net: Honor VIRTIO_CONFIG_S_DEVICE_STOPPED
>>>         virtio: Add virtio_queue_is_host_notifier_enabled
>>>         vhost: Make vhost_virtqueue_{start,stop} public
>>>         vhost: Add x-vhost-enable-shadow-vq qmp
>>>         vhost: Add VhostShadowVirtqueue
>>>         vdpa: Register vdpa devices in a list
>>>         vhost: Route guest->host notification through shadow virtqueue
>>>         Add vhost_svq_get_svq_call_notifier
>>>         Add vhost_svq_set_guest_call_notifier
>>>         vdpa: Save call_fd in vhost-vdpa
>>>         vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>>>         vhost: Route host->guest notification through shadow virtqueue
>>>         virtio: Add vhost_shadow_vq_get_vring_addr
>>>         vdpa: Save host and guest features
>>>         vhost: Add vhost_svq_valid_device_features to shadow vq
>>>         vhost: Shadow virtqueue buffers forwarding
>>>         vhost: Add VhostIOVATree
>>>         vhost: Use a tree to store memory mappings
>>>         vdpa: Add custom IOTLB translations to SVQ
>>>
>>> Eugenio Pérez (31):
>>>     vdpa: Reorder virtio/vhost-vdpa.c functions
>>>     vhost: Add VhostShadowVirtqueue
>>>     vdpa: Add vhost_svq_get_dev_kick_notifier
>>>     vdpa: Add vhost_svq_set_svq_kick_fd
>>>     vhost: Add Shadow VirtQueue kick forwarding capabilities
>>>     vhost: Route guest->host notification through shadow virtqueue
>>>     vhost: Add vhost_svq_get_svq_call_notifier
>>>     vhost: Add vhost_svq_set_guest_call_notifier
>>>     vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
>>>     vhost: Route host->guest notification through shadow virtqueue
>>>     vhost: Add vhost_svq_valid_device_features to shadow vq
>>>     vhost: Add vhost_svq_valid_guest_features to shadow vq
>>>     vhost: Add vhost_svq_ack_guest_features to shadow vq
>>>     virtio: Add vhost_shadow_vq_get_vring_addr
>>>     vdpa: Add vhost_svq_get_num
>>>     vhost: pass queue index to vhost_vq_get_addr
>>>     vdpa: adapt vhost_ops callbacks to svq
>>>     vhost: Shadow virtqueue buffers forwarding
>>>     utils: Add internal DMAMap to iova-tree
>>>     util: Store DMA entries in a list
>>>     util: Add iova_tree_alloc
>>>     vhost: Add VhostIOVATree
>>>     vdpa: Add custom IOTLB translations to SVQ
>>>     vhost: Add vhost_svq_get_last_used_idx
>>>     vdpa: Adapt vhost_vdpa_get_vring_base to SVQ
>>>     vdpa: Clear VHOST_VRING_F_LOG at vhost_vdpa_set_vring_addr in SVQ
>>>     vdpa: Never set log_base addr if SVQ is enabled
>>>     vdpa: Expose VHOST_F_LOG_ALL on SVQ
>>>     vdpa: Make ncs autofree
>>>     vdpa: Move vhost_vdpa_get_iova_range to net/vhost-vdpa.c
>>>     vdpa: Add x-svq to NetdevVhostVDPAOptions
>>>
>>>    qapi/net.json                      |   5 +-
>>>    hw/virtio/vhost-iova-tree.h        |  27 +
>>>    hw/virtio/vhost-shadow-virtqueue.h |  46 ++
>>>    include/hw/virtio/vhost-vdpa.h     |   7 +
>>>    include/qemu/iova-tree.h           |  17 +
>>>    hw/virtio/vhost-iova-tree.c        | 157 ++++++
>>>    hw/virtio/vhost-shadow-virtqueue.c | 761 +++++++++++++++++++++++++++++
>>>    hw/virtio/vhost-vdpa.c             | 740 ++++++++++++++++++++++++----
>>>    hw/virtio/vhost.c                  |   6 +-
>>>    net/vhost-vdpa.c                   |  58 ++-
>>>    util/iova-tree.c                   | 161 +++++-
>>>    hw/virtio/meson.build              |   2 +-
>>>    12 files changed, 1852 insertions(+), 135 deletions(-)
>>>    create mode 100644 hw/virtio/vhost-iova-tree.h
>>>    create mode 100644 hw/virtio/vhost-shadow-virtqueue.h
>>>    create mode 100644 hw/virtio/vhost-iova-tree.c
>>>    create mode 100644 hw/virtio/vhost-shadow-virtqueue.c
>>>


* Re: [PATCH 04/31] vdpa: Add vhost_svq_set_svq_kick_fd
       [not found]     ` <CAJaqyWc7fbgN-W7y3=iFqHsJzj+1Mg0cuwSu+my=62nu9vGOqA@mail.gmail.com>
@ 2022-02-08  8:47       ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-08  8:47 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/31 6:18 PM, Eugenio Perez Martin wrote:
> On Fri, Jan 28, 2022 at 7:29 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> This function allows the vhost-vdpa backend to override kick_fd.
>>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-shadow-virtqueue.h |  1 +
>>>    hw/virtio/vhost-shadow-virtqueue.c | 45 ++++++++++++++++++++++++++++++
>>>    2 files changed, 46 insertions(+)
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>> index 400effd9f2..a56ecfc09d 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>> @@ -15,6 +15,7 @@
>>>
>>>    typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
>>>
>>> +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
>>>    const EventNotifier *vhost_svq_get_dev_kick_notifier(
>>>                                                  const VhostShadowVirtqueue *svq);
>>>
>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>> index bd87110073..21534bc94d 100644
>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>> @@ -11,6 +11,7 @@
>>>    #include "hw/virtio/vhost-shadow-virtqueue.h"
>>>
>>>    #include "qemu/error-report.h"
>>> +#include "qemu/main-loop.h"
>>>
>>>    /* Shadow virtqueue to relay notifications */
>>>    typedef struct VhostShadowVirtqueue {
>>> @@ -18,8 +19,20 @@ typedef struct VhostShadowVirtqueue {
>>>        EventNotifier hdev_kick;
>>>        /* Shadow call notifier, sent to vhost */
>>>        EventNotifier hdev_call;
>>> +
>>> +    /*
>>> +     * Borrowed virtqueue's guest to host notifier.
>>> +     * Borrowing it in this event notifier allows us to register on the event
>>> +     * loop and access the associated shadow virtqueue easily. If we use the
>>> +     * VirtQueue, we don't have an easy way to retrieve it.
>>> +     *
>>> +     * So the shadow virtqueue must not clean it up, or we would lose the VirtQueue one.
>>> +     */
>>> +    EventNotifier svq_kick;
>>>    } VhostShadowVirtqueue;
>>>
>>> +#define INVALID_SVQ_KICK_FD -1
>>> +
>>>    /**
>>>     * The notifier that SVQ will use to notify the device.
>>>     */
>>> @@ -29,6 +42,35 @@ const EventNotifier *vhost_svq_get_dev_kick_notifier(
>>>        return &svq->hdev_kick;
>>>    }
>>>
>>> +/**
>>> + * Set a new file descriptor for the guest to kick SVQ and notify for avail
>>> + *
>>> + * @svq          The svq
>>> + * @svq_kick_fd  The new svq kick fd
>>> + */
>>> +void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>> +{
>>> +    EventNotifier tmp;
>>> +    bool check_old = INVALID_SVQ_KICK_FD !=
>>> +                     event_notifier_get_fd(&svq->svq_kick);
>>> +
>>> +    if (check_old) {
>>> +        event_notifier_set_handler(&svq->svq_kick, NULL);
>>> +        event_notifier_init_fd(&tmp, event_notifier_get_fd(&svq->svq_kick));
>>> +    }
>>
>> It looks to me we don't do similar things in vhost-net. Any reason for
>> caring about the old svq_kick?
>>
> Do you mean to check for old kick_fd in case we miss notifications,
> and explicitly omit the INVALID_SVQ_KICK_FD?


Yes.


>
> If you mean qemu's vhost-net, I guess it's because the device's kick
> fd is never changed during the whole vhost device lifecycle; it's only
> set at the beginning. The previous RFC also depended on that, but you
> suggested a better vhost/SVQ separation in the v4 feedback, if I
> understood correctly [1]. Or am I missing something?


No, I forgot that. But in this case we should deal better with the
conversion from a valid fd to -1 by disabling the handler.
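
Something like this is what I have in mind, just as an untested sketch
on top of your patch:

    void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
    {
        if (svq_kick_fd == INVALID_SVQ_KICK_FD) {
            /* Guest unbinds: stop polling the old fd before dropping it */
            event_notifier_set_handler(&svq->svq_kick, NULL);
            event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
            return;
        }

        /* ... the existing switch logic for a valid new fd ... */
    }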


>
> Qemu's vhost-net does not need to use this because it is not polling
> it. For kernel's vhost, I guess the closest is the use of pollstop and
> pollstart at vhost_vring_ioctl.
>
> In my opinion, the SVQ code size could benefit from not allowing
> kick_fd to be overridden once the operation has started: it would be
> set not at initialization, but at start. But I can see the benefit of
> taking the change into account from this moment, so the code is more
> resilient going forward.
>
>>> +
>>> +    /*
>>> +     * event_notifier_set_handler already checks for guest's notifications if
>>> +     * they arrive at the new file descriptor during the switch, so there is no
>>> +     * need to explicitly check for them.
>>> +     */
>>> +    event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
>>> +
>>> +    if (!check_old || event_notifier_test_and_clear(&tmp)) {
>>> +        event_notifier_set(&svq->hdev_kick);
>>
>> Any reason we need to kick the device directly here?
>>
> At this point of the series only notifications are forwarded, not
> buffers. If kick_fd is set, we need to check the old one, the same way
> as vhost checks the masked notifier in case of change.


I meant we need to kick the svq instead of vhost-vdpa in this case?

Thanks


>
> Thanks!
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-10/msg03152.html
> , from "I'd suggest to not depend on this since it:"
>
>
>> Thanks
>>
>>
>>> +    }
>>> +}
>>> +
>>>    /**
>>>     * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>>>     * methods and file descriptors.
>>> @@ -52,6 +94,9 @@ VhostShadowVirtqueue *vhost_svq_new(void)
>>>            goto err_init_hdev_call;
>>>        }
>>>
>>> +    /* Placeholder descriptor, it should be deleted at set_kick_fd */
>>> +    event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
>>> +
>>>        return g_steal_pointer(&svq);
>>>
>>>    err_init_hdev_call:


* Re: [PATCH 06/31] vhost: Route guest->host notification through shadow virtqueue
       [not found]     ` <CAJaqyWeRbmwW80q3q52nFw=iz1xcPRFviFaRHo0nzXpEb+3m3A@mail.gmail.com>
@ 2022-02-08  9:02       ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-08  9:02 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/31 7:33 PM, Eugenio Perez Martin wrote:
> On Fri, Jan 28, 2022 at 7:57 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> At this moment no buffer forwarding will be performed in SVQ mode: Qemu
>>> just forwards the guest's kicks to the device. This commit also sets up
>>> SVQs in the vhost device.
>>>
>>> Host memory notifier regions are left out for simplicity, and they will
>>> not be addressed in this series.
>>
>> I wonder if it's better to squash this into patch 5 since it gives us a
>> full guest->host forwarding.
>>
> I'm fine with that if you think it makes the review easier.


Yes please.


>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    include/hw/virtio/vhost-vdpa.h |   4 ++
>>>    hw/virtio/vhost-vdpa.c         | 122 ++++++++++++++++++++++++++++++++-
>>>    2 files changed, 124 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
>>> index 3ce79a646d..009a9f3b6b 100644
>>> --- a/include/hw/virtio/vhost-vdpa.h
>>> +++ b/include/hw/virtio/vhost-vdpa.h
>>> @@ -12,6 +12,8 @@
>>>    #ifndef HW_VIRTIO_VHOST_VDPA_H
>>>    #define HW_VIRTIO_VHOST_VDPA_H
>>>
>>> +#include <gmodule.h>
>>> +
>>>    #include "hw/virtio/virtio.h"
>>>    #include "standard-headers/linux/vhost_types.h"
>>>
>>> @@ -27,6 +29,8 @@ typedef struct vhost_vdpa {
>>>        bool iotlb_batch_begin_sent;
>>>        MemoryListener listener;
>>>        struct vhost_vdpa_iova_range iova_range;
>>> +    bool shadow_vqs_enabled;
>>> +    GPtrArray *shadow_vqs;
>>>        struct vhost_dev *dev;
>>>        VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
>>>    } VhostVDPA;
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 6c10a7f05f..18de14f0fb 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -17,12 +17,14 @@
>>>    #include "hw/virtio/vhost.h"
>>>    #include "hw/virtio/vhost-backend.h"
>>>    #include "hw/virtio/virtio-net.h"
>>> +#include "hw/virtio/vhost-shadow-virtqueue.h"
>>>    #include "hw/virtio/vhost-vdpa.h"
>>>    #include "exec/address-spaces.h"
>>>    #include "qemu/main-loop.h"
>>>    #include "cpu.h"
>>>    #include "trace.h"
>>>    #include "qemu-common.h"
>>> +#include "qapi/error.h"
>>>
>>>    /*
>>>     * Return one past the end of the end of section. Be careful with uint64_t
>>> @@ -409,8 +411,14 @@ err:
>>>
>>>    static void vhost_vdpa_host_notifiers_init(struct vhost_dev *dev)
>>>    {
>>> +    struct vhost_vdpa *v = dev->opaque;
>>>        int i;
>>>
>>> +    if (v->shadow_vqs_enabled) {
>>> +        /* SVQ is not compatible with host notifiers mr */
>>
>> I guess there should be a TODO or FIXME here.
>>
> Sure I can add it.
>
>>> +        return;
>>> +    }
>>> +
>>>        for (i = dev->vq_index; i < dev->vq_index + dev->nvqs; i++) {
>>>            if (vhost_vdpa_host_notifier_init(dev, i)) {
>>>                goto err;
>>> @@ -424,6 +432,17 @@ err:
>>>        return;
>>>    }
>>>
>>> +static void vhost_vdpa_svq_cleanup(struct vhost_dev *dev)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    size_t idx;
>>> +
>>> +    for (idx = 0; idx < v->shadow_vqs->len; ++idx) {
>>> +        vhost_svq_stop(g_ptr_array_index(v->shadow_vqs, idx));
>>> +    }
>>> +    g_ptr_array_free(v->shadow_vqs, true);
>>> +}
>>> +
>>>    static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>>>    {
>>>        struct vhost_vdpa *v;
>>> @@ -432,6 +451,7 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
>>>        trace_vhost_vdpa_cleanup(dev, v);
>>>        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>>        memory_listener_unregister(&v->listener);
>>> +    vhost_vdpa_svq_cleanup(dev);
>>>
>>>        dev->opaque = NULL;
>>>        ram_block_discard_disable(false);
>>> @@ -507,9 +527,15 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
>>>
>>>    static int vhost_vdpa_reset_device(struct vhost_dev *dev)
>>>    {
>>> +    struct vhost_vdpa *v = dev->opaque;
>>>        int ret;
>>>        uint8_t status = 0;
>>>
>>> +    for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>>> +        vhost_svq_stop(svq);
>>> +    }
>>> +
>>>        ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>>        trace_vhost_vdpa_reset_device(dev, status);
>>>        return ret;
>>> @@ -639,13 +665,28 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>>>        return ret;
>>>    }
>>>
>>> -static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>> -                                       struct vhost_vring_file *file)
>>> +static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
>>> +                                         struct vhost_vring_file *file)
>>>    {
>>>        trace_vhost_vdpa_set_vring_kick(dev, file->index, file->fd);
>>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_KICK, file);
>>>    }
>>>
>>> +static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>> +                                       struct vhost_vring_file *file)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
>>> +
>>> +    if (v->shadow_vqs_enabled) {
>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
>>> +        vhost_svq_set_svq_kick_fd(svq, file->fd);
>>> +        return 0;
>>> +    } else {
>>> +        return vhost_vdpa_set_vring_dev_kick(dev, file);
>>> +    }
>>> +}
>>> +
>>>    static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>                                           struct vhost_vring_file *file)
>>>    {
>>> @@ -653,6 +694,33 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>>>    }
>>>
>>> +/**
>>> + * Set shadow virtqueue descriptors to the device
>>> + *
>>> + * @dev   The vhost device model
>>> + * @svq   The shadow virtqueue
>>> + * @idx   The index of the virtqueue in the vhost device
>>> + */
>>> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
>>> +                                VhostShadowVirtqueue *svq,
>>> +                                unsigned idx)
>>> +{
>>> +    struct vhost_vring_file file = {
>>> +        .index = dev->vq_index + idx,
>>> +    };
>>> +    const EventNotifier *event_notifier;
>>> +    int r;
>>> +
>>> +    event_notifier = vhost_svq_get_dev_kick_notifier(svq);
>>
>> A question, any reason for making VhostShadowVirtqueue private? If we
>> export it in .h we don't need helper to access its member like
>> vhost_svq_get_dev_kick_notifier().
>>
> Exporting it is always a possibility of course, but that direct
> access would not be thread safe if we decide to move SVQ to its own
> iothread, for example.


I don't get this, maybe you can give me an example.


>
> I feel it will be easier to work with it this way, but it might be that
> I'm just used to making as much as possible private. It's not like the
> helpers are needed in the hot paths, only in the setup and teardown.
>
>> Note that vhost_dev is a public structure.
>>
> Sure, we could embed it in vhost_virtqueue if we choose to do it that
> way, for example.
>
>>> +    file.fd = event_notifier_get_fd(event_notifier);
>>> +    r = vhost_vdpa_set_vring_dev_kick(dev, &file);
>>> +    if (unlikely(r != 0)) {
>>> +        error_report("Can't set device kick fd (%d)", -r);
>>> +    }
>>
>> I wonder whether or not we can generalize the logic here and
>> vhost_vdpa_set_vring_kick(). There's nothing vdpa specific unless the
>> vhost_ops->set_vring_kick().
>>
> If we call vhost_ops->set_vring_kick we are setting the guest->SVQ kick
> notifier, not the SVQ->vDPA device one, because of the
> if (v->shadow_vqs_enabled) check. All of the modified ops callbacks
> hide the actual device from the vhost subsystem, so we need to
> explicitly use the newly created _dev_ ones.


Ok, I'm fine to start with vhost_vdpa specific code.


>
>>> +
>>> +    return r == 0;
>>> +}
>>> +
>>>    static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>    {
>>>        struct vhost_vdpa *v = dev->opaque;
>>> @@ -660,6 +728,13 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>
>>>        if (started) {
>>>            vhost_vdpa_host_notifiers_init(dev);
>>> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
>>> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
>>> +            bool ok = vhost_vdpa_svq_setup(dev, svq, i);
>>> +            if (unlikely(!ok)) {
>>> +                return -1;
>>> +            }
>>> +        }
>>>            vhost_vdpa_set_vring_ready(dev);
>>>        } else {
>>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>> @@ -737,6 +812,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>>>        return true;
>>>    }
>>>
>>> +/**
>>> + * Adaptor function to free shadow virtqueue through gpointer
>>> + *
>>> + * @svq   The Shadow Virtqueue
>>> + */
>>> +static void vhost_psvq_free(gpointer svq)
>>> +{
>>> +    vhost_svq_free(svq);
>>> +}
>>
>> Any reason for such indirection? Can we simply use vhost_svq_free()?
>>
> GCC complains about different types. I think we could do a function
> type cast and it's valid for every architecture qemu supports, but the
> indirection seems cleaner to me, and I would be surprised if the
> compiler does not optimize it away in the cases where the cast is
> valid.
>
> ../hw/virtio/vhost-vdpa.c:1186:60: error: incompatible function
> pointer types passing 'void (VhostShadowVirtqueue *)' (aka 'void
> (struct VhostShadowVirtqueue *)') to parameter of type
> 'GDestroyNotify' (aka 'void (*)(void *)')


Or just change vhost_svq_free() to take gpointer instead? Then we don't 
need a cast.
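
I.e., something like this sketch; the cleanup body itself stays
whatever it is today:

    /*
     * Matches GDestroyNotify, so it can be passed directly to
     * g_ptr_array_new_full() without an adaptor or a cast.
     */
    void vhost_svq_free(gpointer pvq)
    {
        VhostShadowVirtqueue *vq = pvq;
        /* ... existing cleanup of the notifiers ... */
        g_free(vq);
    }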

Thanks

>
> Thanks!
>
>> Thanks
>>
>>
>>> +
>>> +static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
>>> +                               Error **errp)
>>> +{
>>> +    size_t n_svqs = v->shadow_vqs_enabled ? hdev->nvqs : 0;
>>> +    g_autoptr(GPtrArray) shadow_vqs = g_ptr_array_new_full(n_svqs,
>>> +                                                           vhost_psvq_free);
>>> +    if (!v->shadow_vqs_enabled) {
>>> +        goto out;
>>> +    }
>>> +
>>> +    for (unsigned n = 0; n < hdev->nvqs; ++n) {
>>> +        VhostShadowVirtqueue *svq = vhost_svq_new();
>>> +
>>> +        if (unlikely(!svq)) {
>>> +            error_setg(errp, "Cannot create svq %u", n);
>>> +            return -1;
>>> +        }
>>> +        g_ptr_array_add(v->shadow_vqs, svq);
>>> +    }
>>> +
>>> +out:
>>> +    v->shadow_vqs = g_steal_pointer(&shadow_vqs);
>>> +    return 0;
>>> +}
>>> +
>>>    static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>>    {
>>>        struct vhost_vdpa *v;
>>> @@ -759,6 +869,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>>        dev->opaque =  opaque ;
>>>        v->listener = vhost_vdpa_memory_listener;
>>>        v->msg_type = VHOST_IOTLB_MSG_V2;
>>> +    ret = vhost_vdpa_init_svq(dev, v, errp);
>>> +    if (ret) {
>>> +        goto err;
>>> +    }
>>>
>>>        vhost_vdpa_get_iova_range(v);
>>>
>>> @@ -770,6 +884,10 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>>                                   VIRTIO_CONFIG_S_DRIVER);
>>>
>>>        return 0;
>>> +
>>> +err:
>>> +    ram_block_discard_disable(false);
>>> +    return ret;
>>>    }
>>>
>>>    const VhostOps vdpa_ops = {


* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
       [not found]         ` <CAJaqyWcvWjPas0=xp+U-c-kG+e6k73jg=C4phFD7S-tZY=niSQ@mail.gmail.com>
@ 2022-02-17  6:02           ` Jason Wang
       [not found]             ` <CAJaqyWdhHmD+tB_bY_YEMnBU1p7-LW=LP8f+3e_ZXDcOfSRiNA@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-17  6:02 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2022/2/1 7:45 PM, Eugenio Perez Martin wrote:
> > > On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > >>> SVQ is able to log the dirty bits by itself, so let's use it to not
> > >>> block migration.
> > >>>
> > >>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> > >>> enabled. Even if the device supports it, the reports would be nonsense
> > >>> because SVQ memory is in the qemu region.
> > >>>
> > >>> The log region is still allocated. Future changes might skip that, but
> > >>> this series is already long enough.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>> ---
> > >>>    hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
> > >>>    1 file changed, 20 insertions(+)
> > >>>
> > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>> index fb0a338baa..75090d65e8 100644
> > >>> --- a/hw/virtio/vhost-vdpa.c
> > >>> +++ b/hw/virtio/vhost-vdpa.c
> > >>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> > >>>        if (ret == 0 && v->shadow_vqs_enabled) {
> > >>>            /* Filter only features that SVQ can offer to guest */
> > >>>            vhost_svq_valid_guest_features(features);
> > >>> +
> > >>> +        /* Add SVQ logging capabilities */
> > >>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
> > >>>        }
> > >>>
> > >>>        return ret;
> > >>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > >>>
> > >>>        if (v->shadow_vqs_enabled) {
> > >>>            uint64_t dev_features, svq_features, acked_features;
> > >>> +        uint8_t status = 0;
> > >>>            bool ok;
> > >>>
> > >>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > >>> +        if (unlikely(ret)) {
> > >>> +            return ret;
> > >>> +        }
> > >>> +
> > >>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > >>> +            /*
> > >>> +             * vhost is trying to enable or disable _F_LOG, and the device
> > >>> +             * would report wrong dirty pages. SVQ handles it.
> > >>> +             */
> > >>
> > >> I fail to understand this comment, I'd think there's no way to disable
> > >> dirty page tracking for SVQ.
> > >>
> > > vhost_log_global_{start,stop} are called at the beginning and end of
> > > migration. To inform the device that it should start logging, they set
> > > or clear VHOST_F_LOG_ALL at vhost_dev_set_log.
> >
> >
> > Yes, but for SVQ, we can't disable dirty page tracking, can we? The
> > only thing we can do is ignore or filter out F_LOG_ALL and pretend it
> > is enabled or disabled.
> >
>
> Yes, that's what this patch does.
>
> >
> > >
> > > While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> > > vhost does not block migration. Maybe we need to look for another way
> > > to do this?
> >
> >
> > I'm fine with filtering since it's much simpler, but I fail to
> > understand why we need to check DRIVER_OK.
> >
>
> Ok, maybe I can make that part clearer.
>
> Since both operations use vhost_vdpa_set_features we must just filter
> the one that actually sets or removes VHOST_F_LOG_ALL, without
> affecting other features.
>
> In practice, that means not forwarding the set_features call after
> DRIVER_OK. The device is not expecting them anymore.

I wonder what happens if we don't do this.

So kernel had this check:

        /*
         * It's not allowed to change the features after they have
         * been negotiated.
         */
        if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
        return -EBUSY;

So is it FEATURES_OK actually?

For this patch, I wonder if the thing we need to do is to see whether
it is an enable/disable of F_LOG_ALL and simply return.
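
Something in this direction, assuming dev->acked_features still holds
the previously acked set at this point (sketch only):

    /*
     * If the only bit changing is _F_LOG_ALL, SVQ already tracks dirty
     * pages by itself: nothing to forward to the device, just return.
     */
    if ((features ^ dev->acked_features) == BIT_ULL(VHOST_F_LOG_ALL)) {
        return 0;
    }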

Thanks

>
> Does that make more sense?
>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > Thanks!
> > >
> > >> Thanks
> > >>
> > >>
> > >>> +            return 0;
> > >>> +        }
> > >>> +
> > >>> +        /* We must not ack _F_LOG if SVQ is enabled */
> > >>> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
> > >>> +
> > >>>            ret = vhost_vdpa_get_dev_features(dev, &dev_features);
> > >>>            if (ret != 0) {
> > >>>                error_report("Can't get vdpa device features, got (%d)", ret);
> >
>


* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
       [not found]         ` <CAJaqyWfEEg2PKgxBAFwYhF9LD1oDtwVYXSjHHnCbstT3dvL2GA@mail.gmail.com>
@ 2022-02-21  7:15           ` Jason Wang
       [not found]             ` <CAJaqyWcoHgToqsR-bVRctTnhgufmarR_2hh4O_VoCbCGp8WNhg@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-21  7:15 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/18 1:13 AM, Eugenio Perez Martin wrote:
> On Tue, Feb 8, 2022 at 4:58 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote:
>>> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>>>> First half of the buffer forwarding part, preparing the vhost-vdpa
>>>>> callbacks to offer SVQ. QEMU cannot enable it at this moment, so
>>>>> this is effectively dead code for now, but it helps to reduce
>>>>> patch size.
>>>>>
>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>> ---
>>>>>     hw/virtio/vhost-shadow-virtqueue.h |   2 +-
>>>>>     hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
>>>>>     hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
>>>>>     3 files changed, 143 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
>>>>> index 035207a469..39aef5ffdf 100644
>>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
>>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
>>>>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
>>>>>
>>>>>     void vhost_svq_stop(VhostShadowVirtqueue *svq);
>>>>>
>>>>> -VhostShadowVirtqueue *vhost_svq_new(void);
>>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
>>>>>
>>>>>     void vhost_svq_free(VhostShadowVirtqueue *vq);
>>>>>
>>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
>>>>> index f129ec8395..7c168075d7 100644
>>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
>>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
>>>>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>>>     /**
>>>>>      * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
>>>>>      * methods and file descriptors.
>>>>> + *
>>>>> + * @qsize Shadow VirtQueue size
>>>>> + *
>>>>> + * Returns the new virtqueue or NULL.
>>>>> + *
>>>>> + * In case of error, reason is reported through error_report.
>>>>>      */
>>>>> -VhostShadowVirtqueue *vhost_svq_new(void)
>>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
>>>>>     {
>>>>> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
>>>>> +    size_t device_size, driver_size;
>>>>>         g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
>>>>>         int r;
>>>>>
>>>>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
>>>>>         /* Placeholder descriptor, it should be deleted at set_kick_fd */
>>>>>         event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
>>>>>
>>>>> +    svq->vring.num = qsize;
>>>> I wonder if this is the best. E.g. some hardware can support up to a 32K
>>>> queue size. So this will probably end up with:
>>>>
>>>> 1) SVQ uses a 32K queue size
>>>> 2) the hardware queue uses 256
>>>>
>>> In that case SVQ vring queue size will be 32K and guest's vring can
>>> negotiate any number with SVQ equal to or less than 32K,
>>
>> Sorry for being unclear what I meant is actually
>>
>> 1) SVQ uses 32K queue size
>>
>> 2) guest vq uses 256
>>
>> This looks like a burden that needs extra logic and may damage the
>> performance.
>>
> Still not getting this point.
>
> An available guest buffer, although contiguous in GPA/GVA, can expand
> into multiple buffers if it's not contiguous in qemu's VA (by the while
> loop in virtqueue_map_desc [1]). In that scenario it is better to have
> "plenty" of SVQ buffers.


Yes, but this case should be rare. So in this case we should deal with
an overrun on SVQ, that is:

1) SVQ is full
2) guest VQ isn't

We need to

1) check the available buffer slots
2) disable guest kick and wait for the used buffers

But it looks to me like the current code is not ready to deal with this case?
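
Roughly, I'd expect something like this in the avail handling path;
num_free, next_guest_avail_elem and vhost_svq_add are invented names,
just to show the shape:

    while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
        if (svq->num_free < elem->out_num + elem->in_num) {
            /*
             * SVQ is full: stash the element and stop. The used-buffer
             * handler resumes from here once the device returns buffers.
             */
            svq->next_guest_avail_elem = elem;
            return;
        }
        vhost_svq_add(svq, elem);
    }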


>
> I'm ok if we decide to put an upper limit though, or if we decide not
> to handle this situation. But we would leave out valid virtio drivers.
> Maybe set a fixed upper limit (1024?)? Or add another parameter
> (x-svq-size-n=N)?
>
> If you mean we lose performance because memory gets more sparse, I
> think the only possibility is to limit it that way.


If the guest is not using 32K, having a 32K svq may give extra stress
on the cache, since we will end up with a pretty large working set.
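
If we go the fixed-limit route, it can be as simple as this sketch,
where SVQ_MAX_NUM and dev_max_queue_size are invented names:

    #define SVQ_MAX_NUM 1024    /* arbitrary bound from this discussion */

    uint16_t qsize = MIN(dev_max_queue_size, SVQ_MAX_NUM);
    VhostShadowVirtqueue *svq = vhost_svq_new(qsize);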


>
>> And this can lead other interesting situation:
>>
>> 1) SVQ uses 256
>>
>> 2) guest vq uses 1024
>>
>> Where a lot of more SVQ logic is needed.
>>
> If we agree that a guest descriptor can expand into multiple SVQ
> descriptors, this should already be handled by the previous logic too.
>
> But this should only happen if qemu is launched with a "bad"
> cmdline, shouldn't it?


This seems like it can happen when we use -device
virtio-net-pci,tx_queue_size=1024 with a 256-entry vp_vdpa device, at least?


>
> If I run that example with vp_vdpa, L0 qemu will happily accept 1024
> as a queue size [2]. But if the vdpa device maximum queue size is
> effectively 256, this will result in an error: we're not exposing it
> to the guest at any moment except through qemu's cmdline.
>
>>> including 256.
>>> Is that what you mean?
>>
>> I mean, it looks to me the logic would be much simpler if we just
>> allocate the shadow virtqueue with the size the guest can see (the guest
>> vring).
>>
>> Then we don't need to think about whether the difference in queue size
>> can have any side effects.
>>
> I think that we cannot avoid that extra logic unless we force GPA to
> be contiguous in IOVA. If we are sure the guest's buffers cannot span
> more than one descriptor in SVQ, then yes, we can simplify things. If
> not, I think we are forced to carry all of it.


Yes, I agree, the code should be robust to handle any case.

Thanks


>
> But if we prove it I'm not opposed to simplifying things and making
> head at SVQ == head at guest.
>
> Thanks!
>
> [1] https://gitlab.com/qemu-project/qemu/-/blob/17e31340/hw/virtio/virtio.c#L1297
> [2] But that's not the whole story: I've been running limited in tx
> descriptors because of virtio_net_max_tx_queue_size, which predates
> vdpa. I'll send a patch to also un-limit it.
>
>>> If with hardware queues you mean guest's vring, not sure why it is
>>> "probably 256". I'd say that in that case with the virtio-net kernel
>>> driver the ring size will be the same as what the device exports, for
>>> example, isn't it?
>>>
>>> The implementation should support any combination of sizes, but the
>>> ring size exposed to the guest is never bigger than hardware one.
>>>
>>>> ? Or SVQ can stick to 256, but will this cause trouble if we want
>>>> to add event index support?
>>>>
>>> I think we should not have any problem with event idx. If you mean
>>> that the guest could mark more buffers available than SVQ vring's
>>> size, that should not happen because there must be fewer entries in the
>>> guest than in SVQ.
>>>
>>> But if I understood you correctly, a similar situation could happen if
>>> a guest's contiguous buffer is scattered across many of qemu's VA chunks.
>>> Even if that would happen, the situation should be ok too: SVQ knows
>>> the guest's avail idx and, if SVQ is full, it will continue forwarding
>>> avail buffers when the device uses more buffers.
>>>
>>> Does that make sense to you?
>>
>> Yes.
>>
>> Thanks
>>


* Re: [PATCH 01/31] vdpa: Reorder virtio/vhost-vdpa.c functions
       [not found]     ` <CAJaqyWffGzYv2+HufFZzzBPtu5z3_vaKh4evGXqj7hqTB0WU3A@mail.gmail.com>
@ 2022-02-21  7:31       ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-21  7:31 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/1/28 3:57 PM, Eugenio Perez Martin wrote:
> On Fri, Jan 28, 2022 at 6:59 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>> vhost_vdpa_set_features and vhost_vdpa_init need to use
>>> vhost_vdpa_get_features in svq mode.
>>>
>>> vhost_vdpa_dev_start needs to use almost all _set_ functions:
>>> vhost_vdpa_set_vring_dev_kick, vhost_vdpa_set_vring_dev_call,
>>> vhost_vdpa_set_dev_vring_base and vhost_vdpa_set_dev_vring_num.
>>>
>>> No functional change intended.
>>
>> Is it related (a must) to the SVQ code?
>>
> Yes, SVQ needs to access the device variants to configure it, while
> exposing the SVQ ones.
>
> For example for set_features, SVQ needs to set device features in the
> start code, but expose SVQ ones to the guest.
>
> Another possibility is to forward-declare them but I feel it pollutes
> the code more, doesn't it? Is there any reason to avoid the reordering
> beyond reducing the number of changes/patches?


No, but for reviewers it might be easier if you squash the reordering
logic into the patch which needs it.

Thanks


>
> Thanks!
>
>
>> Thanks
>>
>>
>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>> ---
>>>    hw/virtio/vhost-vdpa.c | 164 ++++++++++++++++++++---------------------
>>>    1 file changed, 82 insertions(+), 82 deletions(-)
>>>
>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>> index 04ea43704f..6c10a7f05f 100644
>>> --- a/hw/virtio/vhost-vdpa.c
>>> +++ b/hw/virtio/vhost-vdpa.c
>>> @@ -342,41 +342,6 @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
>>>        return v->index != 0;
>>>    }
>>>
>>> -static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>> -{
>>> -    struct vhost_vdpa *v;
>>> -    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
>>> -    trace_vhost_vdpa_init(dev, opaque);
>>> -    int ret;
>>> -
>>> -    /*
>>> -     * Similar to VFIO, we end up pinning all guest memory and have to
>>> -     * disable discarding of RAM.
>>> -     */
>>> -    ret = ram_block_discard_disable(true);
>>> -    if (ret) {
>>> -        error_report("Cannot set discarding of RAM broken");
>>> -        return ret;
>>> -    }
>>> -
>>> -    v = opaque;
>>> -    v->dev = dev;
>>> -    dev->opaque =  opaque ;
>>> -    v->listener = vhost_vdpa_memory_listener;
>>> -    v->msg_type = VHOST_IOTLB_MSG_V2;
>>> -
>>> -    vhost_vdpa_get_iova_range(v);
>>> -
>>> -    if (vhost_vdpa_one_time_request(dev)) {
>>> -        return 0;
>>> -    }
>>> -
>>> -    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>> -                               VIRTIO_CONFIG_S_DRIVER);
>>> -
>>> -    return 0;
>>> -}
>>> -
>>>    static void vhost_vdpa_host_notifier_uninit(struct vhost_dev *dev,
>>>                                                int queue_index)
>>>    {
>>> @@ -506,24 +471,6 @@ static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
>>>        return 0;
>>>    }
>>>
>>> -static int vhost_vdpa_set_features(struct vhost_dev *dev,
>>> -                                   uint64_t features)
>>> -{
>>> -    int ret;
>>> -
>>> -    if (vhost_vdpa_one_time_request(dev)) {
>>> -        return 0;
>>> -    }
>>> -
>>> -    trace_vhost_vdpa_set_features(dev, features);
>>> -    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
>>> -    if (ret) {
>>> -        return ret;
>>> -    }
>>> -
>>> -    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
>>> -}
>>> -
>>>    static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
>>>    {
>>>        uint64_t features;
>>> @@ -646,35 +593,6 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
>>>        return ret;
>>>     }
>>>
>>> -static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>> -{
>>> -    struct vhost_vdpa *v = dev->opaque;
>>> -    trace_vhost_vdpa_dev_start(dev, started);
>>> -
>>> -    if (started) {
>>> -        vhost_vdpa_host_notifiers_init(dev);
>>> -        vhost_vdpa_set_vring_ready(dev);
>>> -    } else {
>>> -        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>> -    }
>>> -
>>> -    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
>>> -        return 0;
>>> -    }
>>> -
>>> -    if (started) {
>>> -        memory_listener_register(&v->listener, &address_space_memory);
>>> -        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>> -    } else {
>>> -        vhost_vdpa_reset_device(dev);
>>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>> -                                   VIRTIO_CONFIG_S_DRIVER);
>>> -        memory_listener_unregister(&v->listener);
>>> -
>>> -        return 0;
>>> -    }
>>> -}
>>> -
>>>    static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
>>>                                         struct vhost_log *log)
>>>    {
>>> @@ -735,6 +653,35 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>        return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>>>    }
>>>
>>> +static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>> +{
>>> +    struct vhost_vdpa *v = dev->opaque;
>>> +    trace_vhost_vdpa_dev_start(dev, started);
>>> +
>>> +    if (started) {
>>> +        vhost_vdpa_host_notifiers_init(dev);
>>> +        vhost_vdpa_set_vring_ready(dev);
>>> +    } else {
>>> +        vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
>>> +    }
>>> +
>>> +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (started) {
>>> +        memory_listener_register(&v->listener, &address_space_memory);
>>> +        return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>> +    } else {
>>> +        vhost_vdpa_reset_device(dev);
>>> +        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>> +                                   VIRTIO_CONFIG_S_DRIVER);
>>> +        memory_listener_unregister(&v->listener);
>>> +
>>> +        return 0;
>>> +    }
>>> +}
>>> +
>>>    static int vhost_vdpa_get_features(struct vhost_dev *dev,
>>>                                         uint64_t *features)
>>>    {
>>> @@ -745,6 +692,24 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev,
>>>        return ret;
>>>    }
>>>
>>> +static int vhost_vdpa_set_features(struct vhost_dev *dev,
>>> +                                   uint64_t features)
>>> +{
>>> +    int ret;
>>> +
>>> +    if (vhost_vdpa_one_time_request(dev)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    trace_vhost_vdpa_set_features(dev, features);
>>> +    ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
>>> +    if (ret) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
>>> +}
>>> +
>>>    static int vhost_vdpa_set_owner(struct vhost_dev *dev)
>>>    {
>>>        if (vhost_vdpa_one_time_request(dev)) {
>>> @@ -772,6 +737,41 @@ static bool  vhost_vdpa_force_iommu(struct vhost_dev *dev)
>>>        return true;
>>>    }
>>>
>>> +static int vhost_vdpa_init(struct vhost_dev *dev, void *opaque, Error **errp)
>>> +{
>>> +    struct vhost_vdpa *v;
>>> +    assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
>>> +    trace_vhost_vdpa_init(dev, opaque);
>>> +    int ret;
>>> +
>>> +    /*
>>> +     * Similar to VFIO, we end up pinning all guest memory and have to
>>> +     * disable discarding of RAM.
>>> +     */
>>> +    ret = ram_block_discard_disable(true);
>>> +    if (ret) {
>>> +        error_report("Cannot set discarding of RAM broken");
>>> +        return ret;
>>> +    }
>>> +
>>> +    v = opaque;
>>> +    v->dev = dev;
>>> +    dev->opaque =  opaque ;
>>> +    v->listener = vhost_vdpa_memory_listener;
>>> +    v->msg_type = VHOST_IOTLB_MSG_V2;
>>> +
>>> +    vhost_vdpa_get_iova_range(v);
>>> +
>>> +    if (vhost_vdpa_one_time_request(dev)) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>> +                               VIRTIO_CONFIG_S_DRIVER);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>    const VhostOps vdpa_ops = {
>>>            .backend_type = VHOST_BACKEND_TYPE_VDPA,
>>>            .vhost_backend_init = vhost_vdpa_init,


* Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
       [not found]         ` <CAJaqyWeisXmZ9+xw2Rj50K7aKx4khNZZjLZEz4MY97B9pQQm3w@mail.gmail.com>
@ 2022-02-21  7:39           ` Jason Wang
       [not found]             ` <CAJaqyWc5uR70a=hTpVpomuahF9iZouLmRpXPnWidga5CFxJOpA@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-21  7:39 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/18 8:35 PM, Eugenio Perez Martin wrote:
> On Tue, Feb 8, 2022 at 4:23 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/1/31 11:34 PM, Eugenio Perez Martin wrote:
>>> On Sat, Jan 29, 2022 at 9:06 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>> ---
>>>>>     hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
>>>>>     1 file changed, 18 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>> index 18de14f0fb..029f98feee 100644
>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>> @@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>>>>         }
>>>>>     }
>>>>>
>>>>> -static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>>> -                                       struct vhost_vring_file *file)
>>>>> +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
>>>>> +                                         struct vhost_vring_file *file)
>>>>>     {
>>>>>         trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
>>>>>         return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>>>>>     }
>>>>>
>>>>> +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>>> +                                     struct vhost_vring_file *file)
>>>>> +{
>>>>> +    struct vhost_vdpa *v = dev->opaque;
>>>>> +
>>>>> +    if (v->shadow_vqs_enabled) {
>>>>> +        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
>>>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
>>>>> +
>>>>> +        vhost_svq_set_guest_call_notifier(svq, file->fd);
>>>> Two questions here (had similar questions for vring kick):
>>>>
>>>> 1) Any reason that we set up the eventfd for vhost-vdpa in
>>>> vhost_vdpa_svq_setup() and not here?
>>>>
>>> I'm not sure what you mean.
>>>
>>> The guest->SVQ call and kick fds are set here and at
>>> vhost_vdpa_set_vring_kick. The event notifier handler of the guest ->
>>> SVQ kick_fd is set at vhost_vdpa_set_vring_kick /
>>> vhost_svq_set_svq_kick_fd. The guest -> SVQ call fd has no event
>>> notifier handler since we don't poll it.
>>>
>>> On the other hand, the connection SVQ <-> device uses the same fds
>>> from the beginning to the end, and they will not change with, for
>>> example, call fd masking. That's why it's setup from
>>> vhost_vdpa_svq_setup. Delaying to vhost_vdpa_set_vring_call would make
>>> us add way more logic there.
>>
>> More logic in general shadow vq code but less codes for vhost-vdpa
>> specific code I think.
>>
>> E.g for we can move the kick set logic from vhost_vdpa_svq_set_fds() to
>> here.
>>
> But they are different fds. vhost_vdpa_svq_set_fds sets the
> SVQ<->device. This function sets the SVQ->guest call file descriptor.
>
> To move the logic of vhost_vdpa_svq_set_fds here would imply either:
> a) Logic to know if we are receiving the first call fd or not.


Any reason for this? I guess you meant multiqueue. If yes, it should not
make much difference since we have idx as the parameter.


>   That
> code is not in the series at the moment, because setting at
> vhost_vdpa_dev_start tells the difference for free. It's just adding
> code, not moving.
> b) Logic to set again *the same* file descriptor to device, with logic
> to tell if we have missed calls. That logic is not implemented for
> device->SVQ call file descriptor, because we are assuming it never
> changes from vhost_vdpa_svq_set_fds. So this is again adding code.
>
> At this moment, we have:
> vhost_vdpa_svq_set_fds:
>    set SVQ<->device fds
>
> vhost_vdpa_set_vring_call:
>    set guest<-SVQ call
>
> vhost_vdpa_set_vring_kick:
>    set guest->SVQ kick.
>
> If I understood correctly, the alternative would be something like:
> vhost_vdpa_set_vring_call:
>    set guest<-SVQ call
>    if(!vq->dev_call_set) {
>      - set SVQ<-device call.
>      - vq->dev_call_set = true
>    }
>
> vhost_vdpa_set_vring_kick:
>    set guest->SVQ kick
>    if(!vq->dev_kick_set) {
>      - set SVQ->device kick.
>      - vq->dev_kick_set = true
>    }
>
> dev_reset / dev_stop:
> for vq in vqs:
>    vq->dev_kick_set = vq->dev_call_set = false
> ...
>
> Or have I misunderstood something?


I wonder what happens if MSI-X is masked in the guest. So if I understand
correctly, we don't disable the eventfd from the device? If yes, this seems
suboptimal.

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>>> 2) The call could be disabled by using -1 as the fd, I don't see any
>>>> code to deal with that.
>>>>
>>> Right, I didn't take that into account. vhost-kernel takes also -1 as
>>> kick_fd to unbind, so SVQ can be reworked to take that into account
>>> for sure.
>>>
>>> Thanks!
>>>
>>>> Thanks
>>>>
>>>>
>>>>> +        return 0;
>>>>> +    } else {
>>>>> +        return vhost_vdpa_set_vring_dev_call(dev, file);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>>     /**
>>>>>      * Set shadow virtqueue descriptors to the device
>>>>>      *


* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
       [not found]         ` <CAJaqyWedqtzRW=ur7upchneSc-oOkvkr3FUph_BfphV3zTmnkw@mail.gmail.com>
@ 2022-02-21  7:43           ` Jason Wang
       [not found]             ` <CAJaqyWcHhMpjJ4kde1ejV5c_vP7_8PvfXpi5u9rdWuaORFt_zg@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-21  7:43 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/17 8:48 PM, Eugenio Perez Martin wrote:
> On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/2/1 7:25 PM, Eugenio Perez Martin wrote:
>>> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>>>>     void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>>>     {
>>>>>         event_notifier_set_handler(&svq->svq_kick, NULL);
>>>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
>>>>> +
>>>>> +    if (!svq->vq) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    /* Send all pending used descriptors to guest */
>>>>> +    vhost_svq_flush(svq, false);
>>>> Do we need to wait for all the pending descriptors to be completed here?
>>>>
>>> No, this function does not wait, it only completes the forwarding of
>>> the *used* descriptors.
>>>
>>> The best example is the net rx queue in my opinion. This call will
>>> check SVQ's vring used_idx and will forward the last used descriptors
>>> if any, but all available descriptors will remain as available for
>>> qemu's VQ code.
>>>
>>> To skip it would miss those last rx descriptors in migration.
>>>
>>> Thanks!
>>
>> So it's probably not the best place to ask. It's more about the
>> inflight descriptors so it should be TX instead of RX.
>>
>> I can imagine the migration last phase, we should stop the vhost-vDPA
>> before calling vhost_svq_stop(). Then we should be fine regardless of
>> inflight descriptors.
>>
> I think I'm still missing something here.
>
> To be on the same page. Regarding tx this could cause repeated tx
> frames (one at source and other at destination), but never a missed
> buffer not transmitted. The "stop before" could be interpreted as "SVQ
> is not forwarding available buffers anymore". Would that work?


Right, but this only works if

1) a flush to make sure TX DMA for inflight descriptors are all completed

2) just mark all inflight descriptor used

Otherwise there could be buffers that are inflight forever.
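
For 2), a rough sketch of what I mean, on top of this patch and not
tested (it reuses the ring_id_maps introduced here):

    /*
     * Hypothetical: once the device is stopped, report every
     * still-inflight descriptor as used with len 0, so the guest can
     * reclaim the buffers instead of waiting on them forever.
     */
    for (unsigned i = 0; i < svq->vring.num; ++i) {
        g_autofree VirtQueueElement *elem =
            g_steal_pointer(&svq->ring_id_maps[i]);
        if (elem) {
            virtqueue_push(svq->vq, elem, 0);
        }
    }
    virtio_notify(svq->vdev, svq->vq);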

Thanks


>
> Thanks!
>
>> Thanks
>>
>>
>>>> Thanks
>>>>
>>>>
>>>>> +
>>>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
>>>>> +        g_autofree VirtQueueElement *elem = NULL;
>>>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
>>>>> +        if (elem) {
>>>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
>>>>> +    if (next_avail_elem) {
>>>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
>>>>> +                                 next_avail_elem->len);
>>>>> +    }
>>>>>     }


* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
       [not found]             ` <CAJaqyWcoHgToqsR-bVRctTnhgufmarR_2hh4O_VoCbCGp8WNhg@mail.gmail.com>
@ 2022-02-22  3:16               ` Jason Wang
       [not found]                 ` <CAJaqyWd2PQFedaEOV7YVZgp0m37snn-4LYYtNw7g4u+7hrtq=Q@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-22  3:16 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Tue, Feb 22, 2022 at 1:23 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Mon, Feb 21, 2022 at 8:15 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > > On 2022/2/18 1:13 AM, Eugenio Perez Martin wrote:
> > > On Tue, Feb 8, 2022 at 4:58 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote:
> > >>> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > >>>>> First half of the buffers forwarding part, preparing vhost-vdpa
> > >>>>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
> > >>>>> this is effectively dead code at the moment, but it helps to reduce
> > >>>>> patch size.
> > >>>>>
> > >>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>>>> ---
> > >>>>>     hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> > >>>>>     hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
> > >>>>>     hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
> > >>>>>     3 files changed, 143 insertions(+), 13 deletions(-)
> > >>>>>
> > >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > >>>>> index 035207a469..39aef5ffdf 100644
> > >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> > >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > >>>>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> > >>>>>
> > >>>>>     void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > >>>>>
> > >>>>> -VhostShadowVirtqueue *vhost_svq_new(void);
> > >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > >>>>>
> > >>>>>     void vhost_svq_free(VhostShadowVirtqueue *vq);
> > >>>>>
> > >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > >>>>> index f129ec8395..7c168075d7 100644
> > >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> > >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > >>>>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >>>>>     /**
> > >>>>>      * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> > >>>>>      * methods and file descriptors.
> > >>>>> + *
> > >>>>> + * @qsize Shadow VirtQueue size
> > >>>>> + *
> > >>>>> + * Returns the new virtqueue or NULL.
> > >>>>> + *
> > >>>>> + * In case of error, reason is reported through error_report.
> > >>>>>      */
> > >>>>> -VhostShadowVirtqueue *vhost_svq_new(void)
> > >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> > >>>>>     {
> > >>>>> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
> > >>>>> +    size_t device_size, driver_size;
> > >>>>>         g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> > >>>>>         int r;
> > >>>>>
> > >>>>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> > >>>>>         /* Placeholder descriptor, it should be deleted at set_kick_fd */
> > >>>>>         event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> > >>>>>
> > >>>>> +    svq->vring.num = qsize;
> > >>>> I wonder if this is the best. E.g. some hardware can support up to 32K
> > >>>> queue size. So this will probably end up with:
> > >>>>
> > >>>> 1) SVQ uses 32K queue size
> > >>>> 2) hardware queue uses 256
> > >>>>
> > >>> In that case SVQ vring queue size will be 32K and guest's vring can
> > >>> negotiate any number with SVQ equal or less than 32K,
> > >>
> > >> Sorry for being unclear what I meant is actually
> > >>
> > >> 1) SVQ uses 32K queue size
> > >>
> > >> 2) guest vq uses 256
> > >>
> > >> This looks like a burden that needs extra logic and may damage the
> > >> performance.
> > >>
> > > Still not getting this point.
> > >
> > > An available guest buffer, although contiguous in GPA/GVA, can expand
> > > in multiple buffers if it's not contiguous in qemu's VA (by the while
> > > loop in virtqueue_map_desc [1]). In that scenario it is better to have
> > > "plenty" of SVQ buffers.
> >
> >
> > Yes, but this case should be rare. So in this case we should deal with
> > overrun on SVQ, that is
> >
> > 1) SVQ is full
> > 2) guest VQ isn't
> >
> > We need to
> >
> > 1) check the available buffer slots
> > 2) disable guest kick and wait for the used buffers
> >
> > But it looks to me the current code is not ready for dealing with this case?
> >
>
> > > Yes, it deals with that; that's the meaning of svq->next_guest_avail_elem.

Oh right, I missed that.

>
> >
> > >
> > > I'm ok if we decide to put an upper limit though, or if we decide not
> > > to handle this situation. But we would leave out valid virtio drivers.
> > > Maybe to set a fixed upper limit (1024?)? To add another parameter
> > > (x-svq-size-n=N)?
> > >
> > > If you mean we lose performance because memory gets more sparse I
> > > think the only possibility is to limit that way.
> >
> >
>> If the guest is not using 32K, having a 32K svq may give extra stress
> > on the cache since we will end up with a pretty large working set.
> >
>
> That might be true. My guess is that it should not matter, since SVQ
> and the guest's vring will have the same numbers of scattered buffers
> and the avail / used / packed ring will be consumed more or less
> sequentially. But I haven't tested.
>
> I think it's better to add an upper limit (either fixed or in the
> qemu's backend's cmdline) later if we see that this is a problem.

I'd suggest using the same size as what the guest saw.
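
I.e. something like this when creating each SVQ (just a sketch, the
exact helper and indexing are assumptions):

    /*
     * Size the shadow vring by the vring the guest negotiated, not by
     * the device maximum reported through VHOST_VDPA_GET_VRING_NUM.
     */
    uint16_t num = virtio_queue_get_num(vdev, vq_index + n);
    VhostShadowVirtqueue *svq = vhost_svq_new(num);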

> Another solution now would be to get the number from the frontend
> device cmdline instead of from the vdpa device. I'm ok with that, but
> it doesn't delete the svq->next_guest_avail_elem processing, and it
> comes with disadvantages in my opinion. More below.

Right, we should keep next_guest_avail_elem. Using the same queue size
is a balance between:

1) using next_guest_avail_elem (rare)
2) not giving too much stress on the cache

>
> >
> > >
> > >> And this can lead to another interesting situation:
> > >>
> > >> 1) SVQ uses 256
> > >>
> > >> 2) guest vq uses 1024
> > >>
> > >> Where a lot more SVQ logic is needed.
> > >>
> > > If we agree that a guest descriptor can expand in multiple SVQ
> > > descriptors, this should be already handled by the previous logic too.
> > >
> > > But this should only happen in case that qemu is launched with a "bad"
> > > cmdline, isn't it?
> >
> >
> > This seems can happen when we use -device
> > virtio-net-pci,tx_queue_size=1024 with a 256 size vp_vdpa device at least?
> >
>
> I'm going to use the rx queue here since it's more accurate, tx has
> its own limit separately.
>
> If we use rx_queue_size=256 in L0 and rx_queue_size=1024 in L1 with no
> SVQ, L0 qemu will happily accept 1024 as size

Interesting, looks like a bug (I guess it works since you enable vhost?):

Per virtio-spec:

"""
Queue Size. On reset, specifies the maximum queue size supported by
the device. This can be modified by the driver to reduce memory
requirements. A 0 means the queue is unavailable.
"""

We can't increase the queue_size from 256 to 1024 actually. (Only
decrease is allowed).
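
If we want qemu's device model to enforce that, maybe something like
this in virtio_queue_set_num() could do (untested sketch; note there
is no standard way to report the failure back to the driver):

    /*
     * Hypothetical guard: per the spec the driver may only reduce the
     * queue size, so ignore attempts to grow it past the reset value.
     */
    if (num > vdev->vq[n].vring.num_default) {
        return;
    }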

> when L1 qemu writes that
> value at vhost_virtqueue_start. I'm not sure what would happen with a
> real device, my guess is that the device will fail somehow. That's
> what I meant with a "bad cmdline", I should have been more specific.

I should say that it's something that is probably unrelated to this
series but needs to be addressed.

>
> If we add SVQ to the mix, the guest first negotiates the 1024 with the
> qemu device model. After that, vhost.c will try to write 1024 too but
> this is totally ignored by this patch's changes at
> vhost_vdpa_set_vring_num. Finally, SVQ will set 256 as a ring size to
> the device, since it's the read value from the device, leading to your
> scenario. So SVQ effectively isolates both sides and makes possible
> the communication, even with a device that does not support so many
> descriptors.
>
> But SVQ already handles this case: It's the same as if the buffers are
> fragmented in HVA and queue size is equal at both sides. That's why I
> think SVQ size should depend on the backend device's size, not
> frontend cmdline.

Right.

Thanks

>
> Thanks!
>
> >
> > >
> > > If I run that example with vp_vdpa, L0 qemu will happily accept 1024
> > > as a queue size [2]. But if the vdpa device maximum queue size is
> > > effectively 256, this will result in an error: We're not exposing it
> > > to the guest at any moment but with qemu's cmdline.
> > >
> > >>> including 256.
> > >>> Is that what you mean?
> > >>
> > >> I mean, it looks to me the logic will be much simpler if we just
> > >> allocate the shadow virtqueue with the size the guest can see (guest
> > >> vring).
> > >>
> > >> Then we don't need to think about whether the difference of the queue
> > >> size can have any side effects.
> > >>
> > > I think that we cannot avoid that extra logic unless we force GPA to
> > > be contiguous in IOVA. If we are sure the guest's buffers cannot be at
> > > more than one descriptor in SVQ, then yes, we can simplify things. If
> > > not, I think we are forced to carry all of it.
> >
> >
> > Yes, I agree, the code should be robust to handle any case.
> >
> > Thanks
> >
> >
> > >
> > > But if we prove it I'm not opposed to simplifying things and making
> > > head at SVQ == head at guest.
> > >
> > > Thanks!
> > >
> > > [1] https://gitlab.com/qemu-project/qemu/-/blob/17e31340/hw/virtio/virtio.c#L1297
> > > [2] But that's not the whole story: I've been running limited in tx
> > > descriptors because of virtio_net_max_tx_queue_size, which predates
> > > vdpa. I'll send a patch to also un-limit it.
> > >
> > >>> If with hardware queues you mean guest's vring, not sure why it is
> > >>> "probably 256". I'd say that in that case with the virtio-net kernel
> > >>> driver the ring size will be the same as the device export, for
> > >>> example, isn't it?
> > >>>
> > >>> The implementation should support any combination of sizes, but the
> > >>> ring size exposed to the guest is never bigger than hardware one.
> > >>>
> > >>>> ? Or SVQ can stick to 256, but will this cause trouble if we want
> > >>>> to add event index support?
> > >>>>
> > >>> I think we should not have any problem with event idx. If you mean
> > >>> that the guest could mark more buffers available than SVQ vring's
> > >>> size, that should not happen because there must be less entries in the
> > >>> guest than SVQ.
> > >>>
> > >>> But if I understood you correctly, a similar situation could happen if
> > >>> a guest's contiguous buffer is scattered across many qemu's VA chunks.
> > >>> Even if that would happen, the situation should be ok too: SVQ knows
> > >>> the guest's avail idx and, if SVQ is full, it will continue forwarding
> > >>> avail buffers when the device uses more buffers.
> > >>>
> > >>> Does that make sense to you?
> > >>
> > >> Yes.
> > >>
> > >> Thanks
> > >>
> >
>


* Re: [PATCH 09/31] vhost-vdpa: Take into account SVQ in vhost_vdpa_set_vring_call
       [not found]             ` <CAJaqyWc5uR70a=hTpVpomuahF9iZouLmRpXPnWidga5CFxJOpA@mail.gmail.com>
@ 2022-02-22  7:18               ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-22  7:18 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/21 4:01 PM, Eugenio Perez Martin wrote:
> On Mon, Feb 21, 2022 at 8:39 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/2/18 8:35 PM, Eugenio Perez Martin wrote:
>>> On Tue, Feb 8, 2022 at 4:23 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2022/1/31 11:34 PM, Eugenio Perez Martin wrote:
>>>>> On Sat, Jan 29, 2022 at 9:06 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>>>> ---
>>>>>>>      hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++--
>>>>>>>      1 file changed, 18 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>>>> index 18de14f0fb..029f98feee 100644
>>>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>>>> @@ -687,13 +687,29 @@ static int vhost_vdpa_set_vring_kick(struct vhost_dev *dev,
>>>>>>>          }
>>>>>>>      }
>>>>>>>
>>>>>>> -static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>>>>> -                                       struct vhost_vring_file *file)
>>>>>>> +static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
>>>>>>> +                                         struct vhost_vring_file *file)
>>>>>>>      {
>>>>>>>          trace_vhost_vdpa_set_vring_call(dev, file->index, file->fd);
>>>>>>>          return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
>>>>>>>      }
>>>>>>>
>>>>>>> +static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
>>>>>>> +                                     struct vhost_vring_file *file)
>>>>>>> +{
>>>>>>> +    struct vhost_vdpa *v = dev->opaque;
>>>>>>> +
>>>>>>> +    if (v->shadow_vqs_enabled) {
>>>>>>> +        int vdpa_idx = vhost_vdpa_get_vq_index(dev, file->index);
>>>>>>> +        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, vdpa_idx);
>>>>>>> +
>>>>>>> +        vhost_svq_set_guest_call_notifier(svq, file->fd);
>>>>>> Two questions here (had similar questions for vring kick):
>>>>>>
>>>>>> 1) Any reason that we setup the eventfd for vhost-vdpa in
>>>>>> vhost_vdpa_svq_setup() not here?
>>>>>>
>>>>> I'm not sure what you mean.
>>>>>
>>>>> The guest->SVQ call and kick fds are set here and at
>>>>> vhost_vdpa_set_vring_kick. The event notifier handler of the guest ->
>>>>> SVQ kick_fd is set at vhost_vdpa_set_vring_kick /
>>>>> vhost_svq_set_svq_kick_fd. The guest -> SVQ call fd has no event
>>>>> notifier handler since we don't poll it.
>>>>>
>>>>> On the other hand, the connection SVQ <-> device uses the same fds
>>>>> from the beginning to the end, and they will not change with, for
>>>>> example, call fd masking. That's why it's setup from
>>>>> vhost_vdpa_svq_setup. Delaying to vhost_vdpa_set_vring_call would make
>>>>> us add way more logic there.
>>>> More logic in general shadow vq code but less codes for vhost-vdpa
>>>> specific code I think.
>>>>
>>>> E.g for we can move the kick set logic from vhost_vdpa_svq_set_fds() to
>>>> here.
>>>>
>>> But they are different fds. vhost_vdpa_svq_set_fds sets the
>>> SVQ<->device. This function sets the SVQ->guest call file descriptor.
>>>
>>> To move the logic of vhost_vdpa_svq_set_fds here would imply either:
>>> a) Logic to know if we are receiving the first call fd or not.
>>
>> Any reason for this? I guess you meant multiqueue. If yes, it should not
>> make much difference since we have idx as the parameter.
>>
> With "first call fd" I meant "first time we receive the call fd", so
> we only set them once.
>
> I think this is going to be easier if I prepare a patch doing your way
> and we comment on it.


That would be helpful but if there's no issue with current code (see 
below), we can leave it as is and do optimization on top.


>
>>>    That
>>> code is not in the series at the moment, because setting at
>>> vhost_vdpa_dev_start tells the difference for free. It's just adding
>>> code, not moving.
>>> b) Logic to set again *the same* file descriptor to device, with logic
>>> to tell if we have missed calls. That logic is not implemented for
>>> device->SVQ call file descriptor, because we are assuming it never
>>> changes from vhost_vdpa_svq_set_fds. So this is again adding code.
>>>
>>> At this moment, we have:
>>> vhost_vdpa_svq_set_fds:
>>>     set SVQ<->device fds
>>>
>>> vhost_vdpa_set_vring_call:
>>>     set guest<-SVQ call
>>>
>>> vhost_vdpa_set_vring_kick:
>>>     set guest->SVQ kick.
>>>
>>> If I understood correctly, the alternative would be something like:
>>> vhost_vdpa_set_vring_call:
>>>     set guest<-SVQ call
>>>     if(!vq->dev_call_set) {
>>>       - set SVQ<-device call.
>>>       - vq->dev_call_set = true
>>>     }
>>>
>>> vhost_vdpa_set_vring_kick:
>>>     set guest->SVQ kick
>>>     if(!vq->dev_kick_set) {
>>>       - set SVQ->device kick.
>>>       - vq->dev_kick_set = true
>>>     }
>>>
>>> dev_reset / dev_stop:
>>> for vq in vqs:
>>>     vq->dev_kick_set = vq->dev_call_set = false
>>> ...
>>>
>>> Or have I misunderstood something?
>>
>> I wonder what happens if MSI-X is masked in the guest. So if I understand
>> correctly, we don't disable the eventfd from the device? If yes, this seems
>> suboptimal.
>>
> We cannot disable the device's call fd unless SVQ actively polls it. As
> I see it, if the guest masks the call fd, it could be because:
> a) it doesn't want to receive more calls because it is processing buffers
> b) it is going to burn a CPU to poll it.
>
> The masking only affects SVQ->guest call. If we also mask device->SVQ,
> we're adding latency in the case a), and we're effectively disabling
> forwarding in case b).


Right, so we need to leave a comment to explain this, then I'm totally fine
with this approach.
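
Maybe something along these lines next to the call fd handling (the
wording is just a suggestion):

    /*
     * The device->SVQ call fd stays registered even when the guest
     * masks the virtqueue: SVQ must keep receiving used buffers from
     * the device, otherwise we would add latency on unmask (or stop
     * forwarding entirely for a polling guest). Masking only affects
     * the SVQ->guest call fd.
     */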


>
> It only works if the guest is effectively not interested in calls because
> it is not going to retire used buffers, but in that case it doesn't hurt
> to simply maintain the device->call fd, the eventfds are going to be
> silent anyway.
>
> Thanks!


Yes.

Thanks


>
>> Thanks
>>
>>
>>> Thanks!
>>>
>>>> Thanks
>>>>
>>>>
>>>>>> 2) The call could be disabled by using -1 as the fd, I don't see any
>>>>>> code to deal with that.
>>>>>>
>>>>> Right, I didn't take that into account. vhost-kernel takes also -1 as
>>>>> kick_fd to unbind, so SVQ can be reworked to take that into account
>>>>> for sure.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> +        return 0;
>>>>>>> +    } else {
>>>>>>> +        return vhost_vdpa_set_vring_dev_call(dev, file);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>>      /**
>>>>>>>       * Set shadow virtqueue descriptors to the device
>>>>>>>       *


* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
       [not found]             ` <CAJaqyWcHhMpjJ4kde1ejV5c_vP7_8PvfXpi5u9rdWuaORFt_zg@mail.gmail.com>
@ 2022-02-22  7:26               ` Jason Wang
       [not found]                 ` <CAJaqyWePWg+eeQjjcMh24k0K+yUQUF2x0yXH32tPPWEw_wYP0Q@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-22  7:26 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/21 4:15 PM, Eugenio Perez Martin wrote:
> On Mon, Feb 21, 2022 at 8:44 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2022/2/17 8:48 PM, Eugenio Perez Martin wrote:
>>> On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On 2022/2/1 7:25 PM, Eugenio Perez Martin wrote:
>>>>> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>>>>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
>>>>>>>      void vhost_svq_stop(VhostShadowVirtqueue *svq)
>>>>>>>      {
>>>>>>>          event_notifier_set_handler(&svq->svq_kick, NULL);
>>>>>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
>>>>>>> +
>>>>>>> +    if (!svq->vq) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    /* Send all pending used descriptors to guest */
>>>>>>> +    vhost_svq_flush(svq, false);
>>>>>> Do we need to wait for all the pending descriptors to be completed here?
>>>>>>
>>>>> No, this function does not wait, it only completes the forwarding of
>>>>> the *used* descriptors.
>>>>>
>>>>> The best example is the net rx queue in my opinion. This call will
>>>>> check SVQ's vring used_idx and will forward the last used descriptors
>>>>> if any, but all available descriptors will remain as available for
>>>>> qemu's VQ code.
>>>>>
>>>>> To skip it would miss those last rx descriptors in migration.
>>>>>
>>>>> Thanks!
>>>> So it's probably not the best place to ask. It's more about the
>>>> inflight descriptors so it should be TX instead of RX.
>>>>
>>>> I can imagine the migration last phase, we should stop the vhost-vDPA
>>>> before calling vhost_svq_stop(). Then we should be fine regardless of
>>>> inflight descriptors.
>>>>
>>> I think I'm still missing something here.
>>>
>>> To be on the same page. Regarding tx this could cause repeated tx
>>> frames (one at source and other at destination), but never a missed
>>> buffer not transmitted. The "stop before" could be interpreted as "SVQ
>>> is not forwarding available buffers anymore". Would that work?
>>
>> Right, but this only works if
>>
>> 1) a flush to make sure TX DMA for inflight descriptors are all completed
>>
>> 2) just mark all inflight descriptor used
>>
> It currently relies on the reverse: buffers not marked as used (by the
> device) will be available in the destination, so expect
> retransmissions.


I may be missing something, but I think we do migrate last_avail_idx. So there
won't be a re-transmission, since we depend on qemu virtqueue code to 
deal with vring base?
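
Just to state the assumption explicitly (not a real patch):

    /*
     * Assumption: the device is stopped before vhost_svq_stop(), and
     * qemu's virtqueue code migrates the guest vring base
     * (last_avail_idx). So descriptors the device never used must be
     * returned to the guest vring before the base is saved; otherwise
     * they would be neither completed nor re-exposed at destination.
     */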

Thanks


>
> Thanks!
>
>> Otherwise there could be buffers that are inflight forever.
>>
>> Thanks
>>
>>
>>> Thanks!
>>>
>>>> Thanks
>>>>
>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> +
>>>>>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
>>>>>>> +        g_autofree VirtQueueElement *elem = NULL;
>>>>>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
>>>>>>> +        if (elem) {
>>>>>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
>>>>>>> +    if (next_avail_elem) {
>>>>>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
>>>>>>> +                                 next_avail_elem->len);
>>>>>>> +    }
>>>>>>>      }


* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
       [not found]             ` <CAJaqyWdhHmD+tB_bY_YEMnBU1p7-LW=LP8f+3e_ZXDcOfSRiNA@mail.gmail.com>
@ 2022-02-22  7:41               ` Jason Wang
       [not found]                 ` <CAJaqyWfFC4SgxQ4zQeHgtDDJSd0tBa-W4HmtW0UASA2cVDWDUg@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-22  7:41 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake


On 2022/2/17 4:22 PM, Eugenio Perez Martin wrote:
> On Thu, Feb 17, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
>> <eperezma@redhat.com> wrote:
>>> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>
>>>> On 2022/2/1 7:45 PM, Eugenio Perez Martin wrote:
>>>>> On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
>>>>>>> SVQ is able to log the dirty bits by itself, so let's use it to not
>>>>>>> block migration.
>>>>>>>
>>>>>>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
>>>>>>> enabled. Even if the device supports it, the reports would be nonsense
>>>>>>> because SVQ memory is in the qemu region.
>>>>>>>
>>>>>>> The log region is still allocated. Future changes might skip that, but
>>>>>>> this series is already long enough.
>>>>>>>
>>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>>>> ---
>>>>>>>     hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
>>>>>>>     1 file changed, 20 insertions(+)
>>>>>>>
>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>>>> index fb0a338baa..75090d65e8 100644
>>>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>>>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
>>>>>>>         if (ret == 0 && v->shadow_vqs_enabled) {
>>>>>>>             /* Filter only features that SVQ can offer to guest */
>>>>>>>             vhost_svq_valid_guest_features(features);
>>>>>>> +
>>>>>>> +        /* Add SVQ logging capabilities */
>>>>>>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
>>>>>>>         }
>>>>>>>
>>>>>>>         return ret;
>>>>>>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
>>>>>>>
>>>>>>>         if (v->shadow_vqs_enabled) {
>>>>>>>             uint64_t dev_features, svq_features, acked_features;
>>>>>>> +        uint8_t status = 0;
>>>>>>>             bool ok;
>>>>>>>
>>>>>>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
>>>>>>> +        if (unlikely(ret)) {
>>>>>>> +            return ret;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
>>>>>>> +            /*
>>>>>>> +             * vhost is trying to enable or disable _F_LOG, and the device
>>>>>>> +             * would report wrong dirty pages. SVQ handles it.
>>>>>>> +             */
>>>>>> I fail to understand this comment, I'd think there's no way to disable
>>>>>> dirty page tracking for SVQ.
>>>>>>
>>>>> vhost_log_global_{start,stop} are called at the beginning and end of
>>>>> migration. To inform the device that it should start logging, they set
>>>>> or clean VHOST_F_LOG_ALL at vhost_dev_set_log.
>>>>
>>>> Yes, but for SVQ, we can't disable dirty page tracking, can we? The
>>>> only thing is to ignore or filter out the F_LOG_ALL and pretend it is
>>>> enabled or disabled.
>>>>
>>> Yes, that's what this patch does.
>>>
>>>>> While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
>>>>> vhost does not block migration. Maybe we need to look for another way
>>>>> to do this?
>>>>
>>>> I'm fine with filtering since it's much simpler, but I fail to
>>>> understand why we need to check DRIVER_OK.
>>>>
>>> Ok maybe I can make that part more clear,
>>>
>>> Since both operations use vhost_vdpa_set_features we must just filter
>>> the one that actually sets or removes VHOST_F_LOG_ALL, without
>>> affecting other features.
>>>
>>> In practice, that means to not forward the set features after
>>> DRIVER_OK. The device is not expecting them anymore.
>> I wonder what happens if we don't do this.
>>
> If we simply delete the check vhost_dev_set_features will return an
> error, failing the start of the migration. More on this below.


Ok.


>
>> So kernel had this check:
>>
>>          /*
>>           * It's not allowed to change the features after they have
>>           * been negotiated.
>>           */
>>          if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
>>                  return -EBUSY;
>>
>> So is it FEATURES_OK actually?
>>
> Yes, FEATURES_OK seems more appropriate actually so I will switch to
> it for the next version.
>
> But it should be functionally equivalent, since
> vhost.c:vhost_dev_start sets both and the setting of _F_LOG_ALL cannot
> be concurrent with it.


Right.


>
>> For this patch, I wonder if the thing we need to do is to see whether
>> it is a enable/disable F_LOG_ALL and simply return.
>>
> Yes, that's the intention of the patch.
>
> We have 4 cases here:
> a) We're being called from vhost_dev_start, with enable_log = false
> b) We're being called from vhost_dev_start, with enable_log = true


And this case means we can't simply return without calling vhost-vdpa.


> c) We're being called from vhost_dev_set_log, with enable_log = false
> d) We're being called from vhost_dev_set_log, with enable_log = true
>
> The way to tell the difference between a/b and c/d is to check if
> {FEATURES,DRIVER}_OK is set. And, as you point out in previous mails,
> F_LOG_ALL must be filtered unconditionally since SVQ tracks dirty
> memory through the memory unmapping, so we clear the bit
> unconditionally if we detect that VHOST_SET_FEATURES will be called
> (cases a and b).
>
> Another possibility is to track if features have been set with a bool
> in vhost_vdpa or something like that. But it seems cleaner to me to
> only store that in the actual device.


So I suggest making sure the code matches the comment:

         if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
             /*
              * vhost is trying to enable or disable _F_LOG, and the device
              * would report wrong dirty pages. SVQ handles it.
              */
             return 0;
         }

It would be better to check whether the caller is toggling _F_LOG_ALL in 
this case.
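
E.g. something like this (a rough sketch, the field names are
assumptions and may differ):

    if ((status & VIRTIO_CONFIG_S_FEATURES_OK) &&
        ((features ^ dev->acked_features) &
         ~BIT_ULL(VHOST_F_LOG_ALL)) == 0) {
        /*
         * A pure _F_LOG_ALL toggle after feature negotiation: SVQ
         * already tracks dirty pages, nothing to tell the device.
         */
        return 0;
    }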

Thanks


>
>> Thanks
>>
>>> Does that make more sense?
>>>
>>> Thanks!
>>>
>>>> Thanks
>>>>
>>>>
>>>>> Thanks!
>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>> +            return 0;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        /* We must not ack _F_LOG if SVQ is enabled */
>>>>>>> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
>>>>>>> +
>>>>>>>             ret = vhost_vdpa_get_dev_features(dev, &dev_features);
>>>>>>>             if (ret != 0) {
>>>>>>>                 error_report("Can't get vdpa device features, got (%d)", ret);


* Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
       [not found]                 ` <CAJaqyWd2PQFedaEOV7YVZgp0m37snn-4LYYtNw7g4u+7hrtq=Q@mail.gmail.com>
@ 2022-02-22  7:59                   ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-22  7:59 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Tue, Feb 22, 2022 at 3:43 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 22, 2022 at 4:16 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Feb 22, 2022 at 1:23 AM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Mon, Feb 21, 2022 at 8:15 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > 在 2022/2/18 上午1:13, Eugenio Perez Martin 写道:
> > > > > On Tue, Feb 8, 2022 at 4:58 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>
> > > > >> On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote:
> > > > >>> On Sun, Jan 30, 2022 at 5:03 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > > > >>>>> First half of the buffers forwarding part, preparing vhost-vdpa
> > > > >>>>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
> > > > >>>>> this is effectively dead code at the moment, but it helps to reduce
> > > > >>>>> patch size.
> > > > >>>>>
> > > > >>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > >>>>> ---
> > > > >>>>>     hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> > > > >>>>>     hw/virtio/vhost-shadow-virtqueue.c |  21 ++++-
> > > > >>>>>     hw/virtio/vhost-vdpa.c             | 133 ++++++++++++++++++++++++++---
> > > > >>>>>     3 files changed, 143 insertions(+), 13 deletions(-)
> > > > >>>>>
> > > > >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > > > >>>>> index 035207a469..39aef5ffdf 100644
> > > > >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > > >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > > >>>>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> > > > >>>>>
> > > > >>>>>     void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > > > >>>>>
> > > > >>>>> -VhostShadowVirtqueue *vhost_svq_new(void);
> > > > >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > > > >>>>>
> > > > >>>>>     void vhost_svq_free(VhostShadowVirtqueue *vq);
> > > > >>>>>
> > > > >>>>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > > > >>>>> index f129ec8395..7c168075d7 100644
> > > > >>>>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > > >>>>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > > >>>>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > > > >>>>>     /**
> > > > >>>>>      * Creates vhost shadow virtqueue, and instruct vhost device to use the shadow
> > > > >>>>>      * methods and file descriptors.
> > > > >>>>> + *
> > > > >>>>> + * @qsize Shadow VirtQueue size
> > > > >>>>> + *
> > > > >>>>> + * Returns the new virtqueue or NULL.
> > > > >>>>> + *
> > > > >>>>> + * In case of error, reason is reported through error_report.
> > > > >>>>>      */
> > > > >>>>> -VhostShadowVirtqueue *vhost_svq_new(void)
> > > > >>>>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> > > > >>>>>     {
> > > > >>>>> +    size_t desc_size = sizeof(vring_desc_t) * qsize;
> > > > >>>>> +    size_t device_size, driver_size;
> > > > >>>>>         g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
> > > > >>>>>         int r;
> > > > >>>>>
> > > > >>>>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> > > > >>>>>         /* Placeholder descriptor, it should be deleted at set_kick_fd */
> > > > >>>>>         event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> > > > >>>>>
> > > > >>>>> +    svq->vring.num = qsize;
> > > > >>>> I wonder if this is the best. E.g. some hardware can support up to 32K
> > > > >>>> queue size. So this will probably end up with:
> > > > >>>>
> > > > >>>> 1) SVQ uses 32K queue size
> > > > >>>> 2) hardware queue uses 256
> > > > >>>>
> > > > >>> In that case SVQ vring queue size will be 32K and guest's vring can
> > > > >>> negotiate any number with SVQ equal or less than 32K,
> > > > >>
> > > > >> Sorry for being unclear what I meant is actually
> > > > >>
> > > > >> 1) SVQ uses 32K queue size
> > > > >>
> > > > >> 2) guest vq uses 256
> > > > >>
> > > > >> This looks like a burden that needs extra logic and may damage the
> > > > >> performance.
> > > > >>
> > > > > Still not getting this point.
> > > > >
> > > > > An available guest buffer, although contiguous in GPA/GVA, can expand
> > > > > in multiple buffers if it's not contiguous in qemu's VA (by the while
> > > > > loop in virtqueue_map_desc [1]). In that scenario it is better to have
> > > > > "plenty" of SVQ buffers.
> > > >
> > > >
> > > > Yes, but this case should be rare. So in this case we should deal with
> > > > overrun on SVQ, that is
> > > >
> > > > 1) SVQ is full
> > > > 2) guest VQ isn't
> > > >
> > > > We need to
> > > >
> > > > 1) check the available buffer slots
> > > > 2) disable guest kick and wait for the used buffers
> > > >
> > > > But it looks to me the current code is not ready for dealing with this case?
> > > >
> > >
> > > Yes, it deals with that; that's the meaning of svq->next_guest_avail_elem.
> >
> > Oh right, I missed that.
> >
> > >
> > > >
> > > > >
> > > > > I'm ok if we decide to put an upper limit though, or if we decide not
> > > > > to handle this situation. But we would leave out valid virtio drivers.
> > > > > Maybe to set a fixed upper limit (1024?)? To add another parameter
> > > > > (x-svq-size-n=N)?
> > > > >
> > > > > If you mean we lose performance because memory gets more sparse I
> > > > > think the only possibility is to limit that way.
> > > >
> > > >
> > > > If the guest is not using 32K, having a 32K svq may give extra stress
> > > > on the cache since we will end up with a pretty large working set.
> > > >
> > >
> > > That might be true. My guess is that it should not matter, since SVQ
> > > and the guest's vring will have the same numbers of scattered buffers
> > > and the avail / used / packed ring will be consumed more or less
> > > sequentially. But I haven't tested.
> > >
> > > I think it's better to add an upper limit (either fixed or in the
> > > qemu's backend's cmdline) later if we see that this is a problem.
> >
> > I'd suggest using the same size as what the guest saw.
> >
> > > Another solution now would be to get the number from the frontend
> > > device cmdline instead of from the vdpa device. I'm ok with that, but
> > > it doesn't delete the svq->next_guest_avail_elem processing, and it
> > > comes with disadvantages in my opinion. More below.
> >
> > Right, we should keep next_guest_avail_elem. Using the same queue size
> > is a balance between:
> >
> > 1) using next_guest_avail_elem (rare)
> > 2) not giving too much stress on the cache
> >
>
> Ok I'll change the SVQ size for the frontend size then.
>
> > >
> > > >
> > > > >
> > > > >> And this can lead other interesting situation:
> > > > >>
> > > > >> 1) SVQ uses 256
> > > > >>
> > > > >> 2) guest vq uses 1024
> > > > >>
> > > > >> Where a lot of more SVQ logic is needed.
> > > > >>
> > > > > If we agree that a guest descriptor can expand in multiple SVQ
> > > > > descriptors, this should be already handled by the previous logic too.
> > > > >
> > > > > But this should only happen in case that qemu is launched with a "bad"
> > > > > cmdline, isn't it?
> > > >
> > > >
> > > > This seems can happen when we use -device
> > > > virtio-net-pci,tx_queue_size=1024 with a 256 size vp_vdpa device at least?
> > > >
> > >
> > > I'm going to use the rx queue here since it's more accurate, tx has
> > > its own limit separately.
> > >
> > > If we use rx_queue_size=256 in L0 and rx_queue_size=1024 in L1 with no
> > > SVQ, L0 qemu will happily accept 1024 as size
> >
> > Interesting, looks like a bug (I guess it works since you enable vhost?):
> >
>
> No, emulated interfaces. More below.
>
> > Per virtio-spec:
> >
> > """
> > Queue Size. On reset, specifies the maximum queue size supported by
> > the device. This can be modified by the driver to reduce memory
> > requirements. A 0 means the queue is unavailable.
> > """
> >
>
> Yes, but how should it fail? Drivers do not know how to check if the
> value was invalid. DEVICE_NEEDS_RESET?

I think it can be detected by reading the value back to see if it matches.
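
Roughly, on the driver side (hypothetical, Linux-style sketch):

    /*
     * Write the requested size and read it back: if the device
     * rejected or clamped the value, the readback won't match and the
     * driver can fall back to the device-reported size (or fail).
     */
    vp_iowrite16(num, &cfg->queue_size);
    if (vp_ioread16(&cfg->queue_size) != num) {
        /* fall back or fail probe */
    }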

Thanks

>
> The L0 emulated device simply receives the write to pci and calls
> virtio_queue_set_num. I can try to add the check "num <
> vdev->vq[n].vring.num_default", but there is no way to notify the
> guest that setting the value failed.
>
> > We can't increase the queue_size from 256 to 1024 actually. (Only
> > decrease is allowed).
> >
> > > when L1 qemu writes that
> > > value at vhost_virtqueue_start. I'm not sure what would happen with a
> > > real device, my guess is that the device will fail somehow. That's
> > > what I meant with a "bad cmdline", I should have been more specific.
> >
> > I should say that it's something that is probably unrelated to this
> > series but needs to be addressed.
> >
>
> I agree, I can start developing the patches for sure.
>
> > >
> > > If we add SVQ to the mix, the guest first negotiates the 1024 with the
> > > qemu device model. After that, vhost.c will try to write 1024 too but
> > > this is totally ignored by this patch's changes at
> > > vhost_vdpa_set_vring_num. Finally, SVQ will set 256 as a ring size to
> > > the device, since it's the read value from the device, leading to your
> > > scenario. So SVQ effectively isolates both sides and makes possible
> > > the communication, even with a device that does not support so many
> > > descriptors.
> > >
> > > But SVQ already handles this case: It's the same as if the buffers are
> > > fragmented in HVA and queue size is equal at both sides. That's why I
> > > think SVQ size should depend on the backend device's size, not
> > > frontend cmdline.
> >
> > Right.
> >
> > Thanks
> >
> > >
> > > Thanks!
> > >
> > > >
> > > > >
> > > > > If I run that example with vp_vdpa, L0 qemu will happily accept 1024
> > > > > as a queue size [2]. But if the vdpa device maximum queue size is
> > > > > effectively 256, this will result in an error: We're not exposing it
> > > > > to the guest at any moment but with qemu's cmdline.
> > > > >
> > > > >>> including 256.
> > > > >>> Is that what you mean?
> > > > >>
> > > > >> I mean, it looks to me the logic will be much simpler if we just
> > > > >> allocate the shadow virtqueue with the size the guest can see (guest
> > > > >> vring).
> > > > >>
> > > > >> Then we don't need to think about whether the difference of the queue
> > > > >> size can have any side effects.
> > > > >>
> > > > > I think that we cannot avoid that extra logic unless we force GPA to
> > > > > be contiguous in IOVA. If we are sure the guest's buffers cannot be at
> > > > > more than one descriptor in SVQ, then yes, we can simplify things. If
> > > > > not, I think we are forced to carry all of it.
> > > >
> > > >
> > > > Yes, I agree, the code should be robust to handle any case.
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > But if we prove it I'm not opposed to simplifying things and making
> > > > > head at SVQ == head at guest.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > [1] https://gitlab.com/qemu-project/qemu/-/blob/17e31340/hw/virtio/virtio.c#L1297
> > > > > [2] But that's not the whole story: I've been running limited in tx
> > > > > descriptors because of virtio_net_max_tx_queue_size, which predates
> > > > > vdpa. I'll send a patch to also un-limit it.
> > > > >
> > > > >>> If with hardware queues you mean guest's vring, not sure why it is
> > > > >>> "probably 256". I'd say that in that case with the virtio-net kernel
> > > > >>> driver the ring size will be the same as the device export, for
> > > > >>> example, isn't it?
> > > > >>>
> > > > >>> The implementation should support any combination of sizes, but the
> > > > >>> ring size exposed to the guest is never bigger than hardware one.
> > > > >>>
> > > > >>>> ? Or SVQ can stick to 256, but will this cause trouble if we want
> > > > >>>> to add event index support?
> > > > >>>>
> > > > >>> I think we should not have any problem with event idx. If you mean
> > > > >>> that the guest could mark more buffers available than SVQ vring's
> > > > >>> size, that should not happen because there must be less entries in the
> > > > >>> guest than SVQ.
> > > > >>>
> > > > >>> But if I understood you correctly, a similar situation could happen if
> > > > >>> a guest's contiguous buffer is scattered across many qemu's VA chunks.
> > > > >>> Even if that would happen, the situation should be ok too: SVQ knows
> > > > >>> the guest's avail idx and, if SVQ is full, it will continue forwarding
> > > > >>> avail buffers when the device uses more buffers.
> > > > >>>
> > > > >>> Does that make sense to you?
> > > > >>
> > > > >> Yes.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > >
> > >
> >
>


* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
       [not found]         ` <CAJaqyWfRWexq7jrCkJrPzLB4g_fK42pE8BarMhZwKNYtNXi7XA@mail.gmail.com>
@ 2022-02-23  2:03           ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-23  2:03 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Wed, Feb 23, 2022 at 3:01 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 8, 2022 at 9:11 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2022/2/2 1:08 AM, Eugenio Perez Martin wrote:
> > > On Sun, Jan 30, 2022 at 5:43 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > >>> Initial version of shadow virtqueue that actually forwards buffers. There
> > >>> is no iommu support at the moment, and that will be addressed in future
> > >>> patches of this series. Since all vhost-vdpa devices use forced IOMMU,
> > >>> this means that SVQ is not usable at this point of the series on any
> > >>> device.
> > >>>
> > >>> For simplicity it only supports modern devices, which expect vrings
> > >>> in little endian, with split ring and no event idx or indirect
> > >>> descriptors. Support for them will not be added in this series.
> > >>>
> > >>> It reuses the VirtQueue code for the device part. The driver part is
> > >>> based on Linux's virtio_ring driver, but with stripped functionality
> > >>> and optimizations so it's easier to review.
> > >>>
> > >>> However, forwarding buffers has some particular pieces: one of the most
> > >>> unexpected ones is that a guest's buffer can expand through more than
> > >>> one descriptor in SVQ. While this is handled gracefully by qemu's
> > >>> emulated virtio devices, it may cause unexpected SVQ queue full. This
> > >>> patch also solves it by checking for this condition at both guest's
> > >>> kicks and device's calls. The code may be more elegant in the future if
> > >>> SVQ code runs in its own iocontext.
> > >>>
> > >>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>> ---
> > >>>    hw/virtio/vhost-shadow-virtqueue.h |   2 +
> > >>>    hw/virtio/vhost-shadow-virtqueue.c | 365 ++++++++++++++++++++++++++++-
> > >>>    hw/virtio/vhost-vdpa.c             | 111 ++++++++-
> > >>>    3 files changed, 462 insertions(+), 16 deletions(-)
> > >>>
> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > >>> index 39aef5ffdf..19c934af49 100644
> > >>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > >>> @@ -33,6 +33,8 @@ uint16_t vhost_svq_get_num(const VhostShadowVirtqueue *svq);
> > >>>    size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
> > >>>    size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
> > >>>
> > >>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > >>> +                     VirtQueue *vq);
> > >>>    void vhost_svq_stop(VhostShadowVirtqueue *svq);
> > >>>
> > >>>    VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > >>> index 7c168075d7..a1a404f68f 100644
> > >>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > >>> @@ -9,6 +9,8 @@
> > >>>
> > >>>    #include "qemu/osdep.h"
> > >>>    #include "hw/virtio/vhost-shadow-virtqueue.h"
> > >>> +#include "hw/virtio/vhost.h"
> > >>> +#include "hw/virtio/virtio-access.h"
> > >>>    #include "standard-headers/linux/vhost_types.h"
> > >>>
> > >>>    #include "qemu/error-report.h"
> > >>> @@ -36,6 +38,33 @@ typedef struct VhostShadowVirtqueue {
> > >>>
> > >>>        /* Guest's call notifier, where SVQ calls guest. */
> > >>>        EventNotifier svq_call;
> > >>> +
> > >>> +    /* Virtio queue shadowing */
> > >>> +    VirtQueue *vq;
> > >>> +
> > >>> +    /* Virtio device */
> > >>> +    VirtIODevice *vdev;
> > >>> +
> > >>> +    /* Map for returning guest's descriptors */
> > >>> +    VirtQueueElement **ring_id_maps;
> > >>> +
> > >>> +    /* Next VirtQueue element that guest made available */
> > >>> +    VirtQueueElement *next_guest_avail_elem;
> > >>> +
> > >>> +    /* Next head to expose to device */
> > >>> +    uint16_t avail_idx_shadow;
> > >>> +
> > >>> +    /* Next free descriptor */
> > >>> +    uint16_t free_head;
> > >>> +
> > >>> +    /* Last seen used idx */
> > >>> +    uint16_t shadow_used_idx;
> > >>> +
> > >>> +    /* Next head to consume from device */
> > >>> +    uint16_t last_used_idx;
> > >>> +
> > >>> +    /* Cache for the exposed notification flag */
> > >>> +    bool notification;
> > >>>    } VhostShadowVirtqueue;
> > >>>
> > >>>    #define INVALID_SVQ_KICK_FD -1
> > >>> @@ -148,30 +177,294 @@ bool vhost_svq_ack_guest_features(uint64_t dev_features,
> > >>>        return true;
> > >>>    }
> > >>>
> > >>> -/* Forward guest notifications */
> > >>> -static void vhost_handle_guest_kick(EventNotifier *n)
> > >>> +/**
> > >>> + * Number of descriptors that SVQ can make available from the guest.
> > >>> + *
> > >>> + * @svq   The svq
> > >>> + */
> > >>> +static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
> > >>>    {
> > >>> -    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > >>> -                                             svq_kick);
> > >>> +    return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
> > >>> +{
> > >>> +    uint16_t notification_flag;
> > >>>
> > >>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> > >>> +    if (svq->notification == enable) {
> > >>> +        return;
> > >>> +    }
> > >>> +
> > >>> +    notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
> > >>> +
> > >>> +    svq->notification = enable;
> > >>> +    if (enable) {
> > >>> +        svq->vring.avail->flags &= ~notification_flag;
> > >>> +    } else {
> > >>> +        svq->vring.avail->flags |= notification_flag;
> > >>> +    }
> > >>> +}
> > >>> +
> > >>> +static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
> > >>> +                                    const struct iovec *iovec,
> > >>> +                                    size_t num, bool more_descs, bool write)
> > >>> +{
> > >>> +    uint16_t i = svq->free_head, last = svq->free_head;
> > >>> +    unsigned n;
> > >>> +    uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
> > >>> +    vring_desc_t *descs = svq->vring.desc;
> > >>> +
> > >>> +    if (num == 0) {
> > >>> +        return;
> > >>> +    }
> > >>> +
> > >>> +    for (n = 0; n < num; n++) {
> > >>> +        if (more_descs || (n + 1 < num)) {
> > >>> +            descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
> > >>> +        } else {
> > >>> +            descs[i].flags = flags;
> > >>> +        }
> > >>> +        descs[i].addr = cpu_to_le64((hwaddr)iovec[n].iov_base);
> > >>> +        descs[i].len = cpu_to_le32(iovec[n].iov_len);
> > >>> +
> > >>> +        last = i;
> > >>> +        i = le16_to_cpu(descs[i].next);
> > >>> +    }
> > >>> +
> > >>> +    svq->free_head = le16_to_cpu(descs[last].next);
> > >>> +}
> > >>> +
> > >>> +static unsigned vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > >>> +                                    VirtQueueElement *elem)
> > >>> +{
> > >>> +    int head;
> > >>> +    unsigned avail_idx;
> > >>> +    vring_avail_t *avail = svq->vring.avail;
> > >>> +
> > >>> +    head = svq->free_head;
> > >>> +
> > >>> +    /* We need some descriptors here */
> > >>> +    assert(elem->out_num || elem->in_num);
> > >>
> > >> Looks like this could be triggered by guest, we need fail instead assert
> > >> here.
> > >>
> > > My understanding was that virtqueue_pop already sanitized that case,
> > > but I'm not able to find where now. I will recheck and, in case it's
> > > not, I will move to a failure.
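
For illustration, a minimal sketch of the non-assert variant discussed
above (the return value and its handling by the caller are assumptions,
not the final patch):

    /* Replacing the assert at the top of vhost_svq_add_split() */
    if (unlikely(elem->out_num == 0 && elem->in_num == 0)) {
        /* A descriptor-less element is guest-triggerable; fail instead
         * of aborting qemu. */
        virtio_error(svq->vdev, "Invalid element with no descriptors");
        return -1; /* hypothetical error value for the caller to check */
    }
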
> > >
> > >>> +
> > >>> +    vhost_vring_write_descs(svq, elem->out_sg, elem->out_num,
> > >>> +                            elem->in_num > 0, false);
> > >>> +    vhost_vring_write_descs(svq, elem->in_sg, elem->in_num, false, true);
> > >>> +
> > >>> +    /*
> > >>> +     * Put entry in available array (but don't update avail->idx until they
> > >>> +     * do sync).
> > >>> +     */
> > >>> +    avail_idx = svq->avail_idx_shadow & (svq->vring.num - 1);
> > >>> +    avail->ring[avail_idx] = cpu_to_le16(head);
> > >>> +    svq->avail_idx_shadow++;
> > >>> +
> > >>> +    /* Update avail index after the descriptor is written */
> > >>> +    smp_wmb();
> > >>> +    avail->idx = cpu_to_le16(svq->avail_idx_shadow);
> > >>> +
> > >>> +    return head;
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
> > >>> +{
> > >>> +    unsigned qemu_head = vhost_svq_add_split(svq, elem);
> > >>> +
> > >>> +    svq->ring_id_maps[qemu_head] = elem;
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    /* We need to expose available array entries before checking used flags */
> > >>> +    smp_mb();
> > >>> +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
> > >>>            return;
> > >>>        }
> > >>>
> > >>>        event_notifier_set(&svq->hdev_kick);
> > >>>    }
> > >>>
> > >>> -/* Forward vhost notifications */
> > >>> +/**
> > >>> + * Forward available buffers.
> > >>> + *
> > >>> + * @svq Shadow VirtQueue
> > >>> + *
> > >>> + * Note that this function does not guarantee that all guest's available
> > >>> + * buffers are available to the device in SVQ avail ring. The guest may have
> > >>> + * exposed a GPA / GIOVA contiguous buffer, but it may not be contiguous in qemu
> > >>> + * vaddr.
> > >>> + *
> > >>> + * If that happens, guest's kick notifications will be disabled until device
> > >>> + * makes some buffers used.
> > >>> + */
> > >>> +static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    /* Clear event notifier */
> > >>> +    event_notifier_test_and_clear(&svq->svq_kick);
> > >>> +
> > >>> +    /* Make available as many buffers as possible */
> > >>> +    do {
> > >>> +        if (virtio_queue_get_notification(svq->vq)) {
> > >>> +            virtio_queue_set_notification(svq->vq, false);
> > >>
> > >> This looks like an optimization that should belong to
> > >> virtio_queue_set_notification() itself.
> > >>
> > > Sure we can move.
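
For illustration, a minimal sketch of consolidating that check in one
place until it can move into virtio_queue_set_notification() itself;
the helper name is hypothetical, and it only uses calls already visible
in this patch:

    static void vhost_svq_set_vq_notification(VhostShadowVirtqueue *svq,
                                              bool enable)
    {
        /* Skip the vring write when the cached state already matches;
         * this is the check that could live in virtio core instead. */
        if (virtio_queue_get_notification(svq->vq) == enable) {
            return;
        }
        virtio_queue_set_notification(svq->vq, enable);
    }
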
> > >
> > >>> +        }
> > >>> +
> > >>> +        while (true) {
> > >>> +            VirtQueueElement *elem;
> > >>> +
> > >>> +            if (svq->next_guest_avail_elem) {
> > >>> +                elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >>> +            } else {
> > >>> +                elem = virtqueue_pop(svq->vq, sizeof(*elem));
> > >>> +            }
> > >>> +
> > >>> +            if (!elem) {
> > >>> +                break;
> > >>> +            }
> > >>> +
> > >>> +            if (elem->out_num + elem->in_num >
> > >>> +                vhost_svq_available_slots(svq)) {
> > >>> +                /*
> > >>> +                 * This condition is possible since a contiguous buffer in GPA
> > >>> +                 * does not imply a contiguous buffer in qemu's VA
> > >>> +                 * scatter-gather segments. If that happens, the buffer exposed
> > >>> +                 * to the device needs to be a chain of descriptors at this
> > >>> +                 * moment.
> > >>> +                 *
> > >>> +                 * SVQ cannot hold more available buffers if we are here:
> > >>> +                 * queue the current guest descriptor and ignore further kicks
> > >>> +                 * until some elements are used.
> > >>> +                 */
> > >>> +                svq->next_guest_avail_elem = elem;
> > >>> +                return;
> > >>> +            }
> > >>> +
> > >>> +            vhost_svq_add(svq, elem);
> > >>> +            vhost_svq_kick(svq);
> > >>> +        }
> > >>> +
> > >>> +        virtio_queue_set_notification(svq->vq, true);
> > >>> +    } while (!virtio_queue_empty(svq->vq));
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Handle guest's kick.
> > >>> + *
> > >>> + * @n guest kick event notifier, the one that guest set to notify svq.
> > >>> + */
> > >>> +static void vhost_handle_guest_kick_notifier(EventNotifier *n)
> > >>> +{
> > >>> +    VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > >>> +                                             svq_kick);
> > >>> +    vhost_handle_guest_kick(svq);
> > >>> +}
> > >>> +
> > >>> +static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    if (svq->last_used_idx != svq->shadow_used_idx) {
> > >>> +        return true;
> > >>> +    }
> > >>> +
> > >>> +    svq->shadow_used_idx = le16_to_cpu(svq->vring.used->idx);
> > >>> +
> > >>> +    return svq->last_used_idx != svq->shadow_used_idx;
> > >>> +}
> > >>> +
> > >>> +static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    vring_desc_t *descs = svq->vring.desc;
> > >>> +    const vring_used_t *used = svq->vring.used;
> > >>> +    vring_used_elem_t used_elem;
> > >>> +    uint16_t last_used;
> > >>> +
> > >>> +    if (!vhost_svq_more_used(svq)) {
> > >>> +        return NULL;
> > >>> +    }
> > >>> +
> > >>> +    /* Only get used array entries after they have been exposed by dev */
> > >>> +    smp_rmb();
> > >>> +    last_used = svq->last_used_idx & (svq->vring.num - 1);
> > >>> +    used_elem.id = le32_to_cpu(used->ring[last_used].id);
> > >>> +    used_elem.len = le32_to_cpu(used->ring[last_used].len);
> > >>> +
> > >>> +    svq->last_used_idx++;
> > >>> +    if (unlikely(used_elem.id >= svq->vring.num)) {
> > >>> +        error_report("Device %s says index %u is used", svq->vdev->name,
> > >>> +                     used_elem.id);
> > >>> +        return NULL;
> > >>> +    }
> > >>> +
> > >>> +    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
> > >>> +        error_report(
> > >>> +            "Device %s says index %u is used, but it was not available",
> > >>> +            svq->vdev->name, used_elem.id);
> > >>> +        return NULL;
> > >>> +    }
> > >>> +
> > >>> +    descs[used_elem.id].next = svq->free_head;
> > >>> +    svq->free_head = used_elem.id;
> > >>> +
> > >>> +    svq->ring_id_maps[used_elem.id]->len = used_elem.len;
> > >>> +    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
> > >>> +}
> > >>> +
> > >>> +static void vhost_svq_flush(VhostShadowVirtqueue *svq,
> > >>> +                            bool check_for_avail_queue)
> > >>> +{
> > >>> +    VirtQueue *vq = svq->vq;
> > >>> +
> > >>> +    /* Make as many buffers as possible used. */
> > >>> +    do {
> > >>> +        unsigned i = 0;
> > >>> +
> > >>> +        vhost_svq_set_notification(svq, false);
> > >>> +        while (true) {
> > >>> +            g_autofree VirtQueueElement *elem = vhost_svq_get_buf(svq);
> > >>> +            if (!elem) {
> > >>> +                break;
> > >>> +            }
> > >>> +
> > >>> +            if (unlikely(i >= svq->vring.num)) {
> > >>> +                virtio_error(svq->vdev,
> > >>> +                         "More than %u used buffers obtained in a %u size SVQ",
> > >>> +                         i, svq->vring.num);
> > >>> +                virtqueue_fill(vq, elem, elem->len, i);
> > >>> +                virtqueue_flush(vq, i);
> > >>
> > >> Let's simply use virtqueue_push() here?
> > >>
> > > virtqueue_push supports filling and flushing only one element at a
> > > time, not a batch. I'm fine with either, but I think the fewer
> > > updates to the used idx, the better.
> >
> >
> > Fine.
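
For reference, a minimal sketch of the batched pattern being kept,
using the helpers from the patch above (the loop shape follows
vhost_svq_flush(); error handling is omitted):

    unsigned i = 0;
    VirtQueueElement *elem;

    while ((elem = vhost_svq_get_buf(svq))) {
        /* Stage each element in the used ring without publishing it */
        virtqueue_fill(vq, elem, elem->len, i++);
        g_free(elem);
    }
    /* One used->idx store (and one barrier) for the whole batch */
    virtqueue_flush(vq, i);
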
> >
> >
> > >
> > >>> +                i = 0;
> > >>
> > >> Do we need to bail out here?
> > >>
> > > Yes I guess we can simply return.
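
A sketch of that bail-out, replacing the "i = 0;" above (a reading of
the agreement here, not the final code):

    if (unlikely(i >= svq->vring.num)) {
        virtio_error(svq->vdev,
                     "More than %u used buffers obtained in a %u size SVQ",
                     i, svq->vring.num);
        virtqueue_fill(vq, elem, elem->len, i);
        virtqueue_flush(vq, i);
        return; /* bail out instead of resetting i */
    }
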
> > >
> > >>> +            }
> > >>> +            virtqueue_fill(vq, elem, elem->len, i++);
> > >>> +        }
> > >>> +
> > >>> +        virtqueue_flush(vq, i);
> > >>> +        event_notifier_set(&svq->svq_call);
> > >>> +
> > >>> +        if (check_for_avail_queue && svq->next_guest_avail_elem) {
> > >>> +            /*
> > >>> +             * Avail ring was full when vhost_svq_flush was called, so it's a
> > >>> +             * good moment to make more descriptors available if possible
> > >>> +             */
> > >>> +            vhost_handle_guest_kick(svq);
> > >>
> > >> Is there better to have a similar check as vhost_handle_guest_kick() did?
> > >>
> > >>               if (elem->out_num + elem->in_num >
> > >>                   vhost_svq_available_slots(svq)) {
> > >>
> > > It will be duplicated when we call vhost_handle_guest_kick, won't it?
> >
> >
> > Right, I mis-read the code.
> >
> >
> > >
> > >>> +        }
> > >>> +
> > >>> +        vhost_svq_set_notification(svq, true);
> > >>
> > >> A mb() is needed here? Otherwise we may lost a call here (where
> > >> vhost_svq_more_used() is run before vhost_svq_set_notification()).
> > >>
> > > I'm confused here, then: I thought you said this is just a hint, so
> > > there was no need? [1] I think the memory barrier is needed too.
> >
> >
> > Yes, it's a hint but:
> >
> > 1) When we disable the notification: since disabling is just a hint,
> > the device can still raise an interrupt, so the ordering is
> > meaningless and a memory barrier is not necessary (the
> > vhost_svq_set_notification(svq, false) case).
> >
> > 2) When we enable the notification: though it's a hint, the device can
> > choose to implement it by enabling the interrupt. In this case, the
> > notification enable must be done before checking the used ring.
> > Otherwise, the check for more used buffers might be done before
> > enabling the notification:
> >
> > 1) driver checks for more used buffers
> > 2) device adds more used buffers but sends no notification
> > 3) driver enables the notification, and we lose a notification here
> >
>
> That was my understanding too. So the right way is to only add the
> memory barrier in case 2), when setting the flag, right?

Yes.
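
For illustration, a hedged sketch of the agreed ordering, reworking
vhost_svq_set_notification() from the patch above (a reading of this
exchange, not the final code):

    static void vhost_svq_set_notification(VhostShadowVirtqueue *svq,
                                           bool enable)
    {
        uint16_t flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);

        if (svq->notification == enable) {
            return;
        }

        svq->notification = enable;
        if (enable) {
            svq->vring.avail->flags &= ~flag;
            /*
             * Case 2): publish the flag before the caller re-checks
             * used->idx, so a call raised in between is not lost.
             */
            smp_mb();
        } else {
            /* Case 1): a hint only, no ordering required */
            svq->vring.avail->flags |= flag;
        }
    }
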

>
> >
> > >>> +    } while (vhost_svq_more_used(svq));
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Forward used buffers.
> > >>> + *
> > >>> + * @n hdev call event notifier, the one that device set to notify svq.
> > >>> + *
> > >>> + * Note that we are not making any buffers available in the loop, so there
> > >>> + * is no way it runs more than virtqueue-size times.
> > >>> + */
> > >>>    static void vhost_svq_handle_call(EventNotifier *n)
> > >>>    {
> > >>>        VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
> > >>>                                                 hdev_call);
> > >>>
> > >>> -    if (unlikely(!event_notifier_test_and_clear(n))) {
> > >>> -        return;
> > >>> -    }
> > >>> +    /* Clear event notifier */
> > >>> +    event_notifier_test_and_clear(n);
> > >>
> > >> Any reason that we remove the above check?
> > >>
> > > This comes from the previous versions, where this made sure we missed
> > > no used buffers in the process of switching to SVQ mode.
> >
> >
> > I'm not sure I get it here. Even for the switching case, wouldn't it
> > be safer to handle the flush unconditionally?
> >
>
> Yes, I also think it's better to forward and kick/call unconditionally.
>
> Thanks!

Ok.

Thanks

>
> > Thanks
> >
> >
> > >
> > > If we enable SVQ from the beginning I think we can rely on getting all
> > > the device's used buffer notifications, so let me think a little bit
> > > and I can move to check the eventfd.
> > >
> > >>> -    event_notifier_set(&svq->svq_call);
> > >>> +    vhost_svq_flush(svq, true);
> > >>>    }
> > >>>
> > >>>    /**
> > >>> @@ -258,13 +551,38 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > >>>         * need to explicitly check for them.
> > >>>         */
> > >>>        event_notifier_init_fd(&svq->svq_kick, svq_kick_fd);
> > >>> -    event_notifier_set_handler(&svq->svq_kick, vhost_handle_guest_kick);
> > >>> +    event_notifier_set_handler(&svq->svq_kick,
> > >>> +                               vhost_handle_guest_kick_notifier);
> > >>>
> > >>>        if (!check_old || event_notifier_test_and_clear(&tmp)) {
> > >>>            event_notifier_set(&svq->hdev_kick);
> > >>>        }
> > >>>    }
> > >>>
> > >>> +/**
> > >>> + * Start shadow virtqueue operation.
> > >>> + *
> > >>> + * @svq Shadow Virtqueue
> > >>> + * @vdev        VirtIO device
> > >>> + * @vq          Virtqueue to shadow
> > >>> + */
> > >>> +void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
> > >>> +                     VirtQueue *vq)
> > >>> +{
> > >>> +    svq->next_guest_avail_elem = NULL;
> > >>> +    svq->avail_idx_shadow = 0;
> > >>> +    svq->shadow_used_idx = 0;
> > >>> +    svq->last_used_idx = 0;
> > >>> +    svq->vdev = vdev;
> > >>> +    svq->vq = vq;
> > >>> +
> > >>> +    memset(svq->vring.avail, 0, sizeof(*svq->vring.avail));
> > >>> +    memset(svq->vring.used, 0, sizeof(*svq->vring.used));
> > >>> +    for (unsigned i = 0; i < svq->vring.num - 1; i++) {
> > >>> +        svq->vring.desc[i].next = cpu_to_le16(i + 1);
> > >>> +    }
> > >>> +}
> > >>> +
> > >>>    /**
> > >>>     * Stop shadow virtqueue operation.
> > >>>     * @svq Shadow Virtqueue
> > >>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > >>>    void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >>>    {
> > >>>        event_notifier_set_handler(&svq->svq_kick, NULL);
> > >>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> > >>> +
> > >>> +    if (!svq->vq) {
> > >>> +        return;
> > >>> +    }
> > >>> +
> > >>> +    /* Send all pending used descriptors to guest */
> > >>> +    vhost_svq_flush(svq, false);
> > >>> +
> > >>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> > >>> +        g_autofree VirtQueueElement *elem = NULL;
> > >>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > >>> +        if (elem) {
> > >>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> > >>> +        }
> > >>> +    }
> > >>> +
> > >>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >>> +    if (next_avail_elem) {
> > >>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> > >>> +                                 next_avail_elem->len);
> > >>> +    }
> > >>>    }
> > >>>
> > >>>    /**
> > >>> @@ -316,7 +656,7 @@ VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> > >>>        memset(svq->vring.desc, 0, driver_size);
> > >>>        svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
> > >>>        memset(svq->vring.used, 0, device_size);
> > >>> -
> > >>> +    svq->ring_id_maps = g_new0(VirtQueueElement *, qsize);
> > >>>        event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
> > >>>        return g_steal_pointer(&svq);
> > >>>
> > >>> @@ -335,6 +675,7 @@ void vhost_svq_free(VhostShadowVirtqueue *vq)
> > >>>        event_notifier_cleanup(&vq->hdev_kick);
> > >>>        event_notifier_set_handler(&vq->hdev_call, NULL);
> > >>>        event_notifier_cleanup(&vq->hdev_call);
> > >>> +    g_free(vq->ring_id_maps);
> > >>>        qemu_vfree(vq->vring.desc);
> > >>>        qemu_vfree(vq->vring.used);
> > >>>        g_free(vq);
> > >>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>> index 53e14bafa0..0e5c00ed7e 100644
> > >>> --- a/hw/virtio/vhost-vdpa.c
> > >>> +++ b/hw/virtio/vhost-vdpa.c
> > >>> @@ -752,9 +752,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev *dev,
> > >>>     * Note that this function does not rewind the kick file descriptor if it
> > >>>     * cannot set the call one.
> > >>>     */
> > >>> -static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>> -                                VhostShadowVirtqueue *svq,
> > >>> -                                unsigned idx)
> > >>> +static int vhost_vdpa_svq_set_fds(struct vhost_dev *dev,
> > >>> +                                  VhostShadowVirtqueue *svq,
> > >>> +                                  unsigned idx)
> > >>>    {
> > >>>        struct vhost_vring_file file = {
> > >>>            .index = dev->vq_index + idx,
> > >>> @@ -767,7 +767,7 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>>        r = vhost_vdpa_set_vring_dev_kick(dev, &file);
> > >>>        if (unlikely(r != 0)) {
> > >>>            error_report("Can't set device kick fd (%d)", -r);
> > >>> -        return false;
> > >>> +        return r;
> > >>>        }
> > >>>
> > >>>        event_notifier = vhost_svq_get_svq_call_notifier(svq);
> > >>> @@ -777,6 +777,99 @@ static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>>            error_report("Can't set device call fd (%d)", -r);
> > >>>        }
> > >>>
> > >>> +    return r;
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Unmap SVQ area in the device
> > >>> + */
> > >>> +static bool vhost_vdpa_svq_unmap_ring(struct vhost_vdpa *v, hwaddr iova,
> > >>> +                                      hwaddr size)
> > >>> +{
> > >>> +    int r;
> > >>> +
> > >>> +    size = ROUND_UP(size, qemu_real_host_page_size);
> > >>> +    r = vhost_vdpa_dma_unmap(v, iova, size);
> > >>> +    return r == 0;
> > >>> +}
> > >>> +
> > >>> +static bool vhost_vdpa_svq_unmap_rings(struct vhost_dev *dev,
> > >>> +                                       const VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    struct vhost_vdpa *v = dev->opaque;
> > >>> +    struct vhost_vring_addr svq_addr;
> > >>> +    size_t device_size = vhost_svq_device_area_size(svq);
> > >>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> > >>> +    bool ok;
> > >>> +
> > >>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> > >>> +
> > >>> +    ok = vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
> > >>> +    if (unlikely(!ok)) {
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    return vhost_vdpa_svq_unmap_ring(v, svq_addr.used_user_addr, device_size);
> > >>> +}
> > >>> +
> > >>> +/**
> > >>> + * Map shadow virtqueue rings in device
> > >>> + *
> > >>> + * @dev   The vhost device
> > >>> + * @svq   The shadow virtqueue
> > >>> + */
> > >>> +static bool vhost_vdpa_svq_map_rings(struct vhost_dev *dev,
> > >>> +                                     const VhostShadowVirtqueue *svq)
> > >>> +{
> > >>> +    struct vhost_vdpa *v = dev->opaque;
> > >>> +    struct vhost_vring_addr svq_addr;
> > >>> +    size_t device_size = vhost_svq_device_area_size(svq);
> > >>> +    size_t driver_size = vhost_svq_driver_area_size(svq);
> > >>> +    int r;
> > >>> +
> > >>> +    vhost_svq_get_vring_addr(svq, &svq_addr);
> > >>> +
> > >>> +    r = vhost_vdpa_dma_map(v, svq_addr.desc_user_addr, driver_size,
> > >>> +                           (void *)svq_addr.desc_user_addr, true);
> > >>> +    if (unlikely(r != 0)) {
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
> > >>> +                           (void *)svq_addr.used_user_addr, false);
> > >>
> > >> Do we need unmap the driver area if we fail here?
> > >>
> > > Yes, this used to rely on unmapping them when SVQ is disabled. Now I
> > > think we need to unmap here, as you say.
> > >
> > > Thanks!
> > >
> > > [1] https://lists.linuxfoundation.org/pipermail/virtualization/2021-March/053322.html
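
For illustration, a sketch of the agreed rollback in
vhost_vdpa_svq_map_rings(), reusing the unmap helper from this same
patch (a reading of the fix, not the final code):

    r = vhost_vdpa_dma_map(v, svq_addr.used_user_addr, device_size,
                           (void *)svq_addr.used_user_addr, false);
    if (unlikely(r != 0)) {
        /* Roll back the driver area mapped just above */
        vhost_vdpa_svq_unmap_ring(v, svq_addr.desc_user_addr, driver_size);
        return false;
    }

    return true;
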
> > >
> > >> Thanks
> > >>
> > >>
> > >>> +    return r == 0;
> > >>> +}
> > >>> +
> > >>> +static bool vhost_vdpa_svq_setup(struct vhost_dev *dev,
> > >>> +                                VhostShadowVirtqueue *svq,
> > >>> +                                unsigned idx)
> > >>> +{
> > >>> +    uint16_t vq_index = dev->vq_index + idx;
> > >>> +    struct vhost_vring_state s = {
> > >>> +        .index = vq_index,
> > >>> +    };
> > >>> +    int r;
> > >>> +    bool ok;
> > >>> +
> > >>> +    r = vhost_vdpa_set_dev_vring_base(dev, &s);
> > >>> +    if (unlikely(r)) {
> > >>> +        error_report("Can't set vring base (%d)", r);
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    s.num = vhost_svq_get_num(svq);
> > >>> +    r = vhost_vdpa_set_dev_vring_num(dev, &s);
> > >>> +    if (unlikely(r)) {
> > >>> +        error_report("Can't set vring num (%d)", r);
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    ok = vhost_vdpa_svq_map_rings(dev, svq);
> > >>> +    if (unlikely(!ok)) {
> > >>> +        return false;
> > >>> +    }
> > >>> +
> > >>> +    r = vhost_vdpa_svq_set_fds(dev, svq, idx);
> > >>>        return r == 0;
> > >>>    }
> > >>>
> > >>> @@ -788,14 +881,24 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > >>>        if (started) {
> > >>>            vhost_vdpa_host_notifiers_init(dev);
> > >>>            for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> > >>> +            VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
> > >>>                VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
> > >>>                bool ok = vhost_vdpa_svq_setup(dev, svq, i);
> > >>>                if (unlikely(!ok)) {
> > >>>                    return -1;
> > >>>                }
> > >>> +            vhost_svq_start(svq, dev->vdev, vq);
> > >>>            }
> > >>>            vhost_vdpa_set_vring_ready(dev);
> > >>>        } else {
> > >>> +        for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
> > >>> +            VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
> > >>> +                                                          i);
> > >>> +            bool ok = vhost_vdpa_svq_unmap_rings(dev, svq);
> > >>> +            if (unlikely(!ok)) {
> > >>> +                return -1;
> > >>> +            }
> > >>> +        }
> > >>>            vhost_vdpa_host_notifiers_uninit(dev, dev->nvqs);
> > >>>        }
> > >>>
> >
>


* Re: [PATCH 18/31] vhost: Shadow virtqueue buffers forwarding
       [not found]                 ` <CAJaqyWePWg+eeQjjcMh24k0K+yUQUF2x0yXH32tPPWEw_wYP0Q@mail.gmail.com>
@ 2022-02-23  2:26                   ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-23  2:26 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Tue, Feb 22, 2022 at 4:56 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 22, 2022 at 8:26 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/2/21 下午4:15, Eugenio Perez Martin 写道:
> > > On Mon, Feb 21, 2022 at 8:44 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>
> > >> 在 2022/2/17 下午8:48, Eugenio Perez Martin 写道:
> > >>> On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>>> 在 2022/2/1 下午7:25, Eugenio Perez Martin 写道:
> > >>>>> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>>>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > >>>>>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > >>>>>>>      void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >>>>>>>      {
> > >>>>>>>          event_notifier_set_handler(&svq->svq_kick, NULL);
> > >>>>>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> > >>>>>>> +
> > >>>>>>> +    if (!svq->vq) {
> > >>>>>>> +        return;
> > >>>>>>> +    }
> > >>>>>>> +
> > >>>>>>> +    /* Send all pending used descriptors to guest */
> > >>>>>>> +    vhost_svq_flush(svq, false);
> > >>>>>> Do we need to wait for all the pending descriptors to be completed here?
> > >>>>>>
> > >>>>> No, this function does not wait; it only completes the forwarding of
> > >>>>> the *used* descriptors.
> > >>>>>
> > >>>>> The best example is the net rx queue in my opinion. This call will
> > >>>>> check SVQ's vring used_idx and will forward the last used descriptors
> > >>>>> if any, but all available descriptors will remain as available for
> > >>>>> qemu's VQ code.
> > >>>>>
> > >>>>> To skip it would miss those last rx descriptors in migration.
> > >>>>>
> > >>>>> Thanks!
> > >>>> So it's probably not the best place to ask. It's more about the
> > >>>> inflight descriptors, so it should be TX instead of RX.
> > >>>>
> > >>>> In the last phase of migration, we should stop the vhost-vDPA
> > >>>> before calling vhost_svq_stop(). Then we should be fine regardless
> > >>>> of inflight descriptors.
> > >>>>
> > >>> I think I'm still missing something here.
> > >>>
> > >>> To be on the same page: regarding tx, this could cause repeated tx
> > >>> frames (one at the source and another at the destination), but never
> > >>> a buffer that goes untransmitted. The "stop before" could be
> > >>> interpreted as "SVQ is not forwarding available buffers anymore".
> > >>> Would that work?
> > >>
> > >> Right, but this only works if:
> > >>
> > >> 1) a flush makes sure TX DMA for all inflight descriptors is completed
> > >>
> > >> 2) we just mark all inflight descriptors used
> > >>
> > > It currently relies on the reverse: buffers not marked as used (by
> > > the device) will be available at the destination, so expect
> > > retransmissions.
> >
> >
> > I may be missing something, but I think we do migrate last_avail_idx.
> > So there won't be a re-transmission, since we depend on qemu virtqueue
> > code to deal with the vring base?
> >
>
> On stop, vhost_virtqueue_stop calls vhost_vdpa_get_vring_base. In SVQ
> mode, it returns last_used_idx. After that, the vhost.c code sets the
> VirtQueue's last_avail_idx == last_used_idx, and it's migrated
> afterwards if I'm not wrong.

Ok, I missed these details in the review. I suggest mentioning this in
the change log and adding a comment in vhost_vdpa_get_vring_base().
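
For illustration, a hedged sketch of what such a comment and branch
could look like; the vhost_svq_last_used_idx() accessor is hypothetical,
and only the shape follows the discussion:

    static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                         struct vhost_vring_state *ring)
    {
        struct vhost_vdpa *v = dev->opaque;

        if (v->shadow_vqs_enabled) {
            VhostShadowVirtqueue *svq =
                g_ptr_array_index(v->shadow_vqs,
                                  ring->index - dev->vq_index);

            /*
             * In SVQ mode the guest-visible vring base is SVQ's used
             * index: every buffer SVQ marked as used must be made
             * available again at the destination.
             */
            ring->num = vhost_svq_last_used_idx(svq); /* hypothetical */
            return 0;
        }

        return vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
    }
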

>
> vhost kernel migrates last_avail_idx, but it makes rx buffers
> available on-demand, unlike SVQ. So it does not need to unwind buffers
> or anything like that. Because of how SVQ works with the rx queue,
> this is not possible, since the destination will find no available
> buffers for rx. And for tx, you have already described the scenario.
>
> In other words, we cannot see SVQ as a vhost device in that regard:
> SVQ aims for a total drain ("make all guest's buffers available to the
> device ASAP"), whereas the vhost device can live with a lot of
> available buffers and will use them on demand. Same problem as
> masking. So the difference in behavior is justified in my opinion, and
> it can be improved in the future with the vdpa in-flight descriptors.
>
> If we restore the state that way in a virtio-net device, it will see
> the available ones as expected, not as in-flight.
>
> Another possibility is to transform all of these into in-flight ones,
> but I feel it would create problems. Can we migrate all rx queues as
> in-flight, with 0 bytes written? Is it worth it?

To clarify, for inflight I meant from the device point of view, that
is [last_used_idx, last_avail_idx).

So for RX and SVQ, it should be as simple as stopping the forwarding of
buffers, since last_used_idx should be the same as last_avail_idx in
this case. (Though technically the rx buffer might be modified by the
NIC.)
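
As a small illustration in SVQ's own index terms (field names from the
patch earlier in this thread; the invariant is an assumption of this
reading):

    /* Device-view inflight descriptors: [last_used_idx, last_avail_idx) */
    uint16_t inflight = svq->avail_idx_shadow - svq->last_used_idx;

    /* For RX at stop time SVQ simply stops forwarding, so this count
     * drains to 0 and no descriptor needs to be unwound. */
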

> I didn't investigate
> that path too much, but I think the virtio-net emulated device does
> not support that at the moment. If I'm not wrong, we should copy
> something like the body of virtio_blk_load_device if we want to go
> that route.
>
> The current approach might be too net-centric, so let me know if this
> behavior is unexpected or we can do better otherwise.

It should be fine to start from a networking device. We can add more
in the future if it is needed.

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > Thanks!
> > >
> > >> Otherwise there could be buffers that is inflight forever.
> > >>
> > >> Thanks
> > >>
> > >>
> > >>> Thanks!
> > >>>
> > >>>> Thanks
> > >>>>
> > >>>>
> > >>>>>> Thanks
> > >>>>>>
> > >>>>>>
> > >>>>>>> +
> > >>>>>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> > >>>>>>> +        g_autofree VirtQueueElement *elem = NULL;
> > >>>>>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > >>>>>>> +        if (elem) {
> > >>>>>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> > >>>>>>> +        }
> > >>>>>>> +    }
> > >>>>>>> +
> > >>>>>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >>>>>>> +    if (next_avail_elem) {
> > >>>>>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> > >>>>>>> +                                 next_avail_elem->len);
> > >>>>>>> +    }
> > >>>>>>>      }
> >
>


* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
       [not found]                 ` <CAJaqyWfFC4SgxQ4zQeHgtDDJSd0tBa-W4HmtW0UASA2cVDWDUg@mail.gmail.com>
@ 2022-02-23  3:46                   ` Jason Wang
       [not found]                     ` <CAJaqyWds=97TjEpORiqhsj57KNxJ482jwcRS8TN59a4aank7-w@mail.gmail.com>
  0 siblings, 1 reply; 52+ messages in thread
From: Jason Wang @ 2022-02-23  3:46 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Tue, Feb 22, 2022 at 4:06 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Feb 22, 2022 at 8:41 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > 在 2022/2/17 下午4:22, Eugenio Perez Martin 写道:
> > > On Thu, Feb 17, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > >> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
> > >> <eperezma@redhat.com> wrote:
> > >>> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>>>
> > >>>> 在 2022/2/1 下午7:45, Eugenio Perez Martin 写道:
> > >>>>> On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
> > >>>>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > >>>>>>> SVQ is able to log the dirty bits by itself, so let's use it to not
> > >>>>>>> block migration.
> > >>>>>>>
> > >>>>>>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> > >>>>>>> enabled. Even if the device supports it, the reports would be nonsense
> > >>>>>>> because SVQ memory is in the qemu region.
> > >>>>>>>
> > >>>>>>> The log region is still allocated. Future changes might skip that, but
> > >>>>>>> this series is already long enough.
> > >>>>>>>
> > >>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > >>>>>>> ---
> > >>>>>>>     hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
> > >>>>>>>     1 file changed, 20 insertions(+)
> > >>>>>>>
> > >>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>>>>>> index fb0a338baa..75090d65e8 100644
> > >>>>>>> --- a/hw/virtio/vhost-vdpa.c
> > >>>>>>> +++ b/hw/virtio/vhost-vdpa.c
> > >>>>>>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> > >>>>>>>         if (ret == 0 && v->shadow_vqs_enabled) {
> > >>>>>>>             /* Filter only features that SVQ can offer to guest */
> > >>>>>>>             vhost_svq_valid_guest_features(features);
> > >>>>>>> +
> > >>>>>>> +        /* Add SVQ logging capabilities */
> > >>>>>>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
> > >>>>>>>         }
> > >>>>>>>
> > >>>>>>>         return ret;
> > >>>>>>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > >>>>>>>
> > >>>>>>>         if (v->shadow_vqs_enabled) {
> > >>>>>>>             uint64_t dev_features, svq_features, acked_features;
> > >>>>>>> +        uint8_t status = 0;
> > >>>>>>>             bool ok;
> > >>>>>>>
> > >>>>>>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > >>>>>>> +        if (unlikely(ret)) {
> > >>>>>>> +            return ret;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > >>>>>>> +            /*
> > >>>>>>> +             * vhost is trying to enable or disable _F_LOG, and the device
> > >>>>>>> +             * would report wrong dirty pages. SVQ handles it.
> > >>>>>>> +             */
> > >>>>>> I fail to understand this comment, I'd think there's no way to disable
> > >>>>>> dirty page tracking for SVQ.
> > >>>>>>
> > >>>>> vhost_log_global_{start,stop} are called at the beginning and end of
> > >>>>> migration. To inform the device that it should start logging, they set
> > >>>>> or clear VHOST_F_LOG_ALL at vhost_dev_set_log.
> > >>>>
> > >>>> Yes, but for SVQ, we can't disable dirty page tracking, can we? The
> > >>>> only thing to do is ignore or filter out F_LOG_ALL and pretend it can
> > >>>> be enabled and disabled.
> > >>>>
> > >>> Yes, that's what this patch does.
> > >>>
> > >>>>> While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> > >>>>> vhost does not block migration. Maybe we need to look for another way
> > >>>>> to do this?
> > >>>>
> > >>>> I'm fine with filtering since it's much simpler, but I fail to
> > >>>> understand why we need to check DRIVER_OK.
> > >>>>
> > >>> Ok maybe I can make that part more clear,
> > >>>
> > >>> Since both operations use vhost_vdpa_set_features we must just filter
> > >>> the one that actually sets or removes VHOST_F_LOG_ALL, without
> > >>> affecting other features.
> > >>>
> > >>> In practice, that means to not forward the set features after
> > >>> DRIVER_OK. The device is not expecting them anymore.
> > >> I wonder what happens if we don't do this.
> > >>
> > > If we simply delete the check vhost_dev_set_features will return an
> > > error, failing the start of the migration. More on this below.
> >
> >
> > Ok.
> >
> >
> > >
> > >> So kernel had this check:
> > >>
> > >>          /*
> > >>           * It's not allowed to change the features after they have
> > >>           * been negotiated.
> > >>           */
> > >> if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
> > >>          return -EBUSY;
> > >>
> > >> So is it FEATURES_OK actually?
> > >>
> > > Yes, FEATURES_OK seems more appropriate actually so I will switch to
> > > it for the next version.
> > >
> > > But it should be functionally equivalent, since
> > > vhost.c:vhost_dev_start sets both and the setting of _F_LOG_ALL cannot
> > > be concurrent with it.
> >
> >
> > Right.
> >
> >
> > >
> > >> For this patch, I wonder if the thing we need to do is to see whether
> > >> it is an enable/disable of F_LOG_ALL and simply return.
> > >>
> > > Yes, that's the intention of the patch.
> > >
> > > We have 4 cases here:
> > > a) We're being called from vhost_dev_start, with enable_log = false
> > > b) We're being called from vhost_dev_start, with enable_log = true
> >
> >
> > And this case means we can't simply return without calling vhost-vdpa.
> >
>
> It calls because {FEATURES,DRIVER}_OK is still not set at that point.
>
> >
> > > c) We're being called from vhost_dev_set_log, with enable_log = false
> > > d) We're being called from vhost_dev_set_log, with enable_log = true
> > >
> > > The way to tell the difference between a/b and c/d is to check if
> > > {FEATURES,DRIVER}_OK is set. And, as you point out in previous mails,
> > > F_LOG_ALL must be filtered unconditionally since SVQ tracks dirty
> > > memory through the memory unmapping, so we clear the bit
> > > unconditionally if we detect that VHOST_SET_FEATURES will be called
> > > (cases a and b).
> > >
> > > Another possibility is to track if features have been set with a bool
> > > in vhost_vdpa or something like that. But it seems cleaner to me to
> > > only store that in the actual device.
> >
> >
> > So I suggest making sure the code matches the comment:
> >
> >          if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> >              /*
> >               * vhost is trying to enable or disable _F_LOG, and the device
> >               * would report wrong dirty pages. SVQ handles it.
> >               */
> >              return 0;
> >          }
> >
> > It would be better to check whether the caller is toggling _F_LOG_ALL in
> > this case.
> >
>
> How do we detect that? We can save the feature flags and compare, but
> ignoring all set_features after FEATURES_OK seems simpler to me.

Something like:

(status ^ status_old == _F_LOG_ALL) ?

It helps us to return errors on wrong features set during DRIVER_OK.

Thanks

>
> Would changing the comment work? Something like "set_features after
> _S_FEATURES_OK means vhost is trying to enable or disable _F_LOG, and
> the device would report wrong dirty pages. SVQ handles it."
>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > >> Thanks
> > >>
> > >>> Does that make more sense?
> > >>>
> > >>> Thanks!
> > >>>
> > >>>> Thanks
> > >>>>
> > >>>>
> > >>>>> Thanks!
> > >>>>>
> > >>>>>> Thanks
> > >>>>>>
> > >>>>>>
> > >>>>>>> +            return 0;
> > >>>>>>> +        }
> > >>>>>>> +
> > >>>>>>> +        /* We must not ack _F_LOG if SVQ is enabled */
> > >>>>>>> +        features &= ~BIT_ULL(VHOST_F_LOG_ALL);
> > >>>>>>> +
> > >>>>>>>             ret = vhost_vdpa_get_dev_features(dev, &dev_features);
> > >>>>>>>             if (ret != 0) {
> > >>>>>>>                 error_report("Can't get vdpa device features, got (%d)", ret);
> >
>


* Re: [PATCH 28/31] vdpa: Expose VHOST_F_LOG_ALL on SVQ
       [not found]                     ` <CAJaqyWds=97TjEpORiqhsj57KNxJ482jwcRS8TN59a4aank7-w@mail.gmail.com>
@ 2022-02-24  3:45                       ` Jason Wang
  0 siblings, 0 replies; 52+ messages in thread
From: Jason Wang @ 2022-02-24  3:45 UTC (permalink / raw)
  To: Eugenio Perez Martin
  Cc: Laurent Vivier, Parav Pandit, Cindy Lu, Michael S. Tsirkin,
	Richard Henderson, qemu-level, Gautam Dawar, Markus Armbruster,
	Eduardo Habkost, Harpreet Singh Anand, Xiao W Wang,
	Stefan Hajnoczi, Eli Cohen, Paolo Bonzini, Zhu Lingshan,
	virtualization, Eric Blake

On Wed, Feb 23, 2022 at 4:06 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Feb 23, 2022 at 4:47 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Feb 22, 2022 at 4:06 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Feb 22, 2022 at 8:41 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > 在 2022/2/17 下午4:22, Eugenio Perez Martin 写道:
> > > > > On Thu, Feb 17, 2022 at 7:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >> On Wed, Feb 16, 2022 at 11:54 PM Eugenio Perez Martin
> > > > >> <eperezma@redhat.com> wrote:
> > > > >>> On Tue, Feb 8, 2022 at 9:25 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>>>
> > > > >>>> 在 2022/2/1 下午7:45, Eugenio Perez Martin 写道:
> > > > >>>>> On Sun, Jan 30, 2022 at 7:50 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >>>>>> 在 2022/1/22 上午4:27, Eugenio Pérez 写道:
> > > > >>>>>>> SVQ is able to log the dirty bits by itself, so let's use it to not
> > > > >>>>>>> block migration.
> > > > >>>>>>>
> > > > >>>>>>> Also, ignore set and clear of VHOST_F_LOG_ALL on set_features if SVQ is
> > > > >>>>>>> enabled. Even if the device supports it, the reports would be nonsense
> > > > >>>>>>> because SVQ memory is in the qemu region.
> > > > >>>>>>>
> > > > >>>>>>> The log region is still allocated. Future changes might skip that, but
> > > > >>>>>>> this series is already long enough.
> > > > >>>>>>>
> > > > >>>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > >>>>>>> ---
> > > > >>>>>>>     hw/virtio/vhost-vdpa.c | 20 ++++++++++++++++++++
> > > > >>>>>>>     1 file changed, 20 insertions(+)
> > > > >>>>>>>
> > > > >>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > >>>>>>> index fb0a338baa..75090d65e8 100644
> > > > >>>>>>> --- a/hw/virtio/vhost-vdpa.c
> > > > >>>>>>> +++ b/hw/virtio/vhost-vdpa.c
> > > > >>>>>>> @@ -1022,6 +1022,9 @@ static int vhost_vdpa_get_features(struct vhost_dev *dev, uint64_t *features)
> > > > >>>>>>>         if (ret == 0 && v->shadow_vqs_enabled) {
> > > > >>>>>>>             /* Filter only features that SVQ can offer to guest */
> > > > >>>>>>>             vhost_svq_valid_guest_features(features);
> > > > >>>>>>> +
> > > > >>>>>>> +        /* Add SVQ logging capabilities */
> > > > >>>>>>> +        *features |= BIT_ULL(VHOST_F_LOG_ALL);
> > > > >>>>>>>         }
> > > > >>>>>>>
> > > > >>>>>>>         return ret;
> > > > >>>>>>> @@ -1039,8 +1042,25 @@ static int vhost_vdpa_set_features(struct vhost_dev *dev,
> > > > >>>>>>>
> > > > >>>>>>>         if (v->shadow_vqs_enabled) {
> > > > >>>>>>>             uint64_t dev_features, svq_features, acked_features;
> > > > >>>>>>> +        uint8_t status = 0;
> > > > >>>>>>>             bool ok;
> > > > >>>>>>>
> > > > >>>>>>> +        ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);
> > > > >>>>>>> +        if (unlikely(ret)) {
> > > > >>>>>>> +            return ret;
> > > > >>>>>>> +        }
> > > > >>>>>>> +
> > > > >>>>>>> +        if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > > >>>>>>> +            /*
> > > > >>>>>>> +             * vhost is trying to enable or disable _F_LOG, and the device
> > > > >>>>>>> +             * would report wrong dirty pages. SVQ handles it.
> > > > >>>>>>> +             */
> > > > >>>>>> I fail to understand this comment, I'd think there's no way to disable
> > > > >>>>>> dirty page tracking for SVQ.
> > > > >>>>>>
> > > > >>>>> vhost_log_global_{start,stop} are called at the beginning and end of
> > > > >>>>> migration. To inform the device that it should start logging, they set
> > > > >>>>> or clear VHOST_F_LOG_ALL at vhost_dev_set_log.
> > > > >>>>
> > > > >>>> Yes, but for SVQ, we can't disable dirty page tracking, can we? The
> > > > >>>> only thing to do is ignore or filter out F_LOG_ALL and pretend it can
> > > > >>>> be enabled and disabled.
> > > > >>>>
> > > > >>> Yes, that's what this patch does.
> > > > >>>
> > > > >>>>> While SVQ does not use VHOST_F_LOG_ALL, it exports the feature bit so
> > > > >>>>> vhost does not block migration. Maybe we need to look for another way
> > > > >>>>> to do this?
> > > > >>>>
> > > > >>>> I'm fine with filtering since it's much simpler, but I fail to
> > > > >>>> understand why we need to check DRIVER_OK.
> > > > >>>>
> > > > >>> Ok maybe I can make that part more clear,
> > > > >>>
> > > > >>> Since both operations use vhost_vdpa_set_features we must just filter
> > > > >>> the one that actually sets or removes VHOST_F_LOG_ALL, without
> > > > >>> affecting other features.
> > > > >>>
> > > > >>> In practice, that means to not forward the set features after
> > > > >>> DRIVER_OK. The device is not expecting them anymore.
> > > > >> I wonder what happens if we don't do this.
> > > > >>
> > > > > If we simply delete the check vhost_dev_set_features will return an
> > > > > error, failing the start of the migration. More on this below.
> > > >
> > > >
> > > > Ok.
> > > >
> > > >
> > > > >
> > > > >> So kernel had this check:
> > > > >>
> > > > >>          /*
> > > > >>           * It's not allowed to change the features after they have
> > > > >>           * been negotiated.
> > > > >>           */
> > > > >> if (ops->get_status(vdpa) & VIRTIO_CONFIG_S_FEATURES_OK)
> > > > >>          return -EBUSY;
> > > > >>
> > > > >> So is it FEATURES_OK actually?
> > > > >>
> > > > > Yes, FEATURES_OK seems more appropriate actually so I will switch to
> > > > > it for the next version.
> > > > >
> > > > > But it should be functionally equivalent, since
> > > > > vhost.c:vhost_dev_start sets both and the setting of _F_LOG_ALL cannot
> > > > > be concurrent with it.
> > > >
> > > >
> > > > Right.
> > > >
> > > >
> > > > >
> > > > >> For this patch, I wonder if the thing we need to do is to see whether
> > > > >> it is an enable/disable of F_LOG_ALL and simply return.
> > > > >>
> > > > > Yes, that's the intention of the patch.
> > > > >
> > > > > We have 4 cases here:
> > > > > a) We're being called from vhost_dev_start, with enable_log = false
> > > > > b) We're being called from vhost_dev_start, with enable_log = true
> > > >
> > > >
> > > > And this case means we can't simply return without calling vhost-vdpa.
> > > >
> > >
> > > It calls because {FEATURES,DRIVER}_OK is still not set at that point.
> > >
> > > >
> > > > > c) We're being called from vhost_dev_set_log, with enable_log = false
> > > > > d) We're being called from vhost_dev_set_log, with enable_log = true
> > > > >
> > > > > The way to tell the difference between a/b and c/d is to check if
> > > > > {FEATURES,DRIVER}_OK is set. And, as you point out in previous mails,
> > > > > F_LOG_ALL must be filtered unconditionally since SVQ tracks dirty
> > > > > memory through the memory unmapping, so we clear the bit
> > > > > unconditionally if we detect that VHOST_SET_FEATURES will be called
> > > > > (cases a and b).
> > > > >
> > > > > Another possibility is to track if features have been set with a bool
> > > > > in vhost_vdpa or something like that. But it seems cleaner to me to
> > > > > only store that in the actual device.
> > > >
> > > >
> > > > So I suggest making sure the code matches the comment:
> > > >
> > > >          if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> > > >              /*
> > > >               * vhost is trying to enable or disable _F_LOG, and the device
> > > >               * would report wrong dirty pages. SVQ handles it.
> > > >               */
> > > >              return 0;
> > > >          }
> > > >
> > > > It would be better to check whether the caller is toggling _F_LOG_ALL in
> > > > this case.
> > > >
> > >
> > > How do we detect that? We can save the feature flags and compare, but
> > > ignoring all set_features after FEATURES_OK seems simpler to me.
> >
> > Something like:
> >
> > (status ^ status_old == _F_LOG_ALL) ?
> >
>
> s/status/features/ ?

Right.

>
> > It helps us to return errors on wrong features set during DRIVER_OK.
> >
>
> Do you mean to return errors in case of toggling features other than
> _F_LOG_ALL? That's interesting actually, but it seems to force
> vhost_vdpa to track acked_features too.

I meant we can change the check a little bit like:

if ((features ^ features_old) == _F_LOG_ALL &&
    (status & VIRTIO_CONFIG_S_DRIVER_OK)) {
    return 0;
}

For other feature changes we can let it go down the logic as you
proposed in this patch.
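
For illustration, a hedged sketch of how that check could sit in
vhost_vdpa_set_features(); the acked_features tracking is the
assumption discussed above, and the error value is a guess, not the
final patch:

    static int vhost_vdpa_set_features(struct vhost_dev *dev,
                                       uint64_t features)
    {
        struct vhost_vdpa *v = dev->opaque;
        uint8_t status = 0;
        int ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_STATUS, &status);

        if (unlikely(ret)) {
            return ret;
        }

        if (v->shadow_vqs_enabled) {
            if (status & VIRTIO_CONFIG_S_FEATURES_OK) {
                uint64_t old = v->acked_features; /* assumed field */

                if ((features ^ old) == BIT_ULL(VHOST_F_LOG_ALL)) {
                    /* _F_LOG toggle: SVQ already tracks dirty pages */
                    return 0;
                }
                /* Any other change after FEATURES_OK is invalid */
                return -EBUSY;
            }
            /* First negotiation: SVQ logs by itself, never ack _F_LOG */
            features &= ~BIT_ULL(VHOST_F_LOG_ALL);
        }

        return vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
    }
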

Thanks

>
> Actually, it seems to me vhost_dev->acked_features will retain the bad
> features even on error. I'll investigate it.
>
> Thanks!
>
>
> > Thanks
> >
> > >
> > > Would changing the comment work? Something like "set_features after
> > > _S_FEATURES_OK means vhost is trying to enable or disable _F_LOG, and
> > > the device would report wrong dirty pages. SVQ handles it."
> > >
>

