qemu-devel.nongnu.org archive mirror
From: Jason Wang <jasowang@redhat.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: qemu-devel@nongnu.org, si-wei.liu@oracle.com,
	Liuxiangdong <liuxiangdong5@huawei.com>,
	Zhu Lingshan <lingshan.zhu@intel.com>,
	"Gonglei (Arei)" <arei.gonglei@huawei.com>,
	alvaro.karsz@solid-run.com, Shannon Nelson <snelson@pensando.io>,
	Laurent Vivier <lvivier@redhat.com>,
	Harpreet Singh Anand <hanand@xilinx.com>,
	Gautam Dawar <gdawar@xilinx.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>, Cindy Lu <lulu@redhat.com>,
	Eli Cohen <eli@mellanox.com>, Paolo Bonzini <pbonzini@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Parav Pandit <parav@mellanox.com>
Subject: Re: [RFC v2 04/13] vdpa: rewind at get_base, not set_base
Date: Tue, 17 Jan 2023 12:38:43 +0800	[thread overview]
Message-ID: <0ce077ab-068f-e715-d4de-21a00f579860@redhat.com> (raw)
In-Reply-To: <CAJaqyWeY30QETgksM2_zrc8xvOABSTAhwFUXRJRHumX0FFrqpw@mail.gmail.com>


On 2023/1/16 17:53, Eugenio Perez Martin wrote:
> On Mon, Jan 16, 2023 at 4:32 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2023/1/13 15:40, Eugenio Perez Martin wrote:
>>> On Fri, Jan 13, 2023 at 5:10 AM Jason Wang <jasowang@redhat.com> wrote:
>>>> On Fri, Jan 13, 2023 at 1:24 AM Eugenio Pérez <eperezma@redhat.com> wrote:
>>>>> At this moment it is only possible to migrate to a vdpa device running
>>>>> with x-svq=on. As a protective measure, the rewind of the in-flight
>>>>> descriptors was done at the destination. That way, if the source sent a
>>>>> virtqueue with in-use descriptors, they are always discarded.
>>>>>
>>>>> Since this series also allows migrating to passthrough devices with no
>>>>> SVQ, the right thing to do is to rewind at the source so the bases of
>>>>> the vrings are correct.
>>>>>
>>>>> Support for inflight descriptors may be added in the future.
>>>>>
>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>> ---
>>>>>    include/hw/virtio/vhost-backend.h |  4 +++
>>>>>    hw/virtio/vhost-vdpa.c            | 46 +++++++++++++++++++------------
>>>>>    hw/virtio/vhost.c                 |  3 ++
>>>>>    3 files changed, 36 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
>>>>> index c5ab49051e..ec3fbae58d 100644
>>>>> --- a/include/hw/virtio/vhost-backend.h
>>>>> +++ b/include/hw/virtio/vhost-backend.h
>>>>> @@ -130,6 +130,9 @@ typedef bool (*vhost_force_iommu_op)(struct vhost_dev *dev);
>>>>>
>>>>>    typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
>>>>>                                           int fd);
>>>>> +
>>>>> +typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
>>>>> +
>>>>>    typedef struct VhostOps {
>>>>>        VhostBackendType backend_type;
>>>>>        vhost_backend_init vhost_backend_init;
>>>>> @@ -177,6 +180,7 @@ typedef struct VhostOps {
>>>>>        vhost_get_device_id_op vhost_get_device_id;
>>>>>        vhost_force_iommu_op vhost_force_iommu;
>>>>>        vhost_set_config_call_op vhost_set_config_call;
>>>>> +    vhost_reset_status_op vhost_reset_status;
>>>>>    } VhostOps;
>>>>>
>>>>>    int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>> index 542e003101..28a52ddc78 100644
>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>> @@ -1132,14 +1132,23 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
>>>>>        if (started) {
>>>>>            memory_listener_register(&v->listener, &address_space_memory);
>>>>>            return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>>>>> -    } else {
>>>>> -        vhost_vdpa_reset_device(dev);
>>>>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>>> -                                   VIRTIO_CONFIG_S_DRIVER);
>>>>> -        memory_listener_unregister(&v->listener);
>>>>> +    }
>>>>>
>>>>> -        return 0;
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +static void vhost_vdpa_reset_status(struct vhost_dev *dev)
>>>>> +{
>>>>> +    struct vhost_vdpa *v = dev->opaque;
>>>>> +
>>>>> +    if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
>>>>> +        return;
>>>>>        }
>>>>> +
>>>>> +    vhost_vdpa_reset_device(dev);
>>>>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>>> +                                VIRTIO_CONFIG_S_DRIVER);
>>>>> +    memory_listener_unregister(&v->listener);
>>>>>    }
>>>>>
>>>>>    static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
>>>>> @@ -1182,18 +1191,7 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
>>>>>                                           struct vhost_vring_state *ring)
>>>>>    {
>>>>>        struct vhost_vdpa *v = dev->opaque;
>>>>> -    VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
>>>>>
>>>>> -    /*
>>>>> -     * vhost-vdpa devices does not support in-flight requests. Set all of them
>>>>> -     * as available.
>>>>> -     *
>>>>> -     * TODO: This is ok for networking, but other kinds of devices might
>>>>> -     * have problems with these retransmissions.
>>>>> -     */
>>>>> -    while (virtqueue_rewind(vq, 1)) {
>>>>> -        continue;
>>>>> -    }
>>>>>        if (v->shadow_vqs_enabled) {
>>>>>            /*
>>>>>             * Device vring base was set at device start. SVQ base is handled by
>>>>> @@ -1212,6 +1210,19 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
>>>>>        int ret;
>>>>>
>>>>>        if (v->shadow_vqs_enabled) {
>>>>> +        VirtQueue *vq = virtio_get_queue(dev->vdev, ring->index);
>>>>> +
>>>>> +        /*
>>>>> +         * vhost-vdpa devices does not support in-flight requests. Set all of
>>>>> +         * them as available.
>>>>> +         *
>>>>> +         * TODO: This is ok for networking, but other kinds of devices might
>>>>> +         * have problems with these retransmissions.
>>>>> +         */
>>>>> +        while (virtqueue_rewind(vq, 1)) {
>>>>> +            continue;
>>>>> +        }
>>>>> +
>>>>>            ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index);
>>>>>            return 0;
>>>>>        }
>>>>> @@ -1326,4 +1337,5 @@ const VhostOps vdpa_ops = {
>>>>>            .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
>>>>>            .vhost_force_iommu = vhost_vdpa_force_iommu,
>>>>>            .vhost_set_config_call = vhost_vdpa_set_config_call,
>>>>> +        .vhost_reset_status = vhost_vdpa_reset_status,
>>>> Can we simply use the NetClient stop method here?
>>>>
>>> Ouch, I squashed two patches by mistake here.
>>>
>>> All the vhost_reset_status part should be independent of this patch,
>>> and I was especially interested in its feedback. It had this message:
>>>
>>>       vdpa: move vhost reset after get vring base
>>>
>>>       The function vhost.c:vhost_dev_stop calls the vhost operation
>>>       vhost_dev_start(false). In the case of vdpa, it totally resets and
>>>       wipes the device, making the fetching of the vring base (virtqueue
>>>       state) useless.
>>>
>>>       The kernel backend does not use the vhost_dev_start vhost op
>>>       callback, but vhost-user does. A patch to make vhost_user_dev_start
>>>       more similar to vdpa is desirable, but it can be added on top.
>>>
>>> I can resend the series splitting it again, but the conversation may
>>> scatter between versions. Would you prefer me to send a new version?
>>
>> I think it can be done in next version (after we finalize the discussion
>> for this version).
>>
>>
>>> Regarding the use of NetClient, it feels weird to call net specific
>>> functions in VhostOps, doesn't it?
>>
>> Basically, I meant that the patch calls vhost_reset_status() in
>> vhost_dev_stop(). But we already have the vhost_dev_start op, where we
>> implement per-backend start/stop logic.
>>
>> I think it's better to do things in vhost_dev_start():
>>
>> For devices that can do suspend, we can do suspend. For the others we
>> need to do a reset as a workaround.
>>
> If the device implements _F_SUSPEND we can call suspend in
> vhost_dev_start(false) and fetch the vq base after it. But we cannot
> call vhost_dev_reset until we get the vq base. If we do it, we will
> always get zero there.


I'm not sure I understand here; that is kind of expected. For a device 
that doesn't support suspend, we can't get the base anyway, since we need 
to emulate the stop with a reset, and then we lose all the state.
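
The ordering constraint can be shown with a toy model (all names below are 
hypothetical stand-ins for the device-side vdpa state, not real QEMU or 
kernel APIs): once the device is reset, fetching the vring base can only 
ever return zero, so the fetch has to happen first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the device-side virtqueue state. */
struct fake_vdpa_dev {
    uint16_t last_avail_idx;
};

/* A reset wipes all device state, including the vring base. */
static void fake_reset(struct fake_vdpa_dev *d)
{
    memset(d, 0, sizeof(*d));
}

/* Returns whatever index the device currently holds. */
static uint16_t fake_get_vring_base(const struct fake_vdpa_dev *d)
{
    return d->last_avail_idx;
}
```

With this model, get-base-then-reset preserves the index, while 
reset-then-get-base always yields zero, which is the problem being 
discussed.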


>
> If we don't reset the device at vhost_vdpa_dev_start(false) we need to
> call a proper reset after getting the base, at least in vdpa.


This looks racy if we do get_base before the reset? The device can still 
move last_avail_idx.


> So to
> create a new vhost_op should be the right thing to do, isn't it?


So we did:

vhost_dev_stop()
     hdev->vhost_ops->vhost_dev_start(hdev, false);
     vhost_virtqueue_stop()
         vhost_get_vring_base()

I don't see any issue if we do suspend in vhost_dev_stop() in this case?

For a device that doesn't support suspend, we do the reset in the stop and 
fail the get_vring_base(); then we can use the software fallback 
virtio_queue_restore_last_avail_idx()

?
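
A rough sketch of that flow (a toy model under stated assumptions — the 
struct, the helper names, and the failure convention are all hypothetical, 
not the actual QEMU code; only the ordering and the fallback matter):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: _F_SUSPEND support is optional. */
struct fake_dev {
    bool has_suspend;
    bool suspended;
    uint16_t last_avail_idx; /* device-side vring base */
    uint16_t shadow_idx;     /* VMM-side copy, for the software fallback */
};

static void dev_suspend(struct fake_dev *d)
{
    d->suspended = true;     /* device stops moving the index */
}

static void dev_reset(struct fake_dev *d)
{
    d->suspended = false;
    d->last_avail_idx = 0;   /* reset wipes the vring base */
}

/* get_vring_base only returns valid data from a suspended device. */
static int dev_get_vring_base(const struct fake_dev *d, uint16_t *base)
{
    if (!d->suspended) {
        return -1;           /* state already lost or still changing */
    }
    *base = d->last_avail_idx;
    return 0;
}

/*
 * Proposed stop sequence: suspend if the device supports it, otherwise
 * emulate the stop with a reset and fall back to the index tracked by
 * the VMM (the role virtio_queue_restore_last_avail_idx() plays in QEMU).
 */
static uint16_t dev_stop_and_get_base(struct fake_dev *d)
{
    uint16_t base;

    if (d->has_suspend) {
        dev_suspend(d);
        assert(dev_get_vring_base(d, &base) == 0);
    } else {
        dev_reset(d);
        if (dev_get_vring_base(d, &base) < 0) {
            base = d->shadow_idx; /* software fallback */
        }
    }
    return base;
}
```

In this sketch, a suspend-capable device reports its own index exactly, 
while a reset-only device falls back to the (possibly stale) VMM-tracked 
index.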


>
> Hopefully with a better name than vhost_vdpa_reset_status, that's for sure :).
>
> I'm not sure how vhost-user works with this or when it resets the
> indexes. My bet is that it never does so at device reinitialization
> and trusts the VMM's calls to vhost_user_set_base, but I may be wrong.


I think it's safer not to touch the code path for vhost-user; it may 
connect to various kinds of backends, some of which might be fragile.

Thanks


>
> Thanks!
>
>> And if necessary, we can call the NetClient ops for net-specific
>> operations (if it has any).
>>
>> Thanks
>>
>>
>>> At the moment, VhostOps is specialized per backend: vhost-kernel,
>>> vhost-user and vhost-vdpa. If we want to make it specific to the kind
>>> of device, that means adding vhost-vdpa-net too.
>>>
>>> Thanks!
>>>
>>>
>>>> Thanks
>>>>
>>>>>    };
>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>> index eb8c4c378c..a266396576 100644
>>>>> --- a/hw/virtio/vhost.c
>>>>> +++ b/hw/virtio/vhost.c
>>>>> @@ -2049,6 +2049,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings)
>>>>>                                 hdev->vqs + i,
>>>>>                                 hdev->vq_index + i);
>>>>>        }
>>>>> +    if (hdev->vhost_ops->vhost_reset_status) {
>>>>> +        hdev->vhost_ops->vhost_reset_status(hdev);
>>>>> +    }
>>>>>
>>>>>        if (vhost_dev_has_iommu(hdev)) {
>>>>>            if (hdev->vhost_ops->vhost_set_iotlb_callback) {
>>>>> --
>>>>> 2.31.1
>>>>>



