qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: Si-Wei Liu <si-wei.liu@oracle.com>
Cc: Cindy Lu <lulu@redhat.com>, mst <mst@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	eperezma <eperezma@redhat.com>,
	Michael Qiu <qiudayu@archeros.com>,
	Zhu Lingshan <lingshan.zhu@intel.com>
Subject: Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
Date: Thu, 7 Apr 2022 15:50:35 +0800	[thread overview]
Message-ID: <6a1e7563-94d9-ea30-d94c-a2197c04ecac@redhat.com> (raw)
In-Reply-To: <379df944-f46a-ecbc-3566-8776cfefa3f1@oracle.com>


在 2022/4/6 上午8:56, Si-Wei Liu 写道:
>
>
> On 4/1/2022 7:20 PM, Jason Wang wrote:
>> Adding Michael.
>>
>> On Sat, Apr 2, 2022 at 7:08 AM Si-Wei Liu <si-wei.liu@oracle.com> wrote:
>>>
>>>
>>> On 3/31/2022 7:53 PM, Jason Wang wrote:
>>>> On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu <qiudayu@archeros.com> 
>>>> wrote:
>>>>> Currently, when VM poweroff, it will trigger vdpa
>>>>> device(such as mlx bluefield2 VF) reset many times(with 1 datapath
>>>>> queue pair and one control queue, triggered 3 times), this
>>>>> leads to below issue:
>>>>>
>>>>> vhost VQ 2 ring restore failed: -22: Invalid argument (22)
>>>>>
>>>>> This because in vhost_net_stop(), it will stop all vhost device 
>>>>> bind to
>>>>> this virtio device, and in vhost_dev_stop(), qemu tries to stop 
>>>>> the device
>>>>> , then stop the queue: vhost_virtqueue_stop().
>>>>>
>>>>> In vhost_dev_stop(), it resets the device, which clear some flags
>>>>> in low level driver, and in next loop(stop other vhost backends),
>>>>> qemu try to stop the queue corresponding to the vhost backend,
>>>>>    the driver finds that the VQ is invalied, this is the root cause.
>>>>>
>>>>> To solve the issue, vdpa should set vring unready, and
>>>>> remove reset ops in device stop: vhost_dev_start(hdev, false).
>>>>>
>>>>> and implement a new function vhost_dev_reset, only reset backend
>>>>> device after all vhost(per-queue) stoped.
>>>> Typo.
>>>>
>>>>> Signed-off-by: Michael Qiu<qiudayu@archeros.com>
>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>> Rethink this patch, consider there're devices that don't support
>>>> set_vq_ready(). I wonder if we need
>>>>
>>>> 1) uAPI to tell the user space whether or not it supports 
>>>> set_vq_ready()
>>> I guess what's more relevant here is to define the uAPI semantics for
>>> unready i.e. set_vq_ready(0) for resuming/stopping virtqueue 
>>> processing,
>>> as starting vq is comparatively less ambiguous.
>> Yes.
>>
>>> Considering the
>>> likelihood that this interface may be used for live migration, it would
>>> be nice to come up with variants such as 1) discard inflight request
>>> v.s. 2) waiting for inflight processing to be done,
>> Or inflight descriptor reporting (which seems to be tricky). But we
>> can start from net that a discarding may just work.
>>
>>> and 3) timeout in
>>> waiting.
>> Actually, that's the plan and Eugenio is proposing something like this
>> via virtio spec:
>>
>> https://urldefense.com/v3/__https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html__;!!ACWV5N9M2RV99hQ!bcX6i6_atR-6Gcl-4q5Tekab_xDuXr7lDAMw2E1hilZ_1cZIX1c5mztQtvsnjiiy$ 
>>
> Thanks for the pointer, I seem to recall I saw it some time back 
> though I wonder if there's follow-up for the v3? My impression was 
> that this is still a work-in-progress spec proposal, while the 
> semantics of various F_STOP scenario is unclear yet and not all of the 
> requirements (ex: STOP_FAILED, rewind & !IN_ORDER) for live migration 
> do seem to get accommodated?


My understanding is that, the reason for STOP_FAILED and IN_ORDER is 
because we don't have a way to report inflight descriptors. We will try 
to overcome those by allow the device to report inflight descriptors in 
the next version.


>
>>
>>>> 2) userspace will call SET_VRING_ENABLE() when the device supports
>>>> otherwise it will use RESET.
>>> Are you looking to making virtqueue resume-able through the new
>>> SET_VRING_ENABLE() uAPI?
>>>
>>> I think RESET is inevitable in some case, i.e. when guest initiates
>>> device reset by writing 0 to the status register.
>> Yes, that's all my plan.
>>
>>> For suspend/resume and
>>> live migration use cases, indeed RESET can be substituted with
>>> SET_VRING_ENABLE. Again, it'd need quite some code refactoring to
>>> accommodate this change. Although I'm all for it, it'd be the best to
>>> lay out the plan for multiple phases rather than overload this single
>>> patch too much. You can count my time on this endeavor if you don't 
>>> mind. :)
>> You're welcome, I agree we should choose a way to go first:
>>
>> 1) manage to use SET_VRING_ENABLE (more like a workaround anyway)
> For networking device and the vq suspend/resume and live migration use 
> cases to support, I thought it might suffice?


Without config space change it would be sufficient. And anyhow the vDPA 
parent can prevent the config change if all the virtqueue is disabled.


> We may drop inflight or unused ones for Ethernet...


Yes.


> What other part do you think may limit its extension to become a 
> general uAPI or add new uAPI to address similar VQ stop requirement if 
> need be? 


For networking, we don't need other.


> Or we might well define subsystem specific uAPI to stop the virtqueue, 
> for vdpa device specifically?


Anyhow we need a uAPI consider we have some parent that doesn't support 
virtqueue stop. So this could be another way to go.

But if we decide to bother with new uAPI, I would rather go with a new 
uAPI for stop the device. It can help for the config space change as well.


> I think the point here is given that we would like to avoid guest side 
> modification to support live migration, we can define specific uAPI 
> for specific live migration requirement without having to involve 
> guest driver change. It'd be easy to get started this way and 
> generalize them all to a full blown _S_STOP when things are eventually 
> settled.


Yes, note that the status seen by guest is mediated by the hypervisor. 
So the hypervisor can choose not to hide the _S_STOP from the guest to 
keep the migration work without modifications in the guest driver.


>
>> 2) go with virtio-spec (may take a while)
> I feel it might be still quite early for now to get to a full blown 
> _S_STOP spec level amendment that works for all types of virtio 
> (vendor) devices. Generally there can be very specific 
> subsystem-dependent ways to stop each type of virtio devices that 
> satisfies the live migration of virtio subsystem devices.


Yes.


> For now the discussion mostly concerns with vq index rewind, inflight 
> handling, notification interrupt and configuration space such kind of 
> virtio level things, but real device backend has implication on the 
> other parts such as the order of IO/DMA quiescing and interrupt masking.


It's the charge of the vDPA parent to perform any necessary quiescing to 
satisfy the semantic of _S_STOP, it's an implementation detail which is 
out of the scope of the spec.


> If the subsystem virtio guest drivers today somehow don't support any 
> of those _S_STOP new behaviors, I guess it's with little point to 
> introduce the same or similar _S_STOP functionality to the guest 
> driver to effectively support live migration.


See above, the live migration is transparent to the guest. For the 
driver that doesn't support _S_STOP, we can still live migrate it. The 
only interesting part is nesting: if we want to live migrate a nested 
guest we need to the guest driver must support _S_STOP.

Thanks


>
>
> Thanks,
> -Siwei
>> 3) don't wait for the spec, have a vDPA specific uAPI first. Note that
>> I've chatted with most of the vendors and they seem to be fine with
>> the _S_STOP. If we go this way, we can still provide the forward
>> compatibility of _S_STOP
>> 4) or do them all (in parallel)
>>
>> Any thoughts?
>>
>> Thanks
>>
>>>> And for safety, I suggest tagging this as 7.1.
>>> +1
>>>
>>> Regards,
>>> -Siwei
>>>
>>>>> ---
>>>>> v4 --> v3
>>>>>       Nothing changed, becasue of issue with mimecast,
>>>>>       when the From: tag is different from the sender,
>>>>>       the some mail client will take the patch as an
>>>>>       attachment, RESEND v3 does not work, So resend
>>>>>       the patch as v4
>>>>>
>>>>> v3 --> v2:
>>>>>       Call vhost_dev_reset() at the end of vhost_net_stop().
>>>>>
>>>>>       Since the vDPA device need re-add the status bit
>>>>>       VIRTIO_CONFIG_S_ACKNOWLEDGE and VIRTIO_CONFIG_S_DRIVER,
>>>>>       simply, add them inside vhost_vdpa_reset_device, and
>>>>>       the only way calling vhost_vdpa_reset_device is in
>>>>>       vhost_net_stop(), so it keeps the same behavior as before.
>>>>>
>>>>> v2 --> v1:
>>>>>      Implement a new function vhost_dev_reset,
>>>>>      reset the backend kernel device at last.
>>>>> ---
>>>>>    hw/net/vhost_net.c        | 24 +++++++++++++++++++++---
>>>>>    hw/virtio/vhost-vdpa.c    | 15 +++++++++------
>>>>>    hw/virtio/vhost.c         | 15 ++++++++++++++-
>>>>>    include/hw/virtio/vhost.h |  1 +
>>>>>    4 files changed, 45 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>> index 30379d2..422c9bf 100644
>>>>> --- a/hw/net/vhost_net.c
>>>>> +++ b/hw/net/vhost_net.c
>>>>> @@ -325,7 +325,7 @@ int vhost_net_start(VirtIODevice *dev, 
>>>>> NetClientState *ncs,
>>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>>        int nvhosts = data_queue_pairs + cvq;
>>>>> -    struct vhost_net *net;
>>>>> +    struct vhost_net *net = NULL;
>>>>>        int r, e, i, index_end = data_queue_pairs * 2;
>>>>>        NetClientState *peer;
>>>>>
>>>>> @@ -391,8 +391,17 @@ int vhost_net_start(VirtIODevice *dev, 
>>>>> NetClientState *ncs,
>>>>>    err_start:
>>>>>        while (--i >= 0) {
>>>>>            peer = qemu_get_peer(ncs , i);
>>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>>> +
>>>>> +        net = get_vhost_net(peer);
>>>>> +
>>>>> +        vhost_net_stop_one(net, dev);
>>>>>        }
>>>>> +
>>>>> +    /* We only reset backend vdpa device */
>>>>> +    if (net && net->dev.vhost_ops->backend_type == 
>>>>> VHOST_BACKEND_TYPE_VDPA) {
>>>>> +        vhost_dev_reset(&net->dev);
>>>>> +    }
>>>>> +
>>>>>        e = k->set_guest_notifiers(qbus->parent, total_notifiers, 
>>>>> false);
>>>>>        if (e < 0) {
>>>>>            fprintf(stderr, "vhost guest notifier cleanup failed: 
>>>>> %d\n", e);
>>>>> @@ -410,6 +419,7 @@ void vhost_net_stop(VirtIODevice *dev, 
>>>>> NetClientState *ncs,
>>>>>        VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(vbus);
>>>>>        VirtIONet *n = VIRTIO_NET(dev);
>>>>>        NetClientState *peer;
>>>>> +    struct vhost_net *net = NULL;
>>>>>        int total_notifiers = data_queue_pairs * 2 + cvq;
>>>>>        int nvhosts = data_queue_pairs + cvq;
>>>>>        int i, r;
>>>>> @@ -420,7 +430,15 @@ void vhost_net_stop(VirtIODevice *dev, 
>>>>> NetClientState *ncs,
>>>>>            } else {
>>>>>                peer = qemu_get_peer(ncs, n->max_queue_pairs);
>>>>>            }
>>>>> -        vhost_net_stop_one(get_vhost_net(peer), dev);
>>>>> +
>>>>> +        net = get_vhost_net(peer);
>>>>> +
>>>>> +        vhost_net_stop_one(net, dev);
>>>>> +    }
>>>>> +
>>>>> +    /* We only reset backend vdpa device */
>>>>> +    if (net && net->dev.vhost_ops->backend_type == 
>>>>> VHOST_BACKEND_TYPE_VDPA) {
>>>>> +        vhost_dev_reset(&net->dev);
>>>>>        }
>>>> So we've already reset the device in vhost_vdpa_dev_start(), any
>>>> reason we need to do it again here?
>>>>
>>>>>        r = k->set_guest_notifiers(qbus->parent, total_notifiers, 
>>>>> false);
>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
>>>>> index c5ed7a3..3ef0199 100644
>>>>> --- a/hw/virtio/vhost-vdpa.c
>>>>> +++ b/hw/virtio/vhost-vdpa.c
>>>>> @@ -708,6 +708,11 @@ static int vhost_vdpa_reset_device(struct 
>>>>> vhost_dev *dev)
>>>>>
>>>>>        ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
>>>>>        trace_vhost_vdpa_reset_device(dev, status);
>>>>> +
>>>>> +    /* Add back this status, so that the device could work next 
>>>>> time*/
>>>>> +    vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>>> +                               VIRTIO_CONFIG_S_DRIVER);
>>>> This seems to contradict the semantic of reset.
>>>>
>>>>> +
>>>>>        return ret;
>>>>>    }
>>>>>
>>>>> @@ -719,14 +724,14 @@ static int vhost_vdpa_get_vq_index(struct 
>>>>> vhost_dev *dev, int idx)
>>>>>        return idx;
>>>>>    }
>>>>>
>>>>> -static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
>>>>> +static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev, 
>>>>> unsigned int ready)
>>>>>    {
>>>>>        int i;
>>>>>        trace_vhost_vdpa_set_vring_ready(dev);
>>>>>        for (i = 0; i < dev->nvqs; ++i) {
>>>>>            struct vhost_vring_state state = {
>>>>>                .index = dev->vq_index + i,
>>>>> -            .num = 1,
>>>>> +            .num = ready,
>>>>>            };
>>>>>            vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
>>>>>        }
>>>>> @@ -1088,8 +1093,9 @@ static int vhost_vdpa_dev_start(struct 
>>>>> vhost_dev *dev, bool started)
>>>>>            if (unlikely(!ok)) {
>>>>>                return -1;
>>>>>            }
>>>>> -        vhost_vdpa_set_vring_ready(dev);
>>>>> +        vhost_vdpa_set_vring_ready(dev, 1);
>>>>>        } else {
>>>>> +        vhost_vdpa_set_vring_ready(dev, 0);
>>>>>            ok = vhost_vdpa_svqs_stop(dev);
>>>>>            if (unlikely(!ok)) {
>>>>>                return -1;
>>>>> @@ -1105,9 +1111,6 @@ static int vhost_vdpa_dev_start(struct 
>>>>> vhost_dev *dev, bool started)
>>>>>            memory_listener_register(&v->listener, 
>>>>> &address_space_memory);
>>>>>            return vhost_vdpa_add_status(dev, 
>>>>> VIRTIO_CONFIG_S_DRIVER_OK);
>>>>>        } else {
>>>>> -        vhost_vdpa_reset_device(dev);
>>>>> -        vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
>>>>> - VIRTIO_CONFIG_S_DRIVER);
>>>>> memory_listener_unregister(&v->listener);
>>>>>
>>>>>            return 0;
>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>> index b643f42..7e0cdb6 100644
>>>>> --- a/hw/virtio/vhost.c
>>>>> +++ b/hw/virtio/vhost.c
>>>>> @@ -1820,7 +1820,6 @@ fail_features:
>>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
>>>>>    {
>>>>>        int i;
>>>>> -
>>>> Unnecessary changes.
>>>>
>>>>>        /* should only be called after backend is connected */
>>>>>        assert(hdev->vhost_ops);
>>>>>
>>>>> @@ -1854,3 +1853,17 @@ int vhost_net_set_backend(struct vhost_dev 
>>>>> *hdev,
>>>>>
>>>>>        return -ENOSYS;
>>>>>    }
>>>>> +
>>>>> +int vhost_dev_reset(struct vhost_dev *hdev)
>>>>> +{
>>>> Let's use a separate patch for this.
>>>>
>>>> Thanks
>>>>
>>>>> +    int ret = 0;
>>>>> +
>>>>> +    /* should only be called after backend is connected */
>>>>> +    assert(hdev->vhost_ops);
>>>>> +
>>>>> +    if (hdev->vhost_ops->vhost_reset_device) {
>>>>> +        ret = hdev->vhost_ops->vhost_reset_device(hdev);
>>>>> +    }
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
>>>>> index 58a73e7..b8b7c20 100644
>>>>> --- a/include/hw/virtio/vhost.h
>>>>> +++ b/include/hw/virtio/vhost.h
>>>>> @@ -114,6 +114,7 @@ int vhost_dev_init(struct vhost_dev *hdev, 
>>>>> void *opaque,
>>>>>    void vhost_dev_cleanup(struct vhost_dev *hdev);
>>>>>    int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>>    void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev);
>>>>> +int vhost_dev_reset(struct vhost_dev *hdev);
>>>>>    int vhost_dev_enable_notifiers(struct vhost_dev *hdev, 
>>>>> VirtIODevice *vdev);
>>>>>    void vhost_dev_disable_notifiers(struct vhost_dev *hdev, 
>>>>> VirtIODevice *vdev);
>>>>>
>>>>> -- 
>>>>> 1.8.3.1
>>>>>
>>>>>
>>>>>
>



  reply	other threads:[~2022-04-07  7:52 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-23  8:42 [PATCH] vdpa: Avoid reset when stop device 08005325
2022-03-23  9:20 ` Jason Wang
2022-03-25  6:32   ` Si-Wei Liu
     [not found]     ` <fe13304f-0a18-639e-580d-ce6eb7daecab@archeros.com>
2022-03-25 19:19       ` Si-Wei Liu
     [not found]     ` <6fbf82a9-39ce-f179-5e4b-384123ca542c@archeros.com>
2022-03-25 19:59       ` Si-Wei Liu
2022-03-30  8:52         ` Jason Wang
2022-03-30  9:53           ` Michael Qiu
2022-03-30 10:02 ` [PATCH v2] vdpa: reset the backend device in stage of stop last vhost device 08005325
2022-03-30 10:52   ` Michael S. Tsirkin
2022-03-31  1:39     ` Michael Qiu
2022-03-31  0:15   ` Si-Wei Liu
2022-03-31  4:01     ` Michael Qiu
2022-03-31  4:02     ` Michael Qiu
2022-03-31  5:19   ` [PATCH v3] vdpa: reset the backend device in the end of vhost_net_stop() 08005325
2022-03-31  8:55     ` Jason Wang
2022-03-31  9:12       ` Maxime Coquelin
2022-03-31  9:22         ` Michael Qiu
2022-04-01  2:55         ` Jason Wang
2022-03-31  9:25   ` [PATCH RESEND " qiudayu
2022-03-31 10:19     ` Michael Qiu
     [not found]     ` <6245804d.1c69fb81.3c35c.d7efSMTPIN_ADDED_BROKEN@mx.google.com>
2022-03-31 20:32       ` Michael S. Tsirkin
2022-04-01  1:12     ` Si-Wei Liu
2022-04-01  1:45       ` Michael Qiu
2022-04-01  1:31     ` [PATCH v4] " Michael Qiu
2022-04-01  2:53       ` Jason Wang
2022-04-01  3:20         ` Michael Qiu
2022-04-01 23:07         ` Si-Wei Liu
2022-04-02  2:20           ` Jason Wang
2022-04-02  3:53             ` Michael Qiu
2022-04-06  0:56             ` Si-Wei Liu
2022-04-07  7:50               ` Jason Wang [this message]
     [not found]             ` <6247c8f5.1c69fb81.848e0.8b49SMTPIN_ADDED_BROKEN@mx.google.com>
2022-04-07  7:52               ` Jason Wang
     [not found]         ` <62466fff.1c69fb81.8817a.d813SMTPIN_ADDED_BROKEN@mx.google.com>
2022-04-02  1:48           ` Jason Wang
2022-04-02  3:43             ` Michael Qiu
2022-04-01 11:06       ` [PATCH 0/3] Refactor vhost device reset Michael Qiu
2022-04-01 11:06         ` [PATCH 1/3] vhost: Refactor vhost_reset_device() in VhostOps Michael Qiu
2022-04-02  0:44           ` Si-Wei Liu
2022-04-02  2:08             ` Michael Qiu
2022-04-02  2:38           ` Jason Wang
2022-04-02  5:14             ` Michael Qiu
     [not found]             ` <6247dc22.1c69fb81.4244.a88bSMTPIN_ADDED_BROKEN@mx.google.com>
2022-04-07  7:35               ` Jason Wang
2022-04-08  8:38                 ` Michael Qiu
2022-04-08 17:17                   ` Si-Wei Liu
2022-04-11  8:51                     ` Jason Wang
2022-04-01 11:06         ` [PATCH 2/3] vhost: add vhost_dev_reset() Michael Qiu
2022-04-02  0:48           ` Si-Wei Liu
2022-04-01 11:06         ` [PATCH 3/3 v5] vdpa: reset the backend device in the end of vhost_net_stop() Michael Qiu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6a1e7563-94d9-ea30-d94c-a2197c04ecac@redhat.com \
    --to=jasowang@redhat.com \
    --cc=eperezma@redhat.com \
    --cc=lingshan.zhu@intel.com \
    --cc=lulu@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qiudayu@archeros.com \
    --cc=si-wei.liu@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).