From: Stefan Hajnoczi <stefanha@redhat.com>
To: longpeng2@huawei.com
Cc: Hanna Czenczek <hreitz@redhat.com>,
qemu-devel@nongnu.org, "Michael S . Tsirkin" <mst@redhat.com>,
German Maglione <gmaglione@redhat.com>,
Eugenio Perez Martin <eperezma@redhat.com>
Subject: Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop
Date: Thu, 27 Jul 2023 16:26:05 -0400 [thread overview]
Message-ID: <20230727202605.GB986673@fedora> (raw)
In-Reply-To: <CAJaqyWcqGU7iBEuSgDWuWQY_D8GcavBbjnywnf+xqqS+SpAq9Q@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 12365 bytes --]
On Thu, Jul 27, 2023 at 02:49:04PM +0200, Eugenio Perez Martin wrote:
> On Wed, Jul 26, 2023 at 8:57 AM Hanna Czenczek <hreitz@redhat.com> wrote:
> >
> > On 25.07.23 20:53, Eugenio Perez Martin wrote:
> > > On Tue, Jul 25, 2023 at 3:09 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >> On 25.07.23 12:03, Eugenio Perez Martin wrote:
> > >>> On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >>>> On 24.07.23 17:48, Eugenio Perez Martin wrote:
> > >>>>> On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >>>>>> On 21.07.23 17:25, Eugenio Perez Martin wrote:
> > >>>>>>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >>>>>>>> Move the `suspended` field from vhost_vdpa into the global vhost_dev
> > >>>>>>>> struct, so vhost_dev_stop() can check whether the back-end has been
> > >>>>>>>> suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has,
> > >>>>>>>> there is no need to reset it; the reset is just a fall-back to stop
> > >>>>>>>> device operations for back-ends that do not support suspend.
> > >>>>>>>>
> > >>>>>>>> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so
> > >>>>>>>> when the device is re-started, we still have to do the reset to have it
> > >>>>>>>> un-suspend.
> > >>>>>>>>
> > >>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> > >>>>>>>> ---
> > >>>>>>>> include/hw/virtio/vhost-vdpa.h | 2 --
> > >>>>>>>> include/hw/virtio/vhost.h | 8 ++++++++
> > >>>>>>>> hw/virtio/vhost-vdpa.c | 11 +++++++----
> > >>>>>>>> hw/virtio/vhost.c | 8 +++++++-
> > >>>>>>>> 4 files changed, 22 insertions(+), 7 deletions(-)
> > >>>>>>>>
> > >>>>>>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> index e64bfc7f98..72c3686b7f 100644
> > >>>>>>>> --- a/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> +++ b/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa {
> > >>>>>>>> bool shadow_vqs_enabled;
> > >>>>>>>> /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */
> > >>>>>>>> bool shadow_data;
> > >>>>>>>> - /* Device suspended successfully */
> > >>>>>>>> - bool suspended;
> > >>>>>>>> /* IOVA mapping used by the Shadow Virtqueue */
> > >>>>>>>> VhostIOVATree *iova_tree;
> > >>>>>>>> GPtrArray *shadow_vqs;
> > >>>>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > >>>>>>>> index 6a173cb9fa..69bf59d630 100644
> > >>>>>>>> --- a/include/hw/virtio/vhost.h
> > >>>>>>>> +++ b/include/hw/virtio/vhost.h
> > >>>>>>>> @@ -120,6 +120,14 @@ struct vhost_dev {
> > >>>>>>>> uint64_t backend_cap;
> > >>>>>>>> /* @started: is the vhost device started? */
> > >>>>>>>> bool started;
> > >>>>>>>> + /**
> > >>>>>>>> + * @suspended: Whether the vhost device is currently suspended. Set
> > >>>>>>>> + * and reset by implementations (vhost-user, vhost-vdpa, ...), which
> > >>>>>>>> + * are supposed to automatically suspend/resume in their
> > >>>>>>>> + * vhost_dev_start handlers as required. Must also be cleared when
> > >>>>>>>> + * the device is reset.
> > >>>>>>>> + */
> > >>>>>>>> + bool suspended;
> > >>>>>>>> bool log_enabled;
> > >>>>>>>> uint64_t log_size;
> > >>>>>>>> Error *migration_blocker;
> > >>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>>>>>>> index 7b7dee468e..f7fd19a203 100644
> > >>>>>>>> --- a/hw/virtio/vhost-vdpa.c
> > >>>>>>>> +++ b/hw/virtio/vhost-vdpa.c
> > >>>>>>>> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
> > >>>>>>>>
> > >>>>>>>> static int vhost_vdpa_reset_device(struct vhost_dev *dev)
> > >>>>>>>> {
> > >>>>>>>> - struct vhost_vdpa *v = dev->opaque;
> > >>>>>>>> int ret;
> > >>>>>>>> uint8_t status = 0;
> > >>>>>>>>
> > >>>>>>>> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
> > >>>>>>>> trace_vhost_vdpa_reset_device(dev);
> > >>>>>>>> - v->suspended = false;
> > >>>>>>>> + dev->suspended = false;
> > >>>>>>>> return ret;
> > >>>>>>>> }
> > >>>>>>>>
> > >>>>>>>> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev)
> > >>>>>>>> if (unlikely(r)) {
> > >>>>>>>> error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno);
> > >>>>>>>> } else {
> > >>>>>>>> - v->suspended = true;
> > >>>>>>>> + dev->suspended = true;
> > >>>>>>>> return;
> > >>>>>>>> }
> > >>>>>>>> }
> > >>>>>>>> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > >>>>>>>> return -1;
> > >>>>>>>> }
> > >>>>>>>> vhost_vdpa_set_vring_ready(dev);
> > >>>>>>>> + if (dev->suspended) {
> > >>>>>>>> + /* TODO: When RESUME is available, use it instead of resetting */
> > >>>>>>>> + vhost_vdpa_reset_status(dev);
> > >>>>>>> How is that we reset the status at each vhost_vdpa_dev_start? That
> > >>>>>>> will clean all the vqs configured, features negotiated, etc. in the
> > >>>>>>> vDPA device. Or am I missing something?
> > >>>>>> What alternative do you propose? We don’t have RESUME for vDPA in qemu,
> > >>>>>> but we somehow need to lift the previous SUSPEND so the device will
> > >>>>>> again respond to guest requests, do we not?
> > >>>>>>
> > >>>>> Reset also clears the suspend state in vDPA, and it should be called
> > >>>>> at vhost_dev_stop. So the device should never be in suspended state
> > >>>>> here. Does that solve your concerns?
> > >>>> My intention with this patch was precisely not to reset in
> > >>>> vhost_dev_stop when suspending is supported. So now I’m more confused
> > >>>> than before.
> > >>>>
> > >>> At this moment, I think that should be focused as an optimization and
> > >>> not to be included in the main series.
> > >> It is absolutely not an optimization but vital for my use case.
> > >> virtiofsd does not currently implement resetting, but if it did (and we
> > >> want this support in the future), resetting it during stop/cont would be
> > >> catastrophic. There must be a way to have qemu not issue a reset. This
> > >> patch is the reason why this series exists.
> > >>
> > > Sorry, I see I confused things with the first reply. Let me do a recap.
> > >
> > > If I understand the problem correctly, your use case requires that
> > > qemu does not reset the device before the device state is fetched with
> > > virtio_save in the case of a migration.
> >
> > That is only part of the problem, the bigger picture has nothing to do
> > with migration at all. The problem is that when the VM is paused
> > (stop), we invoke vhost_dev_stop(), and when it is resumed (cont), we
> > invoke vhost_dev_start(). To me, it therefore sounds absolutely wrong
> > to reset the back-end in either of these functions. For stateless
> > devices, it was determined to not be an issue (I still find it extremely
> > strange), and as far as I’ve understood, we’ve come to the agreement
> > that it’s basically a fallback for when there is no other way to stop
> > the back-end. But stateful devices like virtio-fs would be completely
> > broken by resetting them there.
> >
> > Therefore, if virtiofsd did implement reset through SET_STATUS,
> > stop/cont would break it today. Maybe other vhost-user devices, too,
> > which just implement RESET_OWNER/RESET_DEVICE, which aren’t even called
> > when the device is supposed to be reset in vhost_dev_stop() (patch 6).
> >
> > So not just because of migration, but in general, there must be a way
> > for back-ends to force qemu not to reset them in vhost_dev_start()/stop().
> >
> > Or we stop using vhost_dev_start()/stop() when the VM is paused/resumed
> > (stop/cont).
> >
>
> Yes, that comes back to the thread [1].
>
> As a third alternative, you can keep vhost_dev_start and let the
> function check the current state and initialize the device only if
> needed. But you can keep symmetrical functions and call one or another
> at the device's code, of course. Not sure what is cleaner or requires
> less changes.
>
> > > This is understandable and I
> > > think we have a solution for that: to move the vhost_ops call to
> > > virtio_reset and the end of virtio_save.
> >
> > Why would we reset the device in virtio_save()?
> >
>
> If the VM continues in the source because of whatever reason,
> vhost_dev_start would expect the device to be clean. You can test it
> with the command "cont" after the LM.
>
> > > To remove the reset call from
> > > vhost_dev_stop is somehow mandatory, as it is called before
> > > virtio_save.
> > >
> > > But we cannot move to vhost_vdpa_dev_start, as proposed here. The reasons are:
> > > * All the features, vq parameters, etc are set before any
> > > vhost_vdpa_dev_start call. To reset at any vhost_vdpa_dev_start would
> > > wipe them.
> > > * The device needs to hold all the resources until it is reset. Things
> > > like descriptor status etc.
> > >
> > > And, regarding the comment "When RESUME is available, use it instead
> > > of resetting", we cannot use resume to replace reset in all cases.
> > > This is because the semantics are different.
> > >
> > > For example, vhost_dev_stop and vhost_dev_start are also called when
> > > the guest reset by itself the device. You can check it rmmoding and
> > > modprobing virtio-net driver, for example. In this case, the driver
> > > expects to find all vqs to start with 0, but the resume must not reset
> > > these indexes.
> >
> > This isn’t quite clear to me. I understand this to mean that there must
> > be a reset somewhere in vhost_dev_stop() and/or vhost_dev_start(). But
> > above, you proposed moving the reset from vhost_dev_stop() into
> > virtio_reset(). Is virtio_reset() called in addition to
> > vhost_dev_stop() and vhost_dev_start() when the guest driver is changed?
> >
>
> Right. Maybe another option is virtio_set_status?
>
> The point is that other parts of qemu / guest trust the device to be
> reset after (current version of) vhost_dev_stop, so if we are going to
> move the reset it must be added to the callers at least. To trace
> these callers is needed, so maybe it is easy to add another function
> (vhost_dev_suspend?), your first alternative.
>
> > Because we can’t have an always-present reset in vhost_dev_stop() or
> > vhost_dev_start(). It just doesn’t work with stop/cont. At the same
> > time, I understand that you say we must have it because
> > vhost_dev_{stop,start}() are also used when the guest driver changes.
> > Consequently, it sounds clear to me that using the exact same functions
> > in stop/cont and when the guest driver is unloaded/loaded is and has
> > always been wrong. Because in stop/cont, the guest driver never
> > changes, so we shouldn’t tell the back-end that it did (by sending
> > SET_STATUS(0)).
> >
>
> Talking just about vhost_net here, the device dumps all the state
> (like vqs) to qemu in vhost_dev_stop. That is what allows to have, for
> example, an unified code in migrating: qemu only needs to know how to
> migrate its emulated device, and vhost code just writes or read at
> suspend / continue or live migrating. To suspend and continue is the
> same operation actually for vhost_net.
Hi Longpeng,
This discussion made me realize that --device vhost-vdpa-device-pci does
not support stateful vDPA devices.
For example, a vdpa-net device that implements the controlq will lose
state (e.g. packet receive filtering) when the guest is stopped with the
QEMU 'stop' command.
QEMU needs to use the VHOST_VDPA_RESUME ioctl so it can resume stateful
devices instead of resetting them across 'stop'/'cont'.
Whatever solution we agree on in this thread should also work for
vhost-vdpa and solve the issue for --device vhost-vdpa-device-pci.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2023-07-27 20:42 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-11 15:52 [PATCH 0/6] vhost-user: Add suspend/resume Hanna Czenczek
2023-07-11 15:52 ` [PATCH 1/6] vhost-user.rst: " Hanna Czenczek
2023-07-18 14:25 ` Stefan Hajnoczi
2023-07-19 13:59 ` Hanna Czenczek
2023-07-24 17:55 ` Stefan Hajnoczi
2023-07-25 8:30 ` Hanna Czenczek
2023-07-27 21:12 ` Stefan Hajnoczi
2023-07-11 15:52 ` [PATCH 2/6] vhost-vdpa: Move vhost_vdpa_reset_status() up Hanna Czenczek
2023-07-18 14:29 ` Stefan Hajnoczi
2023-07-11 15:52 ` [PATCH 3/6] vhost: Do not reset suspended devices on stop Hanna Czenczek
2023-07-18 14:33 ` Stefan Hajnoczi
2023-07-21 15:25 ` Eugenio Perez Martin
2023-07-21 16:07 ` Hanna Czenczek
2023-07-24 15:48 ` Eugenio Perez Martin
2023-07-25 7:53 ` Hanna Czenczek
2023-07-25 10:03 ` Eugenio Perez Martin
2023-07-25 13:09 ` Hanna Czenczek
2023-07-25 18:53 ` Eugenio Perez Martin
2023-07-26 6:57 ` Hanna Czenczek
2023-07-27 12:49 ` Eugenio Perez Martin
2023-07-27 20:26 ` Stefan Hajnoczi [this message]
2023-07-11 15:52 ` [PATCH 4/6] vhost-user: Implement suspend/resume Hanna Czenczek
2023-07-18 14:37 ` Stefan Hajnoczi
2023-07-11 15:52 ` [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset Hanna Czenczek
2023-07-18 14:50 ` Stefan Hajnoczi
2023-07-19 14:09 ` Hanna Czenczek
2023-07-19 15:06 ` Stefan Hajnoczi
2023-07-21 15:47 ` Eugenio Perez Martin
2023-07-11 15:52 ` [PATCH 6/6] vhost-user: Have reset_status fall back to reset Hanna Czenczek
2023-07-18 15:10 ` Stefan Hajnoczi
2023-07-19 14:11 ` Hanna Czenczek
2023-07-19 14:27 ` Hanna Czenczek
2023-07-20 16:03 ` Stefan Hajnoczi
2023-07-21 14:16 ` Hanna Czenczek
2023-07-24 18:04 ` Stefan Hajnoczi
2023-07-25 8:39 ` Hanna Czenczek
2023-07-18 15:14 ` [PATCH 0/6] vhost-user: Add suspend/resume Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230727202605.GB986673@fedora \
--to=stefanha@redhat.com \
--cc=eperezma@redhat.com \
--cc=gmaglione@redhat.com \
--cc=hreitz@redhat.com \
--cc=longpeng2@huawei.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.