From: Stefan Hajnoczi <stefanha@redhat.com>
To: longpeng2@huawei.com
Cc: Hanna Czenczek <hreitz@redhat.com>,
qemu-devel@nongnu.org, "Michael S . Tsirkin" <mst@redhat.com>,
German Maglione <gmaglione@redhat.com>,
Eugenio Perez Martin <eperezma@redhat.com>
Subject: Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop
Date: Thu, 27 Jul 2023 16:26:05 -0400 [thread overview]
Message-ID: <20230727202605.GB986673@fedora> (raw)
In-Reply-To: <CAJaqyWcqGU7iBEuSgDWuWQY_D8GcavBbjnywnf+xqqS+SpAq9Q@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 12365 bytes --]
On Thu, Jul 27, 2023 at 02:49:04PM +0200, Eugenio Perez Martin wrote:
> On Wed, Jul 26, 2023 at 8:57 AM Hanna Czenczek <hreitz@redhat.com> wrote:
> >
> > On 25.07.23 20:53, Eugenio Perez Martin wrote:
> > > On Tue, Jul 25, 2023 at 3:09 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >> On 25.07.23 12:03, Eugenio Perez Martin wrote:
> > >>> On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >>>> On 24.07.23 17:48, Eugenio Perez Martin wrote:
> > >>>>> On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >>>>>> On 21.07.23 17:25, Eugenio Perez Martin wrote:
> > >>>>>>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >>>>>>>> Move the `suspended` field from vhost_vdpa into the global vhost_dev
> > >>>>>>>> struct, so vhost_dev_stop() can check whether the back-end has been
> > >>>>>>>> suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has,
> > >>>>>>>> there is no need to reset it; the reset is just a fall-back to stop
> > >>>>>>>> device operations for back-ends that do not support suspend.
> > >>>>>>>>
> > >>>>>>>> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so
> > >>>>>>>> when the device is re-started, we still have to do the reset to have it
> > >>>>>>>> un-suspend.
> > >>>>>>>>
> > >>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> > >>>>>>>> ---
> > >>>>>>>> include/hw/virtio/vhost-vdpa.h | 2 --
> > >>>>>>>> include/hw/virtio/vhost.h | 8 ++++++++
> > >>>>>>>> hw/virtio/vhost-vdpa.c | 11 +++++++----
> > >>>>>>>> hw/virtio/vhost.c | 8 +++++++-
> > >>>>>>>> 4 files changed, 22 insertions(+), 7 deletions(-)
> > >>>>>>>>
> > >>>>>>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> index e64bfc7f98..72c3686b7f 100644
> > >>>>>>>> --- a/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> +++ b/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa {
> > >>>>>>>> bool shadow_vqs_enabled;
> > >>>>>>>> /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */
> > >>>>>>>> bool shadow_data;
> > >>>>>>>> - /* Device suspended successfully */
> > >>>>>>>> - bool suspended;
> > >>>>>>>> /* IOVA mapping used by the Shadow Virtqueue */
> > >>>>>>>> VhostIOVATree *iova_tree;
> > >>>>>>>> GPtrArray *shadow_vqs;
> > >>>>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > >>>>>>>> index 6a173cb9fa..69bf59d630 100644
> > >>>>>>>> --- a/include/hw/virtio/vhost.h
> > >>>>>>>> +++ b/include/hw/virtio/vhost.h
> > >>>>>>>> @@ -120,6 +120,14 @@ struct vhost_dev {
> > >>>>>>>> uint64_t backend_cap;
> > >>>>>>>> /* @started: is the vhost device started? */
> > >>>>>>>> bool started;
> > >>>>>>>> + /**
> > >>>>>>>> + * @suspended: Whether the vhost device is currently suspended. Set
> > >>>>>>>> + * and reset by implementations (vhost-user, vhost-vdpa, ...), which
> > >>>>>>>> + * are supposed to automatically suspend/resume in their
> > >>>>>>>> + * vhost_dev_start handlers as required. Must also be cleared when
> > >>>>>>>> + * the device is reset.
> > >>>>>>>> + */
> > >>>>>>>> + bool suspended;
> > >>>>>>>> bool log_enabled;
> > >>>>>>>> uint64_t log_size;
> > >>>>>>>> Error *migration_blocker;
> > >>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>>>>>>> index 7b7dee468e..f7fd19a203 100644
> > >>>>>>>> --- a/hw/virtio/vhost-vdpa.c
> > >>>>>>>> +++ b/hw/virtio/vhost-vdpa.c
> > >>>>>>>> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
> > >>>>>>>>
> > >>>>>>>> static int vhost_vdpa_reset_device(struct vhost_dev *dev)
> > >>>>>>>> {
> > >>>>>>>> - struct vhost_vdpa *v = dev->opaque;
> > >>>>>>>> int ret;
> > >>>>>>>> uint8_t status = 0;
> > >>>>>>>>
> > >>>>>>>> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
> > >>>>>>>> trace_vhost_vdpa_reset_device(dev);
> > >>>>>>>> - v->suspended = false;
> > >>>>>>>> + dev->suspended = false;
> > >>>>>>>> return ret;
> > >>>>>>>> }
> > >>>>>>>>
> > >>>>>>>> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev)
> > >>>>>>>> if (unlikely(r)) {
> > >>>>>>>> error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno);
> > >>>>>>>> } else {
> > >>>>>>>> - v->suspended = true;
> > >>>>>>>> + dev->suspended = true;
> > >>>>>>>> return;
> > >>>>>>>> }
> > >>>>>>>> }
> > >>>>>>>> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > >>>>>>>> return -1;
> > >>>>>>>> }
> > >>>>>>>> vhost_vdpa_set_vring_ready(dev);
> > >>>>>>>> + if (dev->suspended) {
> > >>>>>>>> + /* TODO: When RESUME is available, use it instead of resetting */
> > >>>>>>>> + vhost_vdpa_reset_status(dev);
> > >>>>>>> How is that we reset the status at each vhost_vdpa_dev_start? That
> > >>>>>>> will clean all the vqs configured, features negotiated, etc. in the
> > >>>>>>> vDPA device. Or am I missing something?
> > >>>>>> What alternative do you propose? We don’t have RESUME for vDPA in qemu,
> > >>>>>> but we somehow need to lift the previous SUSPEND so the device will
> > >>>>>> again respond to guest requests, do we not?
> > >>>>>>
> > >>>>> Reset also clears the suspend state in vDPA, and it should be called
> > >>>>> at vhost_dev_stop. So the device should never be in suspended state
> > >>>>> here. Does that solve your concerns?
> > >>>> My intention with this patch was precisely not to reset in
> > >>>> vhost_dev_stop when suspending is supported. So now I’m more confused
> > >>>> than before.
> > >>>>
> > >>> At this moment, I think that should be focused as an optimization and
> > >>> not to be included in the main series.
> > >> It is absolutely not an optimization but vital for my use case.
> > >> virtiofsd does not currently implement resetting, but if it did (and we
> > >> want this support in the future), resetting it during stop/cont would be
> > >> catastrophic. There must be a way to have qemu not issue a reset. This
> > >> patch is the reason why this series exists.
> > >>
> > > Sorry, I see I confused things with the first reply. Let me do a recap.
> > >
> > > If I understand the problem correctly, your use case requires that
> > > qemu does not reset the device before the device state is fetched with
> > > virtio_save in the case of a migration.
> >
> > That is only part of the problem, the bigger picture has nothing to do
> > with migration at all. The problem is that when the VM is paused
> > (stop), we invoke vhost_dev_stop(), and when it is resumed (cont), we
> > invoke vhost_dev_start(). To me, it therefore sounds absolutely wrong
> > to reset the back-end in either of these functions. For stateless
> > devices, it was determined to not be an issue (I still find it extremely
> > strange), and as far as I’ve understood, we’ve come to the agreement
> > that it’s basically a fallback for when there is no other way to stop
> > the back-end. But stateful devices like virtio-fs would be completely
> > broken by resetting them there.
> >
> > Therefore, if virtiofsd did implement reset through SET_STATUS,
> > stop/cont would break it today. Maybe other vhost-user devices, too,
> > which just implement RESET_OWNER/RESET_DEVICE, which aren’t even called
> > when the device is supposed to be reset in vhost_dev_stop() (patch 6).
> >
> > So not just because of migration, but in general, there must be a way
> > for back-ends to force qemu not to reset them in vhost_dev_start()/stop().
> >
> > Or we stop using vhost_dev_start()/stop() when the VM is paused/resumed
> > (stop/cont).
> >
>
> Yes, that comes back to the thread [1].
>
> As a third alternative, you can keep vhost_dev_start and let the
> function check the current state and initialize the device only if
> needed. But you can keep symmetrical functions and call one or another
> at the device's code, of course. Not sure what is cleaner or requires
> less changes.
>
> > > This is understandable and I
> > > think we have a solution for that: to move the vhost_ops call to
> > > virtio_reset and the end of virtio_save.
> >
> > Why would we reset the device in virtio_save()?
> >
>
> If the VM continues in the source because of whatever reason,
> vhost_dev_start would expect the device to be clean. You can test it
> with the command "cont" after the LM.
>
> > > To remove the reset call from
> > > vhost_dev_stop is somehow mandatory, as it is called before
> > > virtio_save.
> > >
> > > But we cannot move to vhost_vdpa_dev_start, as proposed here. The reasons are:
> > > * All the features, vq parameters, etc are set before any
> > > vhost_vdpa_dev_start call. To reset at any vhost_vdpa_dev_start would
> > > wipe them.
> > > * The device needs to hold all the resources until it is reset. Things
> > > like descriptor status etc.
> > >
> > > And, regarding the comment "When RESUME is available, use it instead
> > > of resetting", we cannot use resume to replace reset in all cases.
> > > This is because the semantics are different.
> > >
> > > For example, vhost_dev_stop and vhost_dev_start are also called when
> > > the guest reset by itself the device. You can check it rmmoding and
> > > modprobing virtio-net driver, for example. In this case, the driver
> > > expects to find all vqs to start with 0, but the resume must not reset
> > > these indexes.
> >
> > This isn’t quite clear to me. I understand this to mean that there must
> > be a reset somewhere in vhost_dev_stop() and/or vhost_dev_start(). But
> > above, you proposed moving the reset from vhost_dev_stop() into
> > virtio_reset(). Is virtio_reset() called in addition to
> > vhost_dev_stop() and vhost_dev_start() when the guest driver is changed?
> >
>
> Right. Maybe another option is virtio_set_status?
>
> The point is that other parts of qemu / guest trust the device to be
> reset after (current version of) vhost_dev_stop, so if we are going to
> move the reset it must be added to the callers at least. To trace
> these callers is needed, so maybe it is easy to add another function
> (vhost_dev_suspend?), your first alternative.
>
> > Because we can’t have an always-present reset in vhost_dev_stop() or
> > vhost_dev_start(). It just doesn’t work with stop/cont. At the same
> > time, I understand that you say we must have it because
> > vhost_dev_{stop,start}() are also used when the guest driver changes.
> > Consequently, it sounds clear to me that using the exact same functions
> > in stop/cont and when the guest driver is unloaded/loaded is and has
> > always been wrong. Because in stop/cont, the guest driver never
> > changes, so we shouldn’t tell the back-end that it did (by sending
> > SET_STATUS(0)).
> >
>
> Talking just about vhost_net here, the device dumps all the state
> (like vqs) to qemu in vhost_dev_stop. That is what allows to have, for
> example, an unified code in migrating: qemu only needs to know how to
> migrate its emulated device, and vhost code just writes or read at
> suspend / continue or live migrating. To suspend and continue is the
> same operation actually for vhost_net.
Hi Longpeng,
This discussion made me realize that --device vhost-vdpa-device-pci does
not support stateful vDPA devices.
For example, a vdpa-net device that implements the controlq will lose
state (e.g. packet receive filtering) when the guest is stopped with the
QEMU 'stop' command.
QEMU needs to use the VHOST_VDPA_RESUME ioctl so it can resume stateful
devices instead of resetting them across 'stop'/'cont'.
Whatever solution we agree on in this thread should also work for
vhost-vdpa and solve the issue for --device vhost-vdpa-device-pci.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2023-07-27 20:42 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-11 15:52 [PATCH 0/6] vhost-user: Add suspend/resume Hanna Czenczek
2023-07-11 15:52 ` [PATCH 1/6] vhost-user.rst: " Hanna Czenczek
2023-07-18 14:25 ` Stefan Hajnoczi
2023-07-19 13:59 ` Hanna Czenczek
2023-07-24 17:55 ` Stefan Hajnoczi
2023-07-25 8:30 ` Hanna Czenczek
2023-07-27 21:12 ` Stefan Hajnoczi
2023-07-11 15:52 ` [PATCH 2/6] vhost-vdpa: Move vhost_vdpa_reset_status() up Hanna Czenczek
2023-07-18 14:29 ` Stefan Hajnoczi
2023-07-11 15:52 ` [PATCH 3/6] vhost: Do not reset suspended devices on stop Hanna Czenczek
2023-07-18 14:33 ` Stefan Hajnoczi
2023-07-21 15:25 ` Eugenio Perez Martin
2023-07-21 16:07 ` Hanna Czenczek
2023-07-24 15:48 ` Eugenio Perez Martin
2023-07-25 7:53 ` Hanna Czenczek
2023-07-25 10:03 ` Eugenio Perez Martin
2023-07-25 13:09 ` Hanna Czenczek
2023-07-25 18:53 ` Eugenio Perez Martin
2023-07-26 6:57 ` Hanna Czenczek
2023-07-27 12:49 ` Eugenio Perez Martin
2023-07-27 20:26 ` Stefan Hajnoczi [this message]
2023-07-11 15:52 ` [PATCH 4/6] vhost-user: Implement suspend/resume Hanna Czenczek
2023-07-18 14:37 ` Stefan Hajnoczi
2023-07-11 15:52 ` [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset Hanna Czenczek
2023-07-18 14:50 ` Stefan Hajnoczi
2023-07-19 14:09 ` Hanna Czenczek
2023-07-19 15:06 ` Stefan Hajnoczi
2023-07-21 15:47 ` Eugenio Perez Martin
2023-07-11 15:52 ` [PATCH 6/6] vhost-user: Have reset_status fall back to reset Hanna Czenczek
2023-07-18 15:10 ` Stefan Hajnoczi
2023-07-19 14:11 ` Hanna Czenczek
2023-07-19 14:27 ` Hanna Czenczek
2023-07-20 16:03 ` Stefan Hajnoczi
2023-07-21 14:16 ` Hanna Czenczek
2023-07-24 18:04 ` Stefan Hajnoczi
2023-07-25 8:39 ` Hanna Czenczek
2023-07-18 15:14 ` [PATCH 0/6] vhost-user: Add suspend/resume Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230727202605.GB986673@fedora \
--to=stefanha@redhat.com \
--cc=eperezma@redhat.com \
--cc=gmaglione@redhat.com \
--cc=hreitz@redhat.com \
--cc=longpeng2@huawei.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).