qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@redhat.com>
To: longpeng2@huawei.com
Cc: Hanna Czenczek <hreitz@redhat.com>,
	qemu-devel@nongnu.org, "Michael S . Tsirkin" <mst@redhat.com>,
	German Maglione <gmaglione@redhat.com>,
	Eugenio Perez Martin <eperezma@redhat.com>
Subject: Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop
Date: Thu, 27 Jul 2023 16:26:05 -0400	[thread overview]
Message-ID: <20230727202605.GB986673@fedora> (raw)
In-Reply-To: <CAJaqyWcqGU7iBEuSgDWuWQY_D8GcavBbjnywnf+xqqS+SpAq9Q@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 12365 bytes --]

On Thu, Jul 27, 2023 at 02:49:04PM +0200, Eugenio Perez Martin wrote:
> On Wed, Jul 26, 2023 at 8:57 AM Hanna Czenczek <hreitz@redhat.com> wrote:
> >
> > On 25.07.23 20:53, Eugenio Perez Martin wrote:
> > > On Tue, Jul 25, 2023 at 3:09 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >> On 25.07.23 12:03, Eugenio Perez Martin wrote:
> > >>> On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >>>> On 24.07.23 17:48, Eugenio Perez Martin wrote:
> > >>>>> On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >>>>>> On 21.07.23 17:25, Eugenio Perez Martin wrote:
> > >>>>>>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote:
> > >>>>>>>> Move the `suspended` field from vhost_vdpa into the global vhost_dev
> > >>>>>>>> struct, so vhost_dev_stop() can check whether the back-end has been
> > >>>>>>>> suspended by `vhost_ops->vhost_dev_start(hdev, false)`.  If it has,
> > >>>>>>>> there is no need to reset it; the reset is just a fall-back to stop
> > >>>>>>>> device operations for back-ends that do not support suspend.
> > >>>>>>>>
> > >>>>>>>> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so
> > >>>>>>>> when the device is re-started, we still have to do the reset to have it
> > >>>>>>>> un-suspend.
> > >>>>>>>>
> > >>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> > >>>>>>>> ---
> > >>>>>>>>      include/hw/virtio/vhost-vdpa.h |  2 --
> > >>>>>>>>      include/hw/virtio/vhost.h      |  8 ++++++++
> > >>>>>>>>      hw/virtio/vhost-vdpa.c         | 11 +++++++----
> > >>>>>>>>      hw/virtio/vhost.c              |  8 +++++++-
> > >>>>>>>>      4 files changed, 22 insertions(+), 7 deletions(-)
> > >>>>>>>>
> > >>>>>>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> index e64bfc7f98..72c3686b7f 100644
> > >>>>>>>> --- a/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> +++ b/include/hw/virtio/vhost-vdpa.h
> > >>>>>>>> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa {
> > >>>>>>>>          bool shadow_vqs_enabled;
> > >>>>>>>>          /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */
> > >>>>>>>>          bool shadow_data;
> > >>>>>>>> -    /* Device suspended successfully */
> > >>>>>>>> -    bool suspended;
> > >>>>>>>>          /* IOVA mapping used by the Shadow Virtqueue */
> > >>>>>>>>          VhostIOVATree *iova_tree;
> > >>>>>>>>          GPtrArray *shadow_vqs;
> > >>>>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> > >>>>>>>> index 6a173cb9fa..69bf59d630 100644
> > >>>>>>>> --- a/include/hw/virtio/vhost.h
> > >>>>>>>> +++ b/include/hw/virtio/vhost.h
> > >>>>>>>> @@ -120,6 +120,14 @@ struct vhost_dev {
> > >>>>>>>>          uint64_t backend_cap;
> > >>>>>>>>          /* @started: is the vhost device started? */
> > >>>>>>>>          bool started;
> > >>>>>>>> +    /**
> > >>>>>>>> +     * @suspended: Whether the vhost device is currently suspended.  Set
> > >>>>>>>> +     * and reset by implementations (vhost-user, vhost-vdpa, ...), which
> > >>>>>>>> +     * are supposed to automatically suspend/resume in their
> > >>>>>>>> +     * vhost_dev_start handlers as required.  Must also be cleared when
> > >>>>>>>> +     * the device is reset.
> > >>>>>>>> +     */
> > >>>>>>>> +    bool suspended;
> > >>>>>>>>          bool log_enabled;
> > >>>>>>>>          uint64_t log_size;
> > >>>>>>>>          Error *migration_blocker;
> > >>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > >>>>>>>> index 7b7dee468e..f7fd19a203 100644
> > >>>>>>>> --- a/hw/virtio/vhost-vdpa.c
> > >>>>>>>> +++ b/hw/virtio/vhost-vdpa.c
> > >>>>>>>> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev,
> > >>>>>>>>
> > >>>>>>>>      static int vhost_vdpa_reset_device(struct vhost_dev *dev)
> > >>>>>>>>      {
> > >>>>>>>> -    struct vhost_vdpa *v = dev->opaque;
> > >>>>>>>>          int ret;
> > >>>>>>>>          uint8_t status = 0;
> > >>>>>>>>
> > >>>>>>>>          ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status);
> > >>>>>>>>          trace_vhost_vdpa_reset_device(dev);
> > >>>>>>>> -    v->suspended = false;
> > >>>>>>>> +    dev->suspended = false;
> > >>>>>>>>          return ret;
> > >>>>>>>>      }
> > >>>>>>>>
> > >>>>>>>> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev)
> > >>>>>>>>              if (unlikely(r)) {
> > >>>>>>>>                  error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno);
> > >>>>>>>>              } else {
> > >>>>>>>> -            v->suspended = true;
> > >>>>>>>> +            dev->suspended = true;
> > >>>>>>>>                  return;
> > >>>>>>>>              }
> > >>>>>>>>          }
> > >>>>>>>> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
> > >>>>>>>>                  return -1;
> > >>>>>>>>              }
> > >>>>>>>>              vhost_vdpa_set_vring_ready(dev);
> > >>>>>>>> +        if (dev->suspended) {
> > >>>>>>>> +            /* TODO: When RESUME is available, use it instead of resetting */
> > >>>>>>>> +            vhost_vdpa_reset_status(dev);
> > >>>>>>> How is that we reset the status at each vhost_vdpa_dev_start? That
> > >>>>>>> will clean all the vqs configured, features negotiated, etc. in the
> > >>>>>>> vDPA device. Or am I missing something?
> > >>>>>> What alternative do you propose?  We don’t have RESUME for vDPA in qemu,
> > >>>>>> but we somehow need to lift the previous SUSPEND so the device will
> > >>>>>> again respond to guest requests, do we not?
> > >>>>>>
> > >>>>> Reset also clears the suspend state in vDPA, and it should be called
> > >>>>> at vhost_dev_stop. So the device should never be in suspended state
> > >>>>> here. Does that solve your concerns?
> > >>>> My intention with this patch was precisely not to reset in
> > >>>> vhost_dev_stop when suspending is supported.  So now I’m more confused
> > >>>> than before.
> > >>>>
> > >>> At this moment, I think that should be focused as an optimization and
> > >>> not to be included in the main series.
> > >> It is absolutely not an optimization but vital for my use case.
> > >> virtiofsd does not currently implement resetting, but if it did (and we
> > >> want this support in the future), resetting it during stop/cont would be
> > >> catastrophic.  There must be a way to have qemu not issue a reset.  This
> > >> patch is the reason why this series exists.
> > >>
> > > Sorry, I see I confused things with the first reply. Let me do a recap.
> > >
> > > If I understand the problem correctly, your use case requires that
> > > qemu does not reset the device before the device state is fetched with
> > > virtio_save in the case of a migration.
> >
> > That is only part of the problem, the bigger picture has nothing to do
> > with migration at all.  The problem is that when the VM is paused
> > (stop), we invoke vhost_dev_stop(), and when it is resumed (cont), we
> > invoke vhost_dev_start().  To me, it therefore sounds absolutely wrong
> > to reset the back-end in either of these functions.  For stateless
> > devices, it was determined to not be an issue (I still find it extremely
> > strange), and as far as I’ve understood, we’ve come to the agreement
> > that it’s basically a fallback for when there is no other way to stop
> > the back-end.  But stateful devices like virtio-fs would be completely
> > broken by resetting them there.
> >
> > Therefore, if virtiofsd did implement reset through SET_STATUS,
> > stop/cont would break it today.  Maybe other vhost-user devices, too,
> > which just implement RESET_OWNER/RESET_DEVICE, which aren’t even called
> > when the device is supposed to be reset in vhost_dev_stop() (patch 6).
> >
> > So not just because of migration, but in general, there must be a way
> > for back-ends to force qemu not to reset them in vhost_dev_start()/stop().
> >
> > Or we stop using vhost_dev_start()/stop() when the VM is paused/resumed
> > (stop/cont).
> >
> 
> Yes, that comes back to the thread [1].
> 
> As a third alternative, you can keep vhost_dev_start and let the
> function check the current state and initialize the device only if
> needed. But you can keep symmetrical functions and call one or another
> at the device's code, of course. Not sure what is cleaner or requires
> less changes.
> 
> > > This is understandable and I
> > > think we have a solution for that: to move the vhost_ops call to
> > > virtio_reset and the end of virtio_save.
> >
> > Why would we reset the device in virtio_save()?
> >
> 
> If the VM continues in the source because of whatever reason,
> vhost_dev_start would expect the device to be clean. You can test it
> with the command "cont" after the LM.
> 
> > > To remove the reset call from
> > > vhost_dev_stop is somehow mandatory, as it is called before
> > > virtio_save.
> > >
> > > But we cannot move to vhost_vdpa_dev_start, as proposed here. The reasons are:
> > > * All the features, vq parameters, etc are set before any
> > > vhost_vdpa_dev_start call. To reset at any vhost_vdpa_dev_start would
> > > wipe them.
> > > * The device needs to hold all the resources until it is reset. Things
> > > like descriptor status etc.
> > >
> > > And, regarding the comment "When RESUME is available, use it instead
> > > of resetting", we cannot use resume to replace reset in all cases.
> > > This is because the semantics are different.
> > >
> > > For example, vhost_dev_stop and vhost_dev_start are also called when
> > > the guest reset by itself the device. You can check it rmmoding and
> > > modprobing virtio-net driver, for example. In this case, the driver
> > > expects to find all vqs to start with 0, but the resume must not reset
> > > these indexes.
> >
> > This isn’t quite clear to me.  I understand this to mean that there must
> > be a reset somewhere in vhost_dev_stop() and/or vhost_dev_start().  But
> > above, you proposed moving the reset from vhost_dev_stop() into
> > virtio_reset().  Is virtio_reset() called in addition to
> > vhost_dev_stop() and vhost_dev_start() when the guest driver is changed?
> >
> 
> Right. Maybe another option is virtio_set_status?
> 
> The point is that other parts of qemu / guest trust the device to be
> reset after (current version of) vhost_dev_stop, so if we are going to
> move the reset it must be added to the callers at least. To trace
> these callers is needed, so maybe it is easy to add another function
> (vhost_dev_suspend?), your first alternative.
> 
> > Because we can’t have an always-present reset in vhost_dev_stop() or
> > vhost_dev_start().  It just doesn’t work with stop/cont.  At the same
> > time, I understand that you say we must have it because
> > vhost_dev_{stop,start}() are also used when the guest driver changes.
> > Consequently, it sounds clear to me that using the exact same functions
> > in stop/cont and when the guest driver is unloaded/loaded is and has
> > always been wrong.  Because in stop/cont, the guest driver never
> > changes, so we shouldn’t tell the back-end that it did (by sending
> > SET_STATUS(0)).
> >
> 
> Talking just about vhost_net here, the device dumps all the state
> (like vqs) to qemu in vhost_dev_stop. That is what allows to have, for
> example, an unified code in migrating: qemu only needs to know how to
> migrate its emulated device, and vhost code just writes or read at
> suspend / continue or live migrating. To suspend and continue is the
> same operation actually for vhost_net.

Hi Longpeng,
This discussion made me realize that --device vhost-vdpa-device-pci does
not support stateful vDPA devices.

For example, a vdpa-net device that implements the controlq will lose
state (e.g. packet receive filtering) when the guest is stopped with the
QEMU 'stop' command.

QEMU needs to use the VHOST_VDPA_RESUME ioctl so it can resume stateful
devices instead of resetting them across 'stop'/'cont'.

Whatever solution we agree on in this thread should also work for
vhost-vdpa and solve the issue for --device vhost-vdpa-device-pci.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2023-07-27 20:42 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-11 15:52 [PATCH 0/6] vhost-user: Add suspend/resume Hanna Czenczek
2023-07-11 15:52 ` [PATCH 1/6] vhost-user.rst: " Hanna Czenczek
2023-07-18 14:25   ` Stefan Hajnoczi
2023-07-19 13:59     ` Hanna Czenczek
2023-07-24 17:55       ` Stefan Hajnoczi
2023-07-25  8:30         ` Hanna Czenczek
2023-07-27 21:12           ` Stefan Hajnoczi
2023-07-11 15:52 ` [PATCH 2/6] vhost-vdpa: Move vhost_vdpa_reset_status() up Hanna Czenczek
2023-07-18 14:29   ` Stefan Hajnoczi
2023-07-11 15:52 ` [PATCH 3/6] vhost: Do not reset suspended devices on stop Hanna Czenczek
2023-07-18 14:33   ` Stefan Hajnoczi
2023-07-21 15:25   ` Eugenio Perez Martin
2023-07-21 16:07     ` Hanna Czenczek
2023-07-24 15:48       ` Eugenio Perez Martin
2023-07-25  7:53         ` Hanna Czenczek
2023-07-25 10:03           ` Eugenio Perez Martin
2023-07-25 13:09             ` Hanna Czenczek
2023-07-25 18:53               ` Eugenio Perez Martin
2023-07-26  6:57                 ` Hanna Czenczek
2023-07-27 12:49                   ` Eugenio Perez Martin
2023-07-27 20:26                     ` Stefan Hajnoczi [this message]
2023-07-11 15:52 ` [PATCH 4/6] vhost-user: Implement suspend/resume Hanna Czenczek
2023-07-18 14:37   ` Stefan Hajnoczi
2023-07-11 15:52 ` [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset Hanna Czenczek
2023-07-18 14:50   ` Stefan Hajnoczi
2023-07-19 14:09     ` Hanna Czenczek
2023-07-19 15:06       ` Stefan Hajnoczi
2023-07-21 15:47       ` Eugenio Perez Martin
2023-07-11 15:52 ` [PATCH 6/6] vhost-user: Have reset_status fall back to reset Hanna Czenczek
2023-07-18 15:10   ` Stefan Hajnoczi
2023-07-19 14:11     ` Hanna Czenczek
2023-07-19 14:27       ` Hanna Czenczek
2023-07-20 16:03         ` Stefan Hajnoczi
2023-07-21 14:16           ` Hanna Czenczek
2023-07-24 18:04             ` Stefan Hajnoczi
2023-07-25  8:39               ` Hanna Czenczek
2023-07-18 15:14 ` [PATCH 0/6] vhost-user: Add suspend/resume Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230727202605.GB986673@fedora \
    --to=stefanha@redhat.com \
    --cc=eperezma@redhat.com \
    --cc=gmaglione@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=longpeng2@huawei.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).