* [PATCH 0/6] vhost-user: Add suspend/resume @ 2023-07-11 15:52 Hanna Czenczek 2023-07-11 15:52 ` [PATCH 1/6] vhost-user.rst: " Hanna Czenczek ` (6 more replies) 0 siblings, 7 replies; 37+ messages in thread From: Hanna Czenczek @ 2023-07-11 15:52 UTC (permalink / raw) To: qemu-devel Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi, Eugenio Pérez, German Maglione Hi, As discussed on the previous version of the virtio-fs migration series (https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01575.html), we currently don’t have a good way to have a vhost-user back-end fully cease all operations, including background operations. To work around this, we reset it, which is not an option for stateful devices like virtio-fs. Instead, we want the same SUSPEND/RESUME model that vhost-vdpa already has, so that we can suspend back-ends when we want them to stop doing anything (i.e. on VM stop), and resume them later (i.e. on VM resume). This series adds these vhost-user operations to the protocol and implements them in qemu. Furthermore, it has vhost-user and vhost-vdpa do roughly the same thing in their reset paths, as far as possible. That path will still remain as a fall-back if SUSPEND/RESUME is not implemented, and, given that qemu’s vhost-vdpa code currently does not make use of RESUME, it is actually always used for vhost-vdpa (to take the device out of a suspended state). Hanna Czenczek (6): vhost-user.rst: Add suspend/resume vhost-vdpa: Move vhost_vdpa_reset_status() up vhost: Do not reset suspended devices on stop vhost-user: Implement suspend/resume vhost-vdpa: Match vhost-user's status reset vhost-user: Have reset_status fall back to reset docs/interop/vhost-user.rst | 35 +++++++++++- include/hw/virtio/vhost-vdpa.h | 2 - include/hw/virtio/vhost.h | 8 +++ hw/virtio/vhost-user.c | 101 ++++++++++++++++++++++++++++++++- hw/virtio/vhost-vdpa.c | 41 ++++++------- hw/virtio/vhost.c | 8 ++- 6 files changed, 169 insertions(+), 26 deletions(-) -- 2.41.0 ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 1/6] vhost-user.rst: Add suspend/resume 2023-07-11 15:52 [PATCH 0/6] vhost-user: Add suspend/resume Hanna Czenczek @ 2023-07-11 15:52 ` Hanna Czenczek 2023-07-18 14:25 ` Stefan Hajnoczi 2023-07-11 15:52 ` [PATCH 2/6] vhost-vdpa: Move vhost_vdpa_reset_status() up Hanna Czenczek ` (5 subsequent siblings) 6 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-11 15:52 UTC (permalink / raw) To: qemu-devel Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi, Eugenio Pérez, German Maglione When stopping the VM, qemu wants all devices to fully cease any operation, too. Currently, we can only have vhost-user back-ends stop processing vrings, but not background operations. Add the SUSPEND and RESUME commands from vDPA, which allow the front-end (qemu) to tell back-ends to cease all operations, including those running in the background. qemu's current work-around for this is to reset the back-end instead of suspending it, which will not work for back-ends that have internal state that must be preserved across e.g. stop/cont. Note that the given specification requires the back-end to delay processing kicks (i.e. starting vrings) until the device is resumed, instead of requiring the front-end not to send kicks while suspended. qemu starts devices (and would just resume them) only when the VM is in a running state, so it would be difficult to have qemu delay kicks until the device is resumed, which is why this patch specifies handling of kicks as it does. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> --- docs/interop/vhost-user.rst | 35 +++++++++++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst index 5a070adbc1..ac6be34c4c 100644 --- a/docs/interop/vhost-user.rst +++ b/docs/interop/vhost-user.rst @@ -381,6 +381,10 @@ readable) on the descriptor specified by ``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated, and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``. +While the back-end is suspended (via ``VHOST_USER_SUSPEND``), it must +never process rings, and thus also delay handling kicks until the +back-end is resumed again. + Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``. If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the @@ -479,8 +483,9 @@ supplied by ``VhostUserLog``. ancillary data, it may be used to inform the front-end that the log has been modified. -Once the source has finished migration, rings will be stopped by the -source. No further update must be done before rings are restarted. +Once the source has finished migration, the device will be suspended and +its rings will be stopped by the source. No further update must be done +before the device and its rings are resumed. In postcopy migration the back-end is started before all the memory has been received from the source host, and care must be taken to avoid @@ -885,6 +890,7 @@ Protocol features #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15 #define VHOST_USER_PROTOCOL_F_STATUS 16 #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17 + #define VHOST_USER_PROTOCOL_F_SUSPEND 18 Front-end message types ----------------------- @@ -1440,6 +1446,31 @@ Front-end message types query the back-end for its device status as defined in the Virtio specification. +``VHOST_USER_SUSPEND`` + :id: 41 + :equivalent ioctl: VHOST_VDPA_SUSPEND + :request payload: N/A + :reply payload: N/A + + When the ``VHOST_USER_PROTOCOL_F_SUSPEND`` protocol feature has been + successfully negotiated, this message is submitted by the front-end to + have the back-end cease all operations except for handling vhost-user + requests. The back-end must stop processing all virt queues, and it + must not perform any background operations. It may not resume until a + subsequent ``VHOST_USER_RESUME`` call. + +``VHOST_USER_RESUME`` + :id: 42 + :equivalent ioctl: VHOST_VDPA_RESUME + :request payload: N/A + :reply payload: N/A + + When the ``VHOST_USER_PROTOCOL_F_SUSPEND`` protocol feature has been + successfully negotiated, this message is submitted by the front-end to + allow the back-end to resume operations after having been suspended + before. The back-end must again begin processing rings that are not + stopped, and it may resume background operations. + Back-end message types ---------------------- -- 2.41.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 1/6] vhost-user.rst: Add suspend/resume 2023-07-11 15:52 ` [PATCH 1/6] vhost-user.rst: " Hanna Czenczek @ 2023-07-18 14:25 ` Stefan Hajnoczi 2023-07-19 13:59 ` Hanna Czenczek 0 siblings, 1 reply; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-18 14:25 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione [-- Attachment #1: Type: text/plain, Size: 5477 bytes --] On Tue, Jul 11, 2023 at 05:52:23PM +0200, Hanna Czenczek wrote: > When stopping the VM, qemu wants all devices to fully cease any > operation, too. Currently, we can only have vhost-user back-ends stop > processing vrings, but not background operations. Add the SUSPEND and > RESUME commands from vDPA, which allow the front-end (qemu) to tell > back-ends to cease all operations, including those running in the > background. > > qemu's current work-around for this is to reset the back-end instead of > suspending it, which will not work for back-ends that have internal > state that must be preserved across e.g. stop/cont. > > Note that the given specification requires the back-end to delay > processing kicks (i.e. starting vrings) until the device is resumed, > instead of requiring the front-end not to send kicks while suspended. > qemu starts devices (and would just resume them) only when the VM is in > a running state, so it would be difficult to have qemu delay kicks until > the device is resumed, which is why this patch specifies handling of > kicks as it does. > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > --- > docs/interop/vhost-user.rst | 35 +++++++++++++++++++++++++++++++++-- > 1 file changed, 33 insertions(+), 2 deletions(-) > > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst > index 5a070adbc1..ac6be34c4c 100644 > --- a/docs/interop/vhost-user.rst > +++ b/docs/interop/vhost-user.rst > @@ -381,6 +381,10 @@ readable) on the descriptor specified by ``VHOST_USER_SET_VRING_KICK`` > or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated, > and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``. > > +While the back-end is suspended (via ``VHOST_USER_SUSPEND``), it must > +never process rings, and thus also delay handling kicks until the If you respin this series, I suggest replacing "never" with "not" to emphasize that ring processing is only skipped while the device is suspended (rather than forever). "Never" feels too strong to use when describing a temporary state. > +back-end is resumed again. > + > Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``. > > If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the > @@ -479,8 +483,9 @@ supplied by ``VhostUserLog``. > ancillary data, it may be used to inform the front-end that the log has > been modified. > > -Once the source has finished migration, rings will be stopped by the > -source. No further update must be done before rings are restarted. > +Once the source has finished migration, the device will be suspended and > +its rings will be stopped by the source. No further update must be done > +before the device and its rings are resumed. This paragraph is abstract and doesn't directly identify the mechanisms or who does what: - "the device will be suspended" via VHOST_USER_SUSPEND (or reset when VHOST_USER_SUSPEND is not supported?) or automatically by the device itself or some other mechanism? - "before the device and its rings are resumed" via VHOST_USER_RESUME? And is this referring to the source device? Please rephrase the paragraph to identify the vhost-user messages involved. > > In postcopy migration the back-end is started before all the memory has > been received from the source host, and care must be taken to avoid > @@ -885,6 +890,7 @@ Protocol features > #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15 > #define VHOST_USER_PROTOCOL_F_STATUS 16 > #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17 > + #define VHOST_USER_PROTOCOL_F_SUSPEND 18 > > Front-end message types > ----------------------- > @@ -1440,6 +1446,31 @@ Front-end message types > query the back-end for its device status as defined in the Virtio > specification. > > +``VHOST_USER_SUSPEND`` > + :id: 41 > + :equivalent ioctl: VHOST_VDPA_SUSPEND > + :request payload: N/A > + :reply payload: N/A > + > + When the ``VHOST_USER_PROTOCOL_F_SUSPEND`` protocol feature has been > + successfully negotiated, this message is submitted by the front-end to > + have the back-end cease all operations except for handling vhost-user > + requests. The back-end must stop processing all virt queues, and it > + must not perform any background operations. It may not resume until a "background operations" are not defined. What does it mean: - Anything that writes to memory slots - Anything that changes the visible state of the device - Anything that changes the non-visible internal state of the device - etc ? > + subsequent ``VHOST_USER_RESUME`` call. > + > +``VHOST_USER_RESUME`` > + :id: 42 > + :equivalent ioctl: VHOST_VDPA_RESUME > + :request payload: N/A > + :reply payload: N/A > + > + When the ``VHOST_USER_PROTOCOL_F_SUSPEND`` protocol feature has been > + successfully negotiated, this message is submitted by the front-end to > + allow the back-end to resume operations after having been suspended > + before. The back-end must again begin processing rings that are not This can be made more concrete by referencing the vhost-user message used to suspend the device: "suspended before" -> "suspended with VHOST_USER_SUSPEND" > + stopped, and it may resume background operations. > + > > Back-end message types > ---------------------- > -- > 2.41.0 > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/6] vhost-user.rst: Add suspend/resume 2023-07-18 14:25 ` Stefan Hajnoczi @ 2023-07-19 13:59 ` Hanna Czenczek 2023-07-24 17:55 ` Stefan Hajnoczi 0 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-19 13:59 UTC (permalink / raw) To: Stefan Hajnoczi Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione On 18.07.23 16:25, Stefan Hajnoczi wrote: > On Tue, Jul 11, 2023 at 05:52:23PM +0200, Hanna Czenczek wrote: >> When stopping the VM, qemu wants all devices to fully cease any >> operation, too. Currently, we can only have vhost-user back-ends stop >> processing vrings, but not background operations. Add the SUSPEND and >> RESUME commands from vDPA, which allow the front-end (qemu) to tell >> back-ends to cease all operations, including those running in the >> background. >> >> qemu's current work-around for this is to reset the back-end instead of >> suspending it, which will not work for back-ends that have internal >> state that must be preserved across e.g. stop/cont. >> >> Note that the given specification requires the back-end to delay >> processing kicks (i.e. starting vrings) until the device is resumed, >> instead of requiring the front-end not to send kicks while suspended. >> qemu starts devices (and would just resume them) only when the VM is in >> a running state, so it would be difficult to have qemu delay kicks until >> the device is resumed, which is why this patch specifies handling of >> kicks as it does. >> >> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >> --- >> docs/interop/vhost-user.rst | 35 +++++++++++++++++++++++++++++++++-- >> 1 file changed, 33 insertions(+), 2 deletions(-) >> >> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst >> index 5a070adbc1..ac6be34c4c 100644 >> --- a/docs/interop/vhost-user.rst >> +++ b/docs/interop/vhost-user.rst >> @@ -381,6 +381,10 @@ readable) on the descriptor specified by ``VHOST_USER_SET_VRING_KICK`` >> or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated, >> and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``. >> >> +While the back-end is suspended (via ``VHOST_USER_SUSPEND``), it must >> +never process rings, and thus also delay handling kicks until the > If you respin this series, I suggest replacing "never" with "not" to > emphasize that ring processing is only skipped while the device is > suspended (rather than forever). "Never" feels too strong to use when > describing a temporary state. Sure. >> +back-end is resumed again. >> + >> Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``. >> >> If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the >> @@ -479,8 +483,9 @@ supplied by ``VhostUserLog``. >> ancillary data, it may be used to inform the front-end that the log has >> been modified. >> >> -Once the source has finished migration, rings will be stopped by the >> -source. No further update must be done before rings are restarted. >> +Once the source has finished migration, the device will be suspended and >> +its rings will be stopped by the source. No further update must be done >> +before the device and its rings are resumed. > This paragraph is abstract and doesn't directly identify the mechanisms > or who does what: > - "the device will be suspended" via VHOST_USER_SUSPEND (or reset when > VHOST_USER_SUSPEND is not supported?) or automatically by the device > itself or some other mechanism? OK, I’ll add VHOST_USER_SUSPEND. So far I hadn’t considered making a note of resetting as a fallback right in the specification. I don’t think I would want it in the specification, but on the other hand, it is probably important for back-end authors to know that if they do not implement SUSPEND, their device is going to be reset by qemu. Can we make that a ”may”, i.e.: ``` Once the source has finished migration, the device will be suspended via VHOST_USER_SUSPEND and its rings will be stopped by the source. No further update must be done before the device and its rings are resumed. If and only if the back-end does not support VHOST_USER_SUSPEND, the front-end may reset it instead (via VHOST_USER_SET_STATUS, VHOST_USER_RESET_DEVICE, or VHOST_USER_RESET_OWNER). ``` I’m unsure about the “If and only if” – older qemu versions will break this, but I feel like we must promise back-end writers that if they implement SUSPEND, their back-end is not going to be reset; if it is, and something breaks because of it, it’s the front-end that must be updated to match the specification. > - "before the device and its rings are resumed" via VHOST_USER_RESUME? > And is this referring to the source device? Yes, via VHOST_USER_RESUME, and restarting the rings by starting them (i.e. a kick). Whether this is referring to the source device… Well, the text as it was before begs the exact same question, so honestly, I don’t know for sure. “Restarting” only makes sense if the rings were stopped before, so I assume it’s referring to the source, e.g. for the case of a failed migration. RESUME at least definitely will only happen after a prior SUSPEND, so this one will definitely only apply on the source side. > Please rephrase the paragraph to identify the vhost-user messages > involved. > >> >> In postcopy migration the back-end is started before all the memory has >> been received from the source host, and care must be taken to avoid >> @@ -885,6 +890,7 @@ Protocol features >> #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15 >> #define VHOST_USER_PROTOCOL_F_STATUS 16 >> #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17 >> + #define VHOST_USER_PROTOCOL_F_SUSPEND 18 >> >> Front-end message types >> ----------------------- >> @@ -1440,6 +1446,31 @@ Front-end message types >> query the back-end for its device status as defined in the Virtio >> specification. >> >> +``VHOST_USER_SUSPEND`` >> + :id: 41 >> + :equivalent ioctl: VHOST_VDPA_SUSPEND >> + :request payload: N/A >> + :reply payload: N/A >> + >> + When the ``VHOST_USER_PROTOCOL_F_SUSPEND`` protocol feature has been >> + successfully negotiated, this message is submitted by the front-end to >> + have the back-end cease all operations except for handling vhost-user >> + requests. The back-end must stop processing all virt queues, and it >> + must not perform any background operations. It may not resume until a > "background operations" are not defined. What does it mean: > - Anything that writes to memory slots > - Anything that changes the visible state of the device > - Anything that changes the non-visible internal state of the device > - etc > ? My best answer (honestly) is: You tell me. This series is introducing SUSPEND/RESUME because qemu wants to reset devices to make them stop “background operations”, and this would break virtiofsd if any form of reset were actually implemented. The implementation of SUSPEND/RESUME in virtiofsd on the other hand is supposed to basically be a no-op (besides delaying ring processing until a RESUME, but even if we processed them before, i.e. really make SUSPEND/RESUME no-ops, it would most likely work out fine), so I have no idea what kind of background operations we are even talking about, or whether any such actually exist in practice. I don’t know what anyone had in mind when introducing the RESET. It comes from vDPA (c3716f260bf moved it from vdpa into the common code), and exists since the code was originally added (108a64818e6), so there’s no comment explaining why it exists. I can’t explain what the back-end is supposed to stop doing, because so far it isn’t explained anywhere in qemu, its git log, or in any documentation (there basically is no vdpa documentation). I can only say what I just completely naïvely assumed it to mean so far: That the back-end basically should stop doing anything besides handling and replying to vhost-user messages. If such a message requires changing any state, visible or not, then this state change is permissible. >> + subsequent ``VHOST_USER_RESUME`` call. >> + >> +``VHOST_USER_RESUME`` >> + :id: 42 >> + :equivalent ioctl: VHOST_VDPA_RESUME >> + :request payload: N/A >> + :reply payload: N/A >> + >> + When the ``VHOST_USER_PROTOCOL_F_SUSPEND`` protocol feature has been >> + successfully negotiated, this message is submitted by the front-end to >> + allow the back-end to resume operations after having been suspended >> + before. The back-end must again begin processing rings that are not > This can be made more concrete by referencing the vhost-user message > used to suspend the device: > > "suspended before" -> "suspended with VHOST_USER_SUSPEND" Alright. Hanna >> + stopped, and it may resume background operations. >> + >> >> Back-end message types >> ---------------------- >> -- >> 2.41.0 >> ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/6] vhost-user.rst: Add suspend/resume 2023-07-19 13:59 ` Hanna Czenczek @ 2023-07-24 17:55 ` Stefan Hajnoczi 2023-07-25 8:30 ` Hanna Czenczek 0 siblings, 1 reply; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-24 17:55 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione [-- Attachment #1: Type: text/plain, Size: 9531 bytes --] On Wed, Jul 19, 2023 at 03:59:32PM +0200, Hanna Czenczek wrote: > On 18.07.23 16:25, Stefan Hajnoczi wrote: > > On Tue, Jul 11, 2023 at 05:52:23PM +0200, Hanna Czenczek wrote: > > > When stopping the VM, qemu wants all devices to fully cease any > > > operation, too. Currently, we can only have vhost-user back-ends stop > > > processing vrings, but not background operations. Add the SUSPEND and > > > RESUME commands from vDPA, which allow the front-end (qemu) to tell > > > back-ends to cease all operations, including those running in the > > > background. > > > > > > qemu's current work-around for this is to reset the back-end instead of > > > suspending it, which will not work for back-ends that have internal > > > state that must be preserved across e.g. stop/cont. > > > > > > Note that the given specification requires the back-end to delay > > > processing kicks (i.e. starting vrings) until the device is resumed, > > > instead of requiring the front-end not to send kicks while suspended. > > > qemu starts devices (and would just resume them) only when the VM is in > > > a running state, so it would be difficult to have qemu delay kicks until > > > the device is resumed, which is why this patch specifies handling of > > > kicks as it does. > > > > > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > > > --- > > > docs/interop/vhost-user.rst | 35 +++++++++++++++++++++++++++++++++-- > > > 1 file changed, 33 insertions(+), 2 deletions(-) > > > > > > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst > > > index 5a070adbc1..ac6be34c4c 100644 > > > --- a/docs/interop/vhost-user.rst > > > +++ b/docs/interop/vhost-user.rst > > > @@ -381,6 +381,10 @@ readable) on the descriptor specified by ``VHOST_USER_SET_VRING_KICK`` > > > or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated, > > > and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``. > > > +While the back-end is suspended (via ``VHOST_USER_SUSPEND``), it must > > > +never process rings, and thus also delay handling kicks until the > > If you respin this series, I suggest replacing "never" with "not" to > > emphasize that ring processing is only skipped while the device is > > suspended (rather than forever). "Never" feels too strong to use when > > describing a temporary state. > > Sure. > > > > +back-end is resumed again. > > > + > > > Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``. > > > If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the > > > @@ -479,8 +483,9 @@ supplied by ``VhostUserLog``. > > > ancillary data, it may be used to inform the front-end that the log has > > > been modified. > > > -Once the source has finished migration, rings will be stopped by the > > > -source. No further update must be done before rings are restarted. > > > +Once the source has finished migration, the device will be suspended and > > > +its rings will be stopped by the source. No further update must be done > > > +before the device and its rings are resumed. > > This paragraph is abstract and doesn't directly identify the mechanisms > > or who does what: > > - "the device will be suspended" via VHOST_USER_SUSPEND (or reset when > > VHOST_USER_SUSPEND is not supported?) or automatically by the device > > itself or some other mechanism? > > OK, I’ll add VHOST_USER_SUSPEND. > > So far I hadn’t considered making a note of resetting as a fallback right in > the specification. I don’t think I would want it in the specification, but > on the other hand, it is probably important for back-end authors to know > that if they do not implement SUSPEND, their device is going to be reset by > qemu. > > Can we make that a ”may”, i.e.: > > ``` > Once the source has finished migration, the device will be suspended via > VHOST_USER_SUSPEND and its rings will be stopped by the source. I'm not sure what "its rings will be stopped by the source" means exactly. Is it summarizing the effect of VHOST_USER_SUSPEND or is there an additional action after VHOST_USER_SUSPEND that stops rings? And I'm not sure whether "by the source" means by the front-end or back-end on the source machine? > No further > update must be done before the device and its rings are resumed. "Update" to what? Guest RAM? Device state? Rings? I feel like this text is too vague for a spec. People may interpret it differently. Can you make rephrase this to more concrete? > If and only > if the back-end does not support VHOST_USER_SUSPEND, the front-end may reset > it instead (via VHOST_USER_SET_STATUS, VHOST_USER_RESET_DEVICE, or > VHOST_USER_RESET_OWNER). > ``` > > I’m unsure about the “If and only if” – older qemu versions will break this, > but I feel like we must promise back-end writers that if they implement > SUSPEND, their back-end is not going to be reset; if it is, and something > breaks because of it, it’s the front-end that must be updated to match the > specification. I this the trick is to say "if and only if VHOST_USER_F_SUSPEND has not been negotiated". That way really only new front-ends that support VHOST_USER_SUSPEND are required to use suspend instead of reset and older versions of QEMU will not violate this statement. > > > - "before the device and its rings are resumed" via VHOST_USER_RESUME? > > And is this referring to the source device? > > Yes, via VHOST_USER_RESUME, and restarting the rings by starting them (i.e. > a kick). > > Whether this is referring to the source device… Well, the text as it was > before begs the exact same question, so honestly, I don’t know for sure. > “Restarting” only makes sense if the rings were stopped before, so I assume > it’s referring to the source, e.g. for the case of a failed migration. > RESUME at least definitely will only happen after a prior SUSPEND, so this > one will definitely only apply on the source side. > > > Please rephrase the paragraph to identify the vhost-user messages > > involved. > > > > > In postcopy migration the back-end is started before all the memory has > > > been received from the source host, and care must be taken to avoid > > > @@ -885,6 +890,7 @@ Protocol features > > > #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15 > > > #define VHOST_USER_PROTOCOL_F_STATUS 16 > > > #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17 > > > + #define VHOST_USER_PROTOCOL_F_SUSPEND 18 > > > Front-end message types > > > ----------------------- > > > @@ -1440,6 +1446,31 @@ Front-end message types > > > query the back-end for its device status as defined in the Virtio > > > specification. > > > +``VHOST_USER_SUSPEND`` > > > + :id: 41 > > > + :equivalent ioctl: VHOST_VDPA_SUSPEND > > > + :request payload: N/A > > > + :reply payload: N/A > > > + > > > + When the ``VHOST_USER_PROTOCOL_F_SUSPEND`` protocol feature has been > > > + successfully negotiated, this message is submitted by the front-end to > > > + have the back-end cease all operations except for handling vhost-user > > > + requests. The back-end must stop processing all virt queues, and it > > > + must not perform any background operations. It may not resume until a > > "background operations" are not defined. What does it mean: > > - Anything that writes to memory slots > > - Anything that changes the visible state of the device > > - Anything that changes the non-visible internal state of the device > > - etc > > ? > > My best answer (honestly) is: You tell me. This series is introducing > SUSPEND/RESUME because qemu wants to reset devices to make them stop > “background operations”, and this would break virtiofsd if any form of reset > were actually implemented. The implementation of SUSPEND/RESUME in > virtiofsd on the other hand is supposed to basically be a no-op (besides > delaying ring processing until a RESUME, but even if we processed them > before, i.e. really make SUSPEND/RESUME no-ops, it would most likely work > out fine), so I have no idea what kind of background operations we are even > talking about, or whether any such actually exist in practice. > > I don’t know what anyone had in mind when introducing the RESET. It comes > from vDPA (c3716f260bf moved it from vdpa into the common code), and exists > since the code was originally added (108a64818e6), so there’s no comment > explaining why it exists. I can’t explain what the back-end is supposed to > stop doing, because so far it isn’t explained anywhere in qemu, its git log, > or in any documentation (there basically is no vdpa documentation). > > I can only say what I just completely naïvely assumed it to mean so far: > That the back-end basically should stop doing anything besides handling and > replying to vhost-user messages. If such a message requires changing any > state, visible or not, then this state change is permissible. Okay, I suggest the following instead of "background operations": - Changes to the device state produced by SET_DEVICE_STATE_FD. - Writing to memory regions. - Signalling that buffers have been used. - Signalling that the configuration space has changed. The goal is to ensure the device state and memory regions are stable and that back-end doesn't trigger activity in the front-end. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/6] vhost-user.rst: Add suspend/resume 2023-07-24 17:55 ` Stefan Hajnoczi @ 2023-07-25 8:30 ` Hanna Czenczek 2023-07-27 21:12 ` Stefan Hajnoczi 0 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-25 8:30 UTC (permalink / raw) To: Stefan Hajnoczi Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione On 24.07.23 19:55, Stefan Hajnoczi wrote: > On Wed, Jul 19, 2023 at 03:59:32PM +0200, Hanna Czenczek wrote: >> On 18.07.23 16:25, Stefan Hajnoczi wrote: >>> On Tue, Jul 11, 2023 at 05:52:23PM +0200, Hanna Czenczek wrote: >>>> When stopping the VM, qemu wants all devices to fully cease any >>>> operation, too. Currently, we can only have vhost-user back-ends stop >>>> processing vrings, but not background operations. Add the SUSPEND and >>>> RESUME commands from vDPA, which allow the front-end (qemu) to tell >>>> back-ends to cease all operations, including those running in the >>>> background. >>>> >>>> qemu's current work-around for this is to reset the back-end instead of >>>> suspending it, which will not work for back-ends that have internal >>>> state that must be preserved across e.g. stop/cont. >>>> >>>> Note that the given specification requires the back-end to delay >>>> processing kicks (i.e. starting vrings) until the device is resumed, >>>> instead of requiring the front-end not to send kicks while suspended. >>>> qemu starts devices (and would just resume them) only when the VM is in >>>> a running state, so it would be difficult to have qemu delay kicks until >>>> the device is resumed, which is why this patch specifies handling of >>>> kicks as it does. >>>> >>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>> --- >>>> docs/interop/vhost-user.rst | 35 +++++++++++++++++++++++++++++++++-- >>>> 1 file changed, 33 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst >>>> index 5a070adbc1..ac6be34c4c 100644 >>>> --- a/docs/interop/vhost-user.rst >>>> +++ b/docs/interop/vhost-user.rst >>>> @@ -381,6 +381,10 @@ readable) on the descriptor specified by ``VHOST_USER_SET_VRING_KICK`` >>>> or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated, >>>> and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``. >>>> +While the back-end is suspended (via ``VHOST_USER_SUSPEND``), it must >>>> +never process rings, and thus also delay handling kicks until the >>> If you respin this series, I suggest replacing "never" with "not" to >>> emphasize that ring processing is only skipped while the device is >>> suspended (rather than forever). "Never" feels too strong to use when >>> describing a temporary state. >> Sure. >> >>>> +back-end is resumed again. >>>> + >>>> Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``. >>>> If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the >>>> @@ -479,8 +483,9 @@ supplied by ``VhostUserLog``. >>>> ancillary data, it may be used to inform the front-end that the log has >>>> been modified. >>>> -Once the source has finished migration, rings will be stopped by the >>>> -source. No further update must be done before rings are restarted. >>>> +Once the source has finished migration, the device will be suspended and >>>> +its rings will be stopped by the source. No further update must be done >>>> +before the device and its rings are resumed. >>> This paragraph is abstract and doesn't directly identify the mechanisms >>> or who does what: >>> - "the device will be suspended" via VHOST_USER_SUSPEND (or reset when >>> VHOST_USER_SUSPEND is not supported?) or automatically by the device >>> itself or some other mechanism? >> OK, I’ll add VHOST_USER_SUSPEND. >> >> So far I hadn’t considered making a note of resetting as a fallback right in >> the specification. I don’t think I would want it in the specification, but >> on the other hand, it is probably important for back-end authors to know >> that if they do not implement SUSPEND, their device is going to be reset by >> qemu. >> >> Can we make that a ”may”, i.e.: >> >> ``` >> Once the source has finished migration, the device will be suspended via >> VHOST_USER_SUSPEND and its rings will be stopped by the source. > I'm not sure what "its rings will be stopped by the source" means > exactly. Is it summarizing the effect of VHOST_USER_SUSPEND or is there > an additional action after VHOST_USER_SUSPEND that stops rings? And I'm > not sure whether "by the source" means by the front-end or back-end on > the source machine? This is pre-existing text and I assumed it (with not doubt) to refer to GET_VRING_BASE, because that is how rings are stopped. I can improve the existing documentation and add the reference to GET_VRING_BASE, and say that it clearly must come from the front-end. >> No further >> update must be done before the device and its rings are resumed. > "Update" to what? Guest RAM? Device state? Rings? > > I feel like this text is too vague for a spec. People may interpret it > differently. Can you make rephrase this to more concrete? Honestly, no. This is pre-existing, and I have the same questions as you do. I cannot “rephrase” this to make it more concrete. I can try to actually specify that was was left unspecified, but that would be a change in specification that would require its own patch, separate from this series. Personally, I’ve generally taken this sentence to be fluff. If the rings are stopped, clearly, they should not be accessed at all. Probably the back-end should also refrain from writing to guest memory, because that is a logical conclusion from having the rings stopped. But now it’s even clearer: The back-end ideally is suspended, which directly means not to modify guest memory, and not to perform “background operations”. Updating device state of course is possible through vhost-user commands, because those must always be executed. So basically it’s just “Device and rings are stopped (RESUME and GET_VRING_BASE, resp.), so you know, adhere to that.” Or maybe I’m completely wrong and “Once the source has finished migration, rings will be stopped by the source. No further update must be done before rings are restarted.” is to be taken together and the second sentence just refers to the rings, i.e. the front-end stops the rings, and the back-end must not update them. Or it means that the front-end must not send any commands to the back-end until restarting the rings, but that feels practically impossible. Again, because this sentence currently doesn’t specify anything, really, changing it to carry any meaning would be to add to the specification, not just clarify it. >> If and only >> if the back-end does not support VHOST_USER_SUSPEND, the front-end may reset >> it instead (via VHOST_USER_SET_STATUS, VHOST_USER_RESET_DEVICE, or >> VHOST_USER_RESET_OWNER). >> ``` >> >> I’m unsure about the “If and only if” – older qemu versions will break this, >> but I feel like we must promise back-end writers that if they implement >> SUSPEND, their back-end is not going to be reset; if it is, and something >> breaks because of it, it’s the front-end that must be updated to match the >> specification. > I this the trick is to say "if and only if VHOST_USER_F_SUSPEND has not > been negotiated". That way really only new front-ends that support > VHOST_USER_SUSPEND are required to use suspend instead of reset and > older versions of QEMU will not violate this statement. Ah, right, thanks! >>> - "before the device and its rings are resumed" via VHOST_USER_RESUME? >>> And is this referring to the source device? >> Yes, via VHOST_USER_RESUME, and restarting the rings by starting them (i.e. >> a kick). >> >> Whether this is referring to the source device… Well, the text as it was >> before begs the exact same question, so honestly, I don’t know for sure. >> “Restarting” only makes sense if the rings were stopped before, so I assume >> it’s referring to the source, e.g. for the case of a failed migration. >> RESUME at least definitely will only happen after a prior SUSPEND, so this >> one will definitely only apply on the source side. >> >>> Please rephrase the paragraph to identify the vhost-user messages >>> involved. >>> >>>> In postcopy migration the back-end is started before all the memory has >>>> been received from the source host, and care must be taken to avoid >>>> @@ -885,6 +890,7 @@ Protocol features >>>> #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15 >>>> #define VHOST_USER_PROTOCOL_F_STATUS 16 >>>> #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17 >>>> + #define VHOST_USER_PROTOCOL_F_SUSPEND 18 >>>> Front-end message types >>>> ----------------------- >>>> @@ -1440,6 +1446,31 @@ Front-end message types >>>> query the back-end for its device status as defined in the Virtio >>>> specification. >>>> +``VHOST_USER_SUSPEND`` >>>> + :id: 41 >>>> + :equivalent ioctl: VHOST_VDPA_SUSPEND >>>> + :request payload: N/A >>>> + :reply payload: N/A >>>> + >>>> + When the ``VHOST_USER_PROTOCOL_F_SUSPEND`` protocol feature has been >>>> + successfully negotiated, this message is submitted by the front-end to >>>> + have the back-end cease all operations except for handling vhost-user >>>> + requests. The back-end must stop processing all virt queues, and it >>>> + must not perform any background operations. It may not resume until a >>> "background operations" are not defined. What does it mean: >>> - Anything that writes to memory slots >>> - Anything that changes the visible state of the device >>> - Anything that changes the non-visible internal state of the device >>> - etc >>> ? >> My best answer (honestly) is: You tell me. This series is introducing >> SUSPEND/RESUME because qemu wants to reset devices to make them stop >> “background operations”, and this would break virtiofsd if any form of reset >> were actually implemented. The implementation of SUSPEND/RESUME in >> virtiofsd on the other hand is supposed to basically be a no-op (besides >> delaying ring processing until a RESUME, but even if we processed them >> before, i.e. really make SUSPEND/RESUME no-ops, it would most likely work >> out fine), so I have no idea what kind of background operations we are even >> talking about, or whether any such actually exist in practice. >> >> I don’t know what anyone had in mind when introducing the RESET. It comes >> from vDPA (c3716f260bf moved it from vdpa into the common code), and exists >> since the code was originally added (108a64818e6), so there’s no comment >> explaining why it exists. I can’t explain what the back-end is supposed to >> stop doing, because so far it isn’t explained anywhere in qemu, its git log, >> or in any documentation (there basically is no vdpa documentation). >> >> I can only say what I just completely naïvely assumed it to mean so far: >> That the back-end basically should stop doing anything besides handling and >> replying to vhost-user messages. If such a message requires changing any >> state, visible or not, then this state change is permissible. > Okay, I suggest the following instead of "background operations": > > - Changes to the device state produced by SET_DEVICE_STATE_FD. This is definitely something that I would absolutely allow after SUSPEND. If the front-end does not wish the back-end to do this, it can just not send this command while the back-end is suspended. > - Writing to memory regions. > - Signalling that buffers have been used. This feels both too tight and not concrete enough. The only buffers I can think of are buffers supplied through virt queues, which I intended to already be included in “stop processing all virt queues”. (I took this to mean the used-buffer ring, too, but I can of course be more explicit about this, e.g. “stop processing all virt queues, including returning buffers to the driver”.) “Signalling” sounds like triggering the callfd, but that also seems clear; if you can’t process virt queues, including returning buffers, you can never trigger the callfd (or send VHOST_USER_BACKEND_VRING_CALL), because there can never be a new buffer returned to the driver. So too tight because it feels like a subset of virt queue processing, but not concrete enough because “buffers” makes me feel like I’m overlooking something besides virt queues. > - Signalling that the configuration space has changed. Maybe this could be more general, i.e. the back-end must not send any vhost-user messages to the front-end? > The goal is to ensure the device state and memory regions are stable and > that back-end doesn't trigger activity in the front-end. If the goal is to ensure that the device state is stable, I feel like we must then specify precisely this, and not just to say it must not be mutable through SET_DEVICE_STATE_FD: In general, the state is more likely to be changed by other factors after all. On the other hand, I also don’t see why that’s a goal. For migration, it might seem desirable, but I don’t actually think so: The back-end is required to send a consistent device state anyway, so it is up to the back-end to ensure the state is consistent, we don’t need to make it a requirement for SUSPEND. It is implicitly clear from the migration model that if the device state were to change after the back-end has sent it to the front-end, those change will be lost on the destination side, so it is clear that the back-end must anticipate this and work around it. Other than migration, qemu doesn’t see the device state at all. I don’t see why internal state should not change between stop/cont. If a device experience some signal that (for some reason) it can’t pause to receive only after the subsequent RESUME, it might need to make note of that signal to act on it after RESUME. I would consider that a change in internal state, and I don’t immediately see a problem with it. (It may be problematic when migrating, because receiving such signals on the source side after transferring the state would mean they’re lost, but again, this is something the device clearly has to solve, e.g. by redirecting the signals to the destination starting with the SET_TRANSFER_STATE_FD call.) Hanna ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 1/6] vhost-user.rst: Add suspend/resume 2023-07-25 8:30 ` Hanna Czenczek @ 2023-07-27 21:12 ` Stefan Hajnoczi 0 siblings, 0 replies; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-27 21:12 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione [-- Attachment #1: Type: text/plain, Size: 17061 bytes --] On Tue, Jul 25, 2023 at 10:30:49AM +0200, Hanna Czenczek wrote: > On 24.07.23 19:55, Stefan Hajnoczi wrote: > > On Wed, Jul 19, 2023 at 03:59:32PM +0200, Hanna Czenczek wrote: > > > On 18.07.23 16:25, Stefan Hajnoczi wrote: > > > > On Tue, Jul 11, 2023 at 05:52:23PM +0200, Hanna Czenczek wrote: > > > > > When stopping the VM, qemu wants all devices to fully cease any > > > > > operation, too. Currently, we can only have vhost-user back-ends stop > > > > > processing vrings, but not background operations. Add the SUSPEND and > > > > > RESUME commands from vDPA, which allow the front-end (qemu) to tell > > > > > back-ends to cease all operations, including those running in the > > > > > background. > > > > > > > > > > qemu's current work-around for this is to reset the back-end instead of > > > > > suspending it, which will not work for back-ends that have internal > > > > > state that must be preserved across e.g. stop/cont. > > > > > > > > > > Note that the given specification requires the back-end to delay > > > > > processing kicks (i.e. starting vrings) until the device is resumed, > > > > > instead of requiring the front-end not to send kicks while suspended. > > > > > qemu starts devices (and would just resume them) only when the VM is in > > > > > a running state, so it would be difficult to have qemu delay kicks until > > > > > the device is resumed, which is why this patch specifies handling of > > > > > kicks as it does. > > > > > > > > > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > > > > > --- > > > > > docs/interop/vhost-user.rst | 35 +++++++++++++++++++++++++++++++++-- > > > > > 1 file changed, 33 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst > > > > > index 5a070adbc1..ac6be34c4c 100644 > > > > > --- a/docs/interop/vhost-user.rst > > > > > +++ b/docs/interop/vhost-user.rst > > > > > @@ -381,6 +381,10 @@ readable) on the descriptor specified by ``VHOST_USER_SET_VRING_KICK`` > > > > > or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated, > > > > > and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``. > > > > > +While the back-end is suspended (via ``VHOST_USER_SUSPEND``), it must > > > > > +never process rings, and thus also delay handling kicks until the > > > > If you respin this series, I suggest replacing "never" with "not" to > > > > emphasize that ring processing is only skipped while the device is > > > > suspended (rather than forever). "Never" feels too strong to use when > > > > describing a temporary state. > > > Sure. > > > > > > > > +back-end is resumed again. > > > > > + > > > > > Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``. > > > > > If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the > > > > > @@ -479,8 +483,9 @@ supplied by ``VhostUserLog``. > > > > > ancillary data, it may be used to inform the front-end that the log has > > > > > been modified. > > > > > -Once the source has finished migration, rings will be stopped by the > > > > > -source. No further update must be done before rings are restarted. > > > > > +Once the source has finished migration, the device will be suspended and > > > > > +its rings will be stopped by the source. No further update must be done > > > > > +before the device and its rings are resumed. > > > > This paragraph is abstract and doesn't directly identify the mechanisms > > > > or who does what: > > > > - "the device will be suspended" via VHOST_USER_SUSPEND (or reset when > > > > VHOST_USER_SUSPEND is not supported?) or automatically by the device > > > > itself or some other mechanism? > > > OK, I’ll add VHOST_USER_SUSPEND. > > > > > > So far I hadn’t considered making a note of resetting as a fallback right in > > > the specification. I don’t think I would want it in the specification, but > > > on the other hand, it is probably important for back-end authors to know > > > that if they do not implement SUSPEND, their device is going to be reset by > > > qemu. > > > > > > Can we make that a ”may”, i.e.: > > > > > > ``` > > > Once the source has finished migration, the device will be suspended via > > > VHOST_USER_SUSPEND and its rings will be stopped by the source. > > I'm not sure what "its rings will be stopped by the source" means > > exactly. Is it summarizing the effect of VHOST_USER_SUSPEND or is there > > an additional action after VHOST_USER_SUSPEND that stops rings? And I'm > > not sure whether "by the source" means by the front-end or back-end on > > the source machine? > > This is pre-existing text and I assumed it (with not doubt) to refer to > GET_VRING_BASE, because that is how rings are stopped. > > I can improve the existing documentation and add the reference to > GET_VRING_BASE, and say that it clearly must come from the front-end. Yes, please. > > > > No further > > > update must be done before the device and its rings are resumed. > > "Update" to what? Guest RAM? Device state? Rings? > > > > I feel like this text is too vague for a spec. People may interpret it > > differently. Can you make rephrase this to more concrete? > > Honestly, no. This is pre-existing, and I have the same questions as you > do. > > I cannot “rephrase” this to make it more concrete. I can try to actually > specify that was was left unspecified, but that would be a change in > specification that would require its own patch, separate from this series. I want to make sure that VHOST_USER_SUSPEND is well-defined. If it's pre-existing live migration text that doesn't involve VHOST_USER_SUSPEND, then it would be unfair to ask you to rewrite it. > > Personally, I’ve generally taken this sentence to be fluff. If the rings > are stopped, clearly, they should not be accessed at all. Probably the > back-end should also refrain from writing to guest memory, because that is a > logical conclusion from having the rings stopped. But now it’s even > clearer: The back-end ideally is suspended, which directly means not to > modify guest memory, and not to perform “background operations”. Updating > device state of course is possible through vhost-user commands, because > those must always be executed. > > So basically it’s just “Device and rings are stopped (RESUME and > GET_VRING_BASE, resp.), so you know, adhere to that.” > > Or maybe I’m completely wrong and “Once the source has finished migration, > rings will be stopped by the source. No further update must be done before > rings are restarted.” is to be taken together and the second sentence just > refers to the rings, i.e. the front-end stops the rings, and the back-end > must not update them. Or it means that the front-end must not send any > commands to the back-end until restarting the rings, but that feels > practically impossible. > > Again, because this sentence currently doesn’t specify anything, really, > changing it to carry any meaning would be to add to the specification, not > just clarify it. > > > > If and only > > > if the back-end does not support VHOST_USER_SUSPEND, the front-end may reset > > > it instead (via VHOST_USER_SET_STATUS, VHOST_USER_RESET_DEVICE, or > > > VHOST_USER_RESET_OWNER). > > > ``` > > > > > > I’m unsure about the “If and only if” – older qemu versions will break this, > > > but I feel like we must promise back-end writers that if they implement > > > SUSPEND, their back-end is not going to be reset; if it is, and something > > > breaks because of it, it’s the front-end that must be updated to match the > > > specification. > > I this the trick is to say "if and only if VHOST_USER_F_SUSPEND has not > > been negotiated". That way really only new front-ends that support > > VHOST_USER_SUSPEND are required to use suspend instead of reset and > > older versions of QEMU will not violate this statement. > > Ah, right, thanks! > > > > > - "before the device and its rings are resumed" via VHOST_USER_RESUME? > > > > And is this referring to the source device? > > > Yes, via VHOST_USER_RESUME, and restarting the rings by starting them (i.e. > > > a kick). > > > > > > Whether this is referring to the source device… Well, the text as it was > > > before begs the exact same question, so honestly, I don’t know for sure. > > > “Restarting” only makes sense if the rings were stopped before, so I assume > > > it’s referring to the source, e.g. for the case of a failed migration. > > > RESUME at least definitely will only happen after a prior SUSPEND, so this > > > one will definitely only apply on the source side. > > > > > > > Please rephrase the paragraph to identify the vhost-user messages > > > > involved. > > > > > > > > > In postcopy migration the back-end is started before all the memory has > > > > > been received from the source host, and care must be taken to avoid > > > > > @@ -885,6 +890,7 @@ Protocol features > > > > > #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15 > > > > > #define VHOST_USER_PROTOCOL_F_STATUS 16 > > > > > #define VHOST_USER_PROTOCOL_F_XEN_MMAP 17 > > > > > + #define VHOST_USER_PROTOCOL_F_SUSPEND 18 > > > > > Front-end message types > > > > > ----------------------- > > > > > @@ -1440,6 +1446,31 @@ Front-end message types > > > > > query the back-end for its device status as defined in the Virtio > > > > > specification. > > > > > +``VHOST_USER_SUSPEND`` > > > > > + :id: 41 > > > > > + :equivalent ioctl: VHOST_VDPA_SUSPEND > > > > > + :request payload: N/A > > > > > + :reply payload: N/A > > > > > + > > > > > + When the ``VHOST_USER_PROTOCOL_F_SUSPEND`` protocol feature has been > > > > > + successfully negotiated, this message is submitted by the front-end to > > > > > + have the back-end cease all operations except for handling vhost-user > > > > > + requests. The back-end must stop processing all virt queues, and it > > > > > + must not perform any background operations. It may not resume until a > > > > "background operations" are not defined. What does it mean: > > > > - Anything that writes to memory slots > > > > - Anything that changes the visible state of the device > > > > - Anything that changes the non-visible internal state of the device > > > > - etc > > > > ? > > > My best answer (honestly) is: You tell me. This series is introducing > > > SUSPEND/RESUME because qemu wants to reset devices to make them stop > > > “background operations”, and this would break virtiofsd if any form of reset > > > were actually implemented. The implementation of SUSPEND/RESUME in > > > virtiofsd on the other hand is supposed to basically be a no-op (besides > > > delaying ring processing until a RESUME, but even if we processed them > > > before, i.e. really make SUSPEND/RESUME no-ops, it would most likely work > > > out fine), so I have no idea what kind of background operations we are even > > > talking about, or whether any such actually exist in practice. > > > > > > I don’t know what anyone had in mind when introducing the RESET. It comes > > > from vDPA (c3716f260bf moved it from vdpa into the common code), and exists > > > since the code was originally added (108a64818e6), so there’s no comment > > > explaining why it exists. I can’t explain what the back-end is supposed to > > > stop doing, because so far it isn’t explained anywhere in qemu, its git log, > > > or in any documentation (there basically is no vdpa documentation). > > > > > > I can only say what I just completely naïvely assumed it to mean so far: > > > That the back-end basically should stop doing anything besides handling and > > > replying to vhost-user messages. If such a message requires changing any > > > state, visible or not, then this state change is permissible. > > Okay, I suggest the following instead of "background operations": > > > > - Changes to the device state produced by SET_DEVICE_STATE_FD. > > This is definitely something that I would absolutely allow after SUSPEND. > If the front-end does not wish the back-end to do this, it can just not send > this command while the back-end is suspended. > > > > - Writing to memory regions. > > - Signalling that buffers have been used. > > This feels both too tight and not concrete enough. The only buffers I can > think of are buffers supplied through virt queues, which I intended to > already be included in “stop processing all virt queues”. (I took this to > mean the used-buffer ring, too, but I can of course be more explicit about > this, e.g. “stop processing all virt queues, including returning buffers to > the driver”.) “Signalling” sounds like triggering the callfd, but that also > seems clear; if you can’t process virt queues, including returning buffers, > you can never trigger the callfd (or send VHOST_USER_BACKEND_VRING_CALL), > because there can never be a new buffer returned to the driver. Yes, I was trying to use the language of the vhost-user spec related to the callfd. If the back-end has a timer for interrupt mitigation then it technically isn't doing any virtqueue processing after SUSPEND but might still signal the callfd when the timer expires. There is overlap with virtqueue processing, depending on how you define it. Maybe mentioning this separately is overkill. > > So too tight because it feels like a subset of virt queue processing, but > not concrete enough because “buffers” makes me feel like I’m overlooking > something besides virt queues. > > > - Signalling that the configuration space has changed. > > Maybe this could be more general, i.e. the back-end must not send any > vhost-user messages to the front-end? Yes, I was thinking solely of VHOST_USER_BACKEND_CONFIG_CHANGE_MSG but it could include all back-end message types. [I am reordering your reply, because this might be important before we continue below...] > Other than migration, qemu doesn’t see the device state at all. I don’t see > why internal state should not change between stop/cont. If a device > experience some signal that (for some reason) it can’t pause to receive only > after the subsequent RESUME, it might need to make note of that signal to > act on it after RESUME. I would consider that a change in internal state, > and I don’t immediately see a problem with it. (It may be problematic when > migrating, because receiving such signals on the source side after > transferring the state would mean they’re lost, but again, this is something > the device clearly has to solve, e.g. by redirecting the signals to the > destination starting with the SET_TRANSFER_STATE_FD call.) I only consider device state to be the data that is transferred over the device state fd. The other state in the back-end, like the pending signal example you mentioned, is not part of what I call the device state. > > The goal is to ensure the device state and memory regions are stable and > > that back-end doesn't trigger activity in the front-end. > > If the goal is to ensure that the device state is stable, I feel like we > must then specify precisely this, and not just to say it must not be mutable > through SET_DEVICE_STATE_FD: In general, the state is more likely to be > changed by other factors after all. > > On the other hand, I also don’t see why that’s a goal. For migration, it > might seem desirable, but I don’t actually think so: The back-end is > required to send a consistent device state anyway, so it is up to the > back-end to ensure the state is consistent, we don’t need to make it a > requirement for SUSPEND. It is implicitly clear from the migration model > that if the device state were to change after the back-end has sent it to > the front-end, those change will be lost on the destination side, so it is > clear that the back-end must anticipate this and work around it. I don't follow your last sentence. I agree that device state changes after the device state fd has been read would be lost on the destination. But that does not mean that it is valid to modify the device state while reading the device state fd (there is no iterative migration support yet). Are you assuming that SET_DEVICE_STATE_FD takes a snapshot of the device state and reading the fd reads the (consistent) snapshot? I worry that it may not be possible for all implementations to take a snapshot of the device state and then continue modifying the actual device state. This could be due to resource limitations or due to underlying API limitations. It would be easier if the spec forbid modifying the device state while the device was suspended. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 2/6] vhost-vdpa: Move vhost_vdpa_reset_status() up 2023-07-11 15:52 [PATCH 0/6] vhost-user: Add suspend/resume Hanna Czenczek 2023-07-11 15:52 ` [PATCH 1/6] vhost-user.rst: " Hanna Czenczek @ 2023-07-11 15:52 ` Hanna Czenczek 2023-07-18 14:29 ` Stefan Hajnoczi 2023-07-11 15:52 ` [PATCH 3/6] vhost: Do not reset suspended devices on stop Hanna Czenczek ` (4 subsequent siblings) 6 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-11 15:52 UTC (permalink / raw) To: qemu-devel Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi, Eugenio Pérez, German Maglione The next commit is going to have vhost_vdpa_dev_start() call this, so move it up to have the declaration where we are going to need it. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> --- hw/virtio/vhost-vdpa.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 42f2a4bae9..7b7dee468e 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -1286,6 +1286,20 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) vhost_vdpa_reset_device(dev); } +static void vhost_vdpa_reset_status(struct vhost_dev *dev) +{ + struct vhost_vdpa *v = dev->opaque; + + if (dev->vq_index + dev->nvqs != dev->vq_index_end) { + return; + } + + vhost_vdpa_reset_device(dev); + vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | + VIRTIO_CONFIG_S_DRIVER); + memory_listener_unregister(&v->listener); +} + static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) { struct vhost_vdpa *v = dev->opaque; @@ -1323,20 +1337,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) return 0; } -static void vhost_vdpa_reset_status(struct vhost_dev *dev) -{ - struct vhost_vdpa *v = dev->opaque; - - if (dev->vq_index + dev->nvqs != dev->vq_index_end) { - return; - } - - vhost_vdpa_reset_device(dev); - vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | - VIRTIO_CONFIG_S_DRIVER); - memory_listener_unregister(&v->listener); -} - static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base, struct vhost_log *log) { -- 2.41.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 2/6] vhost-vdpa: Move vhost_vdpa_reset_status() up 2023-07-11 15:52 ` [PATCH 2/6] vhost-vdpa: Move vhost_vdpa_reset_status() up Hanna Czenczek @ 2023-07-18 14:29 ` Stefan Hajnoczi 0 siblings, 0 replies; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-18 14:29 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione [-- Attachment #1: Type: text/plain, Size: 440 bytes --] On Tue, Jul 11, 2023 at 05:52:24PM +0200, Hanna Czenczek wrote: > The next commit is going to have vhost_vdpa_dev_start() call this, so > move it up to have the declaration where we are going to need it. > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > --- > hw/virtio/vhost-vdpa.c | 28 ++++++++++++++-------------- > 1 file changed, 14 insertions(+), 14 deletions(-) Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-11 15:52 [PATCH 0/6] vhost-user: Add suspend/resume Hanna Czenczek 2023-07-11 15:52 ` [PATCH 1/6] vhost-user.rst: " Hanna Czenczek 2023-07-11 15:52 ` [PATCH 2/6] vhost-vdpa: Move vhost_vdpa_reset_status() up Hanna Czenczek @ 2023-07-11 15:52 ` Hanna Czenczek 2023-07-18 14:33 ` Stefan Hajnoczi 2023-07-21 15:25 ` Eugenio Perez Martin 2023-07-11 15:52 ` [PATCH 4/6] vhost-user: Implement suspend/resume Hanna Czenczek ` (3 subsequent siblings) 6 siblings, 2 replies; 37+ messages in thread From: Hanna Czenczek @ 2023-07-11 15:52 UTC (permalink / raw) To: qemu-devel Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi, Eugenio Pérez, German Maglione Move the `suspended` field from vhost_vdpa into the global vhost_dev struct, so vhost_dev_stop() can check whether the back-end has been suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, there is no need to reset it; the reset is just a fall-back to stop device operations for back-ends that do not support suspend. Unfortunately, for vDPA specifically, RESUME is not yet implemented, so when the device is re-started, we still have to do the reset to have it un-suspend. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> --- include/hw/virtio/vhost-vdpa.h | 2 -- include/hw/virtio/vhost.h | 8 ++++++++ hw/virtio/vhost-vdpa.c | 11 +++++++---- hw/virtio/vhost.c | 8 +++++++- 4 files changed, 22 insertions(+), 7 deletions(-) diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h index e64bfc7f98..72c3686b7f 100644 --- a/include/hw/virtio/vhost-vdpa.h +++ b/include/hw/virtio/vhost-vdpa.h @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { bool shadow_vqs_enabled; /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ bool shadow_data; - /* Device suspended successfully */ - bool suspended; /* IOVA mapping used by the Shadow Virtqueue */ VhostIOVATree *iova_tree; GPtrArray *shadow_vqs; diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h index 6a173cb9fa..69bf59d630 100644 --- a/include/hw/virtio/vhost.h +++ b/include/hw/virtio/vhost.h @@ -120,6 +120,14 @@ struct vhost_dev { uint64_t backend_cap; /* @started: is the vhost device started? */ bool started; + /** + * @suspended: Whether the vhost device is currently suspended. Set + * and reset by implementations (vhost-user, vhost-vdpa, ...), which + * are supposed to automatically suspend/resume in their + * vhost_dev_start handlers as required. Must also be cleared when + * the device is reset. + */ + bool suspended; bool log_enabled; uint64_t log_size; Error *migration_blocker; diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index 7b7dee468e..f7fd19a203 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, static int vhost_vdpa_reset_device(struct vhost_dev *dev) { - struct vhost_vdpa *v = dev->opaque; int ret; uint8_t status = 0; ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); trace_vhost_vdpa_reset_device(dev); - v->suspended = false; + dev->suspended = false; return ret; } @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) if (unlikely(r)) { error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); } else { - v->suspended = true; + dev->suspended = true; return; } } @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) return -1; } vhost_vdpa_set_vring_ready(dev); + if (dev->suspended) { + /* TODO: When RESUME is available, use it instead of resetting */ + vhost_vdpa_reset_status(dev); + } } else { vhost_vdpa_suspend(dev); vhost_vdpa_svqs_stop(dev); @@ -1400,7 +1403,7 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev, return 0; } - if (!v->suspended) { + if (!dev->suspended) { /* * Cannot trust in value returned by device, let vhost recover used * idx from guest. diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index abf0d03c8d..2e28e58da7 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -2059,7 +2059,13 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings) hdev->vqs + i, hdev->vq_index + i); } - if (hdev->vhost_ops->vhost_reset_status) { + + /* + * If we failed to successfully stop the device via SUSPEND (should have + * been attempted by `vhost_dev_start(hdev, false)`), reset it to stop it. + * Stateful devices where this would be problematic must implement SUSPEND. + */ + if (!hdev->suspended && hdev->vhost_ops->vhost_reset_status) { hdev->vhost_ops->vhost_reset_status(hdev); } -- 2.41.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-11 15:52 ` [PATCH 3/6] vhost: Do not reset suspended devices on stop Hanna Czenczek @ 2023-07-18 14:33 ` Stefan Hajnoczi 2023-07-21 15:25 ` Eugenio Perez Martin 1 sibling, 0 replies; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-18 14:33 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione [-- Attachment #1: Type: text/plain, Size: 948 bytes --] On Tue, Jul 11, 2023 at 05:52:25PM +0200, Hanna Czenczek wrote: > Move the `suspended` field from vhost_vdpa into the global vhost_dev > struct, so vhost_dev_stop() can check whether the back-end has been > suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, > there is no need to reset it; the reset is just a fall-back to stop > device operations for back-ends that do not support suspend. > > Unfortunately, for vDPA specifically, RESUME is not yet implemented, so > when the device is re-started, we still have to do the reset to have it > un-suspend. > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > --- > include/hw/virtio/vhost-vdpa.h | 2 -- > include/hw/virtio/vhost.h | 8 ++++++++ > hw/virtio/vhost-vdpa.c | 11 +++++++---- > hw/virtio/vhost.c | 8 +++++++- > 4 files changed, 22 insertions(+), 7 deletions(-) Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-11 15:52 ` [PATCH 3/6] vhost: Do not reset suspended devices on stop Hanna Czenczek 2023-07-18 14:33 ` Stefan Hajnoczi @ 2023-07-21 15:25 ` Eugenio Perez Martin 2023-07-21 16:07 ` Hanna Czenczek 1 sibling, 1 reply; 37+ messages in thread From: Eugenio Perez Martin @ 2023-07-21 15:25 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Stefan Hajnoczi, German Maglione On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote: > > Move the `suspended` field from vhost_vdpa into the global vhost_dev > struct, so vhost_dev_stop() can check whether the back-end has been > suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, > there is no need to reset it; the reset is just a fall-back to stop > device operations for back-ends that do not support suspend. > > Unfortunately, for vDPA specifically, RESUME is not yet implemented, so > when the device is re-started, we still have to do the reset to have it > un-suspend. > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > --- > include/hw/virtio/vhost-vdpa.h | 2 -- > include/hw/virtio/vhost.h | 8 ++++++++ > hw/virtio/vhost-vdpa.c | 11 +++++++---- > hw/virtio/vhost.c | 8 +++++++- > 4 files changed, 22 insertions(+), 7 deletions(-) > > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > index e64bfc7f98..72c3686b7f 100644 > --- a/include/hw/virtio/vhost-vdpa.h > +++ b/include/hw/virtio/vhost-vdpa.h > @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { > bool shadow_vqs_enabled; > /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ > bool shadow_data; > - /* Device suspended successfully */ > - bool suspended; > /* IOVA mapping used by the Shadow Virtqueue */ > VhostIOVATree *iova_tree; > GPtrArray *shadow_vqs; > diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h > index 6a173cb9fa..69bf59d630 100644 > --- a/include/hw/virtio/vhost.h > +++ b/include/hw/virtio/vhost.h > @@ -120,6 +120,14 @@ struct vhost_dev { > uint64_t backend_cap; > /* @started: is the vhost device started? */ > bool started; > + /** > + * @suspended: Whether the vhost device is currently suspended. Set > + * and reset by implementations (vhost-user, vhost-vdpa, ...), which > + * are supposed to automatically suspend/resume in their > + * vhost_dev_start handlers as required. Must also be cleared when > + * the device is reset. > + */ > + bool suspended; > bool log_enabled; > uint64_t log_size; > Error *migration_blocker; > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > index 7b7dee468e..f7fd19a203 100644 > --- a/hw/virtio/vhost-vdpa.c > +++ b/hw/virtio/vhost-vdpa.c > @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, > > static int vhost_vdpa_reset_device(struct vhost_dev *dev) > { > - struct vhost_vdpa *v = dev->opaque; > int ret; > uint8_t status = 0; > > ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); > trace_vhost_vdpa_reset_device(dev); > - v->suspended = false; > + dev->suspended = false; > return ret; > } > > @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) > if (unlikely(r)) { > error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); > } else { > - v->suspended = true; > + dev->suspended = true; > return; > } > } > @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) > return -1; > } > vhost_vdpa_set_vring_ready(dev); > + if (dev->suspended) { > + /* TODO: When RESUME is available, use it instead of resetting */ > + vhost_vdpa_reset_status(dev); How is that we reset the status at each vhost_vdpa_dev_start? That will clean all the vqs configured, features negotiated, etc. in the vDPA device. Or am I missing something? I'm totally ok with the move of "suspended" member. Thanks! > + } > } else { > vhost_vdpa_suspend(dev); > vhost_vdpa_svqs_stop(dev); > @@ -1400,7 +1403,7 @@ static int vhost_vdpa_get_vring_base(struct vhost_dev *dev, > return 0; > } > > - if (!v->suspended) { > + if (!dev->suspended) { > /* > * Cannot trust in value returned by device, let vhost recover used > * idx from guest. > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c > index abf0d03c8d..2e28e58da7 100644 > --- a/hw/virtio/vhost.c > +++ b/hw/virtio/vhost.c > @@ -2059,7 +2059,13 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings) > hdev->vqs + i, > hdev->vq_index + i); > } > - if (hdev->vhost_ops->vhost_reset_status) { > + > + /* > + * If we failed to successfully stop the device via SUSPEND (should have > + * been attempted by `vhost_dev_start(hdev, false)`), reset it to stop it. > + * Stateful devices where this would be problematic must implement SUSPEND. > + */ > + if (!hdev->suspended && hdev->vhost_ops->vhost_reset_status) { > hdev->vhost_ops->vhost_reset_status(hdev); > } > > -- > 2.41.0 > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-21 15:25 ` Eugenio Perez Martin @ 2023-07-21 16:07 ` Hanna Czenczek 2023-07-24 15:48 ` Eugenio Perez Martin 0 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-21 16:07 UTC (permalink / raw) To: Eugenio Perez Martin Cc: qemu-devel, Michael S . Tsirkin, Stefan Hajnoczi, German Maglione On 21.07.23 17:25, Eugenio Perez Martin wrote: > On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote: >> Move the `suspended` field from vhost_vdpa into the global vhost_dev >> struct, so vhost_dev_stop() can check whether the back-end has been >> suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, >> there is no need to reset it; the reset is just a fall-back to stop >> device operations for back-ends that do not support suspend. >> >> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so >> when the device is re-started, we still have to do the reset to have it >> un-suspend. >> >> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >> --- >> include/hw/virtio/vhost-vdpa.h | 2 -- >> include/hw/virtio/vhost.h | 8 ++++++++ >> hw/virtio/vhost-vdpa.c | 11 +++++++---- >> hw/virtio/vhost.c | 8 +++++++- >> 4 files changed, 22 insertions(+), 7 deletions(-) >> >> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h >> index e64bfc7f98..72c3686b7f 100644 >> --- a/include/hw/virtio/vhost-vdpa.h >> +++ b/include/hw/virtio/vhost-vdpa.h >> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { >> bool shadow_vqs_enabled; >> /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ >> bool shadow_data; >> - /* Device suspended successfully */ >> - bool suspended; >> /* IOVA mapping used by the Shadow Virtqueue */ >> VhostIOVATree *iova_tree; >> GPtrArray *shadow_vqs; >> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h >> index 6a173cb9fa..69bf59d630 100644 >> --- a/include/hw/virtio/vhost.h >> +++ b/include/hw/virtio/vhost.h >> @@ -120,6 +120,14 @@ struct vhost_dev { >> uint64_t backend_cap; >> /* @started: is the vhost device started? */ >> bool started; >> + /** >> + * @suspended: Whether the vhost device is currently suspended. Set >> + * and reset by implementations (vhost-user, vhost-vdpa, ...), which >> + * are supposed to automatically suspend/resume in their >> + * vhost_dev_start handlers as required. Must also be cleared when >> + * the device is reset. >> + */ >> + bool suspended; >> bool log_enabled; >> uint64_t log_size; >> Error *migration_blocker; >> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c >> index 7b7dee468e..f7fd19a203 100644 >> --- a/hw/virtio/vhost-vdpa.c >> +++ b/hw/virtio/vhost-vdpa.c >> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, >> >> static int vhost_vdpa_reset_device(struct vhost_dev *dev) >> { >> - struct vhost_vdpa *v = dev->opaque; >> int ret; >> uint8_t status = 0; >> >> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); >> trace_vhost_vdpa_reset_device(dev); >> - v->suspended = false; >> + dev->suspended = false; >> return ret; >> } >> >> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) >> if (unlikely(r)) { >> error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); >> } else { >> - v->suspended = true; >> + dev->suspended = true; >> return; >> } >> } >> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) >> return -1; >> } >> vhost_vdpa_set_vring_ready(dev); >> + if (dev->suspended) { >> + /* TODO: When RESUME is available, use it instead of resetting */ >> + vhost_vdpa_reset_status(dev); > How is that we reset the status at each vhost_vdpa_dev_start? That > will clean all the vqs configured, features negotiated, etc. in the > vDPA device. Or am I missing something? What alternative do you propose? We don’t have RESUME for vDPA in qemu, but we somehow need to lift the previous SUSPEND so the device will again respond to guest requests, do we not? But more generally, is this any different from what is done before this patch? Before this patch, vhost_dev_stop() unconditionally invokes vhost_reset_status(), so the device is reset in every stop/start cycle, that doesn’t change. And we still won’t reset it on the first vhost_dev_start(), because dev->suspended will be false then, only on subsequent stop/start cycles, as before. So the only difference is that now the device is reset on start, not on stop. Hanna ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-21 16:07 ` Hanna Czenczek @ 2023-07-24 15:48 ` Eugenio Perez Martin 2023-07-25 7:53 ` Hanna Czenczek 0 siblings, 1 reply; 37+ messages in thread From: Eugenio Perez Martin @ 2023-07-24 15:48 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Stefan Hajnoczi, German Maglione On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hreitz@redhat.com> wrote: > > On 21.07.23 17:25, Eugenio Perez Martin wrote: > > On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote: > >> Move the `suspended` field from vhost_vdpa into the global vhost_dev > >> struct, so vhost_dev_stop() can check whether the back-end has been > >> suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, > >> there is no need to reset it; the reset is just a fall-back to stop > >> device operations for back-ends that do not support suspend. > >> > >> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so > >> when the device is re-started, we still have to do the reset to have it > >> un-suspend. > >> > >> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > >> --- > >> include/hw/virtio/vhost-vdpa.h | 2 -- > >> include/hw/virtio/vhost.h | 8 ++++++++ > >> hw/virtio/vhost-vdpa.c | 11 +++++++---- > >> hw/virtio/vhost.c | 8 +++++++- > >> 4 files changed, 22 insertions(+), 7 deletions(-) > >> > >> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > >> index e64bfc7f98..72c3686b7f 100644 > >> --- a/include/hw/virtio/vhost-vdpa.h > >> +++ b/include/hw/virtio/vhost-vdpa.h > >> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { > >> bool shadow_vqs_enabled; > >> /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ > >> bool shadow_data; > >> - /* Device suspended successfully */ > >> - bool suspended; > >> /* IOVA mapping used by the Shadow Virtqueue */ > >> VhostIOVATree *iova_tree; > >> GPtrArray *shadow_vqs; > >> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h > >> index 6a173cb9fa..69bf59d630 100644 > >> --- a/include/hw/virtio/vhost.h > >> +++ b/include/hw/virtio/vhost.h > >> @@ -120,6 +120,14 @@ struct vhost_dev { > >> uint64_t backend_cap; > >> /* @started: is the vhost device started? */ > >> bool started; > >> + /** > >> + * @suspended: Whether the vhost device is currently suspended. Set > >> + * and reset by implementations (vhost-user, vhost-vdpa, ...), which > >> + * are supposed to automatically suspend/resume in their > >> + * vhost_dev_start handlers as required. Must also be cleared when > >> + * the device is reset. > >> + */ > >> + bool suspended; > >> bool log_enabled; > >> uint64_t log_size; > >> Error *migration_blocker; > >> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > >> index 7b7dee468e..f7fd19a203 100644 > >> --- a/hw/virtio/vhost-vdpa.c > >> +++ b/hw/virtio/vhost-vdpa.c > >> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, > >> > >> static int vhost_vdpa_reset_device(struct vhost_dev *dev) > >> { > >> - struct vhost_vdpa *v = dev->opaque; > >> int ret; > >> uint8_t status = 0; > >> > >> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); > >> trace_vhost_vdpa_reset_device(dev); > >> - v->suspended = false; > >> + dev->suspended = false; > >> return ret; > >> } > >> > >> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) > >> if (unlikely(r)) { > >> error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); > >> } else { > >> - v->suspended = true; > >> + dev->suspended = true; > >> return; > >> } > >> } > >> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) > >> return -1; > >> } > >> vhost_vdpa_set_vring_ready(dev); > >> + if (dev->suspended) { > >> + /* TODO: When RESUME is available, use it instead of resetting */ > >> + vhost_vdpa_reset_status(dev); > > How is that we reset the status at each vhost_vdpa_dev_start? That > > will clean all the vqs configured, features negotiated, etc. in the > > vDPA device. Or am I missing something? > > What alternative do you propose? We don’t have RESUME for vDPA in qemu, > but we somehow need to lift the previous SUSPEND so the device will > again respond to guest requests, do we not? > Reset also clears the suspend state in vDPA, and it should be called at vhost_dev_stop. So the device should never be in suspended state here. Does that solve your concerns? > But more generally, is this any different from what is done before this > patch? Before this patch, vhost_dev_stop() unconditionally invokes > vhost_reset_status(), so the device is reset in every stop/start cycle, > that doesn’t change. And we still won’t reset it on the first > vhost_dev_start(), because dev->suspended will be false then, only on > subsequent stop/start cycles, as before. So the only difference is that > now the device is reset on start, not on stop. > The difference is that vhost_vdpa_dev_start is called after features ack (via vhost_dev_start, through vhost_dev_set_features call) and vq configuration (using vhost_virtqueue_start). A device reset forces the device to forget about all of that, and qemu cannot configure them again until qemu acks the features again. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-24 15:48 ` Eugenio Perez Martin @ 2023-07-25 7:53 ` Hanna Czenczek 2023-07-25 10:03 ` Eugenio Perez Martin 0 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-25 7:53 UTC (permalink / raw) To: Eugenio Perez Martin Cc: qemu-devel, Michael S . Tsirkin, Stefan Hajnoczi, German Maglione On 24.07.23 17:48, Eugenio Perez Martin wrote: > On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hreitz@redhat.com> wrote: >> On 21.07.23 17:25, Eugenio Perez Martin wrote: >>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote: >>>> Move the `suspended` field from vhost_vdpa into the global vhost_dev >>>> struct, so vhost_dev_stop() can check whether the back-end has been >>>> suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, >>>> there is no need to reset it; the reset is just a fall-back to stop >>>> device operations for back-ends that do not support suspend. >>>> >>>> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so >>>> when the device is re-started, we still have to do the reset to have it >>>> un-suspend. >>>> >>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>> --- >>>> include/hw/virtio/vhost-vdpa.h | 2 -- >>>> include/hw/virtio/vhost.h | 8 ++++++++ >>>> hw/virtio/vhost-vdpa.c | 11 +++++++---- >>>> hw/virtio/vhost.c | 8 +++++++- >>>> 4 files changed, 22 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h >>>> index e64bfc7f98..72c3686b7f 100644 >>>> --- a/include/hw/virtio/vhost-vdpa.h >>>> +++ b/include/hw/virtio/vhost-vdpa.h >>>> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { >>>> bool shadow_vqs_enabled; >>>> /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ >>>> bool shadow_data; >>>> - /* Device suspended successfully */ >>>> - bool suspended; >>>> /* IOVA mapping used by the Shadow Virtqueue */ >>>> VhostIOVATree *iova_tree; >>>> GPtrArray *shadow_vqs; >>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h >>>> index 6a173cb9fa..69bf59d630 100644 >>>> --- a/include/hw/virtio/vhost.h >>>> +++ b/include/hw/virtio/vhost.h >>>> @@ -120,6 +120,14 @@ struct vhost_dev { >>>> uint64_t backend_cap; >>>> /* @started: is the vhost device started? */ >>>> bool started; >>>> + /** >>>> + * @suspended: Whether the vhost device is currently suspended. Set >>>> + * and reset by implementations (vhost-user, vhost-vdpa, ...), which >>>> + * are supposed to automatically suspend/resume in their >>>> + * vhost_dev_start handlers as required. Must also be cleared when >>>> + * the device is reset. >>>> + */ >>>> + bool suspended; >>>> bool log_enabled; >>>> uint64_t log_size; >>>> Error *migration_blocker; >>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c >>>> index 7b7dee468e..f7fd19a203 100644 >>>> --- a/hw/virtio/vhost-vdpa.c >>>> +++ b/hw/virtio/vhost-vdpa.c >>>> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, >>>> >>>> static int vhost_vdpa_reset_device(struct vhost_dev *dev) >>>> { >>>> - struct vhost_vdpa *v = dev->opaque; >>>> int ret; >>>> uint8_t status = 0; >>>> >>>> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); >>>> trace_vhost_vdpa_reset_device(dev); >>>> - v->suspended = false; >>>> + dev->suspended = false; >>>> return ret; >>>> } >>>> >>>> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) >>>> if (unlikely(r)) { >>>> error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); >>>> } else { >>>> - v->suspended = true; >>>> + dev->suspended = true; >>>> return; >>>> } >>>> } >>>> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) >>>> return -1; >>>> } >>>> vhost_vdpa_set_vring_ready(dev); >>>> + if (dev->suspended) { >>>> + /* TODO: When RESUME is available, use it instead of resetting */ >>>> + vhost_vdpa_reset_status(dev); >>> How is that we reset the status at each vhost_vdpa_dev_start? That >>> will clean all the vqs configured, features negotiated, etc. in the >>> vDPA device. Or am I missing something? >> What alternative do you propose? We don’t have RESUME for vDPA in qemu, >> but we somehow need to lift the previous SUSPEND so the device will >> again respond to guest requests, do we not? >> > Reset also clears the suspend state in vDPA, and it should be called > at vhost_dev_stop. So the device should never be in suspended state > here. Does that solve your concerns? My intention with this patch was precisely not to reset in vhost_dev_stop when suspending is supported. So now I’m more confused than before. >> But more generally, is this any different from what is done before this >> patch? Before this patch, vhost_dev_stop() unconditionally invokes >> vhost_reset_status(), so the device is reset in every stop/start cycle, >> that doesn’t change. And we still won’t reset it on the first >> vhost_dev_start(), because dev->suspended will be false then, only on >> subsequent stop/start cycles, as before. So the only difference is that >> now the device is reset on start, not on stop. >> > The difference is that vhost_vdpa_dev_start is called after features > ack (via vhost_dev_start, through vhost_dev_set_features call) and vq > configuration (using vhost_virtqueue_start). A device reset forces the > device to forget about all of that, and qemu cannot configure them > again until qemu acks the features again. Now I’m completely confused, because I don’t see the point of implementing suspend at all if we rely on a reset immediately afterwards anyway. It was my impression this whole time that suspending would remove the need to reset. Well, at least until the device should be resumed again, i.e. in vhost_dev_start(). In addition, I also don’t understand the magnitude of the problem with ordering. If the order in vhost_dev_start() is wrong, can we not easily fix it? E.g. add a full vhost_dev_resume callback to invoke right at the start of vhost_dev_start(); or check (in the same place) whether the back-end supports resuming, and if it doesn’t (and it is currently suspended), reset it there. In any case, if we need to reset in vhost_dev_stop(), i.e. immediately after suspend, I don’t see the point of suspending, indicating to me that I still fail to understand its purpose. Hanna ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-25 7:53 ` Hanna Czenczek @ 2023-07-25 10:03 ` Eugenio Perez Martin 2023-07-25 13:09 ` Hanna Czenczek 0 siblings, 1 reply; 37+ messages in thread From: Eugenio Perez Martin @ 2023-07-25 10:03 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Stefan Hajnoczi, German Maglione On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek <hreitz@redhat.com> wrote: > > On 24.07.23 17:48, Eugenio Perez Martin wrote: > > On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hreitz@redhat.com> wrote: > >> On 21.07.23 17:25, Eugenio Perez Martin wrote: > >>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote: > >>>> Move the `suspended` field from vhost_vdpa into the global vhost_dev > >>>> struct, so vhost_dev_stop() can check whether the back-end has been > >>>> suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, > >>>> there is no need to reset it; the reset is just a fall-back to stop > >>>> device operations for back-ends that do not support suspend. > >>>> > >>>> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so > >>>> when the device is re-started, we still have to do the reset to have it > >>>> un-suspend. > >>>> > >>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > >>>> --- > >>>> include/hw/virtio/vhost-vdpa.h | 2 -- > >>>> include/hw/virtio/vhost.h | 8 ++++++++ > >>>> hw/virtio/vhost-vdpa.c | 11 +++++++---- > >>>> hw/virtio/vhost.c | 8 +++++++- > >>>> 4 files changed, 22 insertions(+), 7 deletions(-) > >>>> > >>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > >>>> index e64bfc7f98..72c3686b7f 100644 > >>>> --- a/include/hw/virtio/vhost-vdpa.h > >>>> +++ b/include/hw/virtio/vhost-vdpa.h > >>>> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { > >>>> bool shadow_vqs_enabled; > >>>> /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ > >>>> bool shadow_data; > >>>> - /* Device suspended successfully */ > >>>> - bool suspended; > >>>> /* IOVA mapping used by the Shadow Virtqueue */ > >>>> VhostIOVATree *iova_tree; > >>>> GPtrArray *shadow_vqs; > >>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h > >>>> index 6a173cb9fa..69bf59d630 100644 > >>>> --- a/include/hw/virtio/vhost.h > >>>> +++ b/include/hw/virtio/vhost.h > >>>> @@ -120,6 +120,14 @@ struct vhost_dev { > >>>> uint64_t backend_cap; > >>>> /* @started: is the vhost device started? */ > >>>> bool started; > >>>> + /** > >>>> + * @suspended: Whether the vhost device is currently suspended. Set > >>>> + * and reset by implementations (vhost-user, vhost-vdpa, ...), which > >>>> + * are supposed to automatically suspend/resume in their > >>>> + * vhost_dev_start handlers as required. Must also be cleared when > >>>> + * the device is reset. > >>>> + */ > >>>> + bool suspended; > >>>> bool log_enabled; > >>>> uint64_t log_size; > >>>> Error *migration_blocker; > >>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > >>>> index 7b7dee468e..f7fd19a203 100644 > >>>> --- a/hw/virtio/vhost-vdpa.c > >>>> +++ b/hw/virtio/vhost-vdpa.c > >>>> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, > >>>> > >>>> static int vhost_vdpa_reset_device(struct vhost_dev *dev) > >>>> { > >>>> - struct vhost_vdpa *v = dev->opaque; > >>>> int ret; > >>>> uint8_t status = 0; > >>>> > >>>> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); > >>>> trace_vhost_vdpa_reset_device(dev); > >>>> - v->suspended = false; > >>>> + dev->suspended = false; > >>>> return ret; > >>>> } > >>>> > >>>> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) > >>>> if (unlikely(r)) { > >>>> error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); > >>>> } else { > >>>> - v->suspended = true; > >>>> + dev->suspended = true; > >>>> return; > >>>> } > >>>> } > >>>> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) > >>>> return -1; > >>>> } > >>>> vhost_vdpa_set_vring_ready(dev); > >>>> + if (dev->suspended) { > >>>> + /* TODO: When RESUME is available, use it instead of resetting */ > >>>> + vhost_vdpa_reset_status(dev); > >>> How is that we reset the status at each vhost_vdpa_dev_start? That > >>> will clean all the vqs configured, features negotiated, etc. in the > >>> vDPA device. Or am I missing something? > >> What alternative do you propose? We don’t have RESUME for vDPA in qemu, > >> but we somehow need to lift the previous SUSPEND so the device will > >> again respond to guest requests, do we not? > >> > > Reset also clears the suspend state in vDPA, and it should be called > > at vhost_dev_stop. So the device should never be in suspended state > > here. Does that solve your concerns? > > My intention with this patch was precisely not to reset in > vhost_dev_stop when suspending is supported. So now I’m more confused > than before. > At this moment, I think that should be focused as an optimization and not to be included in the main series. > >> But more generally, is this any different from what is done before this > >> patch? Before this patch, vhost_dev_stop() unconditionally invokes > >> vhost_reset_status(), so the device is reset in every stop/start cycle, > >> that doesn’t change. And we still won’t reset it on the first > >> vhost_dev_start(), because dev->suspended will be false then, only on > >> subsequent stop/start cycles, as before. So the only difference is that > >> now the device is reset on start, not on stop. > >> > > The difference is that vhost_vdpa_dev_start is called after features > > ack (via vhost_dev_start, through vhost_dev_set_features call) and vq > > configuration (using vhost_virtqueue_start). A device reset forces the > > device to forget about all of that, and qemu cannot configure them > > again until qemu acks the features again. > > Now I’m completely confused, because I don’t see the point of > implementing suspend at all if we rely on a reset immediately afterwards > anyway. It's not immediate. From vhost_dev_stop, comments added only in this mail: void vhost_virtqueue_stop(struct vhost_dev *dev, struct VirtIODevice *vdev, struct vhost_virtqueue *vq, unsigned idx) { ... // Get each vring indexes, trusting the destination device can continue safely from there r = dev->vhost_ops->vhost_get_vring_base(dev, &state); if (r < 0) { VHOST_OPS_DEBUG(r, "vhost VQ %u ring restore failed: %d", idx, r); /* Connection to the backend is broken, so let's sync internal * last avail idx to the device used idx. */ virtio_queue_restore_last_avail_idx(vdev, idx); } else { virtio_queue_set_last_avail_idx(vdev, idx, state.num); } ... } void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings) { ... // Suspend the device, so we can trust in vring indexes / vq state if (hdev->vhost_ops->vhost_dev_start) { hdev->vhost_ops->vhost_dev_start(hdev, false); } if (vrings) { vhost_dev_set_vring_enable(hdev, false); } // Fetch each vq index / state and store in vdev->vq[i] for (i = 0; i < hdev->nvqs; ++i) { vhost_virtqueue_stop(hdev, vdev, hdev->vqs + i, hdev->vq_index + i); } // Reset the device, as we don't need it anymore and it can release the resources if (hdev->vhost_ops->vhost_reset_status) { hdev->vhost_ops->vhost_reset_status(hdev); } } --- > It was my impression this whole time that suspending would > remove the need to reset. Well, at least until the device should be > resumed again, i.e. in vhost_dev_start(). > It cannot. vhost_dev_stop is also called when the guest reset the device, so the guest trusts the device to be in a clean state. Also, the reset is the moment when the device frees the unused resources. This would mandate the device to > In addition, I also don’t understand the magnitude of the problem with > ordering. If the order in vhost_dev_start() is wrong, can we not easily > fix it? The order in vhost_dev_start follows the VirtIO standard. The move of the reset should be to remove it from vhost_dev_stop to something like both virtio_reset and the end of virtio_save. I'm not sure if I'm forgetting some other use cases. > E.g. add a full vhost_dev_resume callback to invoke right at > the start of vhost_dev_start(); or check (in the same place) whether the > back-end supports resuming, and if it doesn’t (and it is currently > suspended), reset it there. > > In any case, if we need to reset in vhost_dev_stop(), i.e. immediately > after suspend, I don’t see the point of suspending, indicating to me > that I still fail to understand its purpose. > > Hanna > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-25 10:03 ` Eugenio Perez Martin @ 2023-07-25 13:09 ` Hanna Czenczek 2023-07-25 18:53 ` Eugenio Perez Martin 0 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-25 13:09 UTC (permalink / raw) To: Eugenio Perez Martin Cc: qemu-devel, Michael S . Tsirkin, Stefan Hajnoczi, German Maglione On 25.07.23 12:03, Eugenio Perez Martin wrote: > On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek <hreitz@redhat.com> wrote: >> On 24.07.23 17:48, Eugenio Perez Martin wrote: >>> On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hreitz@redhat.com> wrote: >>>> On 21.07.23 17:25, Eugenio Perez Martin wrote: >>>>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote: >>>>>> Move the `suspended` field from vhost_vdpa into the global vhost_dev >>>>>> struct, so vhost_dev_stop() can check whether the back-end has been >>>>>> suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, >>>>>> there is no need to reset it; the reset is just a fall-back to stop >>>>>> device operations for back-ends that do not support suspend. >>>>>> >>>>>> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so >>>>>> when the device is re-started, we still have to do the reset to have it >>>>>> un-suspend. >>>>>> >>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>>> --- >>>>>> include/hw/virtio/vhost-vdpa.h | 2 -- >>>>>> include/hw/virtio/vhost.h | 8 ++++++++ >>>>>> hw/virtio/vhost-vdpa.c | 11 +++++++---- >>>>>> hw/virtio/vhost.c | 8 +++++++- >>>>>> 4 files changed, 22 insertions(+), 7 deletions(-) >>>>>> >>>>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h >>>>>> index e64bfc7f98..72c3686b7f 100644 >>>>>> --- a/include/hw/virtio/vhost-vdpa.h >>>>>> +++ b/include/hw/virtio/vhost-vdpa.h >>>>>> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { >>>>>> bool shadow_vqs_enabled; >>>>>> /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ >>>>>> bool shadow_data; >>>>>> - /* Device suspended successfully */ >>>>>> - bool suspended; >>>>>> /* IOVA mapping used by the Shadow Virtqueue */ >>>>>> VhostIOVATree *iova_tree; >>>>>> GPtrArray *shadow_vqs; >>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h >>>>>> index 6a173cb9fa..69bf59d630 100644 >>>>>> --- a/include/hw/virtio/vhost.h >>>>>> +++ b/include/hw/virtio/vhost.h >>>>>> @@ -120,6 +120,14 @@ struct vhost_dev { >>>>>> uint64_t backend_cap; >>>>>> /* @started: is the vhost device started? */ >>>>>> bool started; >>>>>> + /** >>>>>> + * @suspended: Whether the vhost device is currently suspended. Set >>>>>> + * and reset by implementations (vhost-user, vhost-vdpa, ...), which >>>>>> + * are supposed to automatically suspend/resume in their >>>>>> + * vhost_dev_start handlers as required. Must also be cleared when >>>>>> + * the device is reset. >>>>>> + */ >>>>>> + bool suspended; >>>>>> bool log_enabled; >>>>>> uint64_t log_size; >>>>>> Error *migration_blocker; >>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c >>>>>> index 7b7dee468e..f7fd19a203 100644 >>>>>> --- a/hw/virtio/vhost-vdpa.c >>>>>> +++ b/hw/virtio/vhost-vdpa.c >>>>>> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, >>>>>> >>>>>> static int vhost_vdpa_reset_device(struct vhost_dev *dev) >>>>>> { >>>>>> - struct vhost_vdpa *v = dev->opaque; >>>>>> int ret; >>>>>> uint8_t status = 0; >>>>>> >>>>>> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); >>>>>> trace_vhost_vdpa_reset_device(dev); >>>>>> - v->suspended = false; >>>>>> + dev->suspended = false; >>>>>> return ret; >>>>>> } >>>>>> >>>>>> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) >>>>>> if (unlikely(r)) { >>>>>> error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); >>>>>> } else { >>>>>> - v->suspended = true; >>>>>> + dev->suspended = true; >>>>>> return; >>>>>> } >>>>>> } >>>>>> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) >>>>>> return -1; >>>>>> } >>>>>> vhost_vdpa_set_vring_ready(dev); >>>>>> + if (dev->suspended) { >>>>>> + /* TODO: When RESUME is available, use it instead of resetting */ >>>>>> + vhost_vdpa_reset_status(dev); >>>>> How is that we reset the status at each vhost_vdpa_dev_start? That >>>>> will clean all the vqs configured, features negotiated, etc. in the >>>>> vDPA device. Or am I missing something? >>>> What alternative do you propose? We don’t have RESUME for vDPA in qemu, >>>> but we somehow need to lift the previous SUSPEND so the device will >>>> again respond to guest requests, do we not? >>>> >>> Reset also clears the suspend state in vDPA, and it should be called >>> at vhost_dev_stop. So the device should never be in suspended state >>> here. Does that solve your concerns? >> My intention with this patch was precisely not to reset in >> vhost_dev_stop when suspending is supported. So now I’m more confused >> than before. >> > At this moment, I think that should be focused as an optimization and > not to be included in the main series. It is absolutely not an optimization but vital for my use case. virtiofsd does not currently implement resetting, but if it did (and we want this support in the future), resetting it during stop/cont would be catastrophic. There must be a way to have qemu not issue a reset. This patch is the reason why this series exists. >>>> But more generally, is this any different from what is done before this >>>> patch? Before this patch, vhost_dev_stop() unconditionally invokes >>>> vhost_reset_status(), so the device is reset in every stop/start cycle, >>>> that doesn’t change. And we still won’t reset it on the first >>>> vhost_dev_start(), because dev->suspended will be false then, only on >>>> subsequent stop/start cycles, as before. So the only difference is that >>>> now the device is reset on start, not on stop. >>>> >>> The difference is that vhost_vdpa_dev_start is called after features >>> ack (via vhost_dev_start, through vhost_dev_set_features call) and vq >>> configuration (using vhost_virtqueue_start). A device reset forces the >>> device to forget about all of that, and qemu cannot configure them >>> again until qemu acks the features again. >> Now I’m completely confused, because I don’t see the point of >> implementing suspend at all if we rely on a reset immediately afterwards >> anyway. > It's not immediate. From vhost_dev_stop, comments added only in this mail: > > void vhost_virtqueue_stop(struct vhost_dev *dev, > struct VirtIODevice *vdev, > struct vhost_virtqueue *vq, > unsigned idx) > { > ... > // Get each vring indexes, trusting the destination device can > continue safely from there > r = dev->vhost_ops->vhost_get_vring_base(dev, &state); > if (r < 0) { > VHOST_OPS_DEBUG(r, "vhost VQ %u ring restore failed: %d", idx, r); > /* Connection to the backend is broken, so let's sync internal > * last avail idx to the device used idx. > */ > virtio_queue_restore_last_avail_idx(vdev, idx); > } else { > virtio_queue_set_last_avail_idx(vdev, idx, state.num); > } > ... > } > > void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings) > { > ... > // Suspend the device, so we can trust in vring indexes / vq state I don’t understand this purpose. GET_VRING_BASE stops the vring in question, so the vring index returned must be trustworthy, no? > if (hdev->vhost_ops->vhost_dev_start) { > hdev->vhost_ops->vhost_dev_start(hdev, false); > } > if (vrings) { > vhost_dev_set_vring_enable(hdev, false); > } > > // Fetch each vq index / state and store in vdev->vq[i] > for (i = 0; i < hdev->nvqs; ++i) { > vhost_virtqueue_stop(hdev, > vdev, > hdev->vqs + i, > hdev->vq_index + i); > } > > // Reset the device, as we don't need it anymore and it can > release the resources > if (hdev->vhost_ops->vhost_reset_status) { > hdev->vhost_ops->vhost_reset_status(hdev); > } > } > --- > >> It was my impression this whole time that suspending would >> remove the need to reset. Well, at least until the device should be >> resumed again, i.e. in vhost_dev_start(). >> > It cannot. vhost_dev_stop is also called when the guest reset the > device, so the guest trusts the device to be in a clean state. > > Also, the reset is the moment when the device frees the unused > resources. This would mandate the device to What resources are we talking about? This function is called when the VM is paused (stop). If a stateful device is reset to free “unused resources”, this means dropping its internal state, which is absolutely wrong in a stop/cont cycle. >> In addition, I also don’t understand the magnitude of the problem with >> ordering. If the order in vhost_dev_start() is wrong, can we not easily >> fix it? > The order in vhost_dev_start follows the VirtIO standard. What does the VirtIO standard say about suspended vhost back-ends? Hanna > The move of > the reset should be to remove it from vhost_dev_stop to something like > both virtio_reset and the end of virtio_save. I'm not sure if I'm > forgetting some other use cases. > >> E.g. add a full vhost_dev_resume callback to invoke right at >> the start of vhost_dev_start(); or check (in the same place) whether the >> back-end supports resuming, and if it doesn’t (and it is currently >> suspended), reset it there. >> >> In any case, if we need to reset in vhost_dev_stop(), i.e. immediately >> after suspend, I don’t see the point of suspending, indicating to me >> that I still fail to understand its purpose. >> >> Hanna >> ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-25 13:09 ` Hanna Czenczek @ 2023-07-25 18:53 ` Eugenio Perez Martin 2023-07-26 6:57 ` Hanna Czenczek 0 siblings, 1 reply; 37+ messages in thread From: Eugenio Perez Martin @ 2023-07-25 18:53 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Stefan Hajnoczi, German Maglione On Tue, Jul 25, 2023 at 3:09 PM Hanna Czenczek <hreitz@redhat.com> wrote: > > On 25.07.23 12:03, Eugenio Perez Martin wrote: > > On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek <hreitz@redhat.com> wrote: > >> On 24.07.23 17:48, Eugenio Perez Martin wrote: > >>> On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hreitz@redhat.com> wrote: > >>>> On 21.07.23 17:25, Eugenio Perez Martin wrote: > >>>>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote: > >>>>>> Move the `suspended` field from vhost_vdpa into the global vhost_dev > >>>>>> struct, so vhost_dev_stop() can check whether the back-end has been > >>>>>> suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, > >>>>>> there is no need to reset it; the reset is just a fall-back to stop > >>>>>> device operations for back-ends that do not support suspend. > >>>>>> > >>>>>> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so > >>>>>> when the device is re-started, we still have to do the reset to have it > >>>>>> un-suspend. > >>>>>> > >>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > >>>>>> --- > >>>>>> include/hw/virtio/vhost-vdpa.h | 2 -- > >>>>>> include/hw/virtio/vhost.h | 8 ++++++++ > >>>>>> hw/virtio/vhost-vdpa.c | 11 +++++++---- > >>>>>> hw/virtio/vhost.c | 8 +++++++- > >>>>>> 4 files changed, 22 insertions(+), 7 deletions(-) > >>>>>> > >>>>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > >>>>>> index e64bfc7f98..72c3686b7f 100644 > >>>>>> --- a/include/hw/virtio/vhost-vdpa.h > >>>>>> +++ b/include/hw/virtio/vhost-vdpa.h > >>>>>> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { > >>>>>> bool shadow_vqs_enabled; > >>>>>> /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ > >>>>>> bool shadow_data; > >>>>>> - /* Device suspended successfully */ > >>>>>> - bool suspended; > >>>>>> /* IOVA mapping used by the Shadow Virtqueue */ > >>>>>> VhostIOVATree *iova_tree; > >>>>>> GPtrArray *shadow_vqs; > >>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h > >>>>>> index 6a173cb9fa..69bf59d630 100644 > >>>>>> --- a/include/hw/virtio/vhost.h > >>>>>> +++ b/include/hw/virtio/vhost.h > >>>>>> @@ -120,6 +120,14 @@ struct vhost_dev { > >>>>>> uint64_t backend_cap; > >>>>>> /* @started: is the vhost device started? */ > >>>>>> bool started; > >>>>>> + /** > >>>>>> + * @suspended: Whether the vhost device is currently suspended. Set > >>>>>> + * and reset by implementations (vhost-user, vhost-vdpa, ...), which > >>>>>> + * are supposed to automatically suspend/resume in their > >>>>>> + * vhost_dev_start handlers as required. Must also be cleared when > >>>>>> + * the device is reset. > >>>>>> + */ > >>>>>> + bool suspended; > >>>>>> bool log_enabled; > >>>>>> uint64_t log_size; > >>>>>> Error *migration_blocker; > >>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > >>>>>> index 7b7dee468e..f7fd19a203 100644 > >>>>>> --- a/hw/virtio/vhost-vdpa.c > >>>>>> +++ b/hw/virtio/vhost-vdpa.c > >>>>>> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, > >>>>>> > >>>>>> static int vhost_vdpa_reset_device(struct vhost_dev *dev) > >>>>>> { > >>>>>> - struct vhost_vdpa *v = dev->opaque; > >>>>>> int ret; > >>>>>> uint8_t status = 0; > >>>>>> > >>>>>> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); > >>>>>> trace_vhost_vdpa_reset_device(dev); > >>>>>> - v->suspended = false; > >>>>>> + dev->suspended = false; > >>>>>> return ret; > >>>>>> } > >>>>>> > >>>>>> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) > >>>>>> if (unlikely(r)) { > >>>>>> error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); > >>>>>> } else { > >>>>>> - v->suspended = true; > >>>>>> + dev->suspended = true; > >>>>>> return; > >>>>>> } > >>>>>> } > >>>>>> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) > >>>>>> return -1; > >>>>>> } > >>>>>> vhost_vdpa_set_vring_ready(dev); > >>>>>> + if (dev->suspended) { > >>>>>> + /* TODO: When RESUME is available, use it instead of resetting */ > >>>>>> + vhost_vdpa_reset_status(dev); > >>>>> How is that we reset the status at each vhost_vdpa_dev_start? That > >>>>> will clean all the vqs configured, features negotiated, etc. in the > >>>>> vDPA device. Or am I missing something? > >>>> What alternative do you propose? We don’t have RESUME for vDPA in qemu, > >>>> but we somehow need to lift the previous SUSPEND so the device will > >>>> again respond to guest requests, do we not? > >>>> > >>> Reset also clears the suspend state in vDPA, and it should be called > >>> at vhost_dev_stop. So the device should never be in suspended state > >>> here. Does that solve your concerns? > >> My intention with this patch was precisely not to reset in > >> vhost_dev_stop when suspending is supported. So now I’m more confused > >> than before. > >> > > At this moment, I think that should be focused as an optimization and > > not to be included in the main series. > > It is absolutely not an optimization but vital for my use case. > virtiofsd does not currently implement resetting, but if it did (and we > want this support in the future), resetting it during stop/cont would be > catastrophic. There must be a way to have qemu not issue a reset. This > patch is the reason why this series exists. > Sorry, I see I confused things with the first reply. Let me do a recap. If I understand the problem correctly, your use case requires that qemu does not reset the device before the device state is fetched with virtio_save in the case of a migration. This is understandable and I think we have a solution for that: to move the vhost_ops call to virtio_reset and the end of virtio_save. To remove the reset call from vhost_dev_stop is somehow mandatory, as it is called before virtio_save. But we cannot move to vhost_vdpa_dev_start, as proposed here. The reasons are: * All the features, vq parameters, etc are set before any vhost_vdpa_dev_start call. To reset at any vhost_vdpa_dev_start would wipe them. * The device needs to hold all the resources until it is reset. Things like descriptor status etc. And, regarding the comment "When RESUME is available, use it instead of resetting", we cannot use resume to replace reset in all cases. This is because the semantics are different. For example, vhost_dev_stop and vhost_dev_start are also called when the guest reset by itself the device. You can check it rmmoding and modprobing virtio-net driver, for example. In this case, the driver expects to find all vqs to start with 0, but the resume must not reset these indexes. It can be applied as an optimizations sometimes, but not for the general case. > >>>> But more generally, is this any different from what is done before this > >>>> patch? Before this patch, vhost_dev_stop() unconditionally invokes > >>>> vhost_reset_status(), so the device is reset in every stop/start cycle, > >>>> that doesn’t change. And we still won’t reset it on the first > >>>> vhost_dev_start(), because dev->suspended will be false then, only on > >>>> subsequent stop/start cycles, as before. So the only difference is that > >>>> now the device is reset on start, not on stop. > >>>> > >>> The difference is that vhost_vdpa_dev_start is called after features > >>> ack (via vhost_dev_start, through vhost_dev_set_features call) and vq > >>> configuration (using vhost_virtqueue_start). A device reset forces the > >>> device to forget about all of that, and qemu cannot configure them > >>> again until qemu acks the features again. > >> Now I’m completely confused, because I don’t see the point of > >> implementing suspend at all if we rely on a reset immediately afterwards > >> anyway. > > It's not immediate. From vhost_dev_stop, comments added only in this mail: > > > > void vhost_virtqueue_stop(struct vhost_dev *dev, > > struct VirtIODevice *vdev, > > struct vhost_virtqueue *vq, > > unsigned idx) > > { > > ... > > // Get each vring indexes, trusting the destination device can > > continue safely from there > > r = dev->vhost_ops->vhost_get_vring_base(dev, &state); > > if (r < 0) { > > VHOST_OPS_DEBUG(r, "vhost VQ %u ring restore failed: %d", idx, r); > > /* Connection to the backend is broken, so let's sync internal > > * last avail idx to the device used idx. > > */ > > virtio_queue_restore_last_avail_idx(vdev, idx); > > } else { > > virtio_queue_set_last_avail_idx(vdev, idx, state.num); > > } > > ... > > } > > > > void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings) > > { > > ... > > // Suspend the device, so we can trust in vring indexes / vq state > > I don’t understand this purpose. GET_VRING_BASE stops the vring in > question, so the vring index returned must be trustworthy, no? > That only happens in vhost-user, not in vhost-vdpa. > > if (hdev->vhost_ops->vhost_dev_start) { > > hdev->vhost_ops->vhost_dev_start(hdev, false); > > } > > if (vrings) { > > vhost_dev_set_vring_enable(hdev, false); > > } > > > > // Fetch each vq index / state and store in vdev->vq[i] > > for (i = 0; i < hdev->nvqs; ++i) { > > vhost_virtqueue_stop(hdev, > > vdev, > > hdev->vqs + i, > > hdev->vq_index + i); > > } > > > > // Reset the device, as we don't need it anymore and it can > > release the resources > > if (hdev->vhost_ops->vhost_reset_status) { > > hdev->vhost_ops->vhost_reset_status(hdev); > > } > > } > > --- > > > >> It was my impression this whole time that suspending would > >> remove the need to reset. Well, at least until the device should be > >> resumed again, i.e. in vhost_dev_start(). > >> > > It cannot. vhost_dev_stop is also called when the guest reset the > > device, so the guest trusts the device to be in a clean state. > > > > Also, the reset is the moment when the device frees the unused > > resources. This would mandate the device to > > What resources are we talking about? This function is called when the > VM is paused (stop). If a stateful device is reset to free “unused > resources”, this means dropping its internal state, which is absolutely > wrong in a stop/cont cycle. > But is the expected result in the virtio reset cycle. We need to split these paths. > >> In addition, I also don’t understand the magnitude of the problem with > >> ordering. If the order in vhost_dev_start() is wrong, can we not easily > >> fix it? > > The order in vhost_dev_start follows the VirtIO standard. > > What does the VirtIO standard say about suspended vhost back-ends? > Suspend does not exist in the VirtIO standard. I meant the device initialization order in "3.1 Device Initialization": 1. Reset the device. ... 5. Set the FEATURES_OK status bit. [...] ... 7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, reading and possibly writing the device’s virtio configuration space, and population of virtqueues. 8.Set the DRIVER_OK status bit. At this point the device is “live”. Steps 4-8 are all done in vhost_dev_start, in that particular order. To call vhost_vdpa_reset_status from vhost_vdpa_dev_start(true) would reset the device back to step 1, but there is no more code to set all configuration from 2-7 before 8 (DRIVER_OK). > Hanna > > > The move of > > the reset should be to remove it from vhost_dev_stop to something like > > both virtio_reset and the end of virtio_save. I'm not sure if I'm > > forgetting some other use cases. > > > >> E.g. add a full vhost_dev_resume callback to invoke right at > >> the start of vhost_dev_start(); or check (in the same place) whether the > >> back-end supports resuming, and if it doesn’t (and it is currently > >> suspended), reset it there. > >> > >> In any case, if we need to reset in vhost_dev_stop(), i.e. immediately > >> after suspend, I don’t see the point of suspending, indicating to me > >> that I still fail to understand its purpose. > >> > >> Hanna > >> > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-25 18:53 ` Eugenio Perez Martin @ 2023-07-26 6:57 ` Hanna Czenczek 2023-07-27 12:49 ` Eugenio Perez Martin 0 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-26 6:57 UTC (permalink / raw) To: Eugenio Perez Martin Cc: qemu-devel, Michael S . Tsirkin, Stefan Hajnoczi, German Maglione On 25.07.23 20:53, Eugenio Perez Martin wrote: > On Tue, Jul 25, 2023 at 3:09 PM Hanna Czenczek <hreitz@redhat.com> wrote: >> On 25.07.23 12:03, Eugenio Perez Martin wrote: >>> On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek <hreitz@redhat.com> wrote: >>>> On 24.07.23 17:48, Eugenio Perez Martin wrote: >>>>> On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hreitz@redhat.com> wrote: >>>>>> On 21.07.23 17:25, Eugenio Perez Martin wrote: >>>>>>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote: >>>>>>>> Move the `suspended` field from vhost_vdpa into the global vhost_dev >>>>>>>> struct, so vhost_dev_stop() can check whether the back-end has been >>>>>>>> suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, >>>>>>>> there is no need to reset it; the reset is just a fall-back to stop >>>>>>>> device operations for back-ends that do not support suspend. >>>>>>>> >>>>>>>> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so >>>>>>>> when the device is re-started, we still have to do the reset to have it >>>>>>>> un-suspend. >>>>>>>> >>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>>>>> --- >>>>>>>> include/hw/virtio/vhost-vdpa.h | 2 -- >>>>>>>> include/hw/virtio/vhost.h | 8 ++++++++ >>>>>>>> hw/virtio/vhost-vdpa.c | 11 +++++++---- >>>>>>>> hw/virtio/vhost.c | 8 +++++++- >>>>>>>> 4 files changed, 22 insertions(+), 7 deletions(-) >>>>>>>> >>>>>>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h >>>>>>>> index e64bfc7f98..72c3686b7f 100644 >>>>>>>> --- a/include/hw/virtio/vhost-vdpa.h >>>>>>>> +++ b/include/hw/virtio/vhost-vdpa.h >>>>>>>> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { >>>>>>>> bool shadow_vqs_enabled; >>>>>>>> /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ >>>>>>>> bool shadow_data; >>>>>>>> - /* Device suspended successfully */ >>>>>>>> - bool suspended; >>>>>>>> /* IOVA mapping used by the Shadow Virtqueue */ >>>>>>>> VhostIOVATree *iova_tree; >>>>>>>> GPtrArray *shadow_vqs; >>>>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h >>>>>>>> index 6a173cb9fa..69bf59d630 100644 >>>>>>>> --- a/include/hw/virtio/vhost.h >>>>>>>> +++ b/include/hw/virtio/vhost.h >>>>>>>> @@ -120,6 +120,14 @@ struct vhost_dev { >>>>>>>> uint64_t backend_cap; >>>>>>>> /* @started: is the vhost device started? */ >>>>>>>> bool started; >>>>>>>> + /** >>>>>>>> + * @suspended: Whether the vhost device is currently suspended. Set >>>>>>>> + * and reset by implementations (vhost-user, vhost-vdpa, ...), which >>>>>>>> + * are supposed to automatically suspend/resume in their >>>>>>>> + * vhost_dev_start handlers as required. Must also be cleared when >>>>>>>> + * the device is reset. >>>>>>>> + */ >>>>>>>> + bool suspended; >>>>>>>> bool log_enabled; >>>>>>>> uint64_t log_size; >>>>>>>> Error *migration_blocker; >>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c >>>>>>>> index 7b7dee468e..f7fd19a203 100644 >>>>>>>> --- a/hw/virtio/vhost-vdpa.c >>>>>>>> +++ b/hw/virtio/vhost-vdpa.c >>>>>>>> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, >>>>>>>> >>>>>>>> static int vhost_vdpa_reset_device(struct vhost_dev *dev) >>>>>>>> { >>>>>>>> - struct vhost_vdpa *v = dev->opaque; >>>>>>>> int ret; >>>>>>>> uint8_t status = 0; >>>>>>>> >>>>>>>> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); >>>>>>>> trace_vhost_vdpa_reset_device(dev); >>>>>>>> - v->suspended = false; >>>>>>>> + dev->suspended = false; >>>>>>>> return ret; >>>>>>>> } >>>>>>>> >>>>>>>> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) >>>>>>>> if (unlikely(r)) { >>>>>>>> error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); >>>>>>>> } else { >>>>>>>> - v->suspended = true; >>>>>>>> + dev->suspended = true; >>>>>>>> return; >>>>>>>> } >>>>>>>> } >>>>>>>> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) >>>>>>>> return -1; >>>>>>>> } >>>>>>>> vhost_vdpa_set_vring_ready(dev); >>>>>>>> + if (dev->suspended) { >>>>>>>> + /* TODO: When RESUME is available, use it instead of resetting */ >>>>>>>> + vhost_vdpa_reset_status(dev); >>>>>>> How is that we reset the status at each vhost_vdpa_dev_start? That >>>>>>> will clean all the vqs configured, features negotiated, etc. in the >>>>>>> vDPA device. Or am I missing something? >>>>>> What alternative do you propose? We don’t have RESUME for vDPA in qemu, >>>>>> but we somehow need to lift the previous SUSPEND so the device will >>>>>> again respond to guest requests, do we not? >>>>>> >>>>> Reset also clears the suspend state in vDPA, and it should be called >>>>> at vhost_dev_stop. So the device should never be in suspended state >>>>> here. Does that solve your concerns? >>>> My intention with this patch was precisely not to reset in >>>> vhost_dev_stop when suspending is supported. So now I’m more confused >>>> than before. >>>> >>> At this moment, I think that should be focused as an optimization and >>> not to be included in the main series. >> It is absolutely not an optimization but vital for my use case. >> virtiofsd does not currently implement resetting, but if it did (and we >> want this support in the future), resetting it during stop/cont would be >> catastrophic. There must be a way to have qemu not issue a reset. This >> patch is the reason why this series exists. >> > Sorry, I see I confused things with the first reply. Let me do a recap. > > If I understand the problem correctly, your use case requires that > qemu does not reset the device before the device state is fetched with > virtio_save in the case of a migration. That is only part of the problem, the bigger picture has nothing to do with migration at all. The problem is that when the VM is paused (stop), we invoke vhost_dev_stop(), and when it is resumed (cont), we invoke vhost_dev_start(). To me, it therefore sounds absolutely wrong to reset the back-end in either of these functions. For stateless devices, it was determined to not be an issue (I still find it extremely strange), and as far as I’ve understood, we’ve come to the agreement that it’s basically a fallback for when there is no other way to stop the back-end. But stateful devices like virtio-fs would be completely broken by resetting them there. Therefore, if virtiofsd did implement reset through SET_STATUS, stop/cont would break it today. Maybe other vhost-user devices, too, which just implement RESET_OWNER/RESET_DEVICE, which aren’t even called when the device is supposed to be reset in vhost_dev_stop() (patch 6). So not just because of migration, but in general, there must be a way for back-ends to force qemu not to reset them in vhost_dev_start()/stop(). Or we stop using vhost_dev_start()/stop() when the VM is paused/resumed (stop/cont). > This is understandable and I > think we have a solution for that: to move the vhost_ops call to > virtio_reset and the end of virtio_save. Why would we reset the device in virtio_save()? > To remove the reset call from > vhost_dev_stop is somehow mandatory, as it is called before > virtio_save. > > But we cannot move to vhost_vdpa_dev_start, as proposed here. The reasons are: > * All the features, vq parameters, etc are set before any > vhost_vdpa_dev_start call. To reset at any vhost_vdpa_dev_start would > wipe them. > * The device needs to hold all the resources until it is reset. Things > like descriptor status etc. > > And, regarding the comment "When RESUME is available, use it instead > of resetting", we cannot use resume to replace reset in all cases. > This is because the semantics are different. > > For example, vhost_dev_stop and vhost_dev_start are also called when > the guest reset by itself the device. You can check it rmmoding and > modprobing virtio-net driver, for example. In this case, the driver > expects to find all vqs to start with 0, but the resume must not reset > these indexes. This isn’t quite clear to me. I understand this to mean that there must be a reset somewhere in vhost_dev_stop() and/or vhost_dev_start(). But above, you proposed moving the reset from vhost_dev_stop() into virtio_reset(). Is virtio_reset() called in addition to vhost_dev_stop() and vhost_dev_start() when the guest driver is changed? Because we can’t have an always-present reset in vhost_dev_stop() or vhost_dev_start(). It just doesn’t work with stop/cont. At the same time, I understand that you say we must have it because vhost_dev_{stop,start}() are also used when the guest driver changes. Consequently, it sounds clear to me that using the exact same functions in stop/cont and when the guest driver is unloaded/loaded is and has always been wrong. Because in stop/cont, the guest driver never changes, so we shouldn’t tell the back-end that it did (by sending SET_STATUS(0)). > It can be applied as an optimizations sometimes, but not for the general case. > >>>>>> But more generally, is this any different from what is done before this >>>>>> patch? Before this patch, vhost_dev_stop() unconditionally invokes >>>>>> vhost_reset_status(), so the device is reset in every stop/start cycle, >>>>>> that doesn’t change. And we still won’t reset it on the first >>>>>> vhost_dev_start(), because dev->suspended will be false then, only on >>>>>> subsequent stop/start cycles, as before. So the only difference is that >>>>>> now the device is reset on start, not on stop. >>>>>> >>>>> The difference is that vhost_vdpa_dev_start is called after features >>>>> ack (via vhost_dev_start, through vhost_dev_set_features call) and vq >>>>> configuration (using vhost_virtqueue_start). A device reset forces the >>>>> device to forget about all of that, and qemu cannot configure them >>>>> again until qemu acks the features again. >>>> Now I’m completely confused, because I don’t see the point of >>>> implementing suspend at all if we rely on a reset immediately afterwards >>>> anyway. >>> It's not immediate. From vhost_dev_stop, comments added only in this mail: >>> >>> void vhost_virtqueue_stop(struct vhost_dev *dev, >>> struct VirtIODevice *vdev, >>> struct vhost_virtqueue *vq, >>> unsigned idx) >>> { >>> ... >>> // Get each vring indexes, trusting the destination device can >>> continue safely from there >>> r = dev->vhost_ops->vhost_get_vring_base(dev, &state); >>> if (r < 0) { >>> VHOST_OPS_DEBUG(r, "vhost VQ %u ring restore failed: %d", idx, r); >>> /* Connection to the backend is broken, so let's sync internal >>> * last avail idx to the device used idx. >>> */ >>> virtio_queue_restore_last_avail_idx(vdev, idx); >>> } else { >>> virtio_queue_set_last_avail_idx(vdev, idx, state.num); >>> } >>> ... >>> } >>> >>> void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings) >>> { >>> ... >>> // Suspend the device, so we can trust in vring indexes / vq state >> I don’t understand this purpose. GET_VRING_BASE stops the vring in >> question, so the vring index returned must be trustworthy, no? >> > That only happens in vhost-user, not in vhost-vdpa. OK, so that begs the question: Was SUSPEND ever intended to do anything but stop all vrings? Because this series is about to make its meaning a whole lot broader than that in vhost-user. >>> if (hdev->vhost_ops->vhost_dev_start) { >>> hdev->vhost_ops->vhost_dev_start(hdev, false); >>> } >>> if (vrings) { >>> vhost_dev_set_vring_enable(hdev, false); >>> } >>> >>> // Fetch each vq index / state and store in vdev->vq[i] >>> for (i = 0; i < hdev->nvqs; ++i) { >>> vhost_virtqueue_stop(hdev, >>> vdev, >>> hdev->vqs + i, >>> hdev->vq_index + i); >>> } >>> >>> // Reset the device, as we don't need it anymore and it can >>> release the resources >>> if (hdev->vhost_ops->vhost_reset_status) { >>> hdev->vhost_ops->vhost_reset_status(hdev); >>> } >>> } >>> --- >>> >>>> It was my impression this whole time that suspending would >>>> remove the need to reset. Well, at least until the device should be >>>> resumed again, i.e. in vhost_dev_start(). >>>> >>> It cannot. vhost_dev_stop is also called when the guest reset the >>> device, so the guest trusts the device to be in a clean state. >>> >>> Also, the reset is the moment when the device frees the unused >>> resources. This would mandate the device to >> What resources are we talking about? This function is called when the >> VM is paused (stop). If a stateful device is reset to free “unused >> resources”, this means dropping its internal state, which is absolutely >> wrong in a stop/cont cycle. >> > But is the expected result in the virtio reset cycle. We need to split > these paths. > >>>> In addition, I also don’t understand the magnitude of the problem with >>>> ordering. If the order in vhost_dev_start() is wrong, can we not easily >>>> fix it? >>> The order in vhost_dev_start follows the VirtIO standard. >> What does the VirtIO standard say about suspended vhost back-ends? >> > Suspend does not exist in the VirtIO standard. I meant the device > initialization order in "3.1 Device Initialization": > > 1. Reset the device. > ... > 5. Set the FEATURES_OK status bit. [...] > ... > 7. Perform device-specific setup, including discovery of virtqueues > for the device, optional per-bus setup, reading and possibly writing > the device’s virtio configuration space, and population of virtqueues. > 8.Set the DRIVER_OK status bit. At this point the device is “live”. > > Steps 4-8 are all done in vhost_dev_start, in that particular order. > To call vhost_vdpa_reset_status from vhost_vdpa_dev_start(true) would > reset the device back to step 1, but there is no more code to set all > configuration from 2-7 before 8 (DRIVER_OK). That’s why I’ve proposed doing the reset at the start of vhost_dev_start() (quoted below still). To me, that sounds in line with the virtio specification. Still, if you insist there must a reset somewhere in vhost_dev_start()/stop() because it may be guest-initiated, then there is no solution that can work for both. We must be able to distinguish between a paused VM and a change in the guest driver. >> Hanna >> >>> The move of >>> the reset should be to remove it from vhost_dev_stop to something like >>> both virtio_reset and the end of virtio_save. I'm not sure if I'm >>> forgetting some other use cases. >>> >>>> E.g. add a full vhost_dev_resume callback to invoke right at >>>> the start of vhost_dev_start(); or check (in the same place) whether the >>>> back-end supports resuming, and if it doesn’t (and it is currently >>>> suspended), reset it there. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-26 6:57 ` Hanna Czenczek @ 2023-07-27 12:49 ` Eugenio Perez Martin 2023-07-27 20:26 ` Stefan Hajnoczi 0 siblings, 1 reply; 37+ messages in thread From: Eugenio Perez Martin @ 2023-07-27 12:49 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Stefan Hajnoczi, German Maglione On Wed, Jul 26, 2023 at 8:57 AM Hanna Czenczek <hreitz@redhat.com> wrote: > > On 25.07.23 20:53, Eugenio Perez Martin wrote: > > On Tue, Jul 25, 2023 at 3:09 PM Hanna Czenczek <hreitz@redhat.com> wrote: > >> On 25.07.23 12:03, Eugenio Perez Martin wrote: > >>> On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek <hreitz@redhat.com> wrote: > >>>> On 24.07.23 17:48, Eugenio Perez Martin wrote: > >>>>> On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hreitz@redhat.com> wrote: > >>>>>> On 21.07.23 17:25, Eugenio Perez Martin wrote: > >>>>>>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote: > >>>>>>>> Move the `suspended` field from vhost_vdpa into the global vhost_dev > >>>>>>>> struct, so vhost_dev_stop() can check whether the back-end has been > >>>>>>>> suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, > >>>>>>>> there is no need to reset it; the reset is just a fall-back to stop > >>>>>>>> device operations for back-ends that do not support suspend. > >>>>>>>> > >>>>>>>> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so > >>>>>>>> when the device is re-started, we still have to do the reset to have it > >>>>>>>> un-suspend. > >>>>>>>> > >>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > >>>>>>>> --- > >>>>>>>> include/hw/virtio/vhost-vdpa.h | 2 -- > >>>>>>>> include/hw/virtio/vhost.h | 8 ++++++++ > >>>>>>>> hw/virtio/vhost-vdpa.c | 11 +++++++---- > >>>>>>>> hw/virtio/vhost.c | 8 +++++++- > >>>>>>>> 4 files changed, 22 insertions(+), 7 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > >>>>>>>> index e64bfc7f98..72c3686b7f 100644 > >>>>>>>> --- a/include/hw/virtio/vhost-vdpa.h > >>>>>>>> +++ b/include/hw/virtio/vhost-vdpa.h > >>>>>>>> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { > >>>>>>>> bool shadow_vqs_enabled; > >>>>>>>> /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ > >>>>>>>> bool shadow_data; > >>>>>>>> - /* Device suspended successfully */ > >>>>>>>> - bool suspended; > >>>>>>>> /* IOVA mapping used by the Shadow Virtqueue */ > >>>>>>>> VhostIOVATree *iova_tree; > >>>>>>>> GPtrArray *shadow_vqs; > >>>>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h > >>>>>>>> index 6a173cb9fa..69bf59d630 100644 > >>>>>>>> --- a/include/hw/virtio/vhost.h > >>>>>>>> +++ b/include/hw/virtio/vhost.h > >>>>>>>> @@ -120,6 +120,14 @@ struct vhost_dev { > >>>>>>>> uint64_t backend_cap; > >>>>>>>> /* @started: is the vhost device started? */ > >>>>>>>> bool started; > >>>>>>>> + /** > >>>>>>>> + * @suspended: Whether the vhost device is currently suspended. Set > >>>>>>>> + * and reset by implementations (vhost-user, vhost-vdpa, ...), which > >>>>>>>> + * are supposed to automatically suspend/resume in their > >>>>>>>> + * vhost_dev_start handlers as required. Must also be cleared when > >>>>>>>> + * the device is reset. > >>>>>>>> + */ > >>>>>>>> + bool suspended; > >>>>>>>> bool log_enabled; > >>>>>>>> uint64_t log_size; > >>>>>>>> Error *migration_blocker; > >>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > >>>>>>>> index 7b7dee468e..f7fd19a203 100644 > >>>>>>>> --- a/hw/virtio/vhost-vdpa.c > >>>>>>>> +++ b/hw/virtio/vhost-vdpa.c > >>>>>>>> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, > >>>>>>>> > >>>>>>>> static int vhost_vdpa_reset_device(struct vhost_dev *dev) > >>>>>>>> { > >>>>>>>> - struct vhost_vdpa *v = dev->opaque; > >>>>>>>> int ret; > >>>>>>>> uint8_t status = 0; > >>>>>>>> > >>>>>>>> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); > >>>>>>>> trace_vhost_vdpa_reset_device(dev); > >>>>>>>> - v->suspended = false; > >>>>>>>> + dev->suspended = false; > >>>>>>>> return ret; > >>>>>>>> } > >>>>>>>> > >>>>>>>> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) > >>>>>>>> if (unlikely(r)) { > >>>>>>>> error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); > >>>>>>>> } else { > >>>>>>>> - v->suspended = true; > >>>>>>>> + dev->suspended = true; > >>>>>>>> return; > >>>>>>>> } > >>>>>>>> } > >>>>>>>> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) > >>>>>>>> return -1; > >>>>>>>> } > >>>>>>>> vhost_vdpa_set_vring_ready(dev); > >>>>>>>> + if (dev->suspended) { > >>>>>>>> + /* TODO: When RESUME is available, use it instead of resetting */ > >>>>>>>> + vhost_vdpa_reset_status(dev); > >>>>>>> How is that we reset the status at each vhost_vdpa_dev_start? That > >>>>>>> will clean all the vqs configured, features negotiated, etc. in the > >>>>>>> vDPA device. Or am I missing something? > >>>>>> What alternative do you propose? We don’t have RESUME for vDPA in qemu, > >>>>>> but we somehow need to lift the previous SUSPEND so the device will > >>>>>> again respond to guest requests, do we not? > >>>>>> > >>>>> Reset also clears the suspend state in vDPA, and it should be called > >>>>> at vhost_dev_stop. So the device should never be in suspended state > >>>>> here. Does that solve your concerns? > >>>> My intention with this patch was precisely not to reset in > >>>> vhost_dev_stop when suspending is supported. So now I’m more confused > >>>> than before. > >>>> > >>> At this moment, I think that should be focused as an optimization and > >>> not to be included in the main series. > >> It is absolutely not an optimization but vital for my use case. > >> virtiofsd does not currently implement resetting, but if it did (and we > >> want this support in the future), resetting it during stop/cont would be > >> catastrophic. There must be a way to have qemu not issue a reset. This > >> patch is the reason why this series exists. > >> > > Sorry, I see I confused things with the first reply. Let me do a recap. > > > > If I understand the problem correctly, your use case requires that > > qemu does not reset the device before the device state is fetched with > > virtio_save in the case of a migration. > > That is only part of the problem, the bigger picture has nothing to do > with migration at all. The problem is that when the VM is paused > (stop), we invoke vhost_dev_stop(), and when it is resumed (cont), we > invoke vhost_dev_start(). To me, it therefore sounds absolutely wrong > to reset the back-end in either of these functions. For stateless > devices, it was determined to not be an issue (I still find it extremely > strange), and as far as I’ve understood, we’ve come to the agreement > that it’s basically a fallback for when there is no other way to stop > the back-end. But stateful devices like virtio-fs would be completely > broken by resetting them there. > > Therefore, if virtiofsd did implement reset through SET_STATUS, > stop/cont would break it today. Maybe other vhost-user devices, too, > which just implement RESET_OWNER/RESET_DEVICE, which aren’t even called > when the device is supposed to be reset in vhost_dev_stop() (patch 6). > > So not just because of migration, but in general, there must be a way > for back-ends to force qemu not to reset them in vhost_dev_start()/stop(). > > Or we stop using vhost_dev_start()/stop() when the VM is paused/resumed > (stop/cont). > Yes, that comes back to the thread [1]. As a third alternative, you can keep vhost_dev_start and let the function check the current state and initialize the device only if needed. But you can keep symmetrical functions and call one or another at the device's code, of course. Not sure what is cleaner or requires less changes. > > This is understandable and I > > think we have a solution for that: to move the vhost_ops call to > > virtio_reset and the end of virtio_save. > > Why would we reset the device in virtio_save()? > If the VM continues in the source because of whatever reason, vhost_dev_start would expect the device to be clean. You can test it with the command "cont" after the LM. > > To remove the reset call from > > vhost_dev_stop is somehow mandatory, as it is called before > > virtio_save. > > > > But we cannot move to vhost_vdpa_dev_start, as proposed here. The reasons are: > > * All the features, vq parameters, etc are set before any > > vhost_vdpa_dev_start call. To reset at any vhost_vdpa_dev_start would > > wipe them. > > * The device needs to hold all the resources until it is reset. Things > > like descriptor status etc. > > > > And, regarding the comment "When RESUME is available, use it instead > > of resetting", we cannot use resume to replace reset in all cases. > > This is because the semantics are different. > > > > For example, vhost_dev_stop and vhost_dev_start are also called when > > the guest reset by itself the device. You can check it rmmoding and > > modprobing virtio-net driver, for example. In this case, the driver > > expects to find all vqs to start with 0, but the resume must not reset > > these indexes. > > This isn’t quite clear to me. I understand this to mean that there must > be a reset somewhere in vhost_dev_stop() and/or vhost_dev_start(). But > above, you proposed moving the reset from vhost_dev_stop() into > virtio_reset(). Is virtio_reset() called in addition to > vhost_dev_stop() and vhost_dev_start() when the guest driver is changed? > Right. Maybe another option is virtio_set_status? The point is that other parts of qemu / guest trust the device to be reset after (current version of) vhost_dev_stop, so if we are going to move the reset it must be added to the callers at least. To trace these callers is needed, so maybe it is easy to add another function (vhost_dev_suspend?), your first alternative. > Because we can’t have an always-present reset in vhost_dev_stop() or > vhost_dev_start(). It just doesn’t work with stop/cont. At the same > time, I understand that you say we must have it because > vhost_dev_{stop,start}() are also used when the guest driver changes. > Consequently, it sounds clear to me that using the exact same functions > in stop/cont and when the guest driver is unloaded/loaded is and has > always been wrong. Because in stop/cont, the guest driver never > changes, so we shouldn’t tell the back-end that it did (by sending > SET_STATUS(0)). > Talking just about vhost_net here, the device dumps all the state (like vqs) to qemu in vhost_dev_stop. That is what allows to have, for example, an unified code in migrating: qemu only needs to know how to migrate its emulated device, and vhost code just writes or read at suspend / continue or live migrating. To suspend and continue is the same operation actually for vhost_net. > > It can be applied as an optimizations sometimes, but not for the general case. > > > >>>>>> But more generally, is this any different from what is done before this > >>>>>> patch? Before this patch, vhost_dev_stop() unconditionally invokes > >>>>>> vhost_reset_status(), so the device is reset in every stop/start cycle, > >>>>>> that doesn’t change. And we still won’t reset it on the first > >>>>>> vhost_dev_start(), because dev->suspended will be false then, only on > >>>>>> subsequent stop/start cycles, as before. So the only difference is that > >>>>>> now the device is reset on start, not on stop. > >>>>>> > >>>>> The difference is that vhost_vdpa_dev_start is called after features > >>>>> ack (via vhost_dev_start, through vhost_dev_set_features call) and vq > >>>>> configuration (using vhost_virtqueue_start). A device reset forces the > >>>>> device to forget about all of that, and qemu cannot configure them > >>>>> again until qemu acks the features again. > >>>> Now I’m completely confused, because I don’t see the point of > >>>> implementing suspend at all if we rely on a reset immediately afterwards > >>>> anyway. > >>> It's not immediate. From vhost_dev_stop, comments added only in this mail: > >>> > >>> void vhost_virtqueue_stop(struct vhost_dev *dev, > >>> struct VirtIODevice *vdev, > >>> struct vhost_virtqueue *vq, > >>> unsigned idx) > >>> { > >>> ... > >>> // Get each vring indexes, trusting the destination device can > >>> continue safely from there > >>> r = dev->vhost_ops->vhost_get_vring_base(dev, &state); > >>> if (r < 0) { > >>> VHOST_OPS_DEBUG(r, "vhost VQ %u ring restore failed: %d", idx, r); > >>> /* Connection to the backend is broken, so let's sync internal > >>> * last avail idx to the device used idx. > >>> */ > >>> virtio_queue_restore_last_avail_idx(vdev, idx); > >>> } else { > >>> virtio_queue_set_last_avail_idx(vdev, idx, state.num); > >>> } > >>> ... > >>> } > >>> > >>> void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev, bool vrings) > >>> { > >>> ... > >>> // Suspend the device, so we can trust in vring indexes / vq state > >> I don’t understand this purpose. GET_VRING_BASE stops the vring in > >> question, so the vring index returned must be trustworthy, no? > >> > > That only happens in vhost-user, not in vhost-vdpa. > > OK, so that begs the question: Was SUSPEND ever intended to do anything > but stop all vrings? Because this series is about to make its meaning a > whole lot broader than that in vhost-user. > That was the big part, yes. The device must also stop modifying config, like in the case of link status changes. Other actions like fetch vq state are performed later (at vhost_virtqueue_stop). VHOST_VDPA_SUSPEND requires the device to preserve all the needed state to be recovered or continued: Suspend a device so it does not process virtqueue requests anymore After the return of ioctl the device must preserve all the necessary state (the virtqueue vring base plus the possible device specific states) that is required for restoring in the future. The device must not change its configuration after that point. --- > >>> if (hdev->vhost_ops->vhost_dev_start) { > >>> hdev->vhost_ops->vhost_dev_start(hdev, false); > >>> } > >>> if (vrings) { > >>> vhost_dev_set_vring_enable(hdev, false); > >>> } > >>> > >>> // Fetch each vq index / state and store in vdev->vq[i] > >>> for (i = 0; i < hdev->nvqs; ++i) { > >>> vhost_virtqueue_stop(hdev, > >>> vdev, > >>> hdev->vqs + i, > >>> hdev->vq_index + i); > >>> } > >>> > >>> // Reset the device, as we don't need it anymore and it can > >>> release the resources > >>> if (hdev->vhost_ops->vhost_reset_status) { > >>> hdev->vhost_ops->vhost_reset_status(hdev); > >>> } > >>> } > >>> --- > >>> > >>>> It was my impression this whole time that suspending would > >>>> remove the need to reset. Well, at least until the device should be > >>>> resumed again, i.e. in vhost_dev_start(). > >>>> > >>> It cannot. vhost_dev_stop is also called when the guest reset the > >>> device, so the guest trusts the device to be in a clean state. > >>> > >>> Also, the reset is the moment when the device frees the unused > >>> resources. This would mandate the device to > >> What resources are we talking about? This function is called when the > >> VM is paused (stop). If a stateful device is reset to free “unused > >> resources”, this means dropping its internal state, which is absolutely > >> wrong in a stop/cont cycle. > >> > > But is the expected result in the virtio reset cycle. We need to split > > these paths. > > > >>>> In addition, I also don’t understand the magnitude of the problem with > >>>> ordering. If the order in vhost_dev_start() is wrong, can we not easily > >>>> fix it? > >>> The order in vhost_dev_start follows the VirtIO standard. > >> What does the VirtIO standard say about suspended vhost back-ends? > >> > > Suspend does not exist in the VirtIO standard. I meant the device > > initialization order in "3.1 Device Initialization": > > > > 1. Reset the device. > > ... > > 5. Set the FEATURES_OK status bit. [...] > > ... > > 7. Perform device-specific setup, including discovery of virtqueues > > for the device, optional per-bus setup, reading and possibly writing > > the device’s virtio configuration space, and population of virtqueues. > > 8.Set the DRIVER_OK status bit. At this point the device is “live”. > > > > Steps 4-8 are all done in vhost_dev_start, in that particular order. > > To call vhost_vdpa_reset_status from vhost_vdpa_dev_start(true) would > > reset the device back to step 1, but there is no more code to set all > > configuration from 2-7 before 8 (DRIVER_OK). > > That’s why I’ve proposed doing the reset at the start of > vhost_dev_start() (quoted below still). To me, that sounds in line with > the virtio specification. > > Still, if you insist there must a reset somewhere in > vhost_dev_start()/stop() because it may be guest-initiated, then there > is no solution that can work for both. We must be able to distinguish > between a paused VM and a change in the guest driver. > Right. [1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg954916.html > >> Hanna > >> > >>> The move of > >>> the reset should be to remove it from vhost_dev_stop to something like > >>> both virtio_reset and the end of virtio_save. I'm not sure if I'm > >>> forgetting some other use cases. > >>> > >>>> E.g. add a full vhost_dev_resume callback to invoke right at > >>>> the start of vhost_dev_start(); or check (in the same place) whether the > >>>> back-end supports resuming, and if it doesn’t (and it is currently > >>>> suspended), reset it there. > > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 3/6] vhost: Do not reset suspended devices on stop 2023-07-27 12:49 ` Eugenio Perez Martin @ 2023-07-27 20:26 ` Stefan Hajnoczi 0 siblings, 0 replies; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-27 20:26 UTC (permalink / raw) To: longpeng2 Cc: Hanna Czenczek, qemu-devel, Michael S . Tsirkin, German Maglione, Eugenio Perez Martin [-- Attachment #1: Type: text/plain, Size: 12365 bytes --] On Thu, Jul 27, 2023 at 02:49:04PM +0200, Eugenio Perez Martin wrote: > On Wed, Jul 26, 2023 at 8:57 AM Hanna Czenczek <hreitz@redhat.com> wrote: > > > > On 25.07.23 20:53, Eugenio Perez Martin wrote: > > > On Tue, Jul 25, 2023 at 3:09 PM Hanna Czenczek <hreitz@redhat.com> wrote: > > >> On 25.07.23 12:03, Eugenio Perez Martin wrote: > > >>> On Tue, Jul 25, 2023 at 9:53 AM Hanna Czenczek <hreitz@redhat.com> wrote: > > >>>> On 24.07.23 17:48, Eugenio Perez Martin wrote: > > >>>>> On Fri, Jul 21, 2023 at 6:07 PM Hanna Czenczek <hreitz@redhat.com> wrote: > > >>>>>> On 21.07.23 17:25, Eugenio Perez Martin wrote: > > >>>>>>> On Tue, Jul 11, 2023 at 5:52 PM Hanna Czenczek <hreitz@redhat.com> wrote: > > >>>>>>>> Move the `suspended` field from vhost_vdpa into the global vhost_dev > > >>>>>>>> struct, so vhost_dev_stop() can check whether the back-end has been > > >>>>>>>> suspended by `vhost_ops->vhost_dev_start(hdev, false)`. If it has, > > >>>>>>>> there is no need to reset it; the reset is just a fall-back to stop > > >>>>>>>> device operations for back-ends that do not support suspend. > > >>>>>>>> > > >>>>>>>> Unfortunately, for vDPA specifically, RESUME is not yet implemented, so > > >>>>>>>> when the device is re-started, we still have to do the reset to have it > > >>>>>>>> un-suspend. > > >>>>>>>> > > >>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > > >>>>>>>> --- > > >>>>>>>> include/hw/virtio/vhost-vdpa.h | 2 -- > > >>>>>>>> include/hw/virtio/vhost.h | 8 ++++++++ > > >>>>>>>> hw/virtio/vhost-vdpa.c | 11 +++++++---- > > >>>>>>>> hw/virtio/vhost.c | 8 +++++++- > > >>>>>>>> 4 files changed, 22 insertions(+), 7 deletions(-) > > >>>>>>>> > > >>>>>>>> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h > > >>>>>>>> index e64bfc7f98..72c3686b7f 100644 > > >>>>>>>> --- a/include/hw/virtio/vhost-vdpa.h > > >>>>>>>> +++ b/include/hw/virtio/vhost-vdpa.h > > >>>>>>>> @@ -42,8 +42,6 @@ typedef struct vhost_vdpa { > > >>>>>>>> bool shadow_vqs_enabled; > > >>>>>>>> /* Vdpa must send shadow addresses as IOTLB key for data queues, not GPA */ > > >>>>>>>> bool shadow_data; > > >>>>>>>> - /* Device suspended successfully */ > > >>>>>>>> - bool suspended; > > >>>>>>>> /* IOVA mapping used by the Shadow Virtqueue */ > > >>>>>>>> VhostIOVATree *iova_tree; > > >>>>>>>> GPtrArray *shadow_vqs; > > >>>>>>>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h > > >>>>>>>> index 6a173cb9fa..69bf59d630 100644 > > >>>>>>>> --- a/include/hw/virtio/vhost.h > > >>>>>>>> +++ b/include/hw/virtio/vhost.h > > >>>>>>>> @@ -120,6 +120,14 @@ struct vhost_dev { > > >>>>>>>> uint64_t backend_cap; > > >>>>>>>> /* @started: is the vhost device started? */ > > >>>>>>>> bool started; > > >>>>>>>> + /** > > >>>>>>>> + * @suspended: Whether the vhost device is currently suspended. Set > > >>>>>>>> + * and reset by implementations (vhost-user, vhost-vdpa, ...), which > > >>>>>>>> + * are supposed to automatically suspend/resume in their > > >>>>>>>> + * vhost_dev_start handlers as required. Must also be cleared when > > >>>>>>>> + * the device is reset. > > >>>>>>>> + */ > > >>>>>>>> + bool suspended; > > >>>>>>>> bool log_enabled; > > >>>>>>>> uint64_t log_size; > > >>>>>>>> Error *migration_blocker; > > >>>>>>>> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > >>>>>>>> index 7b7dee468e..f7fd19a203 100644 > > >>>>>>>> --- a/hw/virtio/vhost-vdpa.c > > >>>>>>>> +++ b/hw/virtio/vhost-vdpa.c > > >>>>>>>> @@ -858,13 +858,12 @@ static int vhost_vdpa_get_device_id(struct vhost_dev *dev, > > >>>>>>>> > > >>>>>>>> static int vhost_vdpa_reset_device(struct vhost_dev *dev) > > >>>>>>>> { > > >>>>>>>> - struct vhost_vdpa *v = dev->opaque; > > >>>>>>>> int ret; > > >>>>>>>> uint8_t status = 0; > > >>>>>>>> > > >>>>>>>> ret = vhost_vdpa_call(dev, VHOST_VDPA_SET_STATUS, &status); > > >>>>>>>> trace_vhost_vdpa_reset_device(dev); > > >>>>>>>> - v->suspended = false; > > >>>>>>>> + dev->suspended = false; > > >>>>>>>> return ret; > > >>>>>>>> } > > >>>>>>>> > > >>>>>>>> @@ -1278,7 +1277,7 @@ static void vhost_vdpa_suspend(struct vhost_dev *dev) > > >>>>>>>> if (unlikely(r)) { > > >>>>>>>> error_report("Cannot suspend: %s(%d)", g_strerror(errno), errno); > > >>>>>>>> } else { > > >>>>>>>> - v->suspended = true; > > >>>>>>>> + dev->suspended = true; > > >>>>>>>> return; > > >>>>>>>> } > > >>>>>>>> } > > >>>>>>>> @@ -1313,6 +1312,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) > > >>>>>>>> return -1; > > >>>>>>>> } > > >>>>>>>> vhost_vdpa_set_vring_ready(dev); > > >>>>>>>> + if (dev->suspended) { > > >>>>>>>> + /* TODO: When RESUME is available, use it instead of resetting */ > > >>>>>>>> + vhost_vdpa_reset_status(dev); > > >>>>>>> How is that we reset the status at each vhost_vdpa_dev_start? That > > >>>>>>> will clean all the vqs configured, features negotiated, etc. in the > > >>>>>>> vDPA device. Or am I missing something? > > >>>>>> What alternative do you propose? We don’t have RESUME for vDPA in qemu, > > >>>>>> but we somehow need to lift the previous SUSPEND so the device will > > >>>>>> again respond to guest requests, do we not? > > >>>>>> > > >>>>> Reset also clears the suspend state in vDPA, and it should be called > > >>>>> at vhost_dev_stop. So the device should never be in suspended state > > >>>>> here. Does that solve your concerns? > > >>>> My intention with this patch was precisely not to reset in > > >>>> vhost_dev_stop when suspending is supported. So now I’m more confused > > >>>> than before. > > >>>> > > >>> At this moment, I think that should be focused as an optimization and > > >>> not to be included in the main series. > > >> It is absolutely not an optimization but vital for my use case. > > >> virtiofsd does not currently implement resetting, but if it did (and we > > >> want this support in the future), resetting it during stop/cont would be > > >> catastrophic. There must be a way to have qemu not issue a reset. This > > >> patch is the reason why this series exists. > > >> > > > Sorry, I see I confused things with the first reply. Let me do a recap. > > > > > > If I understand the problem correctly, your use case requires that > > > qemu does not reset the device before the device state is fetched with > > > virtio_save in the case of a migration. > > > > That is only part of the problem, the bigger picture has nothing to do > > with migration at all. The problem is that when the VM is paused > > (stop), we invoke vhost_dev_stop(), and when it is resumed (cont), we > > invoke vhost_dev_start(). To me, it therefore sounds absolutely wrong > > to reset the back-end in either of these functions. For stateless > > devices, it was determined to not be an issue (I still find it extremely > > strange), and as far as I’ve understood, we’ve come to the agreement > > that it’s basically a fallback for when there is no other way to stop > > the back-end. But stateful devices like virtio-fs would be completely > > broken by resetting them there. > > > > Therefore, if virtiofsd did implement reset through SET_STATUS, > > stop/cont would break it today. Maybe other vhost-user devices, too, > > which just implement RESET_OWNER/RESET_DEVICE, which aren’t even called > > when the device is supposed to be reset in vhost_dev_stop() (patch 6). > > > > So not just because of migration, but in general, there must be a way > > for back-ends to force qemu not to reset them in vhost_dev_start()/stop(). > > > > Or we stop using vhost_dev_start()/stop() when the VM is paused/resumed > > (stop/cont). > > > > Yes, that comes back to the thread [1]. > > As a third alternative, you can keep vhost_dev_start and let the > function check the current state and initialize the device only if > needed. But you can keep symmetrical functions and call one or another > at the device's code, of course. Not sure what is cleaner or requires > less changes. > > > > This is understandable and I > > > think we have a solution for that: to move the vhost_ops call to > > > virtio_reset and the end of virtio_save. > > > > Why would we reset the device in virtio_save()? > > > > If the VM continues in the source because of whatever reason, > vhost_dev_start would expect the device to be clean. You can test it > with the command "cont" after the LM. > > > > To remove the reset call from > > > vhost_dev_stop is somehow mandatory, as it is called before > > > virtio_save. > > > > > > But we cannot move to vhost_vdpa_dev_start, as proposed here. The reasons are: > > > * All the features, vq parameters, etc are set before any > > > vhost_vdpa_dev_start call. To reset at any vhost_vdpa_dev_start would > > > wipe them. > > > * The device needs to hold all the resources until it is reset. Things > > > like descriptor status etc. > > > > > > And, regarding the comment "When RESUME is available, use it instead > > > of resetting", we cannot use resume to replace reset in all cases. > > > This is because the semantics are different. > > > > > > For example, vhost_dev_stop and vhost_dev_start are also called when > > > the guest reset by itself the device. You can check it rmmoding and > > > modprobing virtio-net driver, for example. In this case, the driver > > > expects to find all vqs to start with 0, but the resume must not reset > > > these indexes. > > > > This isn’t quite clear to me. I understand this to mean that there must > > be a reset somewhere in vhost_dev_stop() and/or vhost_dev_start(). But > > above, you proposed moving the reset from vhost_dev_stop() into > > virtio_reset(). Is virtio_reset() called in addition to > > vhost_dev_stop() and vhost_dev_start() when the guest driver is changed? > > > > Right. Maybe another option is virtio_set_status? > > The point is that other parts of qemu / guest trust the device to be > reset after (current version of) vhost_dev_stop, so if we are going to > move the reset it must be added to the callers at least. To trace > these callers is needed, so maybe it is easy to add another function > (vhost_dev_suspend?), your first alternative. > > > Because we can’t have an always-present reset in vhost_dev_stop() or > > vhost_dev_start(). It just doesn’t work with stop/cont. At the same > > time, I understand that you say we must have it because > > vhost_dev_{stop,start}() are also used when the guest driver changes. > > Consequently, it sounds clear to me that using the exact same functions > > in stop/cont and when the guest driver is unloaded/loaded is and has > > always been wrong. Because in stop/cont, the guest driver never > > changes, so we shouldn’t tell the back-end that it did (by sending > > SET_STATUS(0)). > > > > Talking just about vhost_net here, the device dumps all the state > (like vqs) to qemu in vhost_dev_stop. That is what allows to have, for > example, an unified code in migrating: qemu only needs to know how to > migrate its emulated device, and vhost code just writes or read at > suspend / continue or live migrating. To suspend and continue is the > same operation actually for vhost_net. Hi Longpeng, This discussion made me realize that --device vhost-vdpa-device-pci does not support stateful vDPA devices. For example, a vdpa-net device that implements the controlq will lose state (e.g. packet receive filtering) when the guest is stopped with the QEMU 'stop' command. QEMU needs to use the VHOST_VDPA_RESUME ioctl so it can resume stateful devices instead of resetting them across 'stop'/'cont'. Whatever solution we agree on in this thread should also work for vhost-vdpa and solve the issue for --device vhost-vdpa-device-pci. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 4/6] vhost-user: Implement suspend/resume 2023-07-11 15:52 [PATCH 0/6] vhost-user: Add suspend/resume Hanna Czenczek ` (2 preceding siblings ...) 2023-07-11 15:52 ` [PATCH 3/6] vhost: Do not reset suspended devices on stop Hanna Czenczek @ 2023-07-11 15:52 ` Hanna Czenczek 2023-07-18 14:37 ` Stefan Hajnoczi 2023-07-11 15:52 ` [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset Hanna Czenczek ` (2 subsequent siblings) 6 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-11 15:52 UTC (permalink / raw) To: qemu-devel Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi, Eugenio Pérez, German Maglione Implement SUSPEND/RESUME like vDPA does, by automatically using it in vhost_user_dev_start(). (Though our vDPA code does not implement RESUME yet, so there, the device is reset when it is to be resumed.) Signed-off-by: Hanna Czenczek <hreitz@redhat.com> --- hw/virtio/vhost-user.c | 99 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 97 insertions(+), 2 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 8dcf049d42..4507de5a92 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -74,6 +74,8 @@ enum VhostUserProtocolFeature { /* Feature 14 reserved for VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS. */ VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15, VHOST_USER_PROTOCOL_F_STATUS = 16, + /* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */ + VHOST_USER_PROTOCOL_F_SUSPEND = 18, VHOST_USER_PROTOCOL_F_MAX }; @@ -121,6 +123,8 @@ typedef enum VhostUserRequest { VHOST_USER_REM_MEM_REG = 38, VHOST_USER_SET_STATUS = 39, VHOST_USER_GET_STATUS = 40, + VHOST_USER_SUSPEND = 41, + VHOST_USER_RESUME = 42, VHOST_USER_MAX } VhostUserRequest; @@ -1389,7 +1393,19 @@ static int vhost_user_set_u64(struct vhost_dev *dev, int request, uint64_t u64, static int vhost_user_set_status(struct vhost_dev *dev, uint8_t status) { - return vhost_user_set_u64(dev, VHOST_USER_SET_STATUS, status, false); + int ret; + + ret = vhost_user_set_u64(dev, VHOST_USER_SET_STATUS, status, false); + if (ret < 0) { + return ret; + } + + if (!status) { + /* reset */ + dev->suspended = false; + } + + return 0; } static int vhost_user_get_status(struct vhost_dev *dev, uint8_t *status) @@ -1490,6 +1506,7 @@ static int vhost_user_get_max_memslots(struct vhost_dev *dev, static int vhost_user_reset_device(struct vhost_dev *dev) { + int ret; VhostUserMsg msg = { .hdr.flags = VHOST_USER_VERSION, }; @@ -1499,7 +1516,13 @@ static int vhost_user_reset_device(struct vhost_dev *dev) ? VHOST_USER_RESET_DEVICE : VHOST_USER_RESET_OWNER; - return vhost_user_write(dev, &msg, NULL, 0); + ret = vhost_user_write(dev, &msg, NULL, 0); + if (ret < 0) { + return ret; + } + + dev->suspended = false; + return 0; } static int vhost_user_backend_handle_config_change(struct vhost_dev *dev) @@ -2707,8 +2730,80 @@ void vhost_user_async_close(DeviceState *d, } } +static bool vhost_user_supports_suspend(struct vhost_dev *dev) +{ + return virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_SUSPEND); +} + +static int vhost_user_do_suspend_resume(struct vhost_dev *dev, bool suspend) +{ + VhostUserMsg msg; + bool reply_supported = virtio_has_feature(dev->protocol_features, + VHOST_USER_PROTOCOL_F_REPLY_ACK); + VhostUserRequest request = suspend ? VHOST_USER_SUSPEND : VHOST_USER_RESUME; + int ret; + + if (dev->suspended == suspend) { + /* Nothing to do */ + return 0; + } + + if (!vhost_user_supports_suspend(dev)) { + return -ENOTSUP; + } + + msg = (VhostUserMsg) { + .hdr = { + .request = request, + .size = 0, + .flags = VHOST_USER_VERSION, + }, + }; + if (reply_supported) { + msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK; + } + + ret = vhost_user_write(dev, &msg, NULL, 0); + if (ret < 0) { + return ret; + } + + if (reply_supported) { + ret = process_message_reply(dev, &msg); + if (ret < 0) { + return ret; + } + } + + dev->suspended = suspend; + return 0; +} + +static int vhost_user_suspend(struct vhost_dev *dev) +{ + return vhost_user_do_suspend_resume(dev, true); +} + +static int vhost_user_resume(struct vhost_dev *dev) +{ + return vhost_user_do_suspend_resume(dev, false); +} + static int vhost_user_dev_start(struct vhost_dev *dev, bool started) { + /* + * Ignore results. If the central vhost code cares, it will check + * dev->suspended. (These calls will fail if the back-end does not + * support suspend/resume, which callers that just want to start the + * device do not care about.) + */ + if (started) { + vhost_user_resume(dev); + } else { + vhost_user_suspend(dev); + } + if (!virtio_has_feature(dev->protocol_features, VHOST_USER_PROTOCOL_F_STATUS)) { return 0; -- 2.41.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 4/6] vhost-user: Implement suspend/resume 2023-07-11 15:52 ` [PATCH 4/6] vhost-user: Implement suspend/resume Hanna Czenczek @ 2023-07-18 14:37 ` Stefan Hajnoczi 0 siblings, 0 replies; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-18 14:37 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione [-- Attachment #1: Type: text/plain, Size: 525 bytes --] On Tue, Jul 11, 2023 at 05:52:26PM +0200, Hanna Czenczek wrote: > Implement SUSPEND/RESUME like vDPA does, by automatically using it in > vhost_user_dev_start(). (Though our vDPA code does not implement RESUME > yet, so there, the device is reset when it is to be resumed.) > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > --- > hw/virtio/vhost-user.c | 99 +++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 97 insertions(+), 2 deletions(-) Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset 2023-07-11 15:52 [PATCH 0/6] vhost-user: Add suspend/resume Hanna Czenczek ` (3 preceding siblings ...) 2023-07-11 15:52 ` [PATCH 4/6] vhost-user: Implement suspend/resume Hanna Czenczek @ 2023-07-11 15:52 ` Hanna Czenczek 2023-07-18 14:50 ` Stefan Hajnoczi 2023-07-11 15:52 ` [PATCH 6/6] vhost-user: Have reset_status fall back to reset Hanna Czenczek 2023-07-18 15:14 ` [PATCH 0/6] vhost-user: Add suspend/resume Stefan Hajnoczi 6 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-11 15:52 UTC (permalink / raw) To: qemu-devel Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi, Eugenio Pérez, German Maglione vhost-vdpa and vhost-user differ in how they reset the status in their respective vhost_reset_status implementations: vhost-vdpa zeroes it, then re-adds the S_ACKNOWLEDGE and S_DRIVER config bits. S_DRIVER_OK is then set in vhost_vdpa_dev_start(). vhost-user in contrast just zeroes the status, and does no re-add any config bits until vhost_user_dev_start() (where it does re-add all of S_ACKNOWLEDGE, S_DRIVER, and S_DRIVER_OK). There is no documentation for vhost_reset_status, but its only caller is vhost_dev_stop(). So apparently, the device is to be stopped after vhost_reset_status, and therefore it makes more sense to keep the status field fully cleared until the back-end is re-started, which is how vhost-user does it. Make vhost-vdpa do the same -- if nothing else it's confusing to have both vhost implementations handle this differently. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> --- hw/virtio/vhost-vdpa.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index f7fd19a203..0cde8b40de 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -1294,8 +1294,6 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev) } vhost_vdpa_reset_device(dev); - vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | - VIRTIO_CONFIG_S_DRIVER); memory_listener_unregister(&v->listener); } @@ -1334,7 +1332,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) } memory_listener_register(&v->listener, dev->vdev->dma_as); - return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK); + return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | + VIRTIO_CONFIG_S_DRIVER | + VIRTIO_CONFIG_S_DRIVER_OK); } return 0; -- 2.41.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset 2023-07-11 15:52 ` [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset Hanna Czenczek @ 2023-07-18 14:50 ` Stefan Hajnoczi 2023-07-19 14:09 ` Hanna Czenczek 0 siblings, 1 reply; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-18 14:50 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione [-- Attachment #1: Type: text/plain, Size: 3091 bytes --] On Tue, Jul 11, 2023 at 05:52:27PM +0200, Hanna Czenczek wrote: > vhost-vdpa and vhost-user differ in how they reset the status in their > respective vhost_reset_status implementations: vhost-vdpa zeroes it, > then re-adds the S_ACKNOWLEDGE and S_DRIVER config bits. S_DRIVER_OK is > then set in vhost_vdpa_dev_start(). > > vhost-user in contrast just zeroes the status, and does no re-add any > config bits until vhost_user_dev_start() (where it does re-add all of > S_ACKNOWLEDGE, S_DRIVER, and S_DRIVER_OK). > > There is no documentation for vhost_reset_status, but its only caller is > vhost_dev_stop(). So apparently, the device is to be stopped after > vhost_reset_status, and therefore it makes more sense to keep the status > field fully cleared until the back-end is re-started, which is how > vhost-user does it. Make vhost-vdpa do the same -- if nothing else it's > confusing to have both vhost implementations handle this differently. > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > --- > hw/virtio/vhost-vdpa.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) Hi Hanna, The VIRTIO spec lists the Device Initialization sequence including the bits set in the Device Status Register here: https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-1070001 ACKNOWLEDGE and DRIVER must be set before FEATURES_OK. DRIVER_OK is set after FEATURES_OK. The driver may read the Device Configuration Space once ACKNOWLEDGE and DRIVER are set. QEMU's vhost code should follow this sequence (especially for vDPA where full VIRTIO devices are implemented). vhost-user is not faithful to the VIRTIO spec here. That's probably due to the fact that vhost-user didn't have the concept of the Device Status Register until recently and back-ends mostly ignore it. Please do the opposite of this patch: bring vhost-user in line with the VIRTIO specification so that the Device Initialization sequence is followed correctly. I think vhost-vdpa already does the right thing. > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > index f7fd19a203..0cde8b40de 100644 > --- a/hw/virtio/vhost-vdpa.c > +++ b/hw/virtio/vhost-vdpa.c > @@ -1294,8 +1294,6 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev) > } > > vhost_vdpa_reset_device(dev); > - vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | > - VIRTIO_CONFIG_S_DRIVER); > memory_listener_unregister(&v->listener); > } > > @@ -1334,7 +1332,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) > } > memory_listener_register(&v->listener, dev->vdev->dma_as); > > - return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK); > + return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | > + VIRTIO_CONFIG_S_DRIVER | > + VIRTIO_CONFIG_S_DRIVER_OK); > } > > return 0; > -- > 2.41.0 > [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset 2023-07-18 14:50 ` Stefan Hajnoczi @ 2023-07-19 14:09 ` Hanna Czenczek 2023-07-19 15:06 ` Stefan Hajnoczi 2023-07-21 15:47 ` Eugenio Perez Martin 0 siblings, 2 replies; 37+ messages in thread From: Hanna Czenczek @ 2023-07-19 14:09 UTC (permalink / raw) To: Stefan Hajnoczi Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione On 18.07.23 16:50, Stefan Hajnoczi wrote: > On Tue, Jul 11, 2023 at 05:52:27PM +0200, Hanna Czenczek wrote: >> vhost-vdpa and vhost-user differ in how they reset the status in their >> respective vhost_reset_status implementations: vhost-vdpa zeroes it, >> then re-adds the S_ACKNOWLEDGE and S_DRIVER config bits. S_DRIVER_OK is >> then set in vhost_vdpa_dev_start(). >> >> vhost-user in contrast just zeroes the status, and does no re-add any >> config bits until vhost_user_dev_start() (where it does re-add all of >> S_ACKNOWLEDGE, S_DRIVER, and S_DRIVER_OK). >> >> There is no documentation for vhost_reset_status, but its only caller is >> vhost_dev_stop(). So apparently, the device is to be stopped after >> vhost_reset_status, and therefore it makes more sense to keep the status >> field fully cleared until the back-end is re-started, which is how >> vhost-user does it. Make vhost-vdpa do the same -- if nothing else it's >> confusing to have both vhost implementations handle this differently. >> >> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >> --- >> hw/virtio/vhost-vdpa.c | 6 +++--- >> 1 file changed, 3 insertions(+), 3 deletions(-) > Hi Hanna, > The VIRTIO spec lists the Device Initialization sequence including the > bits set in the Device Status Register here: > https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-1070001 > > ACKNOWLEDGE and DRIVER must be set before FEATURES_OK. DRIVER_OK is set > after FEATURES_OK. > > The driver may read the Device Configuration Space once ACKNOWLEDGE and > DRIVER are set. > > QEMU's vhost code should follow this sequence (especially for vDPA where > full VIRTIO devices are implemented). > > vhost-user is not faithful to the VIRTIO spec here. That's probably due > to the fact that vhost-user didn't have the concept of the Device Status > Register until recently and back-ends mostly ignore it. > > Please do the opposite of this patch: bring vhost-user in line with the > VIRTIO specification so that the Device Initialization sequence is > followed correctly. I think vhost-vdpa already does the right thing. Hm. This sounds all very good, but what leaves me lost is the fact that we never actually expose the status field to the guest, as far as I can see. We have no set_status callback, and as written in the commit message, the only caller of reset_status is vhost_dev_stop(). So the status field seems completely artificial in vhost right now. That is why I’m wondering what the flags even really mean. Another point I made in the commit message is that it is strange that we reset the status to 0, and then add the ACKNOWLEDGE and DRIVER while the VM is still stopped. It doesn’t make sense to me to set these flags while the guest driver is not operative. If what you’re saying is that we must set FEATURES_OK only after ACKNOWLEDGE and DRIVER, wouldn’t it be still better to set all of these flags only in vhost_*_dev_start(), but do it in two separate SET_STATUS calls? (You mentioned the configuration space – is that accessed while between vhost_dev_stop and vhost_dev_start?) Hanna >> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c >> index f7fd19a203..0cde8b40de 100644 >> --- a/hw/virtio/vhost-vdpa.c >> +++ b/hw/virtio/vhost-vdpa.c >> @@ -1294,8 +1294,6 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev) >> } >> >> vhost_vdpa_reset_device(dev); >> - vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | >> - VIRTIO_CONFIG_S_DRIVER); >> memory_listener_unregister(&v->listener); >> } >> >> @@ -1334,7 +1332,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) >> } >> memory_listener_register(&v->listener, dev->vdev->dma_as); >> >> - return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK); >> + return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | >> + VIRTIO_CONFIG_S_DRIVER | >> + VIRTIO_CONFIG_S_DRIVER_OK); >> } >> >> return 0; >> -- >> 2.41.0 >> ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset 2023-07-19 14:09 ` Hanna Czenczek @ 2023-07-19 15:06 ` Stefan Hajnoczi 2023-07-21 15:47 ` Eugenio Perez Martin 1 sibling, 0 replies; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-19 15:06 UTC (permalink / raw) To: Hanna Czenczek Cc: Stefan Hajnoczi, qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione On Wed, 19 Jul 2023 at 10:10, Hanna Czenczek <hreitz@redhat.com> wrote: > > On 18.07.23 16:50, Stefan Hajnoczi wrote: > > On Tue, Jul 11, 2023 at 05:52:27PM +0200, Hanna Czenczek wrote: > >> vhost-vdpa and vhost-user differ in how they reset the status in their > >> respective vhost_reset_status implementations: vhost-vdpa zeroes it, > >> then re-adds the S_ACKNOWLEDGE and S_DRIVER config bits. S_DRIVER_OK is > >> then set in vhost_vdpa_dev_start(). > >> > >> vhost-user in contrast just zeroes the status, and does no re-add any > >> config bits until vhost_user_dev_start() (where it does re-add all of > >> S_ACKNOWLEDGE, S_DRIVER, and S_DRIVER_OK). > >> > >> There is no documentation for vhost_reset_status, but its only caller is > >> vhost_dev_stop(). So apparently, the device is to be stopped after > >> vhost_reset_status, and therefore it makes more sense to keep the status > >> field fully cleared until the back-end is re-started, which is how > >> vhost-user does it. Make vhost-vdpa do the same -- if nothing else it's > >> confusing to have both vhost implementations handle this differently. > >> > >> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > >> --- > >> hw/virtio/vhost-vdpa.c | 6 +++--- > >> 1 file changed, 3 insertions(+), 3 deletions(-) > > Hi Hanna, > > The VIRTIO spec lists the Device Initialization sequence including the > > bits set in the Device Status Register here: > > https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-1070001 > > > > ACKNOWLEDGE and DRIVER must be set before FEATURES_OK. DRIVER_OK is set > > after FEATURES_OK. > > > > The driver may read the Device Configuration Space once ACKNOWLEDGE and > > DRIVER are set. > > > > QEMU's vhost code should follow this sequence (especially for vDPA where > > full VIRTIO devices are implemented). > > > > vhost-user is not faithful to the VIRTIO spec here. That's probably due > > to the fact that vhost-user didn't have the concept of the Device Status > > Register until recently and back-ends mostly ignore it. > > > > Please do the opposite of this patch: bring vhost-user in line with the > > VIRTIO specification so that the Device Initialization sequence is > > followed correctly. I think vhost-vdpa already does the right thing. > > Hm. This sounds all very good, but what leaves me lost is the fact that > we never actually expose the status field to the guest, as far as I can > see. We have no set_status callback, and as written in the commit > message, the only caller of reset_status is vhost_dev_stop(). So the > status field seems completely artificial in vhost right now. That is > why I’m wondering what the flags even really mean. vhost (including vDPA and vhost-user) is not a 100% passthrough solution. The VMM emulates a VIRTIO device (e.g. virtio-fs-pci) that has some separate state from the vhost back-end, including the Device Status Register. This is analogous to how passthrough PCI devices still have emulated PCI registers that are not passed through to the physical PCI device. However, just because the vDPA, and now vhost-user with the SET_STATUS message, back-end is not directly exposed to the guest does not mean it should diverge from the VIRTIO specification for no reason. > Another point I made in the commit message is that it is strange that we > reset the status to 0, and then add the ACKNOWLEDGE and DRIVER while the > VM is still stopped. It doesn’t make sense to me to set these flags > while the guest driver is not operative. While there is no harm in setting those bits, I agree that leaving the Device Status Register at 0 while the VM is stopped would be nicer. > If what you’re saying is that we must set FEATURES_OK only after > ACKNOWLEDGE and DRIVER, wouldn’t it be still better to set all of these > flags only in vhost_*_dev_start(), but do it in two separate SET_STATUS > calls? The device initialization sequence could be put into vhost_dev_start(): 1. ACKNOWLEDGE | DRIVER 2. FEATURES_OK via vhost_dev_set_features() 3. DRIVER_OK via ->vhost_dev_start() But note that the ->vhost_dev_start() callback is too late to set ACKNOWLEDGE | DRIVER because feature negotiation happens earlier. > (You mentioned the configuration space – is that accessed while between > vhost_dev_stop and vhost_dev_start?) I don't think so. > > Hanna > > >> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > >> index f7fd19a203..0cde8b40de 100644 > >> --- a/hw/virtio/vhost-vdpa.c > >> +++ b/hw/virtio/vhost-vdpa.c > >> @@ -1294,8 +1294,6 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev) > >> } > >> > >> vhost_vdpa_reset_device(dev); > >> - vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | > >> - VIRTIO_CONFIG_S_DRIVER); > >> memory_listener_unregister(&v->listener); > >> } > >> > >> @@ -1334,7 +1332,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) > >> } > >> memory_listener_register(&v->listener, dev->vdev->dma_as); > >> > >> - return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK); > >> + return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | > >> + VIRTIO_CONFIG_S_DRIVER | > >> + VIRTIO_CONFIG_S_DRIVER_OK); > >> } > >> > >> return 0; > >> -- > >> 2.41.0 > >> > > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset 2023-07-19 14:09 ` Hanna Czenczek 2023-07-19 15:06 ` Stefan Hajnoczi @ 2023-07-21 15:47 ` Eugenio Perez Martin 1 sibling, 0 replies; 37+ messages in thread From: Eugenio Perez Martin @ 2023-07-21 15:47 UTC (permalink / raw) To: Hanna Czenczek Cc: Stefan Hajnoczi, qemu-devel, Michael S . Tsirkin, German Maglione On Wed, Jul 19, 2023 at 4:10 PM Hanna Czenczek <hreitz@redhat.com> wrote: > > On 18.07.23 16:50, Stefan Hajnoczi wrote: > > On Tue, Jul 11, 2023 at 05:52:27PM +0200, Hanna Czenczek wrote: > >> vhost-vdpa and vhost-user differ in how they reset the status in their > >> respective vhost_reset_status implementations: vhost-vdpa zeroes it, > >> then re-adds the S_ACKNOWLEDGE and S_DRIVER config bits. S_DRIVER_OK is > >> then set in vhost_vdpa_dev_start(). > >> > >> vhost-user in contrast just zeroes the status, and does no re-add any > >> config bits until vhost_user_dev_start() (where it does re-add all of > >> S_ACKNOWLEDGE, S_DRIVER, and S_DRIVER_OK). > >> > >> There is no documentation for vhost_reset_status, but its only caller is > >> vhost_dev_stop(). So apparently, the device is to be stopped after > >> vhost_reset_status, and therefore it makes more sense to keep the status > >> field fully cleared until the back-end is re-started, which is how > >> vhost-user does it. Make vhost-vdpa do the same -- if nothing else it's > >> confusing to have both vhost implementations handle this differently. > >> Ok now I understand better why you moved the call to vhost_vdpa_reset_status. Maybe we can move the vhost_vdpa_add_status(dev, _S_ACKNOWLEDGE | _S_DRIVER) to vhost_vdpa_set_features then, and let reset_status call vhost_vdpa_reset_device directly. But I'm not 100% sure if I'm missing something, it would need testing. Thanks! > >> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > >> --- > >> hw/virtio/vhost-vdpa.c | 6 +++--- > >> 1 file changed, 3 insertions(+), 3 deletions(-) > > Hi Hanna, > > The VIRTIO spec lists the Device Initialization sequence including the > > bits set in the Device Status Register here: > > https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-1070001 > > > > ACKNOWLEDGE and DRIVER must be set before FEATURES_OK. DRIVER_OK is set > > after FEATURES_OK. > > > > The driver may read the Device Configuration Space once ACKNOWLEDGE and > > DRIVER are set. > > > > QEMU's vhost code should follow this sequence (especially for vDPA where > > full VIRTIO devices are implemented). > > > > vhost-user is not faithful to the VIRTIO spec here. That's probably due > > to the fact that vhost-user didn't have the concept of the Device Status > > Register until recently and back-ends mostly ignore it. > > > > Please do the opposite of this patch: bring vhost-user in line with the > > VIRTIO specification so that the Device Initialization sequence is > > followed correctly. I think vhost-vdpa already does the right thing. > > Hm. This sounds all very good, but what leaves me lost is the fact that > we never actually expose the status field to the guest, as far as I can > see. We have no set_status callback, and as written in the commit > message, the only caller of reset_status is vhost_dev_stop(). So the > status field seems completely artificial in vhost right now. That is > why I’m wondering what the flags even really mean. > > Another point I made in the commit message is that it is strange that we > reset the status to 0, and then add the ACKNOWLEDGE and DRIVER while the > VM is still stopped. It doesn’t make sense to me to set these flags > while the guest driver is not operative. > > If what you’re saying is that we must set FEATURES_OK only after > ACKNOWLEDGE and DRIVER, wouldn’t it be still better to set all of these > flags only in vhost_*_dev_start(), but do it in two separate SET_STATUS > calls? > > (You mentioned the configuration space – is that accessed while between > vhost_dev_stop and vhost_dev_start?) > > Hanna > > >> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > >> index f7fd19a203..0cde8b40de 100644 > >> --- a/hw/virtio/vhost-vdpa.c > >> +++ b/hw/virtio/vhost-vdpa.c > >> @@ -1294,8 +1294,6 @@ static void vhost_vdpa_reset_status(struct vhost_dev *dev) > >> } > >> > >> vhost_vdpa_reset_device(dev); > >> - vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | > >> - VIRTIO_CONFIG_S_DRIVER); > >> memory_listener_unregister(&v->listener); > >> } > >> > >> @@ -1334,7 +1332,9 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started) > >> } > >> memory_listener_register(&v->listener, dev->vdev->dma_as); > >> > >> - return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK); > >> + return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE | > >> + VIRTIO_CONFIG_S_DRIVER | > >> + VIRTIO_CONFIG_S_DRIVER_OK); > >> } > >> > >> return 0; > >> -- > >> 2.41.0 > >> > ^ permalink raw reply [flat|nested] 37+ messages in thread
* [PATCH 6/6] vhost-user: Have reset_status fall back to reset 2023-07-11 15:52 [PATCH 0/6] vhost-user: Add suspend/resume Hanna Czenczek ` (4 preceding siblings ...) 2023-07-11 15:52 ` [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset Hanna Czenczek @ 2023-07-11 15:52 ` Hanna Czenczek 2023-07-18 15:10 ` Stefan Hajnoczi 2023-07-18 15:14 ` [PATCH 0/6] vhost-user: Add suspend/resume Stefan Hajnoczi 6 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-11 15:52 UTC (permalink / raw) To: qemu-devel Cc: Hanna Czenczek, Michael S . Tsirkin, Stefan Hajnoczi, Eugenio Pérez, German Maglione The only user of vhost_user_reset_status() is vhost_dev_stop(), which only uses it as a fall-back to stop the back-end if it does not support SUSPEND. However, vhost-user's implementation is a no-op unless the back-end supports SET_STATUS. vhost-vdpa's implementation instead just calls vhost_vdpa_reset_device(), implying that it's OK to fully reset the device if SET_STATUS is not supported. To be fair, vhost_vdpa_reset_device() does nothing but to set the status to zero. However, that may well be because vhost-vdpa has no method besides this to reset a device. In contrast, vhost-user has RESET_DEVICE and a RESET_OWNER, which can be used instead. While it is not entirely clear from documentation or git logs, from discussions and the order of vhost-user protocol features, it appears to me as if RESET_OWNER originally had no real meaning for vhost-user, and was thus used to signal a device reset to the back-end. Then, RESET_DEVICE was introduced, to have a well-defined dedicated reset command. Finally, vhost-user received full STATUS support, including SET_STATUS, so setting the device status to 0 is now the preferred way of resetting a device. Still, RESET_DEVICE and RESET_OWNER should remain valid as fall-backs. Therefore, have vhost_user_reset_status() fall back to vhost_user_reset_device() if the back-end has no STATUS support. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> --- hw/virtio/vhost-user.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 4507de5a92..53a881ec2a 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -2833,6 +2833,8 @@ static void vhost_user_reset_status(struct vhost_dev *dev) if (virtio_has_feature(dev->protocol_features, VHOST_USER_PROTOCOL_F_STATUS)) { vhost_user_set_status(dev, 0); + } else { + vhost_user_reset_device(dev); } } -- 2.41.0 ^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH 6/6] vhost-user: Have reset_status fall back to reset 2023-07-11 15:52 ` [PATCH 6/6] vhost-user: Have reset_status fall back to reset Hanna Czenczek @ 2023-07-18 15:10 ` Stefan Hajnoczi 2023-07-19 14:11 ` Hanna Czenczek 0 siblings, 1 reply; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-18 15:10 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione [-- Attachment #1: Type: text/plain, Size: 2601 bytes --] On Tue, Jul 11, 2023 at 05:52:28PM +0200, Hanna Czenczek wrote: > The only user of vhost_user_reset_status() is vhost_dev_stop(), which > only uses it as a fall-back to stop the back-end if it does not support > SUSPEND. However, vhost-user's implementation is a no-op unless the > back-end supports SET_STATUS. > > vhost-vdpa's implementation instead just calls > vhost_vdpa_reset_device(), implying that it's OK to fully reset the > device if SET_STATUS is not supported. > > To be fair, vhost_vdpa_reset_device() does nothing but to set the status > to zero. However, that may well be because vhost-vdpa has no method > besides this to reset a device. In contrast, vhost-user has > RESET_DEVICE and a RESET_OWNER, which can be used instead. > > While it is not entirely clear from documentation or git logs, from > discussions and the order of vhost-user protocol features, it appears to > me as if RESET_OWNER originally had no real meaning for vhost-user, and > was thus used to signal a device reset to the back-end. Then, > RESET_DEVICE was introduced, to have a well-defined dedicated reset > command. Finally, vhost-user received full STATUS support, including > SET_STATUS, so setting the device status to 0 is now the preferred way > of resetting a device. Still, RESET_DEVICE and RESET_OWNER should > remain valid as fall-backs. > > Therefore, have vhost_user_reset_status() fall back to > vhost_user_reset_device() if the back-end has no STATUS support. > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > --- > hw/virtio/vhost-user.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > index 4507de5a92..53a881ec2a 100644 > --- a/hw/virtio/vhost-user.c > +++ b/hw/virtio/vhost-user.c > @@ -2833,6 +2833,8 @@ static void vhost_user_reset_status(struct vhost_dev *dev) > if (virtio_has_feature(dev->protocol_features, > VHOST_USER_PROTOCOL_F_STATUS)) { > vhost_user_set_status(dev, 0); > + } else { > + vhost_user_reset_device(dev); > } > } Did you check whether DPDK treats setting the status to 0 as equivalent to RESET_DEVICE? My understanding is that SET_STATUS is mostly ignored by vhost-user back-ends today. Even those that implement it may not treat SET_STATUS 0 as equivalent to RESET_DEVICE. If you decide it's safe to make this change, please also update vhost-user.rst to document that front-ends should use SET_STATUS 0, RESET_DEVICE, and RESET_OWNER in order of preference. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 6/6] vhost-user: Have reset_status fall back to reset 2023-07-18 15:10 ` Stefan Hajnoczi @ 2023-07-19 14:11 ` Hanna Czenczek 2023-07-19 14:27 ` Hanna Czenczek 0 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-19 14:11 UTC (permalink / raw) To: Stefan Hajnoczi Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione On 18.07.23 17:10, Stefan Hajnoczi wrote: > On Tue, Jul 11, 2023 at 05:52:28PM +0200, Hanna Czenczek wrote: >> The only user of vhost_user_reset_status() is vhost_dev_stop(), which >> only uses it as a fall-back to stop the back-end if it does not support >> SUSPEND. However, vhost-user's implementation is a no-op unless the >> back-end supports SET_STATUS. >> >> vhost-vdpa's implementation instead just calls >> vhost_vdpa_reset_device(), implying that it's OK to fully reset the >> device if SET_STATUS is not supported. >> >> To be fair, vhost_vdpa_reset_device() does nothing but to set the status >> to zero. However, that may well be because vhost-vdpa has no method >> besides this to reset a device. In contrast, vhost-user has >> RESET_DEVICE and a RESET_OWNER, which can be used instead. >> >> While it is not entirely clear from documentation or git logs, from >> discussions and the order of vhost-user protocol features, it appears to >> me as if RESET_OWNER originally had no real meaning for vhost-user, and >> was thus used to signal a device reset to the back-end. Then, >> RESET_DEVICE was introduced, to have a well-defined dedicated reset >> command. Finally, vhost-user received full STATUS support, including >> SET_STATUS, so setting the device status to 0 is now the preferred way >> of resetting a device. Still, RESET_DEVICE and RESET_OWNER should >> remain valid as fall-backs. >> >> Therefore, have vhost_user_reset_status() fall back to >> vhost_user_reset_device() if the back-end has no STATUS support. >> >> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >> --- >> hw/virtio/vhost-user.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c >> index 4507de5a92..53a881ec2a 100644 >> --- a/hw/virtio/vhost-user.c >> +++ b/hw/virtio/vhost-user.c >> @@ -2833,6 +2833,8 @@ static void vhost_user_reset_status(struct vhost_dev *dev) >> if (virtio_has_feature(dev->protocol_features, >> VHOST_USER_PROTOCOL_F_STATUS)) { >> vhost_user_set_status(dev, 0); >> + } else { >> + vhost_user_reset_device(dev); >> } >> } > Did you check whether DPDK treats setting the status to 0 as equivalent > to RESET_DEVICE? If it doesn’t, what’s even the point of using reset_status? I will investigate, but if there’s a difference, that makes the whole reset_* thing even more questionable to me than it has already been so far. Hanna > My understanding is that SET_STATUS is mostly ignored by vhost-user > back-ends today. Even those that implement it may not treat SET_STATUS 0 > as equivalent to RESET_DEVICE. > > If you decide it's safe to make this change, please also update > vhost-user.rst to document that front-ends should use SET_STATUS 0, > RESET_DEVICE, and RESET_OWNER in order of preference. > > Stefan ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 6/6] vhost-user: Have reset_status fall back to reset 2023-07-19 14:11 ` Hanna Czenczek @ 2023-07-19 14:27 ` Hanna Czenczek 2023-07-20 16:03 ` Stefan Hajnoczi 0 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-19 14:27 UTC (permalink / raw) To: Stefan Hajnoczi Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione On 19.07.23 16:11, Hanna Czenczek wrote: > On 18.07.23 17:10, Stefan Hajnoczi wrote: >> On Tue, Jul 11, 2023 at 05:52:28PM +0200, Hanna Czenczek wrote: >>> The only user of vhost_user_reset_status() is vhost_dev_stop(), which >>> only uses it as a fall-back to stop the back-end if it does not support >>> SUSPEND. However, vhost-user's implementation is a no-op unless the >>> back-end supports SET_STATUS. >>> >>> vhost-vdpa's implementation instead just calls >>> vhost_vdpa_reset_device(), implying that it's OK to fully reset the >>> device if SET_STATUS is not supported. >>> >>> To be fair, vhost_vdpa_reset_device() does nothing but to set the >>> status >>> to zero. However, that may well be because vhost-vdpa has no method >>> besides this to reset a device. In contrast, vhost-user has >>> RESET_DEVICE and a RESET_OWNER, which can be used instead. >>> >>> While it is not entirely clear from documentation or git logs, from >>> discussions and the order of vhost-user protocol features, it >>> appears to >>> me as if RESET_OWNER originally had no real meaning for vhost-user, and >>> was thus used to signal a device reset to the back-end. Then, >>> RESET_DEVICE was introduced, to have a well-defined dedicated reset >>> command. Finally, vhost-user received full STATUS support, including >>> SET_STATUS, so setting the device status to 0 is now the preferred way >>> of resetting a device. Still, RESET_DEVICE and RESET_OWNER should >>> remain valid as fall-backs. >>> >>> Therefore, have vhost_user_reset_status() fall back to >>> vhost_user_reset_device() if the back-end has no STATUS support. >>> >>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>> --- >>> hw/virtio/vhost-user.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c >>> index 4507de5a92..53a881ec2a 100644 >>> --- a/hw/virtio/vhost-user.c >>> +++ b/hw/virtio/vhost-user.c >>> @@ -2833,6 +2833,8 @@ static void vhost_user_reset_status(struct >>> vhost_dev *dev) >>> if (virtio_has_feature(dev->protocol_features, >>> VHOST_USER_PROTOCOL_F_STATUS)) { >>> vhost_user_set_status(dev, 0); >>> + } else { >>> + vhost_user_reset_device(dev); >>> } >>> } >> Did you check whether DPDK treats setting the status to 0 as equivalent >> to RESET_DEVICE? > > If it doesn’t, what’s even the point of using reset_status? Sorry, I’m being unclear, and I think this may be important because it ties into the question from patch 1, what qemu is even trying to do by running SET_STATUS(0) vhost_dev_stop(), so here’s what gave me the impression that SET_STATUS(0) and RESET_DEVICE should be equivalent: vhost-vdpa.c runs SET_STATUS(0) in a function called vhost_vdpa_reset_device(). This is one thing that gave me the impression that this is about an actual full reset. Another is the whole discussion that we’ve had. vhost_dev_stop() does not call a `vhost_reset_device()` function, it calls `vhost_reset_status()`. Still, we were always talking about resetting the device. It doesn’t make sense to me that vDPA would provide no function to fully reset a device, while vhost-user does. Being able to reset a device sounds vital to me. This also gave me the impression that SET_STATUS(0) on vDPA at least is functionally equivalent to a full device reset. Maybe SET_STATUS(0) does mean a full device reset on vDPA, but not on vhost-user. That would be a real shame, so I assumed this would not be the case; that SET_STATUS(0) does the same thing on both protocols. The virtio specification says “Writing 0 into this field resets the device.” about the device_status field. This also makes sense, because the device_status field is basically used to tell the device that a driver has taken control. If reset, this indicates the driver has given up control, and to me this is a point where a device should fully reset itself. So all in all, I can’t see the rationale why any implementation that supports SET_STATUS would decide to treat SET_STATUS(0) not as equivalent or a superset of RESET_DEVICE. I may be wrong, and this might explain a whole deal about what kind of background operations we hope to stop with SET_STATUS(0). Hanna ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 6/6] vhost-user: Have reset_status fall back to reset 2023-07-19 14:27 ` Hanna Czenczek @ 2023-07-20 16:03 ` Stefan Hajnoczi 2023-07-21 14:16 ` Hanna Czenczek 0 siblings, 1 reply; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-20 16:03 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione, Maxime Coquelin [-- Attachment #1: Type: text/plain, Size: 5643 bytes --] On Wed, Jul 19, 2023 at 04:27:58PM +0200, Hanna Czenczek wrote: > On 19.07.23 16:11, Hanna Czenczek wrote: > > On 18.07.23 17:10, Stefan Hajnoczi wrote: > > > On Tue, Jul 11, 2023 at 05:52:28PM +0200, Hanna Czenczek wrote: > > > > The only user of vhost_user_reset_status() is vhost_dev_stop(), which > > > > only uses it as a fall-back to stop the back-end if it does not support > > > > SUSPEND. However, vhost-user's implementation is a no-op unless the > > > > back-end supports SET_STATUS. > > > > > > > > vhost-vdpa's implementation instead just calls > > > > vhost_vdpa_reset_device(), implying that it's OK to fully reset the > > > > device if SET_STATUS is not supported. > > > > > > > > To be fair, vhost_vdpa_reset_device() does nothing but to set > > > > the status > > > > to zero. However, that may well be because vhost-vdpa has no method > > > > besides this to reset a device. In contrast, vhost-user has > > > > RESET_DEVICE and a RESET_OWNER, which can be used instead. > > > > > > > > While it is not entirely clear from documentation or git logs, from > > > > discussions and the order of vhost-user protocol features, it > > > > appears to > > > > me as if RESET_OWNER originally had no real meaning for vhost-user, and > > > > was thus used to signal a device reset to the back-end. Then, > > > > RESET_DEVICE was introduced, to have a well-defined dedicated reset > > > > command. Finally, vhost-user received full STATUS support, including > > > > SET_STATUS, so setting the device status to 0 is now the preferred way > > > > of resetting a device. Still, RESET_DEVICE and RESET_OWNER should > > > > remain valid as fall-backs. > > > > > > > > Therefore, have vhost_user_reset_status() fall back to > > > > vhost_user_reset_device() if the back-end has no STATUS support. > > > > > > > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > > > > --- > > > > hw/virtio/vhost-user.c | 2 ++ > > > > 1 file changed, 2 insertions(+) > > > > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > > index 4507de5a92..53a881ec2a 100644 > > > > --- a/hw/virtio/vhost-user.c > > > > +++ b/hw/virtio/vhost-user.c > > > > @@ -2833,6 +2833,8 @@ static void vhost_user_reset_status(struct > > > > vhost_dev *dev) > > > > if (virtio_has_feature(dev->protocol_features, > > > > VHOST_USER_PROTOCOL_F_STATUS)) { > > > > vhost_user_set_status(dev, 0); > > > > + } else { > > > > + vhost_user_reset_device(dev); > > > > } > > > > } > > > Did you check whether DPDK treats setting the status to 0 as equivalent > > > to RESET_DEVICE? > > > > If it doesn’t, what’s even the point of using reset_status? > > Sorry, I’m being unclear, and I think this may be important because it ties > into the question from patch 1, what qemu is even trying to do by running > SET_STATUS(0) vhost_dev_stop(), so here’s what gave me the impression that > SET_STATUS(0) and RESET_DEVICE should be equivalent: > > vhost-vdpa.c runs SET_STATUS(0) in a function called > vhost_vdpa_reset_device(). This is one thing that gave me the impression > that this is about an actual full reset. > > Another is the whole discussion that we’ve had. vhost_dev_stop() does not > call a `vhost_reset_device()` function, it calls `vhost_reset_status()`. > Still, we were always talking about resetting the device. There is some hacky stuff with struct vhost_dev's vq_index_end and multi-queue devices. I think it's because multi-queue vhost-net device consist of many vhost_devs and NetClientStates, so certain vhost operations are skipped unless this is the "first" or "last" vhost_dev from a large aggregate vhost-net device. That might be responsible for part of the weirdness. > > It doesn’t make sense to me that vDPA would provide no function to fully > reset a device, while vhost-user does. Being able to reset a device sounds > vital to me. This also gave me the impression that SET_STATUS(0) on vDPA at > least is functionally equivalent to a full device reset. > > > Maybe SET_STATUS(0) does mean a full device reset on vDPA, but not on > vhost-user. That would be a real shame, so I assumed this would not be the > case; that SET_STATUS(0) does the same thing on both protocols. Yes, exactly. It has the real VIRTIO spec meaning in vDPA. In vhost-user it's currently only used by DPDK as a hint for when device initialization is complete: https://github.com/DPDK/dpdk/commit/41d201804c4c44738168e2d247d3b1780845faa1 > The virtio specification says “Writing 0 into this field resets the device.” > about the device_status field. > > This also makes sense, because the device_status field is basically used to > tell the device that a driver has taken control. If reset, this indicates > the driver has given up control, and to me this is a point where a device > should fully reset itself. > > So all in all, I can’t see the rationale why any implementation that > supports SET_STATUS would decide to treat SET_STATUS(0) not as equivalent or > a superset of RESET_DEVICE. I may be wrong, and this might explain a whole > deal about what kind of background operations we hope to stop with > SET_STATUS(0). I would like vhost-user devices to implement SET_STATUS according to the VIRTIO specification in the future and they can do that. But I think front-ends should continue sending RESET_DEVICE in order to support old devices. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 6/6] vhost-user: Have reset_status fall back to reset 2023-07-20 16:03 ` Stefan Hajnoczi @ 2023-07-21 14:16 ` Hanna Czenczek 2023-07-24 18:04 ` Stefan Hajnoczi 0 siblings, 1 reply; 37+ messages in thread From: Hanna Czenczek @ 2023-07-21 14:16 UTC (permalink / raw) To: Stefan Hajnoczi Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione, Maxime Coquelin On 20.07.23 18:03, Stefan Hajnoczi wrote: > On Wed, Jul 19, 2023 at 04:27:58PM +0200, Hanna Czenczek wrote: >> On 19.07.23 16:11, Hanna Czenczek wrote: >>> On 18.07.23 17:10, Stefan Hajnoczi wrote: >>>> On Tue, Jul 11, 2023 at 05:52:28PM +0200, Hanna Czenczek wrote: >>>>> The only user of vhost_user_reset_status() is vhost_dev_stop(), which >>>>> only uses it as a fall-back to stop the back-end if it does not support >>>>> SUSPEND. However, vhost-user's implementation is a no-op unless the >>>>> back-end supports SET_STATUS. >>>>> >>>>> vhost-vdpa's implementation instead just calls >>>>> vhost_vdpa_reset_device(), implying that it's OK to fully reset the >>>>> device if SET_STATUS is not supported. >>>>> >>>>> To be fair, vhost_vdpa_reset_device() does nothing but to set >>>>> the status >>>>> to zero. However, that may well be because vhost-vdpa has no method >>>>> besides this to reset a device. In contrast, vhost-user has >>>>> RESET_DEVICE and a RESET_OWNER, which can be used instead. >>>>> >>>>> While it is not entirely clear from documentation or git logs, from >>>>> discussions and the order of vhost-user protocol features, it >>>>> appears to >>>>> me as if RESET_OWNER originally had no real meaning for vhost-user, and >>>>> was thus used to signal a device reset to the back-end. Then, >>>>> RESET_DEVICE was introduced, to have a well-defined dedicated reset >>>>> command. Finally, vhost-user received full STATUS support, including >>>>> SET_STATUS, so setting the device status to 0 is now the preferred way >>>>> of resetting a device. Still, RESET_DEVICE and RESET_OWNER should >>>>> remain valid as fall-backs. >>>>> >>>>> Therefore, have vhost_user_reset_status() fall back to >>>>> vhost_user_reset_device() if the back-end has no STATUS support. >>>>> >>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>> --- >>>>> hw/virtio/vhost-user.c | 2 ++ >>>>> 1 file changed, 2 insertions(+) >>>>> >>>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c >>>>> index 4507de5a92..53a881ec2a 100644 >>>>> --- a/hw/virtio/vhost-user.c >>>>> +++ b/hw/virtio/vhost-user.c >>>>> @@ -2833,6 +2833,8 @@ static void vhost_user_reset_status(struct >>>>> vhost_dev *dev) >>>>> if (virtio_has_feature(dev->protocol_features, >>>>> VHOST_USER_PROTOCOL_F_STATUS)) { >>>>> vhost_user_set_status(dev, 0); >>>>> + } else { >>>>> + vhost_user_reset_device(dev); >>>>> } >>>>> } >>>> Did you check whether DPDK treats setting the status to 0 as equivalent >>>> to RESET_DEVICE? >>> If it doesn’t, what’s even the point of using reset_status? >> Sorry, I’m being unclear, and I think this may be important because it ties >> into the question from patch 1, what qemu is even trying to do by running >> SET_STATUS(0) vhost_dev_stop(), so here’s what gave me the impression that >> SET_STATUS(0) and RESET_DEVICE should be equivalent: >> >> vhost-vdpa.c runs SET_STATUS(0) in a function called >> vhost_vdpa_reset_device(). This is one thing that gave me the impression >> that this is about an actual full reset. >> >> Another is the whole discussion that we’ve had. vhost_dev_stop() does not >> call a `vhost_reset_device()` function, it calls `vhost_reset_status()`. >> Still, we were always talking about resetting the device. > There is some hacky stuff with struct vhost_dev's vq_index_end and > multi-queue devices. I think it's because multi-queue vhost-net device > consist of many vhost_devs and NetClientStates, so certain vhost > operations are skipped unless this is the "first" or "last" vhost_dev > from a large aggregate vhost-net device. That might be responsible for > part of the weirdness. > >> It doesn’t make sense to me that vDPA would provide no function to fully >> reset a device, while vhost-user does. Being able to reset a device sounds >> vital to me. This also gave me the impression that SET_STATUS(0) on vDPA at >> least is functionally equivalent to a full device reset. >> >> >> Maybe SET_STATUS(0) does mean a full device reset on vDPA, but not on >> vhost-user. That would be a real shame, so I assumed this would not be the >> case; that SET_STATUS(0) does the same thing on both protocols. > Yes, exactly. It has the real VIRTIO spec meaning in vDPA. In vhost-user > it's currently only used by DPDK as a hint for when device > initialization is complete: > https://github.com/DPDK/dpdk/commit/41d201804c4c44738168e2d247d3b1780845faa1 FWIW, now the code is a bit different. https://github.com/DPDK/dpdk/commit/671cc679a5fcd26705bb20ddc13b93e665719054 has added a RESET interpretation for the status field, i.e. when it is 0. It doesn’t do anything, but at least DPDK seems to agree that SET_STATUS(0) is a reset. >> The virtio specification says “Writing 0 into this field resets the device.” >> about the device_status field. >> >> This also makes sense, because the device_status field is basically used to >> tell the device that a driver has taken control. If reset, this indicates >> the driver has given up control, and to me this is a point where a device >> should fully reset itself. >> >> So all in all, I can’t see the rationale why any implementation that >> supports SET_STATUS would decide to treat SET_STATUS(0) not as equivalent or >> a superset of RESET_DEVICE. I may be wrong, and this might explain a whole >> deal about what kind of background operations we hope to stop with >> SET_STATUS(0). > I would like vhost-user devices to implement SET_STATUS according to the > VIRTIO specification in the future and they can do that. But I think > front-ends should continue sending RESET_DEVICE in order to support old > devices. Well, yes, exactly. That is what I meant to address with this patch, vhost-user right now does not send RESET_DEVICE in its vhost_reset_status implementation, so the front-end will not fall back to RESET_DEVICE when it apparently does intend to reset the device[1]. We do arguably have vhost_reset_device, too, but for vDPA that is just a SET_STATUS(0) (there is no RESET_DEVICE on vDPA), and it’s also only called by vhost-user-scsi. So this also begs the question why we even do have vhost_reset_status and vhost_reset_device as two separate things. The commit introducing vhost_reset_status (c3716f260bf) doesn’t say. Maybe the intention was that vhost_reset_device would leave the status at 0, while vhost_reset_status would return it to ACKNOWLEDGE | DRIVER, as done by the introducing commit, but that comes back to patch 5 in this series – we don’t need to have ACKNOWLEDGE | DRIVER set after vhost_dev_stop(), so we don’t need vhost_reset_status to set those flags. They should be set in vhost_dev_start(). [1] This is assuming that SET_STATUS(0) is intended to reset the device, but it sounds like you agree on that. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 6/6] vhost-user: Have reset_status fall back to reset 2023-07-21 14:16 ` Hanna Czenczek @ 2023-07-24 18:04 ` Stefan Hajnoczi 2023-07-25 8:39 ` Hanna Czenczek 0 siblings, 1 reply; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-24 18:04 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione, Maxime Coquelin [-- Attachment #1: Type: text/plain, Size: 8105 bytes --] On Fri, Jul 21, 2023 at 04:16:07PM +0200, Hanna Czenczek wrote: > On 20.07.23 18:03, Stefan Hajnoczi wrote: > > On Wed, Jul 19, 2023 at 04:27:58PM +0200, Hanna Czenczek wrote: > > > On 19.07.23 16:11, Hanna Czenczek wrote: > > > > On 18.07.23 17:10, Stefan Hajnoczi wrote: > > > > > On Tue, Jul 11, 2023 at 05:52:28PM +0200, Hanna Czenczek wrote: > > > > > > The only user of vhost_user_reset_status() is vhost_dev_stop(), which > > > > > > only uses it as a fall-back to stop the back-end if it does not support > > > > > > SUSPEND. However, vhost-user's implementation is a no-op unless the > > > > > > back-end supports SET_STATUS. > > > > > > > > > > > > vhost-vdpa's implementation instead just calls > > > > > > vhost_vdpa_reset_device(), implying that it's OK to fully reset the > > > > > > device if SET_STATUS is not supported. > > > > > > > > > > > > To be fair, vhost_vdpa_reset_device() does nothing but to set > > > > > > the status > > > > > > to zero. However, that may well be because vhost-vdpa has no method > > > > > > besides this to reset a device. In contrast, vhost-user has > > > > > > RESET_DEVICE and a RESET_OWNER, which can be used instead. > > > > > > > > > > > > While it is not entirely clear from documentation or git logs, from > > > > > > discussions and the order of vhost-user protocol features, it > > > > > > appears to > > > > > > me as if RESET_OWNER originally had no real meaning for vhost-user, and > > > > > > was thus used to signal a device reset to the back-end. Then, > > > > > > RESET_DEVICE was introduced, to have a well-defined dedicated reset > > > > > > command. Finally, vhost-user received full STATUS support, including > > > > > > SET_STATUS, so setting the device status to 0 is now the preferred way > > > > > > of resetting a device. Still, RESET_DEVICE and RESET_OWNER should > > > > > > remain valid as fall-backs. > > > > > > > > > > > > Therefore, have vhost_user_reset_status() fall back to > > > > > > vhost_user_reset_device() if the back-end has no STATUS support. > > > > > > > > > > > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com> > > > > > > --- > > > > > > hw/virtio/vhost-user.c | 2 ++ > > > > > > 1 file changed, 2 insertions(+) > > > > > > > > > > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c > > > > > > index 4507de5a92..53a881ec2a 100644 > > > > > > --- a/hw/virtio/vhost-user.c > > > > > > +++ b/hw/virtio/vhost-user.c > > > > > > @@ -2833,6 +2833,8 @@ static void vhost_user_reset_status(struct > > > > > > vhost_dev *dev) > > > > > > if (virtio_has_feature(dev->protocol_features, > > > > > > VHOST_USER_PROTOCOL_F_STATUS)) { > > > > > > vhost_user_set_status(dev, 0); > > > > > > + } else { > > > > > > + vhost_user_reset_device(dev); > > > > > > } > > > > > > } > > > > > Did you check whether DPDK treats setting the status to 0 as equivalent > > > > > to RESET_DEVICE? > > > > If it doesn’t, what’s even the point of using reset_status? > > > Sorry, I’m being unclear, and I think this may be important because it ties > > > into the question from patch 1, what qemu is even trying to do by running > > > SET_STATUS(0) vhost_dev_stop(), so here’s what gave me the impression that > > > SET_STATUS(0) and RESET_DEVICE should be equivalent: > > > > > > vhost-vdpa.c runs SET_STATUS(0) in a function called > > > vhost_vdpa_reset_device(). This is one thing that gave me the impression > > > that this is about an actual full reset. > > > > > > Another is the whole discussion that we’ve had. vhost_dev_stop() does not > > > call a `vhost_reset_device()` function, it calls `vhost_reset_status()`. > > > Still, we were always talking about resetting the device. > > There is some hacky stuff with struct vhost_dev's vq_index_end and > > multi-queue devices. I think it's because multi-queue vhost-net device > > consist of many vhost_devs and NetClientStates, so certain vhost > > operations are skipped unless this is the "first" or "last" vhost_dev > > from a large aggregate vhost-net device. That might be responsible for > > part of the weirdness. > > > > > It doesn’t make sense to me that vDPA would provide no function to fully > > > reset a device, while vhost-user does. Being able to reset a device sounds > > > vital to me. This also gave me the impression that SET_STATUS(0) on vDPA at > > > least is functionally equivalent to a full device reset. > > > > > > > > > Maybe SET_STATUS(0) does mean a full device reset on vDPA, but not on > > > vhost-user. That would be a real shame, so I assumed this would not be the > > > case; that SET_STATUS(0) does the same thing on both protocols. > > Yes, exactly. It has the real VIRTIO spec meaning in vDPA. In vhost-user > > it's currently only used by DPDK as a hint for when device > > initialization is complete: > > https://github.com/DPDK/dpdk/commit/41d201804c4c44738168e2d247d3b1780845faa1 > > FWIW, now the code is a bit different. > https://github.com/DPDK/dpdk/commit/671cc679a5fcd26705bb20ddc13b93e665719054 > has added a RESET interpretation for the status field, i.e. when it is 0. > It doesn’t do anything, but at least DPDK seems to agree that SET_STATUS(0) > is a reset. That patch adds diagnostics but does not perform any action for SET_STATUS 0. DPDK's vhost_user_reset_owner() is still the only place where the device is actually reset. QEMU cannot switch to just SET_STATUS 0, it still needs to send RESET_DEVICE/RESET_OWNER. > > > > The virtio specification says “Writing 0 into this field resets the device.” > > > about the device_status field. > > > > > > This also makes sense, because the device_status field is basically used to > > > tell the device that a driver has taken control. If reset, this indicates > > > the driver has given up control, and to me this is a point where a device > > > should fully reset itself. > > > > > > So all in all, I can’t see the rationale why any implementation that > > > supports SET_STATUS would decide to treat SET_STATUS(0) not as equivalent or > > > a superset of RESET_DEVICE. I may be wrong, and this might explain a whole > > > deal about what kind of background operations we hope to stop with > > > SET_STATUS(0). > > I would like vhost-user devices to implement SET_STATUS according to the > > VIRTIO specification in the future and they can do that. But I think > > front-ends should continue sending RESET_DEVICE in order to support old > > devices. > > Well, yes, exactly. That is what I meant to address with this patch, > vhost-user right now does not send RESET_DEVICE in its vhost_reset_status > implementation, so the front-end will not fall back to RESET_DEVICE when it > apparently does intend to reset the device[1]. We do arguably have > vhost_reset_device, too, but for vDPA that is just a SET_STATUS(0) (there is > no RESET_DEVICE on vDPA), and it’s also only called by vhost-user-scsi. > > So this also begs the question why we even do have vhost_reset_status and > vhost_reset_device as two separate things. The commit introducing > vhost_reset_status (c3716f260bf) doesn’t say. Maybe the intention was that > vhost_reset_device would leave the status at 0, while vhost_reset_status > would return it to ACKNOWLEDGE | DRIVER, as done by the introducing commit, > but that comes back to patch 5 in this series – we don’t need to have > ACKNOWLEDGE | DRIVER set after vhost_dev_stop(), so we don’t need > vhost_reset_status to set those flags. They should be set in > vhost_dev_start(). > > [1] This is assuming that SET_STATUS(0) is intended to reset the device, but > it sounds like you agree on that. I don't know the answers, but I think it's safe to go ahead with a SET_STATUS sequence that follows the VIRTIO spec, plus a VHOST_USER_RESET_DEVICE/VHOST_USER_RESET_OWNER. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 6/6] vhost-user: Have reset_status fall back to reset 2023-07-24 18:04 ` Stefan Hajnoczi @ 2023-07-25 8:39 ` Hanna Czenczek 0 siblings, 0 replies; 37+ messages in thread From: Hanna Czenczek @ 2023-07-25 8:39 UTC (permalink / raw) To: Stefan Hajnoczi Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione, Maxime Coquelin On 24.07.23 20:04, Stefan Hajnoczi wrote: > On Fri, Jul 21, 2023 at 04:16:07PM +0200, Hanna Czenczek wrote: >> On 20.07.23 18:03, Stefan Hajnoczi wrote: >>> On Wed, Jul 19, 2023 at 04:27:58PM +0200, Hanna Czenczek wrote: >>>> On 19.07.23 16:11, Hanna Czenczek wrote: >>>>> On 18.07.23 17:10, Stefan Hajnoczi wrote: >>>>>> On Tue, Jul 11, 2023 at 05:52:28PM +0200, Hanna Czenczek wrote: >>>>>>> The only user of vhost_user_reset_status() is vhost_dev_stop(), which >>>>>>> only uses it as a fall-back to stop the back-end if it does not support >>>>>>> SUSPEND. However, vhost-user's implementation is a no-op unless the >>>>>>> back-end supports SET_STATUS. >>>>>>> >>>>>>> vhost-vdpa's implementation instead just calls >>>>>>> vhost_vdpa_reset_device(), implying that it's OK to fully reset the >>>>>>> device if SET_STATUS is not supported. >>>>>>> >>>>>>> To be fair, vhost_vdpa_reset_device() does nothing but to set >>>>>>> the status >>>>>>> to zero. However, that may well be because vhost-vdpa has no method >>>>>>> besides this to reset a device. In contrast, vhost-user has >>>>>>> RESET_DEVICE and a RESET_OWNER, which can be used instead. >>>>>>> >>>>>>> While it is not entirely clear from documentation or git logs, from >>>>>>> discussions and the order of vhost-user protocol features, it >>>>>>> appears to >>>>>>> me as if RESET_OWNER originally had no real meaning for vhost-user, and >>>>>>> was thus used to signal a device reset to the back-end. Then, >>>>>>> RESET_DEVICE was introduced, to have a well-defined dedicated reset >>>>>>> command. Finally, vhost-user received full STATUS support, including >>>>>>> SET_STATUS, so setting the device status to 0 is now the preferred way >>>>>>> of resetting a device. Still, RESET_DEVICE and RESET_OWNER should >>>>>>> remain valid as fall-backs. >>>>>>> >>>>>>> Therefore, have vhost_user_reset_status() fall back to >>>>>>> vhost_user_reset_device() if the back-end has no STATUS support. >>>>>>> >>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> >>>>>>> --- >>>>>>> hw/virtio/vhost-user.c | 2 ++ >>>>>>> 1 file changed, 2 insertions(+) >>>>>>> >>>>>>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c >>>>>>> index 4507de5a92..53a881ec2a 100644 >>>>>>> --- a/hw/virtio/vhost-user.c >>>>>>> +++ b/hw/virtio/vhost-user.c >>>>>>> @@ -2833,6 +2833,8 @@ static void vhost_user_reset_status(struct >>>>>>> vhost_dev *dev) >>>>>>> if (virtio_has_feature(dev->protocol_features, >>>>>>> VHOST_USER_PROTOCOL_F_STATUS)) { >>>>>>> vhost_user_set_status(dev, 0); >>>>>>> + } else { >>>>>>> + vhost_user_reset_device(dev); >>>>>>> } >>>>>>> } >>>>>> Did you check whether DPDK treats setting the status to 0 as equivalent >>>>>> to RESET_DEVICE? >>>>> If it doesn’t, what’s even the point of using reset_status? >>>> Sorry, I’m being unclear, and I think this may be important because it ties >>>> into the question from patch 1, what qemu is even trying to do by running >>>> SET_STATUS(0) vhost_dev_stop(), so here’s what gave me the impression that >>>> SET_STATUS(0) and RESET_DEVICE should be equivalent: >>>> >>>> vhost-vdpa.c runs SET_STATUS(0) in a function called >>>> vhost_vdpa_reset_device(). This is one thing that gave me the impression >>>> that this is about an actual full reset. >>>> >>>> Another is the whole discussion that we’ve had. vhost_dev_stop() does not >>>> call a `vhost_reset_device()` function, it calls `vhost_reset_status()`. >>>> Still, we were always talking about resetting the device. >>> There is some hacky stuff with struct vhost_dev's vq_index_end and >>> multi-queue devices. I think it's because multi-queue vhost-net device >>> consist of many vhost_devs and NetClientStates, so certain vhost >>> operations are skipped unless this is the "first" or "last" vhost_dev >>> from a large aggregate vhost-net device. That might be responsible for >>> part of the weirdness. >>> >>>> It doesn’t make sense to me that vDPA would provide no function to fully >>>> reset a device, while vhost-user does. Being able to reset a device sounds >>>> vital to me. This also gave me the impression that SET_STATUS(0) on vDPA at >>>> least is functionally equivalent to a full device reset. >>>> >>>> >>>> Maybe SET_STATUS(0) does mean a full device reset on vDPA, but not on >>>> vhost-user. That would be a real shame, so I assumed this would not be the >>>> case; that SET_STATUS(0) does the same thing on both protocols. >>> Yes, exactly. It has the real VIRTIO spec meaning in vDPA. In vhost-user >>> it's currently only used by DPDK as a hint for when device >>> initialization is complete: >>> https://github.com/DPDK/dpdk/commit/41d201804c4c44738168e2d247d3b1780845faa1 >> FWIW, now the code is a bit different. >> https://github.com/DPDK/dpdk/commit/671cc679a5fcd26705bb20ddc13b93e665719054 >> has added a RESET interpretation for the status field, i.e. when it is 0. >> It doesn’t do anything, but at least DPDK seems to agree that SET_STATUS(0) >> is a reset. > That patch adds diagnostics but does not perform any action for > SET_STATUS 0. DPDK's vhost_user_reset_owner() is still the only place > where the device is actually reset. That’s what I said, it doesn’t do anything, but the diagnostics agree that it is a RESET. > QEMU cannot switch to just > SET_STATUS 0, it still needs to send RESET_DEVICE/RESET_OWNER. That is what I questioned below: We currently *do not* call RESET_DEVICE/RESET_OWNER. This patch is not about switching to SET_STATUS(0), it is about having RESET_DEVICE/RESET_OWNER be fallbacks for it. >>>> The virtio specification says “Writing 0 into this field resets the device.” >>>> about the device_status field. >>>> >>>> This also makes sense, because the device_status field is basically used to >>>> tell the device that a driver has taken control. If reset, this indicates >>>> the driver has given up control, and to me this is a point where a device >>>> should fully reset itself. >>>> >>>> So all in all, I can’t see the rationale why any implementation that >>>> supports SET_STATUS would decide to treat SET_STATUS(0) not as equivalent or >>>> a superset of RESET_DEVICE. I may be wrong, and this might explain a whole >>>> deal about what kind of background operations we hope to stop with >>>> SET_STATUS(0). >>> I would like vhost-user devices to implement SET_STATUS according to the >>> VIRTIO specification in the future and they can do that. But I think >>> front-ends should continue sending RESET_DEVICE in order to support old >>> devices. >> Well, yes, exactly. That is what I meant to address with this patch, >> vhost-user right now does not send RESET_DEVICE in its vhost_reset_status >> implementation, so the front-end will not fall back to RESET_DEVICE when it >> apparently does intend to reset the device[1]. We do arguably have >> vhost_reset_device, too, but for vDPA that is just a SET_STATUS(0) (there is >> no RESET_DEVICE on vDPA), and it’s also only called by vhost-user-scsi. >> >> So this also begs the question why we even do have vhost_reset_status and >> vhost_reset_device as two separate things. The commit introducing >> vhost_reset_status (c3716f260bf) doesn’t say. Maybe the intention was that >> vhost_reset_device would leave the status at 0, while vhost_reset_status >> would return it to ACKNOWLEDGE | DRIVER, as done by the introducing commit, >> but that comes back to patch 5 in this series – we don’t need to have >> ACKNOWLEDGE | DRIVER set after vhost_dev_stop(), so we don’t need >> vhost_reset_status to set those flags. They should be set in >> vhost_dev_start(). >> >> [1] This is assuming that SET_STATUS(0) is intended to reset the device, but >> it sounds like you agree on that. > I don't know the answers, but I think it's safe to go ahead with a > SET_STATUS sequence that follows the VIRTIO spec, plus a > VHOST_USER_RESET_DEVICE/VHOST_USER_RESET_OWNER. So what you’re saying is that RESET_DEVICE/RESET_OWNER should not be fallbacks, but be invoked in addition to SET_STATUS(0)? If so, that would be silly. I see your point that DPDK resets only in response to RESET_DEVICE/RESET_OWNER, but the diagnostics agree that SET_STATUS(0) is a reset, which is why I find this so silly. It sounds to me as if any properly behaving implementation would fully reset the back-end on SET_STATUS(0), so unconditionally invoking RESET_DEVICE/RESET_OWNER afterwards is just doing a double-reset. Notably, invoking RESET_DEVICE/RESET_OWNER in addition to SET_STATUS(0) (instead of as a fallback) would be a change in behavior, because we do not call RESET_DEVICE/RESET_OWNER outside of vhost-user-scsi today. Hanna ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 0/6] vhost-user: Add suspend/resume 2023-07-11 15:52 [PATCH 0/6] vhost-user: Add suspend/resume Hanna Czenczek ` (5 preceding siblings ...) 2023-07-11 15:52 ` [PATCH 6/6] vhost-user: Have reset_status fall back to reset Hanna Czenczek @ 2023-07-18 15:14 ` Stefan Hajnoczi 6 siblings, 0 replies; 37+ messages in thread From: Stefan Hajnoczi @ 2023-07-18 15:14 UTC (permalink / raw) To: Hanna Czenczek Cc: qemu-devel, Michael S . Tsirkin, Eugenio Pérez, German Maglione [-- Attachment #1: Type: text/plain, Size: 2079 bytes --] On Tue, Jul 11, 2023 at 05:52:22PM +0200, Hanna Czenczek wrote: > Hi, > > As discussed on the previous version of the virtio-fs migration series > (https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01575.html), > we currently don’t have a good way to have a vhost-user back-end fully > cease all operations, including background operations. To work around > this, we reset it, which is not an option for stateful devices like > virtio-fs. > > Instead, we want the same SUSPEND/RESUME model that vhost-vdpa already > has, so that we can suspend back-ends when we want them to stop doing > anything (i.e. on VM stop), and resume them later (i.e. on VM resume). > This series adds these vhost-user operations to the protocol and > implements them in qemu. Furthermore, it has vhost-user and vhost-vdpa > do roughly the same thing in their reset paths, as far as possible. > That path will still remain as a fall-back if SUSPEND/RESUME is not > implemented, and, given that qemu’s vhost-vdpa code currently does not > make use of RESUME, it is actually always used for vhost-vdpa (to take > the device out of a suspended state). > > > Hanna Czenczek (6): > vhost-user.rst: Add suspend/resume > vhost-vdpa: Move vhost_vdpa_reset_status() up > vhost: Do not reset suspended devices on stop > vhost-user: Implement suspend/resume > vhost-vdpa: Match vhost-user's status reset > vhost-user: Have reset_status fall back to reset > > docs/interop/vhost-user.rst | 35 +++++++++++- > include/hw/virtio/vhost-vdpa.h | 2 - > include/hw/virtio/vhost.h | 8 +++ > hw/virtio/vhost-user.c | 101 ++++++++++++++++++++++++++++++++- > hw/virtio/vhost-vdpa.c | 41 ++++++------- > hw/virtio/vhost.c | 8 ++- > 6 files changed, 169 insertions(+), 26 deletions(-) Hi Hanna, I posted comments but wanted to say great job! There was a long and somewhat messy email discussion to figure out how to proceed and you came up with a clean patch series that solves the issues. Stefan [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2023-07-27 21:48 UTC | newest] Thread overview: 37+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-07-11 15:52 [PATCH 0/6] vhost-user: Add suspend/resume Hanna Czenczek 2023-07-11 15:52 ` [PATCH 1/6] vhost-user.rst: " Hanna Czenczek 2023-07-18 14:25 ` Stefan Hajnoczi 2023-07-19 13:59 ` Hanna Czenczek 2023-07-24 17:55 ` Stefan Hajnoczi 2023-07-25 8:30 ` Hanna Czenczek 2023-07-27 21:12 ` Stefan Hajnoczi 2023-07-11 15:52 ` [PATCH 2/6] vhost-vdpa: Move vhost_vdpa_reset_status() up Hanna Czenczek 2023-07-18 14:29 ` Stefan Hajnoczi 2023-07-11 15:52 ` [PATCH 3/6] vhost: Do not reset suspended devices on stop Hanna Czenczek 2023-07-18 14:33 ` Stefan Hajnoczi 2023-07-21 15:25 ` Eugenio Perez Martin 2023-07-21 16:07 ` Hanna Czenczek 2023-07-24 15:48 ` Eugenio Perez Martin 2023-07-25 7:53 ` Hanna Czenczek 2023-07-25 10:03 ` Eugenio Perez Martin 2023-07-25 13:09 ` Hanna Czenczek 2023-07-25 18:53 ` Eugenio Perez Martin 2023-07-26 6:57 ` Hanna Czenczek 2023-07-27 12:49 ` Eugenio Perez Martin 2023-07-27 20:26 ` Stefan Hajnoczi 2023-07-11 15:52 ` [PATCH 4/6] vhost-user: Implement suspend/resume Hanna Czenczek 2023-07-18 14:37 ` Stefan Hajnoczi 2023-07-11 15:52 ` [PATCH 5/6] vhost-vdpa: Match vhost-user's status reset Hanna Czenczek 2023-07-18 14:50 ` Stefan Hajnoczi 2023-07-19 14:09 ` Hanna Czenczek 2023-07-19 15:06 ` Stefan Hajnoczi 2023-07-21 15:47 ` Eugenio Perez Martin 2023-07-11 15:52 ` [PATCH 6/6] vhost-user: Have reset_status fall back to reset Hanna Czenczek 2023-07-18 15:10 ` Stefan Hajnoczi 2023-07-19 14:11 ` Hanna Czenczek 2023-07-19 14:27 ` Hanna Czenczek 2023-07-20 16:03 ` Stefan Hajnoczi 2023-07-21 14:16 ` Hanna Czenczek 2023-07-24 18:04 ` Stefan Hajnoczi 2023-07-25 8:39 ` Hanna Czenczek 2023-07-18 15:14 ` [PATCH 0/6] vhost-user: Add suspend/resume Stefan Hajnoczi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).