qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Hanna Czenczek <hreitz@redhat.com>
Cc: qemu-devel@nongnu.org, virtio-fs@redhat.com,
	"Eugenio Pérez" <eperezma@redhat.com>,
	"Anton Kuchin" <antonkuchin@yandex-team.ru>,
	"Yajun Wu" <yajunw@nvidia.com>
Subject: Re: [Virtio-fs] (no subject)
Date: Fri, 6 Oct 2023 06:34:36 -0400	[thread overview]
Message-ID: <20231006055229-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <a4af0357-12ee-fc7f-e249-239da34409b0@redhat.com>

On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote:
> On 06.10.23 11:26, Michael S. Tsirkin wrote:
> > On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote:
> > > On 06.10.23 10:45, Michael S. Tsirkin wrote:
> > > > On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote:
> > > > > On 05.10.23 19:15, Michael S. Tsirkin wrote:
> > > > > > On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote:
> > > > > > > On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote:
> > > > > > > > There is no clearly defined purpose for the virtio status byte in
> > > > > > > > vhost-user: For resetting, we already have RESET_DEVICE; and for virtio
> > > > > > > > feature negotiation, we have [GS]ET_FEATURES.  With the REPLY_ACK
> > > > > > > > protocol extension, it is possible for SET_FEATURES to return errors
> > > > > > > > (SET_PROTOCOL_FEATURES may be called before SET_FEATURES).
> > > > > > > > 
> > > > > > > > As for implementations, SET_STATUS is not widely implemented.  dpdk does
> > > > > > > > implement it, but only uses it to signal feature negotiation failure.
> > > > > > > > While it does log reset requests (SET_STATUS 0) as such, it effectively
> > > > > > > > ignores them, in contrast to RESET_OWNER (which is deprecated, and today
> > > > > > > > means the same thing as RESET_DEVICE).
> > > > > > > > 
> > > > > > > > While qemu superficially has support for [GS]ET_STATUS, it does not
> > > > > > > > forward the guest-set status byte, but instead just makes it up
> > > > > > > > internally, and actually completely ignores what the back-end returns,
> > > > > > > > only using it as the template for a subsequent SET_STATUS to add single
> > > > > > > > bits to it.  Notably, after setting FEATURES_OK, it never reads it back
> > > > > > > > to see whether the flag is still set, which is the only way in which
> > > > > > > > dpdk uses the status byte.
> > > > > > > > 
> > > > > > > > As-is, no front-end or back-end can rely on the other side handling this
> > > > > > > > field in a useful manner, and it also provides no practical use over
> > > > > > > > other mechanisms the vhost-user protocol has, which are more clearly
> > > > > > > > defined.  Deprecate it.
> > > > > > > > 
> > > > > > > > Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > > > Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> > > > > > > > ---
> > > > > > > >     docs/interop/vhost-user.rst | 28 +++++++++++++++++++++-------
> > > > > > > >     1 file changed, 21 insertions(+), 7 deletions(-)
> > > > > > > Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > > SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK.
> > > > > > The fact current backends never check errors does not mean they never
> > > > > > will. So no, not applying this.
> > > > > Can this not be done with REPLY_ACK?  I.e., with the following message
> > > > > order:
> > > > > 
> > > > > 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is
> > > > > present
> > > > > 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK
> > > > > 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK
> > > > > 4. SET_FEATURES with need_reply
> > > > > 
> > > > > If not, the problem is that qemu has sent SET_STATUS 0 for a while when the
> > > > > vCPUs are stopped, which generally seems to request a device reset.  If we
> > > > > don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will
> > > > > implement SET_STATUS later may break with at least these qemu versions.  But
> > > > > documenting that a particular use of the status byte is to be ignored would
> > > > > be really strange.
> > > > > 
> > > > > Hanna
> > > > Hmm I guess. Though just following virtio spec seems cleaner to me...
> > > > vhost-user reconfigures the state fully on start.
> > > Not the internal device state, though.  virtiofsd has internal state, and
> > > other devices like vhost-gpu back-ends would probably, too.
> > > 
> > > Stefan has recently sent a series
> > > (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to
> > > put the reset (RESET_DEVICE) into virtio_reset() (when we really need a
> > > reset).
> > > 
> > > I really don’t like our current approach with the status byte. Following the
> > > virtio specification to me would mean that the guest directly controls this
> > > byte, which it does not.  qemu makes up values as it deems appropriate, and
> > > this includes sending a SET_STATUS 0 when the guest is just paused, i.e.
> > > when the guest really doesn’t want a device reset.
> > > 
> > > That means that qemu does not treat this as a virtio device field (because
> > > that would mean exposing it to the guest driver), but instead treats it as
> > > part of the vhost(-user) protocol.  It doesn’t feel right to me that we use
> > > a virtio-defined feature for communication on the vhost level, i.e. between
> > > front-end and back-end, and not between guest driver and device.  I think
> > > all vhost-level protocol features should be fully defined in the vhost-user
> > > specification, which REPLY_ACK is.
> > Hmm that makes sense. Maybe we should have done what stefan's patch
> > is doing.
> > 
> > Do look at the original commit that introduced it to understand why
> > it was added.
> 
> I don’t understand why this was added to the stop/cont code, though.  If it
> is time consuming to make these changes, why are they done every time the VM
> is paused
> and resumed?  It makes sense that this would be done for the initial
> configuration (where a reset also wouldn’t hurt), but here it seems wrong.
> 
> (To be clear, a reset in the stop/cont code is wrong, because it breaks
> stateful devices.)
> 
> Also, note the newer commits 6f8be29ec17 and c3716f260bf.  The reset as
> originally introduced was wrong even for non-stateful devices, because it
> occurred before we fetched the state (vring indices) so we could restore it
> later.  I don’t know how 923b8921d21 was tested, but if the back-end used
> for testing implemented SET_STATUS 0 as a reset, it could not have survived
> either migration or a stop/cont in general, because the vring indices would
> have been reset to 0.
> 
> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all
> devices that would implement them as per virtio spec, and even today it’s
> broken for stateful devices.  The mentioned performance issue is likely
> real, but we can’t address it by making up SET_STATUS calls that are wrong.
> 
> I concede that I didn’t think about DRIVER_OK.  Personally, I would do all
> final configuration that would happen upon a DRIVER_OK once the first vring
> is started (i.e. receives a kick).  That has the added benefit of being
> asynchronous because it doesn’t block any vhost-user messages (which are
> synchronous, and thus block downtime).
> 
> Hanna


For better or worse kick is per ring. It's out of spec to start rings
that were not kicked but I guess you could do configuration ...
Seems somewhat asymmetrical though.

Let's wait until next week, hopefully Yajun Wu will answer.

> > > Now, we could hand full control of the status byte to the guest, and that
> > > would make me content.  But I feel like that doesn’t really work, because
> > > qemu needs to intercept the status byte anyway (it needs to know when there
> > > is a reset, probably wants to know when the device is configured, etc.), so
> > > I don’t think having the status byte in vhost-user really gains us much when
> > > qemu could translate status byte changes to/from other vhost-user commands.
> > > 
> > > Hanna
> > well it intercepts it but I think it could pass it on unchanged.
> > 
> > 
> > > > I guess symmetry was the
> > > > point. So I don't see why SET_STATUS 0 has to be ignored.
> > > > 
> > > > 
> > > > SET_STATUS was introduced by:
> > > > 
> > > > commit 923b8921d210763359e96246a58658ac0db6c645
> > > > Author: Yajun Wu <yajunw@nvidia.com>
> > > > Date:   Mon Oct 17 14:44:52 2022 +0800
> > > > 
> > > >       vhost-user: Support vhost_dev_start
> > > > 
> > > > CC the author.
> > > > 



  reply	other threads:[~2023-10-06 10:35 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-04 12:58 [PATCH v4 0/8] vhost-user: Back-end state migration Hanna Czenczek
2023-10-04 12:58 ` [PATCH v4 1/8] vhost-user.rst: Deprecate [GS]ET_STATUS Hanna Czenczek
2023-10-05 17:08   ` Stefan Hajnoczi
2023-10-05 17:15     ` Michael S. Tsirkin
2023-10-06  7:48       ` [Virtio-fs] (no subject) Hanna Czenczek
2023-10-06  8:45         ` Michael S. Tsirkin
2023-10-06  9:15           ` Hanna Czenczek
2023-10-06  9:26             ` Michael S. Tsirkin
2023-10-06  9:47               ` Hanna Czenczek
2023-10-06 10:34                 ` Michael S. Tsirkin [this message]
2023-10-06 11:42                   ` Hanna Czenczek
2023-10-06 15:17                     ` Alex Bennée
2023-10-06 15:47                       ` Hanna Czenczek
2023-10-06 20:49                         ` Alex Bennée
2023-10-09  8:07                           ` Hanna Czenczek
2023-10-07  2:22                   ` Yajun Wu
2023-10-09  8:21                     ` Hanna Czenczek
2023-10-09  9:07                       ` Hanna Czenczek
2023-10-09  9:13                         ` Hanna Czenczek
2023-10-10  4:00                           ` Yajun Wu
2023-10-10  8:18                             ` Hanna Czenczek
2023-10-10 10:36                               ` Alex Bennée
2023-10-10 13:18                                 ` Hanna Czenczek
2023-10-10 14:35                                   ` Alex Bennée
2023-10-13 18:02                                     ` Hanna Czenczek
2023-10-17  7:49                                       ` Viresh Kumar
2023-10-17  8:13                                         ` Hanna Czenczek
2023-10-09 10:28                     ` German Maglione
2023-10-10  2:56                       ` Yajun Wu
2023-10-10 10:04                         ` German Maglione
2023-10-04 12:58 ` [PATCH v4 2/8] vhost-user.rst: Improve [GS]ET_VRING_BASE doc Hanna Czenczek
2023-10-05 17:38   ` Stefan Hajnoczi
2023-10-06  7:53     ` [Virtio-fs] " Hanna Czenczek
2023-10-06  8:49       ` Michael S. Tsirkin
2023-10-06 13:55         ` Hanna Czenczek
2023-10-06 13:58           ` Hanna Czenczek
2023-10-07 21:29             ` Michael S. Tsirkin
2023-10-07 21:27           ` Michael S. Tsirkin
2023-10-04 12:58 ` [PATCH v4 3/8] vhost-user.rst: Clarify enabling/disabling vrings Hanna Czenczek
2023-10-05 17:43   ` Stefan Hajnoczi
2023-10-18 12:14   ` Michael S. Tsirkin
2023-10-18 16:17     ` Hanna Czenczek
2023-10-04 12:59 ` [PATCH v4 4/8] vhost-user.rst: Introduce suspended state Hanna Czenczek
2023-10-05 17:44   ` Stefan Hajnoczi
2023-10-04 12:59 ` [PATCH v4 5/8] vhost-user.rst: Migrating back-end-internal state Hanna Czenczek
2023-10-05 17:46   ` Stefan Hajnoczi
2023-10-04 12:59 ` [PATCH v4 6/8] vhost-user: Interface for migration state transfer Hanna Czenczek
2023-10-05 17:46   ` Stefan Hajnoczi
2023-10-04 12:59 ` [PATCH v4 7/8] vhost: Add high-level state save/load functions Hanna Czenczek
2023-10-05 17:46   ` Stefan Hajnoczi
2023-10-04 12:59 ` [PATCH v4 8/8] vhost-user-fs: Implement internal migration Hanna Czenczek
2023-10-05 17:46   ` Stefan Hajnoczi
2023-10-05 17:48 ` [PATCH v4 0/8] vhost-user: Back-end state migration Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231006055229-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=antonkuchin@yandex-team.ru \
    --cc=eperezma@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=virtio-fs@redhat.com \
    --cc=yajunw@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).