From: Yajun Wu <yajunw@nvidia.com>
To: German Maglione <gmaglione@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
"Hanna Czenczek" <hreitz@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"virtio-fs@redhat.com" <virtio-fs@redhat.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"maxime.coquelin@redhat.com" <maxime.coquelin@redhat.com>,
"Parav Pandit" <parav@nvidia.com>,
"Anton Kuchin" <antonkuchin@yandex-team.ru>
Subject: Re: [Virtio-fs] (no subject)
Date: Tue, 10 Oct 2023 10:56:59 +0800 [thread overview]
Message-ID: <ccfb153c-0e2e-15dd-b543-5b5a8bebbff4@nvidia.com> (raw)
In-Reply-To: <CAJh=p+5OFrx4nP5rYKrgZ_y02n358wy70c4L0L4fHKQUo9fU5Q@mail.gmail.com>
On 10/9/2023 6:28 PM, German Maglione wrote:
> External email: Use caution opening links or attachments
>
>
> On Sat, Oct 7, 2023 at 4:23 AM Yajun Wu <yajunw@nvidia.com> wrote:
>>
>> On 10/6/2023 6:34 PM, Michael S. Tsirkin wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On Fri, Oct 06, 2023 at 11:47:55AM +0200, Hanna Czenczek wrote:
>>>> On 06.10.23 11:26, Michael S. Tsirkin wrote:
>>>>> On Fri, Oct 06, 2023 at 11:15:55AM +0200, Hanna Czenczek wrote:
>>>>>> On 06.10.23 10:45, Michael S. Tsirkin wrote:
>>>>>>> On Fri, Oct 06, 2023 at 09:48:14AM +0200, Hanna Czenczek wrote:
>>>>>>>> On 05.10.23 19:15, Michael S. Tsirkin wrote:
>>>>>>>>> On Thu, Oct 05, 2023 at 01:08:52PM -0400, Stefan Hajnoczi wrote:
>>>>>>>>>> On Wed, Oct 04, 2023 at 02:58:57PM +0200, Hanna Czenczek wrote:
>>>>>>>>>>> There is no clearly defined purpose for the virtio status byte in
>>>>>>>>>>> vhost-user: For resetting, we already have RESET_DEVICE; and for virtio
>>>>>>>>>>> feature negotiation, we have [GS]ET_FEATURES. With the REPLY_ACK
>>>>>>>>>>> protocol extension, it is possible for SET_FEATURES to return errors
>>>>>>>>>>> (SET_PROTOCOL_FEATURES may be called before SET_FEATURES).
>>>>>>>>>>>
>>>>>>>>>>> As for implementations, SET_STATUS is not widely implemented. dpdk does
>>>>>>>>>>> implement it, but only uses it to signal feature negotiation failure.
>>>>>>>>>>> While it does log reset requests (SET_STATUS 0) as such, it effectively
>>>>>>>>>>> ignores them, in contrast to RESET_OWNER (which is deprecated, and today
>>>>>>>>>>> means the same thing as RESET_DEVICE).
>>>>>>>>>>>
>>>>>>>>>>> While qemu superficially has support for [GS]ET_STATUS, it does not
>>>>>>>>>>> forward the guest-set status byte, but instead just makes it up
>>>>>>>>>>> internally, and actually completely ignores what the back-end returns,
>>>>>>>>>>> only using it as the template for a subsequent SET_STATUS to add single
>>>>>>>>>>> bits to it. Notably, after setting FEATURES_OK, it never reads it back
>>>>>>>>>>> to see whether the flag is still set, which is the only way in which
>>>>>>>>>>> dpdk uses the status byte.
>>>>>>>>>>>
>>>>>>>>>>> As-is, no front-end or back-end can rely on the other side handling this
>>>>>>>>>>> field in a useful manner, and it also provides no practical use over
>>>>>>>>>>> other mechanisms the vhost-user protocol has, which are more clearly
>>>>>>>>>>> defined. Deprecate it.
>>>>>>>>>>>
>>>>>>>>>>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>>>>>>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>>>>>>>>>>> ---
>>>>>>>>>>> docs/interop/vhost-user.rst | 28 +++++++++++++++++++++-------
>>>>>>>>>>> 1 file changed, 21 insertions(+), 7 deletions(-)
>>>>>>>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>>>>> SET_STATUS is the only way to signal failure to acknowledge FEATURES_OK.
>>>>>>>>> The fact current backends never check errors does not mean they never
>>>>>>>>> will. So no, not applying this.
>>>>>>>> Can this not be done with REPLY_ACK? I.e., with the following message
>>>>>>>> order:
>>>>>>>>
>>>>>>>> 1. GET_FEATURES to find out whether VHOST_USER_F_PROTOCOL_FEATURES is
>>>>>>>> present
>>>>>>>> 2. GET_PROTOCOL_FEATURES to hopefully get VHOST_USER_PROTOCOL_F_REPLY_ACK
>>>>>>>> 3. SET_PROTOCOL_FEATURES to set VHOST_USER_PROTOCOL_F_REPLY_ACK
>>>>>>>> 4. SET_FEATURES with need_reply
>>>>>>>>
>>>>>>>> If not, the problem is that qemu has sent SET_STATUS 0 for a while when the
>>>>>>>> vCPUs are stopped, which generally seems to request a device reset. If we
>>>>>>>> don’t state at least that SET_STATUS 0 is to be ignored, back-ends that will
>>>>>>>> implement SET_STATUS later may break with at least these qemu versions. But
>>>>>>>> documenting that a particular use of the status byte is to be ignored would
>>>>>>>> be really strange.
>>>>>>>>
>>>>>>>> Hanna
>>>>>>> Hmm I guess. Though just following virtio spec seems cleaner to me...
>>>>>>> vhost-user reconfigures the state fully on start.
>>>>>> Not the internal device state, though. virtiofsd has internal state, and
>>>>>> other devices like vhost-gpu back-ends would probably, too.
>>>>>>
>>>>>> Stefan has recently sent a series
>>>>>> (https://lists.nongnu.org/archive/html/qemu-devel/2023-10/msg00709.html) to
>>>>>> put the reset (RESET_DEVICE) into virtio_reset() (when we really need a
>>>>>> reset).
>>>>>>
>>>>>> I really don’t like our current approach with the status byte. Following the
>>>>>> virtio specification to me would mean that the guest directly controls this
>>>>>> byte, which it does not. qemu makes up values as it deems appropriate, and
>>>>>> this includes sending a SET_STATUS 0 when the guest is just paused, i.e.
>>>>>> when the guest really doesn’t want a device reset.
>>>>>>
>>>>>> That means that qemu does not treat this as a virtio device field (because
>>>>>> that would mean exposing it to the guest driver), but instead treats it as
>>>>>> part of the vhost(-user) protocol. It doesn’t feel right to me that we use
>>>>>> a virtio-defined feature for communication on the vhost level, i.e. between
>>>>>> front-end and back-end, and not between guest driver and device. I think
>>>>>> all vhost-level protocol features should be fully defined in the vhost-user
>>>>>> specification, which REPLY_ACK is.
>>>>> Hmm that makes sense. Maybe we should have done what stefan's patch
>>>>> is doing.
>>>>>
>>>>> Do look at the original commit that introduced it to understand why
>>>>> it was added.
>>>> I don’t understand why this was added to the stop/cont code, though. If it
>>>> is time consuming to make these changes, why are they done every time the VM
>>>> is paused
>>>> and resumed? It makes sense that this would be done for the initial
>>>> configuration (where a reset also wouldn’t hurt), but here it seems wrong.
>>>>
>>>> (To be clear, a reset in the stop/cont code is wrong, because it breaks
>>>> stateful devices.)
>>>>
>>>> Also, note the newer commits 6f8be29ec17 and c3716f260bf. The reset as
>>>> originally introduced was wrong even for non-stateful devices, because it
>>>> occurred before we fetched the state (vring indices) so we could restore it
>>>> later. I don’t know how 923b8921d21 was tested, but if the back-end used
>>>> for testing implemented SET_STATUS 0 as a reset, it could not have survived
>>>> either migration or a stop/cont in general, because the vring indices would
>>>> have been reset to 0.
>>>>
>>>> What I’m saying is, 923b8921d21 introduced SET_STATUS calls that broke all
>>>> devices that would implement them as per virtio spec, and even today it’s
>>>> broken for stateful devices. The mentioned performance issue is likely
>>>> real, but we can’t address it by making up SET_STATUS calls that are wrong.
>>>>
>>>> I concede that I didn’t think about DRIVER_OK. Personally, I would do all
>>>> final configuration that would happen upon a DRIVER_OK once the first vring
>>>> is started (i.e. receives a kick). That has the added benefit of being
>>>> asynchronous because it doesn’t block any vhost-user messages (which are
>>>> synchronous, and thus block downtime).
>>>>
>>>> Hanna
>>> For better or worse kick is per ring. It's out of spec to start rings
>>> that were not kicked but I guess you could do configuration ...
>>> Seems somewhat asymmetrical though.
>>>
>>> Let's wait until next week, hopefully Yajun Wu will answer.
>> The main motivation of adding VHOST_USER_SET_STATUS is to let backend
>> DPDK know
>> when DRIVER_OK bit is valid. It's an indication of all VQ configuration
>> has sent,
>> otherwise DPDK has to rely on first queue pair is ready, then
>> receiving/applying
>> VQ configuration one by one.
>>
>> During live migration, configuring VQ one by one is very time consuming.
>> For VIRTIO
>> net vDPA, HW needs to know how many VQs are enabled to set
>> RSS(Receive-Side Scaling).
>>
>> If you don’t want SET_STATUS message, backend can remove protocol
>> feature bit
>> VHOST_USER_PROTOCOL_F_STATUS.
>> DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device
>> close/reset.
> This is incorrect, resetting the device on GET_VRING_BASE breaks
> the stop/cont. Since you don't want to reset the VQs on stop/cont.
Sorry for the misunderstanding, dpdk vhost backend framework doesn't
have RESET concept(only device level .dev_conf and .dev_close). On
receiving DRIVER_OK does dev_conf, on receiving GET_VRING_BASE does
dev_close. For every VM suspend/resume, dpdk issues dev_close then dev_conf.
>
>> I'm not involved in discussion about adding SET_STATUS in Vhost
>> protocol. This feature
>> is essential for vDPA(same as vhost-vdpa implements VHOST_VDPA_SET_STATUS).
>>
>> Thanks,
>> Yajun
>>>>>> Now, we could hand full control of the status byte to the guest, and that
>>>>>> would make me content. But I feel like that doesn’t really work, because
>>>>>> qemu needs to intercept the status byte anyway (it needs to know when there
>>>>>> is a reset, probably wants to know when the device is configured, etc.), so
>>>>>> I don’t think having the status byte in vhost-user really gains us much when
>>>>>> qemu could translate status byte changes to/from other vhost-user commands.
>>>>>>
>>>>>> Hanna
>>>>> well it intercepts it but I think it could pass it on unchanged.
>>>>>
>>>>>
>>>>>>> I guess symmetry was the
>>>>>>> point. So I don't see why SET_STATUS 0 has to be ignored.
>>>>>>>
>>>>>>>
>>>>>>> SET_STATUS was introduced by:
>>>>>>>
>>>>>>> commit 923b8921d210763359e96246a58658ac0db6c645
>>>>>>> Author: Yajun Wu <yajunw@nvidia.com>
>>>>>>> Date: Mon Oct 17 14:44:52 2022 +0800
>>>>>>>
>>>>>>> vhost-user: Support vhost_dev_start
>>>>>>>
>>>>>>> CC the author.
>>>>>>>
>> _______________________________________________
>> Virtio-fs mailing list
>> Virtio-fs@redhat.com
>> https://listman.redhat.com/mailman/listinfo/virtio-fs
>
>
> --
> German
>
next prev parent reply other threads:[~2023-10-10 3:02 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-04 12:58 [PATCH v4 0/8] vhost-user: Back-end state migration Hanna Czenczek
2023-10-04 12:58 ` [PATCH v4 1/8] vhost-user.rst: Deprecate [GS]ET_STATUS Hanna Czenczek
2023-10-05 17:08 ` Stefan Hajnoczi
2023-10-05 17:15 ` Michael S. Tsirkin
2023-10-06 7:48 ` [Virtio-fs] (no subject) Hanna Czenczek
2023-10-06 8:45 ` Michael S. Tsirkin
2023-10-06 9:15 ` Hanna Czenczek
2023-10-06 9:26 ` Michael S. Tsirkin
2023-10-06 9:47 ` Hanna Czenczek
2023-10-06 10:34 ` Michael S. Tsirkin
2023-10-06 11:42 ` Hanna Czenczek
2023-10-06 15:17 ` Alex Bennée
2023-10-06 15:47 ` Hanna Czenczek
2023-10-06 20:49 ` Alex Bennée
2023-10-09 8:07 ` Hanna Czenczek
2023-10-07 2:22 ` Yajun Wu
2023-10-09 8:21 ` Hanna Czenczek
2023-10-09 9:07 ` Hanna Czenczek
2023-10-09 9:13 ` Hanna Czenczek
2023-10-10 4:00 ` Yajun Wu
2023-10-10 8:18 ` Hanna Czenczek
2023-10-10 10:36 ` Alex Bennée
2023-10-10 13:18 ` Hanna Czenczek
2023-10-10 14:35 ` Alex Bennée
2023-10-13 18:02 ` Hanna Czenczek
2023-10-17 7:49 ` Viresh Kumar
2023-10-17 8:13 ` Hanna Czenczek
2023-10-09 10:28 ` German Maglione
2023-10-10 2:56 ` Yajun Wu [this message]
2023-10-10 10:04 ` German Maglione
2023-10-04 12:58 ` [PATCH v4 2/8] vhost-user.rst: Improve [GS]ET_VRING_BASE doc Hanna Czenczek
2023-10-05 17:38 ` Stefan Hajnoczi
2023-10-06 7:53 ` [Virtio-fs] " Hanna Czenczek
2023-10-06 8:49 ` Michael S. Tsirkin
2023-10-06 13:55 ` Hanna Czenczek
2023-10-06 13:58 ` Hanna Czenczek
2023-10-07 21:29 ` Michael S. Tsirkin
2023-10-07 21:27 ` Michael S. Tsirkin
2023-10-04 12:58 ` [PATCH v4 3/8] vhost-user.rst: Clarify enabling/disabling vrings Hanna Czenczek
2023-10-05 17:43 ` Stefan Hajnoczi
2023-10-18 12:14 ` Michael S. Tsirkin
2023-10-18 16:17 ` Hanna Czenczek
2023-10-04 12:59 ` [PATCH v4 4/8] vhost-user.rst: Introduce suspended state Hanna Czenczek
2023-10-05 17:44 ` Stefan Hajnoczi
2023-10-04 12:59 ` [PATCH v4 5/8] vhost-user.rst: Migrating back-end-internal state Hanna Czenczek
2023-10-05 17:46 ` Stefan Hajnoczi
2023-10-04 12:59 ` [PATCH v4 6/8] vhost-user: Interface for migration state transfer Hanna Czenczek
2023-10-05 17:46 ` Stefan Hajnoczi
2023-10-04 12:59 ` [PATCH v4 7/8] vhost: Add high-level state save/load functions Hanna Czenczek
2023-10-05 17:46 ` Stefan Hajnoczi
2023-10-04 12:59 ` [PATCH v4 8/8] vhost-user-fs: Implement internal migration Hanna Czenczek
2023-10-05 17:46 ` Stefan Hajnoczi
2023-10-05 17:48 ` [PATCH v4 0/8] vhost-user: Back-end state migration Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ccfb153c-0e2e-15dd-b543-5b5a8bebbff4@nvidia.com \
--to=yajunw@nvidia.com \
--cc=antonkuchin@yandex-team.ru \
--cc=eperezma@redhat.com \
--cc=gmaglione@redhat.com \
--cc=hreitz@redhat.com \
--cc=maxime.coquelin@redhat.com \
--cc=mst@redhat.com \
--cc=parav@nvidia.com \
--cc=qemu-devel@nongnu.org \
--cc=virtio-fs@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).