From: Hanna Czenczek <hreitz@redhat.com>
To: Yajun Wu <yajunw@nvidia.com>, "Michael S. Tsirkin" <mst@redhat.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"virtio-fs@redhat.com" <virtio-fs@redhat.com>,
"Eugenio Pérez" <eperezma@redhat.com>,
"Anton Kuchin" <antonkuchin@yandex-team.ru>,
"Parav Pandit" <parav@nvidia.com>,
"maxime.coquelin@redhat.com" <maxime.coquelin@redhat.com>,
"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [Virtio-fs] (no subject)
Date: Tue, 10 Oct 2023 10:18:29 +0200
Message-ID: <9a36a319-4567-f297-f14a-2025792ae93f@redhat.com>
In-Reply-To: <8f51f02b-4676-c566-7304-f63e76df74ba@nvidia.com>

On 10.10.23 06:00, Yajun Wu wrote:
>
> On 10/9/2023 5:13 PM, Hanna Czenczek wrote:
>> On 09.10.23 11:07, Hanna Czenczek wrote:
>>> On 09.10.23 10:21, Hanna Czenczek wrote:
>>>> On 07.10.23 04:22, Yajun Wu wrote:
>>> [...]
>>>
>>>>> The main motivation for adding VHOST_USER_SET_STATUS is to let
>>>>> the DPDK back-end know when the DRIVER_OK bit is valid. It
>>>>> indicates that all VQ configuration has been sent; otherwise DPDK
>>>>> has to rely on the first queue pair being ready, and then receive
>>>>> and apply the VQ configuration one by one.
>>>>>
>>>>> During live migration, configuring VQs one by one is very
>>>>> time-consuming.
>>>> One question I have here is why it wasn’t then introduced in the live
>>>> migration code, but in the general VM stop/cont code instead. It does
>>>> seem time-consuming to do this every time the VM is paused and
>>>> resumed.
>
> Yes, VM stop/cont will call vhost_net_stop/vhost_net_start. Maybe
> because there's no device level stop/cont vhost message?

No, it is because qemu will reset the status in stop/cont*, which it
should not do. Aside from guest-initiated resets, the only case where
a reset comes into play is when the back-end is changed, e.g. during
migration. In that case, the source back-end will see a disconnect on
the vhost-user socket and can then do whatever uninitialization it needs
to do, and the destination front-end will need to be reconfigured by
qemu anyway, because it’s just a case of the destination qemu initiating
a fresh connection to a new back-end (except that it will need to
restore the state from the source).

*Yes, technically, dpdk will ignore that reset, but it still stops the
device on a different message (when it should just pause processing
vrings), so the outcome is the same.

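To make the distinction between pausing vrings and resetting the device concrete, here is a minimal Python sketch. It is a toy model only, not qemu or dpdk code; the class names, method names, and the `configured` flag are made up for illustration. It assumes a back-end where GET_VRING_BASE stops exactly one vring and a reset happens only on an explicit reset request.

```python
from dataclasses import dataclass, field


@dataclass
class Vring:
    started: bool = False
    last_avail_idx: int = 0


@dataclass
class BackEnd:
    """Toy back-end model (hypothetical; not any real implementation)."""
    vrings: list = field(default_factory=lambda: [Vring(), Vring()])
    # Stands in for everything set up at connection time
    # (negotiated features, memory table, ...).
    configured: bool = False

    def configure(self):
        self.configured = True

    def kick(self, idx):
        # A kick (re-)starts exactly one vring.
        self.vrings[idx].started = True

    def get_vring_base(self, idx):
        # Stops this one vring and reports where it left off.
        # Device-level state is deliberately left untouched.
        vring = self.vrings[idx]
        vring.started = False
        return vring.last_avail_idx

    def reset_device(self):
        # Only an explicit reset throws away device-level state.
        self.configured = False
        for vring in self.vrings:
            vring.started = False
            vring.last_avail_idx = 0
```

In this model, pausing the VM would map to calling `get_vring_base()` on every vring, which leaves `configured` set, so a later `kick()` is enough to continue.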
>>>>
>>>>> For VIRTIO
>>>>> net vDPA, HW needs to know how many VQs are enabled to set
>>>>> RSS(Receive-Side Scaling).
>>>>>
>>>>> If you don’t want SET_STATUS message, backend can remove protocol
>>>>> feature bit
>>>>> VHOST_USER_PROTOCOL_F_STATUS.
>>>> The problem isn’t back-ends that don’t want the message, the problem
>>>> is that qemu uses the message wrongly, which prevents well-behaving
>>>> back-ends from implementing the message.
>>>>
>>>>> DPDK is ignoring SET_STATUS 0, but using GET_VRING_BASE to do device
>>>>> close/reset.
>>>> So the right thing to do for back-ends is to announce STATUS support
>>>> and then not implement it correctly?
>>>>
>>>> GET_VRING_BASE should not close or reset the device, by the way.
>>>> It should stop that one vring, not more. We have a
>>>> RESET_DEVICE command for resetting.
> I believe dpdk uses GET_VRING_BASE long before qemu has RESET_DEVICE?

I don’t think it matters who came first. What matters is the
specification, and that dpdk decided to rely on implementation-specific
behavior without having all involved parties agree by means of putting
that in the specification. And now dpdk clearly deviates from the
specification as a result of that action, which can result in problems
if the front-end doesn’t do what qemu always used to do. (E.g. the
front-end might just send GET_VRING_BASE for all vrings when suspending
the guest, and then only send kicks on resume to re-start the vrings.
dpdk would most likely be left in a state where the whole device is
stopped, expecting DRIVER_OK. Same thing in general for front-ends that
don’t support F_STATUS.)

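The failure mode in that parenthetical can be sketched in a few lines of Python. This models only the behavior described above (a back-end that treats GET_VRING_BASE as a device stop and then waits for DRIVER_OK); it is not taken from dpdk, and all class and function names are hypothetical.

```python
DRIVER_OK = 0x04  # virtio device status bit


class DeviceStoppingBackEnd:
    """Hypothetical back-end that stops the whole device on GET_VRING_BASE."""

    def __init__(self, num_vrings):
        self.device_running = False
        self.vring_started = [False] * num_vrings

    def set_status(self, status):
        # The device only (re-)starts once DRIVER_OK is seen.
        if status & DRIVER_OK:
            self.device_running = True

    def get_vring_base(self, idx):
        self.vring_started[idx] = False
        # The deviation: one vring's message stops the whole device.
        self.device_running = False
        return 0

    def kick(self, idx):
        # Kicks have no effect while the device as a whole is stopped.
        if self.device_running:
            self.vring_started[idx] = True

    def processing(self):
        return self.device_running and any(self.vring_started)


def suspend_and_resume(back_end, num_vrings):
    """Front-end that pauses with GET_VRING_BASE and resumes with kicks only."""
    for i in range(num_vrings):
        back_end.get_vring_base(i)
    for i in range(num_vrings):
        back_end.kick(i)
```

Running `suspend_and_resume()` against this model leaves `processing()` false until some later `set_status()` call carries DRIVER_OK, which a front-end without F_STATUS would never send.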
> It's a compatibility issue. For new back-end implementations, we can
> have a better solution, right?

The fact that dpdk and qemu deviate from the specification is a problem
as-is.

>>>>> I wasn't involved in the discussion about adding SET_STATUS to
>>>>> the vhost protocol. This feature is essential for vDPA (just as
>>>>> vhost-vdpa implements VHOST_VDPA_SET_STATUS).
>>>> So what I gather from your response is that there is only a
>>>> single use for SET_STATUS, which is the DRIVER_OK bit. If so,
>>>> documenting that all other bits are to be ignored by both back-end
>>>> and front-end would be fine by me.
>>>>
>>>> I’m not fully serious about that suggestion, but I hear the strong
>>>> implication that nothing but DRIVER_OK was of any concern, and this
>>>> is really important to note when we talk about the status of the
>>>> STATUS feature in vhost today. It seems to me now that it was not
>>>> intended to be the virtio-level status byte, but just a DRIVER_OK
>>>> signalling path from front-end to back-end. That makes it a
>>>> vhost-level protocol feature to me.
>>> On second thought, it just is a pure vhost-level protocol feature, and
>>> has nothing to do with the virtio status byte as-is. The only stated
>>> purpose is for the front-end to send DRIVER_OK after migration, but
>>> migration is transparent to the guest, so the guest would never change
>>> the status byte during migration. Therefore, if this feature is
>>> essential, we will never be able to have a status byte that is
>>> transparently shared between guest and back-end device, i.e. the
>>> virtio status byte.
>> On third thought, scratch that. The guest wouldn’t set it, but
>> naturally, after migration, the front-end will need to restore the
>> status byte from the source, so the front-end will always need to set
>> it, even if it were otherwise controlled only by the guest and the
>> back-end device. So technically, this doesn’t prevent such a use case.
>> (In practice, it isn’t controlled by the guest right now, but that could
>> be fixed.)
> I only tested the feature with DPDK (the only back-end using it today?).
> Max defined the protocol and added the corresponding code in DPDK
> before I added QEMU support. If other back-ends or different device
> types want to use this, we can have further discussion?

So as far as I understand, the feature is supposed to rely on
implementation-specific behavior between specifically qemu as a
front-end and dpdk as a back-end, nothing else. Honestly, that to me is
a very good reason to deprecate it. That would make it clear that any
implementation that implements it does so because it relies on
implementation-specific behavior from other implementations.

Option 2 is to fix it. It is not right to use this broadly defined
feature with its clear protocol as given in the virtio specification
just to set and clear a single bit (DRIVER_OK). The vhost-user
specification points to that virtio protocol. We must adhere to the
protocol. And note that we must not reset devices just because the VM
is paused/resumed. (That is why I wanted to deprecate SET_STATUS, so
that Stefan’s series would introduce RESET_DEVICE where we need it, and
we can (for now) ignore the SET_STATUS 0 in vhost_dev_stop().)

Option 3 would be to just be honest in the specification, and limit the
scope of F_STATUS to say the only bit that matters is DRIVER_OK. I
would say this is not really different from deprecating, though it
wouldn’t affect your case. However, I understand Alex relies on a full
status byte. I’m still interested to know why that is.

Option 4 is of course not to do anything, and leave everything as-is,
waiting for the next person to stir the hornet’s nest.

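As a rough illustration of how Options 2 and 3 differ for a back-end, the following Python sketch contrasts the two readings of a SET_STATUS value. The bit values are the device status bits from the virtio specification; the two function names, the returned action strings, and the choice to treat a cleared DRIVER_OK as "stop" under Option 3 are assumptions made for this example, not anything the specification defines.

```python
# Device status bits as defined in the virtio specification.
ACKNOWLEDGE = 0x01
DRIVER = 0x02
DRIVER_OK = 0x04
FEATURES_OK = 0x08
DEVICE_NEEDS_RESET = 0x40
FAILED = 0x80


def full_status_action(old, new):
    """Option 2 reading: the byte follows the virtio protocol."""
    if new == 0:
        # In virtio, writing 0 to the status byte is a device reset.
        return "reset"
    if (new & DRIVER_OK) and not (old & DRIVER_OK):
        return "start"
    return "none"


def driver_ok_only_action(old, new):
    """Option 3 reading: every bit except DRIVER_OK is ignored."""
    was_ok = bool(old & DRIVER_OK)
    is_ok = bool(new & DRIVER_OK)
    if is_ok and not was_ok:
        return "start"
    if was_ok and not is_ok:
        # Assumption for this sketch: clearing the bit just stops the device.
        return "stop"
    return "none"
```

With `live = ACKNOWLEDGE | DRIVER | FEATURES_OK | DRIVER_OK`, the SET_STATUS 0 sent on VM pause is a full reset under the first reading and only a DRIVER_OK transition under the second, which is the gap between the two options.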
>>> Cc-ing Alex on this mail, because to me, this seems like an important
>>> detail when he plans on using the byte in the future. If we need a
>>> virtio status byte, I can’t see how we could use the existing F_STATUS
>>> for it.
>>>
>>> Hanna
>