From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
To: Si-Wei Liu <si-wei.liu@oracle.com>, Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
virtualization <virtualization@lists.linux-foundation.org>,
netdev <netdev@vger.kernel.org>, kvm <kvm@vger.kernel.org>,
Parav Pandit <parav@nvidia.com>,
Yongji Xie <xieyongji@bytedance.com>,
"Dawar, Gautam" <gautam.dawar@amd.com>
Subject: Re: [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev
Date: Mon, 22 Aug 2022 13:07:55 +0800
Message-ID: <e06d1f6d-3199-1b75-d369-2e5d69040271@intel.com>
In-Reply-To: <4678fc51-a402-d3ea-e875-6eba175933ba@oracle.com>
On 8/20/2022 4:55 PM, Si-Wei Liu wrote:
>
>
> On 8/18/2022 5:42 PM, Jason Wang wrote:
>> On Fri, Aug 19, 2022 at 7:20 AM Si-Wei Liu <si-wei.liu@oracle.com>
>> wrote:
>>>
>>>
>>> On 8/17/2022 9:15 PM, Jason Wang wrote:
>>>> On 2022/8/17 18:37, Michael S. Tsirkin wrote:
>>>>> On Wed, Aug 17, 2022 at 05:43:22PM +0800, Zhu, Lingshan wrote:
>>>>>> On 8/17/2022 5:39 PM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Aug 17, 2022 at 05:13:59PM +0800, Zhu, Lingshan wrote:
>>>>>>>> On 8/17/2022 4:55 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Aug 17, 2022 at 10:14:26AM +0800, Zhu, Lingshan wrote:
>>>>>>>>>> Yes it is a little messy, and we can not check _F_VERSION_1
>>>>>>>>>> because of
>>>>>>>>>> transitional devices, so maybe this is the best we can do for
>>>>>>>>>> now
>>>>>>>>> I think vhost generally needs an API to declare config space
>>>>>>>>> endian-ness
>>>>>>>>> to kernel. vdpa can reuse that too then.
>>>>>>>> Yes, I remember you have mentioned some IOCTL to set the
>>>>>>>> endian-ness,
>>>>>>>> for vDPA, I think only the vendor driver knows the endian,
>>>>>>>> so we may need a new function vdpa_ops->get_endian().
>>>>>>>> In the last thread, we say maybe it's better to add a comment for
>>>>>>>> now.
>>>>>>>> But if you think we should add a vdpa_ops->get_endian(), I can
>>>>>>>> work
>>>>>>>> on it for sure!
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Zhu Lingshan
>>>>>>> I think QEMU has to set endian-ness. No one else knows.
>>>>>> Yes, for SW based vhost it is true. But for HW vDPA, only
>>>>>> the device & driver knows the endian, I think we can not
>>>>>> "set" a hardware's endian.
>>>>> QEMU knows the guest endian-ness and it knows that
>>>>> device is accessed through the legacy interface.
>>>>> It can accordingly send endian-ness to the kernel and
>>>>> kernel can propagate it to the driver.
>>>>
>>>> I wonder if we can simply force LE and then Qemu can do the endian
>>>> conversion?
>>> convert from LE for config space fields only, or QEMU has to forcefully
>>> mediate and convert endianness for all device memory access including
>>> even the datapath (fields in descriptor and avail/used rings)?
>> Former. Actually, I want to force modern devices for vDPA when
>> developing the vDPA framework. But then we see requirements for
>> transitional or even legacy (e.g the Ali ENI parent). So it
>> complicates things a lot.
>>
>> I think several ideas have been proposed:
>>
>> 1) Your proposal of having a vDPA specific way for
>> modern/transitional/legacy awareness. This seems very clean since each
>> transport should have the ability to do that but it still requires
>> some kind of mediation for the case e.g running BE legacy guest on LE
>> host.
> In theory it seems like so, though practically I wonder if we can just
> forbid BE legacy driver from running on modern LE host. For those who
> care about legacy BE guest, they most likely could and should talk to
> vendor to get native BE support to achieve hardware acceleration, few
> of them would count on QEMU in mediating or emulating the datapath
> (otherwise I don't see the benefit of adopting vDPA?). I still feel
> that not every hardware vendor has to offer backward compatibility
> (transitional device) with legacy interface/behavior (BE being just
> one), this is unlike the situation on software virtio device, which
> has legacy support since day one. I think we have discussed it before:
> for those vDPA vendors who don't offer legacy guest support, maybe we
> should mandate some feature for e.g. VERSION_1, as these devices
> really don't offer functionality of the opposite side (!VERSION_1)
> during negotiation.
>
> Having said that, perhaps we should also allow vendor devices to
> implement only partial support for legacy. We can define "reversed"
> backend features to denote parts of the legacy
> interface/functionality not implemented by the device. For
> instance, VHOST_BACKEND_F_NO_BE_VRING, VHOST_BACKEND_F_NO_BE_CONFIG,
> VHOST_BACKEND_F_NO_ALIGNED_VRING,
> VHOST_BACKEND_NET_F_NO_WRITEABLE_MAC, et al. Not all of these
> missing features for legacy would be easy for QEMU to make up for, so
> QEMU can selectively emulate those at its best when necessary and
> applicable. In other words, this design shouldn't prevent QEMU from
> making up for a vendor device's partial legacy support.
>
>>
>> 2) Michael suggests using VHOST_SET_VRING_ENDIAN where it means we
>> need a new config ops for vDPA bus, but it doesn't solve the issue for
>> config space (at least from its name). We probably need a new ioctl
>> for both vring and config space.
> Yep adding a new ioctl makes things better, but I think the key is not
> the new ioctl. It's whether or not we should enforce every vDPA vendor
> driver to implement all transitional interfaces to be spec compliant.
> If we allow them to reject the VHOST_SET_VRING_ENDIAN or
> VHOST_SET_CONFIG_ENDIAN call, what could we do? We would still end up
> with the same situation of either failing the guest or trying to
> mediate/emulate, right?
>
> Not to mention VHOST_SET_VRING_ENDIAN is rarely supported by vhost
> today - few distro kernels have CONFIG_VHOST_CROSS_ENDIAN_LEGACY enabled
> and QEMU just ignores the result. It looks like vhost doesn't
> necessarily depend on it to determine endianness.
I would like to suggest adding two new pairs of config ops for vDPA:
get/set_vq_endian() and get/set_config_endian(). They would be used to:
a) support VHOST_GET/SET_VRING_ENDIAN as MST suggested, and add
VHOST_SET/GET_CONFIG_ENDIAN for vhost_vdpa.
If the device does not implement an interface to set its endianness,
then whether SET_ENDIAN succeeds or fails, QEMU still knows the
endianness. In this case, if the device endianness does not match the
guest, we need either a mediation layer or to fail.
b) ops->get_config_endian() can always tell the endianness of the
device config space once the vendor driver has probed the device. So we
can use ops->get_config_endian() when handling MTU, MAC and other
fields in vdpa_dev_net_config_fill(), and we don't need to call
set_features in vdpa_get_config_unlocked(), so there are no race
conditions.
Every time ops->get_config() returns, we can tell the endianness from
ops->get_config_endian(); we don't need set_features(xxx, 0) if feature
negotiation is not done.
The question is: do we need two pairs of ioctls, one for the vqs and
one for config space? Can config space endianness differ from that of
the vqs?
c) do we need a new netlink attr to tell the endianness to user space?
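
A rough userspace sketch of what (b) could look like, only to illustrate
the idea and not a patch. Every name here (enum demo_endian, struct
demo_config_ops, demo_cfg16_to_cpu() and the two stand-in getters) is
hypothetical; only get_config_endian() corresponds to the op proposed
above, and its signature is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum; nothing with this name exists in the kernel today. */
enum demo_endian { DEMO_ENDIAN_LITTLE, DEMO_ENDIAN_BIG };

/* Hypothetical subset of the vdpa config ops, reduced to the one getter. */
struct demo_config_ops {
	enum demo_endian (*get_config_endian)(const void *vdev);
};

/* Two stand-in vendor drivers that report a fixed config space endianness. */
static enum demo_endian demo_le_get_config_endian(const void *vdev)
{
	(void)vdev;
	return DEMO_ENDIAN_LITTLE;
}

static enum demo_endian demo_be_get_config_endian(const void *vdev)
{
	(void)vdev;
	return DEMO_ENDIAN_BIG;
}

/*
 * Decode a 16-bit config field (e.g. MTU) from its raw bytes. Working on
 * bytes keeps the result independent of the host byte order, and no
 * feature bits are consulted.
 */
static uint16_t demo_cfg16_to_cpu(const struct demo_config_ops *ops,
				  const void *vdev, const uint8_t raw[2])
{
	assert(ops && ops->get_config_endian);

	if (ops->get_config_endian(vdev) == DEMO_ENDIAN_BIG)
		return (uint16_t)((raw[0] << 8) | raw[1]);
	return (uint16_t)(raw[0] | (raw[1] << 8));
}
```

With something like this, the bytes 05 dc returned by ops->get_config()
would decode to an MTU of 1500 for a device reporting a BE config space,
without any set_features() call before negotiation.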
Thanks,
Zhu Lingshan
>
>>
>> or
>>
>> 3) revisit the idea of forcing modern only device which may simplify
>> things a lot
> I am not actually against forcing modern only config space, given that
> it's not hard for either QEMU or individual driver to mediate or
> emulate, and for the most part it's not conflict with the goal of
> offload or acceleration with vDPA. But forcing LE ring layout IMO
> would just kill off the potential of a very good use case. Currently
> for our use case the priority for supporting 0.9.5 guest with vDPA is
> slightly lower compared to live migration, but it is still in our TODO
> list.
>
> Thanks,
> -Siwei
>
>>
>> which way should we go?
>>
>>> I hope
>>> it's not the latter, otherwise it loses the point to use vDPA for
>>> datapath acceleration.
>>>
>>> Even if it's the former, it's a little weird for vendor device to
>>> implement a LE config space with BE ring layout, although still
>>> possible...
>> Right.
>>
>> Thanks
>>
>>> -Siwei
>>>> Thanks
>>>>
>>>>
>>>>>> So if you think we should add a vdpa_ops->get_endian(),
>>>>>> I will drop these comments in the next version of
>>>>>> series, and work on a new patch for get_endian().
>>>>>>
>>>>>> Thanks,
>>>>>> Zhu Lingshan
>>>>> Guests don't get endian-ness from devices so this seems pointless.
>>>>>
>