Re: [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev

From: "Zhu, Lingshan" <lingshan.zhu@intel.com>
To: Si-Wei Liu <si-wei.liu@oracle.com>, Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	virtualization <virtualization@lists.linux-foundation.org>,
	netdev <netdev@vger.kernel.org>, kvm <kvm@vger.kernel.org>,
	Parav Pandit <parav@nvidia.com>,
	Yongji Xie <xieyongji@bytedance.com>,
	"Dawar, Gautam" <gautam.dawar@amd.com>
Subject: Re: [PATCH 2/2] vDPA: conditionally read fields in virtio-net dev
Date: Mon, 22 Aug 2022 13:07:55 +0800
Message-ID: <e06d1f6d-3199-1b75-d369-2e5d69040271@intel.com>
In-Reply-To: <4678fc51-a402-d3ea-e875-6eba175933ba@oracle.com>



On 8/20/2022 4:55 PM, Si-Wei Liu wrote:
>
>
> On 8/18/2022 5:42 PM, Jason Wang wrote:
>> On Fri, Aug 19, 2022 at 7:20 AM Si-Wei Liu <si-wei.liu@oracle.com> 
>> wrote:
>>>
>>>
>>> On 8/17/2022 9:15 PM, Jason Wang wrote:
>>>> On 2022/8/17 18:37, Michael S. Tsirkin wrote:
>>>>> On Wed, Aug 17, 2022 at 05:43:22PM +0800, Zhu, Lingshan wrote:
>>>>>> On 8/17/2022 5:39 PM, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Aug 17, 2022 at 05:13:59PM +0800, Zhu, Lingshan wrote:
>>>>>>>> On 8/17/2022 4:55 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, Aug 17, 2022 at 10:14:26AM +0800, Zhu, Lingshan wrote:
>>>>>>>>>> Yes it is a little messy, and we can not check _F_VERSION_1
>>>>>>>>>> because of
>>>>>>>>>> transitional devices, so maybe this is the best we can do for 
>>>>>>>>>> now
>>>>>>>>> I think vhost generally needs an API to declare config space
>>>>>>>>> endian-ness
>>>>>>>>> to kernel. vdpa can reuse that too then.
>>>>>>>> Yes, I remember you have mentioned some IOCTL to set the 
>>>>>>>> endian-ness,
>>>>>>>> for vDPA, I think only the vendor driver knows the endian,
>>>>>>>> so we may need a new function vdpa_ops->get_endian().
>>>>>>>> In the last thread, we say maybe it's better to add a comment for
>>>>>>>> now.
>>>>>>>> But if you think we should add a vdpa_ops->get_endian(), I can 
>>>>>>>> work
>>>>>>>> on it for sure!
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Zhu Lingshan
>>>>>>> I think QEMU has to set endian-ness. No one else knows.
>>>>>> Yes, for SW based vhost it is true. But for HW vDPA, only
>>>>>> the device & driver knows the endian, I think we can not
>>>>>> "set" a hardware's endian.
>>>>> QEMU knows the guest endian-ness and it knows that
>>>>> device is accessed through the legacy interface.
>>>>> It can accordingly send endian-ness to the kernel and
>>>>> kernel can propagate it to the driver.
>>>>
>>>> I wonder if we can simply force LE and then Qemu can do the endian
>>>> conversion?
>>> convert from LE for config space fields only, or QEMU has to forcefully
>>> mediate and convert endianness for all device memory access including
>>> even the datapath (fields in descriptor and avail/used rings)?
>> Former. Actually, I want to force modern devices for vDPA when
>> developing the vDPA framework. But then we see requirements for
>> transitional or even legacy (e.g the Ali ENI parent). So it
>> complicates things a lot.
>>
>> I think several ideas have been proposed:
>>
>> 1) Your proposal of having a vDPA specific way for
>> modern/transitional/legacy awareness. This seems very clean since each
>> transport should have the ability to do that but it still requires
>> some kind of mediation for the case e.g running BE legacy guest on LE
>> host.
> In theory it seems like so, though practically I wonder if we can just 
> forbid BE legacy driver from running on modern LE host. For those who 
> care about legacy BE guest, they most likely could and should talk to 
> vendor to get native BE support to achieve hardware acceleration, few 
> of them would count on QEMU in mediating or emulating the datapath 
> (otherwise I don't see the benefit of adopting vDPA?). I still feel 
> that not every hardware vendor has to offer backward compatibility 
> (transitional device) with legacy interface/behavior (BE being just 
> one), this is unlike the situation on software virtio device, which 
> has legacy support since day one. I think we ever discussed it before: 
> for those vDPA vendors who don't offer legacy guest support, maybe we 
> should mandate some feature for e.g. VERSION_1, as these devices 
> really don't offer functionality of the opposite side (!VERSION_1) 
> during negotiation.
>
> Having said that, perhaps we should also allow a vendor device to 
> implement only partial support for legacy. We can define "reversed" 
> backend features to denote some part of the legacy 
> interface/functionality not being implemented by the device. For 
> instance, VHOST_BACKEND_F_NO_BE_VRING, VHOST_BACKEND_F_NO_BE_CONFIG, 
> VHOST_BACKEND_F_NO_ALIGNED_VRING, 
> VHOST_BACKEND_NET_F_NO_WRITEABLE_MAC, et al. Not all of these 
> missing features for legacy would be easy for QEMU to make up for, so 
> QEMU can selectively emulate those at its best when necessary and 
> applicable. In other words, this design shouldn't prevent QEMU from 
> making up for a vendor device's partial legacy support.
>
>>
>> 2) Michael suggests using VHOST_SET_VRING_ENDIAN where it means we
>> need a new config ops for vDPA bus, but it doesn't solve the issue for
>> config space (at least from its name). We probably need a new ioctl
>> for both vring and config space.
> Yep adding a new ioctl makes things better, but I think the key is not 
> the new ioctl. It's whether or not we should enforce every vDPA vendor 
> driver to implement all transitional interfaces to be spec compliant. 
> If we allow them to reject the VHOST_SET_VRING_ENDIAN or 
> VHOST_SET_CONFIG_ENDIAN call, what could we do? We would still end up 
> with the same situation of either failing the guest or trying to 
> mediate/emulate, right?
>
> Not to mention VHOST_SET_VRING_ENDIAN is rarely supported by vhost 
> today - few distro kernels have CONFIG_VHOST_CROSS_ENDIAN_LEGACY enabled 
> and QEMU just ignores the result. It looks like vhost doesn't 
> necessarily depend on it to determine endianness.
I would like to suggest adding two new pairs of config ops for vDPA,
get/set_vq_endian() and get/set_config_endian(). They would be used to:
a) support VHOST_GET/SET_VRING_ENDIAN as MST suggested, and add
VHOST_SET/GET_CONFIG_ENDIAN for vhost_vdpa.
If the device has not implemented an interface to set its endianness,
then whether SET_ENDIAN succeeds or fails, QEMU knows the endian-ness
anyway. In this case, if the device endianness does not match the guest,
we either need a mediation layer or have to fail.
b) ops->get_config_endian() can always tell the endian-ness of the
device config space once the vendor driver has probed the device. So we
can use ops->get_config_endian() when handling MTU, MAC and other fields
in vdpa_dev_net_config_fill(), and we don't need set_features in
vdpa_get_config_unlocked(), so there are no race conditions.
Every time ops->get_config() returns, we can tell the endian-ness by
ops->get_config_endian(); we don't need set_features(xxx, 0) if feature
negotiation is not done.

The question is: Do we need two pairs of ioctls for both vq and config 
space? Can config space endian-ness differ from the vqs?
c) do we need a new netlink attr telling the endian-ness to user space?
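
To make points a) and b) a bit more concrete, here is a rough sketch in
plain userspace C, not kernel code. Every sketch_* and mock_* name is
hypothetical and is not an existing vDPA interface; the only real detail
is that mtu sits at byte offset 10 of struct virtio_net_config. It shows
a 16-bit config field being decoded using a get_config_endian() op, and
the "pass through vs. mediate or fail" decision from a).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not the existing vDPA API. */
enum sketch_endian { SKETCH_ENDIAN_LITTLE, SKETCH_ENDIAN_BIG };
enum sketch_action { SKETCH_PASS_THROUGH, SKETCH_MEDIATE_OR_FAIL };

/* Offset of mtu in struct virtio_net_config: mac[6], status, max_vq_pairs. */
#define SKETCH_MTU_OFFSET 10

struct sketch_vdpa_dev;

struct sketch_config_ops {
	/* Copy raw config space bytes; no byte swapping is done here. */
	void (*get_config)(struct sketch_vdpa_dev *dev, size_t offset,
			   void *buf, size_t len);
	/* Byte order of the config space, known once the driver has probed. */
	enum sketch_endian (*get_config_endian)(struct sketch_vdpa_dev *dev);
};

struct sketch_vdpa_dev {
	const struct sketch_config_ops *ops;
	uint8_t config[16];			/* mock device config space */
	enum sketch_endian config_endian;	/* fixed by the mock "hardware" */
};

/* Decode a 16-bit field from raw bytes, given the device byte order. */
static uint16_t sketch_decode_u16(const uint8_t raw[2], enum sketch_endian e)
{
	if (e == SKETCH_ENDIAN_BIG)
		return (uint16_t)((raw[0] << 8) | raw[1]);
	return (uint16_t)((raw[1] << 8) | raw[0]);
}

/* Point b): read a field without touching feature negotiation at all. */
static uint16_t sketch_config_get_u16(struct sketch_vdpa_dev *dev, size_t offset)
{
	uint8_t raw[2];

	assert(dev && dev->ops);
	dev->ops->get_config(dev, offset, raw, sizeof(raw));
	return sketch_decode_u16(raw, dev->ops->get_config_endian(dev));
}

/* Point a): if the device cannot change its endianness, compare and decide. */
static enum sketch_action sketch_check_endian(enum sketch_endian device,
					      enum sketch_endian guest)
{
	return device == guest ? SKETCH_PASS_THROUGH : SKETCH_MEDIATE_OR_FAIL;
}

/* Mock vendor driver callbacks, standing in for a real parent driver. */
static void mock_get_config(struct sketch_vdpa_dev *dev, size_t offset,
			    void *buf, size_t len)
{
	assert(offset + len <= sizeof(dev->config));
	memcpy(buf, dev->config + offset, len);
}

static enum sketch_endian mock_get_config_endian(struct sketch_vdpa_dev *dev)
{
	return dev->config_endian;
}

static const struct sketch_config_ops mock_ops = {
	.get_config = mock_get_config,
	.get_config_endian = mock_get_config_endian,
};
```

With a mock device whose config bytes at offset 10 are 0x05 0xdc and
whose config_endian is SKETCH_ENDIAN_BIG, sketch_config_get_u16(&dev,
SKETCH_MTU_OFFSET) decodes an MTU of 1500. The same helper works for a
little-endian device with the bytes swapped, which is the reason for
asking the driver rather than guessing from _F_VERSION_1.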

Thanks,
Zhu Lingshan
>
>>
>> or
>>
>> 3) revisit the idea of forcing modern only device which may simplify
>> things a lot
> I am not actually against forcing modern only config space, given that 
> it's not hard for either QEMU or individual driver to mediate or 
> emulate, and for the most part it's not conflict with the goal of 
> offload or acceleration with vDPA. But forcing LE ring layout IMO 
> would just kill off the potential of a very good use case. Currently 
> for our use case the priority for supporting 0.9.5 guest with vDPA is 
> slightly lower compared to live migration, but it is still in our TODO 
> list.
>
> Thanks,
> -Siwei
>
>>
>> which way should we go?
>>
>>> I hope
>>> it's not the latter, otherwise it loses the point to use vDPA for
>>> datapath acceleration.
>>>
>>> Even if it's the former, it's a little weird for vendor device to
>>> implement a LE config space with BE ring layout, although still 
>>> possible...
>> Right.
>>
>> Thanks
>>
>>> -Siwei
>>>> Thanks
>>>>
>>>>
>>>>>> So if you think we should add a vdpa_ops->get_endian(),
>>>>>> I will drop these comments in the next version of
>>>>>> series, and work on a new patch for get_endian().
>>>>>>
>>>>>> Thanks,
>>>>>> Zhu Lingshan
>>>>> Guests don't get endian-ness from devices so this seems pointless.
>>>>>
>
