From: Jason Wang <jasowang@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Tiwei Bie <tiwei.bie@intel.com>,
	cunming.liang@intel.com, qemu-devel@nongnu.org,
	peterx@redhat.com, zhihong.wang@intel.com, dan.daly@intel.com
Subject: Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Date: Thu, 12 Apr 2018 15:24:37 +0800	[thread overview]
Message-ID: <c6aec3eb-602e-1802-b7b8-800ebc77cb99@redhat.com> (raw)
In-Reply-To: <20180412063824-mutt-send-email-mst@kernel.org>



On 2018-04-12 11:41, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 11:37:35AM +0800, Jason Wang wrote:
>>
>> On 2018-04-12 09:57, Michael S. Tsirkin wrote:
>>> On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote:
>>>> On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
>>>>> On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
>>>>>> On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
>>>>>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>>>>>>>> feature for vhost-user. By default, vhost-user backend needs
>>>>>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>>>>>>>> With this protocol feature negotiated, QEMU will provide all
>>>>>>>> the IOTLBs to vhost-user backend without waiting for the
>>>>>>>> queries from backend. This is helpful when using a hardware
>>>>>>>> accelerator which is not able to handle unknown IOVAs at the
>>>>>>>> vhost-user backend.
>>>>>>>>
>>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
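
For illustration, here is a minimal sketch of the backend-side IOTLB state that pre-loading would populate. The struct fields, limits, and function names below are illustrative only; the actual wire format of IOTLB update messages is defined by the vhost-user specification, and a real backend would use an interval tree rather than a linear scan:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative shape of one IOVA mapping, roughly mirroring the
 * information carried by a vhost-user IOTLB update message. */
struct iotlb_entry {
    uint64_t iova;   /* guest I/O virtual address */
    uint64_t size;   /* length of the mapping */
    uint64_t uaddr;  /* backend user-space virtual address */
    uint8_t  perm;   /* read/write permission bits */
};

#define IOTLB_MAX 64

struct iotlb {
    struct iotlb_entry entries[IOTLB_MAX];
    size_t n;
};

/* Insert one mapping, as the backend would on receiving an IOTLB
 * update. With F_NEED_ALL_IOTLB negotiated, QEMU pushes every
 * mapping up front, so the table is complete before I/O starts. */
static bool iotlb_insert(struct iotlb *tlb, struct iotlb_entry e)
{
    if (tlb->n == IOTLB_MAX) {
        return false;
    }
    tlb->entries[tlb->n++] = e;
    return true;
}

/* Translate an IOVA; on a hit, fill *uaddr and return true. Under
 * pre-loading a miss is a hard error; under lazy loading it would
 * trigger a blocking query back to QEMU instead. */
static bool iotlb_translate(const struct iotlb *tlb, uint64_t iova,
                            uint64_t *uaddr)
{
    for (size_t i = 0; i < tlb->n; i++) {
        const struct iotlb_entry *e = &tlb->entries[i];
        if (iova >= e->iova && iova < e->iova + e->size) {
            *uaddr = e->uaddr + (iova - e->iova);
            return true;
        }
    }
    return false;
}
```

The point of the feature is that, after negotiation, `iotlb_translate()` never needs a slow path: every lookup either hits or is a guest bug.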
>>>>>>> This is potentially a large amount of data to be sent
>>>>>>> on a socket.
>>>>>> If we take the hardware accelerator out of this picture, we
>>>>>> will find that it's actually a question of "pre-loading" vs
>>>>>> "lazy-loading". I think neither of them is perfect.
>>>>>>
>>>>>> For "pre-loading", as you said, we may have a tough starting.
>>>>>> But for "lazy-loading", we can't have a predictable performance.
>>>>>> A sudden, unexpected performance drop may happen at any time,
>>>>>> because we may meet an unknown IOVA at any time in this case.
>>>>> That's how hardware behaves too though. So we can expect guests
>>>>> to try to optimize locality.
>>>> The difference is that the software implementation needs to
>>>> query the mappings via a socket, and that's much slower.
>>> If you are proposing this new feature as an optimization,
>>> then I'd like to see numbers showing the performance gains.
>>>
>>> It's definitely possible to optimize things out.  Pre-loading isn't
>>> where I would start optimizing though.  For example, DPDK could have its
>>> own VTD emulation, then it could access guest memory directly.
>> Having VT-d emulation in DPDK has many disadvantages:
>>
>> - vendor locked, can only work for intel
> I don't see what would prevent other vendors from doing the same.

Technically it can, but there are two questions here:

- Shouldn't we keep vhost-user vendor/transport independent?
- Do we really prefer the split device model here? It means implementing 
the datapath in at least two places. Personally I prefer to keep all 
virtualization stuff inside QEMU.

>
>> - duplication of codes and bugs
>> - a huge number of new message types needs to be invented
> Oh, just the flush I'd wager.

Not only flush, but also error reporting, context entry programming and 
even PRS in the future. And we would need a feature negotiation mechanism 
between them, like vhost's, to keep compatibility with future features. 
That doesn't sound good.

>
>> So I tend to go the reverse way: link DPDK into QEMU.
> Won't really help as people want to build software using dpdk.

Well, I believe the main use case is vDPA, which is hardware virtio 
offload. Building software using DPDK, like OVS-DPDK, is another 
interesting topic. We can seek a solution other than linking DPDK into 
QEMU, e.g. we can do all the virtio and packet copy work inside a QEMU 
IOThread and use another inter-process channel to communicate with 
OVS-DPDK (or another virtio-user here). The key is to hide all 
virtualization details from OVS-DPDK.

>
>
>>>
>>>>>> Once we meet an unknown IOVA, the backend's data path will need
>>>>>> to stop and query the mapping of the IOVA via the socket and
>>>>>> wait for the reply. And the latency is not negligible (sometimes
>>>>>> it's even unacceptable), especially in high performance network
>>>>>> case. So maybe it's better to make both of them available to
>>>>>> the vhost backend.
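
The stall being described is a synchronous round trip on the slave channel: the datapath writes a miss query and blocks until QEMU replies. A minimal sketch of that shape (message layouts and the function name here are illustrative, not the actual vhost-user message format):

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustrative query/reply bodies for an IOTLB miss. */
struct iotlb_query {
    uint64_t iova;
};

struct iotlb_reply {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
};

/* Backend side of a lazy-loading miss: send the query, then block
 * on the reply. This blocking read is the unpredictable latency
 * that pre-loading (F_NEED_ALL_IOTLB) is meant to avoid. */
static int query_mapping(int fd, uint64_t iova, struct iotlb_reply *out)
{
    struct iotlb_query q = { .iova = iova };

    if (write(fd, &q, sizeof(q)) != (ssize_t)sizeof(q)) {
        return -1;
    }
    if (read(fd, out, sizeof(*out)) != (ssize_t)sizeof(*out)) {
        return -1;
    }
    return 0;
}
```

Even when QEMU answers promptly, the datapath thread is parked for a full socket round trip per cold IOVA, which is what makes the latency hard to bound in a high-rate networking case.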
>>>>>>
>>>>>>> I had an impression that a hardware accelerator was using
>>>>>>> VFIO anyway. Given this, can't we have QEMU program
>>>>>>> the shadow IOMMU tables into VFIO directly?
>>>>>> I think it's a good idea! Currently, my concern about it is
>>>>>> that, the hardware device may not use IOMMU and it may have
>>>>>> its builtin address translation unit. And it will be a pain
>>>>>> for device vendors to teach VFIO to be able to work with the
>>>>>> builtin address translation unit.
>>>>> I think such drivers would have to integrate with VFIO somehow.
>>>>> Otherwise, what is the plan for assigning such devices then?
>>>> Such devices are just for vhost data path acceleration.
>>> That's not true I think.  E.g. RDMA devices have an on-card MMU.
>>>
>>>> They have many available queue pairs, the switch logic
>>>> will be done among those queue pairs. And different queue
>>>> pairs will serve different VMs directly.
>>>>
>>>> Best regards,
>>>> Tiwei Bie
>>> The way I would do it is attach different PASID values to
>>> different queues. This way you can use the standard IOMMU
>>> to enforce protection.
>> So that's just shared virtual memory on the host, which can share an
>> IOVA address space between a specific queue pair and a process. I'm not
>> sure how hard it would be for existing vhost-user backends to support this.
>>
>> Thanks
> That would be VFIO's job, nothing to do with vhost-user besides
> sharing the VFIO descriptor.

At least DPDK needs to offload DMA mapping setup to QEMU.

Thanks


Thread overview: 36+ messages
2018-04-11  7:20 [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature Tiwei Bie
2018-04-11  8:00 ` Peter Xu
2018-04-11  8:25   ` Tiwei Bie
2018-04-11  8:37     ` Peter Xu
2018-04-11  8:55       ` Tiwei Bie
2018-04-11  9:16         ` Peter Xu
2018-04-11  9:25           ` Tiwei Bie
2018-04-11  8:01 ` Jason Wang
2018-04-11  8:38   ` Tiwei Bie
2018-04-11 13:41     ` Jason Wang
2018-04-11 17:00       ` Michael S. Tsirkin
2018-04-12  3:23         ` Jason Wang
2018-04-12  3:37           ` Michael S. Tsirkin
2018-04-12  3:56             ` Jason Wang
2018-04-11 17:37     ` Michael S. Tsirkin
2018-04-12  1:44       ` Tiwei Bie
2018-04-12  7:38         ` Jason Wang
2018-04-12  8:10           ` Tiwei Bie
2018-04-12  9:40             ` Jason Wang
2018-04-11 13:22 ` Michael S. Tsirkin
2018-04-11 13:42   ` Jason Wang
2018-04-12  1:10   ` Tiwei Bie
2018-04-12  1:29     ` Michael S. Tsirkin
2018-04-12  1:39       ` Tiwei Bie
2018-04-12  1:57         ` Michael S. Tsirkin
2018-04-12  2:35           ` Tiwei Bie
2018-04-12  3:20             ` Michael S. Tsirkin
2018-04-12  3:35               ` Michael S. Tsirkin
2018-04-12  3:43                 ` Jason Wang
2018-04-12  4:19                   ` Michael S. Tsirkin
2018-04-12  3:37           ` Jason Wang
2018-04-12  3:41             ` Michael S. Tsirkin
2018-04-12  7:24               ` Jason Wang [this message]
2018-04-16  7:47 ` Stefan Hajnoczi
2018-04-17  2:14   ` Jason Wang
2018-04-17  2:35   ` Tiwei Bie
