From: Jason Wang <jasowang@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Tiwei Bie <tiwei.bie@intel.com>,
cunming.liang@intel.com, qemu-devel@nongnu.org,
peterx@redhat.com, zhihong.wang@intel.com, dan.daly@intel.com
Subject: Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
Date: Thu, 12 Apr 2018 15:24:37 +0800 [thread overview]
Message-ID: <c6aec3eb-602e-1802-b7b8-800ebc77cb99@redhat.com> (raw)
In-Reply-To: <20180412063824-mutt-send-email-mst@kernel.org>
On 2018-04-12 11:41, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 11:37:35AM +0800, Jason Wang wrote:
>>
>> On 2018-04-12 09:57, Michael S. Tsirkin wrote:
>>> On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote:
>>>> On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
>>>>> On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
>>>>>> On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
>>>>>>>> This patch introduces VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>>>>>>>> feature for vhost-user. By default, vhost-user backend needs
>>>>>>>> to query the IOTLBs from QEMU after meeting unknown IOVAs.
>>>>>>>> With this protocol feature negotiated, QEMU will provide all
>>>>>>>> the IOTLBs to vhost-user backend without waiting for the
>>>>>>>> queries from backend. This is helpful when using a hardware
>>>>>>>> accelerator which is not able to handle unknown IOVAs at the
>>>>>>>> vhost-user backend.
>>>>>>>>
>>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>>>> This is potentially a large amount of data to be sent
>>>>>>> on a socket.
>>>>>> If we take the hardware accelerator out of this picture, we
>>>>>> will find that it's actually a question of "pre-loading" vs
>>>>>> "lazy-loading". I think neither of them is perfect.
>>>>>>
>>>>>> For "pre-loading", as you said, we may have a tough starting.
>>>>>> But for "lazy-loading", we can't have a predictable performance.
>>>>>> A sudden, unexpected performance drop may happen at any time,
>>>>>> because we may meet an unknown IOVA at any time in this case.
>>>>> That's how hardware behaves too though. So we can expect guests
>>>>> to try to optimize locality.
>>>> The difference is that, the software implementation needs to
>>>> query the mappings via socket. And it's much slower..
>>> If you are proposing this new feature as an optimization,
>>> then I'd like to see numbers showing the performance gains.
>>>
>>> It's definitely possible to optimize things out. Pre-loading isn't
>>> where I would start optimizing though. For example, DPDK could have its
>>> own VTD emulation, then it could access guest memory directly.
>> Having VT-d emulation in DPDK has many disadvantages:
>>
>> - vendor locked, it can only work for Intel
> I don't see what would prevent other vendors from doing the same.
Technically it can; two questions here:
- Shouldn't we keep vhost-user vendor/transport independent?
- Do we really prefer the split device model here? It means implementing
the datapath in at least two places. Personally I prefer to keep all virt
stuff inside QEMU.
>
>> - duplication of codes and bugs
>> - a huge number of new message types needs to be invented
> Oh, just the flush I'd wager.
Not only flush, but also error reporting, context entry programming and
even PRS in the future. And we would need a vhost-like feature negotiation
between them to keep compatibility for future features. This does not
sound good.
>
>> So I tend to go to a reverse way, link dpdk to qemu.
> Won't really help as people want to build software using dpdk.
Well, I believe the main use case is vDPA, which is hardware virtio
offload. Building software on DPDK, like OVS-DPDK, is another
interesting topic. We can seek a solution other than linking DPDK to
QEMU, e.g. we can do all the virtio and packet copy work inside a QEMU
IOThread and use another inter-process channel to communicate with
OVS-DPDK (or another virtio-user here). The key is to hide all
virtualization details from OVS-DPDK.
>
>
>>>
>>>>>> Once we meet an unknown IOVA, the backend's data path will need
>>>>>> to stop and query the mapping of the IOVA via the socket and
>>>>>> wait for the reply. And the latency is not negligible (sometimes
>>>>>> it's even unacceptable), especially in high performance network
>>>>>> case. So maybe it's better to make both of them available to
>>>>>> the vhost backend.
>>>>>>
>>>>>>> I had an impression that a hardware accelerator was using
>>>>>>> VFIO anyway. Given this, can't we have QEMU program
>>>>>>> the shadow IOMMU tables into VFIO directly?
>>>>>> I think it's a good idea! Currently, my concern about it is
>>>>>> that, the hardware device may not use IOMMU and it may have
>>>>>> its builtin address translation unit. And it will be a pain
>>>>>> for device vendors to teach VFIO to be able to work with the
>>>>>> builtin address translation unit.
>>>>> I think such drivers would have to integrate with VFIO somehow.
>>>>> Otherwise, what is the plan for assigning such devices then?
>>>> Such devices are just for vhost data path acceleration.
>>> That's not true I think. E.g. RDMA devices have an on-card MMU.
>>>
>>>> They have many available queue pairs, the switch logic
>>>> will be done among those queue pairs. And different queue
>>>> pairs will serve different VMs directly.
>>>>
>>>> Best regards,
>>>> Tiwei Bie
>>> The way I would do it is attach different PASID values to
>>> different queues. This way you can use the standard IOMMU
>>> to enforce protection.
>> So that's just shared virtual memory on the host, which can share an
>> IOVA address space between a specific queue pair and a process. I'm not
>> sure how hard it would be for existing vhost-user backends to support this.
>>
>> Thanks
> That would be VFIO's job, nothing to do with vhost-user besides
> sharing the VFIO descriptor.
At least DPDK needs to offload DMA mapping setup to QEMU.
Thanks