From: Jason Wang
Date: Thu, 12 Apr 2018 15:24:37 +0800
Subject: Re: [Qemu-devel] [RFC] vhost-user: introduce F_NEED_ALL_IOTLB protocol feature
To: "Michael S. Tsirkin"
Cc: Tiwei Bie, cunming.liang@intel.com, qemu-devel@nongnu.org, peterx@redhat.com, zhihong.wang@intel.com, dan.daly@intel.com
In-Reply-To: <20180412063824-mutt-send-email-mst@kernel.org>
References: <20180411072027.5656-1-tiwei.bie@intel.com> <20180411161926-mutt-send-email-mst@kernel.org> <20180412011059.yywn73znjdip2cyv@debian> <20180412042724-mutt-send-email-mst@kernel.org> <20180412013942.egucc4isxkokta7z@debian> <20180412044404-mutt-send-email-mst@kernel.org> <1726abc8-92ff-420a-adbd-c08e9fa251d2@redhat.com> <20180412063824-mutt-send-email-mst@kernel.org>

On 2018/04/12 11:41, Michael S. Tsirkin wrote:
> On Thu, Apr 12, 2018 at 11:37:35AM +0800, Jason Wang wrote:
>>
>> On 2018/04/12 09:57, Michael S. Tsirkin wrote:
>>> On Thu, Apr 12, 2018 at 09:39:43AM +0800, Tiwei Bie wrote:
>>>> On Thu, Apr 12, 2018 at 04:29:29AM +0300, Michael S. Tsirkin wrote:
>>>>> On Thu, Apr 12, 2018 at 09:10:59AM +0800, Tiwei Bie wrote:
>>>>>> On Wed, Apr 11, 2018 at 04:22:21PM +0300, Michael S. Tsirkin wrote:
>>>>>>> On Wed, Apr 11, 2018 at 03:20:27PM +0800, Tiwei Bie wrote:
>>>>>>>> This patch introduces the VHOST_USER_PROTOCOL_F_NEED_ALL_IOTLB
>>>>>>>> feature for vhost-user. By default, the vhost-user backend needs
>>>>>>>> to query the IOTLBs from QEMU when it meets unknown IOVAs.
>>>>>>>> With this protocol feature negotiated, QEMU will provide all
>>>>>>>> the IOTLBs to the vhost-user backend without waiting for
>>>>>>>> queries from the backend. This is helpful when using a hardware
>>>>>>>> accelerator which is not able to handle unknown IOVAs at the
>>>>>>>> vhost-user backend.
>>>>>>>>
>>>>>>>> Signed-off-by: Tiwei Bie
>>>>>>> This is potentially a large amount of data to be sent
>>>>>>> on a socket.
>>>>>> If we take the hardware accelerator out of this picture, we
>>>>>> will find that it's actually a question of "pre-loading" vs.
>>>>>> "lazy-loading". I think neither of them is perfect.
>>>>>>
>>>>>> With "pre-loading", as you said, we may have a tough start.
>>>>>> But with "lazy-loading", we can't have predictable performance.
>>>>>> A sudden, unexpected performance drop may happen at any time,
>>>>>> because we may meet an unknown IOVA at any time in this case.
>>>>> That's how hardware behaves too, though. So we can expect guests
>>>>> to try to optimize locality.
>>>> The difference is that the software implementation needs to
>>>> query the mappings via a socket, and that is much slower.
>>> If you are proposing this new feature as an optimization,
>>> then I'd like to see numbers showing the performance gains.
>>>
>>> It's definitely possible to optimize things out. Pre-loading isn't
>>> where I would start optimizing though. For example, DPDK could have
>>> its own VT-d emulation, then it could access guest memory directly.
>> Having VT-d emulation in DPDK has many disadvantages:
>>
>> - It is vendor-locked and can only work for Intel.
> I don't see what would prevent other vendors from doing the same.

Technically they can. Two questions here:

- Shouldn't we keep vhost-user vendor/transport independent?
- Do we really prefer the split device model here? It means implementing
the datapath in at least two places. Personally I prefer to keep all the
virtualization stuff inside QEMU.

>
>> - duplication of code and bugs
>> - a huge number of new message types need to be invented
> Oh, just the flush I'd wager.

Not only the flush, but also error reporting, context entry programming
and even PRS in the future. And we would need feature negotiation between
them, as vhost has, to keep compatibility with future features. This does
not sound good.

>
>> So I tend to go the reverse way: link DPDK to QEMU.
> Won't really help as people want to build software using dpdk.

Well, I believe the main use case is vDPA, which is hardware virtio
offload. Building software using DPDK, like OVS-DPDK, is another
interesting topic. We can seek a solution other than linking DPDK to
QEMU, e.g. we can do all the virtio and packet copy work inside a QEMU
IOThread and use another inter-process channel to communicate with
OVS-DPDK (or another virtio-user here). The key is to hide all the
virtualization details from OVS-DPDK.

>
>
>>>
>>>>>> Once we meet an unknown IOVA, the backend's data path will need
>>>>>> to stop and query the mapping of the IOVA via the socket and
>>>>>> wait for the reply. And the latency is not negligible (sometimes
>>>>>> it's even unacceptable), especially in the high performance
>>>>>> networking case. So maybe it's better to make both of them
>>>>>> available to the vhost backend.
>>>>>>
>>>>>>> I had an impression that a hardware accelerator was using
>>>>>>> VFIO anyway. Given this, can't we have QEMU program
>>>>>>> the shadow IOMMU tables into VFIO directly?
>>>>>> I think it's a good idea! Currently, my concern about it is
>>>>>> that the hardware device may not use the IOMMU and may have
>>>>>> its own builtin address translation unit. And it would be a
>>>>>> pain for device vendors to teach VFIO to work with the builtin
>>>>>> address translation unit.
>>>>> I think such drivers would have to integrate with VFIO somehow.
>>>>> Otherwise, what is the plan for assigning such devices then?
>>>> Such devices are just for vhost data path acceleration.
>>> That's not true, I think. E.g. RDMA devices have an on-card MMU.
>>>
>>>> They have many available queue pairs; the switching logic
>>>> will be done among those queue pairs. And different queue
>>>> pairs will serve different VMs directly.
>>>>
>>>> Best regards,
>>>> Tiwei Bie
>>> The way I would do it is attach different PASID values to
>>> different queues. This way you can use the standard IOMMU
>>> to enforce protection.
>> So that's just shared virtual memory on the host, which can share an
>> IOVA address space between a specific queue pair and a process. I'm
>> not sure how hard it would be for an existing vhost-user backend to
>> support this.
>>
>> Thanks
> That would be VFIO's job, nothing to do with vhost-user besides
> sharing the VFIO descriptor.

At least DPDK needs to offload the DMA mapping setup to QEMU.

Thanks
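
To make the "pre-loading" vs. "lazy-loading" trade-off above concrete, here
is a minimal backend-side sketch. It assumes a simplified iotlb_entry cache
layout and a stubbed-out send_iotlb_miss_and_wait() helper rather than the
real vhost-user message definitions; the only point it shows is where the
datapath stalls on a socket round trip when it hits an unknown IOVA.

/*
 * Illustrative sketch only (not code from the patch): a simplified
 * backend-side IOTLB cache showing where "lazy-loading" stalls.
 */
#include <stddef.h>
#include <stdint.h>

struct iotlb_entry {
    uint64_t iova;   /* I/O virtual address used by the guest/device */
    uint64_t size;   /* length of the mapping */
    uint64_t uaddr;  /* backend (host) virtual address it maps to */
    uint8_t  perm;   /* access permissions */
};

#define IOTLB_CACHE_ENTRIES 256

static struct iotlb_entry iotlb_cache[IOTLB_CACHE_ENTRIES];
static size_t iotlb_cache_used;

/* Called whenever QEMU pushes an IOTLB update. With the proposed
 * F_NEED_ALL_IOTLB feature all entries arrive up front ("pre-loading");
 * without it an entry only arrives as the reply to a miss. */
void iotlb_cache_insert(const struct iotlb_entry *e)
{
    if (iotlb_cache_used < IOTLB_CACHE_ENTRIES) {
        iotlb_cache[iotlb_cache_used++] = *e;
    }
}

static struct iotlb_entry *iotlb_lookup(uint64_t iova)
{
    for (size_t i = 0; i < iotlb_cache_used; i++) {
        struct iotlb_entry *e = &iotlb_cache[i];
        if (iova >= e->iova && iova - e->iova < e->size) {
            return e;
        }
    }
    return NULL;
}

/* Hypothetical stand-in for the miss path: a real backend would send an
 * IOTLB miss message to QEMU over the socket here and block until the
 * matching update comes back. */
static int send_iotlb_miss_and_wait(uint64_t iova, uint8_t perm,
                                    struct iotlb_entry *out)
{
    (void)iova; (void)perm; (void)out;
    return -1;
}

/* Translate a guest IOVA into a backend virtual address. */
void *iotlb_translate(uint64_t iova, uint8_t perm)
{
    struct iotlb_entry *e = iotlb_lookup(iova);

    if (!e) {
        /* Lazy-loading path: the datapath stops here for a socket round
         * trip, i.e. the unpredictable latency discussed above. With
         * pre-loading this branch should never be taken. */
        struct iotlb_entry filled;
        if (send_iotlb_miss_and_wait(iova, perm, &filled) < 0) {
            return NULL;
        }
        iotlb_cache_insert(&filled);
        e = iotlb_lookup(iova);
        if (!e) {
            return NULL;
        }
    }
    return (void *)(uintptr_t)(e->uaddr + (iova - e->iova));
}

And for the "program the shadow IOMMU tables into VFIO directly" direction,
the DMA mapping setup being offloaded to QEMU is, for a VFIO-backed
accelerator, roughly the standard VFIO type1 ioctl below. The
shadow_map_one() helper name is made up for illustration, and container_fd,
iova, hva and size are placeholders for values derived from the vIOMMU
mappings.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Install one guest IOVA -> host virtual address mapping into the VFIO
 * container, so the host IOMMU translates the device's DMA directly and
 * the backend never has to resolve that IOVA itself. */
int shadow_map_one(int container_fd, uint64_t iova, void *hva, uint64_t size)
{
    struct vfio_iommu_type1_dma_map map;

    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = (uint64_t)(uintptr_t)hva; /* host address backing the range */
    map.iova  = iova;                     /* address the device uses for DMA */
    map.size  = size;

    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}

This only covers the VFIO case; a device with its own builtin translation
unit, as Tiwei notes above, would still need something else.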