From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 7 Jun 2016 11:20:32 +0800
From: Peter Xu
Message-ID: <20160607032032.GC3800@pxdev.xzpeter.org>
References: <1463847590-22782-1-git-send-email-bd.aviv@gmail.com>
 <1463847590-22782-2-git-send-email-bd.aviv@gmail.com>
 <57408FDB.1010000@web.de>
 <20160602084439.GB3477@pxdev.xzpeter.org>
 <20160602070046.761be49c@ul30vt.home>
 <5750313C.4000709@web.de>
 <20160606050407.GB21254@pxdev.xzpeter.org>
 <20160606071141.31d2008e@ul30vt.home>
 <20160606134317.GJ21254@pxdev.xzpeter.org>
 <20160606110211.2c9bc8ef@ul30vt.home>
In-Reply-To: <20160606110211.2c9bc8ef@ul30vt.home>
Subject: Re: [Qemu-devel] [PATCH v3 1/3] IOMMU: add VTD_CAP_CM to vIOMMU capability exposed to guest
To: Alex Williamson
Cc: Jan Kiszka, "Aviv B.D", qemu-devel@nongnu.org, "Michael S. Tsirkin"

On Mon, Jun 06, 2016 at 11:02:11AM -0600, Alex Williamson wrote:
> On Mon, 6 Jun 2016 21:43:17 +0800
> Peter Xu wrote:
> 
> > On Mon, Jun 06, 2016 at 07:11:41AM -0600, Alex Williamson wrote:
> > > On Mon, 6 Jun 2016 13:04:07 +0800
> > > Peter Xu wrote:
> > [...]
> > > > Besides the reason that there might be guests that do not support
> > > > CM=1, will there be performance considerations?
> > > > When user's
> > > > configuration does not require CM capability (e.g., generic VM
> > > > configuration, without VFIO), shall we allow the user to disable the CM
> > > > bit so that we can have better IOMMU performance (avoiding extra and
> > > > useless invalidations)?
> > 
> > > With Alexey's proposed patch to have callback ops when the iommu
> > > notifier list adds its first entry and removes its last, any of the
> > > additional overhead to generate notifies when nobody is listening can
> > > be avoided. These same callbacks would be the ones that need to
> > > generate a hw_error if a notifier is added while running in CM=0.
> > 
> > Not familiar with Alexey's patch
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2016-06/msg00079.html

Thanks for the pointer. :)

> 
> >, but is that for VFIO only?
> 
> vfio is currently the only user of the iommu notifier, but the
> interface is generic, which is how it should (must) be.

Yes.

> 
> > I mean, if
> > we configure CM=1, the guest kernel will send an invalidation request
> > every time it creates new entries (context entries, or iotlb
> > entries). Even without VFIO notifiers, the guest needs to trap into QEMU
> > and process the invalidation requests. This is avoidable if we are not
> > using VFIO devices at all (so no need to maintain any mappings),
> > right?
> 
> CM=1 only defines that not-present and invalid entries can be cached;
> any changes to existing entries require an invalidation regardless of
> CM. What you're looking for sounds more like ECAP.C:

Yes, but I guess what I was talking about is the CM bit, not ECAP.C.
When we clear/replace one context entry, the guest kernel will
definitely send one context entry invalidation to QEMU:

static void domain_context_clear_one(struct intel_iommu *iommu, u8 bus, u8 devfn)
{
	if (!iommu)
		return;

	clear_context_table(iommu, bus, devfn);
	iommu->flush.flush_context(iommu, 0, 0, 0,
				   DMA_CCMD_GLOBAL_INVL);
	iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
}

... while if we are creating a new one (like attaching a new VFIO
device?), it is an optional behavior depending on whether the CM bit is
set:

static int domain_context_mapping_one(struct dmar_domain *domain,
				      struct intel_iommu *iommu,
				      u8 bus, u8 devfn)
{
	...
	/*
	 * It's a non-present to present mapping. If hardware doesn't cache
	 * non-present entries we only need to flush the write-buffer. If
	 * hardware _does_ cache non-present entries, then it does so in the
	 * special domain #0, which we have to flush:
	 */
	if (cap_caching_mode(iommu->cap)) {
		iommu->flush.flush_context(iommu, 0,
					   (((u16)bus) << 8) | devfn,
					   DMA_CCMD_MASK_NOBIT,
					   DMA_CCMD_DEVICE_INVL);
		iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);
	} else {
		iommu_flush_write_buffer(iommu);
	}
	...
}

Only if cap_caching_mode() is set (it tests bit 7 of the capability
register, the CM bit) will we send these invalidations. What I meant is
that we should allow the user to specify the CM bit, so that when we are
not using VFIO devices, we can skip the above flush_context() and
flush_iotlb() calls, etc.

So, besides the fact that some guests do not support the CM bit (like
Jailhouse), performance might be another reason to let the user specify
the CM bit themselves.

> 
> C: Page-walk Coherency
> This field indicates if hardware access to the root, context,
> extended-context and interrupt-remap tables, and second-level paging
> structures for requests-without-PASID, are coherent (snooped) or not.
> • 0: Indicates hardware accesses to remapping structures are non-coherent.
> • 1: Indicates hardware accesses to remapping structures are coherent.
> 
> Without both CM=0 and C=0, our only virtualization mechanism for
> maintaining a hardware cache coherent with the guest view of the iommu
> would be to shadow all of the VT-d structures. For purely emulated
> devices, maybe we can get away with that, but I doubt the current
> ghashes used for the iotlb are prepared for it.

Actually I hadn't noticed this bit before. I see that it decides whether
the guest kernel needs to issue a specific clflush() when modifying
IOMMU PTEs, but shouldn't we always flush the memory cache, so that we
can be sure the IOMMU sees the same memory data as the CPU does?

Thanks!

-- peterx