From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: [PATCH] iommu/quirk: disable shared EPT for Sandybridge and earlier processors.
Date: Mon, 30 Nov 2015 16:22:29 -0500
Message-ID: <20151130212229.GC14317@char.us.oracle.com>
References: <1448385479-17614-1-git-send-email-anshul.makkar@citrix.com>
 <5654AF4C02000078000B8A11@prv-mh.provo.novell.com>
 <56558D35.2040800@citrix.com>
 <56571B3202000078000B9652@prv-mh.provo.novell.com>
 <56570DB1.7020800@citrix.com>
 <56570F6D.7010907@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <56570F6D.7010907@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper
Cc: kevin.tian@intel.com, Anshul Makkar, xen-devel@lists.xen.org, Jan Beulich, yang.z.zhang@intel.com, Malcolm Crossley
List-Id: xen-devel@lists.xenproject.org

On Thu, Nov 26, 2015 at 01:55:57PM +0000, Andrew Cooper wrote:
> On 26/11/15 13:48, Malcolm Crossley wrote:
> > On 26/11/15 13:46, Jan Beulich wrote:
> >>>>> On 25.11.15 at 11:28, wrote:
> >>> The problem is that SandyBridge IOMMUs advertise 2M support and do
> >>> function with it, but cannot cache 2MB translations in the IOTLBs.
> >>>
> >>> As a result, attempting to use 2M translations causes substantially
> >>> worse performance than 4K translations.
> >> Btw - how does this get explained? At a first glance, even if 2Mb
> >> translations don't get entered into the TLB, it should still be one
> >> less page table level to walk for the IOMMU, and should hence
> >> nevertheless be a benefit. Yet you even say _substantially_
> >> worse performance results.
> > There is a IOTLB for the 4K translation so if you only use 4K
> > translations then you get to take advantage of the IOTLB.
> >
> > If you use the 2Mb translation then a page table walk has to be
> > performed every time there's a DMA access to that region of the BFN
> > address space.
>
> Also remember that a high level dma access (from the point of view of a
> driver) will be fragmented at the PCIe max packet size, which is
> typically 256 bytes.
>
> So by not caching the 2Mb translation, a dma access of 4k may undergo 16
> pagetable walks, one for each PCIe packet.
>
> We observed that using 2Mb mappings results in a 40% overhead, compared
> to using 4k mappings, from the point of view of a sample network workload.

How did you observe this? I am mighty curious what kind of performance
tools you used to find this, as I would love to figure out whether some
of the issues we have seen are related to it.

>
> ~Andrew
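
The arithmetic behind the "16 pagetable walks" figure quoted above can be sketched as follows. This is a rough illustration only, not code from Xen or from the patch: the function name `walks_per_dma` and its parameters are hypothetical, and it assumes an aligned transfer that stays within one page, one page table walk per PCIe packet when the translation is not cached, and a single walk to fill a cold IOTLB when it is.

```python
def walks_per_dma(dma_bytes: int,
                  max_payload_bytes: int = 256,
                  translation_cached: bool = False) -> int:
    """Estimate IOMMU page table walks for one driver-level DMA access.

    Assumptions (not taken from any hardware documentation):
    - the access is split into PCIe packets of at most max_payload_bytes;
    - every packet needs its own walk if the translation is not cached;
    - if it is cached, only the first packet misses the (cold) IOTLB.
    """
    if dma_bytes <= 0 or max_payload_bytes <= 0:
        raise ValueError("sizes must be positive")

    # Number of PCIe packets: ceiling division of the access size
    # by the maximum payload size.
    packets = -(-dma_bytes // max_payload_bytes)

    if translation_cached:
        return 1  # one walk to populate the IOTLB, then hits
    return packets


# A 4k access with 256-byte packets and no cached 2Mb translation:
print(walks_per_dma(4096))                           # 16
# The same access when the translation can be cached (the 4k case):
print(walks_per_dma(4096, translation_cached=True))  # 1
```

Under these assumptions the uncached 2Mb case does 16 walks for each 4k access where the cached 4k case does one, even though each individual 2Mb walk is one level shorter.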