Re: [PATCH] iommu/quirk: disable shared EPT for Sandybridge and earlier processors.

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: kevin.tian@intel.com, Anshul Makkar <anshul.makkar@citrix.com>,
	xen-devel@lists.xen.org, Jan Beulich <JBeulich@suse.com>,
	yang.z.zhang@intel.com,
	Malcolm Crossley <malcolm.crossley@citrix.com>
Subject: Re: [PATCH] iommu/quirk: disable shared EPT for Sandybridge and earlier processors.
Date: Tue, 1 Dec 2015 16:19:24 +0000
Message-ID: <565DC88C.9010709@citrix.com>
In-Reply-To: <20151201152457.GC19885@char.us.oracle.com>

On 01/12/15 15:24, Konrad Rzeszutek Wilk wrote:
> On Tue, Dec 01, 2015 at 10:34:17AM +0000, Andrew Cooper wrote:
>> On 30/11/15 21:22, Konrad Rzeszutek Wilk wrote:
>>> On Thu, Nov 26, 2015 at 01:55:57PM +0000, Andrew Cooper wrote:
>>>> On 26/11/15 13:48, Malcolm Crossley wrote:
>>>>> On 26/11/15 13:46, Jan Beulich wrote:
>>>>>>>>> On 25.11.15 at 11:28, <andrew.cooper3@citrix.com> wrote:
>>>>>>> The problem is that SandyBridge IOMMUs advertise 2M support and do
>>>>>>> function with it, but cannot cache 2MB translations in the IOTLBs.
>>>>>>>
>>>>>>> As a result, attempting to use 2M translations causes substantially
>>>>>>> worse performance than 4K translations.
>>>>>> Btw - how does this get explained? At a first glance, even if 2Mb
>>>>>> translations don't get entered into the TLB, it should still be one
>>>>>> less page table level to walk for the IOMMU, and should hence
>>>>>> nevertheless be a benefit. Yet you even say _substantially_
>>>>>> worse performance results.
>>>>> There is an IOTLB for 4K translations, so if you only use 4K
>>>>> translations then you get to take advantage of the IOTLB.
>>>>>
>>>>> If you use the 2Mb translation then a page table walk has to be
>>>>> performed every time there's a DMA access to that region of the BFN
>>>>> address space.
>>>> Also remember that a high-level DMA access (from the point of view of a
>>>> driver) will be fragmented at the PCIe max packet size, which is
>>>> typically 256 bytes.
>>>>
>>>> So by not caching the 2M translation, a 4K DMA access may undergo 16
>>>> pagetable walks (4096 / 256), one for each PCIe packet.
>>>>
>>>> We observed that using 2Mb mappings results in a 40% overhead, compared
>>>> to using 4k mappings, from the point of view of a sample network workload.
>>> How did you observe this? I am mighty curious what kind of performance
>>> tools you used to find this, as I would love to figure out whether some
>>> of the issues we have seen are related to it.
>> The 40% difference is just in terms of network throughput of a VF, given
>> a workload which can normally saturate line rate on the card.
> I understand that.
>
> But I am curious about how you found out that the page walks by the
> IOMMU were so excessive?

I didn't.  It is all speculation drawn from other information.

The manual states that there is not a superpage IOTLB.

This leaves two options:
1) 2M mappings are entirely uncached
2) 2M mappings are shattered to 4K mappings and cached

The fact that there is a 40% performance reduction suggests option 1
rather than option 2.
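
To make the arithmetic behind that inference concrete, here is a minimal
sketch of the walk count under each assumption. It is only a toy model, not
a measurement: the 256-byte packet size and the 4K transfer come from
earlier in the thread, while the function name, its parameters, and the
assumptions of a cold IOTLB, a page-aligned buffer, and no evictions are
hypothetical.

```python
def translation_walks(dma_bytes, max_payload=256, page_size=4096, cached=True):
    """Estimate IOMMU page-table walks for one contiguous, page-aligned DMA.

    Toy model only (assumption, not taken from the VT-d manual): each PCIe
    packet needs one translation; with a working IOTLB only the first packet
    touching each 4K page walks the tables, otherwise every packet does.
    """
    packets = -(-dma_bytes // max_payload)  # ceiling division
    if not cached:
        # Option 1: translations are never cached, so every packet walks.
        return packets
    # Option 2, or plain 4K mappings: one walk per distinct 4K page.
    return -(-dma_bytes // page_size)


uncached = translation_walks(4096, cached=False)
cached_4k = translation_walks(4096, cached=True)
print(uncached, cached_4k)
```

Under these assumptions a 4K transfer costs 16 walks if nothing is cached
and 1 walk if 4K entries are cached, which is the kind of gap that would be
consistent with a large throughput drop.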

~Andrew
