From: Mike Travis <travis@sgi.com>
To: Chris Wright <chrisw@sous-sol.org>
Cc: David Woodhouse <dwmw2@infradead.org>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	linux-pci@vger.kernel.org, iommu@lists.linux-foundation.org,
	Mike Habeck <habeck@sgi.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/4] Intel pci: Remove Host Bridge devices from identity mapping
Date: Wed, 30 Mar 2011 12:25:43 -0700
Message-ID: <4D9383B7.40807@sgi.com>
In-Reply-To: <20110330191511.GS18712@sequoia.sous-sol.org>



Chris Wright wrote:
> * Mike Travis (travis@sgi.com) wrote:
>> Chris Wright wrote:
>>> * Mike Travis (travis@sgi.com) wrote:
>>>>    When the IOMMU is being used, each request for a DMA mapping requires
>>>>    the intel_iommu code to look for some space in the DMA mapping table.
>>>>    For most drivers this occurs for each transfer.
>>>>
>>>>    When there are many outstanding DMA mappings [as seems to be the case
>>>>    with the 10GigE driver], the table grows large and the search for
>>>>    space becomes increasingly time consuming.  Performance for the
>>>>    10GigE driver drops to about 10% of its capacity on a UV system
>>>>    when the CPU count is large.
>>> That's pretty poor.  I've seen large overheads, but when that big it was
>>> also related to issues in the 10G driver.  Do you have profile data
>>> showing this as the hotspot?
>> Here's one from our internal bug report:
>>
>> Here is a profile from a run with iommu=on  iommu=pt  (no forcedac)
> 
> OK, I was actually interested in the !pt case.  But this is useful
> still.  The iova lookup being distinct from the identity_mapping() case.
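To illustrate the quoted point that the search for free DMA mapping space slows down as outstanding mappings accumulate, here is a deliberately naive sketch. It is not the intel-iommu IOVA allocator: the sorted array of ranges, struct range and first_fit() are hypothetical, chosen only to show that a first-fit scan has to examine more entries as the table fills up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model: allocated DMA ranges (in pages) are kept
 * sorted by start address, do not overlap, and all lie below 'limit'.
 * This is NOT the intel-iommu IOVA allocator; it only shows why a linear
 * search for free space gets slower as outstanding mappings accumulate. */
struct range {
	unsigned long start;	/* first page of the mapping */
	unsigned long size;	/* length in pages */
};

/*
 * First-fit search from the bottom of a space of 'limit' pages.
 * Returns the first page of a gap of at least 'npages' pages, or 'limit'
 * if no gap is large enough. '*steps' reports how many ranges were examined.
 */
unsigned long first_fit(const struct range *used, size_t nused,
			unsigned long npages, unsigned long limit,
			size_t *steps)
{
	unsigned long cursor = 0;	/* end of the previous allocated range */
	size_t i;

	assert(npages > 0);
	*steps = 0;
	for (i = 0; i < nused; i++) {
		(*steps)++;
		if (used[i].start - cursor >= npages)
			return cursor;	/* the gap before this range fits */
		cursor = used[i].start + used[i].size;
	}
	/* Space left above the last allocated range, if any. */
	return (limit - cursor >= npages) ? cursor : limit;
}
```

With 1,000 back-to-back one-page mappings starting at page 0 (no gaps), a one-page request has to examine all 1,000 ranges before it returns page 1000; every additional outstanding mapping adds another step.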

I can get that as well, but having every device use DMA maps caused its
own set of problems (hundreds of DMA maps).  Here's a list of the devices
on the system under test.  You can see that even 'minor' glitches can
get magnified when there are so many...

Blade Location    NASID  PCI Address X Display   Device
----------------------------------------------------------------------
    0 r001i01b00      0  0000:01:00.0      -   Intel 82576 Gigabit Network Connection
    .          .      .  0000:01:00.1      -   Intel 82576 Gigabit Network Connection
    .          .      .  0000:04:00.0      -   LSI SAS1064ET Fusion-MPT SAS
    .          .      .  0000:05:00.0      -   Matrox MGA G200e
    2 r001i01b02      4  0001:02:00.0      -   Mellanox MT26428 InfiniBand
    3 r001i01b03      6  0002:02:00.0      -   Mellanox MT26428 InfiniBand
    4 r001i01b04      8  0003:02:00.0      -   Mellanox MT26428 InfiniBand
   11 r001i01b11     22  0007:02:00.0      -   Mellanox MT26428 InfiniBand
   13 r001i01b13     26  0008:02:00.0      -   Mellanox MT26428 InfiniBand
   15 r001i01b15     30  0009:07:00.0   :0.0   nVidia GF100 [Tesla S2050]
    .          .      .  0009:08:00.0   :1.1   nVidia GF100 [Tesla S2050]
   18 r001i23b02     36  000b:02:00.0      -   Mellanox MT26428 InfiniBand
   20 r001i23b04     40  000c:01:00.0      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  000c:01:00.1      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  000c:04:00.0      -   Mellanox MT26428 InfiniBand
   23 r001i23b07     46  000d:07:00.0      -   nVidia GF100 [Tesla S2050]
    .          .      .  000d:08:00.0      -   nVidia GF100 [Tesla S2050]
   25 r001i23b09     50  000e:01:00.0      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  000e:01:00.1      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  000e:04:00.0      -   Mellanox MT26428 InfiniBand
   26 r001i23b10     52  000f:02:00.0      -   Mellanox MT26428 InfiniBand
   27 r001i23b11     54  0010:02:00.0      -   Mellanox MT26428 InfiniBand
   29 r001i23b13     58  0011:02:00.0      -   Mellanox MT26428 InfiniBand
   31 r001i23b15     62  0012:02:00.0      -   Mellanox MT26428 InfiniBand
   34 r002i01b02     68  0013:01:00.0      -   Mellanox MT26428 InfiniBand
   35 r002i01b03     70  0014:02:00.0      -   Mellanox MT26428 InfiniBand
   36 r002i01b04     72  0015:01:00.0      -   Mellanox MT26428 InfiniBand
   41 r002i01b09     82  0018:07:00.0      -   nVidia GF100 [Tesla S2050]
    .          .      .  0018:08:00.0      -   nVidia GF100 [Tesla S2050]
   43 r002i01b11     86  0019:01:00.0      -   Mellanox MT26428 InfiniBand
   45 r002i01b13     90  001a:01:00.0      -   Mellanox MT26428 InfiniBand
   48 r002i23b00     96  001c:07:00.0      -   nVidia GF100 [Tesla S2050]
    .          .      .  001c:08:00.0      -   nVidia GF100 [Tesla S2050]
   50 r002i23b02    100  001d:02:00.0      -   Mellanox MT26428 InfiniBand
   52 r002i23b04    104  001e:01:00.0      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  001e:01:00.1      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  001e:04:00.0      -   Mellanox MT26428 InfiniBand
   57 r002i23b09    114  0020:01:00.0      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  0020:01:00.1      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  0020:04:00.0      -   Mellanox MT26428 InfiniBand
   58 r002i23b10    116  0021:02:00.0      -   Mellanox MT26428 InfiniBand
   59 r002i23b11    118  0022:02:00.0      -   Mellanox MT26428 InfiniBand
   61 r002i23b13    122  0023:02:00.0      -   Mellanox MT26428 InfiniBand
   63 r002i23b15    126  0024:02:00.0      -   Mellanox MT26428 InfiniBand

> 
>> uv48-sys was receiving and uv-debug sending.
>> ksoftirqd/640 was running at approx. 100% cpu utilization.
>> I had pinned the nttcp process on uv48-sys to cpu 64.
>>
>> # Samples: 1255641
>> #
>> # Overhead        Command  Shared Object  Symbol
>> # ........  .............  .............  ......
>> #
>>    50.27%  ksoftirqd/640  [kernel]       [k] _spin_lock
>>    27.43%  ksoftirqd/640  [kernel]       [k] iommu_no_mapping
> 
>> ...
>>      0.48%  ksoftirqd/640  [kernel]       [k] iommu_should_identity_map
>>      0.45%  ksoftirqd/640  [kernel]       [k] ixgbe_alloc_rx_buffers    [ixgbe]
> 
> Note, ixgbe has had rx dma mapping issues (that's why I wondered what
> was causing the massive slowdown under !pt mode).

I think that since this profile run, the network guys have updated the
ixgbe driver to a later version.  (I don't know the outcome of that test.)

> 
> <snip>
>> I tracked this time down to identity_mapping() in this loop:
>>
>>       list_for_each_entry(info, &si_domain->devices, link)
>>               if (info->dev == pdev)
>>                       return 1;
>>
>> I didn't get the exact count, but there were approx. 11,000 PCI devices
>> on this system.  And this function was called for every page request
>> in each DMA request.
> 
> Right, so this is the list traversal (and wow, a lot of PCI devices).

Most of those PCI devices are the 45 on each of the 256 Nehalem sockets,
and there are a lot of bridges as well.
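As a back-of-the-envelope check: 45 devices on each of 256 sockets is 45 x 256 = 11,520, in line with the approx. 11,000 figure quoted above. The sketch below estimates the worst-case number of comparisons the quoted list walk makes per DMA request. It assumes (hypothetically) that the walk runs once per page, visits every entry on the list, and that the list holds roughly all of those devices; the helper name, the 64 KiB request and the 4 KiB page size are illustrative choices, not measurements from this system.

```c
#include <assert.h>

/*
 * Worst-case comparisons for a linear identity_mapping()-style list walk.
 * Hypothetical model: one walk per page, every list entry visited, and
 * 'ndevs' entries on the list.
 */
unsigned long worst_case_compares(unsigned long ndevs, unsigned long bytes,
				  unsigned long page_size)
{
	unsigned long pages;

	assert(page_size > 0);
	pages = (bytes + page_size - 1) / page_size;	/* round up to whole pages */
	return pages * ndevs;
}
```

For example, worst_case_compares(45 * 256, 65536, 4096) is 16 pages x 11,520 entries = 184,320 comparisons for a single 64 KiB request, which is why the per-page cost matters so much at this scale.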

> Did you try a smarter data structure? (While there's room for another
> bit in pci_dev, the bit is more about iommu implementation details than
> anything at the pci level).
> 
> Or the domain_dev_info is cached in the archdata of device struct.
> You should be able to just reference that directly.
> 
> Didn't think it through completely, but perhaps something as simple as:
> 
> 	return pdev->dev.archdata.iommu == si_domain;

I can try this, thanks!
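A minimal user-space sketch of the two lookups being discussed: the O(n) walk over si_domain's device list from the quoted loop, and an O(1) check through a pointer cached in the device. The structures are hypothetical, heavily trimmed stand-ins for the kernel ones (the real code uses list_head and dev.archdata.iommu). Since Chris describes the archdata pointer as caching the device's domain info rather than the domain itself, this sketch assumes it points at a device_domain_info and compares that entry's domain field; his one-liner was offered as an untested idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-ins for the kernel structures. */
struct dmar_domain;

struct device {
	void *iommu;	/* stands in for dev.archdata.iommu */
};

struct pci_dev {
	struct device dev;
};

/* One entry per device attached to a domain (singly linked here). */
struct device_domain_info {
	struct pci_dev *dev;
	struct dmar_domain *domain;
	struct device_domain_info *next;
};

struct dmar_domain {
	struct device_domain_info *devices;	/* head of the device list */
};

/* O(n): walk the domain's device list, like the quoted loop. */
bool identity_mapping_walk(const struct dmar_domain *si_domain,
			   const struct pci_dev *pdev)
{
	const struct device_domain_info *info;

	assert(si_domain != NULL);
	for (info = si_domain->devices; info != NULL; info = info->next)
		if (info->dev == pdev)
			return true;
	return false;
}

/* O(1): follow the per-device cached pointer and compare its domain. */
bool identity_mapping_cached(const struct dmar_domain *si_domain,
			     const struct pci_dev *pdev)
{
	const struct device_domain_info *info = pdev->dev.iommu;

	assert(si_domain != NULL);
	return info != NULL && info->domain == si_domain;
}
```

The cached check costs one pointer load and one compare no matter how many devices are on the list, which is what matters when the lookup runs for every page of every DMA request.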

> 
> thanks,
> -chris


Thread overview: 27+ messages
2011-03-29 23:36 [PATCH 0/4] pci: Speed up processing of IOMMU related functions Mike Travis
2011-03-29 23:36 ` [PATCH 1/4] Intel pci: Remove Host Bridge devices from identity mapping Mike Travis
2011-03-30 17:51   ` Chris Wright
2011-03-30 18:30     ` Mike Travis
2011-03-30 19:15       ` Chris Wright
2011-03-30 19:25         ` Mike Travis [this message]
2011-03-30 19:57           ` Chris Wright
2011-03-29 23:36 ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function Mike Travis
2011-03-30 19:19   ` Chris Wright
2011-03-30 19:29     ` Mike Travis
2011-03-31  0:33     ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function v2 Mike Travis
2011-03-29 23:36 ` [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges Mike Travis
2011-03-31 22:11   ` Mike Travis
2011-03-31 22:53     ` Chris Wright
2011-03-31 23:25       ` Mike Travis
2011-03-31 23:40         ` Mike Habeck
2011-03-31 23:56           ` Chris Wright
2011-04-01  1:05             ` Mike Habeck
2011-04-02  0:32               ` [PATCH 3/4 v2] intel-iommu: don't cache iova above 32bit caching boundary Chris Wright
2011-04-06  0:39                 ` [PATCH 3/4 v3] " Chris Wright
2011-03-31 23:39       ` [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges Chris Wright
2011-03-29 23:36 ` [PATCH 4/4] Intel pci: Use coherent DMA mask when requested Mike Travis
2011-03-30 18:02   ` Chris Wright
2011-04-01  2:57     ` FUJITA Tomonori
2011-04-07 19:47 ` [PATCH 1/4] Intel pci: Remove Host Bridge devices from identity mapping Mike Travis
2011-04-07 19:51 ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function Mike Travis
2011-04-07 19:52 ` [PATCH 4/4] Intel pci: Use coherent DMA mask when requested Mike Travis
