From: Mike Travis <travis@sgi.com>
To: Chris Wright <chrisw@sous-sol.org>
Cc: David Woodhouse <dwmw2@infradead.org>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	linux-pci@vger.kernel.org, iommu@lists.linux-foundation.org,
	Mike Habeck <habeck@sgi.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/4] Intel pci: Remove Host Bridge devices from identity mapping
Date: Wed, 30 Mar 2011 12:25:43 -0700
Message-ID: <4D9383B7.40807@sgi.com>
In-Reply-To: <20110330191511.GS18712@sequoia.sous-sol.org>



Chris Wright wrote:
> * Mike Travis (travis@sgi.com) wrote:
>> Chris Wright wrote:
>>> * Mike Travis (travis@sgi.com) wrote:
>>>>    When the IOMMU is being used, each request for a DMA mapping requires
>>>>    the intel_iommu code to look for some space in the DMA mapping table.
>>>>    For most drivers this occurs for each transfer.
>>>>
>>>>    When there are many outstanding DMA mappings [as seems to be the case
>>>>    with the 10GigE driver], the table grows large and the search for
>>>>    space becomes increasingly time consuming.  Performance for the
>>>>    10GigE driver drops to about 10% of its capacity on a UV system
>>>>    when the CPU count is large.
>>> That's pretty poor.  I've seen large overheads, but when that big it was
>>> also related to issues in the 10G driver.  Do you have profile data
>>> showing this as the hotspot?
>> Here's one from our internal bug report:
>>
>> Here is a profile from a run with iommu=on  iommu=pt  (no forcedac)
> 
> OK, I was actually interested in the !pt case, but this is still
> useful.  The iova lookup is distinct from the identity_mapping() case.

I can get that as well, but having every device use maps caused its
own set of problems (hundreds of DMA maps).  Here's a list of devices
on the system under test.  You can see that even 'minor' glitches can
get magnified when there are so many...

Blade Location    NASID  PCI Address X Display   Device
----------------------------------------------------------------------
    0 r001i01b00      0  0000:01:00.0      -   Intel 82576 Gigabit Network Connection
    .          .      .  0000:01:00.1      -   Intel 82576 Gigabit Network Connection
    .          .      .  0000:04:00.0      -   LSI SAS1064ET Fusion-MPT SAS
    .          .      .  0000:05:00.0      -   Matrox MGA G200e
    2 r001i01b02      4  0001:02:00.0      -   Mellanox MT26428 InfiniBand
    3 r001i01b03      6  0002:02:00.0      -   Mellanox MT26428 InfiniBand
    4 r001i01b04      8  0003:02:00.0      -   Mellanox MT26428 InfiniBand
   11 r001i01b11     22  0007:02:00.0      -   Mellanox MT26428 InfiniBand
   13 r001i01b13     26  0008:02:00.0      -   Mellanox MT26428 InfiniBand
   15 r001i01b15     30  0009:07:00.0   :0.0   nVidia GF100 [Tesla S2050]
    .          .      .  0009:08:00.0   :1.1   nVidia GF100 [Tesla S2050]
   18 r001i23b02     36  000b:02:00.0      -   Mellanox MT26428 InfiniBand
   20 r001i23b04     40  000c:01:00.0      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  000c:01:00.1      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  000c:04:00.0      -   Mellanox MT26428 InfiniBand
   23 r001i23b07     46  000d:07:00.0      -   nVidia GF100 [Tesla S2050]
    .          .      .  000d:08:00.0      -   nVidia GF100 [Tesla S2050]
   25 r001i23b09     50  000e:01:00.0      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  000e:01:00.1      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  000e:04:00.0      -   Mellanox MT26428 InfiniBand
   26 r001i23b10     52  000f:02:00.0      -   Mellanox MT26428 InfiniBand
   27 r001i23b11     54  0010:02:00.0      -   Mellanox MT26428 InfiniBand
   29 r001i23b13     58  0011:02:00.0      -   Mellanox MT26428 InfiniBand
   31 r001i23b15     62  0012:02:00.0      -   Mellanox MT26428 InfiniBand
   34 r002i01b02     68  0013:01:00.0      -   Mellanox MT26428 InfiniBand
   35 r002i01b03     70  0014:02:00.0      -   Mellanox MT26428 InfiniBand
   36 r002i01b04     72  0015:01:00.0      -   Mellanox MT26428 InfiniBand
   41 r002i01b09     82  0018:07:00.0      -   nVidia GF100 [Tesla S2050]
    .          .      .  0018:08:00.0      -   nVidia GF100 [Tesla S2050]
   43 r002i01b11     86  0019:01:00.0      -   Mellanox MT26428 InfiniBand
   45 r002i01b13     90  001a:01:00.0      -   Mellanox MT26428 InfiniBand
   48 r002i23b00     96  001c:07:00.0      -   nVidia GF100 [Tesla S2050]
    .          .      .  001c:08:00.0      -   nVidia GF100 [Tesla S2050]
   50 r002i23b02    100  001d:02:00.0      -   Mellanox MT26428 InfiniBand
   52 r002i23b04    104  001e:01:00.0      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  001e:01:00.1      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  001e:04:00.0      -   Mellanox MT26428 InfiniBand
   57 r002i23b09    114  0020:01:00.0      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  0020:01:00.1      -   Intel 82599EB 10-Gigabit Network Connection
    .          .      .  0020:04:00.0      -   Mellanox MT26428 InfiniBand
   58 r002i23b10    116  0021:02:00.0      -   Mellanox MT26428 InfiniBand
   59 r002i23b11    118  0022:02:00.0      -   Mellanox MT26428 InfiniBand
   61 r002i23b13    122  0023:02:00.0      -   Mellanox MT26428 InfiniBand
   63 r002i23b15    126  0024:02:00.0      -   Mellanox MT26428 InfiniBand
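
Most of these drivers map and unmap on every transfer (per the patch
description above), so each one is hammering the iova allocator.  The
hot path looks roughly like this (sketch only; xmit_one and the driver
details are hypothetical):

	/* Sketch: per-packet hot path in a typical NIC driver.  Each
	 * transfer allocates an iova (the mapping-table search) and
	 * frees it again, so the allocator cost is paid per packet. */
	static int xmit_one(struct pci_dev *pdev, struct sk_buff *skb)
	{
		dma_addr_t dma = pci_map_single(pdev, skb->data, skb->len,
						PCI_DMA_TODEVICE);
		if (pci_dma_mapping_error(pdev, dma))
			return -ENOMEM;
		/* ... post the buffer to hardware, wait for completion ... */
		pci_unmap_single(pdev, dma, skb->len, PCI_DMA_TODEVICE);
		return 0;
	}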

> 
>> uv48-sys was receiving and uv-debug sending.
>> ksoftirqd/640 was running at approx. 100% cpu utilization.
>> I had pinned the nttcp process on uv48-sys to cpu 64.
>>
>> # Samples: 1255641
>> #
>> # Overhead        Command  Shared Object  Symbol
>> # ........  .............  .............  ......
>> #
>>    50.27%  ksoftirqd/640  [kernel]       [k] _spin_lock
>>    27.43%  ksoftirqd/640  [kernel]       [k] iommu_no_mapping
> 
>> ...
>>      0.48%  ksoftirqd/640  [kernel]       [k] iommu_should_identity_map
>>      0.45%  ksoftirqd/640  [kernel]       [k] ixgbe_alloc_rx_buffers   [ixgbe]
> 
> Note, ixgbe has had rx dma mapping issues (that's why I wondered what
> was causing the massive slowdown under !pt mode).

I think the network guys have since updated the ixgbe driver to a
later version.  (I don't know the outcome of that test.)

> 
> <snip>
>> I tracked this time down to identity_mapping() in this loop:
>>
>>       list_for_each_entry(info, &si_domain->devices, link)
>>               if (info->dev == pdev)
>>                       return 1;
>>
>> I didn't get the exact count, but there were approximately 11,000 PCI
>> devices on this system.  And this function was called for every page
>> request in each DMA request.
> 
> Right, so this is the list traversal (and wow, a lot of PCI devices).

Most of the PCI devices were the 45 on each of the 256 Nehalem sockets.
There's also a ton of bridges.

> Did you try a smarter data structure? (While there's room for another
> bit in pci_dev, the bit is more about iommu implementation details than
> anything at the pci level).
> 
> Or the domain_dev_info is cached in the archdata of device struct.
> You should be able to just reference that directly.
> 
> Didn't think it through completely, but perhaps something as simple as:
> 
> 	return pdev->dev.archdata.iommu == si_domain;

I can try this, thanks!
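
For concreteness, something like this is what I'd try (untested
sketch -- archdata.iommu holds a device_domain_info pointer rather
than the domain itself, so the compare goes through info->domain,
and the DUMMY_DEVICE_DOMAIN_INFO marker used by iommu_dummy() has
to be skipped):

	static int identity_mapping(struct pci_dev *pdev)
	{
		struct device_domain_info *info;

		if (likely(!iommu_identity_mapping))
			return 0;

		/* Constant-time: check the cached per-device info
		 * instead of walking the ~11,000-entry
		 * si_domain->devices list. */
		info = pdev->dev.archdata.iommu;
		if (info && info != DUMMY_DEVICE_DOMAIN_INFO)
			return info->domain == si_domain;

		return 0;
	}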

> 
> thanks,
> -chris


Thread overview: 27+ messages
2011-03-29 23:36 [PATCH 0/4] pci: Speed up processing of IOMMU related functions Mike Travis
2011-03-29 23:36 ` [PATCH 1/4] Intel pci: Remove Host Bridge devices from identity mapping Mike Travis
2011-03-30 17:51   ` Chris Wright
2011-03-30 18:30     ` Mike Travis
2011-03-30 19:15       ` Chris Wright
2011-03-30 19:25         ` Mike Travis [this message]
2011-03-30 19:57           ` Chris Wright
2011-03-29 23:36 ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function Mike Travis
2011-03-30 19:19   ` Chris Wright
2011-03-30 19:29     ` Mike Travis
2011-03-31  0:33     ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function v2 Mike Travis
2011-03-29 23:36 ` [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges Mike Travis
2011-03-31 22:11   ` Mike Travis
2011-03-31 22:53     ` Chris Wright
2011-03-31 23:25       ` Mike Travis
2011-03-31 23:40         ` Mike Habeck
2011-03-31 23:56           ` Chris Wright
2011-04-01  1:05             ` Mike Habeck
2011-04-02  0:32               ` [PATCH 3/4 v2] intel-iommu: don't cache iova above 32bit caching boundary Chris Wright
2011-04-06  0:39                 ` [PATCH 3/4 v3] " Chris Wright
2011-03-31 23:39       ` [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges Chris Wright
2011-03-29 23:36 ` [PATCH 4/4] Intel pci: Use coherent DMA mask when requested Mike Travis
2011-03-30 18:02   ` Chris Wright
2011-04-01  2:57     ` FUJITA Tomonori
2011-04-07 19:47 ` [PATCH 1/4] Intel pci: Remove Host Bridge devices from identity mapping Mike Travis
2011-04-07 19:51 ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function Mike Travis
2011-04-07 19:52 ` [PATCH 4/4] Intel pci: Use coherent DMA mask when requested Mike Travis
