From: Ray Jui via iommu <iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
To: Nate Watterson <nwatters-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>,
	Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>,
	will.deacon-5wv7dgnIgG8@public.gmane.org,
	joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	sunil.goutham-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	linu.cherian-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org
Subject: Re: [PATCH 0/8] io-pgtable lock removal
Date: Wed, 14 Jun 2017 17:40:30 -0700
Message-ID: <b7830be1-9a78-e29f-a29c-4798aaa28c0a@broadcom.com>
In-Reply-To: <458ad41d-6679-eeca-3c0f-13ccb6c933b6-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>

Hi Robin,

I have applied this patch series on top of v4.12-rc4, and ran various
Ethernet and NVMf target throughput tests on it.

To give you some background on my setup:

The system is an ARMv8-based system with 8 cores. It has various PCIe
root complexes that can be used to connect to PCIe endpoint devices
including NIC cards and NVMe SSDs.

I'm particularly interested in the performance of the PCIe root complex
that connects to the NIC card, and during my test, IOMMU is
enabled/disabled against that particular PCIe root complex. The root
complexes connected to NVMe SSDs remain unchanged (i.e., without IOMMU).

For Ethernet throughput over a 50G link:

Note that during the multi-session TCP tests, the sessions are spread
across different CPU cores for optimized performance.

Without IOMMU:

TX TCP x1 - 29.7 Gbps
TX TCP x4 - 30.5 Gbps
TX TCP x8 - 28 Gbps

RX TCP x1 - 15 Gbps
RX TCP x4 - 33.7 Gbps
RX TCP x8 - 36 Gbps

With IOMMU, but without your latest patch:

TX TCP x1 - 15.2 Gbps
TX TCP x4 - 14.3 Gbps
TX TCP x8 - 13 Gbps

RX TCP x1 - 7.88 Gbps
RX TCP x4 - 13.2 Gbps
RX TCP x8 - 12.6 Gbps

With IOMMU and your latest patch:

TX TCP x1 - 21.4 Gbps
TX TCP x4 - 30.5 Gbps
TX TCP x8 - 21.3 Gbps

RX TCP x1 - 7.7 Gbps
RX TCP x4 - 20.1 Gbps
RX TCP x8 - 27.1 Gbps

For the NVMf target test with 4 SSDs (fio-based, 4k random read,
8 jobs):

Without IOMMU:

IOPS = 1080K

With IOMMU, but without your latest patch:

IOPS = 520K

With IOMMU and your latest patch:

IOPS = 500K ~ 850K (a lot of variation observed during the same test run)

As you can see, performance has improved significantly with this patch
series! That is very impressive!

However, it is still off, compared to the test runs without the IOMMU.
I'm wondering if more improvement is expected.
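
To put the remaining gap in numbers, here is a rough sketch that expresses
each IOMMU result as a percentage of the no-IOMMU baseline. It only
re-uses the Ethernet figures listed above; the dictionary keys and the
helper name percent_of_baseline are made up for illustration and are not
part of any test tooling.

```python
# Ethernet throughput in Gbps, copied from the results listed above.
baseline = {"TX x1": 29.7, "TX x4": 30.5, "TX x8": 28.0,
            "RX x1": 15.0, "RX x4": 33.7, "RX x8": 36.0}
iommu_old = {"TX x1": 15.2, "TX x4": 14.3, "TX x8": 13.0,
             "RX x1": 7.88, "RX x4": 13.2, "RX x8": 12.6}
iommu_new = {"TX x1": 21.4, "TX x4": 30.5, "TX x8": 21.3,
             "RX x1": 7.7, "RX x4": 20.1, "RX x8": 27.1}


def percent_of_baseline(measured, reference):
    """Return each measurement as a percentage of the reference value."""
    return {name: round(100.0 * measured[name] / reference[name], 1)
            for name in reference}


# Hypothetical helper applied to both IOMMU configurations.
old_pct = percent_of_baseline(iommu_old, baseline)
new_pct = percent_of_baseline(iommu_new, baseline)

for name in baseline:
    print(f"{name}: {old_pct[name]}% -> {new_pct[name]}% of no-IOMMU throughput")
```

Read this way, TX TCP x4 is back at the no-IOMMU level with the patches,
while RX TCP x1 stays at roughly half of it.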

In addition, a much larger throughput variation is observed in the tests
with these latest patches, when multiple CPUs are involved. I'm
wondering if that is caused by some remaining lock in the driver?

Also, on a few occasions, I observed the following message during the
test, when multiple cores are involved:

arm-smmu 64000000.mmu: TLB sync timed out -- SMMU may be deadlocked

Thanks,

Ray

On 6/9/17 12:28 PM, Nate Watterson wrote:
> Hi Robin,
> 
> On 6/8/2017 7:51 AM, Robin Murphy wrote:
>> Hi all,
>>
>> Here's the cleaned up nominally-final version of the patches everybody's
>> keen to see. #1 is just a non-critical thing-I-spotted-in-passing fix,
>> #2-#4 do some preparatory work (and bid farewell to everyone's least
>> favourite bit of code, hooray!), and #5-#8 do the dirty deed itself.
>>
>> The branch I've previously shared has been updated too:
>>
>>    git://linux-arm.org/linux-rm  iommu/pgtable
>>
>> All feedback welcome, as I'd really like to land this for 4.13.
>>
> 
> I tested the series on a QDF2400 development platform and see notable
> performance improvements particularly in workloads that make concurrent
> accesses to a single iommu_domain.
> 
>> Robin.
>>
>>
>> Robin Murphy (8):
>>    iommu/io-pgtable-arm-v7s: Check table PTEs more precisely
>>    iommu/io-pgtable-arm: Improve split_blk_unmap
>>    iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
>>    iommu/io-pgtable: Introduce explicit coherency
>>    iommu/io-pgtable-arm: Support lockless operation
>>    iommu/io-pgtable-arm-v7s: Support lockless operation
>>    iommu/arm-smmu: Remove io-pgtable spinlock
>>    iommu/arm-smmu-v3: Remove io-pgtable spinlock
>>
>>   drivers/iommu/arm-smmu-v3.c        |  36 ++-----
>>   drivers/iommu/arm-smmu.c           |  48 ++++------
>>   drivers/iommu/io-pgtable-arm-v7s.c | 173
>> +++++++++++++++++++++------------
>>   drivers/iommu/io-pgtable-arm.c     | 190
>> ++++++++++++++++++++++++-------------
>>   drivers/iommu/io-pgtable.h         |   6 ++
>>   5 files changed, 268 insertions(+), 185 deletions(-)
>>
> 
