From: Ray Jui via iommu <iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>
To: Nate Watterson <nwatters-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>,
Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>,
will.deacon-5wv7dgnIgG8@public.gmane.org,
joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
sunil.goutham-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
linu.cherian-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org
Subject: Re: [PATCH 0/8] io-pgtable lock removal
Date: Wed, 14 Jun 2017 17:40:30 -0700
Message-ID: <b7830be1-9a78-e29f-a29c-4798aaa28c0a@broadcom.com>
In-Reply-To: <458ad41d-6679-eeca-3c0f-13ccb6c933b6-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
Hi Robin,
I have applied this patch series on top of v4.12-rc4, and ran various
Ethernet and NVMf target throughput tests on it.
To give you some background on my setup:
The system is an ARMv8-based system with 8 cores. It has various PCIe
root complexes that can be used to connect to PCIe endpoint devices
including NIC cards and NVMe SSDs.
I'm particularly interested in the performance of the PCIe root complex
that connects to the NIC card, and during my test, IOMMU is
enabled/disabled against that particular PCIe root complex. The root
complexes connected to NVMe SSDs remain unchanged (i.e., without IOMMU).
For the Ethernet throughput on a 50G link:
Note that during the multi-session TCP tests, the sessions are spread
across different CPU cores for optimized performance.
Without IOMMU:
TX TCP x1 - 29.7 Gbps
TX TCP x4 - 30.5 Gbps
TX TCP x8 - 28 Gbps
RX TCP x1 - 15 Gbps
RX TCP x4 - 33.7 Gbps
RX TCP x8 - 36 Gbps
With IOMMU, but without your latest patch:
TX TCP x1 - 15.2 Gbps
TX TCP x4 - 14.3 Gbps
TX TCP x8 - 13 Gbps
RX TCP x1 - 7.88 Gbps
RX TCP x4 - 13.2 Gbps
RX TCP x8 - 12.6 Gbps
With IOMMU and your latest patch:
TX TCP x1 - 21.4 Gbps
TX TCP x4 - 30.5 Gbps
TX TCP x8 - 21.3 Gbps
RX TCP x1 - 7.7 Gbps
RX TCP x4 - 20.1 Gbps
RX TCP x8 - 27.1 Gbps
For the NVMf target test with 4 SSDs (fio-based, random read, 4k,
8 jobs):
Without IOMMU:
IOPS = 1080K
With IOMMU, but without your latest patch:
IOPS = 520K
With IOMMU and your latest patch:
IOPS = 500K ~ 850K (a lot of variation observed during the same test run)
As you can see, performance has improved significantly with this patch
series! That is very impressive!
However, it is still off, compared to the test runs without the IOMMU.
I'm wondering if more improvement is expected.
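To make the remaining gap easier to see, the Ethernet figures above can be expressed as a percentage of the no-IOMMU baseline. This is only a minimal sketch using the numbers quoted in this mail; the dictionary and function names are hypothetical and chosen for illustration.

```python
# Ethernet throughput in Gbps, copied from the results listed above.
baseline = {"TX x1": 29.7, "TX x4": 30.5, "TX x8": 28.0,
            "RX x1": 15.0, "RX x4": 33.7, "RX x8": 36.0}
iommu_before = {"TX x1": 15.2, "TX x4": 14.3, "TX x8": 13.0,
                "RX x1": 7.88, "RX x4": 13.2, "RX x8": 12.6}
iommu_after = {"TX x1": 21.4, "TX x4": 30.5, "TX x8": 21.3,
               "RX x1": 7.7, "RX x4": 20.1, "RX x8": 27.1}


def percent_of_baseline(results, reference):
    """Return each result as a percentage of the matching reference value."""
    return {name: round(100.0 * results[name] / reference[name], 1)
            for name in reference}


before = percent_of_baseline(iommu_before, baseline)
after = percent_of_baseline(iommu_after, baseline)

# One line per test: share of no-IOMMU throughput before and after the series.
for name in baseline:
    print(f"{name}: {before[name]}% -> {after[name]}% of no-IOMMU throughput")
```

On these numbers only TX TCP x4 gets back to the no-IOMMU level with the series applied, and RX TCP x1 does not improve at all.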
In addition, a much larger throughput variation is observed in the tests
with these latest patches, when multiple CPUs are involved. I'm
wondering if that is caused by some remaining lock in the driver?
Also, on a few occasions, I observed the following message during the
test, when multiple cores are involved:
arm-smmu 64000000.mmu: TLB sync timed out -- SMMU may be deadlocked
Thanks,
Ray
On 6/9/17 12:28 PM, Nate Watterson wrote:
> Hi Robin,
>
> On 6/8/2017 7:51 AM, Robin Murphy wrote:
>> Hi all,
>>
>> Here's the cleaned up nominally-final version of the patches everybody's
>> keen to see. #1 is just a non-critical thing-I-spotted-in-passing fix,
>> #2-#4 do some preparatory work (and bid farewell to everyone's least
>> favourite bit of code, hooray!), and #5-#8 do the dirty deed itself.
>>
>> The branch I've previously shared has been updated too:
>>
>> git://linux-arm.org/linux-rm iommu/pgtable
>>
>> All feedback welcome, as I'd really like to land this for 4.13.
>>
>
> I tested the series on a QDF2400 development platform and see notable
> performance improvements particularly in workloads that make concurrent
> accesses to a single iommu_domain.
>
>> Robin.
>>
>>
>> Robin Murphy (8):
>> iommu/io-pgtable-arm-v7s: Check table PTEs more precisely
>> iommu/io-pgtable-arm: Improve split_blk_unmap
>> iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
>> iommu/io-pgtable: Introduce explicit coherency
>> iommu/io-pgtable-arm: Support lockless operation
>> iommu/io-pgtable-arm-v7s: Support lockless operation
>> iommu/arm-smmu: Remove io-pgtable spinlock
>> iommu/arm-smmu-v3: Remove io-pgtable spinlock
>>
>> drivers/iommu/arm-smmu-v3.c        |  36 ++-----
>> drivers/iommu/arm-smmu.c           |  48 ++++------
>> drivers/iommu/io-pgtable-arm-v7s.c | 173 +++++++++++++++++++++------------
>> drivers/iommu/io-pgtable-arm.c     | 190 ++++++++++++++++++++++++-------------
>> drivers/iommu/io-pgtable.h         |   6 ++
>> 5 files changed, 268 insertions(+), 185 deletions(-)
>>
>