From: Jean-Philippe Brucker <jean-philippe@linaro.org>
To: John Garry <john.garry@huawei.com>
Cc: Will Deacon <will@kernel.org>, Ming Lei <ming.lei@redhat.com>,
iommu@lists.linux-foundation.org, Marc Zyngier <maz@kernel.org>,
Robin Murphy <robin.murphy@arm.com>
Subject: Re: arm-smmu-v3 high cpu usage for NVMe
Date: Fri, 20 Mar 2020 12:18:42 +0100
Message-ID: <20200320111842.GD1702630@myrica>
In-Reply-To: <c9ebe17d-66b8-1b8c-cc2c-5be0bd1501a7@huawei.com>

On Fri, Mar 20, 2020 at 10:41:44AM +0000, John Garry wrote:
> On 19/03/2020 18:43, Jean-Philippe Brucker wrote:
> > On Thu, Mar 19, 2020 at 12:54:59PM +0000, John Garry wrote:
> > > Hi Will,
> > >
> > > >
> > > > On Thu, Jan 02, 2020 at 05:44:39PM +0000, John Garry wrote:
> > > > > And for the overall system, we have:
> > > > >
> > > > > PerfTop: 85864 irqs/sec kernel:89.6% exact: 0.0% lost: 0/34434 drop:
> > > > > 0/40116 [4000Hz cycles], (all, 96 CPUs)
> > > > > --------------------------------------------------------------------------------------------------------------------------
> > > > >
> > > > > 27.43% [kernel] [k] arm_smmu_cmdq_issue_cmdlist
> > > > > 11.71% [kernel] [k] _raw_spin_unlock_irqrestore
> > > > > 6.35% [kernel] [k] _raw_spin_unlock_irq
> > > > > 2.65% [kernel] [k] get_user_pages_fast
> > > > > 2.03% [kernel] [k] __slab_free
> > > > > 1.55% [kernel] [k] tick_nohz_idle_exit
> > > > > 1.47% [kernel] [k] arm_lpae_map
> > > > > 1.39% [kernel] [k] __fget
> > > > > 1.14% [kernel] [k] __lock_text_start
> > > > > 1.09% [kernel] [k] _raw_spin_lock
> > > > > 1.08% [kernel] [k] bio_release_pages.part.42
> > > > > 1.03% [kernel] [k] __sbitmap_get_word
> > > > > 0.97% [kernel] [k] arm_smmu_atc_inv_domain.constprop.42
> > > > > 0.91% [kernel] [k] fput_many
> > > > > 0.88% [kernel] [k] __arm_lpae_map
> > > > >
> > > > > One thing to note is that we still spend an appreciable amount of time in
> > > > > arm_smmu_atc_inv_domain(), which is disappointing when considering it should
> > > > > effectively be a noop.
> > > > >
> > > > > As for arm_smmu_cmdq_issue_cmdlist(), I do note that during the testing our
> > > > > batch size is 1, so we're not seeing the real benefit of the batching. I
> > > > > can't help but think that we could improve this code to try to combine CMD
> > > > > SYNCs for small batches.
> > > > >
> > > > > Anyway, let me know your thoughts or any questions. I'll have a look if I
> > > > > get a chance for other possible bottlenecks.
> > > >
> > > > Did you ever get any more information on this? I don't have any SMMUv3
> > > > hardware any more, so I can't really dig into this myself.
> > >
> > > I'm only getting back to look at this now, as SMMU performance is a bit of a
> > > hot topic again for us.
> > >
> > > So one thing we are doing which looks to help performance is this series
> > > from Marc:
> > >
> > > https://lore.kernel.org/lkml/9171c554-50d2-142b-96ae-1357952fce52@huawei.com/T/#mee5562d1efd6aaeb8d2682bdb6807fe7b5d7f56d
> > >
> > > So that is just spreading the per-CPU load for NVMe interrupt handling
> > > (where the DMA unmapping is happening), so I'd say just side-stepping any
> > > SMMU issue really.
> > >
> > > Going back to the SMMU, I wanted to run eBPF and perf annotate to help
> > > profile this, but was having no luck getting them to work properly. I'll
> > > look at this again now.
> >
> > Could you also try with the upcoming ATS changes currently in Will's tree?
> > They won't improve your numbers but it'd be good to check that they don't
> > make things worse.
>
> I can do when I get a chance.
>
> >
> > I've run a bunch of netperf instances on multiple cores and collected
> > SMMU usage (on TaiShan 2280). I'm getting the following ratio pretty
> > consistently.
> >
> > -    6.07% arm_smmu_iotlb_sync
> >    - 5.74% arm_smmu_tlb_inv_range
> >         5.09% arm_smmu_cmdq_issue_cmdlist
> >         0.28% __pi_memset
> >         0.08% __pi_memcpy
> >         0.08% arm_smmu_atc_inv_domain.constprop.37
> >         0.07% arm_smmu_cmdq_build_cmd
> >         0.01% arm_smmu_cmdq_batch_add
> >      0.31% __pi_memset
> >
> > So arm_smmu_atc_inv_domain() takes about 1.4% of arm_smmu_iotlb_sync(),
> > when ATS is not used. According to the annotations, the load from the
> > atomic_read(), that checks whether the domain uses ATS, is 77% of the
> > samples in arm_smmu_atc_inv_domain() (265 of 345 samples), so I'm not sure
> > there is much room for optimization there.
>
> Well I did originally suggest using RCU protection to scan the list of
> devices, instead of reading an atomic and checking for non-zero value. But
> that would be an optimisation for ATS also, and there were no ATS devices at
> the time (to verify performance).

Heh, I have yet to get my hands on one. Currently I can't evaluate ATS
performance, but I agree that using RCU to scan the list should get better
results when using ATS.

When ATS isn't in use however, I suspect reading nr_ats_masters should be
more efficient than taking the RCU lock + reading an "ats_devices" list
(since the smmu_domain->devices list also serves context descriptor
invalidation, even when ATS isn't in use). I'll run some tests however, to
see if I can micro-optimize this case, but I don't expect noticeable
improvements.
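
To make the comparison concrete, the non-ATS case boils down to something
like the userspace sketch below. Only nr_ats_masters is a name from the
driver; the sketch_* structs and function are made up for illustration, and
C11 atomics stand in for the kernel's atomic_read(). Treat it as a rough
model under those assumptions, not the actual arm-smmu-v3 code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a master attached to the domain. */
struct sketch_master {
	bool ats_enabled;
};

/* Hypothetical stand-in for the domain; only nr_ats_masters is a real name. */
struct sketch_domain {
	atomic_int nr_ats_masters;
	struct sketch_master *masters;
	size_t nr_masters;
};

/* Returns how many ATC invalidations would be queued for this domain. */
static size_t sketch_atc_inv_domain(struct sketch_domain *d)
{
	size_t queued = 0;

	/* Fast path when ATS is unused: one atomic load, no lock, no list walk. */
	if (atomic_load(&d->nr_ats_masters) == 0)
		return 0;

	/* Slow path: a real implementation would hold a lock or RCU here. */
	for (size_t i = 0; i < d->nr_masters; i++)
		if (d->masters[i].ats_enabled)
			queued++;

	return queued;
}
```

With the counter at zero the function returns after a single load, which is
why most of the samples end up on that load rather than on anything that
could be trimmed further.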

Thanks,
Jean