From: Marc Zyngier <maz@kernel.org>
To: John Garry <john.garry@huawei.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>,
Robin Murphy <robin.murphy@arm.com>,
Ming Lei <ming.lei@redhat.com>,
iommu@lists.linux-foundation.org, Will Deacon <will@kernel.org>
Subject: Re: arm-smmu-v3 high cpu usage for NVMe
Date: Mon, 23 Mar 2020 09:16:35 +0000
Message-ID: <dd375cf6bffacd82174c84a4c7d46283@kernel.org>
In-Reply-To: <cca67355-672d-81c5-3d37-72004eb8f14f@huawei.com>
On 2020-03-23 09:03, John Garry wrote:
> On 20/03/2020 16:33, Marc Zyngier wrote:
>>> JFYI, I've been playing with "perf annotate" today and it's giving
>>> strange results for my NVMe testing. So "report" looks somewhat sane,
>>> if not a worryingly high % for arm_smmu_cmdq_issue_cmdlist():
>>>
>>>
>>> 55.39% irq/342-nvme0q1 [kernel.kallsyms] [k] arm_smmu_cmdq_issue_cmdlist
>>> 9.74% irq/342-nvme0q1 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
>>> 2.02% irq/342-nvme0q1 [kernel.kallsyms] [k] nvme_irq
>>> 1.86% irq/342-nvme0q1 [kernel.kallsyms] [k] fput_many
>>> 1.73% irq/342-nvme0q1 [kernel.kallsyms] [k] arm_smmu_atc_inv_domain.constprop.42
>>> 1.67% irq/342-nvme0q1 [kernel.kallsyms] [k] __arm_lpae_unmap
>>> 1.49% irq/342-nvme0q1 [kernel.kallsyms] [k] aio_complete_rw
>>>
>>> But "annotate" consistently tells me that a specific instruction
>>> consumes ~99% of the load for the enqueue function:
>>>
>>> : /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
>>> : if (sync) {
>>> 0.00 : ffff80001071c948: ldr w0, [x29, #108]
>>> : int ret = 0;
>>> 0.00 : ffff80001071c94c: mov w24, #0x0 // #0
>>> : if (sync) {
>>> 0.00 : ffff80001071c950: cbnz w0, ffff80001071c990 <arm_smmu_cmdq_issue_cmdlist+0x420>
>>> : arch_local_irq_restore():
>>> 0.00 : ffff80001071c954: msr daif, x21
>>> : arm_smmu_cmdq_issue_cmdlist():
>>> : }
>>> : }
>>> :
>>> : local_irq_restore(flags);
>>> : return ret;
>>> : }
>>> 99.51 : ffff80001071c958: adrp x0, ffff800011909000 <page_wait_table+0x14c0>
>>
>
> Hi Marc,
>
>> This is likely the side effect of the re-enabling of interrupts
>> (msr daif, x21) on the previous instruction, which causes the perf
>> interrupt to fire right after.
>
> ok, makes sense.
>
>>
>> Time to enable pseudo-NMIs in the PMUv3 driver...
>>
>
> Do you know if there is any plan for this?

There was. Julien Thierry has a bunch of patches for that [1], but they
need reviving.

>
> In the meantime, maybe I can do some trickery by putting the
> local_irq_restore() in a separate function, outside
> arm_smmu_cmdq_issue_cmdlist(), to get a fair profile for that same
> function.
I don't see how you can improve the profiling without compromising
the locking in this case...
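
For anyone following along, the attribution effect discussed above can be
illustrated with a toy userspace model. This is only a sketch under stated
assumptions, not kernel code: `struct insn`, `simulate()` and the trace
layout are hypothetical names made up for the illustration, and the model
assumes that sampling interrupts raised while IRQs are masked collapse into
a single pending interrupt that is taken at the first unmasked instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one executed instruction (not a kernel structure). */
struct insn {
	const char *func;	/* function the instruction belongs to */
	int irqs_masked;	/* 1 if IRQs are masked while it executes */
	unsigned int cycles;	/* cost of the instruction in cycles */
};

/*
 * Walk the trace and raise a sampling interrupt every 'period' cycles.
 * Assumption: an interrupt raised while IRQs are masked stays pending
 * (several of them collapse into one) and is charged to the first
 * instruction that runs with IRQs unmasked.
 * hits[i] receives the number of samples attributed to trace[i].
 */
void simulate(const struct insn *trace, size_t n, unsigned int period,
	      unsigned int *hits)
{
	unsigned int clock = 0, next = period;
	int pending = 0;
	size_t i;

	assert(period > 0);	/* a zero period would never advance 'next' */

	for (i = 0; i < n; i++)
		hits[i] = 0;

	for (i = 0; i < n; i++) {
		/* A pending sample is taken as soon as IRQs are unmasked. */
		if (!trace[i].irqs_masked && pending) {
			hits[i]++;
			pending = 0;
		}

		clock += trace[i].cycles;

		/* Raise every sample that became due during this instruction. */
		while (next <= clock) {
			if (trace[i].irqs_masked)
				pending = 1;	/* deferred until unmask */
			else
				hits[i]++;	/* taken immediately */
			next += period;
		}
	}
}
```

Feeding it a trace of 100 masked cycles of work, a 1-cycle masked "msr",
a 1-cycle unmasked "adrp" and 10 unmasked cycles in the caller, with a
period of 10, charges nothing to the masked work or the "msr" and one
sample to the "adrp", which is the same shape as the annotate output
quoted above.
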
Thanks,
M.
[1] https://patchwork.kernel.org/cover/11047407/
--
Jazz is not dead. It just smells funny...