From: Ming Lei <ming.lei@redhat.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: John Garry <john.garry@huawei.com>,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	iommu@lists.linux-foundation.org, Will Deacon <will@kernel.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [bug report] iommu_dma_unmap_sg() is very slow when running IO from remote NUMA node
Date: Fri, 23 Jul 2021 18:21:01 +0800 [thread overview]
Message-ID: <YPqYDY9/VAhfHNfU@T590> (raw)
In-Reply-To: <0adbe03b-ce26-e4d3-3425-d967bc436ef5@arm.com>
On Thu, Jul 22, 2021 at 06:40:18PM +0100, Robin Murphy wrote:
> On 2021-07-22 16:54, Ming Lei wrote:
> [...]
> > > If you are still keen to investigate more, then you can try either of these:
> > >
> > > - add iommu.strict=0 to the cmdline
> > >
> > > - use perf record+annotate to find the hotspot
> > > - For this you need to enable pseudo-NMI in two steps:
> > > CONFIG_ARM64_PSEUDO_NMI=y in defconfig
> > > Add irqchip.gicv3_pseudo_nmi=1 to the kernel cmdline
> > >
> > > See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/Kconfig#n1745
> > > Your kernel log should show:
> > > [ 0.000000] GICv3: Pseudo-NMIs enabled using forced ICC_PMR_EL1
> > > synchronisation
> >
> > OK, will try the above tomorrow.
>
> Thanks, I was also going to suggest the latter, since what
> arm_smmu_cmdq_issue_cmdlist() does with IRQs masked should be the best
> indicator of where the slowness stems from.
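For reference, the perf side of that suggestion boils down to something
like this (a sketch, assuming pseudo-NMI came up as described so the
IRQs-off regions actually get sampled; the CPU number just mirrors the
remote-node test below and nothing here is specific to this box):

  # sample kernel-mode cycles on the CPU running the fio job for 10s
  perf record -C 80 -e cycles:k -- sleep 10
  # find the hotspot, then drill into it at instruction level
  perf report
  perf annotate arm_smmu_cmdq_issue_cmdlist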
The improvement from 'iommu.strict=0' is very small; the remote-node run is still roughly 10x slower than the local-node one (38.4k vs 392k IOPS):
[root@ampere-mtjade-04 ~]# cat /proc/cmdline
BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.14.0-rc2_linus root=UUID=cff79b49-6661-4347-b366-eb48273fe0c1 ro nvme.poll_queues=2 iommu.strict=0
[root@ampere-mtjade-04 ~]# taskset -c 0 ~/git/tools/test/nvme/io_uring 10 1 /dev/nvme1n1 4k
+ fio --bs=4k --ioengine=io_uring --fixedbufs --registerfiles --hipri --iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 --filename=/dev/nvme1n1 --direct=1 --runtime=10 --numjobs=1 --rw=randread --name=test --group_reporting
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=64
fio-3.27
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=1530MiB/s][r=392k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2999: Fri Jul 23 06:05:15 2021
read: IOPS=392k, BW=1530MiB/s (1604MB/s)(14.9GiB/10001msec)
[root@ampere-mtjade-04 ~]# taskset -c 80 ~/git/tools/test/nvme/io_uring 20 1 /dev/nvme1n1 4k
+ fio --bs=4k --ioengine=io_uring --fixedbufs --registerfiles --hipri --iodepth=64 --iodepth_batch_submit=16 --iodepth_batch_complete_min=16 --filename=/dev/nvme1n1 --direct=1 --runtime=20 --numjobs=1 --rw=randread --name=test --group_reporting
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=64
fio-3.27
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=150MiB/s][r=38.4k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3063: Fri Jul 23 06:05:49 2021
read: IOPS=38.4k, BW=150MiB/s (157MB/s)(3000MiB/20002msec)
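For context, the local/remote split in those two runs can be
sanity-checked along these lines (a sketch; the nvme1 sysfs path is an
assumption about how the controller is enumerated, i.e. that it hangs
off a PCI device exposing numa_node):

  # NUMA node of the NVMe controller (via its PCI parent)
  cat /sys/class/nvme/nvme1/device/numa_node
  # which CPUs belong to which node (CPU 0 vs CPU 80 here)
  lscpu | grep -i 'numa node'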
>
> FWIW I would expect iommu.strict=0 to give a proportional reduction in SMMU
> overhead for both cases since it should effectively mean only 1/256 as many
> invalidations are issued.
>
> Could you also check whether the SMMU platform devices have "numa_node"
> properties exposed in sysfs (and if so whether the values look right), and
> share all the SMMU output from the boot log?
No numa_node attribute was found for the SMMU platform devices, and the whole
dmesg log is attached.
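For anyone repeating that check, it amounts to something like this (a
sketch; the glob is an assumption, since platform-device names vary
across systems):

  # print numa_node for each SMMU platform device, if exposed
  for d in /sys/devices/platform/*smmu*; do
      echo "$d -> $(cat "$d"/numa_node 2>/dev/null || echo 'not exposed')"
  done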
Thanks,
Ming
[-- Attachment #2: arm64.log.tar.gz --]
[-- Type: application/gzip, Size: 34200 bytes --]
Thread overview: 30 messages in thread
2021-07-09  8:38 [bug report] iommu_dma_unmap_sg() is very slow when running IO from remote NUMA node Ming Lei
2021-07-09 10:16 ` Russell King (Oracle)
2021-07-09 14:21 ` Ming Lei
2021-07-09 10:26 ` Robin Murphy
2021-07-09 11:04 ` John Garry
2021-07-09 12:34 ` Robin Murphy
2021-07-09 14:24 ` Ming Lei
2021-07-19 16:14 ` John Garry
2021-07-21  1:40 ` Ming Lei
2021-07-21  9:23 ` John Garry
2021-07-21  9:59 ` Ming Lei
2021-07-21 11:07 ` John Garry
2021-07-21 11:58 ` Ming Lei
2021-07-22  7:58 ` Ming Lei
2021-07-22 10:05 ` John Garry
2021-07-22 10:19 ` Ming Lei
2021-07-22 11:12 ` John Garry
2021-07-22 12:53 ` Marc Zyngier
2021-07-22 13:54 ` John Garry
2021-07-22 15:54 ` Ming Lei
2021-07-22 17:40 ` Robin Murphy
2021-07-23 10:21 ` Ming Lei [this message]
2021-07-26  7:51 ` John Garry
2021-07-28  1:32 ` Ming Lei
2021-07-28 10:38 ` John Garry
2021-07-28 15:17 ` Ming Lei
2021-07-28 15:39 ` Robin Murphy
2021-08-10  9:36 ` John Garry
2021-08-10 10:35 ` Ming Lei
2021-07-27 17:08 ` Robin Murphy