From: Nicolin Chen <nicolinc@nvidia.com>
To: Robin Murphy <robin.murphy@arm.com>, Will Deacon <will@kernel.org>
Cc: <jgg@nvidia.com>, <joro@8bytes.org>, <jean-philippe@linaro.org>,
<apopple@nvidia.com>, <linux-kernel@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>
Subject: Re: [PATCH 1/3] iommu/io-pgtable-arm: Add nents_per_pgtable in struct io_pgtable_cfg
Date: Thu, 31 Aug 2023 17:08:42 -0700 [thread overview]
Message-ID: <ZPErio3R5redmDNU@Asurada-Nvidia> (raw)
In-Reply-To: <ZPDQPs9UL2sksblM@Asurada-Nvidia>
Hi Will/Robin,
On Thu, Aug 31, 2023 at 10:39:15AM -0700, Nicolin Chen wrote:
> Though I have not dig enough, I assume that this worst case could
> happen to SVA too, since the IOTLB invalidation is from MMU code.
> But the same worst case might not happen to non-SVA pathway, i.e.
> TLBI ops for IO Page Table doesn't really benefit from this?
>
> With that being questioned, I got Robin's remarks that it wouldn't
> be easy to decide the arbitrary number, so we could just take the
> worst case from SVA pathway as the common threshold.
>
> Then, SVA uses the CPU page table, so perhaps we should highlight
> that SMMU sets the threshold directly reusing the MAX_TLBI_OPS of
> CPU page table rather than calculating from IO page table, though
> both of them are in the same format?
Our test team encountered a soft lockup in this path today:
--------------------------------------------------------------------
watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
sp : ffff8000d83ef290
x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000
x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000
x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0
x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0
x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a
x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001
Call trace:
arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
__arm_smmu_tlb_inv_range+0x118/0x254
arm_smmu_tlb_inv_range_asid+0x6c/0x130
arm_smmu_mm_invalidate_range+0xa0/0xa4
__mmu_notifier_invalidate_range_end+0x88/0x120
unmap_vmas+0x194/0x1e0
unmap_region+0xb4/0x144
do_mas_align_munmap+0x290/0x490
do_mas_munmap+0xbc/0x124
__vm_munmap+0xa8/0x19c
__arm64_sys_munmap+0x28/0x50
invoke_syscall+0x78/0x11c
el0_svc_common.constprop.0+0x58/0x1c0
do_el0_svc+0x34/0x60
el0_svc+0x2c/0xd4
el0t_64_sync_handler+0x114/0x140
el0t_64_sync+0x1a4/0x1a8
--------------------------------------------------------------------
I think it is the same problem that we fixed in tlbflush.h using
MAX_TLBI_OPS. So, I plan to send a cleaner bug fix (cc stable):
--------------------------------------------------------------------
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a5a63b1c947e..e3ea7d2a6308 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -186,6 +186,9 @@ static void arm_smmu_free_shared_cd(struct arm_smmu_ctx_desc *cd)
}
}
+/* Copid from arch/arm64/include/asm/tlbflush.h to avoid similar soft lockups */
+#define MAX_TLBI_OPS (1 << (PAGE_SHIFT - 3))
+
static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long start, unsigned long end)
@@ -201,9 +204,14 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
*/
size = end - start;
- if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
- arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
- PAGE_SIZE, false, smmu_domain);
+ if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) {
+ if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
+ size >= MAX_TLBI_OPS * PAGE_SIZE)
+ arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
+ else
+ arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
+ PAGE_SIZE, false, smmu_domain);
+ }
arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
}
--------------------------------------------------------------------
What do you think about it?
Thanks
Nic
WARNING: multiple messages have this Message-ID (diff)
From: Nicolin Chen <nicolinc@nvidia.com>
To: Robin Murphy <robin.murphy@arm.com>, Will Deacon <will@kernel.org>
Cc: <jgg@nvidia.com>, <joro@8bytes.org>, <jean-philippe@linaro.org>,
<apopple@nvidia.com>, <linux-kernel@vger.kernel.org>,
<linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>
Subject: Re: [PATCH 1/3] iommu/io-pgtable-arm: Add nents_per_pgtable in struct io_pgtable_cfg
Date: Thu, 31 Aug 2023 17:08:42 -0700 [thread overview]
Message-ID: <ZPErio3R5redmDNU@Asurada-Nvidia> (raw)
In-Reply-To: <ZPDQPs9UL2sksblM@Asurada-Nvidia>
Hi Will/Robin,
On Thu, Aug 31, 2023 at 10:39:15AM -0700, Nicolin Chen wrote:
> Though I have not dig enough, I assume that this worst case could
> happen to SVA too, since the IOTLB invalidation is from MMU code.
> But the same worst case might not happen to non-SVA pathway, i.e.
> TLBI ops for IO Page Table doesn't really benefit from this?
>
> With that being questioned, I got Robin's remarks that it wouldn't
> be easy to decide the arbitrary number, so we could just take the
> worst case from SVA pathway as the common threshold.
>
> Then, SVA uses the CPU page table, so perhaps we should highlight
> that SMMU sets the threshold directly reusing the MAX_TLBI_OPS of
> CPU page table rather than calculating from IO page table, though
> both of them are in the same format?
Our test team encountered a soft lockup in this path today:
--------------------------------------------------------------------
watchdog: BUG: soft lockup - CPU#244 stuck for 26s!
pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
lr : arm_smmu_cmdq_issue_cmdlist+0x150/0xa50
sp : ffff8000d83ef290
x29: ffff8000d83ef290 x28: 000000003b9aca00 x27: 0000000000000000
x26: ffff8000d83ef3c0 x25: da86c0812194a0e8 x24: 0000000000000000
x23: 0000000000000040 x22: ffff8000d83ef340 x21: ffff0000c63980c0
x20: 0000000000000001 x19: ffff0000c6398080 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: ffff3000b4a3bbb0
x14: ffff3000b4a30888 x13: ffff3000b4a3cf60 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc08120e4d6bc
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000048cfa
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 000000000000000a
x2 : 0000000080000000 x1 : 0000000000000000 x0 : 0000000000000001
Call trace:
arm_smmu_cmdq_issue_cmdlist+0x178/0xa50
__arm_smmu_tlb_inv_range+0x118/0x254
arm_smmu_tlb_inv_range_asid+0x6c/0x130
arm_smmu_mm_invalidate_range+0xa0/0xa4
__mmu_notifier_invalidate_range_end+0x88/0x120
unmap_vmas+0x194/0x1e0
unmap_region+0xb4/0x144
do_mas_align_munmap+0x290/0x490
do_mas_munmap+0xbc/0x124
__vm_munmap+0xa8/0x19c
__arm64_sys_munmap+0x28/0x50
invoke_syscall+0x78/0x11c
el0_svc_common.constprop.0+0x58/0x1c0
do_el0_svc+0x34/0x60
el0_svc+0x2c/0xd4
el0t_64_sync_handler+0x114/0x140
el0t_64_sync+0x1a4/0x1a8
--------------------------------------------------------------------
I think it is the same problem that we fixed in tlbflush.h using
MAX_TLBI_OPS. So, I plan to send a cleaner bug fix (cc stable):
--------------------------------------------------------------------
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a5a63b1c947e..e3ea7d2a6308 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -186,6 +186,9 @@ static void arm_smmu_free_shared_cd(struct arm_smmu_ctx_desc *cd)
}
}
+/* Copid from arch/arm64/include/asm/tlbflush.h to avoid similar soft lockups */
+#define MAX_TLBI_OPS (1 << (PAGE_SHIFT - 3))
+
static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long start, unsigned long end)
@@ -201,9 +204,14 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
*/
size = end - start;
- if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
- arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
- PAGE_SIZE, false, smmu_domain);
+ if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM)) {
+ if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_RANGE_INV) &&
+ size >= MAX_TLBI_OPS * PAGE_SIZE)
+ arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
+ else
+ arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
+ PAGE_SIZE, false, smmu_domain);
+ }
arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
}
--------------------------------------------------------------------
What do you think about it?
Thanks
Nic
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2023-09-01 0:08 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-22 8:45 [PATCH 0/3] iommu/arm-smmu-v3: Reduce latency in __arm_smmu_tlb_inv_range() Nicolin Chen
2023-08-22 8:45 ` Nicolin Chen
2023-08-22 8:45 ` [PATCH 1/3] iommu/io-pgtable-arm: Add nents_per_pgtable in struct io_pgtable_cfg Nicolin Chen
2023-08-22 8:45 ` Nicolin Chen
2023-08-22 9:19 ` Robin Murphy
2023-08-22 9:19 ` Robin Murphy
2023-08-22 16:42 ` Nicolin Chen
2023-08-22 16:42 ` Nicolin Chen
2023-08-29 15:37 ` Robin Murphy
2023-08-29 15:37 ` Robin Murphy
2023-08-29 20:15 ` Nicolin Chen
2023-08-29 20:15 ` Nicolin Chen
2023-08-29 21:25 ` Robin Murphy
2023-08-29 21:25 ` Robin Murphy
2023-08-29 22:15 ` Nicolin Chen
2023-08-29 22:15 ` Nicolin Chen
2023-08-30 21:49 ` Will Deacon
2023-08-30 21:49 ` Will Deacon
2023-08-31 17:39 ` Nicolin Chen
2023-08-31 17:39 ` Nicolin Chen
2023-09-01 0:08 ` Nicolin Chen [this message]
2023-09-01 0:08 ` Nicolin Chen
2023-09-01 18:02 ` Robin Murphy
2023-09-01 18:02 ` Robin Murphy
2023-09-01 18:23 ` Nicolin Chen
2023-09-01 18:23 ` Nicolin Chen
2024-01-20 19:59 ` Nicolin Chen
2024-01-20 19:59 ` Nicolin Chen
2024-01-22 13:01 ` Jason Gunthorpe
2024-01-22 13:01 ` Jason Gunthorpe
2024-01-22 17:24 ` Nicolin Chen
2024-01-22 17:24 ` Nicolin Chen
2024-01-22 17:57 ` Jason Gunthorpe
2024-01-22 17:57 ` Jason Gunthorpe
2024-01-24 0:11 ` Nicolin Chen
2024-01-24 0:11 ` Nicolin Chen
2024-01-25 13:55 ` Jason Gunthorpe
2024-01-25 13:55 ` Jason Gunthorpe
2024-01-25 17:23 ` Nicolin Chen
2024-01-25 17:23 ` Nicolin Chen
2024-01-25 17:47 ` Jason Gunthorpe
2024-01-25 17:47 ` Jason Gunthorpe
2024-01-25 19:55 ` Nicolin Chen
2024-01-25 19:55 ` Nicolin Chen
[not found] ` <098d64da-ecf5-4a23-bff9-a04840726ef0@huawei.com>
2024-01-25 5:09 ` Nicolin Chen
2024-01-25 5:09 ` Nicolin Chen
2023-08-22 8:45 ` [PATCH 2/3] iommu/arm-smmu-v3: Add an arm_smmu_tlb_inv_domain helper Nicolin Chen
2023-08-22 8:45 ` Nicolin Chen
2023-08-22 9:40 ` Robin Murphy
2023-08-22 9:40 ` Robin Murphy
2023-08-22 17:03 ` Nicolin Chen
2023-08-22 17:03 ` Nicolin Chen
2023-08-29 21:54 ` Robin Murphy
2023-08-29 21:54 ` Robin Murphy
2023-08-29 23:03 ` Nicolin Chen
2023-08-29 23:03 ` Nicolin Chen
2023-08-22 8:45 ` [PATCH 3/3] iommu/arm-smmu-v3: Add a max_tlbi_ops for __arm_smmu_tlb_inv_range() Nicolin Chen
2023-08-22 8:45 ` Nicolin Chen
2023-08-22 9:30 ` Robin Murphy
2023-08-22 9:30 ` Robin Murphy
2023-08-22 16:32 ` Nicolin Chen
2023-08-22 16:32 ` Nicolin Chen
2023-08-22 23:04 ` Nicolin Chen
2023-08-22 23:04 ` Nicolin Chen
2023-08-29 22:40 ` Robin Murphy
2023-08-29 22:40 ` Robin Murphy
2023-08-29 23:14 ` Nicolin Chen
2023-08-29 23:14 ` Nicolin Chen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZPErio3R5redmDNU@Asurada-Nvidia \
--to=nicolinc@nvidia.com \
--cc=apopple@nvidia.com \
--cc=iommu@lists.linux.dev \
--cc=jean-philippe@linaro.org \
--cc=jgg@nvidia.com \
--cc=joro@8bytes.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=robin.murphy@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.