Linux-ARM-Kernel Archive on lore.kernel.org
* [PATCH v2] iommu/arm-smmu-v3: Shrink command/event/PRI queues in kdump kernel
@ 2026-07-02 11:28 Kiryl Shutsemau (Meta)
  2026-07-02 15:05 ` Pranjal Shrivastava
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Kiryl Shutsemau (Meta) @ 2026-07-02 11:28 UTC
  To: Will Deacon, Robin Murphy, Joerg Roedel
  Cc: Jason Gunthorpe, Nicolin Chen, Kyle McMartin, Breno Leitao,
	Usama Arif, linux-arm-kernel, iommu, linux-kernel,
	Kiryl Shutsemau (Meta)

The command, event and PRI queues are sized from the maxima the hardware
advertises in IDR1, which can be several megabytes each. On systems with
many SMMUv3 instances that cost is paid per instance and adds up to tens
of megabytes of coherent DMA in the capture kernel.

A kdump capture kernel runs from a small crashkernel reservation and only
has to drive the few devices used to save the dump, so deep queues serve
no purpose. The queues carry invalidation commands and fault records, not
DMA data, so dump throughput is unaffected; a shallower command queue only
bounds how many commands may be in flight before a sync, which does not
matter for the capture kernel's small device count and modest I/O.

Clamp every queue to a single page when is_kdump_kernel() is true. Doing
it in arm_smmu_init_one_queue() covers the command, event and PRI queues
in one place. The command queue still holds at least one batch plus a sync
(256 entries on a 4K-page kernel, well above CMDQ_BATCH_ENTRIES), so
command batching keeps working.

Suggested-by: Kyle McMartin <jkkm@meta.com>
Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Reviewed-by: Breno Leitao <leitao@debian.org>
---
v2:
 - Use min() instead of min_t(); both operands are u32 so the cast was
   redundant (Jason Gunthorpe, Breno Leitao).
 - Add Reviewed-by from Breno.

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e8d7dbe495f0..a4ec4a59e527 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -4414,6 +4414,20 @@ int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 {
 	size_t qsz;
 
+	/*
+	 * A kdump capture kernel runs from a small crashkernel reservation and
+	 * only has to drive the few devices used to save the dump, so there is
+	 * no point sizing the queues for the (multi-megabyte) maxima the
+	 * hardware advertises. Clamp each queue to a single page. ent_sz_shift
+	 * is the log2 of the entry size in bytes (dwords * 8).
+	 */
+	if (is_kdump_kernel()) {
+		u32 ent_sz_shift = ilog2(dwords) + 3;
+
+		q->llq.max_n_shift = min(q->llq.max_n_shift,
+					 PAGE_SHIFT - ent_sz_shift);
+	}
+
 	do {
 		qsz = ((1 << q->llq.max_n_shift) * dwords) << 3;
 		q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma,
-- 
2.54.0




* Re: [PATCH v2] iommu/arm-smmu-v3: Shrink command/event/PRI queues in kdump kernel
  2026-07-02 11:28 [PATCH v2] iommu/arm-smmu-v3: Shrink command/event/PRI queues in kdump kernel Kiryl Shutsemau (Meta)
@ 2026-07-02 15:05 ` Pranjal Shrivastava
  2026-07-02 15:38   ` Kiryl Shutsemau
  2026-07-02 15:17 ` Jason Gunthorpe
  2026-07-02 18:54 ` Nicolin Chen
  2 siblings, 1 reply; 6+ messages in thread
From: Pranjal Shrivastava @ 2026-07-02 15:05 UTC
  To: Kiryl Shutsemau (Meta)
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Jason Gunthorpe,
	Nicolin Chen, Kyle McMartin, Breno Leitao, Usama Arif,
	linux-arm-kernel, iommu, linux-kernel

On Thu, Jul 02, 2026 at 12:28:25PM +0100, Kiryl Shutsemau (Meta) wrote:
> The command, event and PRI queues are sized from the maxima the hardware

A minor note: the PRI and EVT queues are disabled in the kdump kernel
(see arm_smmu_device_reset()). We could just say that all SMMU queues are
sized [...] in the commit message.

> advertises in IDR1, which can be several megabytes each. On systems with
> many SMMUv3 instances that cost is paid per instance and adds up to tens
> of megabytes of coherent DMA in the capture kernel.
> 
> A kdump capture kernel runs from a small crashkernel reservation and only
> has to drive the few devices used to save the dump, so deep queues serve
> no purpose. The queues carry invalidation commands and fault records, not
> DMA data, so dump throughput is unaffected; a shallower queue only bounds
> how many commands may be in flight before a sync, which does not matter for
> the capture kernel's small device count and modest I/O.
> 
> Clamp every queue to a single page when is_kdump_kernel() is true. Doing
> it in arm_smmu_init_one_queue() covers the command, event and PRI queues
> in one place. The command queue still holds at least one batch plus a sync
> (256 entries on a 4K-page kernel, well above CMDQ_BATCH_ENTRIES), so
> command batching keeps working.
> 
> Suggested-by: Kyle McMartin <jkkm@meta.com>
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Reviewed-by: Breno Leitao <leitao@debian.org>
> ---

Apart from that, looks good.

Reviewed-by: Pranjal Shrivastava <praan@google.com>

Thanks,
Praan



* Re: [PATCH v2] iommu/arm-smmu-v3: Shrink command/event/PRI queues in kdump kernel
  2026-07-02 11:28 [PATCH v2] iommu/arm-smmu-v3: Shrink command/event/PRI queues in kdump kernel Kiryl Shutsemau (Meta)
  2026-07-02 15:05 ` Pranjal Shrivastava
@ 2026-07-02 15:17 ` Jason Gunthorpe
  2026-07-02 18:54 ` Nicolin Chen
  2 siblings, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2026-07-02 15:17 UTC
  To: Kiryl Shutsemau (Meta)
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Nicolin Chen,
	Kyle McMartin, Breno Leitao, Usama Arif, linux-arm-kernel, iommu,
	linux-kernel

On Thu, Jul 02, 2026 at 12:28:25PM +0100, Kiryl Shutsemau (Meta) wrote:
> The command, event and PRI queues are sized from the maxima the hardware
> advertises in IDR1, which can be several megabytes each. On systems with
> many SMMUv3 instances that cost is paid per instance and adds up to tens
> of megabytes of coherent DMA in the capture kernel.
> 
> A kdump capture kernel runs from a small crashkernel reservation and only
> has to drive the few devices used to save the dump, so deep queues serve
> no purpose. The queues carry invalidation commands and fault records, not
> DMA data, so dump throughput is unaffected; a shallower queue only bounds
> how many commands may be in flight before a sync, which does not matter for
> the capture kernel's small device count and modest I/O.
> 
> Clamp every queue to a single page when is_kdump_kernel() is true. Doing
> it in arm_smmu_init_one_queue() covers the command, event and PRI queues
> in one place. The command queue still holds at least one batch plus a sync
> (256 entries on a 4K-page kernel, well above CMDQ_BATCH_ENTRIES), so
> command batching keeps working.
> 
> Suggested-by: Kyle McMartin <jkkm@meta.com>
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Reviewed-by: Breno Leitao <leitao@debian.org>
> ---
> v2:
>  - Use min() instead of min_t(); both operands are u32 so the cast was
>    redundant (Jason Gunthorpe, Breno Leitao).
>  - Add Reviewed-by from Breno.
> 
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason



* Re: [PATCH v2] iommu/arm-smmu-v3: Shrink command/event/PRI queues in kdump kernel
  2026-07-02 15:05 ` Pranjal Shrivastava
@ 2026-07-02 15:38   ` Kiryl Shutsemau
  2026-07-02 20:25     ` Pranjal Shrivastava
  0 siblings, 1 reply; 6+ messages in thread
From: Kiryl Shutsemau @ 2026-07-02 15:38 UTC
  To: Pranjal Shrivastava
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Jason Gunthorpe,
	Nicolin Chen, Kyle McMartin, Breno Leitao, Usama Arif,
	linux-arm-kernel, iommu, linux-kernel

On Thu, Jul 02, 2026 at 03:05:12PM +0000, Pranjal Shrivastava wrote:
> On Thu, Jul 02, 2026 at 12:28:25PM +0100, Kiryl Shutsemau (Meta) wrote:
> > The command, event and PRI queues are sized from the maxima the hardware
> 
> A minor note: the PRI and EVT queues are disabled in the kdump kernel
> (see arm_smmu_device_reset()). We could just say that all SMMU queues are
> sized [...] in the commit message.

Fair enough.

Here's the updated commit message (I will send v3 in a few days if there
is no new feedback):

Subject: [PATCH v3] iommu/arm-smmu-v3: Shrink command/event/PRI queues in
 kdump kernel

All SMMU queues are sized from the maxima the hardware advertises in IDR1,
which can be several megabytes each, and are allocated at probe. The kdump
kernel already disables the event and PRI queues (arm_smmu_device_reset()
drops CR0_EVTQEN/CR0_PRIQEN) but still allocates them at full size. On
systems with many SMMUv3 instances that cost is paid per instance and adds
up to tens of megabytes of coherent DMA in the capture kernel.

A kdump capture kernel runs from a small crashkernel reservation and only
has to drive the few devices used to save the dump, so deep queues serve
no purpose. The queues are not on the DMA data path, so dump throughput is
unaffected; a shallower command queue only bounds how many commands may be
in flight before a sync, which does not matter for the capture kernel's
small device count and modest I/O.

Clamp every queue to a single page when is_kdump_kernel() is true. Doing
it in arm_smmu_init_one_queue() covers the command, event and PRI queues
in one place. The command queue still holds at least one batch plus a sync
(256 entries on a 4K-page kernel, well above CMDQ_BATCH_ENTRIES), so
command batching keeps working.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [PATCH v2] iommu/arm-smmu-v3: Shrink command/event/PRI queues in kdump kernel
  2026-07-02 11:28 [PATCH v2] iommu/arm-smmu-v3: Shrink command/event/PRI queues in kdump kernel Kiryl Shutsemau (Meta)
  2026-07-02 15:05 ` Pranjal Shrivastava
  2026-07-02 15:17 ` Jason Gunthorpe
@ 2026-07-02 18:54 ` Nicolin Chen
  2 siblings, 0 replies; 6+ messages in thread
From: Nicolin Chen @ 2026-07-02 18:54 UTC
  To: Kiryl Shutsemau (Meta)
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Jason Gunthorpe,
	Kyle McMartin, Breno Leitao, Usama Arif, linux-arm-kernel, iommu,
	linux-kernel

On Thu, Jul 02, 2026 at 12:28:25PM +0100, Kiryl Shutsemau (Meta) wrote:
> The command, event and PRI queues are sized from the maxima the hardware
> advertises in IDR1, which can be several megabytes each. On systems with
> many SMMUv3 instances that cost is paid per instance and adds up to tens
> of megabytes of coherent DMA in the capture kernel.
> 
> A kdump capture kernel runs from a small crashkernel reservation and only
> has to drive the few devices used to save the dump, so deep queues serve
> no purpose. The queues carry invalidation commands and fault records, not
> DMA data, so dump throughput is unaffected; a shallower queue only bounds
> how many commands may be in flight before a sync, which does not matter for
> the capture kernel's small device count and modest I/O.
> 
> Clamp every queue to a single page when is_kdump_kernel() is true. Doing
> it in arm_smmu_init_one_queue() covers the command, event and PRI queues
> in one place. The command queue still holds at least one batch plus a sync
> (256 entries on a 4K-page kernel, well above CMDQ_BATCH_ENTRIES), so
> command batching keeps working.
> 
> Suggested-by: Kyle McMartin <jkkm@meta.com>
> Signed-off-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Reviewed-by: Breno Leitao <leitao@debian.org>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>



* Re: [PATCH v2] iommu/arm-smmu-v3: Shrink command/event/PRI queues in kdump kernel
  2026-07-02 15:38   ` Kiryl Shutsemau
@ 2026-07-02 20:25     ` Pranjal Shrivastava
  0 siblings, 0 replies; 6+ messages in thread
From: Pranjal Shrivastava @ 2026-07-02 20:25 UTC
  To: Kiryl Shutsemau
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Jason Gunthorpe,
	Nicolin Chen, Kyle McMartin, Breno Leitao, Usama Arif,
	linux-arm-kernel, iommu, linux-kernel

On Thu, Jul 02, 2026 at 04:38:55PM +0100, Kiryl Shutsemau wrote:
> On Thu, Jul 02, 2026 at 03:05:12PM +0000, Pranjal Shrivastava wrote:
> > On Thu, Jul 02, 2026 at 12:28:25PM +0100, Kiryl Shutsemau (Meta) wrote:
> > > The command, event and PRI queues are sized from the maxima the hardware
> > 
> > A minor note: the PRI and EVT queues are disabled in the kdump kernel
> > (see arm_smmu_device_reset()). We could just say that all SMMU queues are
> > sized [...] in the commit message.
> 
> Fair enough.
> 
> Here's the updated commit message (I will send v3 in a few days if there
> is no new feedback):
> 
> Subject: [PATCH v3] iommu/arm-smmu-v3: Shrink command/event/PRI queues in
>  kdump kernel
> 
> All SMMU queues are sized from the maxima the hardware advertises in IDR1,
> which can be several megabytes each, and are allocated at probe. The kdump
> kernel already disables the event and PRI queues (arm_smmu_device_reset()
> drops CR0_EVTQEN/CR0_PRIQEN) but still allocates them at full size. On
> systems with many SMMUv3 instances that cost is paid per instance and adds
> up to tens of megabytes of coherent DMA in the capture kernel.
> 
> A kdump capture kernel runs from a small crashkernel reservation and only
> has to drive the few devices used to save the dump, so deep queues serve
> no purpose. The queues are not on the DMA data path, so dump throughput is
> unaffected; a shallower command queue only bounds how many commands may be
> in flight before a sync, which does not matter for the capture kernel's
> small device count and modest I/O.
> 
> Clamp every queue to a single page when is_kdump_kernel() is true. Doing
> it in arm_smmu_init_one_queue() covers the command, event and PRI queues
> in one place. The command queue still holds at least one batch plus a sync
> (256 entries on a 4K-page kernel, well above CMDQ_BATCH_ENTRIES), so
> command batching keeps working.
>

Looks good. Thanks!

Praan



end of thread [last message 2026-07-02 20:25 UTC]

Thread overview: 6+ messages
2026-07-02 11:28 [PATCH v2] iommu/arm-smmu-v3: Shrink command/event/PRI queues in kdump kernel Kiryl Shutsemau (Meta)
2026-07-02 15:05 ` Pranjal Shrivastava
2026-07-02 15:38   ` Kiryl Shutsemau
2026-07-02 20:25     ` Pranjal Shrivastava
2026-07-02 15:17 ` Jason Gunthorpe
2026-07-02 18:54 ` Nicolin Chen
