linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/3] dma: support DMA zone starting above 4GB
@ 2024-08-02  6:03 Baruch Siach
  2024-08-02  6:03 ` [PATCH v5 1/3] dma: improve DMA zone selection Baruch Siach
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Baruch Siach @ 2024-08-02  6:03 UTC (permalink / raw)
  To: Christoph Hellwig, Marek Szyprowski, Catalin Marinas, Will Deacon
  Cc: Baruch Siach, Robin Murphy, iommu, linux-arm-kernel, linux-kernel,
	linuxppc-dev, linux-s390, Petr Tesařík, Ramon Fried,
	Elad Nachman

DMA zones code assumes that DMA lower limit is zero. When there is no RAM 
below 4GB, arm64 platform code sets DMA/DMA32 zone limits to cover the entire 
RAM[0].

My target platform has RAM starting at 32GB. Devices with 30-bit DMA mask are 
mapped to 1GB at the bottom of RAM, between 32GB - 33GB. DMA zone over the 
entire RAM breaks DMA allocation for these devices.

In response to a previous RFC hack[1] Catalin Marinas suggested to add a
separate offset value as base address for the DMA zone, and then refined the 
suggestion to use start of RAM[3]. This series attempts to implement that 
suggestion.

With this series applied, the DMA zone covers the right RAM range for my 
platform.

v5:

  * Test the correct kernel

  * Add missing patch that actually makes DMA zone work

  * Extend the treatment of zone_dma_limit > DMA_BIT_MASK(32)

  * Use max() to make the code somewhat more readable

  * Change zone_dma_limit type to u64 to match DMA_BIT_MASK()

v4:

  * Drop last patch. zone_dma_limit includes RAM base address.

  * Adjust DMA zone selection in swiotlb as well.

  * Don't change max_zone_phys() behaviour

  * Update code to fallback to DMA zone when zone_dma_limit > DMA_BIT_MASK(32)

v3:

  * Rebase on v6.11-rc1.

  * Drop zone_dma_base. Use memblock_start_of_DRAM() instead.

  * Drop DT patches. Low DMA range limit no longer needed.

  * Add patch to improve dma_direct_optimal_gfp_mask() heuristics as Catalin 
    suggested.

RFC v2:

  * Add patch from Catalin[2] changing zone_dma_bits to zone_dma_limit to 
    simplify subsequent patches

  * Test on real hardware

RFC v1: https://lore.kernel.org/all/cover.1703683642.git.baruch@tkos.co.il/

[0] See commit 791ab8b2e3db ("arm64: Ignore any DMA offsets in the 
    max_zone_phys() calculation")

[1] https://lore.kernel.org/all/9af8a19c3398e7dc09cfc1fbafed98d795d9f83e.1699464622.git.baruch@tkos.co.il/

[2] https://lore.kernel.org/all/ZZ2HnHJV3gdzu1Aj@arm.com/

[3] https://lore.kernel.org/all/ZnH-VU2iz9Q2KLbr@arm.com/

Baruch Siach (1):
  dma: improve DMA zone selection

Catalin Marinas (2):
  dma: replace zone_dma_bits by zone_dma_limit
  arm64: support DMA zone above 4GB

 arch/arm64/mm/init.c       | 32 ++++++++++----------------------
 arch/powerpc/mm/mem.c      |  9 ++++-----
 arch/s390/mm/init.c        |  2 +-
 include/linux/dma-direct.h |  2 +-
 kernel/dma/direct.c        | 10 +++++-----
 kernel/dma/pool.c          |  4 ++--
 kernel/dma/swiotlb.c       |  8 ++++----
 7 files changed, 27 insertions(+), 40 deletions(-)


base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
-- 
2.43.0



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v5 1/3] dma: improve DMA zone selection
  2024-08-02  6:03 [PATCH v5 0/3] dma: support DMA zone starting above 4GB Baruch Siach
@ 2024-08-02  6:03 ` Baruch Siach
  2024-08-07 12:04   ` kernel test robot
                     ` (2 more replies)
  2024-08-02  6:03 ` [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
  2024-08-02  6:03 ` [PATCH v5 3/3] arm64: support DMA zone above 4GB Baruch Siach
  2 siblings, 3 replies; 16+ messages in thread
From: Baruch Siach @ 2024-08-02  6:03 UTC (permalink / raw)
  To: Christoph Hellwig, Marek Szyprowski, Catalin Marinas, Will Deacon
  Cc: Baruch Siach, Robin Murphy, iommu, linux-arm-kernel, linux-kernel,
	linuxppc-dev, linux-s390, Petr Tesařík, Ramon Fried,
	Elad Nachman

When device DMA limit does not fit in DMA32 zone it should use DMA zone,
even when DMA zone is stricter than needed.

Same goes for devices that can't allocate from the entire normal zone.
Limit to DMA32 in that case.

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 kernel/dma/direct.c  | 6 +++---
 kernel/dma/swiotlb.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4480a3cd92e0..3b4be4ca3b08 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -4,7 +4,7 @@
  *
  * DMA operations that map physical memory directly without using an IOMMU.
  */
-#include <linux/memblock.h> /* for max_pfn */
+#include <linux/memblock.h>
 #include <linux/export.h>
 #include <linux/mm.h>
 #include <linux/dma-map-ops.h>
@@ -59,9 +59,9 @@ static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 *phys_limit)
 	 * zones.
 	 */
 	*phys_limit = dma_to_phys(dev, dma_limit);
-	if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
+	if (*phys_limit < DMA_BIT_MASK(32))
 		return GFP_DMA;
-	if (*phys_limit <= DMA_BIT_MASK(32))
+	if (*phys_limit < memblock_end_of_DRAM())
 		return GFP_DMA32;
 	return 0;
 }
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index df68d29740a0..043b0ecd3e8d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -629,9 +629,9 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
 	}
 
 	gfp &= ~GFP_ZONEMASK;
-	if (phys_limit <= DMA_BIT_MASK(zone_dma_bits))
+	if (phys_limit < DMA_BIT_MASK(32))
 		gfp |= __GFP_DMA;
-	else if (phys_limit <= DMA_BIT_MASK(32))
+	else if (phys_limit < memblock_end_of_DRAM())
 		gfp |= __GFP_DMA32;
 
 	while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit))) {
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-02  6:03 [PATCH v5 0/3] dma: support DMA zone starting above 4GB Baruch Siach
  2024-08-02  6:03 ` [PATCH v5 1/3] dma: improve DMA zone selection Baruch Siach
@ 2024-08-02  6:03 ` Baruch Siach
  2024-08-02  9:37   ` Catalin Marinas
  2024-08-07 14:30   ` Petr Tesařík
  2024-08-02  6:03 ` [PATCH v5 3/3] arm64: support DMA zone above 4GB Baruch Siach
  2 siblings, 2 replies; 16+ messages in thread
From: Baruch Siach @ 2024-08-02  6:03 UTC (permalink / raw)
  To: Christoph Hellwig, Marek Szyprowski, Catalin Marinas, Will Deacon
  Cc: Baruch Siach, Robin Murphy, iommu, linux-arm-kernel, linux-kernel,
	linuxppc-dev, linux-s390, Petr Tesařík, Ramon Fried,
	Elad Nachman

From: Catalin Marinas <catalin.marinas@arm.com>

Hardware DMA limit might not be power of 2. When RAM range starts above
0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
can not encode this limit.

Use plain address for DMA zone limit.

Since DMA zone can now potentially span beyond 4GB physical limit of
DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Co-developed-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 arch/arm64/mm/init.c       | 30 +++++++++++++++---------------
 arch/powerpc/mm/mem.c      |  9 ++++-----
 arch/s390/mm/init.c        |  2 +-
 include/linux/dma-direct.h |  2 +-
 kernel/dma/direct.c        |  4 ++--
 kernel/dma/pool.c          |  4 ++--
 kernel/dma/swiotlb.c       |  4 ++--
 7 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 9b5ab6818f7f..c45e2152ca9e 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -115,35 +115,35 @@ static void __init arch_reserve_crashkernel(void)
 }
 
 /*
- * Return the maximum physical address for a zone accessible by the given bits
- * limit. If DRAM starts above 32-bit, expand the zone to the maximum
+ * Return the maximum physical address for a zone given its limit.
+ * If DRAM starts above 32-bit, expand the zone to the maximum
  * available memory, otherwise cap it at 32-bit.
  */
-static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
+static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
 {
-	phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
 	phys_addr_t phys_start = memblock_start_of_DRAM();
 
 	if (phys_start > U32_MAX)
-		zone_mask = PHYS_ADDR_MAX;
-	else if (phys_start > zone_mask)
-		zone_mask = U32_MAX;
+		zone_limit = PHYS_ADDR_MAX;
+	else if (phys_start > zone_limit)
+		zone_limit = U32_MAX;
 
-	return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
+	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
 }
 
 static void __init zone_sizes_init(void)
 {
 	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
-	unsigned int __maybe_unused acpi_zone_dma_bits;
-	unsigned int __maybe_unused dt_zone_dma_bits;
-	phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
+	phys_addr_t __maybe_unused acpi_zone_dma_limit;
+	phys_addr_t __maybe_unused dt_zone_dma_limit;
+	phys_addr_t __maybe_unused dma32_phys_limit =
+		max_zone_phys(DMA_BIT_MASK(32));
 
 #ifdef CONFIG_ZONE_DMA
-	acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
-	dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
-	zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
-	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
+	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
+	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
+	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
+	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
 	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
 #ifdef CONFIG_ZONE_DMA32
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index d325217ab201..342c006cc1b8 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -216,7 +216,7 @@ static int __init mark_nonram_nosave(void)
  * everything else. GFP_DMA32 page allocations automatically fall back to
  * ZONE_DMA.
  *
- * By using 31-bit unconditionally, we can exploit zone_dma_bits to inform the
+ * By using 31-bit unconditionally, we can exploit zone_dma_limit to inform the
  * generic DMA mapping code.  32-bit only devices (if not handled by an IOMMU
  * anyway) will take a first dip into ZONE_NORMAL and get otherwise served by
  * ZONE_DMA.
@@ -252,13 +252,12 @@ void __init paging_init(void)
 	 * powerbooks.
 	 */
 	if (IS_ENABLED(CONFIG_PPC32))
-		zone_dma_bits = 30;
+		zone_dma_limit = DMA_BIT_MASK(30);
 	else
-		zone_dma_bits = 31;
+		zone_dma_limit = DMA_BIT_MASK(31);
 
 #ifdef CONFIG_ZONE_DMA
-	max_zone_pfns[ZONE_DMA]	= min(max_low_pfn,
-				      1UL << (zone_dma_bits - PAGE_SHIFT));
+	max_zone_pfns[ZONE_DMA]	= min(max_low_pfn, zone_dma_limit >> PAGE_SHIFT);
 #endif
 	max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
 #ifdef CONFIG_HIGHMEM
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index ddcd39ef4346..91fc2b91adfc 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -97,7 +97,7 @@ void __init paging_init(void)
 
 	vmem_map_init();
 	sparse_init();
-	zone_dma_bits = 31;
+	zone_dma_limit = DMA_BIT_MASK(31);
 	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
 	max_zone_pfns[ZONE_DMA] = virt_to_pfn(MAX_DMA_ADDRESS);
 	max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index edbe13d00776..d7e30d4f7503 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -12,7 +12,7 @@
 #include <linux/mem_encrypt.h>
 #include <linux/swiotlb.h>
 
-extern unsigned int zone_dma_bits;
+extern u64 zone_dma_limit;
 
 /*
  * Record the mapping of CPU physical to DMA addresses for a given region.
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 3b4be4ca3b08..62b36fda44c9 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -20,7 +20,7 @@
  * it for entirely different regions. In that case the arch code needs to
  * override the variable below for dma-direct to work properly.
  */
-unsigned int zone_dma_bits __ro_after_init = 24;
+u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
 
 static inline dma_addr_t phys_to_dma_direct(struct device *dev,
 		phys_addr_t phys)
@@ -580,7 +580,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 	 * part of the check.
 	 */
 	if (IS_ENABLED(CONFIG_ZONE_DMA))
-		min_mask = min_t(u64, min_mask, DMA_BIT_MASK(zone_dma_bits));
+		min_mask = min_t(u64, min_mask, zone_dma_limit);
 	return mask >= phys_to_dma_unencrypted(dev, min_mask);
 }
 
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index d10613eb0f63..7b04f7575796 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -70,9 +70,9 @@ static bool cma_in_zone(gfp_t gfp)
 	/* CMA can't cross zone boundaries, see cma_activate_area() */
 	end = cma_get_base(cma) + size - 1;
 	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
-		return end <= DMA_BIT_MASK(zone_dma_bits);
+		return end <= zone_dma_limit;
 	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
-		return end <= DMA_BIT_MASK(32);
+		return end <= max(DMA_BIT_MASK(32), zone_dma_limit);
 	return true;
 }
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 043b0ecd3e8d..bb51bd5335ad 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -450,9 +450,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 	if (!remap)
 		io_tlb_default_mem.can_grow = true;
 	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
-		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
+		io_tlb_default_mem.phys_limit = zone_dma_limit;
 	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
-		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
+		io_tlb_default_mem.phys_limit = max(DMA_BIT_MASK(32), zone_dma_limit);
 	else
 		io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
 #endif
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v5 3/3] arm64: support DMA zone above 4GB
  2024-08-02  6:03 [PATCH v5 0/3] dma: support DMA zone starting above 4GB Baruch Siach
  2024-08-02  6:03 ` [PATCH v5 1/3] dma: improve DMA zone selection Baruch Siach
  2024-08-02  6:03 ` [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
@ 2024-08-02  6:03 ` Baruch Siach
  2 siblings, 0 replies; 16+ messages in thread
From: Baruch Siach @ 2024-08-02  6:03 UTC (permalink / raw)
  To: Christoph Hellwig, Marek Szyprowski, Catalin Marinas, Will Deacon
  Cc: Baruch Siach, Robin Murphy, iommu, linux-arm-kernel, linux-kernel,
	linuxppc-dev, linux-s390, Petr Tesařík, Ramon Fried,
	Elad Nachman

From: Catalin Marinas <catalin.marinas@arm.com>

Commit 791ab8b2e3db ("arm64: Ignore any DMA offsets in the
max_zone_phys() calculation") made arm64 DMA/DMA32 zones span the entire
RAM when RAM starts above 32-bits. This breaks hardware with DMA area
that start above 32-bits. But the commit log says that "we haven't
noticed any such hardware". It turns out that such hardware does exist.

One such platform has RAM starting at 32GB with an internal bus that has
the following DMA limits:

  #address-cells = <2>;
  #size-cells = <2>;
  dma-ranges = <0x00 0xc0000000 0x08 0x00000000 0x00 0x40000000>;

That is, devices under this bus see 1GB of DMA range between 3GB-4GB in
their address space. This range is mapped to CPU memory at 32GB-33GB.
With current code DMA allocations for devices under this bus are not
limited to DMA area, leading to run-time allocation failure.

This commit reinstates DMA zone at the bottom of RAM. The result is DMA
zone that properly reflects the hardware constraints as follows:

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000840000000-0x0000000bffffffff]

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[baruch: split off the original patch]
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 arch/arm64/mm/init.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index c45e2152ca9e..bfb10969cbf0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -114,20 +114,8 @@ static void __init arch_reserve_crashkernel(void)
 				    low_size, high);
 }
 
-/*
- * Return the maximum physical address for a zone given its limit.
- * If DRAM starts above 32-bit, expand the zone to the maximum
- * available memory, otherwise cap it at 32-bit.
- */
 static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
 {
-	phys_addr_t phys_start = memblock_start_of_DRAM();
-
-	if (phys_start > U32_MAX)
-		zone_limit = PHYS_ADDR_MAX;
-	else if (phys_start > zone_limit)
-		zone_limit = U32_MAX;
-
 	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
 }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-02  6:03 ` [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
@ 2024-08-02  9:37   ` Catalin Marinas
  2024-08-07 14:19     ` Petr Tesařík
  2024-08-07 14:30   ` Petr Tesařík
  1 sibling, 1 reply; 16+ messages in thread
From: Catalin Marinas @ 2024-08-02  9:37 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Christoph Hellwig, Marek Szyprowski, Will Deacon, Robin Murphy,
	iommu, linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman

On Fri, Aug 02, 2024 at 09:03:47AM +0300, Baruch Siach wrote:
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 3b4be4ca3b08..62b36fda44c9 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -20,7 +20,7 @@
>   * it for entirely different regions. In that case the arch code needs to
>   * override the variable below for dma-direct to work properly.
>   */
> -unsigned int zone_dma_bits __ro_after_init = 24;
> +u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);

u64 here makes sense even if it may be larger than phys_addr_t. It
matches the phys_limit type in the swiotlb code. The compilers should no
longer complain.

> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index d10613eb0f63..7b04f7575796 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -70,9 +70,9 @@ static bool cma_in_zone(gfp_t gfp)
>  	/* CMA can't cross zone boundaries, see cma_activate_area() */
>  	end = cma_get_base(cma) + size - 1;
>  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> -		return end <= DMA_BIT_MASK(zone_dma_bits);
> +		return end <= zone_dma_limit;
>  	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
> -		return end <= DMA_BIT_MASK(32);
> +		return end <= max(DMA_BIT_MASK(32), zone_dma_limit);
>  	return true;
>  }
>  
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 043b0ecd3e8d..bb51bd5335ad 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -450,9 +450,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  	if (!remap)
>  		io_tlb_default_mem.can_grow = true;
>  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
> -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
> +		io_tlb_default_mem.phys_limit = zone_dma_limit;
>  	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
> -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
> +		io_tlb_default_mem.phys_limit = max(DMA_BIT_MASK(32), zone_dma_limit);
>  	else
>  		io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
>  #endif

These two look correct to me now and it's the least intrusive (the
alternative would have been a zone_dma32_limit). The arch code, however,
needs to ensure that zone_dma_limit can always support 32-bit devices
even if it is above 4GB (with the relevant dma offsets in place for such
devices).

-- 
Catalin


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] dma: improve DMA zone selection
  2024-08-02  6:03 ` [PATCH v5 1/3] dma: improve DMA zone selection Baruch Siach
@ 2024-08-07 12:04   ` kernel test robot
  2024-08-07 13:13   ` Robin Murphy
  2024-08-07 16:40   ` kernel test robot
  2 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2024-08-07 12:04 UTC (permalink / raw)
  To: Baruch Siach, Christoph Hellwig, Marek Szyprowski,
	Catalin Marinas, Will Deacon
  Cc: oe-kbuild-all, Baruch Siach, Robin Murphy, iommu,
	linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman

Hi Baruch,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 8400291e289ee6b2bf9779ff1c83a291501f017b]

url:    https://github.com/intel-lab-lkp/linux/commits/Baruch-Siach/dma-improve-DMA-zone-selection/20240803-074651
base:   8400291e289ee6b2bf9779ff1c83a291501f017b
patch link:    https://lore.kernel.org/r/5200f289af1a9b80dfd329b6ed3d54e1d4a02876.1722578375.git.baruch%40tkos.co.il
patch subject: [PATCH v5 1/3] dma: improve DMA zone selection
config: i386-randconfig-063-20240807 (https://download.01.org/0day-ci/archive/20240807/202408071931.W1GA8Ee2-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240807/202408071931.W1GA8Ee2-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408071931.W1GA8Ee2-lkp@intel.com/

All warnings (new ones prefixed by >>, old ones prefixed by <<):

>> WARNING: modpost: vmlinux: section mismatch in reference: __dma_direct_alloc_pages+0xcc (section: .text.__dma_direct_alloc_pages) -> memblock (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: __dma_direct_alloc_pages+0xd2 (section: .text.__dma_direct_alloc_pages) -> memblock (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: swiotlb_alloc_pool+0xa0 (section: .text.swiotlb_alloc_pool) -> memblock (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: swiotlb_alloc_pool+0xa6 (section: .text.swiotlb_alloc_pool) -> memblock (section: .init.data)
WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/bpf/preload/bpf_preload.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_userspace.o

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] dma: improve DMA zone selection
  2024-08-02  6:03 ` [PATCH v5 1/3] dma: improve DMA zone selection Baruch Siach
  2024-08-07 12:04   ` kernel test robot
@ 2024-08-07 13:13   ` Robin Murphy
  2024-08-07 13:58     ` Catalin Marinas
  2024-08-07 16:40   ` kernel test robot
  2 siblings, 1 reply; 16+ messages in thread
From: Robin Murphy @ 2024-08-07 13:13 UTC (permalink / raw)
  To: Baruch Siach, Christoph Hellwig, Marek Szyprowski,
	Catalin Marinas, Will Deacon
  Cc: iommu, linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman

On 2024-08-02 7:03 am, Baruch Siach wrote:
> When device DMA limit does not fit in DMA32 zone it should use DMA zone,
> even when DMA zone is stricter than needed.
> 
> Same goes for devices that can't allocate from the entire normal zone.
> Limit to DMA32 in that case.

Per the bot report this only works for CONFIG_ARCH_KEEP_MEMBLOCK, 
however the whole concept looks wrong anyway. The logic here is that 
we're only forcing a particular zone if there's *no* chance of the 
higher zone being usable. For example, ignoring offsets for simplicity, 
if we have a 40-bit DMA mask then we *do* want to initially try 
allocating from ZONE_NORMAL even if max_pfn is above 40 bits, since we 
still might get a usable allocation from between 32 and 40 bits, and if 
we don't, then we'll fall back to retrying from the DMA zone(s) anyway.

I'm not sure if the rest of the series functionally depends on this 
change, but I think it would be too needlessly restrictive in the 
general case to be justified.

Thanks,
Robin.

> Reported-by: Catalin Marinas <catalin.marinas@arm.com>
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
> ---
>   kernel/dma/direct.c  | 6 +++---
>   kernel/dma/swiotlb.c | 4 ++--
>   2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 4480a3cd92e0..3b4be4ca3b08 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -4,7 +4,7 @@
>    *
>    * DMA operations that map physical memory directly without using an IOMMU.
>    */
> -#include <linux/memblock.h> /* for max_pfn */
> +#include <linux/memblock.h>
>   #include <linux/export.h>
>   #include <linux/mm.h>
>   #include <linux/dma-map-ops.h>
> @@ -59,9 +59,9 @@ static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 *phys_limit)
>   	 * zones.
>   	 */
>   	*phys_limit = dma_to_phys(dev, dma_limit);
> -	if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
> +	if (*phys_limit < DMA_BIT_MASK(32))
>   		return GFP_DMA;
> -	if (*phys_limit <= DMA_BIT_MASK(32))
> +	if (*phys_limit < memblock_end_of_DRAM())
>   		return GFP_DMA32;
>   	return 0;
>   }
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index df68d29740a0..043b0ecd3e8d 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -629,9 +629,9 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>   	}
>   
>   	gfp &= ~GFP_ZONEMASK;
> -	if (phys_limit <= DMA_BIT_MASK(zone_dma_bits))
> +	if (phys_limit < DMA_BIT_MASK(32))
>   		gfp |= __GFP_DMA;
> -	else if (phys_limit <= DMA_BIT_MASK(32))
> +	else if (phys_limit < memblock_end_of_DRAM())
>   		gfp |= __GFP_DMA32;
>   
>   	while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit))) {


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] dma: improve DMA zone selection
  2024-08-07 13:13   ` Robin Murphy
@ 2024-08-07 13:58     ` Catalin Marinas
  2024-08-07 14:12       ` Petr Tesařík
  0 siblings, 1 reply; 16+ messages in thread
From: Catalin Marinas @ 2024-08-07 13:58 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Baruch Siach, Christoph Hellwig, Marek Szyprowski, Will Deacon,
	iommu, linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman

Thanks Robin for having a look.

On Wed, Aug 07, 2024 at 02:13:06PM +0100, Robin Murphy wrote:
> On 2024-08-02 7:03 am, Baruch Siach wrote:
> > When device DMA limit does not fit in DMA32 zone it should use DMA zone,
> > even when DMA zone is stricter than needed.
> > 
> > Same goes for devices that can't allocate from the entire normal zone.
> > Limit to DMA32 in that case.
> 
> Per the bot report this only works for CONFIG_ARCH_KEEP_MEMBLOCK,

Yeah, I just noticed.

> however
> the whole concept looks wrong anyway. The logic here is that we're only
> forcing a particular zone if there's *no* chance of the higher zone being
> usable. For example, ignoring offsets for simplicity, if we have a 40-bit
> DMA mask then we *do* want to initially try allocating from ZONE_NORMAL even
> if max_pfn is above 40 bits, since we still might get a usable allocation
> from between 32 and 40 bits, and if we don't, then we'll fall back to
> retrying from the DMA zone(s) anyway.

Ah, I did not read the code further down in __dma_direct_alloc_pages(),
it does fall back to a GFP_DMA allocation if !dma_coherent_ok().
Similarly with swiotlb_alloc_tlb(), it keeps retrying until the
allocation fails.

So yes, this patch can be dropped.

-- 
Catalin


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] dma: improve DMA zone selection
  2024-08-07 13:58     ` Catalin Marinas
@ 2024-08-07 14:12       ` Petr Tesařík
  0 siblings, 0 replies; 16+ messages in thread
From: Petr Tesařík @ 2024-08-07 14:12 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Robin Murphy, Baruch Siach, Christoph Hellwig, Marek Szyprowski,
	Will Deacon, iommu, linux-arm-kernel, linux-kernel, linuxppc-dev,
	linux-s390, Ramon Fried, Elad Nachman

On Wed, 7 Aug 2024 14:58:49 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:

> Thanks Robin for having a look.
> 
> On Wed, Aug 07, 2024 at 02:13:06PM +0100, Robin Murphy wrote:
> > On 2024-08-02 7:03 am, Baruch Siach wrote:  
> > > When device DMA limit does not fit in DMA32 zone it should use DMA zone,
> > > even when DMA zone is stricter than needed.
> > > 
> > > Same goes for devices that can't allocate from the entire normal zone.
> > > Limit to DMA32 in that case.  
> > 
> > Per the bot report this only works for CONFIG_ARCH_KEEP_MEMBLOCK,  
> 
> Yeah, I just noticed.
> 
> > however
> > the whole concept looks wrong anyway. The logic here is that we're only
> > forcing a particular zone if there's *no* chance of the higher zone being
> > usable. For example, ignoring offsets for simplicity, if we have a 40-bit
> > DMA mask then we *do* want to initially try allocating from ZONE_NORMAL even
> > if max_pfn is above 40 bits, since we still might get a usable allocation
> > from between 32 and 40 bits, and if we don't, then we'll fall back to
> > retrying from the DMA zone(s) anyway.  
> 
> Ah, I did not read the code further down in __dma_direct_alloc_pages(),
> it does fall back to a GFP_DMA allocation if !dma_coherent_ok().
> Similarly with swiotlb_alloc_tlb(), it keeps retrying until the
> allocation fails.

Er, you certainly mean it keeps retrying as long as the allocation
fails.

Yes, that's true. The initial GFP zone flags are a best-effort guess,
and a stricter zone can be chosen in the end. This whole logic tries to
select the highest zone that satisfies the limit, because allocating
some pages and freeing them immediately is wasteful, especially for
high-order allocations.

> So yes, this patch can be dropped.

+1

Petr T


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-02  9:37   ` Catalin Marinas
@ 2024-08-07 14:19     ` Petr Tesařík
  2024-08-07 18:14       ` Catalin Marinas
  0 siblings, 1 reply; 16+ messages in thread
From: Petr Tesařík @ 2024-08-07 14:19 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Baruch Siach, Christoph Hellwig, Marek Szyprowski, Will Deacon,
	Robin Murphy, iommu, linux-arm-kernel, linux-kernel, linuxppc-dev,
	linux-s390, Ramon Fried, Elad Nachman

On Fri, 2 Aug 2024 10:37:38 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:

> On Fri, Aug 02, 2024 at 09:03:47AM +0300, Baruch Siach wrote:
> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 3b4be4ca3b08..62b36fda44c9 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -20,7 +20,7 @@
> >   * it for entirely different regions. In that case the arch code needs to
> >   * override the variable below for dma-direct to work properly.
> >   */
> > -unsigned int zone_dma_bits __ro_after_init = 24;
> > +u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);  
> 
> u64 here makes sense even if it may be larger than phys_addr_t. It
> matches the phys_limit type in the swiotlb code. The compilers should no
> longer complain.

FTR I have never quite understood why phys_limit is u64, but u64 was
already used all around the place when I first looked into swiotlb.

> > diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> > index d10613eb0f63..7b04f7575796 100644
> > --- a/kernel/dma/pool.c
> > +++ b/kernel/dma/pool.c
> > @@ -70,9 +70,9 @@ static bool cma_in_zone(gfp_t gfp)
> >  	/* CMA can't cross zone boundaries, see cma_activate_area() */
> >  	end = cma_get_base(cma) + size - 1;
> >  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> > -		return end <= DMA_BIT_MASK(zone_dma_bits);
> > +		return end <= zone_dma_limit;
> >  	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
> > -		return end <= DMA_BIT_MASK(32);
> > +		return end <= max(DMA_BIT_MASK(32), zone_dma_limit);
> >  	return true;
> >  }
> >  
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 043b0ecd3e8d..bb51bd5335ad 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -450,9 +450,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> >  	if (!remap)
> >  		io_tlb_default_mem.can_grow = true;
> >  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
> > -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
> > +		io_tlb_default_mem.phys_limit = zone_dma_limit;
> >  	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
> > -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
> > +		io_tlb_default_mem.phys_limit = max(DMA_BIT_MASK(32), zone_dma_limit);
> >  	else
> >  		io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
> >  #endif  
> 
> These two look correct to me now and it's the least intrusive (the
> alternative would have been a zone_dma32_limit). The arch code, however,
> needs to ensure that zone_dma_limit can always support 32-bit devices
> even if it is above 4GB (with the relevant dma offsets in place for such
> devices).

Just to make sure, the DMA zone (if present) must map to at most 32-bit
bus address space (possibly behind a bridge). Is that what you're
saying?

Petr T


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-02  6:03 ` [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
  2024-08-02  9:37   ` Catalin Marinas
@ 2024-08-07 14:30   ` Petr Tesařík
  1 sibling, 0 replies; 16+ messages in thread
From: Petr Tesařík @ 2024-08-07 14:30 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Christoph Hellwig, Marek Szyprowski, Catalin Marinas, Will Deacon,
	Robin Murphy, iommu, linux-arm-kernel, linux-kernel, linuxppc-dev,
	linux-s390, Ramon Fried, Elad Nachman

On Fri,  2 Aug 2024 09:03:47 +0300
Baruch Siach <baruch@tkos.co.il> wrote:

> From: Catalin Marinas <catalin.marinas@arm.com>
> 
> Hardware DMA limit might not be power of 2. When RAM range starts above
> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
> can not encode this limit.
> 
> Use plain address for DMA zone limit.
> 
> Since DMA zone can now potentially span beyond 4GB physical limit of
> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
> ---
>  arch/arm64/mm/init.c       | 30 +++++++++++++++---------------
>  arch/powerpc/mm/mem.c      |  9 ++++-----
>  arch/s390/mm/init.c        |  2 +-
>  include/linux/dma-direct.h |  2 +-
>  kernel/dma/direct.c        |  4 ++--
>  kernel/dma/pool.c          |  4 ++--
>  kernel/dma/swiotlb.c       |  4 ++--
>  7 files changed, 27 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 9b5ab6818f7f..c45e2152ca9e 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -115,35 +115,35 @@ static void __init arch_reserve_crashkernel(void)
>  }
>  
>  /*
> - * Return the maximum physical address for a zone accessible by the given bits
> - * limit. If DRAM starts above 32-bit, expand the zone to the maximum
> + * Return the maximum physical address for a zone given its limit.
> + * If DRAM starts above 32-bit, expand the zone to the maximum
>   * available memory, otherwise cap it at 32-bit.
>   */
> -static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> +static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>  {
> -	phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
>  	phys_addr_t phys_start = memblock_start_of_DRAM();
>  
>  	if (phys_start > U32_MAX)
> -		zone_mask = PHYS_ADDR_MAX;
> -	else if (phys_start > zone_mask)
> -		zone_mask = U32_MAX;
> +		zone_limit = PHYS_ADDR_MAX;
> +	else if (phys_start > zone_limit)
> +		zone_limit = U32_MAX;
>  
> -	return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
> +	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>  }
>  
>  static void __init zone_sizes_init(void)
>  {
>  	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> -	unsigned int __maybe_unused acpi_zone_dma_bits;
> -	unsigned int __maybe_unused dt_zone_dma_bits;
> -	phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
> +	phys_addr_t __maybe_unused acpi_zone_dma_limit;
> +	phys_addr_t __maybe_unused dt_zone_dma_limit;
> +	phys_addr_t __maybe_unused dma32_phys_limit =
> +		max_zone_phys(DMA_BIT_MASK(32));
>  
>  #ifdef CONFIG_ZONE_DMA
> -	acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
> -	dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
> -	zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
> -	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> +	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
> +	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
> +	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
> +	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif
>  #ifdef CONFIG_ZONE_DMA32
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index d325217ab201..342c006cc1b8 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -216,7 +216,7 @@ static int __init mark_nonram_nosave(void)
>   * everything else. GFP_DMA32 page allocations automatically fall back to
>   * ZONE_DMA.
>   *
> - * By using 31-bit unconditionally, we can exploit zone_dma_bits to inform the
> + * By using 31-bit unconditionally, we can exploit zone_dma_limit to inform the
>   * generic DMA mapping code.  32-bit only devices (if not handled by an IOMMU
>   * anyway) will take a first dip into ZONE_NORMAL and get otherwise served by
>   * ZONE_DMA.
> @@ -252,13 +252,12 @@ void __init paging_init(void)
>  	 * powerbooks.
>  	 */
>  	if (IS_ENABLED(CONFIG_PPC32))
> -		zone_dma_bits = 30;
> +		zone_dma_limit = DMA_BIT_MASK(30);
>  	else
> -		zone_dma_bits = 31;
> +		zone_dma_limit = DMA_BIT_MASK(31);
>  
>  #ifdef CONFIG_ZONE_DMA
> -	max_zone_pfns[ZONE_DMA]	= min(max_low_pfn,
> -				      1UL << (zone_dma_bits - PAGE_SHIFT));
> +	max_zone_pfns[ZONE_DMA]	= min(max_low_pfn, zone_dma_limit >> PAGE_SHIFT);

No big deal, but this is off by one. DMA_BIT_MASK() returns the highest
address that can be represented with the given number of bits, whereas
max_zone_pfns[] contains the lowest PFN that is NOT contained in the
zone.

Rest of the patch looks perfect.

Petr T


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 1/3] dma: improve DMA zone selection
  2024-08-02  6:03 ` [PATCH v5 1/3] dma: improve DMA zone selection Baruch Siach
  2024-08-07 12:04   ` kernel test robot
  2024-08-07 13:13   ` Robin Murphy
@ 2024-08-07 16:40   ` kernel test robot
  2 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2024-08-07 16:40 UTC (permalink / raw)
  To: Baruch Siach, Christoph Hellwig, Marek Szyprowski,
	Catalin Marinas, Will Deacon
  Cc: oe-kbuild-all, Baruch Siach, Robin Murphy, iommu,
	linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman

Hi Baruch,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 8400291e289ee6b2bf9779ff1c83a291501f017b]

url:    https://github.com/intel-lab-lkp/linux/commits/Baruch-Siach/dma-improve-DMA-zone-selection/20240803-074651
base:   8400291e289ee6b2bf9779ff1c83a291501f017b
patch link:    https://lore.kernel.org/r/5200f289af1a9b80dfd329b6ed3d54e1d4a02876.1722578375.git.baruch%40tkos.co.il
patch subject: [PATCH v5 1/3] dma: improve DMA zone selection
config: csky-randconfig-001-20240807 (https://download.01.org/0day-ci/archive/20240808/202408080035.rXXbb5Yc-lkp@intel.com/config)
compiler: csky-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240808/202408080035.rXXbb5Yc-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408080035.rXXbb5Yc-lkp@intel.com/

All warnings (new ones prefixed by >>, old ones prefixed by <<):

WARNING: modpost: vmlinux: section mismatch in reference: dma_direct_optimal_gfp_mask+0x46 (section: .text) -> memblock_end_of_DRAM (section: .init.text)
>> WARNING: modpost: vmlinux: section mismatch in reference: sg_page.isra.0+0x1c (section: .text) -> memblock_end_of_DRAM (section: .init.text)
WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/locking/test-ww_mutex.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/clk/imx/clk-imxrt1050.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/devfreq/governor_performance.o

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-07 14:19     ` Petr Tesařík
@ 2024-08-07 18:14       ` Catalin Marinas
  2024-08-08  9:35         ` Petr Tesařík
  0 siblings, 1 reply; 16+ messages in thread
From: Catalin Marinas @ 2024-08-07 18:14 UTC (permalink / raw)
  To: Petr Tesařík
  Cc: Baruch Siach, Christoph Hellwig, Marek Szyprowski, Will Deacon,
	Robin Murphy, iommu, linux-arm-kernel, linux-kernel, linuxppc-dev,
	linux-s390, Ramon Fried, Elad Nachman

On Wed, Aug 07, 2024 at 04:19:38PM +0200, Petr Tesařík wrote:
> On Fri, 2 Aug 2024 10:37:38 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Fri, Aug 02, 2024 at 09:03:47AM +0300, Baruch Siach wrote:
> > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > > index 3b4be4ca3b08..62b36fda44c9 100644
> > > --- a/kernel/dma/direct.c
> > > +++ b/kernel/dma/direct.c
> > > @@ -20,7 +20,7 @@
> > >   * it for entirely different regions. In that case the arch code needs to
> > >   * override the variable below for dma-direct to work properly.
> > >   */
> > > -unsigned int zone_dma_bits __ro_after_init = 24;
> > > +u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);  
> > 
> > u64 here makes sense even if it may be larger than phys_addr_t. It
> > matches the phys_limit type in the swiotlb code. The compilers should no
> > longer complain.
> 
> FTR I have never quite understood why phys_limit is u64, but u64 was
> already used all around the place when I first looked into swiotlb.
> 
> > > diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> > > index d10613eb0f63..7b04f7575796 100644
> > > --- a/kernel/dma/pool.c
> > > +++ b/kernel/dma/pool.c
> > > @@ -70,9 +70,9 @@ static bool cma_in_zone(gfp_t gfp)
> > >  	/* CMA can't cross zone boundaries, see cma_activate_area() */
> > >  	end = cma_get_base(cma) + size - 1;
> > >  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> > > -		return end <= DMA_BIT_MASK(zone_dma_bits);
> > > +		return end <= zone_dma_limit;
> > >  	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
> > > -		return end <= DMA_BIT_MASK(32);
> > > +		return end <= max(DMA_BIT_MASK(32), zone_dma_limit);
> > >  	return true;
> > >  }
> > >  
> > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > index 043b0ecd3e8d..bb51bd5335ad 100644
> > > --- a/kernel/dma/swiotlb.c
> > > +++ b/kernel/dma/swiotlb.c
> > > @@ -450,9 +450,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> > >  	if (!remap)
> > >  		io_tlb_default_mem.can_grow = true;
> > >  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
> > > -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
> > > +		io_tlb_default_mem.phys_limit = zone_dma_limit;
> > >  	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
> > > -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
> > > +		io_tlb_default_mem.phys_limit = max(DMA_BIT_MASK(32), zone_dma_limit);
> > >  	else
> > >  		io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
> > >  #endif  
> > 
> > These two look correct to me now and it's the least intrusive (the
> > alternative would have been a zone_dma32_limit). The arch code, however,
> > needs to ensure that zone_dma_limit can always support 32-bit devices
> > even if it is above 4GB (with the relevant dma offsets in place for such
> > devices).
> 
> Just to make sure, the DMA zone (if present) must map to at most 32-bit
> bus address space (possibly behind a bridge). Is that what you're
> saying?

No exactly. What I'm trying to say is that on arm64 zone_dma_limit can
go beyond DMA_BIT_MASK(32) when the latter is treated as a CPU address.
In such cases, ZONE_DMA32 is empty.

TBH, this code is confusing and not entirely suitable for a system where
the CPU address offsets are not 0. The device::dma_coherent_mask is
about the bus address range and phys_limit is calculated correctly in
functions like dma_direct_optimal_gfp_mask(). But that's about it w.r.t.
DMA bit masks because zone_dma_bits and DMA_BIT_MASK(32) are assumed to
be about the CPU address ranges in some cases (in other cases
DMA_BIT_MASK() is used to initialise dma_coherent_mask, so more of a bus
address).

On the platform Baruch is trying to fix, RAM starts at 32GB and ZONE_DMA
should end at 33GB. That's 30-bit mask in bus address terms but
something not a power of two for the CPU address, hence the
zone_dma_limit introduced here.

With ZONE_DMA32, since all the DMA code assumes that ZONE_DMA32 ends at
4GB CPU address, it doesn't really work for such platforms. If there are
32-bit devices with a corresponding CPU address offset, ZONE_DMA32
should end at 36GB on Baruch's platform. But to simplify things, we just
ignore this on arm64 and make ZONE_DMA32 empty.

In some cases where we have the device structure we could instead do a
dma_to_phys(DMA_BIT_MASK(32)) but not in the two cases above. I guess if
we really want to address this properly, we'd need to introduce a
zone_dma32_limit that's initialised by the arch code. For arm64, I'm
happy with just having an empty ZONE_DMA32 on such platforms.

-- 
Catalin


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-07 18:14       ` Catalin Marinas
@ 2024-08-08  9:35         ` Petr Tesařík
  2024-08-08 10:01           ` Robin Murphy
  2024-08-08 13:46           ` Catalin Marinas
  0 siblings, 2 replies; 16+ messages in thread
From: Petr Tesařík @ 2024-08-08  9:35 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Baruch Siach, Christoph Hellwig, Marek Szyprowski, Will Deacon,
	Robin Murphy, iommu, linux-arm-kernel, linux-kernel, linuxppc-dev,
	linux-s390, Ramon Fried, Elad Nachman

On Wed, 7 Aug 2024 19:14:58 +0100
Catalin Marinas <catalin.marinas@arm.com> wrote:

> On Wed, Aug 07, 2024 at 04:19:38PM +0200, Petr Tesařík wrote:
> > On Fri, 2 Aug 2024 10:37:38 +0100
> > Catalin Marinas <catalin.marinas@arm.com> wrote:  
> > > On Fri, Aug 02, 2024 at 09:03:47AM +0300, Baruch Siach wrote:  
> > > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > > > index 3b4be4ca3b08..62b36fda44c9 100644
> > > > --- a/kernel/dma/direct.c
> > > > +++ b/kernel/dma/direct.c
> > > > @@ -20,7 +20,7 @@
> > > >   * it for entirely different regions. In that case the arch code needs to
> > > >   * override the variable below for dma-direct to work properly.
> > > >   */
> > > > -unsigned int zone_dma_bits __ro_after_init = 24;
> > > > +u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);    
> > > 
> > > u64 here makes sense even if it may be larger than phys_addr_t. It
> > > matches the phys_limit type in the swiotlb code. The compilers should no
> > > longer complain.  
> > 
> > FTR I have never quite understood why phys_limit is u64, but u64 was
> > already used all around the place when I first looked into swiotlb.
> >   
> > > > diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> > > > index d10613eb0f63..7b04f7575796 100644
> > > > --- a/kernel/dma/pool.c
> > > > +++ b/kernel/dma/pool.c
> > > > @@ -70,9 +70,9 @@ static bool cma_in_zone(gfp_t gfp)
> > > >  	/* CMA can't cross zone boundaries, see cma_activate_area() */
> > > >  	end = cma_get_base(cma) + size - 1;
> > > >  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> > > > -		return end <= DMA_BIT_MASK(zone_dma_bits);
> > > > +		return end <= zone_dma_limit;
> > > >  	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
> > > > -		return end <= DMA_BIT_MASK(32);
> > > > +		return end <= max(DMA_BIT_MASK(32), zone_dma_limit);
> > > >  	return true;
> > > >  }
> > > >  
> > > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > > index 043b0ecd3e8d..bb51bd5335ad 100644
> > > > --- a/kernel/dma/swiotlb.c
> > > > +++ b/kernel/dma/swiotlb.c
> > > > @@ -450,9 +450,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> > > >  	if (!remap)
> > > >  		io_tlb_default_mem.can_grow = true;
> > > >  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mhttps://lpc.events/event/18/contributions/1776/ask & __GFP_DMA))
> > > > -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
> > > > +		io_tlb_default_mem.phys_limit = zone_dma_limit;
> > > >  	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
> > > > -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
> > > > +		io_tlb_default_mem.phys_limit = max(DMA_BIT_MASK(32), zone_dma_limit);
> > > >  	else
> > > >  		io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
> > > >  #endif    
> > > 
> > > These two look correct to me now and it's the least intrusive (the
> > > alternative would have been a zone_dma32_limit). The arch code, however,
> > > needs to ensure that zone_dma_limit can always support 32-bit devices
> > > even if it is above 4GB (with the relevant dma offsets in place for such
> > > devices).  
> > 
> > Just to make sure, the DMA zone (if present) must map to at most 32-bit
> > bus address space (possibly behind a bridge). Is that what you're
> > saying?  
> 
> No exactly. What I'm trying to say is that on arm64 zone_dma_limit can
> go beyond DMA_BIT_MASK(32) when the latter is treated as a CPU address.
> In such cases, ZONE_DMA32 is empty.
> 
> TBH, this code is confusing and not entirely suitable for a system where
> the CPU address offsets are not 0. The device::dma_coherent_mask is
> about the bus address range and phys_limit is calculated correctly in
> functions like dma_direct_optimal_gfp_mask(). But that's about it w.r.t.
> DMA bit masks because zone_dma_bits and DMA_BIT_MASK(32) are assumed to
> be about the CPU address ranges in some cases (in other cases
> DMA_BIT_MASK() is used to initialise dma_coherent_mask, so more of a bus
> address).

Yes, I know.

> On the platform Baruch is trying to fix, RAM starts at 32GB and ZONE_DMA
> should end at 33GB. That's 30-bit mask in bus address terms but
> something not a power of two for the CPU address, hence the
> zone_dma_limit introduced here.

Yes, I was watching the discussion.

> With ZONE_DMA32, since all the DMA code assumes that ZONE_DMA32 ends at
> 4GB CPU address, it doesn't really work for such platforms. If there are
> 32-bit devices with a corresponding CPU address offset, ZONE_DMA32
> should end at 36GB on Baruch's platform. But to simplify things, we just
> ignore this on arm64 and make ZONE_DMA32 empty.

Ah. That makes sense. It also seems to support my theory that Linux
memory zones are an obsolete concept and should be replaced by a
different mechanism.

> In some cases where we have the device structure we could instead do a
> dma_to_phys(DMA_BIT_MASK(32)) but not in the two cases above. I guess if
> we really want to address this properly, we'd need to introduce a
> zone_dma32_limit that's initialised by the arch code. For arm64, I'm
> happy with just having an empty ZONE_DMA32 on such platforms.

The obvious caveat is that zone boundaries are system-wide, but the
mapping between bus addresses and CPU addresses depends on the device
structure. After all, that's why dma_to_phys takes the device as a
parameter... In fact, a system may have multiple busses behind
different bridges with a different offset applied by each.

FYI I want to make more people aware of these issues at this year's
Plumbers, see https://lpc.events/event/18/contributions/1776/

Petr T


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-08  9:35         ` Petr Tesařík
@ 2024-08-08 10:01           ` Robin Murphy
  2024-08-08 13:46           ` Catalin Marinas
  1 sibling, 0 replies; 16+ messages in thread
From: Robin Murphy @ 2024-08-08 10:01 UTC (permalink / raw)
  To: Petr Tesařík, Catalin Marinas
  Cc: Baruch Siach, Christoph Hellwig, Marek Szyprowski, Will Deacon,
	iommu, linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Ramon Fried, Elad Nachman

On 2024-08-08 10:35 am, Petr Tesařík wrote:
> On Wed, 7 Aug 2024 19:14:58 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
>> On Wed, Aug 07, 2024 at 04:19:38PM +0200, Petr Tesařík wrote:
>>> On Fri, 2 Aug 2024 10:37:38 +0100
>>> Catalin Marinas <catalin.marinas@arm.com> wrote:
>>>> On Fri, Aug 02, 2024 at 09:03:47AM +0300, Baruch Siach wrote:
>>>>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>>>>> index 3b4be4ca3b08..62b36fda44c9 100644
>>>>> --- a/kernel/dma/direct.c
>>>>> +++ b/kernel/dma/direct.c
>>>>> @@ -20,7 +20,7 @@
>>>>>    * it for entirely different regions. In that case the arch code needs to
>>>>>    * override the variable below for dma-direct to work properly.
>>>>>    */
>>>>> -unsigned int zone_dma_bits __ro_after_init = 24;
>>>>> +u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
>>>>
>>>> u64 here makes sense even if it may be larger than phys_addr_t. It
>>>> matches the phys_limit type in the swiotlb code. The compilers should no
>>>> longer complain.
>>>
>>> FTR I have never quite understood why phys_limit is u64, but u64 was
>>> already used all around the place when I first looked into swiotlb.
>>>    
>>>>> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
>>>>> index d10613eb0f63..7b04f7575796 100644
>>>>> --- a/kernel/dma/pool.c
>>>>> +++ b/kernel/dma/pool.c
>>>>> @@ -70,9 +70,9 @@ static bool cma_in_zone(gfp_t gfp)
>>>>>   	/* CMA can't cross zone boundaries, see cma_activate_area() */
>>>>>   	end = cma_get_base(cma) + size - 1;
>>>>>   	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
>>>>> -		return end <= DMA_BIT_MASK(zone_dma_bits);
>>>>> +		return end <= zone_dma_limit;
>>>>>   	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
>>>>> -		return end <= DMA_BIT_MASK(32);
>>>>> +		return end <= max(DMA_BIT_MASK(32), zone_dma_limit);
>>>>>   	return true;
>>>>>   }
>>>>>   
>>>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>>>>> index 043b0ecd3e8d..bb51bd5335ad 100644
>>>>> --- a/kernel/dma/swiotlb.c
>>>>> +++ b/kernel/dma/swiotlb.c
>>>>> @@ -450,9 +450,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>>>>>   	if (!remap)
>>>>>   		io_tlb_default_mem.can_grow = true;
>>>>>   	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mhttps://lpc.events/event/18/contributions/1776/ask & __GFP_DMA))
>>>>> -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
>>>>> +		io_tlb_default_mem.phys_limit = zone_dma_limit;
>>>>>   	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
>>>>> -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
>>>>> +		io_tlb_default_mem.phys_limit = max(DMA_BIT_MASK(32), zone_dma_limit);
>>>>>   	else
>>>>>   		io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
>>>>>   #endif
>>>>
>>>> These two look correct to me now and it's the least intrusive (the
>>>> alternative would have been a zone_dma32_limit). The arch code, however,
>>>> needs to ensure that zone_dma_limit can always support 32-bit devices
>>>> even if it is above 4GB (with the relevant dma offsets in place for such
>>>> devices).
>>>
>>> Just to make sure, the DMA zone (if present) must map to at most 32-bit
>>> bus address space (possibly behind a bridge). Is that what you're
>>> saying?
>>
>> No exactly. What I'm trying to say is that on arm64 zone_dma_limit can
>> go beyond DMA_BIT_MASK(32) when the latter is treated as a CPU address.
>> In such cases, ZONE_DMA32 is empty.
>>
>> TBH, this code is confusing and not entirely suitable for a system where
>> the CPU address offsets are not 0. The device::dma_coherent_mask is
>> about the bus address range and phys_limit is calculated correctly in
>> functions like dma_direct_optimal_gfp_mask(). But that's about it w.r.t.
>> DMA bit masks because zone_dma_bits and DMA_BIT_MASK(32) are assumed to
>> be about the CPU address ranges in some cases (in other cases
>> DMA_BIT_MASK() is used to initialise dma_coherent_mask, so more of a bus
>> address).
> 
> Yes, I know.
> 
>> On the platform Baruch is trying to fix, RAM starts at 32GB and ZONE_DMA
>> should end at 33GB. That's 30-bit mask in bus address terms but
>> something not a power of two for the CPU address, hence the
>> zone_dma_limit introduced here.
> 
> Yes, I was watching the discussion.
> 
>> With ZONE_DMA32, since all the DMA code assumes that ZONE_DMA32 ends at
>> 4GB CPU address, it doesn't really work for such platforms. If there are
>> 32-bit devices with a corresponding CPU address offset, ZONE_DMA32
>> should end at 36GB on Baruch's platform. But to simplify things, we just
>> ignore this on arm64 and make ZONE_DMA32 empty.
> 
> Ah. That makes sense. It also seems to support my theory that Linux
> memory zones are an obsolete concept and should be replaced by a
> different mechanism.
> 
>> In some cases where we have the device structure we could instead do a
>> dma_to_phys(DMA_BIT_MASK(32)) but not in the two cases above. I guess if
>> we really want to address this properly, we'd need to introduce a
>> zone_dma32_limit that's initialised by the arch code. For arm64, I'm
>> happy with just having an empty ZONE_DMA32 on such platforms.
> 
> The obvious caveat is that zone boundaries are system-wide, but the
> mapping between bus addresses and CPU addresses depends on the device
> structure. After all, that's why dma_to_phys takes the device as a
> parameter... In fact, a system may have multiple busses behind
> different bridges with a different offset applied by each.

Right, that's why the *_dma_get_max_cpu_address() functions already walk 
all known bus translations backwards to find the lowest common 
denominator in the CPU address space. In principle we could also 
calculate the lowest translated 32-bit DMA address from every >32-bit 
range in the same way, however that represents enough extra complexity 
that it doesn't seem worth trying to implement unless and until someone 
actually has a clear need for it.

Thanks,
Robin.

> 
> FYI I want to make more people aware of these issues at this year's
> Plumbers, see https://lpc.events/event/18/contributions/1776/
> 
> Petr T


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-08  9:35         ` Petr Tesařík
  2024-08-08 10:01           ` Robin Murphy
@ 2024-08-08 13:46           ` Catalin Marinas
  1 sibling, 0 replies; 16+ messages in thread
From: Catalin Marinas @ 2024-08-08 13:46 UTC (permalink / raw)
  To: Petr Tesařík
  Cc: Baruch Siach, Christoph Hellwig, Marek Szyprowski, Will Deacon,
	Robin Murphy, iommu, linux-arm-kernel, linux-kernel, linuxppc-dev,
	linux-s390, Ramon Fried, Elad Nachman

On Thu, Aug 08, 2024 at 11:35:01AM +0200, Petr Tesařík wrote:
> On Wed, 7 Aug 2024 19:14:58 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > With ZONE_DMA32, since all the DMA code assumes that ZONE_DMA32 ends at
> > 4GB CPU address, it doesn't really work for such platforms. If there are
> > 32-bit devices with a corresponding CPU address offset, ZONE_DMA32
> > should end at 36GB on Baruch's platform. But to simplify things, we just
> > ignore this on arm64 and make ZONE_DMA32 empty.
> 
> Ah. That makes sense. It also seems to support my theory that Linux
> memory zones are an obsolete concept and should be replaced by a
> different mechanism.

I agree, they are too coarse-grained. From an API perspective, what we
need is an alloc_pages() that takes a DMA mask or phys address limit,
maybe something similar to memblock_alloc_range_nid(). OTOH, an
advantage of the zones is that by default you keep the lower memory free
by using ZONE_NORMAL as default, you have free lists per zone. Maybe
with some alternative data structures we could efficiently search free
pages based on phys ranges or bitmasks and get rid of the zones but I
haven't put any thoughts into it.

We'd still need some boundaries like *_dma_get_max_cpu_address() to at
least allocate an swiotlb buffer that's suitable for all devices.

> > In some cases where we have the device structure we could instead do a
> > dma_to_phys(DMA_BIT_MASK(32)) but not in the two cases above. I guess if
> > we really want to address this properly, we'd need to introduce a
> > zone_dma32_limit that's initialised by the arch code. For arm64, I'm
> > happy with just having an empty ZONE_DMA32 on such platforms.
> 
> The obvious caveat is that zone boundaries are system-wide, but the
> mapping between bus addresses and CPU addresses depends on the device
> structure. After all, that's why dma_to_phys takes the device as a
> parameter... In fact, a system may have multiple busses behind
> different bridges with a different offset applied by each.

Indeed, and as Robin mentioned, the ACPI/DT code already handle this.

> FYI I want to make more people aware of these issues at this year's
> Plumbers, see https://lpc.events/event/18/contributions/1776/

Looking forward to this. I'll dial in, unfortunately can't make Plumbers
in person this year.

In the meantime, I think this series is a good compromise ;).

-- 
Catalin


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-08-08 13:47 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-02  6:03 [PATCH v5 0/3] dma: support DMA zone starting above 4GB Baruch Siach
2024-08-02  6:03 ` [PATCH v5 1/3] dma: improve DMA zone selection Baruch Siach
2024-08-07 12:04   ` kernel test robot
2024-08-07 13:13   ` Robin Murphy
2024-08-07 13:58     ` Catalin Marinas
2024-08-07 14:12       ` Petr Tesařík
2024-08-07 16:40   ` kernel test robot
2024-08-02  6:03 ` [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
2024-08-02  9:37   ` Catalin Marinas
2024-08-07 14:19     ` Petr Tesařík
2024-08-07 18:14       ` Catalin Marinas
2024-08-08  9:35         ` Petr Tesařík
2024-08-08 10:01           ` Robin Murphy
2024-08-08 13:46           ` Catalin Marinas
2024-08-07 14:30   ` Petr Tesařík
2024-08-02  6:03 ` [PATCH v5 3/3] arm64: support DMA zone above 4GB Baruch Siach

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).