linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 RESED 0/2] dma: support DMA zone starting above 4GB
@ 2024-08-11  7:09 Baruch Siach
  2024-08-11  7:09 ` [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Baruch Siach @ 2024-08-11  7:09 UTC (permalink / raw)
  To: Christoph Hellwig, Marek Szyprowski, Catalin Marinas, Will Deacon
  Cc: linux-s390, Baruch Siach, Ramon Fried, Petr Tesařík,
	Robin Murphy, linux-kernel, iommu, Elad Nachman, linuxppc-dev,
	linux-arm-kernel

[ Resend series with correct Cc list. Sorry for the spam. ]

DMA zones code assumes that DMA lower limit is zero. When there is no RAM 
below 4GB, arm64 platform code sets DMA/DMA32 zone limits to cover the entire 
RAM[0].

My target platform has RAM starting at 32GB. Devices with 30-bit DMA mask are 
mapped to 1GB at the bottom of RAM, between 32GB - 33GB. DMA zone over the 
entire RAM breaks DMA allocation for these devices.

In response to a previous RFC hack[1] Catalin Marinas suggested to add a
separate offset value as base address for the DMA zone, and then refined the 
suggestion to use start of RAM[3]. This series attempts to implement that 
suggestion.

With this series applied, the DMA zone covers the right RAM range for my 
platform.

v6:

  * Drop the first patch; existing logic is just fine

  * Modify powerpc code to avoid off by one issue

v5:

  * Test the correct kernel

  * Add missing patch that actually makes DMA zone work

  * Extend the treatment of zone_dma_limit > DMA_BIT_MASK(32)

  * Use max() to make the code somewhat more readable

  * Change zone_dma_limit type to u64 to match DMA_BIT_MASK()

v4:

  * Drop last patch. zone_dma_limit includes RAM base address.

  * Adjust DMA zone selection in swiotlb as well.

  * Don't change max_zone_phys() behaviour

  * Update code to fallback to DMA zone when zone_dma_limit > DMA_BIT_MASK(32)

v3:

  * Rebase on v6.11-rc1.

  * Drop zone_dma_base. Use memblock_start_of_DRAM() instead.

  * Drop DT patches. Low DMA range limit no longer needed.

  * Add patch to improve dma_direct_optimal_gfp_mask() heuristics as Catalin 
    suggested.

RFC v2:

  * Add patch from Catalin[2] changing zone_dma_bits to zone_dma_limit to 
    simplify subsequent patches

  * Test on real hardware

RFC v1: https://lore.kernel.org/all/cover.1703683642.git.baruch@tkos.co.il/

[0] See commit 791ab8b2e3db ("arm64: Ignore any DMA offsets in the 
    max_zone_phys() calculation")

[1] https://lore.kernel.org/all/9af8a19c3398e7dc09cfc1fbafed98d795d9f83e.1699464622.git.baruch@tkos.co.il/

[2] https://lore.kernel.org/all/ZZ2HnHJV3gdzu1Aj@arm.com/

[3] https://lore.kernel.org/all/ZnH-VU2iz9Q2KLbr@arm.com/

Catalin Marinas (2):
  dma: replace zone_dma_bits by zone_dma_limit
  arm64: support DMA zone above 4GB

 arch/arm64/mm/init.c       | 32 ++++++++++----------------------
 arch/powerpc/mm/mem.c      |  5 ++++-
 arch/s390/mm/init.c        |  2 +-
 include/linux/dma-direct.h |  2 +-
 kernel/dma/direct.c        |  6 +++---
 kernel/dma/pool.c          |  4 ++--
 kernel/dma/swiotlb.c       |  6 +++---
 7 files changed, 24 insertions(+), 33 deletions(-)


base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
-- 
2.43.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-11  7:09 [PATCH v6 RESED 0/2] dma: support DMA zone starting above 4GB Baruch Siach
@ 2024-08-11  7:09 ` Baruch Siach
  2024-08-12  5:52   ` Petr Tesarik
                     ` (4 more replies)
  2024-08-11  7:09 ` [PATCH v6 RESED 2/2] arm64: support DMA zone above 4GB Baruch Siach
  2024-08-22  4:18 ` [PATCH v6 RESED 0/2] dma: support DMA zone starting " Christoph Hellwig
  2 siblings, 5 replies; 17+ messages in thread
From: Baruch Siach @ 2024-08-11  7:09 UTC (permalink / raw)
  To: Christoph Hellwig, Marek Szyprowski, Catalin Marinas, Will Deacon
  Cc: linux-s390, Baruch Siach, Ramon Fried, Petr Tesařík,
	Robin Murphy, linux-kernel, iommu, Elad Nachman, linuxppc-dev,
	linux-arm-kernel

From: Catalin Marinas <catalin.marinas@arm.com>

Hardware DMA limit might not be power of 2. When RAM range starts above
0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
can not encode this limit.

Use plain address for DMA zone limit.

Since DMA zone can now potentially span beyond 4GB physical limit of
DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Co-developed-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 arch/arm64/mm/init.c       | 30 +++++++++++++++---------------
 arch/powerpc/mm/mem.c      |  5 ++++-
 arch/s390/mm/init.c        |  2 +-
 include/linux/dma-direct.h |  2 +-
 kernel/dma/direct.c        |  6 +++---
 kernel/dma/pool.c          |  4 ++--
 kernel/dma/swiotlb.c       |  6 +++---
 7 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 9b5ab6818f7f..c45e2152ca9e 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -115,35 +115,35 @@ static void __init arch_reserve_crashkernel(void)
 }
 
 /*
- * Return the maximum physical address for a zone accessible by the given bits
- * limit. If DRAM starts above 32-bit, expand the zone to the maximum
+ * Return the maximum physical address for a zone given its limit.
+ * If DRAM starts above 32-bit, expand the zone to the maximum
  * available memory, otherwise cap it at 32-bit.
  */
-static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
+static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
 {
-	phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
 	phys_addr_t phys_start = memblock_start_of_DRAM();
 
 	if (phys_start > U32_MAX)
-		zone_mask = PHYS_ADDR_MAX;
-	else if (phys_start > zone_mask)
-		zone_mask = U32_MAX;
+		zone_limit = PHYS_ADDR_MAX;
+	else if (phys_start > zone_limit)
+		zone_limit = U32_MAX;
 
-	return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
+	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
 }
 
 static void __init zone_sizes_init(void)
 {
 	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
-	unsigned int __maybe_unused acpi_zone_dma_bits;
-	unsigned int __maybe_unused dt_zone_dma_bits;
-	phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
+	phys_addr_t __maybe_unused acpi_zone_dma_limit;
+	phys_addr_t __maybe_unused dt_zone_dma_limit;
+	phys_addr_t __maybe_unused dma32_phys_limit =
+		max_zone_phys(DMA_BIT_MASK(32));
 
 #ifdef CONFIG_ZONE_DMA
-	acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
-	dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
-	zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
-	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
+	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
+	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
+	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
+	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
 	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
 #endif
 #ifdef CONFIG_ZONE_DMA32
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index d325217ab201..05b7f702b3f7 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -216,7 +216,7 @@ static int __init mark_nonram_nosave(void)
  * everything else. GFP_DMA32 page allocations automatically fall back to
  * ZONE_DMA.
  *
- * By using 31-bit unconditionally, we can exploit zone_dma_bits to inform the
+ * By using 31-bit unconditionally, we can exploit zone_dma_limit to inform the
  * generic DMA mapping code.  32-bit only devices (if not handled by an IOMMU
  * anyway) will take a first dip into ZONE_NORMAL and get otherwise served by
  * ZONE_DMA.
@@ -230,6 +230,7 @@ void __init paging_init(void)
 {
 	unsigned long long total_ram = memblock_phys_mem_size();
 	phys_addr_t top_of_ram = memblock_end_of_DRAM();
+	int zone_dma_bits;
 
 #ifdef CONFIG_HIGHMEM
 	unsigned long v = __fix_to_virt(FIX_KMAP_END);
@@ -256,6 +257,8 @@ void __init paging_init(void)
 	else
 		zone_dma_bits = 31;
 
+	zone_dma_limit = DMA_BIT_MASK(zone_dma_bits);
+
 #ifdef CONFIG_ZONE_DMA
 	max_zone_pfns[ZONE_DMA]	= min(max_low_pfn,
 				      1UL << (zone_dma_bits - PAGE_SHIFT));
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index ddcd39ef4346..91fc2b91adfc 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -97,7 +97,7 @@ void __init paging_init(void)
 
 	vmem_map_init();
 	sparse_init();
-	zone_dma_bits = 31;
+	zone_dma_limit = DMA_BIT_MASK(31);
 	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
 	max_zone_pfns[ZONE_DMA] = virt_to_pfn(MAX_DMA_ADDRESS);
 	max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index edbe13d00776..d7e30d4f7503 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -12,7 +12,7 @@
 #include <linux/mem_encrypt.h>
 #include <linux/swiotlb.h>
 
-extern unsigned int zone_dma_bits;
+extern u64 zone_dma_limit;
 
 /*
  * Record the mapping of CPU physical to DMA addresses for a given region.
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4480a3cd92e0..f2ba074a6a54 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -20,7 +20,7 @@
  * it for entirely different regions. In that case the arch code needs to
  * override the variable below for dma-direct to work properly.
  */
-unsigned int zone_dma_bits __ro_after_init = 24;
+u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
 
 static inline dma_addr_t phys_to_dma_direct(struct device *dev,
 		phys_addr_t phys)
@@ -59,7 +59,7 @@ static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 *phys_limit)
 	 * zones.
 	 */
 	*phys_limit = dma_to_phys(dev, dma_limit);
-	if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
+	if (*phys_limit <= zone_dma_limit)
 		return GFP_DMA;
 	if (*phys_limit <= DMA_BIT_MASK(32))
 		return GFP_DMA32;
@@ -580,7 +580,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 	 * part of the check.
 	 */
 	if (IS_ENABLED(CONFIG_ZONE_DMA))
-		min_mask = min_t(u64, min_mask, DMA_BIT_MASK(zone_dma_bits));
+		min_mask = min_t(u64, min_mask, zone_dma_limit);
 	return mask >= phys_to_dma_unencrypted(dev, min_mask);
 }
 
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index d10613eb0f63..7b04f7575796 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -70,9 +70,9 @@ static bool cma_in_zone(gfp_t gfp)
 	/* CMA can't cross zone boundaries, see cma_activate_area() */
 	end = cma_get_base(cma) + size - 1;
 	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
-		return end <= DMA_BIT_MASK(zone_dma_bits);
+		return end <= zone_dma_limit;
 	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
-		return end <= DMA_BIT_MASK(32);
+		return end <= max(DMA_BIT_MASK(32), zone_dma_limit);
 	return true;
 }
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index df68d29740a0..abcf3fa63a56 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -450,9 +450,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 	if (!remap)
 		io_tlb_default_mem.can_grow = true;
 	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
-		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
+		io_tlb_default_mem.phys_limit = zone_dma_limit;
 	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
-		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
+		io_tlb_default_mem.phys_limit = max(DMA_BIT_MASK(32), zone_dma_limit);
 	else
 		io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
 #endif
@@ -629,7 +629,7 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
 	}
 
 	gfp &= ~GFP_ZONEMASK;
-	if (phys_limit <= DMA_BIT_MASK(zone_dma_bits))
+	if (phys_limit <= zone_dma_limit)
 		gfp |= __GFP_DMA;
 	else if (phys_limit <= DMA_BIT_MASK(32))
 		gfp |= __GFP_DMA32;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 RESED 2/2] arm64: support DMA zone above 4GB
  2024-08-11  7:09 [PATCH v6 RESED 0/2] dma: support DMA zone starting above 4GB Baruch Siach
  2024-08-11  7:09 ` [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
@ 2024-08-11  7:09 ` Baruch Siach
  2024-08-12  5:54   ` Petr Tesarik
  2024-08-22  4:18 ` [PATCH v6 RESED 0/2] dma: support DMA zone starting " Christoph Hellwig
  2 siblings, 1 reply; 17+ messages in thread
From: Baruch Siach @ 2024-08-11  7:09 UTC (permalink / raw)
  To: Christoph Hellwig, Marek Szyprowski, Catalin Marinas, Will Deacon
  Cc: linux-s390, Baruch Siach, Ramon Fried, Petr Tesařík,
	Robin Murphy, linux-kernel, iommu, Elad Nachman, linuxppc-dev,
	linux-arm-kernel

From: Catalin Marinas <catalin.marinas@arm.com>

Commit 791ab8b2e3db ("arm64: Ignore any DMA offsets in the
max_zone_phys() calculation") made arm64 DMA/DMA32 zones span the entire
RAM when RAM starts above 32-bits. This breaks hardware with DMA area
that start above 32-bits. But the commit log says that "we haven't
noticed any such hardware". It turns out that such hardware does exist.

One such platform has RAM starting at 32GB with an internal bus that has
the following DMA limits:

  #address-cells = <2>;
  #size-cells = <2>;
  dma-ranges = <0x00 0xc0000000 0x08 0x00000000 0x00 0x40000000>;

That is, devices under this bus see 1GB of DMA range between 3GB-4GB in
their address space. This range is mapped to CPU memory at 32GB-33GB.
With current code DMA allocations for devices under this bus are not
limited to DMA area, leading to run-time allocation failure.

This commit reinstates DMA zone at the bottom of RAM. The result is DMA
zone that properly reflects the hardware constraints as follows:

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000840000000-0x0000000bffffffff]

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
[baruch: split off the original patch]
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---
 arch/arm64/mm/init.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index c45e2152ca9e..bfb10969cbf0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -114,20 +114,8 @@ static void __init arch_reserve_crashkernel(void)
 				    low_size, high);
 }
 
-/*
- * Return the maximum physical address for a zone given its limit.
- * If DRAM starts above 32-bit, expand the zone to the maximum
- * available memory, otherwise cap it at 32-bit.
- */
 static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
 {
-	phys_addr_t phys_start = memblock_start_of_DRAM();
-
-	if (phys_start > U32_MAX)
-		zone_limit = PHYS_ADDR_MAX;
-	else if (phys_start > zone_limit)
-		zone_limit = U32_MAX;
-
 	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-11  7:09 ` [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
@ 2024-08-12  5:52   ` Petr Tesarik
  2024-08-12 11:22   ` Catalin Marinas
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Petr Tesarik @ 2024-08-12  5:52 UTC (permalink / raw)
  To: Baruch Siach
  Cc: linux-s390, Ramon Fried, Will Deacon, Petr Tesařík,
	Catalin Marinas, linuxppc-dev, linux-kernel, iommu, Elad Nachman,
	Robin Murphy, Christoph Hellwig, linux-arm-kernel,
	Marek Szyprowski

On Sun, 11 Aug 2024 10:09:35 +0300
Baruch Siach <baruch@tkos.co.il> wrote:

> From: Catalin Marinas <catalin.marinas@arm.com>
> 
> Hardware DMA limit might not be power of 2. When RAM range starts above
> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
> can not encode this limit.
> 
> Use plain address for DMA zone limit.
> 
> Since DMA zone can now potentially span beyond 4GB physical limit of
> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>

LGTM.

Reviewed-by: Petr Tesarik <ptesarik@suse.com>

Petr T

> ---
>  arch/arm64/mm/init.c       | 30 +++++++++++++++---------------
>  arch/powerpc/mm/mem.c      |  5 ++++-
>  arch/s390/mm/init.c        |  2 +-
>  include/linux/dma-direct.h |  2 +-
>  kernel/dma/direct.c        |  6 +++---
>  kernel/dma/pool.c          |  4 ++--
>  kernel/dma/swiotlb.c       |  6 +++---
>  7 files changed, 29 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 9b5ab6818f7f..c45e2152ca9e 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -115,35 +115,35 @@ static void __init arch_reserve_crashkernel(void)
>  }
>  
>  /*
> - * Return the maximum physical address for a zone accessible by the given bits
> - * limit. If DRAM starts above 32-bit, expand the zone to the maximum
> + * Return the maximum physical address for a zone given its limit.
> + * If DRAM starts above 32-bit, expand the zone to the maximum
>   * available memory, otherwise cap it at 32-bit.
>   */
> -static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> +static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>  {
> -	phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
>  	phys_addr_t phys_start = memblock_start_of_DRAM();
>  
>  	if (phys_start > U32_MAX)
> -		zone_mask = PHYS_ADDR_MAX;
> -	else if (phys_start > zone_mask)
> -		zone_mask = U32_MAX;
> +		zone_limit = PHYS_ADDR_MAX;
> +	else if (phys_start > zone_limit)
> +		zone_limit = U32_MAX;
>  
> -	return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
> +	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>  }
>  
>  static void __init zone_sizes_init(void)
>  {
>  	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> -	unsigned int __maybe_unused acpi_zone_dma_bits;
> -	unsigned int __maybe_unused dt_zone_dma_bits;
> -	phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
> +	phys_addr_t __maybe_unused acpi_zone_dma_limit;
> +	phys_addr_t __maybe_unused dt_zone_dma_limit;
> +	phys_addr_t __maybe_unused dma32_phys_limit =
> +		max_zone_phys(DMA_BIT_MASK(32));
>  
>  #ifdef CONFIG_ZONE_DMA
> -	acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
> -	dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
> -	zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
> -	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> +	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
> +	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
> +	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
> +	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif
>  #ifdef CONFIG_ZONE_DMA32
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index d325217ab201..05b7f702b3f7 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -216,7 +216,7 @@ static int __init mark_nonram_nosave(void)
>   * everything else. GFP_DMA32 page allocations automatically fall back to
>   * ZONE_DMA.
>   *
> - * By using 31-bit unconditionally, we can exploit zone_dma_bits to inform the
> + * By using 31-bit unconditionally, we can exploit zone_dma_limit to inform the
>   * generic DMA mapping code.  32-bit only devices (if not handled by an IOMMU
>   * anyway) will take a first dip into ZONE_NORMAL and get otherwise served by
>   * ZONE_DMA.
> @@ -230,6 +230,7 @@ void __init paging_init(void)
>  {
>  	unsigned long long total_ram = memblock_phys_mem_size();
>  	phys_addr_t top_of_ram = memblock_end_of_DRAM();
> +	int zone_dma_bits;
>  
>  #ifdef CONFIG_HIGHMEM
>  	unsigned long v = __fix_to_virt(FIX_KMAP_END);
> @@ -256,6 +257,8 @@ void __init paging_init(void)
>  	else
>  		zone_dma_bits = 31;
>  
> +	zone_dma_limit = DMA_BIT_MASK(zone_dma_bits);
> +
>  #ifdef CONFIG_ZONE_DMA
>  	max_zone_pfns[ZONE_DMA]	= min(max_low_pfn,
>  				      1UL << (zone_dma_bits - PAGE_SHIFT));
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index ddcd39ef4346..91fc2b91adfc 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -97,7 +97,7 @@ void __init paging_init(void)
>  
>  	vmem_map_init();
>  	sparse_init();
> -	zone_dma_bits = 31;
> +	zone_dma_limit = DMA_BIT_MASK(31);
>  	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
>  	max_zone_pfns[ZONE_DMA] = virt_to_pfn(MAX_DMA_ADDRESS);
>  	max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index edbe13d00776..d7e30d4f7503 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -12,7 +12,7 @@
>  #include <linux/mem_encrypt.h>
>  #include <linux/swiotlb.h>
>  
> -extern unsigned int zone_dma_bits;
> +extern u64 zone_dma_limit;
>  
>  /*
>   * Record the mapping of CPU physical to DMA addresses for a given region.
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 4480a3cd92e0..f2ba074a6a54 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -20,7 +20,7 @@
>   * it for entirely different regions. In that case the arch code needs to
>   * override the variable below for dma-direct to work properly.
>   */
> -unsigned int zone_dma_bits __ro_after_init = 24;
> +u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
>  
>  static inline dma_addr_t phys_to_dma_direct(struct device *dev,
>  		phys_addr_t phys)
> @@ -59,7 +59,7 @@ static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 *phys_limit)
>  	 * zones.
>  	 */
>  	*phys_limit = dma_to_phys(dev, dma_limit);
> -	if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
> +	if (*phys_limit <= zone_dma_limit)
>  		return GFP_DMA;
>  	if (*phys_limit <= DMA_BIT_MASK(32))
>  		return GFP_DMA32;
> @@ -580,7 +580,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
>  	 * part of the check.
>  	 */
>  	if (IS_ENABLED(CONFIG_ZONE_DMA))
> -		min_mask = min_t(u64, min_mask, DMA_BIT_MASK(zone_dma_bits));
> +		min_mask = min_t(u64, min_mask, zone_dma_limit);
>  	return mask >= phys_to_dma_unencrypted(dev, min_mask);
>  }
>  
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index d10613eb0f63..7b04f7575796 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -70,9 +70,9 @@ static bool cma_in_zone(gfp_t gfp)
>  	/* CMA can't cross zone boundaries, see cma_activate_area() */
>  	end = cma_get_base(cma) + size - 1;
>  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> -		return end <= DMA_BIT_MASK(zone_dma_bits);
> +		return end <= zone_dma_limit;
>  	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
> -		return end <= DMA_BIT_MASK(32);
> +		return end <= max(DMA_BIT_MASK(32), zone_dma_limit);
>  	return true;
>  }
>  
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index df68d29740a0..abcf3fa63a56 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -450,9 +450,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  	if (!remap)
>  		io_tlb_default_mem.can_grow = true;
>  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
> -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
> +		io_tlb_default_mem.phys_limit = zone_dma_limit;
>  	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
> -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
> +		io_tlb_default_mem.phys_limit = max(DMA_BIT_MASK(32), zone_dma_limit);
>  	else
>  		io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
>  #endif
> @@ -629,7 +629,7 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>  	}
>  
>  	gfp &= ~GFP_ZONEMASK;
> -	if (phys_limit <= DMA_BIT_MASK(zone_dma_bits))
> +	if (phys_limit <= zone_dma_limit)
>  		gfp |= __GFP_DMA;
>  	else if (phys_limit <= DMA_BIT_MASK(32))
>  		gfp |= __GFP_DMA32;


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 2/2] arm64: support DMA zone above 4GB
  2024-08-11  7:09 ` [PATCH v6 RESED 2/2] arm64: support DMA zone above 4GB Baruch Siach
@ 2024-08-12  5:54   ` Petr Tesarik
  0 siblings, 0 replies; 17+ messages in thread
From: Petr Tesarik @ 2024-08-12  5:54 UTC (permalink / raw)
  To: Baruch Siach
  Cc: linux-s390, Ramon Fried, Will Deacon, Petr Tesařík,
	Catalin Marinas, linuxppc-dev, linux-kernel, iommu, Elad Nachman,
	Robin Murphy, Christoph Hellwig, linux-arm-kernel,
	Marek Szyprowski

On Sun, 11 Aug 2024 10:09:36 +0300
Baruch Siach <baruch@tkos.co.il> wrote:

> From: Catalin Marinas <catalin.marinas@arm.com>
> 
> Commit 791ab8b2e3db ("arm64: Ignore any DMA offsets in the
> max_zone_phys() calculation") made arm64 DMA/DMA32 zones span the entire
> RAM when RAM starts above 32-bits. This breaks hardware with DMA area
> that start above 32-bits. But the commit log says that "we haven't
> noticed any such hardware". It turns out that such hardware does exist.
> 
> One such platform has RAM starting at 32GB with an internal bus that has
> the following DMA limits:
> 
>   #address-cells = <2>;
>   #size-cells = <2>;
>   dma-ranges = <0x00 0xc0000000 0x08 0x00000000 0x00 0x40000000>;
> 
> That is, devices under this bus see 1GB of DMA range between 3GB-4GB in
> their address space. This range is mapped to CPU memory at 32GB-33GB.
> With current code DMA allocations for devices under this bus are not
> limited to DMA area, leading to run-time allocation failure.
> 
> This commit reinstates DMA zone at the bottom of RAM. The result is DMA
> zone that properly reflects the hardware constraints as follows:
> 
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   [mem 0x0000000840000000-0x0000000bffffffff]
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> [baruch: split off the original patch]
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>

Note that I'm not an Arm64 maintainer, so the value of my review is
limited, but AFAICS this change should work as intended.

Reviewed-by: Petr Tesarik <ptesarik@suse.com>

Petr T

> ---
>  arch/arm64/mm/init.c | 12 ------------
>  1 file changed, 12 deletions(-)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index c45e2152ca9e..bfb10969cbf0 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -114,20 +114,8 @@ static void __init arch_reserve_crashkernel(void)
>  				    low_size, high);
>  }
>  
> -/*
> - * Return the maximum physical address for a zone given its limit.
> - * If DRAM starts above 32-bit, expand the zone to the maximum
> - * available memory, otherwise cap it at 32-bit.
> - */
>  static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>  {
> -	phys_addr_t phys_start = memblock_start_of_DRAM();
> -
> -	if (phys_start > U32_MAX)
> -		zone_limit = PHYS_ADDR_MAX;
> -	else if (phys_start > zone_limit)
> -		zone_limit = U32_MAX;
> -
>  	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>  }
>  


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-11  7:09 ` [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
  2024-08-12  5:52   ` Petr Tesarik
@ 2024-08-12 11:22   ` Catalin Marinas
  2024-08-16 11:52   ` Will Deacon
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Catalin Marinas @ 2024-08-12 11:22 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Christoph Hellwig, Marek Szyprowski, Will Deacon, Robin Murphy,
	iommu, linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman

On Sun, Aug 11, 2024 at 10:09:35AM +0300, Baruch Siach wrote:
> From: Catalin Marinas <catalin.marinas@arm.com>
> 
> Hardware DMA limit might not be power of 2. When RAM range starts above
> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
> can not encode this limit.
> 
> Use plain address for DMA zone limit.
> 
> Since DMA zone can now potentially span beyond 4GB physical limit of
> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>

You might want to say that no functional change is expected with this
patch. The patch looks fine.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-11  7:09 ` [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
  2024-08-12  5:52   ` Petr Tesarik
  2024-08-12 11:22   ` Catalin Marinas
@ 2024-08-16 11:52   ` Will Deacon
  2024-08-16 14:37     ` Petr Tesařík
  2024-08-26 19:28   ` Marek Szyprowski
  2024-08-29 13:42   ` Neil Armstrong
  4 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2024-08-16 11:52 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Christoph Hellwig, Marek Szyprowski, Catalin Marinas,
	Robin Murphy, iommu, linux-arm-kernel, linux-kernel, linuxppc-dev,
	linux-s390, Petr Tesařík, Ramon Fried, Elad Nachman

On Sun, Aug 11, 2024 at 10:09:35AM +0300, Baruch Siach wrote:
> From: Catalin Marinas <catalin.marinas@arm.com>
> 
> Hardware DMA limit might not be power of 2. When RAM range starts above
> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
> can not encode this limit.
> 
> Use plain address for DMA zone limit.
> 
> Since DMA zone can now potentially span beyond 4GB physical limit of
> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
> ---
>  arch/arm64/mm/init.c       | 30 +++++++++++++++---------------
>  arch/powerpc/mm/mem.c      |  5 ++++-
>  arch/s390/mm/init.c        |  2 +-
>  include/linux/dma-direct.h |  2 +-
>  kernel/dma/direct.c        |  6 +++---
>  kernel/dma/pool.c          |  4 ++--
>  kernel/dma/swiotlb.c       |  6 +++---
>  7 files changed, 29 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 9b5ab6818f7f..c45e2152ca9e 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -115,35 +115,35 @@ static void __init arch_reserve_crashkernel(void)
>  }
>  
>  /*
> - * Return the maximum physical address for a zone accessible by the given bits
> - * limit. If DRAM starts above 32-bit, expand the zone to the maximum
> + * Return the maximum physical address for a zone given its limit.
> + * If DRAM starts above 32-bit, expand the zone to the maximum
>   * available memory, otherwise cap it at 32-bit.
>   */
> -static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> +static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>  {
> -	phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
>  	phys_addr_t phys_start = memblock_start_of_DRAM();
>  
>  	if (phys_start > U32_MAX)
> -		zone_mask = PHYS_ADDR_MAX;
> -	else if (phys_start > zone_mask)
> -		zone_mask = U32_MAX;
> +		zone_limit = PHYS_ADDR_MAX;
> +	else if (phys_start > zone_limit)
> +		zone_limit = U32_MAX;
>  
> -	return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
> +	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;

Why do we need to adjust +-1 now that we're no longer using a mask?

>  }
>  
>  static void __init zone_sizes_init(void)
>  {
>  	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> -	unsigned int __maybe_unused acpi_zone_dma_bits;
> -	unsigned int __maybe_unused dt_zone_dma_bits;
> -	phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
> +	phys_addr_t __maybe_unused acpi_zone_dma_limit;
> +	phys_addr_t __maybe_unused dt_zone_dma_limit;
> +	phys_addr_t __maybe_unused dma32_phys_limit =
> +		max_zone_phys(DMA_BIT_MASK(32));
>  
>  #ifdef CONFIG_ZONE_DMA
> -	acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
> -	dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
> -	zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
> -	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> +	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
> +	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
> +	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
> +	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>  	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif

Maybe move this block into a helper function so we can avoid three
__maybe_unused variables?

Will


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-16 11:52   ` Will Deacon
@ 2024-08-16 14:37     ` Petr Tesařík
  0 siblings, 0 replies; 17+ messages in thread
From: Petr Tesařík @ 2024-08-16 14:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: Baruch Siach, Christoph Hellwig, Marek Szyprowski,
	Catalin Marinas, Robin Murphy, iommu, linux-arm-kernel,
	linux-kernel, linuxppc-dev, linux-s390, Ramon Fried, Elad Nachman

On Fri, 16 Aug 2024 12:52:47 +0100
Will Deacon <will@kernel.org> wrote:

> On Sun, Aug 11, 2024 at 10:09:35AM +0300, Baruch Siach wrote:
> > From: Catalin Marinas <catalin.marinas@arm.com>
> > 
> > Hardware DMA limit might not be power of 2. When RAM range starts above
> > 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
> > can not encode this limit.
> > 
> > Use plain address for DMA zone limit.
> > 
> > Since DMA zone can now potentially span beyond 4GB physical limit of
> > DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
> > 
> > Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> > Co-developed-by: Baruch Siach <baruch@tkos.co.il>
> > Signed-off-by: Baruch Siach <baruch@tkos.co.il>
> > ---
> >  arch/arm64/mm/init.c       | 30 +++++++++++++++---------------
> >  arch/powerpc/mm/mem.c      |  5 ++++-
> >  arch/s390/mm/init.c        |  2 +-
> >  include/linux/dma-direct.h |  2 +-
> >  kernel/dma/direct.c        |  6 +++---
> >  kernel/dma/pool.c          |  4 ++--
> >  kernel/dma/swiotlb.c       |  6 +++---
> >  7 files changed, 29 insertions(+), 26 deletions(-)
> > 
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index 9b5ab6818f7f..c45e2152ca9e 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -115,35 +115,35 @@ static void __init arch_reserve_crashkernel(void)
> >  }
> >  
> >  /*
> > - * Return the maximum physical address for a zone accessible by the given bits
> > - * limit. If DRAM starts above 32-bit, expand the zone to the maximum
> > + * Return the maximum physical address for a zone given its limit.
> > + * If DRAM starts above 32-bit, expand the zone to the maximum
> >   * available memory, otherwise cap it at 32-bit.
> >   */
> > -static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> > +static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
> >  {
> > -	phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
> >  	phys_addr_t phys_start = memblock_start_of_DRAM();
> >  
> >  	if (phys_start > U32_MAX)
> > -		zone_mask = PHYS_ADDR_MAX;
> > -	else if (phys_start > zone_mask)
> > -		zone_mask = U32_MAX;
> > +		zone_limit = PHYS_ADDR_MAX;
> > +	else if (phys_start > zone_limit)
> > +		zone_limit = U32_MAX;
> >  
> > -	return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
> > +	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;  
> 
> Why do we need to adjust +-1 now that we're no longer using a mask?

Subtracting 1 is needed to get the highest valid DRAM address so it can
be compared to the highest address in the zone (zone_limit). Adding 1
is necessary to get the lowest address beyond the zone.

AFAICT this is the right thing here:

	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);

max_zone_pfns[] max values are exclusive, i.e. the lowest PFN which is
_not_ within the zone.

It is also the right thing when arm64_dma_phys_limit is passed to
dma_contiguous_reserve().

It would be subtly broken if phys_addr_t could be a 32-bit integer, but
that's not possible on arm64.

In short, LGTM.

Petr T


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 0/2] dma: support DMA zone starting above 4GB
  2024-08-11  7:09 [PATCH v6 RESED 0/2] dma: support DMA zone starting above 4GB Baruch Siach
  2024-08-11  7:09 ` [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
  2024-08-11  7:09 ` [PATCH v6 RESED 2/2] arm64: support DMA zone above 4GB Baruch Siach
@ 2024-08-22  4:18 ` Christoph Hellwig
  2 siblings, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-08-22  4:18 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Christoph Hellwig, Marek Szyprowski, Catalin Marinas, Will Deacon,
	Robin Murphy, iommu, linux-arm-kernel, linux-kernel, linuxppc-dev,
	linux-s390, Petr Tesařík, Ramon Fried, Elad Nachman

Thanks,

applied to the dma-mapping tree for Linux 6.12.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-11  7:09 ` [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
                     ` (2 preceding siblings ...)
  2024-08-16 11:52   ` Will Deacon
@ 2024-08-26 19:28   ` Marek Szyprowski
  2024-08-27  4:52     ` Baruch Siach
  2024-08-29 13:42   ` Neil Armstrong
  4 siblings, 1 reply; 17+ messages in thread
From: Marek Szyprowski @ 2024-08-26 19:28 UTC (permalink / raw)
  To: Baruch Siach, Christoph Hellwig, Catalin Marinas, Will Deacon
  Cc: Robin Murphy, iommu, linux-arm-kernel, linux-kernel, linuxppc-dev,
	linux-s390, Petr Tesařík, Ramon Fried, Elad Nachman,
	linux-rockchip

Dear All,

On 11.08.2024 09:09, Baruch Siach wrote:
> From: Catalin Marinas <catalin.marinas@arm.com>
>
> Hardware DMA limit might not be power of 2. When RAM range starts above
> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
> can not encode this limit.
>
> Use plain address for DMA zone limit.
>
> Since DMA zone can now potentially span beyond 4GB physical limit of
> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
> ---

This patch landed recently in linux-next as commit ba0fb44aed47 
("dma-mapping: replace zone_dma_bits by zone_dma_limit"). During my 
tests I found that it introduces the following warning on ARM64/Rockchip 
based Odroid M1 board (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts):

------------[ cut here ]------------
dwmmc_rockchip fe2b0000.mmc: swiotlb addr 0x00000001faf00000+4096 
overflow (mask ffffffff, bus limit 0).
WARNING: CPU: 3 PID: 1 at kernel/dma/swiotlb.c:1594 swiotlb_map+0x2f0/0x308
Modules linked in:
CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc4+ #15278
Hardware name: Hardkernel ODROID-M1 (DT)
pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : swiotlb_map+0x2f0/0x308
lr : swiotlb_map+0x2f0/0x308
...
Call trace:
  swiotlb_map+0x2f0/0x308
  dma_direct_map_sg+0x9c/0x2e4
  __dma_map_sg_attrs+0x28/0x94
  dma_map_sg_attrs+0x10/0x24
  dw_mci_pre_dma_transfer+0xb8/0xf4
  dw_mci_pre_req+0x50/0x68
  mmc_blk_mq_issue_rq+0x3e0/0x964
  mmc_mq_queue_rq+0x118/0x2b4
  blk_mq_dispatch_rq_list+0x21c/0x714
  __blk_mq_sched_dispatch_requests+0x490/0x58c
  blk_mq_sched_dispatch_requests+0x30/0x6c
  blk_mq_run_hw_queue+0x284/0x40c
  blk_mq_flush_plug_list.part.0+0x190/0x974
  blk_mq_flush_plug_list+0x1c/0x2c
  __blk_flush_plug+0xe4/0x140
  blk_finish_plug+0x38/0x4c
  __ext4_get_inode_loc+0x22c/0x654
  __ext4_get_inode_loc_noinmem+0x40/0xa8
  __ext4_iget+0x154/0xcc0
  ext4_get_journal_inode+0x30/0x110
  ext4_load_and_init_journal+0x9c/0xaf0
  ext4_fill_super+0x1fec/0x2d90
  get_tree_bdev+0x140/0x1d8
  ext4_get_tree+0x18/0x24
  vfs_get_tree+0x28/0xe8
  path_mount+0x3e8/0xb7c
  init_mount+0x68/0xac
  do_mount_root+0x108/0x1dc
  mount_root_generic+0x100/0x330
  mount_root+0x160/0x2d0
  initrd_load+0x1f0/0x2a0
  prepare_namespace+0x4c/0x29c
  kernel_init_freeable+0x4b4/0x50c
  kernel_init+0x20/0x1d8
  ret_from_fork+0x10/0x20
irq event stamp: 1305682
hardirqs last  enabled at (1305681): [<ffff8000800e332c>] 
console_unlock+0x124/0x130
hardirqs last disabled at (1305682): [<ffff80008124e684>] el1_dbg+0x24/0x8c
softirqs last  enabled at (1305678): [<ffff80008005be1c>] 
handle_softirqs+0x4cc/0x4e4
softirqs last disabled at (1305665): [<ffff8000800105b0>] 
__do_softirq+0x14/0x20
---[ end trace 0000000000000000 ]---

This "bus limit 0" seems to be a bit suspicious to me as well as the 
fact that swiotlb is used for the MMC DMA. I will investigate this 
further tomorrow. The board boots fine though.


>   arch/arm64/mm/init.c       | 30 +++++++++++++++---------------
>   arch/powerpc/mm/mem.c      |  5 ++++-
>   arch/s390/mm/init.c        |  2 +-
>   include/linux/dma-direct.h |  2 +-
>   kernel/dma/direct.c        |  6 +++---
>   kernel/dma/pool.c          |  4 ++--
>   kernel/dma/swiotlb.c       |  6 +++---
>   7 files changed, 29 insertions(+), 26 deletions(-)
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 9b5ab6818f7f..c45e2152ca9e 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -115,35 +115,35 @@ static void __init arch_reserve_crashkernel(void)
>   }
>   
>   /*
> - * Return the maximum physical address for a zone accessible by the given bits
> - * limit. If DRAM starts above 32-bit, expand the zone to the maximum
> + * Return the maximum physical address for a zone given its limit.
> + * If DRAM starts above 32-bit, expand the zone to the maximum
>    * available memory, otherwise cap it at 32-bit.
>    */
> -static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> +static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>   {
> -	phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
>   	phys_addr_t phys_start = memblock_start_of_DRAM();
>   
>   	if (phys_start > U32_MAX)
> -		zone_mask = PHYS_ADDR_MAX;
> -	else if (phys_start > zone_mask)
> -		zone_mask = U32_MAX;
> +		zone_limit = PHYS_ADDR_MAX;
> +	else if (phys_start > zone_limit)
> +		zone_limit = U32_MAX;
>   
> -	return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
> +	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>   }
>   
>   static void __init zone_sizes_init(void)
>   {
>   	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
> -	unsigned int __maybe_unused acpi_zone_dma_bits;
> -	unsigned int __maybe_unused dt_zone_dma_bits;
> -	phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
> +	phys_addr_t __maybe_unused acpi_zone_dma_limit;
> +	phys_addr_t __maybe_unused dt_zone_dma_limit;
> +	phys_addr_t __maybe_unused dma32_phys_limit =
> +		max_zone_phys(DMA_BIT_MASK(32));
>   
>   #ifdef CONFIG_ZONE_DMA
> -	acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
> -	dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
> -	zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
> -	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> +	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
> +	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
> +	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
> +	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>   	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>   #endif
>   #ifdef CONFIG_ZONE_DMA32
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index d325217ab201..05b7f702b3f7 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -216,7 +216,7 @@ static int __init mark_nonram_nosave(void)
>    * everything else. GFP_DMA32 page allocations automatically fall back to
>    * ZONE_DMA.
>    *
> - * By using 31-bit unconditionally, we can exploit zone_dma_bits to inform the
> + * By using 31-bit unconditionally, we can exploit zone_dma_limit to inform the
>    * generic DMA mapping code.  32-bit only devices (if not handled by an IOMMU
>    * anyway) will take a first dip into ZONE_NORMAL and get otherwise served by
>    * ZONE_DMA.
> @@ -230,6 +230,7 @@ void __init paging_init(void)
>   {
>   	unsigned long long total_ram = memblock_phys_mem_size();
>   	phys_addr_t top_of_ram = memblock_end_of_DRAM();
> +	int zone_dma_bits;
>   
>   #ifdef CONFIG_HIGHMEM
>   	unsigned long v = __fix_to_virt(FIX_KMAP_END);
> @@ -256,6 +257,8 @@ void __init paging_init(void)
>   	else
>   		zone_dma_bits = 31;
>   
> +	zone_dma_limit = DMA_BIT_MASK(zone_dma_bits);
> +
>   #ifdef CONFIG_ZONE_DMA
>   	max_zone_pfns[ZONE_DMA]	= min(max_low_pfn,
>   				      1UL << (zone_dma_bits - PAGE_SHIFT));
> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
> index ddcd39ef4346..91fc2b91adfc 100644
> --- a/arch/s390/mm/init.c
> +++ b/arch/s390/mm/init.c
> @@ -97,7 +97,7 @@ void __init paging_init(void)
>   
>   	vmem_map_init();
>   	sparse_init();
> -	zone_dma_bits = 31;
> +	zone_dma_limit = DMA_BIT_MASK(31);
>   	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
>   	max_zone_pfns[ZONE_DMA] = virt_to_pfn(MAX_DMA_ADDRESS);
>   	max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index edbe13d00776..d7e30d4f7503 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -12,7 +12,7 @@
>   #include <linux/mem_encrypt.h>
>   #include <linux/swiotlb.h>
>   
> -extern unsigned int zone_dma_bits;
> +extern u64 zone_dma_limit;
>   
>   /*
>    * Record the mapping of CPU physical to DMA addresses for a given region.
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 4480a3cd92e0..f2ba074a6a54 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -20,7 +20,7 @@
>    * it for entirely different regions. In that case the arch code needs to
>    * override the variable below for dma-direct to work properly.
>    */
> -unsigned int zone_dma_bits __ro_after_init = 24;
> +u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
>   
>   static inline dma_addr_t phys_to_dma_direct(struct device *dev,
>   		phys_addr_t phys)
> @@ -59,7 +59,7 @@ static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 *phys_limit)
>   	 * zones.
>   	 */
>   	*phys_limit = dma_to_phys(dev, dma_limit);
> -	if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
> +	if (*phys_limit <= zone_dma_limit)
>   		return GFP_DMA;
>   	if (*phys_limit <= DMA_BIT_MASK(32))
>   		return GFP_DMA32;
> @@ -580,7 +580,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
>   	 * part of the check.
>   	 */
>   	if (IS_ENABLED(CONFIG_ZONE_DMA))
> -		min_mask = min_t(u64, min_mask, DMA_BIT_MASK(zone_dma_bits));
> +		min_mask = min_t(u64, min_mask, zone_dma_limit);
>   	return mask >= phys_to_dma_unencrypted(dev, min_mask);
>   }
>   
> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> index d10613eb0f63..7b04f7575796 100644
> --- a/kernel/dma/pool.c
> +++ b/kernel/dma/pool.c
> @@ -70,9 +70,9 @@ static bool cma_in_zone(gfp_t gfp)
>   	/* CMA can't cross zone boundaries, see cma_activate_area() */
>   	end = cma_get_base(cma) + size - 1;
>   	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> -		return end <= DMA_BIT_MASK(zone_dma_bits);
> +		return end <= zone_dma_limit;
>   	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
> -		return end <= DMA_BIT_MASK(32);
> +		return end <= max(DMA_BIT_MASK(32), zone_dma_limit);
>   	return true;
>   }
>   
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index df68d29740a0..abcf3fa63a56 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -450,9 +450,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>   	if (!remap)
>   		io_tlb_default_mem.can_grow = true;
>   	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
> -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
> +		io_tlb_default_mem.phys_limit = zone_dma_limit;
>   	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
> -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
> +		io_tlb_default_mem.phys_limit = max(DMA_BIT_MASK(32), zone_dma_limit);
>   	else
>   		io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
>   #endif
> @@ -629,7 +629,7 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>   	}
>   
>   	gfp &= ~GFP_ZONEMASK;
> -	if (phys_limit <= DMA_BIT_MASK(zone_dma_bits))
> +	if (phys_limit <= zone_dma_limit)
>   		gfp |= __GFP_DMA;
>   	else if (phys_limit <= DMA_BIT_MASK(32))
>   		gfp |= __GFP_DMA32;

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-26 19:28   ` Marek Szyprowski
@ 2024-08-27  4:52     ` Baruch Siach
  2024-08-27  6:14       ` Marek Szyprowski
  0 siblings, 1 reply; 17+ messages in thread
From: Baruch Siach @ 2024-08-27  4:52 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Christoph Hellwig, Catalin Marinas, Will Deacon, Robin Murphy,
	iommu, linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman, linux-rockchip

Hi Marek,

Thanks for your report.

On Mon, Aug 26 2024, Marek Szyprowski wrote:
> On 11.08.2024 09:09, Baruch Siach wrote:
>> From: Catalin Marinas <catalin.marinas@arm.com>
>>
>> Hardware DMA limit might not be power of 2. When RAM range starts above
>> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
>> can not encode this limit.
>>
>> Use plain address for DMA zone limit.
>>
>> Since DMA zone can now potentially span beyond 4GB physical limit of
>> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
>>
>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
>> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
>> ---
>
> This patch landed recently in linux-next as commit ba0fb44aed47 
> ("dma-mapping: replace zone_dma_bits by zone_dma_limit"). During my 
> tests I found that it introduces the following warning on ARM64/Rockchip 
> based Odroid M1 board (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts):

Does this warning go away if you revert both 3be9b846896d and ba0fb44aed47?

Upstream rockchip DTs have no dma-ranges property. Is that the case for
your platform as well?

Can you share kernel report of DMA zones and swiotlb? On my platform I get:

[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000840000000-0x0000000fffffffff]
...
[    0.000000] software IO TLB: area num 8.
[    0.000000] software IO TLB: mapped [mem 0x000000083be38000-0x000000083fe38000] (64MB)

What do you get at your end?

> ------------[ cut here ]------------
> dwmmc_rockchip fe2b0000.mmc: swiotlb addr 0x00000001faf00000+4096 
> overflow (mask ffffffff, bus limit 0).
> WARNING: CPU: 3 PID: 1 at kernel/dma/swiotlb.c:1594 swiotlb_map+0x2f0/0x308
> Modules linked in:
> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc4+ #15278
> Hardware name: Hardkernel ODROID-M1 (DT)
> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : swiotlb_map+0x2f0/0x308
> lr : swiotlb_map+0x2f0/0x308
> ...
> Call trace:
>   swiotlb_map+0x2f0/0x308
>   dma_direct_map_sg+0x9c/0x2e4
>   __dma_map_sg_attrs+0x28/0x94
>   dma_map_sg_attrs+0x10/0x24
>   dw_mci_pre_dma_transfer+0xb8/0xf4
>   dw_mci_pre_req+0x50/0x68
>   mmc_blk_mq_issue_rq+0x3e0/0x964
>   mmc_mq_queue_rq+0x118/0x2b4
>   blk_mq_dispatch_rq_list+0x21c/0x714
>   __blk_mq_sched_dispatch_requests+0x490/0x58c
>   blk_mq_sched_dispatch_requests+0x30/0x6c
>   blk_mq_run_hw_queue+0x284/0x40c
>   blk_mq_flush_plug_list.part.0+0x190/0x974
>   blk_mq_flush_plug_list+0x1c/0x2c
>   __blk_flush_plug+0xe4/0x140
>   blk_finish_plug+0x38/0x4c
>   __ext4_get_inode_loc+0x22c/0x654
>   __ext4_get_inode_loc_noinmem+0x40/0xa8
>   __ext4_iget+0x154/0xcc0
>   ext4_get_journal_inode+0x30/0x110
>   ext4_load_and_init_journal+0x9c/0xaf0
>   ext4_fill_super+0x1fec/0x2d90
>   get_tree_bdev+0x140/0x1d8
>   ext4_get_tree+0x18/0x24
>   vfs_get_tree+0x28/0xe8
>   path_mount+0x3e8/0xb7c
>   init_mount+0x68/0xac
>   do_mount_root+0x108/0x1dc
>   mount_root_generic+0x100/0x330
>   mount_root+0x160/0x2d0
>   initrd_load+0x1f0/0x2a0
>   prepare_namespace+0x4c/0x29c
>   kernel_init_freeable+0x4b4/0x50c
>   kernel_init+0x20/0x1d8
>   ret_from_fork+0x10/0x20
> irq event stamp: 1305682
> hardirqs last  enabled at (1305681): [<ffff8000800e332c>] 
> console_unlock+0x124/0x130
> hardirqs last disabled at (1305682): [<ffff80008124e684>] el1_dbg+0x24/0x8c
> softirqs last  enabled at (1305678): [<ffff80008005be1c>] 
> handle_softirqs+0x4cc/0x4e4
> softirqs last disabled at (1305665): [<ffff8000800105b0>] 
> __do_softirq+0x14/0x20
> ---[ end trace 0000000000000000 ]---
>
> This "bus limit 0" seems to be a bit suspicious to me as well as the 
> fact that swiotlb is used for the MMC DMA. I will investigate this 
> further tomorrow. The board boots fine though.

Looking at the code I guess that bus_dma_limit set to 0 means no bus
limit. But dma_mask for your device indicates 32-bit device limit. This
can't work with address above 4GB. For some reason DMA code tries to
allocate from higher address. This is most likely the reason
dma_capable() returns false.

Thanks,
baruch

>>   arch/arm64/mm/init.c       | 30 +++++++++++++++---------------
>>   arch/powerpc/mm/mem.c      |  5 ++++-
>>   arch/s390/mm/init.c        |  2 +-
>>   include/linux/dma-direct.h |  2 +-
>>   kernel/dma/direct.c        |  6 +++---
>>   kernel/dma/pool.c          |  4 ++--
>>   kernel/dma/swiotlb.c       |  6 +++---
>>   7 files changed, 29 insertions(+), 26 deletions(-)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 9b5ab6818f7f..c45e2152ca9e 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -115,35 +115,35 @@ static void __init arch_reserve_crashkernel(void)
>>   }
>>   
>>   /*
>> - * Return the maximum physical address for a zone accessible by the given bits
>> - * limit. If DRAM starts above 32-bit, expand the zone to the maximum
>> + * Return the maximum physical address for a zone given its limit.
>> + * If DRAM starts above 32-bit, expand the zone to the maximum
>>    * available memory, otherwise cap it at 32-bit.
>>    */
>> -static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
>> +static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>>   {
>> -	phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
>>   	phys_addr_t phys_start = memblock_start_of_DRAM();
>>   
>>   	if (phys_start > U32_MAX)
>> -		zone_mask = PHYS_ADDR_MAX;
>> -	else if (phys_start > zone_mask)
>> -		zone_mask = U32_MAX;
>> +		zone_limit = PHYS_ADDR_MAX;
>> +	else if (phys_start > zone_limit)
>> +		zone_limit = U32_MAX;
>>   
>> -	return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
>> +	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>>   }
>>   
>>   static void __init zone_sizes_init(void)
>>   {
>>   	unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
>> -	unsigned int __maybe_unused acpi_zone_dma_bits;
>> -	unsigned int __maybe_unused dt_zone_dma_bits;
>> -	phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
>> +	phys_addr_t __maybe_unused acpi_zone_dma_limit;
>> +	phys_addr_t __maybe_unused dt_zone_dma_limit;
>> +	phys_addr_t __maybe_unused dma32_phys_limit =
>> +		max_zone_phys(DMA_BIT_MASK(32));
>>   
>>   #ifdef CONFIG_ZONE_DMA
>> -	acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
>> -	dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
>> -	zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
>> -	arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
>> +	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
>> +	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
>> +	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
>> +	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>>   	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>>   #endif
>>   #ifdef CONFIG_ZONE_DMA32
>> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
>> index d325217ab201..05b7f702b3f7 100644
>> --- a/arch/powerpc/mm/mem.c
>> +++ b/arch/powerpc/mm/mem.c
>> @@ -216,7 +216,7 @@ static int __init mark_nonram_nosave(void)
>>    * everything else. GFP_DMA32 page allocations automatically fall back to
>>    * ZONE_DMA.
>>    *
>> - * By using 31-bit unconditionally, we can exploit zone_dma_bits to inform the
>> + * By using 31-bit unconditionally, we can exploit zone_dma_limit to inform the
>>    * generic DMA mapping code.  32-bit only devices (if not handled by an IOMMU
>>    * anyway) will take a first dip into ZONE_NORMAL and get otherwise served by
>>    * ZONE_DMA.
>> @@ -230,6 +230,7 @@ void __init paging_init(void)
>>   {
>>   	unsigned long long total_ram = memblock_phys_mem_size();
>>   	phys_addr_t top_of_ram = memblock_end_of_DRAM();
>> +	int zone_dma_bits;
>>   
>>   #ifdef CONFIG_HIGHMEM
>>   	unsigned long v = __fix_to_virt(FIX_KMAP_END);
>> @@ -256,6 +257,8 @@ void __init paging_init(void)
>>   	else
>>   		zone_dma_bits = 31;
>>   
>> +	zone_dma_limit = DMA_BIT_MASK(zone_dma_bits);
>> +
>>   #ifdef CONFIG_ZONE_DMA
>>   	max_zone_pfns[ZONE_DMA]	= min(max_low_pfn,
>>   				      1UL << (zone_dma_bits - PAGE_SHIFT));
>> diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
>> index ddcd39ef4346..91fc2b91adfc 100644
>> --- a/arch/s390/mm/init.c
>> +++ b/arch/s390/mm/init.c
>> @@ -97,7 +97,7 @@ void __init paging_init(void)
>>   
>>   	vmem_map_init();
>>   	sparse_init();
>> -	zone_dma_bits = 31;
>> +	zone_dma_limit = DMA_BIT_MASK(31);
>>   	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
>>   	max_zone_pfns[ZONE_DMA] = virt_to_pfn(MAX_DMA_ADDRESS);
>>   	max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
>> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
>> index edbe13d00776..d7e30d4f7503 100644
>> --- a/include/linux/dma-direct.h
>> +++ b/include/linux/dma-direct.h
>> @@ -12,7 +12,7 @@
>>   #include <linux/mem_encrypt.h>
>>   #include <linux/swiotlb.h>
>>   
>> -extern unsigned int zone_dma_bits;
>> +extern u64 zone_dma_limit;
>>   
>>   /*
>>    * Record the mapping of CPU physical to DMA addresses for a given region.
>> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
>> index 4480a3cd92e0..f2ba074a6a54 100644
>> --- a/kernel/dma/direct.c
>> +++ b/kernel/dma/direct.c
>> @@ -20,7 +20,7 @@
>>    * it for entirely different regions. In that case the arch code needs to
>>    * override the variable below for dma-direct to work properly.
>>    */
>> -unsigned int zone_dma_bits __ro_after_init = 24;
>> +u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
>>   
>>   static inline dma_addr_t phys_to_dma_direct(struct device *dev,
>>   		phys_addr_t phys)
>> @@ -59,7 +59,7 @@ static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 *phys_limit)
>>   	 * zones.
>>   	 */
>>   	*phys_limit = dma_to_phys(dev, dma_limit);
>> -	if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
>> +	if (*phys_limit <= zone_dma_limit)
>>   		return GFP_DMA;
>>   	if (*phys_limit <= DMA_BIT_MASK(32))
>>   		return GFP_DMA32;
>> @@ -580,7 +580,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
>>   	 * part of the check.
>>   	 */
>>   	if (IS_ENABLED(CONFIG_ZONE_DMA))
>> -		min_mask = min_t(u64, min_mask, DMA_BIT_MASK(zone_dma_bits));
>> +		min_mask = min_t(u64, min_mask, zone_dma_limit);
>>   	return mask >= phys_to_dma_unencrypted(dev, min_mask);
>>   }
>>   
>> diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
>> index d10613eb0f63..7b04f7575796 100644
>> --- a/kernel/dma/pool.c
>> +++ b/kernel/dma/pool.c
>> @@ -70,9 +70,9 @@ static bool cma_in_zone(gfp_t gfp)
>>   	/* CMA can't cross zone boundaries, see cma_activate_area() */
>>   	end = cma_get_base(cma) + size - 1;
>>   	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
>> -		return end <= DMA_BIT_MASK(zone_dma_bits);
>> +		return end <= zone_dma_limit;
>>   	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
>> -		return end <= DMA_BIT_MASK(32);
>> +		return end <= max(DMA_BIT_MASK(32), zone_dma_limit);
>>   	return true;
>>   }
>>   
>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>> index df68d29740a0..abcf3fa63a56 100644
>> --- a/kernel/dma/swiotlb.c
>> +++ b/kernel/dma/swiotlb.c
>> @@ -450,9 +450,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>>   	if (!remap)
>>   		io_tlb_default_mem.can_grow = true;
>>   	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
>> -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
>> +		io_tlb_default_mem.phys_limit = zone_dma_limit;
>>   	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
>> -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
>> +		io_tlb_default_mem.phys_limit = max(DMA_BIT_MASK(32), zone_dma_limit);
>>   	else
>>   		io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
>>   #endif
>> @@ -629,7 +629,7 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
>>   	}
>>   
>>   	gfp &= ~GFP_ZONEMASK;
>> -	if (phys_limit <= DMA_BIT_MASK(zone_dma_bits))
>> +	if (phys_limit <= zone_dma_limit)
>>   		gfp |= __GFP_DMA;
>>   	else if (phys_limit <= DMA_BIT_MASK(32))
>>   		gfp |= __GFP_DMA32;
>
> Best regards

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-27  4:52     ` Baruch Siach
@ 2024-08-27  6:14       ` Marek Szyprowski
  2024-08-27  7:03         ` Baruch Siach
  0 siblings, 1 reply; 17+ messages in thread
From: Marek Szyprowski @ 2024-08-27  6:14 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Christoph Hellwig, Catalin Marinas, Will Deacon, Robin Murphy,
	iommu, linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman, linux-rockchip

On 27.08.2024 06:52, Baruch Siach wrote:
> Hi Marek,
>
> Thanks for your report.
>
> On Mon, Aug 26 2024, Marek Szyprowski wrote:
>> On 11.08.2024 09:09, Baruch Siach wrote:
>>> From: Catalin Marinas <catalin.marinas@arm.com>
>>>
>>> Hardware DMA limit might not be power of 2. When RAM range starts above
>>> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
>>> can not encode this limit.
>>>
>>> Use plain address for DMA zone limit.
>>>
>>> Since DMA zone can now potentially span beyond 4GB physical limit of
>>> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
>>>
>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
>>> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
>>> ---
>> This patch landed recently in linux-next as commit ba0fb44aed47
>> ("dma-mapping: replace zone_dma_bits by zone_dma_limit"). During my
>> tests I found that it introduces the following warning on ARM64/Rockchip
>> based Odroid M1 board (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts):
> Does this warning go away if you revert both 3be9b846896d and ba0fb44aed47?

Yes, linux-next with above mentioned commits reverted works fine.


> Upstream rockchip DTs have no dma-ranges property. Is that the case for
> your platform as well?
>
> Can you share kernel report of DMA zones and swiotlb? On my platform I get:
>
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   [mem 0x0000000840000000-0x0000000fffffffff]
> ...
> [    0.000000] software IO TLB: area num 8.
> [    0.000000] software IO TLB: mapped [mem 0x000000083be38000-0x000000083fe38000] (64MB)
>
> What do you get at your end?

On ba0fb44aed47 I got:

[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 
0x0000000000200000-0x00000001ffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ff7a0600-0x1ff7a2fff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000200000-0x00000001ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000200000-0x00000000083fffff]
[    0.000000]   node   0: [mem 0x0000000009400000-0x00000000efffffff]
[    0.000000]   node   0: [mem 0x00000001f0000000-0x00000001ffffffff]
[    0.000000] Initmem setup node 0 [mem 
0x0000000000200000-0x00000001ffffffff]
[    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
[    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
[    0.000000] cma: Reserved 96 MiB at 0x00000001f0000000 on node -1

...

[    0.000000] software IO TLB: SWIOTLB bounce buffer size adjusted to 3MB
[    0.000000] software IO TLB: area num 4.
[    0.000000] software IO TLB: mapped [mem 
0x00000001fac00000-0x00000001fb000000] (4MB)

On the fa3c109a6d30 (parent commit of the $subject) I got:

[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 
0x0000000000200000-0x00000001ffffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x1ff7a0600-0x1ff7a2fff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000100000000-0x00000001ffffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000200000-0x00000000083fffff]
[    0.000000]   node   0: [mem 0x0000000009400000-0x00000000efffffff]
[    0.000000]   node   0: [mem 0x00000001f0000000-0x00000001ffffffff]
[    0.000000] Initmem setup node 0 [mem 
0x0000000000200000-0x00000001ffffffff]
[    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
[    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
[    0.000000] cma: Reserved 96 MiB at 0x00000000ea000000 on node -1

...

[    0.000000] software IO TLB: area num 4.
[    0.000000] software IO TLB: mapped [mem 
0x00000000e6000000-0x00000000ea000000] (64MB)

It looks that for some reasons $subject patch changes the default zone 
and swiotlb configuration.

>> ------------[ cut here ]------------
>> dwmmc_rockchip fe2b0000.mmc: swiotlb addr 0x00000001faf00000+4096
>> overflow (mask ffffffff, bus limit 0).
>> WARNING: CPU: 3 PID: 1 at kernel/dma/swiotlb.c:1594 swiotlb_map+0x2f0/0x308
>> Modules linked in:
>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc4+ #15278
>> Hardware name: Hardkernel ODROID-M1 (DT)
>> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : swiotlb_map+0x2f0/0x308
>> lr : swiotlb_map+0x2f0/0x308
>> ...
>> Call trace:
>>    swiotlb_map+0x2f0/0x308
>>    dma_direct_map_sg+0x9c/0x2e4
>>    __dma_map_sg_attrs+0x28/0x94
>>    dma_map_sg_attrs+0x10/0x24
>>    dw_mci_pre_dma_transfer+0xb8/0xf4
>>    dw_mci_pre_req+0x50/0x68
>>    mmc_blk_mq_issue_rq+0x3e0/0x964
>>    mmc_mq_queue_rq+0x118/0x2b4
>>    blk_mq_dispatch_rq_list+0x21c/0x714
>>    __blk_mq_sched_dispatch_requests+0x490/0x58c
>>    blk_mq_sched_dispatch_requests+0x30/0x6c
>>    blk_mq_run_hw_queue+0x284/0x40c
>>    blk_mq_flush_plug_list.part.0+0x190/0x974
>>    blk_mq_flush_plug_list+0x1c/0x2c
>>    __blk_flush_plug+0xe4/0x140
>>    blk_finish_plug+0x38/0x4c
>>    __ext4_get_inode_loc+0x22c/0x654
>>    __ext4_get_inode_loc_noinmem+0x40/0xa8
>>    __ext4_iget+0x154/0xcc0
>>    ext4_get_journal_inode+0x30/0x110
>>    ext4_load_and_init_journal+0x9c/0xaf0
>>    ext4_fill_super+0x1fec/0x2d90
>>    get_tree_bdev+0x140/0x1d8
>>    ext4_get_tree+0x18/0x24
>>    vfs_get_tree+0x28/0xe8
>>    path_mount+0x3e8/0xb7c
>>    init_mount+0x68/0xac
>>    do_mount_root+0x108/0x1dc
>>    mount_root_generic+0x100/0x330
>>    mount_root+0x160/0x2d0
>>    initrd_load+0x1f0/0x2a0
>>    prepare_namespace+0x4c/0x29c
>>    kernel_init_freeable+0x4b4/0x50c
>>    kernel_init+0x20/0x1d8
>>    ret_from_fork+0x10/0x20
>> irq event stamp: 1305682
>> hardirqs last  enabled at (1305681): [<ffff8000800e332c>]
>> console_unlock+0x124/0x130
>> hardirqs last disabled at (1305682): [<ffff80008124e684>] el1_dbg+0x24/0x8c
>> softirqs last  enabled at (1305678): [<ffff80008005be1c>]
>> handle_softirqs+0x4cc/0x4e4
>> softirqs last disabled at (1305665): [<ffff8000800105b0>]
>> __do_softirq+0x14/0x20
>> ---[ end trace 0000000000000000 ]---
>>
>> This "bus limit 0" seems to be a bit suspicious to me as well as the
>> fact that swiotlb is used for the MMC DMA. I will investigate this
>> further tomorrow. The board boots fine though.
> Looking at the code I guess that bus_dma_limit set to 0 means no bus
> limit. But dma_mask for your device indicates 32-bit device limit. This
> can't work with address above 4GB. For some reason DMA code tries to
> allocate from higher address. This is most likely the reason
> dma_capable() returns false.

Indeed this looks like a source of the problem:

[    3.123618] Synopsys Designware Multimedia Card Interface Driver
[    3.139653] dwmmc_rockchip fe2b0000.mmc: IDMAC supports 32-bit 
address mode.
[    3.147739] dwmmc_rockchip fe2b0000.mmc: Using internal DMA controller.
[    3.161659] dwmmc_rockchip fe2b0000.mmc: Version ID is 270a
[    3.168455] dwmmc_rockchip fe2b0000.mmc: DW MMC controller at irq 
56,32 bit host data width,256 deep fifo
[    3.182651] dwmmc_rockchip fe2b0000.mmc: Got CD GPIO

...

[   11.009258] ------------[ cut here ]------------
[   11.014762] dwmmc_rockchip fe2b0000.mmc: swiotlb addr 
0x00000001faf00000+4096 overflow (mask ffffffff, bus limit 0).


> ...

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-27  6:14       ` Marek Szyprowski
@ 2024-08-27  7:03         ` Baruch Siach
  2024-08-27  7:46           ` Marek Szyprowski
  0 siblings, 1 reply; 17+ messages in thread
From: Baruch Siach @ 2024-08-27  7:03 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Christoph Hellwig, Catalin Marinas, Will Deacon, Robin Murphy,
	iommu, linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman, linux-rockchip

Hi Marek,

On Tue, Aug 27 2024, Marek Szyprowski wrote:
> On 27.08.2024 06:52, Baruch Siach wrote:
>> Hi Marek,
>>
>> Thanks for your report.
>>
>> On Mon, Aug 26 2024, Marek Szyprowski wrote:
>>> On 11.08.2024 09:09, Baruch Siach wrote:
>>>> From: Catalin Marinas <catalin.marinas@arm.com>
>>>>
>>>> Hardware DMA limit might not be power of 2. When RAM range starts above
>>>> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
>>>> can not encode this limit.
>>>>
>>>> Use plain address for DMA zone limit.
>>>>
>>>> Since DMA zone can now potentially span beyond 4GB physical limit of
>>>> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
>>>>
>>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>>> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
>>>> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
>>>> ---
>>> This patch landed recently in linux-next as commit ba0fb44aed47
>>> ("dma-mapping: replace zone_dma_bits by zone_dma_limit"). During my
>>> tests I found that it introduces the following warning on ARM64/Rockchip
>>> based Odroid M1 board (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts):
>> Does this warning go away if you revert both 3be9b846896d and ba0fb44aed47?
>
> Yes, linux-next with above mentioned commits reverted works fine.
>
>
>> Upstream rockchip DTs have no dma-ranges property. Is that the case for
>> your platform as well?
>>
>> Can you share kernel report of DMA zones and swiotlb? On my platform I get:
>>
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
>> [    0.000000]   DMA32    empty
>> [    0.000000]   Normal   [mem 0x0000000840000000-0x0000000fffffffff]
>> ...
>> [    0.000000] software IO TLB: area num 8.
>> [    0.000000] software IO TLB: mapped [mem 0x000000083be38000-0x000000083fe38000] (64MB)
>>
>> What do you get at your end?
>
> On ba0fb44aed47 I got:
>
> [    0.000000] NUMA: No NUMA configuration found
> [    0.000000] NUMA: Faking a node at [mem 
> 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ff7a0600-0x1ff7a2fff]
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000001ffffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000083fffff]
> [    0.000000]   node   0: [mem 0x0000000009400000-0x00000000efffffff]
> [    0.000000]   node   0: [mem 0x00000001f0000000-0x00000001ffffffff]
> [    0.000000] Initmem setup node 0 [mem 
> 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
> [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
> [    0.000000] cma: Reserved 96 MiB at 0x00000001f0000000 on node -1
>
> ...
>
> [    0.000000] software IO TLB: SWIOTLB bounce buffer size adjusted to 3MB
> [    0.000000] software IO TLB: area num 4.
> [    0.000000] software IO TLB: mapped [mem 
> 0x00000001fac00000-0x00000001fb000000] (4MB)
>
> On the fa3c109a6d30 (parent commit of the $subject) I got:
>
> [    0.000000] NUMA: No NUMA configuration found
> [    0.000000] NUMA: Faking a node at [mem 
> 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] NUMA: NODE_DATA [mem 0x1ff7a0600-0x1ff7a2fff]
> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   [mem 0x0000000100000000-0x00000001ffffffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000083fffff]
> [    0.000000]   node   0: [mem 0x0000000009400000-0x00000000efffffff]
> [    0.000000]   node   0: [mem 0x00000001f0000000-0x00000001ffffffff]
> [    0.000000] Initmem setup node 0 [mem 
> 0x0000000000200000-0x00000001ffffffff]
> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
> [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
> [    0.000000] cma: Reserved 96 MiB at 0x00000000ea000000 on node -1
>
> ...
>
> [    0.000000] software IO TLB: area num 4.
> [    0.000000] software IO TLB: mapped [mem 
> 0x00000000e6000000-0x00000000ea000000] (64MB)
>
> It looks that for some reasons $subject patch changes the default zone 
> and swiotlb configuration.

Does this fix the issue?

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index bfb10969cbf0..7fcd0aaa9bb6 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -116,6 +116,9 @@ static void __init arch_reserve_crashkernel(void)
 
 static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
 {
+	if (memblock_start_of_DRAM() < U32_MAX)
+		zone_limit = min(zone_limit, U32_MAX);
+
 	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
 }
 

Thanks,
baruch

>>> ------------[ cut here ]------------
>>> dwmmc_rockchip fe2b0000.mmc: swiotlb addr 0x00000001faf00000+4096
>>> overflow (mask ffffffff, bus limit 0).
>>> WARNING: CPU: 3 PID: 1 at kernel/dma/swiotlb.c:1594 swiotlb_map+0x2f0/0x308
>>> Modules linked in:
>>> CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc4+ #15278
>>> Hardware name: Hardkernel ODROID-M1 (DT)
>>> pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> pc : swiotlb_map+0x2f0/0x308
>>> lr : swiotlb_map+0x2f0/0x308
>>> ...
>>> Call trace:
>>>    swiotlb_map+0x2f0/0x308
>>>    dma_direct_map_sg+0x9c/0x2e4
>>>    __dma_map_sg_attrs+0x28/0x94
>>>    dma_map_sg_attrs+0x10/0x24
>>>    dw_mci_pre_dma_transfer+0xb8/0xf4
>>>    dw_mci_pre_req+0x50/0x68
>>>    mmc_blk_mq_issue_rq+0x3e0/0x964
>>>    mmc_mq_queue_rq+0x118/0x2b4
>>>    blk_mq_dispatch_rq_list+0x21c/0x714
>>>    __blk_mq_sched_dispatch_requests+0x490/0x58c
>>>    blk_mq_sched_dispatch_requests+0x30/0x6c
>>>    blk_mq_run_hw_queue+0x284/0x40c
>>>    blk_mq_flush_plug_list.part.0+0x190/0x974
>>>    blk_mq_flush_plug_list+0x1c/0x2c
>>>    __blk_flush_plug+0xe4/0x140
>>>    blk_finish_plug+0x38/0x4c
>>>    __ext4_get_inode_loc+0x22c/0x654
>>>    __ext4_get_inode_loc_noinmem+0x40/0xa8
>>>    __ext4_iget+0x154/0xcc0
>>>    ext4_get_journal_inode+0x30/0x110
>>>    ext4_load_and_init_journal+0x9c/0xaf0
>>>    ext4_fill_super+0x1fec/0x2d90
>>>    get_tree_bdev+0x140/0x1d8
>>>    ext4_get_tree+0x18/0x24
>>>    vfs_get_tree+0x28/0xe8
>>>    path_mount+0x3e8/0xb7c
>>>    init_mount+0x68/0xac
>>>    do_mount_root+0x108/0x1dc
>>>    mount_root_generic+0x100/0x330
>>>    mount_root+0x160/0x2d0
>>>    initrd_load+0x1f0/0x2a0
>>>    prepare_namespace+0x4c/0x29c
>>>    kernel_init_freeable+0x4b4/0x50c
>>>    kernel_init+0x20/0x1d8
>>>    ret_from_fork+0x10/0x20
>>> irq event stamp: 1305682
>>> hardirqs last  enabled at (1305681): [<ffff8000800e332c>]
>>> console_unlock+0x124/0x130
>>> hardirqs last disabled at (1305682): [<ffff80008124e684>] el1_dbg+0x24/0x8c
>>> softirqs last  enabled at (1305678): [<ffff80008005be1c>]
>>> handle_softirqs+0x4cc/0x4e4
>>> softirqs last disabled at (1305665): [<ffff8000800105b0>]
>>> __do_softirq+0x14/0x20
>>> ---[ end trace 0000000000000000 ]---
>>>
>>> This "bus limit 0" seems to be a bit suspicious to me as well as the
>>> fact that swiotlb is used for the MMC DMA. I will investigate this
>>> further tomorrow. The board boots fine though.
>> Looking at the code I guess that bus_dma_limit set to 0 means no bus
>> limit. But dma_mask for your device indicates 32-bit device limit. This
>> can't work with address above 4GB. For some reason DMA code tries to
>> allocate from higher address. This is most likely the reason
>> dma_capable() returns false.
>
> Indeed this looks like a source of the problem:
>
> [    3.123618] Synopsys Designware Multimedia Card Interface Driver
> [    3.139653] dwmmc_rockchip fe2b0000.mmc: IDMAC supports 32-bit 
> address mode.
> [    3.147739] dwmmc_rockchip fe2b0000.mmc: Using internal DMA controller.
> [    3.161659] dwmmc_rockchip fe2b0000.mmc: Version ID is 270a
> [    3.168455] dwmmc_rockchip fe2b0000.mmc: DW MMC controller at irq 
> 56,32 bit host data width,256 deep fifo
> [    3.182651] dwmmc_rockchip fe2b0000.mmc: Got CD GPIO
>
> ...
>
> [   11.009258] ------------[ cut here ]------------
> [   11.014762] dwmmc_rockchip fe2b0000.mmc: swiotlb addr 
> 0x00000001faf00000+4096 overflow (mask ffffffff, bus limit 0).
>
>
>> ...
>
> Best regards

-- 
                                                     ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-27  7:03         ` Baruch Siach
@ 2024-08-27  7:46           ` Marek Szyprowski
  0 siblings, 0 replies; 17+ messages in thread
From: Marek Szyprowski @ 2024-08-27  7:46 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Christoph Hellwig, Catalin Marinas, Will Deacon, Robin Murphy,
	iommu, linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman, linux-rockchip

On 27.08.2024 09:03, Baruch Siach wrote:
> On Tue, Aug 27 2024, Marek Szyprowski wrote:
>> On 27.08.2024 06:52, Baruch Siach wrote:
>>> Hi Marek,
>>>
>>> Thanks for your report.
>>>
>>> On Mon, Aug 26 2024, Marek Szyprowski wrote:
>>>> On 11.08.2024 09:09, Baruch Siach wrote:
>>>>> From: Catalin Marinas <catalin.marinas@arm.com>
>>>>>
>>>>> Hardware DMA limit might not be power of 2. When RAM range starts above
>>>>> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
>>>>> can not encode this limit.
>>>>>
>>>>> Use plain address for DMA zone limit.
>>>>>
>>>>> Since DMA zone can now potentially span beyond 4GB physical limit of
>>>>> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
>>>>>
>>>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>>>> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
>>>>> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
>>>>> ---
>>>> This patch landed recently in linux-next as commit ba0fb44aed47
>>>> ("dma-mapping: replace zone_dma_bits by zone_dma_limit"). During my
>>>> tests I found that it introduces the following warning on ARM64/Rockchip
>>>> based Odroid M1 board (arch/arm64/boot/dts/rockchip/rk3568-odroid-m1.dts):
>>> Does this warning go away if you revert both 3be9b846896d and ba0fb44aed47?
>> Yes, linux-next with above mentioned commits reverted works fine.
>>
>>
>>> Upstream rockchip DTs have no dma-ranges property. Is that the case for
>>> your platform as well?
>>>
>>> Can you share kernel report of DMA zones and swiotlb? On my platform I get:
>>>
>>> [    0.000000] Zone ranges:
>>> [    0.000000]   DMA      [mem 0x0000000800000000-0x000000083fffffff]
>>> [    0.000000]   DMA32    empty
>>> [    0.000000]   Normal   [mem 0x0000000840000000-0x0000000fffffffff]
>>> ...
>>> [    0.000000] software IO TLB: area num 8.
>>> [    0.000000] software IO TLB: mapped [mem 0x000000083be38000-0x000000083fe38000] (64MB)
>>>
>>> What do you get at your end?
>> On ba0fb44aed47 I got:
>>
>> [    0.000000] NUMA: No NUMA configuration found
>> [    0.000000] NUMA: Faking a node at [mem
>> 0x0000000000200000-0x00000001ffffffff]
>> [    0.000000] NUMA: NODE_DATA [mem 0x1ff7a0600-0x1ff7a2fff]
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000001ffffffff]
>> [    0.000000]   DMA32    empty
>> [    0.000000]   Normal   empty
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000083fffff]
>> [    0.000000]   node   0: [mem 0x0000000009400000-0x00000000efffffff]
>> [    0.000000]   node   0: [mem 0x00000001f0000000-0x00000001ffffffff]
>> [    0.000000] Initmem setup node 0 [mem
>> 0x0000000000200000-0x00000001ffffffff]
>> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
>> [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
>> [    0.000000] cma: Reserved 96 MiB at 0x00000001f0000000 on node -1
>>
>> ...
>>
>> [    0.000000] software IO TLB: SWIOTLB bounce buffer size adjusted to 3MB
>> [    0.000000] software IO TLB: area num 4.
>> [    0.000000] software IO TLB: mapped [mem
>> 0x00000001fac00000-0x00000001fb000000] (4MB)
>>
>> On the fa3c109a6d30 (parent commit of the $subject) I got:
>>
>> [    0.000000] NUMA: No NUMA configuration found
>> [    0.000000] NUMA: Faking a node at [mem
>> 0x0000000000200000-0x00000001ffffffff]
>> [    0.000000] NUMA: NODE_DATA [mem 0x1ff7a0600-0x1ff7a2fff]
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000000200000-0x00000000ffffffff]
>> [    0.000000]   DMA32    empty
>> [    0.000000]   Normal   [mem 0x0000000100000000-0x00000001ffffffff]
>> [    0.000000] Movable zone start for each node
>> [    0.000000] Early memory node ranges
>> [    0.000000]   node   0: [mem 0x0000000000200000-0x00000000083fffff]
>> [    0.000000]   node   0: [mem 0x0000000009400000-0x00000000efffffff]
>> [    0.000000]   node   0: [mem 0x00000001f0000000-0x00000001ffffffff]
>> [    0.000000] Initmem setup node 0 [mem
>> 0x0000000000200000-0x00000001ffffffff]
>> [    0.000000] On node 0, zone DMA: 512 pages in unavailable ranges
>> [    0.000000] On node 0, zone DMA: 4096 pages in unavailable ranges
>> [    0.000000] cma: Reserved 96 MiB at 0x00000000ea000000 on node -1
>>
>> ...
>>
>> [    0.000000] software IO TLB: area num 4.
>> [    0.000000] software IO TLB: mapped [mem
>> 0x00000000e6000000-0x00000000ea000000] (64MB)
>>
>> It looks that for some reasons $subject patch changes the default zone
>> and swiotlb configuration.
> Does this fix the issue?
>
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index bfb10969cbf0..7fcd0aaa9bb6 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -116,6 +116,9 @@ static void __init arch_reserve_crashkernel(void)
>   
>   static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>   {
> +	if (memblock_start_of_DRAM() < U32_MAX)
> +		zone_limit = min(zone_limit, U32_MAX);
> +
>   	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>   }
>   

Yes, this fixes my issue. Thanks!

Fell free to add:

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-11  7:09 ` [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
                     ` (3 preceding siblings ...)
  2024-08-26 19:28   ` Marek Szyprowski
@ 2024-08-29 13:42   ` Neil Armstrong
  2024-08-29 14:38     ` Robin Murphy
  4 siblings, 1 reply; 17+ messages in thread
From: Neil Armstrong @ 2024-08-29 13:42 UTC (permalink / raw)
  To: Baruch Siach, Christoph Hellwig, Marek Szyprowski,
	Catalin Marinas, Will Deacon
  Cc: Robin Murphy, iommu, linux-arm-kernel, linux-kernel, linuxppc-dev,
	linux-s390, Petr Tesařík, Ramon Fried, Elad Nachman,
	linux-arm-msm

Hi,

On 11/08/2024 09:09, Baruch Siach wrote:
> From: Catalin Marinas <catalin.marinas@arm.com>
> 
> Hardware DMA limit might not be power of 2. When RAM range starts above
> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
> can not encode this limit.
> 
> Use plain address for DMA zone limit.
> 
> Since DMA zone can now potentially span beyond 4GB physical limit of
> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
> ---
>   arch/arm64/mm/init.c       | 30 +++++++++++++++---------------
>   arch/powerpc/mm/mem.c      |  5 ++++-
>   arch/s390/mm/init.c        |  2 +-
>   include/linux/dma-direct.h |  2 +-
>   kernel/dma/direct.c        |  6 +++---
>   kernel/dma/pool.c          |  4 ++--
>   kernel/dma/swiotlb.c       |  6 +++---
>   7 files changed, 29 insertions(+), 26 deletions(-)
> 

<snip>

This change breaks the Qualcomm SM8550-HDK boot since next-20240826.
It doesn't affect SM8550-QRD or other similar SoCs like SM8650 or SM8450.
The last CI run on next-20240828 can be found at:
https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/pipelines/100936

SM8550-HDK boot log:
https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/jobs/165617

bisect log:
# bad: [b18bbfc14a38b5234e09c2adcf713e38063a7e6e] Add linux-next specific files for 20240829
# good: [5be63fc19fcaa4c236b307420483578a56986a37] Linux 6.11-rc5
git bisect start 'FETCH_HEAD' 'v6.11-rc5'
# bad: [dc09f0263a0accf41821d260f4bf7ad9a4f7b7d8] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
git bisect bad dc09f0263a0accf41821d260f4bf7ad9a4f7b7d8
# bad: [97c7e618752776e03f50311400bb73c01489fb17] Merge branch 'for-next' of git://github.com/Xilinx/linux-xlnx.git
git bisect bad 97c7e618752776e03f50311400bb73c01489fb17
# good: [27ad8eb339a5e3f96aed5f3a3b5901994ce7856d] Merge branch 'clang-format' of https://github.com/ojeda/linux.git
git bisect good 27ad8eb339a5e3f96aed5f3a3b5901994ce7856d
# bad: [fd34d49f6d8dbd157b17b675dc51a145cdad580c] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-omap.git
git bisect bad fd34d49f6d8dbd157b17b675dc51a145cdad580c
# bad: [29f35e8cf8bff7c69a740edb1cf3d62d211f5a43] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux.git
git bisect bad 29f35e8cf8bff7c69a740edb1cf3d62d211f5a43
# good: [41860d49473c0c09dc0a2a4d148047f97aaa2539] perf sched: Use perf_tool__init()
git bisect good 41860d49473c0c09dc0a2a4d148047f97aaa2539
# good: [6236ebe07131a7746d870f1d8eb3637a8df13e70] perf daemon: Fix the build on more 32-bit architectures
git bisect good 6236ebe07131a7746d870f1d8eb3637a8df13e70
# good: [92b0d033c80c882e6be26dfeeb2f24c53bdeeee6] Merge branches 'for-next/acpi', 'for-next/misc', 'for-next/perf', 'for-next/selftests' and 'for-next/timers' into for-next/core
git bisect good 92b0d033c80c882e6be26dfeeb2f24c53bdeeee6
# bad: [5db893307df81f0c1dd6b6f167dc263e88ba855a] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/rmk/linux.git
git bisect bad 5db893307df81f0c1dd6b6f167dc263e88ba855a
# bad: [f65eaf209abc5f4761623aaa1e110bdb6de124ed] Merge branch 'for-next' of git://git.infradead.org/users/hch/dma-mapping.git
git bisect bad f65eaf209abc5f4761623aaa1e110bdb6de124ed
# bad: [f69e342eec008e1bab772d3963c3dd9979293e13] dma-mapping: call ->unmap_page and ->unmap_sg unconditionally
git bisect bad f69e342eec008e1bab772d3963c3dd9979293e13
# bad: [ba0fb44aed47693cc2482427f63ba6cd19051327] dma-mapping: replace zone_dma_bits by zone_dma_limit
git bisect bad ba0fb44aed47693cc2482427f63ba6cd19051327
# good: [fa3c109a6d302b56437a6412c5f3044c3e12de03] dma-mapping: use bit masking to check VM_DMA_COHERENT
git bisect good fa3c109a6d302b56437a6412c5f3044c3e12de03
# first bad commit: [ba0fb44aed47693cc2482427f63ba6cd19051327] dma-mapping: replace zone_dma_bits by zone_dma_limit

Boot log with earlycon:
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd461]
[    0.000000] Linux version 6.11.0-rc5-next-20240829-07694-gb36219e63903 (narmstrong@arrakeen.starnux.net) (aarch64-none-linux-gnu-gcc (Arm GNU Toolchain 13.2.rel1 (Build arm-13.7)) 13.2.1 20231009, GNU ld (Arm GNU Toolchain 13.2.rel1 (Build arm-13.7)) 2.41.0.20231009) #397 SMP PREEMPT Thu Aug 29 15:25:39 CEST 2024
[    0.000000] KASLR enabled
[    0.000000] random: crng init done
[    0.000000] Machine model: Qualcomm Technologies, Inc. SM8550 HDK
[    0.000000] printk: debug: ignoring loglevel setting.
[    0.000000] efi: UEFI not found.
[    0.000000] [Firmware Bug]: Kernel image misaligned at boot, please fix your bootloader!
[    0.000000] OF: reserved mem: 0x0000000080000000..0x00000000809fffff (10240 KiB) nomap non-reusable hyp-region@80000000
[    0.000000] OF: reserved mem: 0x0000000080a00000..0x0000000080dfffff (4096 KiB) nomap non-reusable cpusys-vm-region@80a00000
[    0.000000] OF: reserved mem: 0x0000000080e00000..0x00000000811cffff (3904 KiB) nomap non-reusable hyp-tags-region@80e00000
[    0.000000] OF: reserved mem: 0x00000000811d0000..0x00000000811fffff (192 KiB) nomap non-reusable hyp-tags-reserved-region@811d0000
[    0.000000] OF: reserved mem: 0x0000000081a00000..0x0000000081c5ffff (2432 KiB) nomap non-reusable xbl-dt-log-merged-region@81a00000
[    0.000000] OF: reserved mem: 0x0000000081c60000..0x0000000081c7ffff (128 KiB) nomap non-reusable aop-cmd-db-region@81c60000
[    0.000000] OF: reserved mem: 0x0000000081c80000..0x0000000081cf3fff (464 KiB) nomap non-reusable aop-config-merged-region@81c80000
[    0.000000] OF: reserved mem: 0x0000000081d00000..0x0000000081efffff (2048 KiB) nomap non-reusable smem@81d00000
[    0.000000] OF: reserved mem: 0x0000000081f00000..0x0000000081f1ffff (128 KiB) nomap non-reusable adsp-mhi-region@81f00000
[    0.000000] OF: reserved mem: 0x0000000082600000..0x00000000826fffff (1024 KiB) nomap non-reusable global-sync-region@82600000
[    0.000000] OF: reserved mem: 0x0000000082700000..0x00000000827fffff (1024 KiB) nomap non-reusable tz-stat-region@82700000
[    0.000000] OF: reserved mem: 0x0000000082800000..0x0000000086dfffff (71680 KiB) nomap non-reusable cdsp-secure-heap-region@82800000
[    0.000000] OF: reserved mem: 0x000000008a800000..0x000000009affffff (270336 KiB) nomap non-reusable mpss-region@8a800000
[    0.000000] OF: reserved mem: 0x000000009b000000..0x000000009b07ffff (512 KiB) nomap non-reusable q6-mpss-dtb-region@9b000000
[    0.000000] OF: reserved mem: 0x000000009b080000..0x000000009b08ffff (64 KiB) nomap non-reusable ipa-fw-region@9b080000
[    0.000000] OF: reserved mem: 0x000000009b090000..0x000000009b099fff (40 KiB) nomap non-reusable ipa-gsi-region@9b090000
[    0.000000] OF: reserved mem: 0x000000009b09a000..0x000000009b09bfff (8 KiB) nomap non-reusable gpu-micro-code-region@9b09a000
[    0.000000] OF: reserved mem: 0x000000009b100000..0x000000009b27ffff (1536 KiB) nomap non-reusable spss-region@9b100000
[    0.000000] OF: reserved mem: 0x000000009b280000..0x000000009b2dffff (384 KiB) nomap non-reusable spu-tz-shared-region@9b280000
[    0.000000] OF: reserved mem: 0x000000009b2e0000..0x000000009b2fffff (128 KiB) nomap non-reusable spu-modem-shared-region@9b2e0000
[    0.000000] OF: reserved mem: 0x000000009b300000..0x000000009bafffff (8192 KiB) nomap non-reusable camera-region@9b300000
[    0.000000] OF: reserved mem: 0x000000009bb00000..0x000000009c1fffff (7168 KiB) nomap non-reusable video-region@9bb00000
[    0.000000] OF: reserved mem: 0x000000009c200000..0x000000009c8fffff (7168 KiB) nomap non-reusable cvp-region@9c200000
[    0.000000] OF: reserved mem: 0x000000009c900000..0x000000009e8fffff (32768 KiB) nomap non-reusable cdsp-region@9c900000
[    0.000000] OF: reserved mem: 0x000000009e900000..0x000000009e97ffff (512 KiB) nomap non-reusable q6-cdsp-dtb-region@9e900000
[    0.000000] OF: reserved mem: 0x000000009e980000..0x000000009e9fffff (512 KiB) nomap non-reusable q6-adsp-dtb-region@9e980000
[    0.000000] OF: reserved mem: 0x000000009ea00000..0x00000000a2a7ffff (66048 KiB) nomap non-reusable adspslpi-region@9ea00000
[    0.000000] OF: reserved mem: 0x00000000d4a80000..0x00000000d4cfffff (2560 KiB) nomap non-reusable rmtfs-region@d4a80000
[    0.000000] OF: reserved mem: 0x00000000d4d00000..0x00000000d7ffffff (52224 KiB) nomap non-reusable mpss-dsm-region@d4d00000
[    0.000000] OF: reserved mem: 0x00000000d8000000..0x00000000d80fffff (1024 KiB) nomap non-reusable tz-reserved-region@d8000000
[    0.000000] OF: reserved mem: 0x00000000d8100000..0x00000000d813ffff (256 KiB) nomap non-reusable xbl-sc-region@d8100000
[    0.000000] OF: reserved mem: 0x00000000d8140000..0x00000000d82fffff (1792 KiB) nomap non-reusable cpucp-fw-region@d8140000
[    0.000000] OF: reserved mem: 0x00000000d8300000..0x00000000d87fffff (5120 KiB) nomap non-reusable qtee-region@d8300000
[    0.000000] OF: reserved mem: 0x00000000d8800000..0x00000000e11fffff (141312 KiB) nomap non-reusable ta-region@d8800000
[    0.000000] OF: reserved mem: 0x00000000e1200000..0x00000000e393ffff (40192 KiB) nomap non-reusable tz-tags-region@e1200000
[    0.000000] OF: reserved mem: 0x00000000e6440000..0x00000000e66b8fff (2532 KiB) nomap non-reusable hwfence-shbuf-region@e6440000
[    0.000000] OF: reserved mem: 0x00000000f3600000..0x00000000f80edfff (76728 KiB) nomap non-reusable trust-ui-vm-region@f3600000
[    0.000000] OF: reserved mem: 0x00000000f80ee000..0x00000000f80eefff (4 KiB) nomap non-reusable trust-ui-vm-dump-region@f80ee000
[    0.000000] OF: reserved mem: 0x00000000f80ef000..0x00000000f80f7fff (36 KiB) nomap non-reusable trust-ui-vm-qrt-region@f80ef000
[    0.000000] OF: reserved mem: 0x00000000f80f8000..0x00000000f80fbfff (16 KiB) nomap non-reusable trust-ui-vm-vblk0-ring-region@f80f8000
[    0.000000] OF: reserved mem: 0x00000000f80fc000..0x00000000f80fffff (16 KiB) nomap non-reusable trust-ui-vm-vblk1-ring-region@f80fc000
[    0.000000] OF: reserved mem: 0x00000000f8100000..0x00000000f81fffff (1024 KiB) nomap non-reusable trust-ui-vm-swiotlb-region@f8100000
[    0.000000] OF: reserved mem: 0x00000000f8400000..0x00000000fcbfffff (73728 KiB) nomap non-reusable oem-vm-region@f8400000
[    0.000000] OF: reserved mem: 0x00000000fcc00000..0x00000000fcc03fff (16 KiB) nomap non-reusable oem-vm-vblk0-ring-region@fcc00000
[    0.000000] OF: reserved mem: 0x00000000fcc04000..0x00000000fcd03fff (1024 KiB) nomap non-reusable oem-vm-swiotlb-region@fcc04000
[    0.000000] OF: reserved mem: 0x00000000fce00000..0x00000000ff6fffff (41984 KiB) nomap non-reusable hyp-ext-tags-region@fce00000
[    0.000000] OF: reserved mem: 0x00000000ff700000..0x00000000ff7fffff (1024 KiB) nomap non-reusable hyp-ext-reserved-region@ff700000
[    0.000000] earlycon: qcom_geni0 at MMIO 0x0000000000a9c000 (options '115200n8')
[    0.000000] printk: legacy bootconsole [qcom_geni0] enabled
[    0.000000] NUMA: Faking a node at [mem 0x0000000080000000-0x0000000affffffff]
[    0.000000] NODE_DATA(0) allocated [mem 0xafe933200-0xafe93583f]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000080000000-0x0000000affffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080000000-0x0000000080dfffff]
[    0.000000]   node   0: [mem 0x00000000811d0000-0x00000000811fffff]
[    0.000000]   node   0: [mem 0x0000000081200000-0x00000000819fffff]
[    0.000000]   node   0: [mem 0x0000000081a00000-0x0000000081cf3fff]
[    0.000000]   node   0: [mem 0x0000000081cf4000-0x0000000081cfffff]
[    0.000000]   node   0: [mem 0x0000000081d00000-0x0000000081f1ffff]
[    0.000000]   node   0: [mem 0x0000000081f20000-0x00000000825fffff]
[    0.000000]   node   0: [mem 0x0000000082600000-0x0000000086dfffff]
[    0.000000]   node   0: [mem 0x0000000086e00000-0x000000008a7fffff]
[    0.000000]   node   0: [mem 0x000000008a800000-0x000000009b09bfff]
[    0.000000]   node   0: [mem 0x000000009b09c000-0x000000009b0fffff]
[    0.000000]   node   0: [mem 0x000000009b100000-0x00000000a2a7ffff]
[    0.000000]   node   0: [mem 0x00000000a2a80000-0x00000000d4a7ffff]
[    0.000000]   node   0: [mem 0x00000000d4a80000-0x00000000d7ffffff]
[    0.000000]   node   0: [mem 0x00000000d8140000-0x00000000d815ffff]
[    0.000000]   node   0: [mem 0x00000000e1bb0000-0x00000000e393ffff]
[    0.000000]   node   0: [mem 0x00000000e3940000-0x00000000e643ffff]
[    0.000000]   node   0: [mem 0x00000000e6440000-0x00000000e66b8fff]
[    0.000000]   node   0: [mem 0x00000000e66b9000-0x00000000f35fffff]
[    0.000000]   node   0: [mem 0x00000000f3600000-0x00000000f81fffff]
[    0.000000]   node   0: [mem 0x00000000f8200000-0x00000000f83fffff]
[    0.000000]   node   0: [mem 0x00000000f8400000-0x00000000fcd03fff]
[    0.000000]   node   0: [mem 0x00000000fcd04000-0x00000000fcdfffff]
[    0.000000]   node   0: [mem 0x00000000fce00000-0x00000000ff7fffff]
[    0.000000]   node   0: [mem 0x00000000ff800000-0x00000000ffffffff]
[    0.000000]   node   0: [mem 0x0000000880000000-0x00000008b99fffff]
[    0.000000]   node   0: [mem 0x00000008c0000000-0x0000000affffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x0000000affffffff]
[    0.000000] On node 0, zone DMA: 976 pages in unavailable ranges
[    0.000000] On node 0, zone DMA: 320 pages in unavailable ranges
[    0.000000] On node 0, zone DMA: 39504 pages in unavailable ranges
[    0.000000] On node 0, zone DMA: 26112 pages in unavailable ranges
[    0.000000] cma: Reserved 32 MiB at 0x0000000880000000 on node -1
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.3
[    0.000000] psci: OSI mode supported.
[    0.000000] percpu: Embedded 25 pages/cpu s61656 r8192 d32552 u102400
[    0.000000] pcpu-alloc: s61656 r8192 d32552 u102400 alloc=25*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: detected: Address authentication (architected QARMA5 algorithm)
[    0.000000] CPU features: detected: GIC system register CPU interface
[    0.000000] CPU features: detected: Spectre-v4
[    0.000000] CPU features: detected: ARM erratum 2457168
[    0.000000] CPU features: detected: ARM erratum 2658417
[    0.000000] CPU features: detected: ARM errata 2966298, 3117295
[    0.000000] alternatives: applying boot alternatives
[    0.000000] Kernel command line: earlycon root= allow_mismatched_32bit_el0 clk_ignore_unused pd_ignore_unused systemd.mask=rmtfs.service ignore_loglevel -- androidboot.verifiedbootstate=orange androidboot.keymaster=1  androidboot.bootdevice=1d84000.ufshc androidboot.fstab_suffix=default androidboot.boot_devices=soc/1d84000.ufshc androidboot.serialno=2a7a254f androidboot.baseband=apq androidboot.force_normal_boot=1
[    0.000000] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes, linear)
[    0.000000] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[    0.000000] Fallback order for Node 0: 0
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 3078816
[    0.000000] Policy zone: DMA
[    0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
[    0.000000] software IO TLB: SWIOTLB bounce buffer size adjusted to 11MB
[    0.000000] software IO TLB: area num 8.
[    0.000000] software IO TLB: SWIOTLB bounce buffer size roundup to 16MB
[    0.000000] software IO TLB: mapped [mem 0x0000000aeff38000-0x0000000af0f38000] (16MB)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] rcu: 	RCU event tracing is enabled.
[    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=8.
[    0.000000] 	Trampoline variant of Tasks RCU enabled.
[    0.000000] 	Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[    0.000000] RCU Tasks: Setting shift to 3 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=8.
[    0.000000] RCU Tasks Trace: Setting shift to 3 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=8.
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GICv3: 988 SPIs implemented
[    0.000000] GICv3: 0 Extended SPIs implemented
[    0.000000] Root IRQ handler: gic_handle_irq
[    0.000000] GICv3: GICv3 features: 16 PPIs, DirectLPI
[    0.000000] GICv3: GICD_CTRL.DS=1, SCR_EL3.FIQ=0
[    0.000000] GICv3: Enabling SGIs without active state
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x0000000017180000
[    0.000000] ITS [mem 0x17140000-0x1715ffff]
[    0.000000] ITS@0x0000000017140000: Devices Table too large, reduce ids 32->19
[    0.000000] ITS@0x0000000017140000: Devices too large, reduce ITS pages 1024->256
[    0.000000] ITS@0x0000000017140000: allocated 131072 Devices @81500000 (indirect, esz 8, psz 4K, shr 1)
[    0.000000] ITS@0x0000000017140000: allocated 4096 Interrupt Collections @8145b000 (flat, esz 1, psz 4K, shr 1)
[    0.000000] GICv3: using LPI property table @0x0000000081470000
[    0.000000] GICv3: CPU0: using allocated LPI pending table @0x0000000081480000
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] arch_timer: cp15 and mmio timer(s) running at 19.20MHz (virt/virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x46d987e47, max_idle_ns: 440795202767 ns
[    0.000000] sched_clock: 56 bits at 19MHz, resolution 52ns, wraps every 4398046511078ns
[    0.008355] arm-pv: using stolen time PV
[    0.012595] Console: colour dummy device 80x25
[    0.017182] printk: legacy console [tty0] enabled
[    0.022031] printk: legacy bootconsole [qcom_geni0] disabled
<----------------->

System handler takes over, I'll analyze the system firmware logs to understand which access made the crash.

I tested the "arm64: mm: fix DMA zone when dma-ranges is missing" patch at [1], but it doesn't help.

[1] https://lore.kernel.org/all/d8e92b14181fc815773ac4f4bba70b5d48bc390e.1724838684.git.baruch@tkos.co.il/

Neil

#regzbot introduced: ba0fb44aed47693cc2482427f63ba6cd19051327




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-29 13:42   ` Neil Armstrong
@ 2024-08-29 14:38     ` Robin Murphy
  2024-08-29 14:54       ` neil.armstrong
  0 siblings, 1 reply; 17+ messages in thread
From: Robin Murphy @ 2024-08-29 14:38 UTC (permalink / raw)
  To: neil.armstrong, Baruch Siach, Christoph Hellwig, Marek Szyprowski,
	Catalin Marinas, Will Deacon
  Cc: iommu, linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman, linux-arm-msm

On 2024-08-29 2:42 pm, Neil Armstrong wrote:
> Hi,
> 
> On 11/08/2024 09:09, Baruch Siach wrote:
>> From: Catalin Marinas <catalin.marinas@arm.com>
>>
>> Hardware DMA limit might not be power of 2. When RAM range starts above
>> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
>> can not encode this limit.
>>
>> Use plain address for DMA zone limit.
>>
>> Since DMA zone can now potentially span beyond 4GB physical limit of
>> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
>>
>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
>> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
>> ---
>>   arch/arm64/mm/init.c       | 30 +++++++++++++++---------------
>>   arch/powerpc/mm/mem.c      |  5 ++++-
>>   arch/s390/mm/init.c        |  2 +-
>>   include/linux/dma-direct.h |  2 +-
>>   kernel/dma/direct.c        |  6 +++---
>>   kernel/dma/pool.c          |  4 ++--
>>   kernel/dma/swiotlb.c       |  6 +++---
>>   7 files changed, 29 insertions(+), 26 deletions(-)
>>
> 
> <snip>
> 
> This change breaks the Qualcomm SM8550-HDK boot since next-20240826.
> It doesn't affect SM8550-QRD or other similar SoCs like SM8650 or SM8450.
> The last CI run on next-20240828 can be found at:
> https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/pipelines/100936
> 
> SM8550-HDK boot log:
> https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/jobs/165617
> 
[...]

Yeah, a 35-bit ZONE_DMA is sure to make stuff go wrong:

> [    0.000000] Zone ranges:
> [    0.000000]   DMA      [mem 0x0000000080000000-0x0000000affffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   empty

Compared to before:

[    0.000000]   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000100000000-0x0000000affffffff]

This'll be because the SoC DT is describing a general non-restrictive range:
		dma-ranges = <0 0 0 0 0x10 0>;

Which proves we need more information than 
{acpi,of}_dma_get_max_cpu_address() are currently able to give us, 
because what zone_dma_limit actually wants to be is the *minimum* of the 
lowest highest CPU address of any DMA range, and the lowest CPU address 
of any DMA range + 2^32. I was thinking it had all ended up looking a 
bit too easy... :)

I think v1 of the fix[1] might actually work out for this, albeit still 
for the wrong reasons - if so, I concede that maybe at this point it 
might be safest to go back to that one as a quick short-term fix (with a 
big fat comment to say so) rather than try to rush the proper solution 
or revert everything.

Thanks,
Robin.

[1] 
https://lore.kernel.org/linux-arm-kernel/731d204f5f556ad61bbaf004b1d984f83c90b4f5.1724748249.git.baruch@tkos.co.il/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit
  2024-08-29 14:38     ` Robin Murphy
@ 2024-08-29 14:54       ` neil.armstrong
  0 siblings, 0 replies; 17+ messages in thread
From: neil.armstrong @ 2024-08-29 14:54 UTC (permalink / raw)
  To: Robin Murphy, Baruch Siach, Christoph Hellwig, Marek Szyprowski,
	Catalin Marinas, Will Deacon
  Cc: iommu, linux-arm-kernel, linux-kernel, linuxppc-dev, linux-s390,
	Petr Tesařík, Ramon Fried, Elad Nachman, linux-arm-msm

Hi Robin,

On 29/08/2024 16:38, Robin Murphy wrote:
> On 2024-08-29 2:42 pm, Neil Armstrong wrote:
>> Hi,
>>
>> On 11/08/2024 09:09, Baruch Siach wrote:
>>> From: Catalin Marinas <catalin.marinas@arm.com>
>>>
>>> Hardware DMA limit might not be power of 2. When RAM range starts above
>>> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
>>> can not encode this limit.
>>>
>>> Use plain address for DMA zone limit.
>>>
>>> Since DMA zone can now potentially span beyond 4GB physical limit of
>>> DMA32, make sure to use DMA zone for GFP_DMA32 allocations in that case.
>>>
>>> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
>>> Co-developed-by: Baruch Siach <baruch@tkos.co.il>
>>> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
>>> ---
>>>   arch/arm64/mm/init.c       | 30 +++++++++++++++---------------
>>>   arch/powerpc/mm/mem.c      |  5 ++++-
>>>   arch/s390/mm/init.c        |  2 +-
>>>   include/linux/dma-direct.h |  2 +-
>>>   kernel/dma/direct.c        |  6 +++---
>>>   kernel/dma/pool.c          |  4 ++--
>>>   kernel/dma/swiotlb.c       |  6 +++---
>>>   7 files changed, 29 insertions(+), 26 deletions(-)
>>>
>>
>> <snip>
>>
>> This change breaks the Qualcomm SM8550-HDK boot since next-20240826.
>> It doesn't affect SM8550-QRD or other similar SoCs like SM8650 or SM8450.
>> The last CI run on next-20240828 can be found at:
>> https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/pipelines/100936
>>
>> SM8550-HDK boot log:
>> https://git.codelinaro.org/linaro/qcomlt/ci/staging/cdba-tester/-/jobs/165617
>>
> [...]
> 
> Yeah, a 35-bit ZONE_DMA is sure to make stuff go wrong:
> 
>> [    0.000000] Zone ranges:
>> [    0.000000]   DMA      [mem 0x0000000080000000-0x0000000affffffff]
>> [    0.000000]   DMA32    empty
>> [    0.000000]   Normal   empty
> 
> Compared to before:
> 
> [    0.000000]   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
> [    0.000000]   DMA32    empty
> [    0.000000]   Normal   [mem 0x0000000100000000-0x0000000affffffff]
> 
> This'll be because the SoC DT is describing a general non-restrictive range:
>          dma-ranges = <0 0 0 0 0x10 0>;
> 
> Which proves we need more information than {acpi,of}_dma_get_max_cpu_address() are currently able to give us, because what zone_dma_limit actually wants to be is the *minimum* of the lowest highest CPU address of any DMA range, and the lowest CPU address of any DMA range + 2^32. I was thinking it had all ended up looking a bit too easy... :)
> 
> I think v1 of the fix[1] might actually work out for this, albeit still for the wrong reasons - if so, I concede that maybe at this point it might be safest to go back to that one as a quick short-term fix (with a big fat comment to say so) rather than try to rush the proper solution or revert everything.

Indeed v1 patches makes boot work again:
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   [mem 0x0000000100000000-0x0000000affffffff]

Please add my:
Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8550-HDK

Thanks,
Neil

> 
> Thanks,
> Robin.
> 
> [1] https://lore.kernel.org/linux-arm-kernel/731d204f5f556ad61bbaf004b1d984f83c90b4f5.1724748249.git.baruch@tkos.co.il/



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-08-29 14:54 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-11  7:09 [PATCH v6 RESED 0/2] dma: support DMA zone starting above 4GB Baruch Siach
2024-08-11  7:09 ` [PATCH v6 RESED 1/2] dma: replace zone_dma_bits by zone_dma_limit Baruch Siach
2024-08-12  5:52   ` Petr Tesarik
2024-08-12 11:22   ` Catalin Marinas
2024-08-16 11:52   ` Will Deacon
2024-08-16 14:37     ` Petr Tesařík
2024-08-26 19:28   ` Marek Szyprowski
2024-08-27  4:52     ` Baruch Siach
2024-08-27  6:14       ` Marek Szyprowski
2024-08-27  7:03         ` Baruch Siach
2024-08-27  7:46           ` Marek Szyprowski
2024-08-29 13:42   ` Neil Armstrong
2024-08-29 14:38     ` Robin Murphy
2024-08-29 14:54       ` neil.armstrong
2024-08-11  7:09 ` [PATCH v6 RESED 2/2] arm64: support DMA zone above 4GB Baruch Siach
2024-08-12  5:54   ` Petr Tesarik
2024-08-22  4:18 ` [PATCH v6 RESED 0/2] dma: support DMA zone starting " Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).