All of lore.kernel.org
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Yang Shi <yang@os.amperecomputing.com>
Cc: Baruch Siach <baruch@tkos.co.il>,
	will@kernel.org, ptesarik@suse.com, hch@lst.de,
	jiangyutang@os.amperecomputing.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH] arm64: mm: fix zone_dma_limit calculation
Date: Wed, 27 Nov 2024 17:49:37 +0000	[thread overview]
Message-ID: <Z0dbsRCsWT3hiVds@arm.com> (raw)
In-Reply-To: <98583682-a95e-440e-bd89-03828998b48e@os.amperecomputing.com>

+ Robin

On Tue, Nov 26, 2024 at 09:38:22AM -0800, Yang Shi wrote:
> On 11/25/24 10:27 PM, Baruch Siach wrote:
> > On Mon, Nov 25 2024, Yang Shi wrote:
> > > The commit ba0fb44aed47 ("dma-mapping: replace zone_dma_bits by
> > > zone_dma_limit") changed how zone_dma_limit was calculated.  Now it
> > > returns the memsize limit in IORT or device tree instead of U32_MAX if
> > > the memsize limit is greater than U32_MAX.
> > 
> > Can you give a concrete example of memory layout and dma-ranges that
> > demonstrates this issue?
> 
> Our 2 sockets system has physical memory starts at 0x0 on node 0 and
> 0x200000000000 on node 1. The memory size limit defined in IORT is 0x30 (48
> bits).
> 
> The DMA zone is:
> 
> pages free     887722
>         boost    0
>         min      229
>         low      1108
>         high     1987
>         promo    2866
>         spanned  983040
>         present  982034
>         managed  903238
>         cma      16384
>         protection: (0, 0, 124824, 0, 0)
>  start_pfn:           65536
> 
> When allocating DMA buffer, dma_direct_optimal_gfp_mask() is called to
> determine the proper zone constraints. If the phys_limit is less than
> zone_dma_limit, it will use GFP_DMA. But zone_dma_limit is 0xffffffffffff on
> v6.12 instead of 4G prior v6.12, it means all DMA buffer allocation will go
> to DMA zone even though the devices don't require it.
> 
> DMA zone is on node 0, so we saw excessive remote access on 2 sockets
> system.
[...]
> The physical addr range for DMA zone is correct, the problem is wrong
> zone_dma_limit. Before commit ba0fb44aed47 zone_dma_limit was 4G, after it
> it is the whole memory even though DMA zone just covers low 4G.

Thanks for the details. I agree that zone_dma_limit shouldn't be higher
than the ZONE_DMA upper boundary, otherwise it gets confusing for
functions like dma_direct_optimal_gfp_mask() and we may force
allocations to a specific range unnecessarily.

If IORT or DT indicate a large mask covering the whole RAM (i.e. no
restrictions), in an ideal world, we should normally extend ZONE_DMA to
the same. One problem is ZONE_DMA32 (and GFP_DMA32) and the fact that
ZONE_DMA sits below it. Until we hear otherwise, we assume a DMA offset
of 0 for such 32-bit devices and therefore define ZONE_DMA32 in the
lower 4GB if RAM starts below this limit (and an empty ZONE_DMA32 if RAM
starts above).

Another aspect to consider is that we don't always have DT or IORT
information or some devices need a smaller limit than what's advertised
in the firmware tables (typically 32-bit masks). This code went through
a couple of fixes already:

833bd284a454 ("arm64: mm: fix DMA zone when dma-ranges is missing")
122c234ef4e1 ("arm64: mm: keep low RAM dma zone")

Since our current assumption is to assume ZONE_DMA within 32-bit if RAM
below 4GB, I'm happy to make this conditional on CONFIG_ZONE_DMA32 also
being enabled. So, from your patch below:

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index d21f67d67cf5..ccdef53872a0 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -117,15 +117,6 @@ static void __init arch_reserve_crashkernel(void)
>   static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>   {
> -	/**
> -	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
> -	 * bus constraints. Devices using DMA might have their own limitations.
> -	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
> -	 * DMA zone on platforms that have RAM there.
> -	 */
> -	if (memblock_start_of_DRAM() < U32_MAX)
> -		zone_limit = min(zone_limit, U32_MAX);
> -
>   	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>   }

This part is fine.

> @@ -141,6 +132,14 @@ static void __init zone_sizes_init(void)
>   	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
>   	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
>   	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
> +	/*
> +	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
> +	 * bus constraints. Devices using DMA might have their own limitations.
> +	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
> +	 * DMA zone on platforms that have RAM there.
> +	 */
> +	if (memblock_start_of_DRAM() < U32_MAX)
> +		zone_dma_limit = min(zone_dma_limit, U32_MAX);
>   	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>   	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>   #endif

But I'd move the zone_dma_limit update further down in the
CONFIG_ZONE_DMA32 block. I think we only need to limit it to
dma32_phys_limit and ignore the U32_MAX check. The former is already
capped to 32-bit. For the second hunk, something like below (untested):

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index d21f67d67cf5..ffaf5bd8d0a1 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -146,8 +146,10 @@ static void __init zone_sizes_init(void)
 #endif
 #ifdef CONFIG_ZONE_DMA32
 	max_zone_pfns[ZONE_DMA32] = PFN_DOWN(dma32_phys_limit);
-	if (!arm64_dma_phys_limit)
+	if (!arm64_dma_phys_limit || arm64_dma_phys_limit > dma32_phys_limit) {
 		arm64_dma_phys_limit = dma32_phys_limit;
+		zone_dma_limit = arm64_dma_phys_limit - 1;
+	}
 #endif
 	if (!arm64_dma_phys_limit)
 		arm64_dma_phys_limit = PHYS_MASK + 1;

With some comment on why we do this but most likely not the current
comment in max_zone_phys() (more like keep ZONE_DMA below ZONE_DMA32).

-- 
Catalin


  reply	other threads:[~2024-11-27 17:51 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-25 17:16 [PATCH] arm64: mm: fix zone_dma_limit calculation Yang Shi
2024-11-26  6:27 ` Baruch Siach
2024-11-26 17:38   ` Yang Shi
2024-11-27 17:49     ` Catalin Marinas [this message]
2024-11-29 18:06       ` Robin Murphy
2024-11-29 19:38         ` Catalin Marinas
2024-12-02 16:41           ` Yang Shi
2024-12-03 21:20 ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z0dbsRCsWT3hiVds@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=baruch@tkos.co.il \
    --cc=hch@lst.de \
    --cc=jiangyutang@os.amperecomputing.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ptesarik@suse.com \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    --cc=yang@os.amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.