linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Yang Shi <yang@os.amperecomputing.com>
Cc: Baruch Siach <baruch@tkos.co.il>,
	will@kernel.org, ptesarik@suse.com, hch@lst.de,
	jiangyutang@os.amperecomputing.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH] arm64: mm: fix zone_dma_limit calculation
Date: Wed, 27 Nov 2024 17:49:37 +0000	[thread overview]
Message-ID: <Z0dbsRCsWT3hiVds@arm.com> (raw)
In-Reply-To: <98583682-a95e-440e-bd89-03828998b48e@os.amperecomputing.com>

+ Robin

On Tue, Nov 26, 2024 at 09:38:22AM -0800, Yang Shi wrote:
> On 11/25/24 10:27 PM, Baruch Siach wrote:
> > On Mon, Nov 25 2024, Yang Shi wrote:
> > > The commit ba0fb44aed47 ("dma-mapping: replace zone_dma_bits by
> > > zone_dma_limit") changed how zone_dma_limit was calculated.  Now it
> > > returns the memsize limit in IORT or device tree instead of U32_MAX if
> > > the memsize limit is greater than U32_MAX.
> > 
> > Can you give a concrete example of memory layout and dma-ranges that
> > demonstrates this issue?
> 
> Our 2 sockets system has physical memory starts at 0x0 on node 0 and
> 0x200000000000 on node 1. The memory size limit defined in IORT is 0x30 (48
> bits).
> 
> The DMA zone is:
> 
> pages free     887722
>         boost    0
>         min      229
>         low      1108
>         high     1987
>         promo    2866
>         spanned  983040
>         present  982034
>         managed  903238
>         cma      16384
>         protection: (0, 0, 124824, 0, 0)
>  start_pfn:           65536
> 
> When allocating DMA buffer, dma_direct_optimal_gfp_mask() is called to
> determine the proper zone constraints. If the phys_limit is less than
> zone_dma_limit, it will use GFP_DMA. But zone_dma_limit is 0xffffffffffff on
> v6.12 instead of 4G prior v6.12, it means all DMA buffer allocation will go
> to DMA zone even though the devices don't require it.
> 
> DMA zone is on node 0, so we saw excessive remote access on 2 sockets
> system.
[...]
> The physical addr range for DMA zone is correct, the problem is wrong
> zone_dma_limit. Before commit ba0fb44aed47 zone_dma_limit was 4G, after it
> it is the whole memory even though DMA zone just covers low 4G.

Thanks for the details. I agree that zone_dma_limit shouldn't be higher
than the ZONE_DMA upper boundary, otherwise it gets confusing for
functions like dma_direct_optimal_gfp_mask() and we may force
allocations to a specific range unnecessarily.

If IORT or DT indicate a large mask covering the whole RAM (i.e. no
restrictions), in an ideal world, we should normally extend ZONE_DMA to
the same. One problem is ZONE_DMA32 (and GFP_DMA32) and the fact that
ZONE_DMA sits below it. Until we hear otherwise, we assume a DMA offset
of 0 for such 32-bit devices and therefore define ZONE_DMA32 in the
lower 4GB if RAM starts below this limit (and an empty ZONE_DMA32 if RAM
starts above).

Another aspect to consider is that we don't always have DT or IORT
information or some devices need a smaller limit than what's advertised
in the firmware tables (typically 32-bit masks). This code went through
a couple of fixes already:

833bd284a454 ("arm64: mm: fix DMA zone when dma-ranges is missing")
122c234ef4e1 ("arm64: mm: keep low RAM dma zone")

Since our current assumption is to assume ZONE_DMA within 32-bit if RAM
below 4GB, I'm happy to make this conditional on CONFIG_ZONE_DMA32 also
being enabled. So, from your patch below:

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index d21f67d67cf5..ccdef53872a0 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -117,15 +117,6 @@ static void __init arch_reserve_crashkernel(void)
>   static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
>   {
> -	/**
> -	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
> -	 * bus constraints. Devices using DMA might have their own limitations.
> -	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
> -	 * DMA zone on platforms that have RAM there.
> -	 */
> -	if (memblock_start_of_DRAM() < U32_MAX)
> -		zone_limit = min(zone_limit, U32_MAX);
> -
>   	return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
>   }

This part is fine.

> @@ -141,6 +132,14 @@ static void __init zone_sizes_init(void)
>   	acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
>   	dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
>   	zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
> +	/*
> +	 * Information we get from firmware (e.g. DT dma-ranges) describe DMA
> +	 * bus constraints. Devices using DMA might have their own limitations.
> +	 * Some of them rely on DMA zone in low 32-bit memory. Keep low RAM
> +	 * DMA zone on platforms that have RAM there.
> +	 */
> +	if (memblock_start_of_DRAM() < U32_MAX)
> +		zone_dma_limit = min(zone_dma_limit, U32_MAX);
>   	arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
>   	max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>   #endif

But I'd move the zone_dma_limit update further down in the
CONFIG_ZONE_DMA32 block. I think we only need to limit it to
dma32_phys_limit and ignore the U32_MAX check. The former is already
capped to 32-bit. For the second hunk, something like below (untested):

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index d21f67d67cf5..ffaf5bd8d0a1 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -146,8 +146,10 @@ static void __init zone_sizes_init(void)
 #endif
 #ifdef CONFIG_ZONE_DMA32
 	max_zone_pfns[ZONE_DMA32] = PFN_DOWN(dma32_phys_limit);
-	if (!arm64_dma_phys_limit)
+	if (!arm64_dma_phys_limit || arm64_dma_phys_limit > dma32_phys_limit) {
 		arm64_dma_phys_limit = dma32_phys_limit;
+		zone_dma_limit = arm64_dma_phys_limit - 1;
+	}
 #endif
 	if (!arm64_dma_phys_limit)
 		arm64_dma_phys_limit = PHYS_MASK + 1;

With some comment on why we do this but most likely not the current
comment in max_zone_phys() (more like keep ZONE_DMA below ZONE_DMA32).

-- 
Catalin


  reply	other threads:[~2024-11-27 17:51 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-25 17:16 [PATCH] arm64: mm: fix zone_dma_limit calculation Yang Shi
2024-11-26  6:27 ` Baruch Siach
2024-11-26 17:38   ` Yang Shi
2024-11-27 17:49     ` Catalin Marinas [this message]
2024-11-29 18:06       ` Robin Murphy
2024-11-29 19:38         ` Catalin Marinas
2024-12-02 16:41           ` Yang Shi
2024-12-03 21:20 ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z0dbsRCsWT3hiVds@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=baruch@tkos.co.il \
    --cc=hch@lst.de \
    --cc=jiangyutang@os.amperecomputing.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ptesarik@suse.com \
    --cc=robin.murphy@arm.com \
    --cc=will@kernel.org \
    --cc=yang@os.amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).