From: Catalin Marinas <catalin.marinas@arm.com>
To: Baruch Siach <baruch@tkos.co.il>
Cc: "Christoph Hellwig" <hch@lst.de>,
"Marek Szyprowski" <m.szyprowski@samsung.com>,
"Rob Herring" <robh@kernel.org>,
"Saravana Kannan" <saravanak@google.com>,
"Will Deacon" <will@kernel.org>,
"Robin Murphy" <robin.murphy@arm.com>,
iommu@lists.linux.dev, devicetree@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-s390@vger.kernel.org, "Petr Tesařík" <petr@tesarici.cz>,
"Ramon Fried" <ramon@neureality.ai>,
"Elad Nachman" <enachman@marvell.com>
Subject: Re: [PATCH RFC v2 1/5] dma-mapping: replace zone_dma_bits by zone_dma_limit
Date: Tue, 18 Jun 2024 19:26:02 +0100 [thread overview]
Message-ID: <ZnHROk1Xs7KcR4I0@arm.com> (raw)
In-Reply-To: <fda45c91f69e65ec14b9aaec9aa053e6982e5b87.1712642324.git.baruch@tkos.co.il>
(finally getting around to looking at this series, sorry for the delay)
On Tue, Apr 09, 2024 at 09:17:54AM +0300, Baruch Siach wrote:
> From: Catalin Marinas <catalin.marinas@arm.com>
>
> Hardware DMA limit might not be power of 2. When RAM range starts above
> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
> can not encode this limit.
>
> Use direct phys_addr_t limit address for DMA zone limit.
>
> Following commits will add explicit base address to DMA zone.
>
> ---
> Catalin,
>
> This is taken almost verbatim from your email:
>
> https://lore.kernel.org/all/ZZ2HnHJV3gdzu1Aj@arm.com/
>
> Would you provide your sign-off?
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Thanks for writing a commit log. However, I think more work is needed.
See below.
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 03efd86dce0a..00508c69ca9e 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -113,36 +113,24 @@ static void __init arch_reserve_crashkernel(void)
> low_size, high);
> }
>
> -/*
> - * Return the maximum physical address for a zone accessible by the given bits
> - * limit. If DRAM starts above 32-bit, expand the zone to the maximum
> - * available memory, otherwise cap it at 32-bit.
> - */
> -static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> +static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
> {
> - phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
> - phys_addr_t phys_start = memblock_start_of_DRAM();
> -
> - if (phys_start > U32_MAX)
> - zone_mask = PHYS_ADDR_MAX;
> - else if (phys_start > zone_mask)
> - zone_mask = U32_MAX;
> -
> - return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
> + return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
> }
>
> static void __init zone_sizes_init(void)
> {
> unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
> - unsigned int __maybe_unused acpi_zone_dma_bits;
> - unsigned int __maybe_unused dt_zone_dma_bits;
> - phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
> + phys_addr_t __maybe_unused acpi_zone_dma_limit;
> + phys_addr_t __maybe_unused dt_zone_dma_limit;
> + phys_addr_t __maybe_unused dma32_phys_limit =
> + max_zone_phys(DMA_BIT_MASK(32));
>
> #ifdef CONFIG_ZONE_DMA
> - acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
> - dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
> - zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
> - arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> + acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
> + dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
> + zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
> + arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
> max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> #endif
> #ifdef CONFIG_ZONE_DMA32
I think this goes wrong if zone_dma_limit ends up above 32-bit (e.g. no
restrictive dma-ranges properties) but the start of RAM is below 4G.
We'd simply reduce ZONE_DMA32 to zero and ZONE_DMA potentially covering
the whole RAM. Prior to this change, we capped zone_dma_bits to 32 via
min3(). I think we should maintain this cap if memblock_start_of_DRAM()
is below 4G.
We could fix this up in max_zone_phys() above:
if (memblock_start_of_DRAM() < U32_MAX)
zone_limit = min(U32_MAX, zone_limit);
return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 4d543b1e9d57..3b2ebcd4f576 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -20,7 +20,7 @@
> * it for entirely different regions. In that case the arch code needs to
> * override the variable below for dma-direct to work properly.
> */
> -unsigned int zone_dma_bits __ro_after_init = 24;
> +phys_addr_t zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
>
> static inline dma_addr_t phys_to_dma_direct(struct device *dev,
> phys_addr_t phys)
> @@ -59,7 +59,7 @@ static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 *phys_limit)
> * zones.
> */
> *phys_limit = dma_to_phys(dev, dma_limit);
> - if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
> + if (*phys_limit <= zone_dma_limit)
> return GFP_DMA;
> if (*phys_limit <= DMA_BIT_MASK(32))
> return GFP_DMA32;
It's worth noting that if ZONE_DMA ends up entirely above 32-bit, there
won't be any ZONE_DMA32. Thinking about it, this could be a potential
problem. For example, if a device has a 32-bit DMA mask and an offset
that lifts this into the 32-36G range, the above may fail to set
GFP_DMA32.
Actually, I think these checks can go wrong even with the current
implementation, assuming RAM below 4G and no DMA offsets. For example,
we have two devices, one with a coherent mask of 30 bits, the other 31
bits. zone_dma_bits would be set to the smaller of the two, so 30 bit
(as per of_dma_get_max_cpu_address()). For the second device, phys_limit
would be ((1 << 31) - 1) but that's higher than DMA_BIT_MASK(30) so we
fail to set GFP_DMA. We do set GFP_DMA32 because of the second test but
that's not sufficient since that's 32-bit rather than 31-bit as the
device needs. Similarly if we have some weird device with a 33-bit DMA
coherent mask but the RAM is addressed by more bits. We'd fail to set
GFP_DMA32.
Ignoring this patch, I think the checks above in mainline should be
something like:
if (*phys_limit < DMA_BIT_MASK(32))
return GFP_DMA;
if (*phys_limit < memblock_end_of_DRAM())
return GFP_DMA32;
IOW, zone_dma_bits is pretty useless for this check IMHO. It gives us
the minimum hence not sufficient to test for devices that fall between
ZONE_DMA and ZONE_DMA32 coherent masks.
With your series, the above test wouldn't work since we don't have a
zone_dma32_limit and zone_dma_limit is above DMA_BIT_MASK(32). We might
need to introduce zone_dma32_limit and maybe drop zone_dma_limit
altogether.
--
Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Baruch Siach <baruch@tkos.co.il>
Cc: "Rob Herring" <robh@kernel.org>,
linux-s390@vger.kernel.org, "Ramon Fried" <ramon@neureality.ai>,
"Saravana Kannan" <saravanak@google.com>,
devicetree@vger.kernel.org, "Petr Tesařík" <petr@tesarici.cz>,
"Will Deacon" <will@kernel.org>,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
iommu@lists.linux.dev, "Elad Nachman" <enachman@marvell.com>,
"Robin Murphy" <robin.murphy@arm.com>,
"Christoph Hellwig" <hch@lst.de>,
linux-arm-kernel@lists.infradead.org,
"Marek Szyprowski" <m.szyprowski@samsung.com>
Subject: Re: [PATCH RFC v2 1/5] dma-mapping: replace zone_dma_bits by zone_dma_limit
Date: Tue, 18 Jun 2024 19:26:02 +0100 [thread overview]
Message-ID: <ZnHROk1Xs7KcR4I0@arm.com> (raw)
In-Reply-To: <fda45c91f69e65ec14b9aaec9aa053e6982e5b87.1712642324.git.baruch@tkos.co.il>
(finally getting around to looking at this series, sorry for the delay)
On Tue, Apr 09, 2024 at 09:17:54AM +0300, Baruch Siach wrote:
> From: Catalin Marinas <catalin.marinas@arm.com>
>
> Hardware DMA limit might not be power of 2. When RAM range starts above
> 0, say 4GB, DMA limit of 30 bits should end at 5GB. A single high bit
> can not encode this limit.
>
> Use direct phys_addr_t limit address for DMA zone limit.
>
> Following commits will add explicit base address to DMA zone.
>
> ---
> Catalin,
>
> This is taken almost verbatim from your email:
>
> https://lore.kernel.org/all/ZZ2HnHJV3gdzu1Aj@arm.com/
>
> Would you provide your sign-off?
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Thanks for writing a commit log. However, I think more work is needed.
See below.
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 03efd86dce0a..00508c69ca9e 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -113,36 +113,24 @@ static void __init arch_reserve_crashkernel(void)
> low_size, high);
> }
>
> -/*
> - * Return the maximum physical address for a zone accessible by the given bits
> - * limit. If DRAM starts above 32-bit, expand the zone to the maximum
> - * available memory, otherwise cap it at 32-bit.
> - */
> -static phys_addr_t __init max_zone_phys(unsigned int zone_bits)
> +static phys_addr_t __init max_zone_phys(phys_addr_t zone_limit)
> {
> - phys_addr_t zone_mask = DMA_BIT_MASK(zone_bits);
> - phys_addr_t phys_start = memblock_start_of_DRAM();
> -
> - if (phys_start > U32_MAX)
> - zone_mask = PHYS_ADDR_MAX;
> - else if (phys_start > zone_mask)
> - zone_mask = U32_MAX;
> -
> - return min(zone_mask, memblock_end_of_DRAM() - 1) + 1;
> + return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
> }
>
> static void __init zone_sizes_init(void)
> {
> unsigned long max_zone_pfns[MAX_NR_ZONES] = {0};
> - unsigned int __maybe_unused acpi_zone_dma_bits;
> - unsigned int __maybe_unused dt_zone_dma_bits;
> - phys_addr_t __maybe_unused dma32_phys_limit = max_zone_phys(32);
> + phys_addr_t __maybe_unused acpi_zone_dma_limit;
> + phys_addr_t __maybe_unused dt_zone_dma_limit;
> + phys_addr_t __maybe_unused dma32_phys_limit =
> + max_zone_phys(DMA_BIT_MASK(32));
>
> #ifdef CONFIG_ZONE_DMA
> - acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
> - dt_zone_dma_bits = fls64(of_dma_get_max_cpu_address(NULL));
> - zone_dma_bits = min3(32U, dt_zone_dma_bits, acpi_zone_dma_bits);
> - arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
> + acpi_zone_dma_limit = acpi_iort_dma_get_max_cpu_address();
> + dt_zone_dma_limit = of_dma_get_max_cpu_address(NULL);
> + zone_dma_limit = min(dt_zone_dma_limit, acpi_zone_dma_limit);
> + arm64_dma_phys_limit = max_zone_phys(zone_dma_limit);
> max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
> #endif
> #ifdef CONFIG_ZONE_DMA32
I think this goes wrong if zone_dma_limit ends up above 32-bit (e.g. no
restrictive dma-ranges properties) but the start of RAM is below 4G.
We'd simply reduce ZONE_DMA32 to zero and ZONE_DMA potentially covering
the whole RAM. Prior to this change, we capped zone_dma_bits to 32 via
min3(). I think we should maintain this cap if memblock_start_of_DRAM()
is below 4G.
We could fix this up in max_zone_phys() above:
if (memblock_start_of_DRAM() < U32_MAX)
zone_limit = min(U32_MAX, zone_limit);
return min(zone_limit, memblock_end_of_DRAM() - 1) + 1;
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 4d543b1e9d57..3b2ebcd4f576 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -20,7 +20,7 @@
> * it for entirely different regions. In that case the arch code needs to
> * override the variable below for dma-direct to work properly.
> */
> -unsigned int zone_dma_bits __ro_after_init = 24;
> +phys_addr_t zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);
>
> static inline dma_addr_t phys_to_dma_direct(struct device *dev,
> phys_addr_t phys)
> @@ -59,7 +59,7 @@ static gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 *phys_limit)
> * zones.
> */
> *phys_limit = dma_to_phys(dev, dma_limit);
> - if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
> + if (*phys_limit <= zone_dma_limit)
> return GFP_DMA;
> if (*phys_limit <= DMA_BIT_MASK(32))
> return GFP_DMA32;
It's worth noting that if ZONE_DMA ends up entirely above 32-bit, there
won't be any ZONE_DMA32. Thinking about it, this could be a potential
problem. For example, if a device has a 32-bit DMA mask and an offset
that lifts this into the 32-36G range, the above may fail to set
GFP_DMA32.
Actually, I think these checks can go wrong even with the current
implementation, assuming RAM below 4G and no DMA offsets. For example,
we have two devices, one with a coherent mask of 30 bits, the other 31
bits. zone_dma_bits would be set to the smaller of the two, so 30 bit
(as per of_dma_get_max_cpu_address()). For the second device, phys_limit
would be ((1 << 31) - 1) but that's higher than DMA_BIT_MASK(30) so we
fail to set GFP_DMA. We do set GFP_DMA32 because of the second test but
that's not sufficient since that's 32-bit rather than 31-bit as the
device needs. Similarly if we have some weird device with a 33-bit DMA
coherent mask but the RAM is addressed by more bits. We'd fail to set
GFP_DMA32.
Ignoring this patch, I think the checks above in mainline should be
something like:
if (*phys_limit < DMA_BIT_MASK(32))
return GFP_DMA;
if (*phys_limit < memblock_end_of_DRAM())
return GFP_DMA32;
IOW, zone_dma_bits is pretty useless for this check IMHO. It gives us
the minimum hence not sufficient to test for devices that fall between
ZONE_DMA and ZONE_DMA32 coherent masks.
With your series, the above test wouldn't work since we don't have a
zone_dma32_limit and zone_dma_limit is above DMA_BIT_MASK(32). We might
need to introduce zone_dma32_limit and maybe drop zone_dma_limit
altogether.
--
Catalin
next prev parent reply other threads:[~2024-06-18 18:26 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-09 6:17 [PATCH RFC v2 0/5] arm64: support DMA zone starting above 4GB Baruch Siach
2024-04-09 6:17 ` Baruch Siach
2024-04-09 6:17 ` Baruch Siach
2024-04-09 6:17 ` [PATCH RFC v2 1/5] dma-mapping: replace zone_dma_bits by zone_dma_limit Baruch Siach
2024-04-09 6:17 ` Baruch Siach
2024-04-09 6:17 ` Baruch Siach
2024-06-18 18:26 ` Catalin Marinas [this message]
2024-06-18 18:26 ` Catalin Marinas
2024-04-09 6:17 ` [PATCH RFC v2 2/5] of: get dma area lower limit Baruch Siach
2024-04-09 6:17 ` Baruch Siach
2024-04-09 6:17 ` Baruch Siach
2024-06-18 21:38 ` Catalin Marinas
2024-06-18 21:38 ` Catalin Marinas
2024-07-25 11:49 ` Baruch Siach
2024-07-25 11:49 ` Baruch Siach
2024-07-30 15:52 ` Catalin Marinas
2024-07-30 15:52 ` Catalin Marinas
2024-04-09 6:17 ` [PATCH RFC v2 3/5] of: unittest: add test for of_dma_get_cpu_limits() 'min' param Baruch Siach
2024-04-09 6:17 ` Baruch Siach
2024-04-09 6:17 ` Baruch Siach
2024-04-09 6:17 ` [PATCH RFC v2 4/5] dma-direct: add base offset to zone_dma_bits Baruch Siach
2024-04-09 6:17 ` Baruch Siach
2024-04-09 6:17 ` Baruch Siach
2024-06-18 21:40 ` Catalin Marinas
2024-06-18 21:40 ` Catalin Marinas
2024-04-09 6:17 ` [PATCH RFC v2 5/5] arm64: mm: take DMA zone offset into account Baruch Siach
2024-04-09 6:17 ` Baruch Siach
2024-04-09 6:17 ` Baruch Siach
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZnHROk1Xs7KcR4I0@arm.com \
--to=catalin.marinas@arm.com \
--cc=baruch@tkos.co.il \
--cc=devicetree@vger.kernel.org \
--cc=enachman@marvell.com \
--cc=hch@lst.de \
--cc=iommu@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=m.szyprowski@samsung.com \
--cc=petr@tesarici.cz \
--cc=ramon@neureality.ai \
--cc=robh@kernel.org \
--cc=robin.murphy@arm.com \
--cc=saravanak@google.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.