From: Baruch Siach <baruch@tkos.co.il>
To: "Catalin Marinas" <catalin.marinas@arm.com>,
"Petr Tesařík" <petr@tesarici.cz>
Cc: Will Deacon <will@kernel.org>, Christoph Hellwig <hch@lst.de>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Robin Murphy <robin.murphy@arm.com>,
Ramon Fried <ramon@neureality.ai>,
iommu@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] arm64: DMA zone above 4GB
Date: Sun, 12 Nov 2023 09:25:46 +0200
Message-ID: <877cmn7750.fsf@tarshish>
In-Reply-To: <ZU0QEL9ByWNYVki1@arm.com>
Hi Catalin, Petr,
Thanks for your detailed responses. See below for a few comments and
questions.
On Thu, Nov 09 2023, Catalin Marinas wrote:
> On Wed, Nov 08, 2023 at 07:30:22PM +0200, Baruch Siach wrote:
>> Consider a bus with this 'dma-ranges' property:
>>
>> #address-cells = <2>;
>> #size-cells = <2>;
>> dma-ranges = <0x00000000 0xc0000000 0x00000008 0x00000000 0x0 0x40000000>;
>>
>> Devices under this bus can see 1GB of DMA range between 3GB-4GB. This
>> range is mapped to CPU memory at 32GB-33GB.
>
> Is this on real hardware or just theoretical for now (the rest of your
> email implies it's real)? Normally I'd expect the first GB (or first
> two) of RAM from 32G to be aliased to the lower 32-bit range for the CPU
> view as well, not just for devices. You'd then get a ZONE_DMA without
> having to play with DMA offsets.
This hardware is past tapeout and currently in fabrication. Software
tests are running on FPGA models and software simulators.
The CPU view of the 3GB-4GB range is not linear with DMA addresses. That
is, for an offset N where 0 <= N < 1GB, CPU address 3GB+N does not map
to the same physical location as DMA address 3GB+N. The hardware
engineers are not sure this is fixable, so, as is often the case, we
look to software to save us. After all, from a hardware perspective this
design "works".
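For reference, here is a plain C sketch (not kernel code) of the linear
translation I read out of the 'dma-ranges' property above; the constants
are just the fields of the property:

  #include <stdio.h>

  #define DMA_BASE 0xc0000000ULL          /* child (device/DMA) address, 3GB */
  #define CPU_BASE 0x800000000ULL         /* parent (CPU physical) address, 32GB */
  #define RANGE_SZ 0x40000000ULL          /* window size, 1GB */
  #define DMA_OFF  (CPU_BASE - DMA_BASE)  /* 0x7_4000_0000 */

  int main(void)
  {
          /* DMA 3GB..4GB-1 maps linearly to CPU 32GB..33GB-1 */
          printf("dma 0x%llx -> phys 0x%llx\n",
                 DMA_BASE, DMA_BASE + DMA_OFF);
          printf("dma 0x%llx -> phys 0x%llx\n",
                 DMA_BASE + RANGE_SZ - 1,
                 DMA_BASE + RANGE_SZ - 1 + DMA_OFF);
          return 0;
  }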
>> Current zone_sizes_init() code considers 'dma-ranges' only when it maps
>> to RAM under 4GB, because zone_dma_bits is limited to 32. In this case
>> 'dma-ranges' is ignored in practice, since DMA/DMA32 zones are both
>> assumed to be located under 4GB. The result is that the GFP_DMA32 flag
>> in stmmac driver DMA buffer allocations has no effect, and DMA buffer
>> allocations fail.
>>
>> The patch below is a crude workaround hack. It makes the DMA zone
>> cover the 1GB memory area that is visible to stmmac DMA as follows:
>>
>> [ 0.000000] Zone ranges:
>> [ 0.000000] DMA [mem 0x0000000800000000-0x000000083fffffff]
>> [ 0.000000] DMA32 empty
>> [ 0.000000] Normal [mem 0x0000000840000000-0x0000000bffffffff]
>> ...
>> [ 0.000000] software IO TLB: mapped [mem 0x000000083bfff000-0x000000083ffff000] (64MB)
>>
>> With this hack the stmmac driver works on my platform with no
>> modification.
>>
>> Clearly this can't be the right solution. For one, zone_dma_bits is
>> now wrong. It probably breaks other code as well.
>
> zone_dma_bits ends up as 36 if I counted correctly. So DMA_BIT_MASK(36)
> is 0xf_ffff_ffff and the phys_limit for your device is below this mask,
> so dma_direct_optimal_gfp_mask() does end up setting GFP_DMA. However,
> looking at how it sets GFP_DMA32, it is obvious that the code is not set
> up for such configurations. I'm also not a big fan of zone_dma_bits
> describing a mask that goes well above what the device can access.
>
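(For reference, the allocation-side logic in question, paraphrased from
kernel/dma/direct.c around v6.6; the exact code in the tree may differ:)

  static gfp_t dma_direct_optimal_gfp_mask(struct device *dev,
                                           u64 *phys_limit)
  {
          u64 dma_limit = min_not_zero(dev->coherent_dma_mask,
                                       dev->bus_dma_limit);

          *phys_limit = dma_to_phys(dev, dma_limit);
          if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
                  return GFP_DMA;   /* taken: 0x8_3fff_ffff <= 0xf_ffff_ffff */
          if (*phys_limit <= DMA_BIT_MASK(32))
                  return GFP_DMA32; /* unreachable when all RAM starts at 32GB */
          return 0;
  }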
> A workaround would be for zone_dma_bits to become a *_limit and sort out
> all places where we compare masks with masks derived from zone_dma_bits
> (e.g. cma_in_zone(), dma_direct_supported()).
I was also thinking along these lines. I wasn't sure I saw the entire
picture, so I hesitated to suggest a patch. Specifically, the assumption
that DMA range limits are powers of 2 looks deeply ingrained in the
code. Another assumption is that the DMA32 zone lives in the low 4GB
range.
I can work on an RFC implementation of this approach.
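Something along these lines, perhaps (untested sketch; 'zone_dma_limit'
is a name I just made up for the converted variable, holding the highest
ZONE_DMA physical address instead of a bit count):

  /* Hypothetical conversion of zone_dma_bits into a physical address
   * limit. Untested. */
  u64 zone_dma_limit;   /* highest physical address usable for ZONE_DMA */

  static gfp_t dma_direct_optimal_gfp_mask(struct device *dev,
                                           u64 *phys_limit)
  {
          u64 dma_limit = min_not_zero(dev->coherent_dma_mask,
                                       dev->bus_dma_limit);

          *phys_limit = dma_to_phys(dev, dma_limit);
          if (*phys_limit <= zone_dma_limit)
                  return GFP_DMA;
          /* The DMA32 check below still assumes the low 4GB; it would
           * need similar treatment on platforms like this one. */
          if (*phys_limit <= DMA_BIT_MASK(32))
                  return GFP_DMA32;
          return 0;
  }

On arm64, zone_sizes_init() would then presumably set zone_dma_limit
directly from the lowest 'dma-ranges' derived limit, instead of clamping
zone_dma_bits to 32.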
Petr suggested a more radical solution: per-bus DMA constraints
replacing the DMA/DMA32 zones. As Petr acknowledged, this does not look
like a near-term solution.
> Alternatively, take the DMA offset into account when comparing the
> physical address corresponding to zone_dma_bits and keep zone_dma_bits
> small (phys offset subtracted, so in your case it would be 30 rather
> than 36).
I am not following here. Care to elaborate?
Thanks,
baruch
--
~. .~ Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
- baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -