From: Catalin Marinas <catalin.marinas@arm.com>
To: "Petr Tesařík" <petr@tesarici.cz>
Cc: Baruch Siach <baruch@tkos.co.il>, Christoph Hellwig <hch@lst.de>,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	iommu@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org, Ramon Fried <ramon@neureality.ai>,
	Elad Nachman <enachman@marvell.com>
Subject: Re: [PATCH v5 2/3] dma: replace zone_dma_bits by zone_dma_limit
Date: Wed, 7 Aug 2024 19:14:58 +0100
Message-ID: <ZrO5okGUljTc9E7N@arm.com>
In-Reply-To: <20240807161938.5729b656@mordecai.tesarici.cz>

On Wed, Aug 07, 2024 at 04:19:38PM +0200, Petr Tesařík wrote:
> On Fri, 2 Aug 2024 10:37:38 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On Fri, Aug 02, 2024 at 09:03:47AM +0300, Baruch Siach wrote:
> > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > > index 3b4be4ca3b08..62b36fda44c9 100644
> > > --- a/kernel/dma/direct.c
> > > +++ b/kernel/dma/direct.c
> > > @@ -20,7 +20,7 @@
> > >   * it for entirely different regions. In that case the arch code needs to
> > >   * override the variable below for dma-direct to work properly.
> > >   */
> > > -unsigned int zone_dma_bits __ro_after_init = 24;
> > > +u64 zone_dma_limit __ro_after_init = DMA_BIT_MASK(24);  
> > 
> > u64 here makes sense even if it may be larger than phys_addr_t. It
> > matches the phys_limit type in the swiotlb code. The compilers should no
> > longer complain.
> 
> FTR I have never quite understood why phys_limit is u64, but u64 was
> already used all around the place when I first looked into swiotlb.
> 
> > > diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> > > index d10613eb0f63..7b04f7575796 100644
> > > --- a/kernel/dma/pool.c
> > > +++ b/kernel/dma/pool.c
> > > @@ -70,9 +70,9 @@ static bool cma_in_zone(gfp_t gfp)
> > >  	/* CMA can't cross zone boundaries, see cma_activate_area() */
> > >  	end = cma_get_base(cma) + size - 1;
> > >  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA))
> > > -		return end <= DMA_BIT_MASK(zone_dma_bits);
> > > +		return end <= zone_dma_limit;
> > >  	if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32))
> > > -		return end <= DMA_BIT_MASK(32);
> > > +		return end <= max(DMA_BIT_MASK(32), zone_dma_limit);
> > >  	return true;
> > >  }
> > >  
> > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > index 043b0ecd3e8d..bb51bd5335ad 100644
> > > --- a/kernel/dma/swiotlb.c
> > > +++ b/kernel/dma/swiotlb.c
> > > @@ -450,9 +450,9 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
> > >  	if (!remap)
> > >  		io_tlb_default_mem.can_grow = true;
> > >  	if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp_mask & __GFP_DMA))
> > > -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(zone_dma_bits);
> > > +		io_tlb_default_mem.phys_limit = zone_dma_limit;
> > >  	else if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp_mask & __GFP_DMA32))
> > > -		io_tlb_default_mem.phys_limit = DMA_BIT_MASK(32);
> > > +		io_tlb_default_mem.phys_limit = max(DMA_BIT_MASK(32), zone_dma_limit);
> > >  	else
> > >  		io_tlb_default_mem.phys_limit = virt_to_phys(high_memory - 1);
> > >  #endif  
> > 
> > These two look correct to me now and it's the least intrusive (the
> > alternative would have been a zone_dma32_limit). The arch code, however,
> > needs to ensure that zone_dma_limit can always support 32-bit devices
> > even if it is above 4GB (with the relevant dma offsets in place for such
> > devices).
> 
> Just to make sure, the DMA zone (if present) must map to at most 32-bit
> bus address space (possibly behind a bridge). Is that what you're
> saying?

Not exactly. What I'm trying to say is that on arm64 zone_dma_limit can
go beyond DMA_BIT_MASK(32) when the latter is treated as a CPU address.
In such cases, ZONE_DMA32 is empty.

TBH, this code is confusing and not entirely suitable for a system where
the CPU address offsets are not 0. The device::dma_coherent_mask is
about the bus address range and phys_limit is calculated correctly in
functions like dma_direct_optimal_gfp_mask(). But that's about it w.r.t.
DMA bit masks because zone_dma_bits and DMA_BIT_MASK(32) are assumed to
be about the CPU address ranges in some cases (in other cases
DMA_BIT_MASK() is used to initialise dma_coherent_mask, so more of a bus
address).

On the platform Baruch is trying to fix, RAM starts at 32GB and ZONE_DMA
should end at 33GB. That's a 30-bit mask in bus address terms but
something not a power of two for the CPU address, hence the
zone_dma_limit introduced here.
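
The arithmetic above can be sketched in standalone userspace C (not
kernel code). DMA_BIT_MASK() is copied from the kernel definition; GB,
bus_to_cpu() and is_bit_mask() are hypothetical helpers introduced only
for this illustration, and the fixed offset equal to the RAM base is an
assumption taken from the 32GB/33GB figures in the paragraph above.

```c
#include <stdint.h>

/* Userspace copy of the kernel's DMA_BIT_MASK(): the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))
#define GB (1ULL << 30)

/*
 * Hypothetical helper: CPU (physical) address for a bus address, assuming
 * a single fixed offset equal to the RAM base.
 */
static inline uint64_t bus_to_cpu(uint64_t ram_base, uint64_t bus_addr)
{
	return ram_base + bus_addr;
}

/* Non-zero if v has the form 2^n - 1, i.e. it is expressible as DMA_BIT_MASK(n). */
static inline int is_bit_mask(uint64_t v)
{
	return (v & (v + 1)) == 0;
}
```

With RAM at 32 * GB, bus_to_cpu(32 * GB, DMA_BIT_MASK(30)) yields
0x83FFFFFFF, which is 33GB - 1. is_bit_mask() returns 0 for that value,
so no zone_dma_bits setting could describe it.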

With ZONE_DMA32, since all the DMA code assumes that ZONE_DMA32 ends at
4GB CPU address, it doesn't really work for such platforms. If there are
32-bit devices with a corresponding CPU address offset, ZONE_DMA32
should end at 36GB on Baruch's platform. But to simplify things, we just
ignore this on arm64 and make ZONE_DMA32 empty.
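
A rough userspace sketch of the two limits discussed here, again not
kernel code. dma32_phys_limit() mirrors the
max(DMA_BIT_MASK(32), zone_dma_limit) expression from the quoted patch;
max_u64(), GB and dma32_end_with_offset() are hypothetical names used
only for illustration, and the latter assumes a single fixed offset
equal to the RAM base.

```c
#include <stdint.h>

/* Userspace copy of the kernel's DMA_BIT_MASK(): the low n bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))
#define GB (1ULL << 30)

static inline uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* Same shape as the GFP_DMA32 limit in the patch, with zone_dma_limit passed in. */
static inline uint64_t dma32_phys_limit(uint64_t zone_dma_limit)
{
	return max_u64(DMA_BIT_MASK(32), zone_dma_limit);
}

/*
 * Hypothetical: last CPU address a 32-bit device could reach if the
 * CPU address offset (assumed equal to the RAM base) were honoured.
 */
static inline uint64_t dma32_end_with_offset(uint64_t ram_base)
{
	return ram_base + DMA_BIT_MASK(32);
}
```

For the default zone_dma_limit of DMA_BIT_MASK(24), dma32_phys_limit()
stays at DMA_BIT_MASK(32). For a zone_dma_limit of 33 * GB - 1 it
returns that same value, so the DMA32 limit collapses onto the ZONE_DMA
limit. dma32_end_with_offset(32 * GB) gives 36 * GB - 1, the 36GB
boundary mentioned above.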

In some cases where we have the device structure we could instead do a
dma_to_phys(DMA_BIT_MASK(32)) but not in the two cases above. I guess if
we really want to address this properly, we'd need to introduce a
zone_dma32_limit that's initialised by the arch code. For arm64, I'm
happy with just having an empty ZONE_DMA32 on such platforms.

-- 
Catalin
