From: Catalin Marinas <catalin.marinas@arm.com>
To: Isaac Manjarres <isaacmanjarres@google.com>
Cc: Christoph Hellwig <hch@lst.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Arnd Bergmann <arnd@arndb.de>, Will Deacon <will@kernel.org>,
Marc Zyngier <maz@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Herbert Xu <herbert@gondor.apana.org.au>,
Ard Biesheuvel <ardb@kernel.org>,
Saravana Kannan <saravanak@google.com>,
linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 2/2] treewide: Add the __GFP_PACKED flag to several non-DMA kmalloc() allocations
Date: Wed, 2 Nov 2022 11:05:54 +0000
Message-ID: <Y2JPEqRdb9ua9tbj@arm.com>
In-Reply-To: <Y2FvO2raNElTdeQt@google.com>
On Tue, Nov 01, 2022 at 12:10:51PM -0700, Isaac Manjarres wrote:
> On Tue, Nov 01, 2022 at 06:39:40PM +0100, Christoph Hellwig wrote:
> > On Tue, Nov 01, 2022 at 05:32:14PM +0000, Catalin Marinas wrote:
> > > There's also the case of low-end phones with all RAM below 4GB and arm64
> > > doesn't allocate the swiotlb. Not sure those vendors would go with a
> > > recent kernel anyway.
> > >
> > > So the need for swiotlb now changes from 32-bit DMA to any DMA
> > > (non-coherent but we can't tell upfront when booting, devices may be
> > > initialised pretty late).
>
> Not only low-end phones, but there are other form-factors that can fall
> into this category and are also memory constrained (e.g. wearable
> devices), so the memory headroom impact from enabling SWIOTLB might be
> non-negligible for all of these devices. I also think it's feasible for
> those devices to use recent kernels.
Another option I had in mind is to disable this bouncing if there's no
swiotlb buffer, so kmalloc() will return ARCH_DMA_MINALIGN (or the
typically lower cache_line_size()) aligned objects. That's at least
until we find a lighter way to do bouncing. Those devices would work as
before.
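
As a rough illustration of that fallback (a userspace sketch, not kernel
code): the minimum kmalloc() alignment would depend on whether a bounce
buffer exists. The MODEL_* constants and both function names are
hypothetical, the values are placeholders, and real kmalloc() picks from
fixed size classes rather than simply rounding up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; the real ones are architecture-specific. */
#define MODEL_DMA_MINALIGN	128	/* stands in for ARCH_DMA_MINALIGN */
#define MODEL_PACKED_MINALIGN	8	/* stands in for a smaller alignment */

/* No bounce buffer: keep the DMA-safe alignment, as before the series. */
static size_t model_kmalloc_minalign(bool have_bounce_buffer)
{
	return have_bounce_buffer ? MODEL_PACKED_MINALIGN : MODEL_DMA_MINALIGN;
}

/* Round a request up to the alignment chosen above (simplified). */
static size_t model_kmalloc_size(size_t request, bool have_bounce_buffer)
{
	size_t align = model_kmalloc_minalign(have_bounce_buffer);

	assert(request > 0);
	return (request + align - 1) / align * align;
}
```

With these placeholder values a 20-byte request occupies 24 bytes when
bouncing is available and 128 bytes when it is not, which is the memory
cost being traded against the swiotlb reservation.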
> > Yes. The other option would be to use the dma coherent pool for the
> > bouncing, which must be present on non-coherent systems anyway. But
> > it would require us to write a new set of bounce buffering routines.
>
> I think in addition to having to write new bounce buffering routines,
> this approach still suffers the same problem as SWIOTLB, which is that
> the memory for SWIOTLB and/or the dma coherent pool is not reclaimable,
> even when it is not used.
The dma coherent pool at least has the advantage that its size can be
increased at run-time, so we can start with a small one. It cannot be
decreased, though if really needed I guess that can be added.
We'd also skip some cache maintenance here since the coherent pool is
mapped as non-cacheable already. But to Christoph's point, it does
require some reworking of the current bouncing code.
> There's not enough context in the DMA mapping routines to know if we need
> an atomic allocation, so if we used kmalloc(), instead of SWIOTLB, to
> dynamically allocate memory, it would always have to use GFP_ATOMIC.
I've seen the expression below in a couple of places in the kernel,
though IIUC in_atomic() doesn't always detect atomic contexts:

	gfpflags = (in_atomic() || irqs_disabled()) ? GFP_ATOMIC : GFP_KERNEL;

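
One case where that expression can pick the wrong flag, sketched as a
userspace model (all names here are hypothetical): on kernels built
without preempt counting, taking a spinlock does not raise
preempt_count, so in_atomic() cannot see it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the expression can and cannot see. */
struct model_ctx {
	unsigned int preempt_count;	/* what in_atomic() inspects */
	bool irqs_off;			/* what irqs_disabled() reports */
	bool spinlock_held;		/* ground truth, possibly invisible */
};

enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

/* Mirrors: (in_atomic() || irqs_disabled()) ? GFP_ATOMIC : GFP_KERNEL */
static enum model_gfp model_pick_gfp(const struct model_ctx *c)
{
	bool in_atomic = c->preempt_count != 0;

	return (in_atomic || c->irqs_off) ? MODEL_GFP_ATOMIC : MODEL_GFP_KERNEL;
}

/* Whether sleeping would actually be safe in this context. */
static bool model_may_sleep(const struct model_ctx *c)
{
	return c->preempt_count == 0 && !c->irqs_off && !c->spinlock_held;
}

/* A held spinlock with preempt_count == 0 is misclassified as sleepable. */
static void model_demo(void)
{
	struct model_ctx c = {
		.preempt_count = 0, .irqs_off = false, .spinlock_held = true,
	};

	assert(model_pick_gfp(&c) == MODEL_GFP_KERNEL);
	assert(!model_may_sleep(&c));
}
```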
> But what about having a pool that has a small amount of memory and is
> composed of several objects that can be used for small DMA transfers?
> If the amount of memory in the pool starts falling below a certain
> threshold, there can be a worker thread--so that we don't have to use
> GFP_ATOMIC--that can add more memory to the pool?
If the rate of allocation is high, it may end up calling a slab
allocator directly with GFP_ATOMIC.
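
A toy userspace model of that behaviour, with hypothetical names
throughout: the pool requests a refill once it drops below a low-water
mark, and if requests arrive faster than the worker runs, the pool
empties and the caller has to fall back to an atomic allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical small-object pool with a low-water mark. */
struct small_pool {
	size_t free;		/* objects currently available */
	size_t low_mark;	/* request a refill below this level */
	bool refill_wanted;	/* set when the worker should run */
};

enum alloc_source { FROM_POOL, FROM_ATOMIC_FALLBACK };

/* Allocation path: never sleeps, may have to fall back. */
static enum alloc_source pool_alloc(struct small_pool *p)
{
	enum alloc_source src = FROM_ATOMIC_FALLBACK;

	if (p->free > 0) {
		p->free--;
		src = FROM_POOL;
	}
	if (p->free < p->low_mark)
		p->refill_wanted = true;
	return src;
}

/* Worker path: runs later in sleepable context and tops the pool up. */
static void pool_refill(struct small_pool *p, size_t target)
{
	assert(target >= p->low_mark);
	if (p->free < target)
		p->free = target;
	p->refill_wanted = false;
}
```

Starting with two free objects and no intervening refill, the third
pool_alloc() call returns FROM_ATOMIC_FALLBACK.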
The main downside of any memory pool is identifying the original pool in
dma_unmap_*(). We have a simple is_swiotlb_buffer() check looking just
at the bounce buffer boundaries. For the coherent pool we have the more
complex dma_free_from_pool().
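
The boundary check is cheap because a single contiguous region needs
only two comparisons. A minimal userspace sketch of the idea (struct and
function names are hypothetical, not the swiotlb implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one contiguous bounce region. */
struct bounce_region {
	uintptr_t start;	/* first byte of the region */
	uintptr_t end;		/* one past the last byte */
};

/* True if addr lies inside the region: no lookup structure needed. */
static bool addr_in_bounce_region(const struct bounce_region *r,
				  uintptr_t addr)
{
	assert(r->start <= r->end);
	return addr >= r->start && addr < r->end;
}
```

Once the pool can grow at run-time it is no longer one region, which is
where the more involved lookup comes from.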
With a kmem_cache-based allocator (whether it's behind a mempool or
not), we'd need something like virt_to_cache() and a check of whether the
result is our DMA cache. I'm not a big fan of digging into the slab
internals for this. An alternative could be some xarray to remember the
bounced dma_addr.
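
To make the bookkeeping alternative concrete, here is a userspace sketch
of a table that remembers bounced addresses. It uses a small fixed array
with a linear scan purely for illustration; an xarray would be indexed
by the address instead. All names and the capacity are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOUNCE_SLOTS 64	/* hypothetical capacity */

struct bounce_entry {
	uint64_t dma_addr;	/* key: address handed to the device */
	void *orig;		/* value: the caller's original buffer */
	bool used;
};

struct bounce_map {
	struct bounce_entry slot[BOUNCE_SLOTS];
};

/* Map time: record the bounce; returns false if the table is full. */
static bool bounce_map_store(struct bounce_map *m, uint64_t dma_addr,
			     void *orig)
{
	assert(orig != NULL);
	for (size_t i = 0; i < BOUNCE_SLOTS; i++) {
		if (!m->slot[i].used) {
			m->slot[i] = (struct bounce_entry){ dma_addr, orig, true };
			return true;
		}
	}
	return false;
}

/* Unmap time: return and forget the original, or NULL if never bounced. */
static void *bounce_map_erase(struct bounce_map *m, uint64_t dma_addr)
{
	for (size_t i = 0; i < BOUNCE_SLOTS; i++) {
		if (m->slot[i].used && m->slot[i].dma_addr == dma_addr) {
			m->slot[i].used = false;
			return m->slot[i].orig;
		}
	}
	return NULL;
}
```

A NULL result at unmap time means the address was mapped directly, so no
copy back is needed and no slab internals have to be consulted.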
Anyway, I propose that we try the swiotlb first and look at optimising
it from there, initially using the dma coherent pool.
--
Catalin