* [PATCH] dma-buf: heaps: Set allocation orders for larger page sizes
@ 2025-04-15 17:19 Juan Yescas
2025-04-15 20:14 ` John Stultz
From: Juan Yescas @ 2025-04-15 17:19 UTC (permalink / raw)
To: Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T.J. Mercier, Christian König, linux-media, dri-devel,
linaro-mm-sig, linux-kernel
Cc: jyescas, baohua, dmitry.osipenko, jaewon31.kim, Guangming.Cao,
surenb, kaleshsingh
This change sets the allocation orders for the different page sizes
(4k, 16k, 64k) based on PAGE_SHIFT. Before this change, the orders
for large page sizes were calculated incorrectly, which caused the
system heap to allocate from 2% to 4% more memory on 16KiB page size
kernels.
This change was tested on 4k/16k page size kernels.
Signed-off-by: Juan Yescas <jyescas@google.com>
---
drivers/dma-buf/heaps/system_heap.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index 26d5dc89ea16..54674c02dcb4 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -50,8 +50,15 @@ static gfp_t order_flags[] = {HIGH_ORDER_GFP, HIGH_ORDER_GFP, LOW_ORDER_GFP};
* to match with the sizes often found in IOMMUs. Using order 4 pages instead
* of order 0 pages can significantly improve the performance of many IOMMUs
* by reducing TLB pressure and time spent updating page tables.
+ *
+ * Note: When the order is 0, the minimum allocation is PAGE_SIZE. The possible
+ * page sizes for ARM devices could be 4K, 16K and 64K.
*/
-static const unsigned int orders[] = {8, 4, 0};
+#define ORDER_1M (20 - PAGE_SHIFT)
+#define ORDER_64K (16 - PAGE_SHIFT)
+#define ORDER_FOR_PAGE_SIZE (0)
+static const unsigned int orders[] = {ORDER_1M, ORDER_64K, ORDER_FOR_PAGE_SIZE};
+
#define NUM_ORDERS ARRAY_SIZE(orders)
static struct sg_table *dup_sg_table(struct sg_table *table)
--
2.49.0.604.gff1f9ca942-goog
* Re: [PATCH] dma-buf: heaps: Set allocation orders for larger page sizes
2025-04-15 17:19 [PATCH] dma-buf: heaps: Set allocation orders for larger page sizes Juan Yescas
@ 2025-04-15 20:14 ` John Stultz
2025-04-15 20:57 ` T.J. Mercier
2025-04-16 11:34 ` Christian König
From: John Stultz @ 2025-04-15 20:14 UTC (permalink / raw)
To: Juan Yescas
Cc: Sumit Semwal, Benjamin Gaignard, Brian Starkey, T.J. Mercier,
Christian König, linux-media, dri-devel, linaro-mm-sig,
linux-kernel, baohua, dmitry.osipenko, jaewon31.kim,
Guangming.Cao, surenb, kaleshsingh
On Tue, Apr 15, 2025 at 10:20 AM Juan Yescas <jyescas@google.com> wrote:
>
> This change sets the allocation orders for the different page sizes
> (4k, 16k, 64k) based on PAGE_SHIFT. Before this change, the orders
> for large page sizes were calculated incorrectly, this caused system
> heap to allocate from 2% to 4% more memory on 16KiB page size kernels.
>
> This change was tested on 4k/16k page size kernels.
>
> Signed-off-by: Juan Yescas <jyescas@google.com>
Seems reasonable to me.
Acked-by: John Stultz <jstultz@google.com>
thanks
-john
* Re: [PATCH] dma-buf: heaps: Set allocation orders for larger page sizes
2025-04-15 17:19 [PATCH] dma-buf: heaps: Set allocation orders for larger page sizes Juan Yescas
2025-04-15 20:14 ` John Stultz
@ 2025-04-15 20:57 ` T.J. Mercier
2025-04-16 2:28 ` 김재원
2025-04-16 11:34 ` Christian König
From: T.J. Mercier @ 2025-04-15 20:57 UTC (permalink / raw)
To: Juan Yescas
Cc: Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
Christian König, linux-media, dri-devel, linaro-mm-sig,
linux-kernel, baohua, dmitry.osipenko, jaewon31.kim,
Guangming.Cao, surenb, kaleshsingh
On Tue, Apr 15, 2025 at 10:20 AM Juan Yescas <jyescas@google.com> wrote:
>
> This change sets the allocation orders for the different page sizes
> (4k, 16k, 64k) based on PAGE_SHIFT. Before this change, the orders
> for large page sizes were calculated incorrectly, this caused system
> heap to allocate from 2% to 4% more memory on 16KiB page size kernels.
>
> This change was tested on 4k/16k page size kernels.
>
> Signed-off-by: Juan Yescas <jyescas@google.com>
I think "dma-buf: system_heap:" would be better for the subject since
this is specific to the system heap.
Would you mind cleaning up the extra space on line 321 too?
@@ -318,7 +318,7 @@ static struct page
*alloc_largest_available(unsigned long size,
int i;
for (i = 0; i < NUM_ORDERS; i++) {
- if (size < (PAGE_SIZE << orders[i]))
+ if (size < (PAGE_SIZE << orders[i]))
With that,
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Fixes: d963ab0f15fb ("dma-buf: system_heap: Allocate higher order
pages if available") is also probably a good idea.
> ---
> drivers/dma-buf/heaps/system_heap.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index 26d5dc89ea16..54674c02dcb4 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -50,8 +50,15 @@ static gfp_t order_flags[] = {HIGH_ORDER_GFP, HIGH_ORDER_GFP, LOW_ORDER_GFP};
> * to match with the sizes often found in IOMMUs. Using order 4 pages instead
> * of order 0 pages can significantly improve the performance of many IOMMUs
> * by reducing TLB pressure and time spent updating page tables.
> + *
> + * Note: When the order is 0, the minimum allocation is PAGE_SIZE. The possible
> + * page sizes for ARM devices could be 4K, 16K and 64K.
> */
> -static const unsigned int orders[] = {8, 4, 0};
> +#define ORDER_1M (20 - PAGE_SHIFT)
> +#define ORDER_64K (16 - PAGE_SHIFT)
> +#define ORDER_FOR_PAGE_SIZE (0)
> +static const unsigned int orders[] = {ORDER_1M, ORDER_64K, ORDER_FOR_PAGE_SIZE};
> +
> #define NUM_ORDERS ARRAY_SIZE(orders)
>
> static struct sg_table *dup_sg_table(struct sg_table *table)
> --
> 2.49.0.604.gff1f9ca942-goog
>
* RE: [PATCH] dma-buf: heaps: Set allocation orders for larger page sizes
2025-04-15 20:57 ` T.J. Mercier
@ 2025-04-16 2:28 ` 김재원
2025-04-16 3:58 ` Juan Yescas
From: 김재원 @ 2025-04-16 2:28 UTC (permalink / raw)
To: 'T.J. Mercier', 'Juan Yescas'
Cc: 'Sumit Semwal', 'Benjamin Gaignard',
'Brian Starkey', 'John Stultz',
'Christian König', linux-media, dri-devel,
linaro-mm-sig, linux-kernel, baohua, dmitry.osipenko,
Guangming.Cao, surenb, kaleshsingh
> -----Original Message-----
> From: T.J. Mercier [mailto:tjmercier@google.com]
> Sent: Wednesday, April 16, 2025 5:57 AM
> To: Juan Yescas <jyescas@google.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>; Benjamin Gaignard
> <benjamin.gaignard@collabora.com>; Brian Starkey <Brian.Starkey@arm.com>;
> John Stultz <jstultz@google.com>; Christian König
> <christian.koenig@amd.com>; linux-media@vger.kernel.org; dri-
> devel@lists.freedesktop.org; linaro-mm-sig@lists.linaro.org; linux-
> kernel@vger.kernel.org; baohua@kernel.org; dmitry.osipenko@collabora.com;
> jaewon31.kim@samsung.com; Guangming.Cao@mediatek.com; surenb@google.com;
> kaleshsingh@google.com
> Subject: Re: [PATCH] dma-buf: heaps: Set allocation orders for larger page
> sizes
>
> On Tue, Apr 15, 2025 at 10:20 AM Juan Yescas <jyescas@google.com> wrote:
> >
> > This change sets the allocation orders for the different page sizes
> > (4k, 16k, 64k) based on PAGE_SHIFT. Before this change, the orders for
> > large page sizes were calculated incorrectly, this caused system heap
> > to allocate from 2% to 4% more memory on 16KiB page size kernels.
> >
> > This change was tested on 4k/16k page size kernels.
> >
> > Signed-off-by: Juan Yescas <jyescas@google.com>
>
> I think "dma-buf: system_heap:" would be better for the subject since this
> is specific to the system heap.
>
> Would you mind cleaning up the extra space on line 321 too?
> @@ -318,7 +318,7 @@ static struct page
> *alloc_largest_available(unsigned long size,
> int i;
>
> for (i = 0; i < NUM_ORDERS; i++) {
> - if (size < (PAGE_SIZE << orders[i]))
> + if (size < (PAGE_SIZE << orders[i]))
>
> With that,
> Reviewed-by: T.J. Mercier <tjmercier@google.com>
>
> Fixes: d963ab0f15fb ("dma-buf: system_heap: Allocate higher order pages if
> available") is also probably a good idea.
>
Hi Juan.
Yes, this system_heap change is needed for 16KB pages. Actually,
we may need to check other drivers that use page order numbers; I guess
GPU drivers may be one of them.
> > ---
> > drivers/dma-buf/heaps/system_heap.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/dma-buf/heaps/system_heap.c
> > b/drivers/dma-buf/heaps/system_heap.c
> > index 26d5dc89ea16..54674c02dcb4 100644
> > --- a/drivers/dma-buf/heaps/system_heap.c
> > +++ b/drivers/dma-buf/heaps/system_heap.c
> > @@ -50,8 +50,15 @@ static gfp_t order_flags[] = {HIGH_ORDER_GFP,
> HIGH_ORDER_GFP, LOW_ORDER_GFP};
> > * to match with the sizes often found in IOMMUs. Using order 4 pages
> instead
> > * of order 0 pages can significantly improve the performance of many
> IOMMUs
> > * by reducing TLB pressure and time spent updating page tables.
> > + *
> > + * Note: When the order is 0, the minimum allocation is PAGE_SIZE.
> > + The possible
> > + * page sizes for ARM devices could be 4K, 16K and 64K.
> > */
> > -static const unsigned int orders[] = {8, 4, 0};
> > +#define ORDER_1M (20 - PAGE_SHIFT)
> > +#define ORDER_64K (16 - PAGE_SHIFT)
> > +#define ORDER_FOR_PAGE_SIZE (0)
> > +static const unsigned int orders[] = {ORDER_1M, ORDER_64K,
> > +ORDER_FOR_PAGE_SIZE};
> > +
> > #define NUM_ORDERS ARRAY_SIZE(orders)
> >
> > static struct sg_table *dup_sg_table(struct sg_table *table)
> > --
> > 2.49.0.604.gff1f9ca942-goog
> >
* Re: [PATCH] dma-buf: heaps: Set allocation orders for larger page sizes
2025-04-16 2:28 ` 김재원
@ 2025-04-16 3:58 ` Juan Yescas
From: Juan Yescas @ 2025-04-16 3:58 UTC (permalink / raw)
To: 김재원
Cc: T.J. Mercier, Sumit Semwal, Benjamin Gaignard, Brian Starkey,
John Stultz, Christian König, linux-media, dri-devel,
linaro-mm-sig, linux-kernel, baohua, dmitry.osipenko,
Guangming.Cao, surenb, kaleshsingh
On Tue, Apr 15, 2025 at 7:28 PM 김재원 <jaewon31.kim@samsung.com> wrote:
>
>
>
> > -----Original Message-----
> > From: T.J. Mercier [mailto:tjmercier@google.com]
> > Sent: Wednesday, April 16, 2025 5:57 AM
> > To: Juan Yescas <jyescas@google.com>
> > Cc: Sumit Semwal <sumit.semwal@linaro.org>; Benjamin Gaignard
> > <benjamin.gaignard@collabora.com>; Brian Starkey <Brian.Starkey@arm.com>;
> > John Stultz <jstultz@google.com>; Christian König
> > <christian.koenig@amd.com>; linux-media@vger.kernel.org; dri-
> > devel@lists.freedesktop.org; linaro-mm-sig@lists.linaro.org; linux-
> > kernel@vger.kernel.org; baohua@kernel.org; dmitry.osipenko@collabora.com;
> > jaewon31.kim@samsung.com; Guangming.Cao@mediatek.com; surenb@google.com;
> > kaleshsingh@google.com
> > Subject: Re: [PATCH] dma-buf: heaps: Set allocation orders for larger page
> > sizes
> >
> > On Tue, Apr 15, 2025 at 10:20 AM Juan Yescas <jyescas@google.com> wrote:
> > >
> > > This change sets the allocation orders for the different page sizes
> > > (4k, 16k, 64k) based on PAGE_SHIFT. Before this change, the orders for
> > > large page sizes were calculated incorrectly, this caused system heap
> > > to allocate from 2% to 4% more memory on 16KiB page size kernels.
> > >
> > > This change was tested on 4k/16k page size kernels.
> > >
> > > Signed-off-by: Juan Yescas <jyescas@google.com>
> >
> > I think "dma-buf: system_heap:" would be better for the subject since this
> > is specific to the system heap.
> >
> > Would you mind cleaning up the extra space on line 321 too?
> > @@ -318,7 +318,7 @@ static struct page
> > *alloc_largest_available(unsigned long size,
> > int i;
> >
> > for (i = 0; i < NUM_ORDERS; i++) {
> > - if (size < (PAGE_SIZE << orders[i]))
> > + if (size < (PAGE_SIZE << orders[i]))
> >
> > With that,
> > Reviewed-by: T.J. Mercier <tjmercier@google.com>
> >
> > Fixes: d963ab0f15fb ("dma-buf: system_heap: Allocate higher order pages if
> > available") is also probably a good idea.
> >
>
>
> Hi Juan.
>
> Yes. This system_heap change should be changed for 16KB page. Actually,
> we may need to check other drivers using page order number. I guess
> gpu drivers may be one of them.
>
Thanks Jaewon for pointing it out. We'll take a look at the GPU drivers to make
sure that they are using the proper page order.
> > > ---
> > > drivers/dma-buf/heaps/system_heap.c | 9 ++++++++-
> > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/dma-buf/heaps/system_heap.c
> > > b/drivers/dma-buf/heaps/system_heap.c
> > > index 26d5dc89ea16..54674c02dcb4 100644
> > > --- a/drivers/dma-buf/heaps/system_heap.c
> > > +++ b/drivers/dma-buf/heaps/system_heap.c
> > > @@ -50,8 +50,15 @@ static gfp_t order_flags[] = {HIGH_ORDER_GFP,
> > HIGH_ORDER_GFP, LOW_ORDER_GFP};
> > > * to match with the sizes often found in IOMMUs. Using order 4 pages
> > instead
> > > * of order 0 pages can significantly improve the performance of many
> > IOMMUs
> > > * by reducing TLB pressure and time spent updating page tables.
> > > + *
> > > + * Note: When the order is 0, the minimum allocation is PAGE_SIZE.
> > > + The possible
> > > + * page sizes for ARM devices could be 4K, 16K and 64K.
> > > */
> > > -static const unsigned int orders[] = {8, 4, 0};
> > > +#define ORDER_1M (20 - PAGE_SHIFT)
> > > +#define ORDER_64K (16 - PAGE_SHIFT)
> > > +#define ORDER_FOR_PAGE_SIZE (0)
> > > +static const unsigned int orders[] = {ORDER_1M, ORDER_64K,
> > > +ORDER_FOR_PAGE_SIZE};
> > > +
> > > #define NUM_ORDERS ARRAY_SIZE(orders)
> > >
> > > static struct sg_table *dup_sg_table(struct sg_table *table)
> > > --
> > > 2.49.0.604.gff1f9ca942-goog
> > >
>
>
* Re: [PATCH] dma-buf: heaps: Set allocation orders for larger page sizes
2025-04-15 17:19 [PATCH] dma-buf: heaps: Set allocation orders for larger page sizes Juan Yescas
2025-04-15 20:14 ` John Stultz
2025-04-15 20:57 ` T.J. Mercier
@ 2025-04-16 11:34 ` Christian König
2025-04-16 21:51 ` Juan Yescas
From: Christian König @ 2025-04-16 11:34 UTC (permalink / raw)
To: Juan Yescas, Sumit Semwal, Benjamin Gaignard, Brian Starkey,
John Stultz, T.J. Mercier, linux-media, dri-devel, linaro-mm-sig,
linux-kernel
Cc: baohua, dmitry.osipenko, jaewon31.kim, Guangming.Cao, surenb,
kaleshsingh
Am 15.04.25 um 19:19 schrieb Juan Yescas:
> This change sets the allocation orders for the different page sizes
> (4k, 16k, 64k) based on PAGE_SHIFT. Before this change, the orders
> for large page sizes were calculated incorrectly, this caused system
> heap to allocate from 2% to 4% more memory on 16KiB page size kernels.
>
> This change was tested on 4k/16k page size kernels.
>
> Signed-off-by: Juan Yescas <jyescas@google.com>
> ---
> drivers/dma-buf/heaps/system_heap.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index 26d5dc89ea16..54674c02dcb4 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -50,8 +50,15 @@ static gfp_t order_flags[] = {HIGH_ORDER_GFP, HIGH_ORDER_GFP, LOW_ORDER_GFP};
> * to match with the sizes often found in IOMMUs. Using order 4 pages instead
> * of order 0 pages can significantly improve the performance of many IOMMUs
> * by reducing TLB pressure and time spent updating page tables.
> + *
> + * Note: When the order is 0, the minimum allocation is PAGE_SIZE. The possible
> + * page sizes for ARM devices could be 4K, 16K and 64K.
> */
> -static const unsigned int orders[] = {8, 4, 0};
> +#define ORDER_1M (20 - PAGE_SHIFT)
> +#define ORDER_64K (16 - PAGE_SHIFT)
> +#define ORDER_FOR_PAGE_SIZE (0)
> +static const unsigned int orders[] = {ORDER_1M, ORDER_64K, ORDER_FOR_PAGE_SIZE};
> +
Good catch, but I think the defines are just overkill.
What you should do instead is to subtract page shift when using the array.
Apart from that, using 1M, 64K and then falling back to 4K just sounds random to me. We have pushed back on 64K more than once because it is actually not beneficial in almost all cases.
I suggest fixing the code in system_heap_allocate to not over-allocate, and instead just try the available orders like TTM does. This has proven to work independent of the architecture.
Regards,
Christian.
> #define NUM_ORDERS ARRAY_SIZE(orders)
>
> static struct sg_table *dup_sg_table(struct sg_table *table)
* Re: [PATCH] dma-buf: heaps: Set allocation orders for larger page sizes
2025-04-16 11:34 ` Christian König
@ 2025-04-16 21:51 ` Juan Yescas
[not found] ` <21cbda3a-1997-4ac0-ad5d-6e6d447fc11c@amd.com>
From: Juan Yescas @ 2025-04-16 21:51 UTC (permalink / raw)
To: Christian König
Cc: Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T.J. Mercier, linux-media, dri-devel, linaro-mm-sig, linux-kernel,
baohua, dmitry.osipenko, jaewon31.kim, Guangming.Cao, surenb,
kaleshsingh
On Wed, Apr 16, 2025 at 4:34 AM Christian König
<christian.koenig@amd.com> wrote:
>
>
>
> Am 15.04.25 um 19:19 schrieb Juan Yescas:
> > This change sets the allocation orders for the different page sizes
> > (4k, 16k, 64k) based on PAGE_SHIFT. Before this change, the orders
> > for large page sizes were calculated incorrectly, this caused system
> > heap to allocate from 2% to 4% more memory on 16KiB page size kernels.
> >
> > This change was tested on 4k/16k page size kernels.
> >
> > Signed-off-by: Juan Yescas <jyescas@google.com>
> > ---
> > drivers/dma-buf/heaps/system_heap.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> > index 26d5dc89ea16..54674c02dcb4 100644
> > --- a/drivers/dma-buf/heaps/system_heap.c
> > +++ b/drivers/dma-buf/heaps/system_heap.c
> > @@ -50,8 +50,15 @@ static gfp_t order_flags[] = {HIGH_ORDER_GFP, HIGH_ORDER_GFP, LOW_ORDER_GFP};
> > * to match with the sizes often found in IOMMUs. Using order 4 pages instead
> > * of order 0 pages can significantly improve the performance of many IOMMUs
> > * by reducing TLB pressure and time spent updating page tables.
> > + *
> > + * Note: When the order is 0, the minimum allocation is PAGE_SIZE. The possible
> > + * page sizes for ARM devices could be 4K, 16K and 64K.
> > */
> > -static const unsigned int orders[] = {8, 4, 0};
> > +#define ORDER_1M (20 - PAGE_SHIFT)
> > +#define ORDER_64K (16 - PAGE_SHIFT)
> > +#define ORDER_FOR_PAGE_SIZE (0)
> > +static const unsigned int orders[] = {ORDER_1M, ORDER_64K, ORDER_FOR_PAGE_SIZE};
> > +
>
> Good catch, but I think the defines are just overkill.
>
> What you should do instead is to subtract page shift when using the array.
>
There are several occurrences of orders in the file, and I think it is
better to do the calculations up front in the array. Would you be ok
if we get rid of the defines, as per your suggestion, and do the
calculations in the array definition? Something like this:
static const unsigned int orders[] = {20 - PAGE_SHIFT, 16 - PAGE_SHIFT, 0};
> Apart from that using 1M, 64K and then falling back to 4K just sounds random to me. We have especially pushed back on 64K more than once because it is actually not beneficial in almost all cases.
>
In the hardware where the driver is used, 64K is beneficial for:
- Arm SMMUv3 (4KB granule size): 64KB mappings via the contiguous bit
  (groups of 16 4KB pages)
- SysMMU: benefits from the 64KB ("large") page on 4k/16k page sizes
> I suggest to fix the code in system_heap_allocate to not over allocate instead and just try the available orders like TTM does. This has proven to be working architecture independent.
>
Do you mean to have an implementation similar to __ttm_pool_alloc()?
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/ttm/ttm_pool.c?h=v6.15-rc2#n728
If that is the case, we can try the change, run some benchmarks and
submit the patch if we see positive results.
Thanks
Juan
> Regards,
> Christian.
>
> > #define NUM_ORDERS ARRAY_SIZE(orders)
> >
> > static struct sg_table *dup_sg_table(struct sg_table *table)
>
* Re: [PATCH] dma-buf: heaps: Set allocation orders for larger page sizes
[not found] ` <21cbda3a-1997-4ac0-ad5d-6e6d447fc11c@amd.com>
@ 2025-04-17 20:30 ` Juan Yescas
From: Juan Yescas @ 2025-04-17 20:30 UTC (permalink / raw)
To: Christian König
Cc: Sumit Semwal, Benjamin Gaignard, Brian Starkey, John Stultz,
T.J. Mercier, linux-media, dri-devel, linaro-mm-sig, linux-kernel,
baohua, dmitry.osipenko, jaewon31.kim, Guangming.Cao, surenb,
kaleshsingh
On Thu, Apr 17, 2025 at 1:06 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 16.04.25 um 23:51 schrieb Juan Yescas:
>
> On Wed, Apr 16, 2025 at 4:34 AM Christian König
> <christian.koenig@amd.com> wrote:
>
>
> Am 15.04.25 um 19:19 schrieb Juan Yescas:
>
> This change sets the allocation orders for the different page sizes
> (4k, 16k, 64k) based on PAGE_SHIFT. Before this change, the orders
> for large page sizes were calculated incorrectly, this caused system
> heap to allocate from 2% to 4% more memory on 16KiB page size kernels.
>
> This change was tested on 4k/16k page size kernels.
>
> Signed-off-by: Juan Yescas <jyescas@google.com>
> ---
> drivers/dma-buf/heaps/system_heap.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
> index 26d5dc89ea16..54674c02dcb4 100644
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -50,8 +50,15 @@ static gfp_t order_flags[] = {HIGH_ORDER_GFP, HIGH_ORDER_GFP, LOW_ORDER_GFP};
> * to match with the sizes often found in IOMMUs. Using order 4 pages instead
> * of order 0 pages can significantly improve the performance of many IOMMUs
> * by reducing TLB pressure and time spent updating page tables.
> + *
> + * Note: When the order is 0, the minimum allocation is PAGE_SIZE. The possible
> + * page sizes for ARM devices could be 4K, 16K and 64K.
> */
> -static const unsigned int orders[] = {8, 4, 0};
> +#define ORDER_1M (20 - PAGE_SHIFT)
> +#define ORDER_64K (16 - PAGE_SHIFT)
> +#define ORDER_FOR_PAGE_SIZE (0)
> +static const unsigned int orders[] = {ORDER_1M, ORDER_64K, ORDER_FOR_PAGE_SIZE};
> +
>
> Good catch, but I think the defines are just overkill.
>
> What you should do instead is to subtract page shift when using the array.
>
> There are several occurrences of orders in the file and I think it is
> better to do the calculations up front in the array. Would you be ok
> if we get rid of the defines as per your suggestion and make the
> calculations during the definition of the array. Something like this:
>
> static const unsigned int orders[] = {20 - PAGE_SHIFT, 16 - PAGE_SHIFT, 0};
>
>
> Yeah that totally works for me as well. Just make sure that a code comment nearby explains why 20, 16 and 0.
>
Thanks, the changes were added in the [PATCH v3].
> Apart from that using 1M, 64K and then falling back to 4K just sounds random to me. We have especially pushed back on 64K more than once because it is actually not beneficial in almost all cases.
>
> In the hardware where the driver is used, the 64K is beneficial for:
>
> Arm SMMUv3 (4KB granule size):
> 64KB (contiguous bit groups 16 4KB pages)
>
> SysMMU benefits from 64KB (“large” page) on 4k/16k page sizes.
>
>
> Question could this HW also work with pages larger than 64K? (I strongly expect yes).
>
Yes, if the page sizes in the SMMUv3 are:
- 4k, we can have 2MiB pages
- 16k, we can have 32MiB pages
> I suggest to fix the code in system_heap_allocate to not over allocate instead and just try the available orders like TTM does. This has proven to be working architecture independent.
>
> Do you mean to have an implementation similar to __ttm_pool_alloc()?
>
>
> Yeah something like that.
>
> https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/ttm/ttm_pool.c?h=v6.15-rc2#n728
>
> If that is the case, we can try the change, run some benchmarks and
> submit the patch if we see positive results.
>
>
> Could be that this doesn't matter for your platform, but it's minimal extra overhead asking for larger chunks first and it just avoids fragmenting everything into 64K chunks.
>
> That is kind of important since the code in DMA-heaps should be platform neutral if possible.
I agree, I'll make the change in another patch.
Thanks
Juan
>
> Regards,
> Christian.
>
>
> Thanks
> Juan
>
> Regards,
> Christian.
>
> #define NUM_ORDERS ARRAY_SIZE(orders)
>
> static struct sg_table *dup_sg_table(struct sg_table *table)
>
>