public inbox for linux-kernel@vger.kernel.org
From: "Christian König" <christian.koenig@amd.com>
To: Barry Song <baohua@kernel.org>
Cc: linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
	Xueyuan Chen <Xueyuan.chen21@gmail.com>,
	Sumit Semwal <sumit.semwal@linaro.org>,
	Benjamin Gaignard <benjamin.gaignard@collabora.com>,
	Brian Starkey <Brian.Starkey@arm.com>,
	John Stultz <jstultz@google.com>,
	"T . J . Mercier" <tjmercier@google.com>
Subject: Re: [PATCH] dma-buf: system_heap: Optimize sg_table-to-pages conversion in vmap
Date: Wed, 22 Apr 2026 09:10:03 +0200	[thread overview]
Message-ID: <9034246e-3971-4fad-94b7-80f4ad0e29db@amd.com> (raw)
In-Reply-To: <CAGsJ_4xquCzQLbvpkC8arAN=9KhrAkdPdxnB=v85GvvQu23Xyg@mail.gmail.com>

On 4/7/26 13:29, Barry Song wrote:
> On Tue, Apr 7, 2026 at 3:58 PM Christian König <christian.koenig@amd.com> wrote:
>>
>> On 4/6/26 23:49, Barry Song (Xiaomi) wrote:
>>> From: Xueyuan Chen <Xueyuan.chen21@gmail.com>
>>>
>>> Replace the heavy for_each_sgtable_page() iterator in system_heap_do_vmap()
>>> with a more efficient nested loop approach.
>>>
>>> Instead of iterating page by page, we now iterate through the scatterlist
>>> entries via for_each_sgtable_sg(). Because pages within a single sg entry
>>> are physically contiguous, we can populate the page array in an inner
>>> loop using simple pointer arithmetic. This saves a lot of time.
>>>
>>> The WARN_ON check is also pulled out of the loop to save branch
>>> instructions.
>>>
>>> Performance results mapping a 2GB buffer on Radxa O6:
>>> - Before: ~1440000 ns
>>> - After:  ~232000 ns
>>> (~84% reduction in iteration time, or ~6.2x faster)
>>
>> Well, the real question is: why do you care about vmap performance at all?
>>
>> That should basically only be used for fbdev emulation (except for VMGFX) and we absolutely don't care about performance there.
> 
> I agree that in mainline, dma_buf_vmap is not used very often.
> Here’s what I was able to find:
> 
>    1   1638  drivers/dma-buf/dma-buf.c <<dma_buf_vmap_unlocked>>
>              ret = dma_buf_vmap(dmabuf, map);
>    2    376  drivers/gpu/drm/drm_gem_shmem_helper.c <<drm_gem_shmem_vmap_locked>>
>              ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
>    3     85  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c <<etnaviv_gem_prime_vmap_impl>>
>              ret = dma_buf_vmap(etnaviv_obj->base.import_attach->dmabuf, &map);
>    4    433  drivers/gpu/drm/vmwgfx/vmwgfx_blit.c <<map_external>>
>              ret = dma_buf_vmap(bo->tbo.base.dma_buf, map);
>    5     88  drivers/gpu/drm/vmwgfx/vmwgfx_gem.c <<vmw_gem_vmap>>
>              ret = dma_buf_vmap(obj->import_attach->dmabuf, map);
> 
> However, in the Android ecosystem, system_heap and similar heaps
> are widely used across camera, NPU, and media drivers. Many of these
> drivers are not in mainline but do use vmap() in real code paths.

Well, out-of-tree drivers are not a justification for making upstream changes.

Apart from a handful of workarounds which need CPU access as a fallback, DMA-buf vmap is only used to provide fbdev emulation.

The vmap interface has already given us quite a headache in the first place, and there are a couple of unresolved problems regarding synchronization and coherency.

If a driver were pushed upstream that made such frequent use of dma_buf_vmap() that its performance mattered, I think there would be push back on that, and the driver developer would need a very good explanation of why that is necessary.

So for now I have to reject that patch.

Regards,
Christian.

> 
> As I can show you some of them from MTK platforms:
> 
> 1:
> [    6.689849] system_heap_vmap+0x17c/0x254 [system_heap 8d35d4ce35bb30d8a623f0b9863998a2528e4175]
> [    6.689859] dma_buf_vmap_unlocked+0xb8/0x130
> [    6.689861] aov_core_init+0x310/0x718 [mtk_aov 96e2e5e9457dcdacce3a7629b0600c5dbeca623b]
> [    6.689873] mtk_aov_probe+0x434/0x5b4 [mtk_aov 96e2e5e9457dcdacce3a7629b0600c5dbeca623b]
> 
> 2:
> [  116.181643] __vmap_pages_range_noflush+0x7c4/0x814
> [  116.181645] vmap+0xb4/0x148
> [  116.181647] system_heap_vmap+0x17c/0x254 [system_heap 8d35d4ce35bb30d8a623f0b9863998a2528e4175]
> [  116.181651] dma_buf_vmap_unlocked+0xb8/0x130
> [  116.181653] mtk_cam_vb2_vaddr+0xa0/0xfc [mtk_cam_isp8s 0cf9be6c773a8f14aab9db9ebf53feacb499846a]
> [  116.181682] vb2_plane_vaddr+0x5c/0x78
> [  116.181684] mtk_cam_job_fill_ipi_frame+0xa8c/0x128c [mtk_cam_isp8s 0cf9be6c773a8f14aab9db9ebf53feacb499846a]
> 
> 3:
> [  116.306178] __vmap_pages_range_noflush+0x7c4/0x814
> [  116.306183] vmap+0xb4/0x148
> [  116.306187] system_heap_vmap+0x17c/0x254 [system_heap 8d35d4ce35bb30d8a623f0b9863998a2528e4175]
> [  116.306209] dma_buf_vmap_unlocked+0xb8/0x130
> [  116.306212] apu_sysmem_alloc+0x168/0x360 [apusys 8fb33cbce3b858d651b9da26fc370090a67cfb70]
> [  116.306468] mdw_mem_alloc+0xd8/0x314 [apusys 8fb33cbce3b858d651b9da26fc370090a67cfb70]
> [  116.306591] mdw_mem_pool_chunk_add+0x11c/0x400 [apusys 8fb33cbce3b858d651b9da26fc370090a67cfb70]
> [  116.306712] mdw_mem_pool_create+0x190/0x2c8 [apusys 8fb33cbce3b858d651b9da26fc370090a67cfb70]
> [  116.306833] mdw_drv_open+0x21c/0x47c [apusys 8fb33cbce3b858d651b9da26fc370090a67cfb70]
> 
> While we may want to encourage more of these drivers to move upstream,
> some aspects are beyond our control (different SoC vendors), but we
> can at least contribute upstream ourselves.
> 
> Best Regards
> Barry



Thread overview: 8+ messages
2026-04-06 21:49 [PATCH] dma-buf: system_heap: Optimize sg_table-to-pages conversion in vmap Barry Song (Xiaomi)
2026-04-07  7:57 ` Christian König
2026-04-07 11:29   ` Barry Song
2026-04-22  7:10     ` Christian König [this message]
2026-05-01  4:15       ` Barry Song
2026-05-01 15:54         ` T.J. Mercier
2026-05-04  7:49           ` Christian König
2026-05-05 14:44             ` T.J. Mercier
