* [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count
@ 2026-03-08 13:55 Julian Orth
2026-03-08 16:21 ` Julian Orth
2026-03-08 18:08 ` Jason Gunthorpe
0 siblings, 2 replies; 9+ messages in thread
From: Julian Orth @ 2026-03-08 13:55 UTC (permalink / raw)
To: Andrew Morton, Imre Deak, Sakari Ailus, Jason Gunthorpe,
Thomas Hellstrom
Cc: linux-kernel, Julian Orth
A user reported memory corruption in the Jay wayland compositor [1]. The
corruption started when archlinux enabled
CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_WITHIN_SIZE in kernel 6.19.5.
The compositor uses udmabuf to upload memory from memfds to the GPU.
When running an affected kernel, the following warnings are logged:
a - addrs >= max_entries
WARNING: drivers/gpu/drm/drm_prime.c:1089 at drm_prime_sg_to_dma_addr_array+0x86/0xc0, CPU#31: jay/1864
[...]
Call Trace:
<TASK>
amdgpu_bo_move+0x188/0x800 [amdgpu 3b451640234948027c09e9b39e6520bc7e5471cf]
Disabling the use of huge pages at runtime via
/sys/kernel/mm/transparent_hugepage/shmem_enabled fixes the issue.
udmabuf allocates a scatterlist with buffer_size/PAGE_SIZE entries. Each
entry has a length of PAGE_SIZE. With huge pages disabled, it appears
that sg->offset is always 0. With huge pages enabled, sg->offset is
incremented by PAGE_SIZE until the end of the huge page.
With the code before this patch, this causes __sg_page_iter_dma_next to
iterate 1 + 2 + 3 + ... + 512 times over a single huge page instead of
512 times.
This effect can be seen in the screenshot provided by the user where
parts of the image are repeated and with each repetition the base offset
shifts by one page and the size of the repeated data grows by one page.
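[Editorial note: the overcounting arithmetic described above can be checked with a short Python model. This is a sketch only — `page_align` stands in for the kernel's PAGE_ALIGN() macro and the scatterlist is modeled as plain (offset, length) pairs; none of this is kernel code.]

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT          # 4 KiB pages
ENTRIES = 512                        # one 2 MiB huge page

def page_align(x):
    # Model of the kernel's PAGE_ALIGN(): round up to a page boundary.
    return (x + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)

def count_before(offset, length):
    # sg_page_count() before the patch: PAGE_ALIGN(sg->offset + sg->length)
    return page_align(offset + length) >> PAGE_SHIFT

def count_after(offset, length):
    # sg_page_count() after the patch: PAGE_ALIGN(sg->length)
    return page_align(length) >> PAGE_SHIFT

# udmabuf's scatterlist for one huge page: 512 entries of length
# PAGE_SIZE, with sg->offset growing by PAGE_SIZE per entry.
sgl = [(k * PAGE_SIZE, PAGE_SIZE) for k in range(ENTRIES)]

pages_before = sum(count_before(o, l) for o, l in sgl)
pages_after = sum(count_after(o, l) for o, l in sgl)
print(pages_before)  # 131328 == 1 + 2 + ... + 512, the overcount
print(pages_after)   # 512, one iteration per page
```

With offsets that grow by PAGE_SIZE, the old formula makes entry k look like it covers k+1 pages, which is exactly the 1 + 2 + ... + 512 iteration pattern and the repeated, shifting image data in the screenshot.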
[1]: https://github.com/mahkoh/jay/issues/779
Fixes: a321e91b6d73 ("lib/scatterlist: add simple page iterator")
Fixes: d901b2760dc6 ("lib/scatterlist: Provide a DMA page iterator")
Signed-off-by: Julian Orth <ju.orth@gmail.com>
---
I have not verified whether this negatively affects any other users of
the iterator interface; in particular, whether sg->offset is allowed to
not be page aligned.
The use of sg->offset in these functions looks suspect and removing it
fixes the issue. I have not looked further than that.
---
lib/scatterlist.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index d773720d11bf..001f33ec4e49 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -738,7 +738,7 @@ EXPORT_SYMBOL(__sg_page_iter_start);
static int sg_page_count(struct scatterlist *sg)
{
- return PAGE_ALIGN(sg->offset + sg->length) >> PAGE_SHIFT;
+ return PAGE_ALIGN(sg->length) >> PAGE_SHIFT;
}
bool __sg_page_iter_next(struct sg_page_iter *piter)
@@ -762,7 +762,7 @@ EXPORT_SYMBOL(__sg_page_iter_next);
static int sg_dma_page_count(struct scatterlist *sg)
{
- return PAGE_ALIGN(sg->offset + sg_dma_len(sg)) >> PAGE_SHIFT;
+ return PAGE_ALIGN(sg_dma_len(sg)) >> PAGE_SHIFT;
}
bool __sg_page_iter_dma_next(struct sg_dma_page_iter *dma_iter)
---
base-commit: fb07430e6f98ccff61f6f1a06d01d7f12e29c6d3
change-id: 20260308-scatterlist-8a06ca446bc1
Best regards,
--
Julian Orth <ju.orth@gmail.com>
^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count
  2026-03-08 13:55 [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count Julian Orth
@ 2026-03-08 16:21 ` Julian Orth
  2026-03-08 18:08 ` Jason Gunthorpe
  1 sibling, 0 replies; 9+ messages in thread
From: Julian Orth @ 2026-03-08 16:21 UTC (permalink / raw)
  To: Andrew Morton, Imre Deak, Sakari Ailus, Jason Gunthorpe,
	Thomas Hellstrom
  Cc: linux-kernel

I found an easy way to reproduce this corruption, which should work on
any compositor:

    gst-launch-1.0 videotestsrc ! glupload ! waylandsink

and then switching between `always` and `never` in sysfs.

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count
  2026-03-08 13:55 [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count Julian Orth
  2026-03-08 16:21 ` Julian Orth
@ 2026-03-08 18:08 ` Jason Gunthorpe
  2026-03-08 18:41 ` Julian Orth
  1 sibling, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2026-03-08 18:08 UTC (permalink / raw)
  To: Julian Orth
  Cc: Andrew Morton, Imre Deak, Sakari Ailus, Thomas Hellstrom,
	linux-kernel

On Sun, Mar 08, 2026 at 02:55:27PM +0100, Julian Orth wrote:
> A user reported memory corruption in the Jay wayland compositor [1]. The
> corruption started when archlinux enabled
> CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_WITHIN_SIZE in kernel 6.19.5.
>
> The compositor uses udmabuf to upload memory from memfds to the GPU.
> When running an affected kernel, the following warnings are logged:
>
> a - addrs >= max_entries
> WARNING: drivers/gpu/drm/drm_prime.c:1089 at drm_prime_sg_to_dma_addr_array+0x86/0xc0, CPU#31: jay/1864
> [...]
> Call Trace:
> <TASK>
> amdgpu_bo_move+0x188/0x800 [amdgpu 3b451640234948027c09e9b39e6520bc7e5471cf]
>
> Disabling the use of huge pages at runtime via
> /sys/kernel/mm/transparent_hugepage/shmem_enabled fixes the issue.
>
> udmabuf allocates a scatterlist with buffer_size/PAGE_SIZE entries. Each
> entry has a length of PAGE_SIZE. With huge pages disabled, it appears
> that sg->offset is always 0. With huge pages enabled, sg->offset is
> incremented by PAGE_SIZE until the end of the huge page.

This was broken by 0c8b91ef5100 ("udmabuf: add back support for
mapping hugetlb pages") which switched from a working
sg_alloc_table_from_pages() to a messed up sg_set_page() loop:

+	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
+		sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, ubuf->offsets[i]);
[..]
+			ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT;

Which is just the wrong way to use the scatterlist API.

This was later changed to sg_set_folio(), which I also suspect has a
bug: it should be setting page_link to the proper tail page, because
as you observe page_offset must fall within 0 to PAGE_SIZE-1 to make
the iterator work.

I think the whole design here in udmabuf makes very little sense. It
starts out with an actual list of folios then expands them to a per-4K
double array of folio/offset. This is nonsensical; if it wants to
build a way to directly index the mapping for mmap it should just build
itself a page * array like the code used to do and continue to use
sg_alloc_table_from_pages(), which builds properly formed scatterlists.

This would save memory, use the APIs properly and build a correct and
optimized scatterlist to boot. It uses vmf_insert_pfn() and
vm_map_ram() anyhow so it doesn't even use a folio :\

Here, a few mins of AI shows what I think udmabuf should look like. If
you wish to pursue this please add my signed-off-by and handle testing
it and getting it merged. I reviewed it enough to see it was showing
what I wanted.

Jason

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 94b8ecb892bb17..5d687860445137 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -26,10 +26,10 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is
 
 struct udmabuf {
 	pgoff_t pagecount;
-	struct folio **folios;
+	struct page **pages;
 
 	/**
-	 * Unlike folios, pinned_folios is only used for unpin.
+	 * Unlike pages, pinned_folios is only used for unpin.
 	 * So, nr_pinned is not the same to pagecount, the pinned_folios
 	 * only set each folio which already pinned when udmabuf_create.
 	 * Note that, since a folio may be pinned multiple times, each folio
@@ -41,7 +41,6 @@ struct udmabuf {
 
 	struct sg_table *sg;
 	struct miscdevice *device;
-	pgoff_t *offsets;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -55,8 +54,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 	if (pgoff >= ubuf->pagecount)
 		return VM_FAULT_SIGBUS;
 
-	pfn = folio_pfn(ubuf->folios[pgoff]);
-	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+	pfn = page_to_pfn(ubuf->pages[pgoff]);
 
 	ret = vmf_insert_pfn(vma, vmf->address, pfn);
 	if (ret & VM_FAULT_ERROR)
@@ -73,8 +71,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 		if (WARN_ON(pgoff >= ubuf->pagecount))
 			break;
 
-		pfn = folio_pfn(ubuf->folios[pgoff]);
-		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+		pfn = page_to_pfn(ubuf->pages[pgoff]);
 
 		/**
 		 * If the below vmf_insert_pfn() fails, we do not return an
@@ -109,22 +106,11 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
 static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
 {
 	struct udmabuf *ubuf = buf->priv;
-	struct page **pages;
 	void *vaddr;
-	pgoff_t pg;
 
 	dma_resv_assert_held(buf->resv);
 
-	pages = kvmalloc_objs(*pages, ubuf->pagecount);
-	if (!pages)
-		return -ENOMEM;
-
-	for (pg = 0; pg < ubuf->pagecount; pg++)
-		pages[pg] = folio_page(ubuf->folios[pg],
-				       ubuf->offsets[pg] >> PAGE_SHIFT);
-
-	vaddr = vm_map_ram(pages, ubuf->pagecount, -1);
-	kvfree(pages);
+	vaddr = vm_map_ram(ubuf->pages, ubuf->pagecount, -1);
 	if (!vaddr)
 		return -EINVAL;
 
@@ -146,22 +132,18 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 {
 	struct udmabuf *ubuf = buf->priv;
 	struct sg_table *sg;
-	struct scatterlist *sgl;
-	unsigned int i = 0;
 	int ret;
 
 	sg = kzalloc_obj(*sg);
 	if (!sg)
 		return ERR_PTR(-ENOMEM);
 
-	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
+	ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount, 0,
+					ubuf->pagecount << PAGE_SHIFT,
+					GFP_KERNEL);
 	if (ret < 0)
 		goto err_alloc;
 
-	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
-		sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
-			     ubuf->offsets[i]);
-
 	ret = dma_map_sgtable(dev, sg, direction, 0);
 	if (ret < 0)
 		goto err_map;
@@ -207,12 +189,8 @@ static void unpin_all_folios(struct udmabuf *ubuf)
 
 static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t pgcnt)
 {
-	ubuf->folios = kvmalloc_objs(*ubuf->folios, pgcnt);
-	if (!ubuf->folios)
-		return -ENOMEM;
-
-	ubuf->offsets = kvzalloc_objs(*ubuf->offsets, pgcnt);
-	if (!ubuf->offsets)
+	ubuf->pages = kvmalloc_objs(*ubuf->pages, pgcnt);
+	if (!ubuf->pages)
 		return -ENOMEM;
 
 	ubuf->pinned_folios = kvmalloc_objs(*ubuf->pinned_folios, pgcnt);
@@ -225,8 +203,7 @@ static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t pgcnt)
 static __always_inline void deinit_udmabuf(struct udmabuf *ubuf)
 {
 	unpin_all_folios(ubuf);
-	kvfree(ubuf->offsets);
-	kvfree(ubuf->folios);
+	kvfree(ubuf->pages);
 }
 
 static void release_udmabuf(struct dma_buf *buf)
@@ -344,8 +321,8 @@ static long udmabuf_pin_folios(struct udmabuf *ubuf, struct file *memfd,
 		ubuf->pinned_folios[nr_pinned++] = folios[cur_folio];
 
 		for (; subpgoff < fsize; subpgoff += PAGE_SIZE) {
-			ubuf->folios[upgcnt] = folios[cur_folio];
-			ubuf->offsets[upgcnt] = subpgoff;
+			ubuf->pages[upgcnt] = folio_page(folios[cur_folio],
+							 subpgoff >> PAGE_SHIFT);
 			++upgcnt;
 
 			if (++cur_pgcnt >= pgcnt)

^ permalink raw reply related	[flat|nested] 9+ messages in thread
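[Editorial note: Jason's suggestion to return to sg_alloc_table_from_pages() rests on that helper coalescing physically contiguous pages into single scatterlist entries whose offset stays below PAGE_SIZE. The following is a toy Python model of that coalescing idea — an illustration only, not the kernel implementation; pages are represented as bare pfns.]

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT

def coalesce(pfns):
    """Toy model of sg_alloc_table_from_pages(): merge runs of
    physically contiguous pages into single [start_pfn, length]
    entries, so every entry begins on a page boundary."""
    entries = []
    for pfn in pfns:
        # Extend the current entry if this pfn continues the run.
        if entries and entries[-1][0] + (entries[-1][1] >> PAGE_SHIFT) == pfn:
            entries[-1][1] += PAGE_SIZE
        else:
            entries.append([pfn, PAGE_SIZE])  # start a new entry
    return entries

# 512 contiguous pfns (one 2 MiB huge page) collapse into a single
# entry, instead of 512 per-4K entries carrying growing offsets.
huge = coalesce(range(1000, 1512))
print(len(huge))                  # 1
print(huge[0][1] >> PAGE_SHIFT)   # 512 pages in that one entry

# A gap in the pfns simply starts a new entry.
split = coalesce([7, 8, 9, 42])
print(len(split))                 # 2
```

Compared with the per-4K sg_set_folio() loop, the coalesced table both avoids the out-of-range offsets that broke the page iterators and is far smaller for huge-page-backed buffers.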
* Re: [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count
  2026-03-08 18:08 ` Jason Gunthorpe
@ 2026-03-08 18:41 ` Julian Orth
  2026-03-10 5:49 ` Kasireddy, Vivek
  0 siblings, 1 reply; 9+ messages in thread
From: Julian Orth @ 2026-03-08 18:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Andrew Morton, Imre Deak, Sakari Ailus, Thomas Hellstrom,
	linux-kernel, Gerd Hoffmann, Vivek Kasireddy, Sumit Semwal,
	Christian König, dri-devel, linux-media, linaro-mm-sig

On Sun, Mar 8, 2026 at 7:08 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Sun, Mar 08, 2026 at 02:55:27PM +0100, Julian Orth wrote:
> > A user reported memory corruption in the Jay wayland compositor [1]. The
> > corruption started when archlinux enabled
> > CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_WITHIN_SIZE in kernel 6.19.5.
> >
> > The compositor uses udmabuf to upload memory from memfds to the GPU.
> > When running an affected kernel, the following warnings are logged:
> >
> > a - addrs >= max_entries
> > WARNING: drivers/gpu/drm/drm_prime.c:1089 at drm_prime_sg_to_dma_addr_array+0x86/0xc0, CPU#31: jay/1864
> > [...]
> > Call Trace:
> > <TASK>
> > amdgpu_bo_move+0x188/0x800 [amdgpu 3b451640234948027c09e9b39e6520bc7e5471cf]
> >
> > Disabling the use of huge pages at runtime via
> > /sys/kernel/mm/transparent_hugepage/shmem_enabled fixes the issue.
> >
> > udmabuf allocates a scatterlist with buffer_size/PAGE_SIZE entries. Each
> > entry has a length of PAGE_SIZE. With huge pages disabled, it appears
> > that sg->offset is always 0. With huge pages enabled, sg->offset is
> > incremented by PAGE_SIZE until the end of the huge page.
>
> This was broken by 0c8b91ef5100 ("udmabuf: add back support for
> mapping hugetlb pages") which switched from a working
> sg_alloc_table_from_pages() to a messed up sg_set_page() loop:
>
> +	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
> +		sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, ubuf->offsets[i]);
> [..]
> +			ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT;
>
> Which is just the wrong way to use the scatterlist API.
>
> This was later changed to sg_set_folio(), which I also suspect has a
> bug: it should be setting page_link to the proper tail page, because
> as you observe page_offset must fall within 0 to PAGE_SIZE-1 to make
> the iterator work.
>
> I think the whole design here in udmabuf makes very little sense. It
> starts out with an actual list of folios then expands them to a per-4K
> double array of folio/offset. This is nonsensical; if it wants to
> build a way to directly index the mapping for mmap it should just build
> itself a page * array like the code used to do and continue to use
> sg_alloc_table_from_pages(), which builds properly formed scatterlists.
>
> This would save memory, use the APIs properly and build a correct and
> optimized scatterlist to boot. It uses vmf_insert_pfn() and
> vm_map_ram() anyhow so it doesn't even use a folio :\
>
> Here, a few mins of AI shows what I think udmabuf should look like. If
> you wish to pursue this please add my signed-off-by and handle testing
> it and getting it merged. I reviewed it enough to see it was showing
> what I wanted.

I don't know enough about folios or udmabuf to efficiently work on this.

If offset is supposed to be in [0, PAGE_SIZE-1], then my patch is
incorrect and it's probably better if some of the udmabuf maintainers
take a look at this. I've added them to CC.

>
> Jason
>
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 94b8ecb892bb17..5d687860445137 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -26,10 +26,10 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is
>
>  struct udmabuf {
>  	pgoff_t pagecount;
> -	struct folio **folios;
> +	struct page **pages;
>
>  	/**
> -	 * Unlike folios, pinned_folios is only used for unpin.
> +	 * Unlike pages, pinned_folios is only used for unpin.
>  	 * So, nr_pinned is not the same to pagecount, the pinned_folios
>  	 * only set each folio which already pinned when udmabuf_create.
>  	 * Note that, since a folio may be pinned multiple times, each folio
> @@ -41,7 +41,6 @@ struct udmabuf {
>
>  	struct sg_table *sg;
>  	struct miscdevice *device;
> -	pgoff_t *offsets;
>  };
>
>  static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
> @@ -55,8 +54,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
>  	if (pgoff >= ubuf->pagecount)
>  		return VM_FAULT_SIGBUS;
>
> -	pfn = folio_pfn(ubuf->folios[pgoff]);
> -	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
> +	pfn = page_to_pfn(ubuf->pages[pgoff]);
>
>  	ret = vmf_insert_pfn(vma, vmf->address, pfn);
>  	if (ret & VM_FAULT_ERROR)
> @@ -73,8 +71,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
>  		if (WARN_ON(pgoff >= ubuf->pagecount))
>  			break;
>
> -		pfn = folio_pfn(ubuf->folios[pgoff]);
> -		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
> +		pfn = page_to_pfn(ubuf->pages[pgoff]);
>
>  		/**
>  		 * If the below vmf_insert_pfn() fails, we do not return an
> @@ -109,22 +106,11 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
>  static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
>  {
>  	struct udmabuf *ubuf = buf->priv;
> -	struct page **pages;
>  	void *vaddr;
> -	pgoff_t pg;
>
>  	dma_resv_assert_held(buf->resv);
>
> -	pages = kvmalloc_objs(*pages, ubuf->pagecount);
> -	if (!pages)
> -		return -ENOMEM;
> -
> -	for (pg = 0; pg < ubuf->pagecount; pg++)
> -		pages[pg] = folio_page(ubuf->folios[pg],
> -				       ubuf->offsets[pg] >> PAGE_SHIFT);
> -
> -	vaddr = vm_map_ram(pages, ubuf->pagecount, -1);
> -	kvfree(pages);
> +	vaddr = vm_map_ram(ubuf->pages, ubuf->pagecount, -1);
>  	if (!vaddr)
>  		return -EINVAL;
>
> @@ -146,22 +132,18 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
>  {
>  	struct udmabuf *ubuf = buf->priv;
>  	struct sg_table *sg;
> -	struct scatterlist *sgl;
> -	unsigned int i = 0;
>  	int ret;
>
>  	sg = kzalloc_obj(*sg);
>  	if (!sg)
>  		return ERR_PTR(-ENOMEM);
>
> -	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
> +	ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount, 0,
> +					ubuf->pagecount << PAGE_SHIFT,
> +					GFP_KERNEL);
>  	if (ret < 0)
>  		goto err_alloc;
>
> -	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
> -		sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
> -			     ubuf->offsets[i]);
> -
>  	ret = dma_map_sgtable(dev, sg, direction, 0);
>  	if (ret < 0)
>  		goto err_map;
> @@ -207,12 +189,8 @@ static void unpin_all_folios(struct udmabuf *ubuf)
>
>  static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t pgcnt)
>  {
> -	ubuf->folios = kvmalloc_objs(*ubuf->folios, pgcnt);
> -	if (!ubuf->folios)
> -		return -ENOMEM;
> -
> -	ubuf->offsets = kvzalloc_objs(*ubuf->offsets, pgcnt);
> -	if (!ubuf->offsets)
> +	ubuf->pages = kvmalloc_objs(*ubuf->pages, pgcnt);
> +	if (!ubuf->pages)
>  		return -ENOMEM;
>
>  	ubuf->pinned_folios = kvmalloc_objs(*ubuf->pinned_folios, pgcnt);
> @@ -225,8 +203,7 @@ static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t pgcnt)
>  static __always_inline void deinit_udmabuf(struct udmabuf *ubuf)
>  {
>  	unpin_all_folios(ubuf);
> -	kvfree(ubuf->offsets);
> -	kvfree(ubuf->folios);
> +	kvfree(ubuf->pages);
>  }
>
>  static void release_udmabuf(struct dma_buf *buf)
> @@ -344,8 +321,8 @@ static long udmabuf_pin_folios(struct udmabuf *ubuf, struct file *memfd,
>  		ubuf->pinned_folios[nr_pinned++] = folios[cur_folio];
>
>  		for (; subpgoff < fsize; subpgoff += PAGE_SIZE) {
> -			ubuf->folios[upgcnt] = folios[cur_folio];
> -			ubuf->offsets[upgcnt] = subpgoff;
> +			ubuf->pages[upgcnt] = folio_page(folios[cur_folio],
> +							 subpgoff >> PAGE_SHIFT);
>  			++upgcnt;
>
>  			if (++cur_pgcnt >= pgcnt)

^ permalink raw reply	[flat|nested] 9+ messages in thread
* RE: [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count
  2026-03-08 18:41 ` Julian Orth
@ 2026-03-10 5:49 ` Kasireddy, Vivek
  2026-03-10 12:59 ` Jason Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Kasireddy, Vivek @ 2026-03-10 5:49 UTC (permalink / raw)
  To: Julian Orth, Jason Gunthorpe
  Cc: Andrew Morton, Deak, Imre, Sakari Ailus, Thomas Hellstrom,
	linux-kernel@vger.kernel.org, Gerd Hoffmann, Sumit Semwal,
	Christian König, dri-devel@lists.freedesktop.org,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org

Hi Jason, Julian,

> Subject: Re: [PATCH] lib/scatterlist: fix sg_page_count and
> sg_dma_page_count
>
> On Sun, Mar 8, 2026 at 7:08 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Sun, Mar 08, 2026 at 02:55:27PM +0100, Julian Orth wrote:
> > > A user reported memory corruption in the Jay wayland compositor [1]. The
> > > corruption started when archlinux enabled
> > > CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_WITHIN_SIZE in kernel 6.19.5.
> > >
> > > The compositor uses udmabuf to upload memory from memfds to the GPU.
> > > When running an affected kernel, the following warnings are logged:
> > >
> > > a - addrs >= max_entries
> > > WARNING: drivers/gpu/drm/drm_prime.c:1089 at drm_prime_sg_to_dma_addr_array+0x86/0xc0, CPU#31: jay/1864
> > > [...]
> > > Call Trace:
> > > <TASK>
> > > amdgpu_bo_move+0x188/0x800 [amdgpu 3b451640234948027c09e9b39e6520bc7e5471cf]
> > >
> > > Disabling the use of huge pages at runtime via
> > > /sys/kernel/mm/transparent_hugepage/shmem_enabled fixes the issue.
> > >
> > > udmabuf allocates a scatterlist with buffer_size/PAGE_SIZE entries. Each
> > > entry has a length of PAGE_SIZE. With huge pages disabled, it appears
> > > that sg->offset is always 0. With huge pages enabled, sg->offset is
> > > incremented by PAGE_SIZE until the end of the huge page.
> >
> > This was broken by 0c8b91ef5100 ("udmabuf: add back support for
> > mapping hugetlb pages") which switched from a working
> > sg_alloc_table_from_pages() to a messed up sg_set_page() loop:
> >
> > +	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
> > +		sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, ubuf->offsets[i]);
> > [..]
> > +			ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT;
> >
> > Which is just the wrong way to use the scatterlist API.
> >
> > This was later changed to sg_set_folio(), which I also suspect has a
> > bug: it should be setting page_link to the proper tail page, because
> > as you observe page_offset must fall within 0 to PAGE_SIZE-1 to make
> > the iterator work.
> >
> > I think the whole design here in udmabuf makes very little sense. It
> > starts out with an actual list of folios then expands them to a per-4K
> > double array of folio/offset. This is nonsensical; if it wants to
> > build a way to directly index the mapping for mmap it should just build
> > itself a page * array like the code used to do and continue to use
> > sg_alloc_table_from_pages(), which builds properly formed scatterlists.
There are a couple of reasons why we got rid of the pages array:
- Back then, there was some confusion about whether a struct page would
  exist or not for tail pages when HVO is enabled. Regardless, there was
  also a concern about exposing tail pages outside hugetlb code.
- And, we also wanted to prepare for a future where struct page would not
  exist anymore, so, it made sense to just use folios only.

I am not sure if these concerns are valid anymore. If they are not, I am
OK with Jason's patch below that brings back the pages array.

Thanks,
Vivek

> >
> > This would save memory, use the APIs properly and build a correct and
> > optimized scatterlist to boot. It uses vmf_insert_pfn() and
> > vm_map_ram() anyhow so it doesn't even use a folio :\
> >
> > Here, a few mins of AI shows what I think udmabuf should look like. If
> > you wish to pursue this please add my signed-off-by and handle testing
> > it and getting it merged. I reviewed it enough to see it was showing
> > what I wanted.
>
> I don't know enough about folios or udmabuf to efficiently work on this.
>
> If offset is supposed to be in [0, PAGE_SIZE-1], then my patch is
> incorrect and it's probably better if some of the udmabuf maintainers
> take a look at this. I've added them to CC.
>
> >
> > Jason
> >
> > diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> > index 94b8ecb892bb17..5d687860445137 100644
> > --- a/drivers/dma-buf/udmabuf.c
> > +++ b/drivers/dma-buf/udmabuf.c
> > @@ -26,10 +26,10 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is
> >
> >  struct udmabuf {
> >  	pgoff_t pagecount;
> > -	struct folio **folios;
> > +	struct page **pages;
> >
> >  	/**
> > -	 * Unlike folios, pinned_folios is only used for unpin.
> > +	 * Unlike pages, pinned_folios is only used for unpin.
> >  	 * So, nr_pinned is not the same to pagecount, the pinned_folios
> >  	 * only set each folio which already pinned when udmabuf_create.
> >  	 * Note that, since a folio may be pinned multiple times, each folio
> > @@ -41,7 +41,6 @@ struct udmabuf {
> >
> >  	struct sg_table *sg;
> >  	struct miscdevice *device;
> > -	pgoff_t *offsets;
> >  };
> >
> >  static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
> > @@ -55,8 +54,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
> >  	if (pgoff >= ubuf->pagecount)
> >  		return VM_FAULT_SIGBUS;
> >
> > -	pfn = folio_pfn(ubuf->folios[pgoff]);
> > -	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
> > +	pfn = page_to_pfn(ubuf->pages[pgoff]);
> >
> >  	ret = vmf_insert_pfn(vma, vmf->address, pfn);
> >  	if (ret & VM_FAULT_ERROR)
> > @@ -73,8 +71,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
> >  		if (WARN_ON(pgoff >= ubuf->pagecount))
> >  			break;
> >
> > -		pfn = folio_pfn(ubuf->folios[pgoff]);
> > -		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
> > +		pfn = page_to_pfn(ubuf->pages[pgoff]);
> >
> >  		/**
> >  		 * If the below vmf_insert_pfn() fails, we do not return an
> > @@ -109,22 +106,11 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
> >  static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
> >  {
> >  	struct udmabuf *ubuf = buf->priv;
> > -	struct page **pages;
> >  	void *vaddr;
> > -	pgoff_t pg;
> >
> >  	dma_resv_assert_held(buf->resv);
> >
> > -	pages = kvmalloc_objs(*pages, ubuf->pagecount);
> > -	if (!pages)
> > -		return -ENOMEM;
> > -
> > -	for (pg = 0; pg < ubuf->pagecount; pg++)
> > -		pages[pg] = folio_page(ubuf->folios[pg],
> > -				       ubuf->offsets[pg] >> PAGE_SHIFT);
> > -
> > -	vaddr = vm_map_ram(pages, ubuf->pagecount, -1);
> > -	kvfree(pages);
> > +	vaddr = vm_map_ram(ubuf->pages, ubuf->pagecount, -1);
> >  	if (!vaddr)
> >  		return -EINVAL;
> >
> > @@ -146,22 +132,18 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
> >  {
> >  	struct udmabuf *ubuf = buf->priv;
> >  	struct sg_table *sg;
> > -	struct scatterlist *sgl;
> > -	unsigned int i = 0;
> >  	int ret;
> >
> >  	sg = kzalloc_obj(*sg);
> >  	if (!sg)
> >  		return ERR_PTR(-ENOMEM);
> >
> > -	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
> > +	ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount, 0,
> > +					ubuf->pagecount << PAGE_SHIFT,
> > +					GFP_KERNEL);
> >  	if (ret < 0)
> >  		goto err_alloc;
> >
> > -	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
> > -		sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
> > -			     ubuf->offsets[i]);
> > -
> >  	ret = dma_map_sgtable(dev, sg, direction, 0);
> >  	if (ret < 0)
> >  		goto err_map;
> > @@ -207,12 +189,8 @@ static void unpin_all_folios(struct udmabuf *ubuf)
> >
> >  static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t pgcnt)
> >  {
> > -	ubuf->folios = kvmalloc_objs(*ubuf->folios, pgcnt);
> > -	if (!ubuf->folios)
> > -		return -ENOMEM;
> > -
> > -	ubuf->offsets = kvzalloc_objs(*ubuf->offsets, pgcnt);
> > -	if (!ubuf->offsets)
> > +	ubuf->pages = kvmalloc_objs(*ubuf->pages, pgcnt);
> > +	if (!ubuf->pages)
> >  		return -ENOMEM;
> >
> >  	ubuf->pinned_folios = kvmalloc_objs(*ubuf->pinned_folios, pgcnt);
> > @@ -225,8 +203,7 @@ static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t pgcnt)
> >  static __always_inline void deinit_udmabuf(struct udmabuf *ubuf)
> >  {
> >  	unpin_all_folios(ubuf);
> > -	kvfree(ubuf->offsets);
> > -	kvfree(ubuf->folios);
> > +	kvfree(ubuf->pages);
> >  }
> >
> >  static void release_udmabuf(struct dma_buf *buf)
> > @@ -344,8 +321,8 @@ static long udmabuf_pin_folios(struct udmabuf *ubuf, struct file *memfd,
> >  		ubuf->pinned_folios[nr_pinned++] = folios[cur_folio];
> >
> >  		for (; subpgoff < fsize; subpgoff += PAGE_SIZE) {
> > -			ubuf->folios[upgcnt] = folios[cur_folio];
> > -			ubuf->offsets[upgcnt] = subpgoff;
> > +			ubuf->pages[upgcnt] = folio_page(folios[cur_folio],
> > +							 subpgoff >> PAGE_SHIFT);
> >  			++upgcnt;
> >
> >  			if (++cur_pgcnt >= pgcnt)
> >

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count
  2026-03-10 5:49 ` Kasireddy, Vivek
@ 2026-03-10 12:59 ` Jason Gunthorpe
  2026-03-11 6:53 ` Kasireddy, Vivek
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2026-03-10 12:59 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: Julian Orth, Andrew Morton, Deak, Imre, Sakari Ailus,
	Thomas Hellstrom, linux-kernel@vger.kernel.org, Gerd Hoffmann,
	Sumit Semwal, Christian König, dri-devel@lists.freedesktop.org,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org

On Tue, Mar 10, 2026 at 05:49:23AM +0000, Kasireddy, Vivek wrote:
> There are a couple of reasons why we got rid of the pages array:
> - Back then, there was some confusion about whether a struct page would
>   exist or not for tail pages when HVO is enabled. Regardless, there was
>   also a concern about exposing tail pages outside hugetlb code.

The existing code relies on struct page for the vmap:

	for (pg = 0; pg < ubuf->pagecount; pg++)
		pages[pg] = folio_page(ubuf->folios[pg],
				       ubuf->offsets[pg] >> PAGE_SHIFT);

Tail pages always exist; they are required by many interfaces.

> - And, we also wanted to prepare for a future where struct page would not
>   exist anymore, so, it made sense to just use folios only.

If you can 100% stick with whole folios then great, but we don't have
the APIs for the cases udmabuf needs right now. Most likely we'd
expect to use phys_addr_t for scatterlist and direct full folio for
vmap. Neither is helped by the datastructure in udmabuf.

Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread
* RE: [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count
  2026-03-10 12:59 ` Jason Gunthorpe
@ 2026-03-11 6:53 ` Kasireddy, Vivek
  2026-03-11 12:06 ` Jason Gunthorpe
  0 siblings, 1 reply; 9+ messages in thread
From: Kasireddy, Vivek @ 2026-03-11 6:53 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Julian Orth, Andrew Morton, Deak, Imre, Sakari Ailus,
	linux-kernel@vger.kernel.org, Gerd Hoffmann, Sumit Semwal,
	Christian König, dri-devel@lists.freedesktop.org,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org

Hi Jason,

> Subject: Re: [PATCH] lib/scatterlist: fix sg_page_count and
> sg_dma_page_count
>
> On Tue, Mar 10, 2026 at 05:49:23AM +0000, Kasireddy, Vivek wrote:
> > There are a couple of reasons why we got rid of the pages array:
> > - Back then, there was some confusion about whether a struct page would
> >   exist or not for tail pages when HVO is enabled. Regardless, there was
> >   also a concern about exposing tail pages outside hugetlb code.
>
> The existing code relies on struct page for the vmap:
Right, but IMO, this (using struct page in vmap) is a temporary fix
until an appropriate folio/pfn based API is available. We had used
vmap_pfn() earlier but realized that it was not intended to be used for
mapping struct page/folio backed memory.

>
> 	for (pg = 0; pg < ubuf->pagecount; pg++)
> 		pages[pg] = folio_page(ubuf->folios[pg],
> 				       ubuf->offsets[pg] >> PAGE_SHIFT);
>
> Tail pages always exist; they are required by many interfaces.
Right, hugetlb maintainer (Muchun) confirmed that the tail pages exist
in "read-only" mode when HVO is enabled.

>
> > - And, we also wanted to prepare for a future where struct page would not
> >   exist anymore, so, it made sense to just use folios only.
>
> If you can 100% stick with whole folios then great, but we don't have
> the APIs for the cases udmabuf needs right now. Most likely we'd
> expect to use phys_addr_t for scatterlist and direct full folio for
> vmap. Neither is helped by the datastructure in udmabuf.
So, given the current situation, what is the right thing to do?
Should we take your patch that brings back the pages array and treat it
as a temporary fix until equivalent folio based APIs are available?

Thanks,
Vivek

>
> Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count
  2026-03-11 6:53 ` Kasireddy, Vivek
@ 2026-03-11 12:06 ` Jason Gunthorpe
  2026-03-12 5:44 ` Kasireddy, Vivek
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Gunthorpe @ 2026-03-11 12:06 UTC (permalink / raw)
  To: Kasireddy, Vivek
  Cc: Julian Orth, Andrew Morton, Deak, Imre, Sakari Ailus,
	linux-kernel@vger.kernel.org, Gerd Hoffmann, Sumit Semwal,
	Christian König, dri-devel@lists.freedesktop.org,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org

On Wed, Mar 11, 2026 at 06:53:51AM +0000, Kasireddy, Vivek wrote:
> So, given the current situation, what is the right thing to do?
> Should we take your patch that brings back the pages array and treat it
> as a temporary fix until equivalent folio based APIs are available?

IMHO, yes. It saves memory, increases performance, fixes the bug and
uses the APIs properly.

Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread
* RE: [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count
  2026-03-11 12:06 ` Jason Gunthorpe
@ 2026-03-12 5:44 ` Kasireddy, Vivek
  0 siblings, 0 replies; 9+ messages in thread
From: Kasireddy, Vivek @ 2026-03-12 5:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Julian Orth, Andrew Morton, Deak, Imre, Sakari Ailus,
	linux-kernel@vger.kernel.org, Gerd Hoffmann, Sumit Semwal,
	Christian König, dri-devel@lists.freedesktop.org,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org

> Subject: Re: [PATCH] lib/scatterlist: fix sg_page_count and
> sg_dma_page_count
>
> > So, given the current situation, what is the right thing to do?
> > Should we take your patch that brings back the pages array and treat it
> > as a temporary fix until equivalent folio based APIs are available?
>
> IMHO, yes. It saves memory, increases performance, fixes the bug and
> uses the APIs properly.
Alright, then please send your patch again to dri-devel. I can review
and test it.

Thanks,
Vivek

>
> Jason

^ permalink raw reply	[flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-03-12 5:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-08 13:55 [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count Julian Orth
2026-03-08 16:21 ` Julian Orth
2026-03-08 18:08 ` Jason Gunthorpe
2026-03-08 18:41 ` Julian Orth
2026-03-10 5:49 ` Kasireddy, Vivek
2026-03-10 12:59 ` Jason Gunthorpe
2026-03-11 6:53 ` Kasireddy, Vivek
2026-03-11 12:06 ` Jason Gunthorpe
2026-03-12 5:44 ` Kasireddy, Vivek