public inbox for linux-kernel@vger.kernel.org
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Julian Orth <ju.orth@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Imre Deak <imre.deak@intel.com>,
	Sakari Ailus <sakari.ailus@linux.intel.com>,
	Thomas Hellstrom <thellstrom@vmware.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count
Date: Sun, 8 Mar 2026 15:08:26 -0300	[thread overview]
Message-ID: <20260308180826.GG1687929@ziepe.ca> (raw)
In-Reply-To: <20260308-scatterlist-v1-1-39c4566b0bba@gmail.com>

On Sun, Mar 08, 2026 at 02:55:27PM +0100, Julian Orth wrote:
> A user reported memory corruption in the Jay wayland compositor [1]. The
> corruption started when archlinux enabled
> CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_WITHIN_SIZE in kernel 6.19.5.
> 
> The compositor uses udmabuf to upload memory from memfds to the GPU.
> When running an affected kernel, the following warnings are logged:
> 
>     a - addrs >= max_entries
>     WARNING: drivers/gpu/drm/drm_prime.c:1089 at drm_prime_sg_to_dma_addr_array+0x86/0xc0, CPU#31: jay/1864
>     [...]
>     Call Trace:
>      <TASK>
>      amdgpu_bo_move+0x188/0x800 [amdgpu 3b451640234948027c09e9b39e6520bc7e5471cf]
> 
> Disabling the use of huge pages at runtime via
> /sys/kernel/mm/transparent_hugepage/shmem_enabled fixes the issue.
> 
> udmabuf allocates a scatterlist with buffer_size/PAGE_SIZE entries. Each
> entry has a length of PAGE_SIZE. With huge pages disabled, it appears
> that sg->offset is always 0. With huge pages enabled, sg->offset is
> incremented by PAGE_SIZE until the end of the huge page.

This was broken by 0c8b91ef5100 ("udmabuf: add back support for
mapping hugetlb pages"), which switched from a working
sg_alloc_table_from_pages() to a broken sg_set_page() loop:

+       for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
+               sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, ubuf->offsets[i]);
[..]
+               ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT;

That is simply the wrong way to use the scatterlist API: sg->offset
is meant to be a sub-page offset, not a multiple of PAGE_SIZE.

This was later changed to sg_set_folio(), which I suspect is also
buggy: it should be setting page_link to the proper tail page,
because, as you observe, page_offset must fall within 0 to
PAGE_SIZE-1 for the iterator to work.

I think the whole design here in udmabuf makes very little sense. It
starts out with an actual list of folios, then expands them into a
per-4K double array of folio/offset. This is nonsensical: if it wants
a way to directly index the mapping for mmap, it should just build
itself a page * array like the code used to do, and continue to use
sg_alloc_table_from_pages(), which builds properly formed
scatterlists.

This would save memory, use the APIs properly, and build a correct
and optimized scatterlist to boot. It uses vmf_insert_pfn() and
vm_map_ram() anyhow, so it doesn't even use a folio :\

Here, a few minutes of AI shows what I think udmabuf should look
like. If you wish to pursue this, please add my signed-off-by and
handle testing it and getting it merged. I reviewed it enough to see
it was showing what I wanted.

Jason

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 94b8ecb892bb17..5d687860445137 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -26,10 +26,10 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is
 
 struct udmabuf {
 	pgoff_t pagecount;
-	struct folio **folios;
+	struct page **pages;
 
 	/**
-	 * Unlike folios, pinned_folios is only used for unpin.
+	 * Unlike pages, pinned_folios is only used for unpin.
 	 * So, nr_pinned is not the same to pagecount, the pinned_folios
 	 * only set each folio which already pinned when udmabuf_create.
 	 * Note that, since a folio may be pinned multiple times, each folio
@@ -41,7 +41,6 @@ struct udmabuf {
 
 	struct sg_table *sg;
 	struct miscdevice *device;
-	pgoff_t *offsets;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -55,8 +54,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 	if (pgoff >= ubuf->pagecount)
 		return VM_FAULT_SIGBUS;
 
-	pfn = folio_pfn(ubuf->folios[pgoff]);
-	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+	pfn = page_to_pfn(ubuf->pages[pgoff]);
 
 	ret = vmf_insert_pfn(vma, vmf->address, pfn);
 	if (ret & VM_FAULT_ERROR)
@@ -73,8 +71,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 		if (WARN_ON(pgoff >= ubuf->pagecount))
 			break;
 
-		pfn = folio_pfn(ubuf->folios[pgoff]);
-		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+		pfn = page_to_pfn(ubuf->pages[pgoff]);
 
 		/**
 		 * If the below vmf_insert_pfn() fails, we do not return an
@@ -109,22 +106,11 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
 static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
 {
 	struct udmabuf *ubuf = buf->priv;
-	struct page **pages;
 	void *vaddr;
-	pgoff_t pg;
 
 	dma_resv_assert_held(buf->resv);
 
-	pages = kvmalloc_objs(*pages, ubuf->pagecount);
-	if (!pages)
-		return -ENOMEM;
-
-	for (pg = 0; pg < ubuf->pagecount; pg++)
-		pages[pg] = folio_page(ubuf->folios[pg],
-				       ubuf->offsets[pg] >> PAGE_SHIFT);
-
-	vaddr = vm_map_ram(pages, ubuf->pagecount, -1);
-	kvfree(pages);
+	vaddr = vm_map_ram(ubuf->pages, ubuf->pagecount, -1);
 	if (!vaddr)
 		return -EINVAL;
 
@@ -146,22 +132,18 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 {
 	struct udmabuf *ubuf = buf->priv;
 	struct sg_table *sg;
-	struct scatterlist *sgl;
-	unsigned int i = 0;
 	int ret;
 
 	sg = kzalloc_obj(*sg);
 	if (!sg)
 		return ERR_PTR(-ENOMEM);
 
-	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
+	ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount, 0,
+					ubuf->pagecount << PAGE_SHIFT,
+					GFP_KERNEL);
 	if (ret < 0)
 		goto err_alloc;
 
-	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
-		sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
-			     ubuf->offsets[i]);
-
 	ret = dma_map_sgtable(dev, sg, direction, 0);
 	if (ret < 0)
 		goto err_map;
@@ -207,12 +189,8 @@ static void unpin_all_folios(struct udmabuf *ubuf)
 
 static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t pgcnt)
 {
-	ubuf->folios = kvmalloc_objs(*ubuf->folios, pgcnt);
-	if (!ubuf->folios)
-		return -ENOMEM;
-
-	ubuf->offsets = kvzalloc_objs(*ubuf->offsets, pgcnt);
-	if (!ubuf->offsets)
+	ubuf->pages = kvmalloc_objs(*ubuf->pages, pgcnt);
+	if (!ubuf->pages)
 		return -ENOMEM;
 
 	ubuf->pinned_folios = kvmalloc_objs(*ubuf->pinned_folios, pgcnt);
@@ -225,8 +203,7 @@ static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t pgcnt)
 static __always_inline void deinit_udmabuf(struct udmabuf *ubuf)
 {
 	unpin_all_folios(ubuf);
-	kvfree(ubuf->offsets);
-	kvfree(ubuf->folios);
+	kvfree(ubuf->pages);
 }
 
 static void release_udmabuf(struct dma_buf *buf)
@@ -344,8 +321,8 @@ static long udmabuf_pin_folios(struct udmabuf *ubuf, struct file *memfd,
 		ubuf->pinned_folios[nr_pinned++] = folios[cur_folio];
 
 		for (; subpgoff < fsize; subpgoff += PAGE_SIZE) {
-			ubuf->folios[upgcnt] = folios[cur_folio];
-			ubuf->offsets[upgcnt] = subpgoff;
+			ubuf->pages[upgcnt] = folio_page(folios[cur_folio],
+						subpgoff >> PAGE_SHIFT);
 			++upgcnt;
 
 			if (++cur_pgcnt >= pgcnt)



Thread overview: 9+ messages
2026-03-08 13:55 [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count Julian Orth
2026-03-08 16:21 ` Julian Orth
2026-03-08 18:08 ` Jason Gunthorpe [this message]
2026-03-08 18:41   ` Julian Orth
2026-03-10  5:49     ` Kasireddy, Vivek
2026-03-10 12:59       ` Jason Gunthorpe
2026-03-11  6:53         ` Kasireddy, Vivek
2026-03-11 12:06           ` Jason Gunthorpe
2026-03-12  5:44             ` Kasireddy, Vivek
