From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 8 Mar 2026 15:08:26 -0300
From: Jason Gunthorpe
To: Julian Orth
Cc: Andrew Morton, Imre Deak, Sakari Ailus, Thomas Hellstrom,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] lib/scatterlist: fix sg_page_count and sg_dma_page_count
Message-ID: <20260308180826.GG1687929@ziepe.ca>
References: <20260308-scatterlist-v1-1-39c4566b0bba@gmail.com>
In-Reply-To: <20260308-scatterlist-v1-1-39c4566b0bba@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sun, Mar 08, 2026 at 02:55:27PM +0100, Julian Orth wrote:
> A user reported memory corruption in the Jay wayland compositor [1]. The
> corruption started when archlinux enabled
> CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_WITHIN_SIZE in kernel 6.19.5.
>
> The compositor uses udmabuf to upload memory from memfds to the GPU.
> When running an affected kernel, the following warnings are logged:
>
> a - addrs >= max_entries
> WARNING: drivers/gpu/drm/drm_prime.c:1089 at drm_prime_sg_to_dma_addr_array+0x86/0xc0, CPU#31: jay/1864
> [...]
> Call Trace:
>  amdgpu_bo_move+0x188/0x800 [amdgpu 3b451640234948027c09e9b39e6520bc7e5471cf]
>
> Disabling the use of huge pages at runtime via
> /sys/kernel/mm/transparent_hugepage/shmem_enabled fixes the issue.
>
> udmabuf allocates a scatterlist with buffer_size/PAGE_SIZE entries. Each
> entry has a length of PAGE_SIZE. With huge pages disabled, it appears
> that sg->offset is always 0. With huge pages enabled, sg->offset is
> incremented by PAGE_SIZE until the end of the huge page.

This was broken by 0c8b91ef5100 ("udmabuf: add back support for mapping
hugetlb pages"), which switched from a working
sg_alloc_table_from_pages() to a messed up sg_set_page() loop:

+	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
+		sg_set_page(sgl, ubuf->pages[i], PAGE_SIZE, ubuf->offsets[i]);
[..]
+	ubuf->offsets[*pgbuf] = subpgoff << PAGE_SHIFT;

which is just the wrong way to use the scatterlist API. This was later
changed to sg_set_folio(), which I suspect also has a bug: it should be
setting page_link to the proper tail page, because, as you observe, the
offset must fall within 0 to PAGE_SIZE-1 to make the iterators work.

I think the whole design here in udmabuf makes very little sense. It
starts out with an actual list of folios and then expands them into a
per-4K double array of folio/offset. This is nonsensical. If it wants a
way to directly index the mapping for mmap, it should just build itself
a page * array, like the code used to do, and continue to use
sg_alloc_table_from_pages(), which builds properly formed scatterlists.
That would save memory, use the APIs properly, and build a correct and
optimized scatterlist to boot. It uses vmf_insert_pfn() and
vm_map_ram() anyhow, so it doesn't even use the folios as folios :\

Here, a few minutes of AI shows what I think udmabuf should look like.
If you wish to pursue this, please add my Signed-off-by, handle testing
it, and get it merged. I reviewed it enough to see it was showing what
I wanted.
Jason

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 94b8ecb892bb17..5d687860445137 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -26,10 +26,10 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is
 
 struct udmabuf {
 	pgoff_t pagecount;
-	struct folio **folios;
+	struct page **pages;
 
 	/**
-	 * Unlike folios, pinned_folios is only used for unpin.
+	 * Unlike pages, pinned_folios is only used for unpin.
 	 * So, nr_pinned is not the same to pagecount, the pinned_folios
 	 * only set each folio which already pinned when udmabuf_create.
 	 * Note that, since a folio may be pinned multiple times, each folio
@@ -41,7 +41,6 @@ struct udmabuf {
 
 	struct sg_table *sg;
 	struct miscdevice *device;
-	pgoff_t *offsets;
 };
 
 static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
@@ -55,8 +54,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 	if (pgoff >= ubuf->pagecount)
 		return VM_FAULT_SIGBUS;
 
-	pfn = folio_pfn(ubuf->folios[pgoff]);
-	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+	pfn = page_to_pfn(ubuf->pages[pgoff]);
 
 	ret = vmf_insert_pfn(vma, vmf->address, pfn);
 	if (ret & VM_FAULT_ERROR)
@@ -73,8 +71,7 @@ static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
 		if (WARN_ON(pgoff >= ubuf->pagecount))
 			break;
 
-		pfn = folio_pfn(ubuf->folios[pgoff]);
-		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+		pfn = page_to_pfn(ubuf->pages[pgoff]);
 
 		/**
 		 * If the below vmf_insert_pfn() fails, we do not return an
@@ -109,22 +106,11 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
 static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
 {
 	struct udmabuf *ubuf = buf->priv;
-	struct page **pages;
 	void *vaddr;
-	pgoff_t pg;
 
 	dma_resv_assert_held(buf->resv);
 
-	pages = kvmalloc_objs(*pages, ubuf->pagecount);
-	if (!pages)
-		return -ENOMEM;
-
-	for (pg = 0; pg < ubuf->pagecount; pg++)
-		pages[pg] = folio_page(ubuf->folios[pg],
-				       ubuf->offsets[pg] >> PAGE_SHIFT);
-
-	vaddr = vm_map_ram(pages, ubuf->pagecount, -1);
-	kvfree(pages);
+	vaddr = vm_map_ram(ubuf->pages, ubuf->pagecount, -1);
 	if (!vaddr)
 		return -EINVAL;
@@ -146,22 +132,18 @@ static struct sg_table *get_sg_table(struct device *dev, struct dma_buf *buf,
 {
 	struct udmabuf *ubuf = buf->priv;
 	struct sg_table *sg;
-	struct scatterlist *sgl;
-	unsigned int i = 0;
 	int ret;
 
 	sg = kzalloc_obj(*sg);
 	if (!sg)
 		return ERR_PTR(-ENOMEM);
 
-	ret = sg_alloc_table(sg, ubuf->pagecount, GFP_KERNEL);
+	ret = sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount, 0,
+					ubuf->pagecount << PAGE_SHIFT,
+					GFP_KERNEL);
 	if (ret < 0)
 		goto err_alloc;
 
-	for_each_sg(sg->sgl, sgl, ubuf->pagecount, i)
-		sg_set_folio(sgl, ubuf->folios[i], PAGE_SIZE,
-			     ubuf->offsets[i]);
-
 	ret = dma_map_sgtable(dev, sg, direction, 0);
 	if (ret < 0)
 		goto err_map;
@@ -207,12 +189,8 @@ static void unpin_all_folios(struct udmabuf *ubuf)
 
 static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t pgcnt)
 {
-	ubuf->folios = kvmalloc_objs(*ubuf->folios, pgcnt);
-	if (!ubuf->folios)
-		return -ENOMEM;
-
-	ubuf->offsets = kvzalloc_objs(*ubuf->offsets, pgcnt);
-	if (!ubuf->offsets)
+	ubuf->pages = kvmalloc_objs(*ubuf->pages, pgcnt);
+	if (!ubuf->pages)
 		return -ENOMEM;
 
 	ubuf->pinned_folios = kvmalloc_objs(*ubuf->pinned_folios, pgcnt);
@@ -225,8 +203,7 @@ static __always_inline int init_udmabuf(struct udmabuf *ubuf, pgoff_t pgcnt)
 static __always_inline void deinit_udmabuf(struct udmabuf *ubuf)
 {
 	unpin_all_folios(ubuf);
-	kvfree(ubuf->offsets);
-	kvfree(ubuf->folios);
+	kvfree(ubuf->pages);
 }
 
 static void release_udmabuf(struct dma_buf *buf)
@@ -344,8 +321,8 @@ static long udmabuf_pin_folios(struct udmabuf *ubuf, struct file *memfd,
 		ubuf->pinned_folios[nr_pinned++] = folios[cur_folio];
 
 		for (; subpgoff < fsize; subpgoff += PAGE_SIZE) {
-			ubuf->folios[upgcnt] = folios[cur_folio];
-			ubuf->offsets[upgcnt] = subpgoff;
+			ubuf->pages[upgcnt] = folio_page(folios[cur_folio],
+							 subpgoff >> PAGE_SHIFT);
 			++upgcnt;
 
 			if (++cur_pgcnt >= pgcnt)