All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/5] udmbuf bug fix and some improvements
@ 2024-08-13  9:05 Huan Yang
  2024-08-13  9:05 ` [PATCH v3 1/5] udmabuf: cancel mmap page fault, direct map it Huan Yang
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Huan Yang @ 2024-08-13  9:05 UTC (permalink / raw)
  To: Gerd Hoffmann, Sumit Semwal, Christian König, dri-devel,
	linux-media, linaro-mm-sig, linux-kernel
  Cc: opensource.kernel, Huan Yang

This patchset attempts to fix some errors in udmabuf and remove the
upin_list structure.

Some of this fix just gather the patches which I upload before.

Patch 1,2,4,5 has passed the udmabuf self-test suite's tests.
Suggested by Kasireddy, Vivek <vivek.kasireddy@intel.com>
Patch5 modified the unpin function, therefore running the udmabuf
self-test program in a loop did not reveal any memory leaks.

Notice: Test item 6 maybe requires running the command:
  echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Patch1
===
Try to remove page fault mmap and direct map it.
Due to current udmabuf has already obtained and pinned the folio
upon completion of the creation.This means that the physical memory has
already been acquired, rather than being accessed dynamically. The
current page fault method only saves some page table memory.

As a result, the page fault mechanism has lost its purpose as a demanding
page. Due to the fact that page fault requires trapping into kernel mode
and filling in when accessing the corresponding virtual address in mmap,
this means that user mode access to virtual addresses needs to trap into
kernel mode.

Therefore, when creating a large size udmabuf, this represents a
considerable overhead.

Patch2
===
This is the same to patch:
https://lore.kernel.org/all/20240725021349.580574-1-link@vivo.com/

Patch3
===
The current implementation of udmabuf's vmap has issues.

It does not correctly set each page of the folio to the page structure,
so that when vmap is called, all pages are the head page of the folio.

Due to udmabuf can use hugetlb, if HVO enabled, tail page may not exist,
so, we can't use page array to map, instead, use pfn array.

Patch4
===
Change codestyle and fix a potential bug.

There are some variables in udmabuf_create that are only used inside the
loop. Therefore, there is no need to declare them outside the scope.
This patch moved it into loop.

It is difficult to understand the loop condition of the code that adds
folio to the unpin_list.

The outer loop of this patch iterates through folios, while the inner
loop correctly sets the folio and corresponding offset into the udmabuf
starting from the offset. if reach to pgcnt or nr_folios, end of loop.

By this, more readable.

ubuf->pagecount already set before true set in loop, if get some error
when create, This means that pagecount and folios are not equivalent,
which could lead to potential issues when release.

This patch dynamic update ubuf->pagecount only when folios update end.

Patch5
===
Attempt to remove unpin_list and other related data structures.

In order to adapt to Folio, we established the unpin_list data structure
to unpin all folios and maintain the page mapping relationship.

However, this data structure requires 24 bytes for each page and has low
traversal performance for the list. And maintaining the offset structure
also consumes a portion of memory.

This patch attempts to remove these data structures.

Considering that during creation, we arranged the folio array in the
order of pin and set the offset according to pgcnt.

We actually don't need to use unpin_list to unpin during release.
Instead, we can iterate through the folios array during release and
unpin any folio that is different from the ones previously accessed.

By this, not only saves the overhead of the udmabuf_folio data structure
but also makes array access more cache-friendly.


Changelog
===
  v3 -> v2:
    Patch1, avoid use page, instead, use pfn, and use vmf_insert_pfn map
    suggested-by Kasireddy, Vivek <vivek.kasireddy@intel.com>

    Patch2, update acked-by Kasireddy, Vivek <vivek.kasireddy@intel.com>
    And keep the kvcalloc on the same line.

    Patch3, avoid use page, instead, use pfn, then use vmap_pfn map

    Patch4, split v2 patch4, single update codestyle to keep review
    easy.

    Patch5, another way to remove udmabuf_folio
    
---
  v2 -> v1:
    Patch1, 3 Rectify the improper use of the sg table.
    suggested-by Christian König <christian.koenig@amd.com>
    
    Patch2 add acked-by Christian K�nig <christian.koenig@amd.com> which
    marked in v1

    Patch4
    Modify the data structure to restore the use of pages and
    correct the misunderstanding of loop conditions such as "pgcnt".
    make sure pass self test.

    remove v1's patch4

v2
  https://lore.kernel.org/all/20240805032550.3912454-1-link@vivo.com/
v1
  https://lore.kernel.org/all/20240801104512.4056860-1-link@vivo.com/

Huan Yang (5):
  udmabuf: cancel mmap page fault, direct map it
  udmabuf: change folios array from kmalloc to kvmalloc
  fix vmap_udmabuf error page set
  udmabuf: codestyle cleanup
  udmabuf: remove udmabuf_folio

 drivers/dma-buf/udmabuf.c | 198 ++++++++++++++++++--------------------
 1 file changed, 96 insertions(+), 102 deletions(-)


base-commit: 033a4691702cdca3a613256b0623b8eeacb4985e
-- 
2.45.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3 1/5] udmabuf: cancel mmap page fault, direct map it
  2024-08-13  9:05 [PATCH v3 0/5] udmbuf bug fix and some improvements Huan Yang
@ 2024-08-13  9:05 ` Huan Yang
  2024-08-17  0:53   ` Kasireddy, Vivek
  2024-08-13  9:05 ` [PATCH v3 2/5] udmabuf: change folios array from kmalloc to kvmalloc Huan Yang
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Huan Yang @ 2024-08-13  9:05 UTC (permalink / raw)
  To: Gerd Hoffmann, Sumit Semwal, Christian König, dri-devel,
	linux-media, linaro-mm-sig, linux-kernel
  Cc: opensource.kernel, Huan Yang, Vivek Kasireddy

The current udmabuf mmap uses a page fault to populate the vma.

However, the current udmabuf has already obtained and pinned the folio
upon completion of the creation.This means that the physical memory has
already been acquired, rather than being accessed dynamically. The
current page fault method only saves some page table memory.

As a result, the page fault has lost its purpose as a demanding
page. Due to the fact that page fault requires trapping into kernel mode
and filling in when accessing the corresponding virtual address in mmap,
when creating a large size udmabuf, this represents a considerable
overhead.

The current patch removes the page fault method of mmap and
instead fills pfn directly when mmap is triggered.

Signed-off-by: Huan Yang <link@vivo.com>
Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/dma-buf/udmabuf.c | 37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 047c3cd2ceff..d39f9e1cd532 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -38,36 +38,29 @@ struct udmabuf_folio {
 	struct list_head list;
 };
 
-static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
-{
-	struct vm_area_struct *vma = vmf->vma;
-	struct udmabuf *ubuf = vma->vm_private_data;
-	pgoff_t pgoff = vmf->pgoff;
-	unsigned long pfn;
-
-	if (pgoff >= ubuf->pagecount)
-		return VM_FAULT_SIGBUS;
-
-	pfn = folio_pfn(ubuf->folios[pgoff]);
-	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
-
-	return vmf_insert_pfn(vma, vmf->address, pfn);
-}
-
-static const struct vm_operations_struct udmabuf_vm_ops = {
-	.fault = udmabuf_vm_fault,
-};
-
 static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
 {
 	struct udmabuf *ubuf = buf->priv;
+	unsigned long addr;
+	unsigned long end;
+	unsigned long pgoff;
+	int ret;
 
 	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
 		return -EINVAL;
 
-	vma->vm_ops = &udmabuf_vm_ops;
-	vma->vm_private_data = ubuf;
 	vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+
+	for (pgoff = vma->vm_pgoff, end = vma->vm_end, addr = vma->vm_start;
+	     addr < end; pgoff++, addr += PAGE_SIZE) {
+		unsigned long pfn = folio_pfn(ubuf->folios[pgoff]);
+
+		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
+		ret = vmf_insert_pfn(vma, addr, pfn);
+		if (ret & VM_FAULT_ERROR)
+			return vm_fault_to_errno(ret, 0);
+	}
+
 	return 0;
 }
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 2/5] udmabuf: change folios array from kmalloc to kvmalloc
  2024-08-13  9:05 [PATCH v3 0/5] udmbuf bug fix and some improvements Huan Yang
  2024-08-13  9:05 ` [PATCH v3 1/5] udmabuf: cancel mmap page fault, direct map it Huan Yang
@ 2024-08-13  9:05 ` Huan Yang
  2024-08-13  9:05 ` [PATCH v3 3/5] fix vmap_udmabuf error page set Huan Yang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Huan Yang @ 2024-08-13  9:05 UTC (permalink / raw)
  To: Gerd Hoffmann, Sumit Semwal, Christian König, dri-devel,
	linux-media, linaro-mm-sig, linux-kernel
  Cc: opensource.kernel, Huan Yang, Vivek Kasireddy

When PAGE_SIZE 4096, MAX_PAGE_ORDER 10, 64bit machine,
page_alloc only support 4MB.
If above this, trigger this warn and return NULL.

udmabuf can change size limit, if change it to 3072(3GB), and then alloc
3GB udmabuf, will fail create.

[ 4080.876581] ------------[ cut here ]------------
[ 4080.876843] WARNING: CPU: 3 PID: 2015 at mm/page_alloc.c:4556 __alloc_pages+0x2c8/0x350
[ 4080.878839] RIP: 0010:__alloc_pages+0x2c8/0x350
[ 4080.879470] Call Trace:
[ 4080.879473]  <TASK>
[ 4080.879473]  ? __alloc_pages+0x2c8/0x350
[ 4080.879475]  ? __warn.cold+0x8e/0xe8
[ 4080.880647]  ? __alloc_pages+0x2c8/0x350
[ 4080.880909]  ? report_bug+0xff/0x140
[ 4080.881175]  ? handle_bug+0x3c/0x80
[ 4080.881556]  ? exc_invalid_op+0x17/0x70
[ 4080.881559]  ? asm_exc_invalid_op+0x1a/0x20
[ 4080.882077]  ? udmabuf_create+0x131/0x400

Because MAX_PAGE_ORDER, kmalloc can max alloc 4096 * (1 << 10), 4MB
memory, each array entry is pointer(8byte), so can save 524288 pages(2GB).

Further more, costly order(order 3) may not be guaranteed that it can be
applied for, due to fragmentation.

This patch change udmabuf array use kvmalloc_array, this can fallback
alloc into vmalloc, which can guarantee allocation for any size and does
not affect the performance of kmalloc allocations.

Signed-off-by: Huan Yang <link@vivo.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/dma-buf/udmabuf.c | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index d39f9e1cd532..3ec72d47bb1c 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -73,7 +73,7 @@ static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
 
 	dma_resv_assert_held(buf->resv);
 
-	pages = kmalloc_array(ubuf->pagecount, sizeof(*pages), GFP_KERNEL);
+	pages = kvmalloc_array(ubuf->pagecount, sizeof(*pages), GFP_KERNEL);
 	if (!pages)
 		return -ENOMEM;
 
@@ -81,7 +81,7 @@ static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
 		pages[pg] = &ubuf->folios[pg]->page;
 
 	vaddr = vm_map_ram(pages, ubuf->pagecount, -1);
-	kfree(pages);
+	kvfree(pages);
 	if (!vaddr)
 		return -EINVAL;
 
@@ -189,8 +189,8 @@ static void release_udmabuf(struct dma_buf *buf)
 		put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
 
 	unpin_all_folios(&ubuf->unpin_list);
-	kfree(ubuf->offsets);
-	kfree(ubuf->folios);
+	kvfree(ubuf->offsets);
+	kvfree(ubuf->folios);
 	kfree(ubuf);
 }
 
@@ -315,14 +315,14 @@ static long udmabuf_create(struct miscdevice *device,
 	if (!ubuf->pagecount)
 		goto err;
 
-	ubuf->folios = kmalloc_array(ubuf->pagecount, sizeof(*ubuf->folios),
-				    GFP_KERNEL);
+	ubuf->folios = kvmalloc_array(ubuf->pagecount, sizeof(*ubuf->folios),
+				      GFP_KERNEL);
 	if (!ubuf->folios) {
 		ret = -ENOMEM;
 		goto err;
 	}
-	ubuf->offsets = kcalloc(ubuf->pagecount, sizeof(*ubuf->offsets),
-				GFP_KERNEL);
+	ubuf->offsets = kvcalloc(ubuf->pagecount, sizeof(*ubuf->offsets),
+				 GFP_KERNEL);
 	if (!ubuf->offsets) {
 		ret = -ENOMEM;
 		goto err;
@@ -336,7 +336,7 @@ static long udmabuf_create(struct miscdevice *device,
 			goto err;
 
 		pgcnt = list[i].size >> PAGE_SHIFT;
-		folios = kmalloc_array(pgcnt, sizeof(*folios), GFP_KERNEL);
+		folios = kvmalloc_array(pgcnt, sizeof(*folios), GFP_KERNEL);
 		if (!folios) {
 			ret = -ENOMEM;
 			goto err;
@@ -346,7 +346,7 @@ static long udmabuf_create(struct miscdevice *device,
 		ret = memfd_pin_folios(memfd, list[i].offset, end,
 				       folios, pgcnt, &pgoff);
 		if (ret <= 0) {
-			kfree(folios);
+			kvfree(folios);
 			if (!ret)
 				ret = -EINVAL;
 			goto err;
@@ -375,7 +375,7 @@ static long udmabuf_create(struct miscdevice *device,
 			}
 		}
 
-		kfree(folios);
+		kvfree(folios);
 		fput(memfd);
 		memfd = NULL;
 	}
@@ -391,8 +391,8 @@ static long udmabuf_create(struct miscdevice *device,
 	if (memfd)
 		fput(memfd);
 	unpin_all_folios(&ubuf->unpin_list);
-	kfree(ubuf->offsets);
-	kfree(ubuf->folios);
+	kvfree(ubuf->offsets);
+	kvfree(ubuf->folios);
 	kfree(ubuf);
 	return ret;
 }
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 3/5] fix vmap_udmabuf error page set
  2024-08-13  9:05 [PATCH v3 0/5] udmbuf bug fix and some improvements Huan Yang
  2024-08-13  9:05 ` [PATCH v3 1/5] udmabuf: cancel mmap page fault, direct map it Huan Yang
  2024-08-13  9:05 ` [PATCH v3 2/5] udmabuf: change folios array from kmalloc to kvmalloc Huan Yang
@ 2024-08-13  9:05 ` Huan Yang
  2024-08-17  0:54   ` Kasireddy, Vivek
  2024-08-13  9:05 ` [PATCH v3 4/5] udmabuf: codestyle cleanup Huan Yang
  2024-08-13  9:05 ` [PATCH v3 5/5] udmabuf: remove udmabuf_folio Huan Yang
  4 siblings, 1 reply; 15+ messages in thread
From: Huan Yang @ 2024-08-13  9:05 UTC (permalink / raw)
  To: Gerd Hoffmann, Sumit Semwal, Christian König, dri-devel,
	linux-media, linaro-mm-sig, linux-kernel
  Cc: opensource.kernel, Huan Yang, Vivek Kasireddy

Currently vmap_udmabuf set page's array by each folio.
But, ubuf->folios is only contain's the folio's head page.

That mean we repeatedly mapped the folio head page to the vmalloc area.

Due to udmabuf can use hugetlb, if HVO enabled, tail page may not exist,
so, we can't use page array to map, instead, use pfn array.

Signed-off-by: Huan Yang <link@vivo.com>
Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
---
 drivers/dma-buf/udmabuf.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 3ec72d47bb1c..4ec54c60d76c 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -67,21 +67,29 @@ static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
 static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
 {
 	struct udmabuf *ubuf = buf->priv;
-	struct page **pages;
+	unsigned long *pfns;
 	void *vaddr;
 	pgoff_t pg;
 
 	dma_resv_assert_held(buf->resv);
 
-	pages = kvmalloc_array(ubuf->pagecount, sizeof(*pages), GFP_KERNEL);
-	if (!pages)
+	/**
+	 * HVO may free tail pages, so just use pfn to map each folio
+	 * into vmalloc area.
+	 */
+	pfns = kvmalloc_array(ubuf->pagecount, sizeof(*pfns), GFP_KERNEL);
+	if (!pfns)
 		return -ENOMEM;
 
-	for (pg = 0; pg < ubuf->pagecount; pg++)
-		pages[pg] = &ubuf->folios[pg]->page;
+	for (pg = 0; pg < ubuf->pagecount; pg++) {
+		unsigned long pfn = folio_pfn(ubuf->folios[pg]);
 
-	vaddr = vm_map_ram(pages, ubuf->pagecount, -1);
-	kvfree(pages);
+		pfn += ubuf->offsets[pg] >> PAGE_SHIFT;
+		pfns[pg] = pfn;
+	}
+
+	vaddr = vmap_pfn(pfns, ubuf->pagecount, PAGE_KERNEL);
+	kvfree(pfns);
 	if (!vaddr)
 		return -EINVAL;
 
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 4/5] udmabuf: codestyle cleanup
  2024-08-13  9:05 [PATCH v3 0/5] udmbuf bug fix and some improvements Huan Yang
                   ` (2 preceding siblings ...)
  2024-08-13  9:05 ` [PATCH v3 3/5] fix vmap_udmabuf error page set Huan Yang
@ 2024-08-13  9:05 ` Huan Yang
  2024-08-17  0:58   ` Kasireddy, Vivek
  2024-08-13  9:05 ` [PATCH v3 5/5] udmabuf: remove udmabuf_folio Huan Yang
  4 siblings, 1 reply; 15+ messages in thread
From: Huan Yang @ 2024-08-13  9:05 UTC (permalink / raw)
  To: Gerd Hoffmann, Sumit Semwal, Christian König, dri-devel,
	linux-media, linaro-mm-sig, linux-kernel
  Cc: opensource.kernel, Huan Yang

There are some variables in udmabuf_create that are only used inside the
loop. Therefore, there is no need to declare them outside the scope.
This patch moved it into loop.

It is difficult to understand the loop condition of the code that adds
folio to the unpin_list.

The outer loop of this patch iterates through folios, while the inner
loop correctly sets the folio and corresponding offset into the udmabuf
starting from the offset. if reach to pgcnt or nr_folios, end of loop.

By this, more readable.

Signed-off-by: Huan Yang <link@vivo.com>
---
 drivers/dma-buf/udmabuf.c | 65 ++++++++++++++++++++-------------------
 1 file changed, 33 insertions(+), 32 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 4ec54c60d76c..8f9cb0e2e71a 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -296,12 +296,12 @@ static long udmabuf_create(struct miscdevice *device,
 			   struct udmabuf_create_list *head,
 			   struct udmabuf_create_item *list)
 {
-	pgoff_t pgoff, pgcnt, pglimit, pgbuf = 0;
-	long nr_folios, ret = -EINVAL;
+	pgoff_t upgcnt = 0, pglimit, pgbuf = 0;
+	long ret = -EINVAL;
 	struct file *memfd = NULL;
 	struct folio **folios;
 	struct udmabuf *ubuf;
-	u32 i, j, k, flags;
+	u32 i, flags;
 	loff_t end;
 
 	ubuf = kzalloc(sizeof(*ubuf), GFP_KERNEL);
@@ -315,22 +315,21 @@ static long udmabuf_create(struct miscdevice *device,
 			goto err;
 		if (!IS_ALIGNED(list[i].size, PAGE_SIZE))
 			goto err;
-		ubuf->pagecount += list[i].size >> PAGE_SHIFT;
+		upgcnt += list[i].size >> PAGE_SHIFT;
 		if (ubuf->pagecount > pglimit)
 			goto err;
 	}
 
-	if (!ubuf->pagecount)
+	if (!upgcnt)
 		goto err;
 
-	ubuf->folios = kvmalloc_array(ubuf->pagecount, sizeof(*ubuf->folios),
+	ubuf->folios = kvmalloc_array(upgcnt, sizeof(*ubuf->folios),
 				      GFP_KERNEL);
 	if (!ubuf->folios) {
 		ret = -ENOMEM;
 		goto err;
 	}
-	ubuf->offsets = kvcalloc(ubuf->pagecount, sizeof(*ubuf->offsets),
-				 GFP_KERNEL);
+	ubuf->offsets = kvcalloc(upgcnt, sizeof(*ubuf->offsets), GFP_KERNEL);
 	if (!ubuf->offsets) {
 		ret = -ENOMEM;
 		goto err;
@@ -338,6 +337,10 @@ static long udmabuf_create(struct miscdevice *device,
 
 	pgbuf = 0;
 	for (i = 0; i < head->count; i++) {
+		pgoff_t pgoff, pgcnt;
+		u32 j, cnt;
+		long nr_folios;
+
 		memfd = fget(list[i].memfd);
 		ret = check_memfd_seals(memfd);
 		if (ret < 0)
@@ -351,38 +354,36 @@ static long udmabuf_create(struct miscdevice *device,
 		}
 
 		end = list[i].offset + (pgcnt << PAGE_SHIFT) - 1;
-		ret = memfd_pin_folios(memfd, list[i].offset, end,
-				       folios, pgcnt, &pgoff);
-		if (ret <= 0) {
+		nr_folios = memfd_pin_folios(memfd, list[i].offset, end, folios,
+					     pgcnt, &pgoff);
+		if (nr_folios <= 0) {
 			kvfree(folios);
-			if (!ret)
-				ret = -EINVAL;
+			ret = nr_folios ? nr_folios : -EINVAL;
 			goto err;
 		}
 
-		nr_folios = ret;
-		pgoff >>= PAGE_SHIFT;
-		for (j = 0, k = 0; j < pgcnt; j++) {
-			ubuf->folios[pgbuf] = folios[k];
-			ubuf->offsets[pgbuf] = pgoff << PAGE_SHIFT;
-
-			if (j == 0 || ubuf->folios[pgbuf-1] != folios[k]) {
-				ret = add_to_unpin_list(&ubuf->unpin_list,
-							folios[k]);
-				if (ret < 0) {
-					kfree(folios);
-					goto err;
-				}
+		for (j = 0, cnt = 0; j < nr_folios; ++j, pgoff = 0) {
+			u32 k;
+			long fsize = folio_size(folios[j]);
+
+			ret = add_to_unpin_list(&ubuf->unpin_list, folios[j]);
+			if (ret < 0) {
+				kfree(folios);
+				goto err;
 			}
 
-			pgbuf++;
-			if (++pgoff == folio_nr_pages(folios[k])) {
-				pgoff = 0;
-				if (++k == nr_folios)
-					break;
+			for (k = pgoff; k < fsize; k += PAGE_SIZE) {
+				ubuf->folios[pgbuf] = folios[j];
+				ubuf->offsets[pgbuf] = k;
+				++pgbuf;
+
+				if (++cnt >= pgcnt)
+					goto end;
 			}
 		}
-
+end:
+		// update the number of pages that have already been set up.
+		ubuf->pagecount += pgcnt;
 		kvfree(folios);
 		fput(memfd);
 		memfd = NULL;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v3 5/5] udmabuf: remove udmabuf_folio
  2024-08-13  9:05 [PATCH v3 0/5] udmbuf bug fix and some improvements Huan Yang
                   ` (3 preceding siblings ...)
  2024-08-13  9:05 ` [PATCH v3 4/5] udmabuf: codestyle cleanup Huan Yang
@ 2024-08-13  9:05 ` Huan Yang
  2024-08-16 12:54   ` kernel test robot
  2024-08-17  1:05   ` Kasireddy, Vivek
  4 siblings, 2 replies; 15+ messages in thread
From: Huan Yang @ 2024-08-13  9:05 UTC (permalink / raw)
  To: Gerd Hoffmann, Sumit Semwal, Christian König, dri-devel,
	linux-media, linaro-mm-sig, linux-kernel
  Cc: opensource.kernel, Huan Yang

Currently, udmabuf handles folio by creating an unpin list to record
each folio obtained from the list and unpinning them when released. To
maintain this approach, many data structures have been established.

However, maintaining this type of data structure requires a significant
amount of memory and traversing the list is a substantial overhead,
which is not friendly to the CPU cache.

Considering that during creation, we arranged the folio array in the
order of pin and set the offset according to pgcnt.

We actually don't need to use unpin_list to unpin during release.
Instead, we can iterate through the folios array during release and
unpin any folio that is different from the ones previously accessed.

By this, not only saves the overhead of the udmabuf_folio data structure
but also makes array access more cache-friendly.

Signed-off-by: Huan Yang <link@vivo.com>
---
 drivers/dma-buf/udmabuf.c | 68 +++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 38 deletions(-)

diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 8f9cb0e2e71a..1e7f46c33d1a 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -26,16 +26,19 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is
 
 struct udmabuf {
 	pgoff_t pagecount;
-	struct folio **folios;
 	struct sg_table *sg;
 	struct miscdevice *device;
+	struct folio **folios;
+	/**
+	 * offset in folios array's folio, byte unit.
+	 * udmabuf can use either shmem or hugetlb pages, an array based on
+	 * pages may not be suitable.
+	 * Especially when HVO is enabled, the tail page will be released,
+	 * so our reference to the page will no longer be correct.
+	 * Hence, it's necessary to record the offset in order to reference
+	 * the correct PFN within the folio.
+	 */
 	pgoff_t *offsets;
-	struct list_head unpin_list;
-};
-
-struct udmabuf_folio {
-	struct folio *folio;
-	struct list_head list;
 };
 
 static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
@@ -160,32 +163,28 @@ static void unmap_udmabuf(struct dma_buf_attachment *at,
 	return put_sg_table(at->dev, sg, direction);
 }
 
-static void unpin_all_folios(struct list_head *unpin_list)
+/**
+ * unpin_all_folios:		unpin each folio we pinned in create
+ * The udmabuf set all folio in folios and pinned it, but for large folio,
+ * We may have only used a small portion of the physical in the folio.
+ * we will repeatedly, sequentially set the folio into the array to ensure
+ * that the offset can index the correct folio at the corresponding index.
+ * Hence, we only need to unpin the first iterred folio.
+ */
+static void unpin_all_folios(struct udmabuf *ubuf)
 {
-	struct udmabuf_folio *ubuf_folio;
-
-	while (!list_empty(unpin_list)) {
-		ubuf_folio = list_first_entry(unpin_list,
-					      struct udmabuf_folio, list);
-		unpin_folio(ubuf_folio->folio);
-
-		list_del(&ubuf_folio->list);
-		kfree(ubuf_folio);
-	}
-}
+	pgoff_t pg;
+	struct folio *last = NULL;
 
-static int add_to_unpin_list(struct list_head *unpin_list,
-			     struct folio *folio)
-{
-	struct udmabuf_folio *ubuf_folio;
+	for (pg = 0; pg < ubuf->pagecount; pg++) {
+		struct folio *tmp = ubuf->folios[pg];
 
-	ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL);
-	if (!ubuf_folio)
-		return -ENOMEM;
+		if (tmp == last)
+			continue;
 
-	ubuf_folio->folio = folio;
-	list_add_tail(&ubuf_folio->list, unpin_list);
-	return 0;
+		last = tmp;
+		unpin_folio(tmp);
+	}
 }
 
 static void release_udmabuf(struct dma_buf *buf)
@@ -196,7 +195,7 @@ static void release_udmabuf(struct dma_buf *buf)
 	if (ubuf->sg)
 		put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
 
-	unpin_all_folios(&ubuf->unpin_list);
+	unpin_all_folios(ubuf);
 	kvfree(ubuf->offsets);
 	kvfree(ubuf->folios);
 	kfree(ubuf);
@@ -308,7 +307,6 @@ static long udmabuf_create(struct miscdevice *device,
 	if (!ubuf)
 		return -ENOMEM;
 
-	INIT_LIST_HEAD(&ubuf->unpin_list);
 	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
 	for (i = 0; i < head->count; i++) {
 		if (!IS_ALIGNED(list[i].offset, PAGE_SIZE))
@@ -366,12 +364,6 @@ static long udmabuf_create(struct miscdevice *device,
 			u32 k;
 			long fsize = folio_size(folios[j]);
 
-			ret = add_to_unpin_list(&ubuf->unpin_list, folios[j]);
-			if (ret < 0) {
-				kfree(folios);
-				goto err;
-			}
-
 			for (k = pgoff; k < fsize; k += PAGE_SIZE) {
 				ubuf->folios[pgbuf] = folios[j];
 				ubuf->offsets[pgbuf] = k;
@@ -399,7 +391,7 @@ static long udmabuf_create(struct miscdevice *device,
 err:
 	if (memfd)
 		fput(memfd);
-	unpin_all_folios(&ubuf->unpin_list);
+	unpin_all_folios(ubuf);
 	kvfree(ubuf->offsets);
 	kvfree(ubuf->folios);
 	kfree(ubuf);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 5/5] udmabuf: remove udmabuf_folio
  2024-08-13  9:05 ` [PATCH v3 5/5] udmabuf: remove udmabuf_folio Huan Yang
@ 2024-08-16 12:54   ` kernel test robot
  2024-08-17  1:05   ` Kasireddy, Vivek
  1 sibling, 0 replies; 15+ messages in thread
From: kernel test robot @ 2024-08-16 12:54 UTC (permalink / raw)
  To: Huan Yang, Gerd Hoffmann, Sumit Semwal, Christian König,
	dri-devel, linux-media, linaro-mm-sig, linux-kernel
  Cc: oe-kbuild-all, opensource.kernel, Huan Yang

Hi Huan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 033a4691702cdca3a613256b0623b8eeacb4985e]

url:    https://github.com/intel-lab-lkp/linux/commits/Huan-Yang/udmabuf-cancel-mmap-page-fault-direct-map-it/20240814-231504
base:   033a4691702cdca3a613256b0623b8eeacb4985e
patch link:    https://lore.kernel.org/r/20240813090518.3252469-6-link%40vivo.com
patch subject: [PATCH v3 5/5] udmabuf: remove udmabuf_folio
config: s390-allyesconfig (https://download.01.org/0day-ci/archive/20240816/202408162012.cL9pnFSm-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240816/202408162012.cL9pnFSm-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408162012.cL9pnFSm-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/dma-buf/udmabuf.c:175: warning: Function parameter or struct member 'ubuf' not described in 'unpin_all_folios'


vim +175 drivers/dma-buf/udmabuf.c

17a7ce20349045 Gurchetan Singh 2019-12-02  165  
d934739404652b Huan Yang       2024-08-13  166  /**
d934739404652b Huan Yang       2024-08-13  167   * unpin_all_folios:		unpin each folio we pinned in create
d934739404652b Huan Yang       2024-08-13  168   * The udmabuf set all folio in folios and pinned it, but for large folio,
d934739404652b Huan Yang       2024-08-13  169   * We may have only used a small portion of the physical in the folio.
d934739404652b Huan Yang       2024-08-13  170   * we will repeatedly, sequentially set the folio into the array to ensure
d934739404652b Huan Yang       2024-08-13  171   * that the offset can index the correct folio at the corresponding index.
d934739404652b Huan Yang       2024-08-13  172   * Hence, we only need to unpin the first iterred folio.
d934739404652b Huan Yang       2024-08-13  173   */
d934739404652b Huan Yang       2024-08-13  174  static void unpin_all_folios(struct udmabuf *ubuf)
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23 @175  {
d934739404652b Huan Yang       2024-08-13  176  	pgoff_t pg;
d934739404652b Huan Yang       2024-08-13  177  	struct folio *last = NULL;
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23  178  
d934739404652b Huan Yang       2024-08-13  179  	for (pg = 0; pg < ubuf->pagecount; pg++) {
d934739404652b Huan Yang       2024-08-13  180  		struct folio *tmp = ubuf->folios[pg];
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23  181  
d934739404652b Huan Yang       2024-08-13  182  		if (tmp == last)
d934739404652b Huan Yang       2024-08-13  183  			continue;
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23  184  
d934739404652b Huan Yang       2024-08-13  185  		last = tmp;
d934739404652b Huan Yang       2024-08-13  186  		unpin_folio(tmp);
d934739404652b Huan Yang       2024-08-13  187  	}
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23  188  }
c6a3194c05e7e6 Vivek Kasireddy 2024-06-23  189  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v3 1/5] udmabuf: cancel mmap page fault, direct map it
  2024-08-13  9:05 ` [PATCH v3 1/5] udmabuf: cancel mmap page fault, direct map it Huan Yang
@ 2024-08-17  0:53   ` Kasireddy, Vivek
  2024-08-20  1:30     ` Huan Yang
  0 siblings, 1 reply; 15+ messages in thread
From: Kasireddy, Vivek @ 2024-08-17  0:53 UTC (permalink / raw)
  To: Huan Yang, Gerd Hoffmann, Sumit Semwal, Christian König,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org
  Cc: opensource.kernel@vivo.com

Hi Huan,

> 
> The current udmabuf mmap uses a page fault to populate the vma.
> 
> However, the current udmabuf has already obtained and pinned the folio
> upon completion of the creation.This means that the physical memory has
> already been acquired, rather than being accessed dynamically. The
> current page fault method only saves some page table memory.
> 
> As a result, the page fault has lost its purpose as a demanding
> page. Due to the fact that page fault requires trapping into kernel mode
> and filling in when accessing the corresponding virtual address in mmap,
> when creating a large size udmabuf, this represents a considerable
> overhead.
> 
> The current patch removes the page fault method of mmap and
> instead fills pfn directly when mmap is triggered.
> 
> Signed-off-by: Huan Yang <link@vivo.com>
> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> ---
>  drivers/dma-buf/udmabuf.c | 37 +++++++++++++++----------------------
>  1 file changed, 15 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 047c3cd2ceff..d39f9e1cd532 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -38,36 +38,29 @@ struct udmabuf_folio {
>  	struct list_head list;
>  };
> 
> -static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
> -{
> -	struct vm_area_struct *vma = vmf->vma;
> -	struct udmabuf *ubuf = vma->vm_private_data;
> -	pgoff_t pgoff = vmf->pgoff;
> -	unsigned long pfn;
> -
> -	if (pgoff >= ubuf->pagecount)
> -		return VM_FAULT_SIGBUS;
> -
> -	pfn = folio_pfn(ubuf->folios[pgoff]);
> -	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
> -
> -	return vmf_insert_pfn(vma, vmf->address, pfn);
> -}
> -
> -static const struct vm_operations_struct udmabuf_vm_ops = {
> -	.fault = udmabuf_vm_fault,
> -};
So, what I was suggesting earlier is that it would be OK to populate the whole
vma after first fault because userspace can simply call mmap() but choose not
to use the returned pointer for various reasons. This is what Qemu's virtio-gpu
module does and in this case we'd be unnecessarily populating the vma.

Therefore, my request to you is to try to benchmark your userspace to see if
there is a significant difference in performance when you populate the vma
during mmap() vs doing it after first fault (which means moving the for loop
to udmabuf_vm_fault()).

> -
>  static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct
> *vma)
>  {
>  	struct udmabuf *ubuf = buf->priv;
> +	unsigned long addr;
> +	unsigned long end;
> +	unsigned long pgoff;
> +	int ret;
Looks like ret's type needs to be vm_fault_t.

> 
>  	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
>  		return -EINVAL;
> 
> -	vma->vm_ops = &udmabuf_vm_ops;
> -	vma->vm_private_data = ubuf;
>  	vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND |
> VM_DONTDUMP);
> +
> +	for (pgoff = vma->vm_pgoff, end = vma->vm_end, addr = vma-
I think initializing these variables above at the declaration time looks better
than initializing them in the for loop, IMO.

Thanks,
Vivek

> >vm_start;
> +	     addr < end; pgoff++, addr += PAGE_SIZE) {
> +		unsigned long pfn = folio_pfn(ubuf->folios[pgoff]);
> +
> +		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
> +		ret = vmf_insert_pfn(vma, addr, pfn);
> +		if (ret & VM_FAULT_ERROR)
> +			return vm_fault_to_errno(ret, 0);
> +	}
> +
>  	return 0;
>  }
> 
> --
> 2.45.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v3 3/5] fix vmap_udmabuf error page set
  2024-08-13  9:05 ` [PATCH v3 3/5] fix vmap_udmabuf error page set Huan Yang
@ 2024-08-17  0:54   ` Kasireddy, Vivek
  2024-08-20  1:33     ` Huan Yang
  0 siblings, 1 reply; 15+ messages in thread
From: Kasireddy, Vivek @ 2024-08-17  0:54 UTC (permalink / raw)
  To: Huan Yang, Gerd Hoffmann, Sumit Semwal, Christian König,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org
  Cc: opensource.kernel@vivo.com

Hi Huan,

> Subject: [PATCH v3 3/5] fix vmap_udmabuf error page set
Please prepend a "udmabuf:" to the subject line and improve the wording.

> 
> Currently vmap_udmabuf set page's array by each folio.
> But, ubuf->folios is only contain's the folio's head page.
> 
> That mean we repeatedly mapped the folio head page to the vmalloc area.
> 
> Due to udmabuf can use hugetlb, if HVO enabled, tail page may not exist,
> so, we can't use page array to map, instead, use pfn array.
> 
> Signed-off-by: Huan Yang <link@vivo.com>
> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> ---
>  drivers/dma-buf/udmabuf.c | 22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 3ec72d47bb1c..4ec54c60d76c 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -67,21 +67,29 @@ static int mmap_udmabuf(struct dma_buf *buf,
> struct vm_area_struct *vma)
>  static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
>  {
>  	struct udmabuf *ubuf = buf->priv;
> -	struct page **pages;
> +	unsigned long *pfns;

I wish there was a way to easily test the vmap scenario but its great that
you are able to eliminate the usage of "struct page" completely from
udmabuf driver, with this patch. 

>  	void *vaddr;
>  	pgoff_t pg;
> 
>  	dma_resv_assert_held(buf->resv);
> 
> -	pages = kvmalloc_array(ubuf->pagecount, sizeof(*pages),
> GFP_KERNEL);
> -	if (!pages)
> +	/**
> +	 * HVO may free tail pages, so just use pfn to map each folio
> +	 * into vmalloc area.
> +	 */
> +	pfns = kvmalloc_array(ubuf->pagecount, sizeof(*pfns), GFP_KERNEL);
> +	if (!pfns)
>  		return -ENOMEM;
> 
> -	for (pg = 0; pg < ubuf->pagecount; pg++)
> -		pages[pg] = &ubuf->folios[pg]->page;
> +	for (pg = 0; pg < ubuf->pagecount; pg++) {
> +		unsigned long pfn = folio_pfn(ubuf->folios[pg]);
> 
> -	vaddr = vm_map_ram(pages, ubuf->pagecount, -1);
> -	kvfree(pages);
> +		pfn += ubuf->offsets[pg] >> PAGE_SHIFT;
> +		pfns[pg] = pfn;
> +	}
> +
> +	vaddr = vmap_pfn(pfns, ubuf->pagecount, PAGE_KERNEL);
Looks like this would require a "select VMAP_PFN" in Kconfig.

Thanks,
Vivek

> +	kvfree(pfns);
>  	if (!vaddr)
>  		return -EINVAL;
> 
> --
> 2.45.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v3 4/5] udmabuf: codestyle cleanup
  2024-08-13  9:05 ` [PATCH v3 4/5] udmabuf: codestyle cleanup Huan Yang
@ 2024-08-17  0:58   ` Kasireddy, Vivek
  2024-08-20  1:37     ` Huan Yang
  0 siblings, 1 reply; 15+ messages in thread
From: Kasireddy, Vivek @ 2024-08-17  0:58 UTC (permalink / raw)
  To: Huan Yang, Gerd Hoffmann, Sumit Semwal, Christian König,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org
  Cc: opensource.kernel@vivo.com

Hi Huan,

> 
> There are some variables in udmabuf_create that are only used inside the
> loop. Therefore, there is no need to declare them outside the scope.
> This patch moved it into loop.
> 
> It is difficult to understand the loop condition of the code that adds
> folio to the unpin_list.
> 
> The outer loop of this patch iterates through folios, while the inner
> loop correctly sets the folio and corresponding offset into the udmabuf
> starting from the offset. if reach to pgcnt or nr_folios, end of loop.
> 
> By this, more readable.
> 
> Signed-off-by: Huan Yang <link@vivo.com>
> ---
>  drivers/dma-buf/udmabuf.c | 65 ++++++++++++++++++++-------------------
>  1 file changed, 33 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 4ec54c60d76c..8f9cb0e2e71a 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -296,12 +296,12 @@ static long udmabuf_create(struct miscdevice
> *device,
>  			   struct udmabuf_create_list *head,
>  			   struct udmabuf_create_item *list)
>  {
> -	pgoff_t pgoff, pgcnt, pglimit, pgbuf = 0;
> -	long nr_folios, ret = -EINVAL;
> +	pgoff_t upgcnt = 0, pglimit, pgbuf = 0;
> +	long ret = -EINVAL;
>  	struct file *memfd = NULL;
>  	struct folio **folios;
>  	struct udmabuf *ubuf;
> -	u32 i, j, k, flags;
> +	u32 i, flags;
>  	loff_t end;
> 
>  	ubuf = kzalloc(sizeof(*ubuf), GFP_KERNEL);
> @@ -315,22 +315,21 @@ static long udmabuf_create(struct miscdevice
> *device,
>  			goto err;
>  		if (!IS_ALIGNED(list[i].size, PAGE_SIZE))
>  			goto err;
> -		ubuf->pagecount += list[i].size >> PAGE_SHIFT;
> +		upgcnt += list[i].size >> PAGE_SHIFT;
>  		if (ubuf->pagecount > pglimit)
>  			goto err;
>  	}
> 
> -	if (!ubuf->pagecount)
> +	if (!upgcnt)
>  		goto err;
> 
> -	ubuf->folios = kvmalloc_array(ubuf->pagecount, sizeof(*ubuf-
> >folios),
> +	ubuf->folios = kvmalloc_array(upgcnt, sizeof(*ubuf->folios),
>  				      GFP_KERNEL);
>  	if (!ubuf->folios) {
>  		ret = -ENOMEM;
>  		goto err;
>  	}
> -	ubuf->offsets = kvcalloc(ubuf->pagecount, sizeof(*ubuf->offsets),
> -				 GFP_KERNEL);
> +	ubuf->offsets = kvcalloc(upgcnt, sizeof(*ubuf->offsets),
> GFP_KERNEL);
>  	if (!ubuf->offsets) {
>  		ret = -ENOMEM;
>  		goto err;
> @@ -338,6 +337,10 @@ static long udmabuf_create(struct miscdevice
> *device,
> 
>  	pgbuf = 0;
>  	for (i = 0; i < head->count; i++) {
> +		pgoff_t pgoff, pgcnt;
> +		u32 j, cnt;
> +		long nr_folios;
> +
>  		memfd = fget(list[i].memfd);
>  		ret = check_memfd_seals(memfd);
>  		if (ret < 0)
> @@ -351,38 +354,36 @@ static long udmabuf_create(struct miscdevice
> *device,
>  		}
> 
>  		end = list[i].offset + (pgcnt << PAGE_SHIFT) - 1;
> -		ret = memfd_pin_folios(memfd, list[i].offset, end,
> -				       folios, pgcnt, &pgoff);
> -		if (ret <= 0) {
> +		nr_folios = memfd_pin_folios(memfd, list[i].offset, end,
> folios,
> +					     pgcnt, &pgoff);
> +		if (nr_folios <= 0) {
>  			kvfree(folios);
> -			if (!ret)
> -				ret = -EINVAL;
> +			ret = nr_folios ? nr_folios : -EINVAL;
>  			goto err;
>  		}
> 
> -		nr_folios = ret;
> -		pgoff >>= PAGE_SHIFT;
> -		for (j = 0, k = 0; j < pgcnt; j++) {
> -			ubuf->folios[pgbuf] = folios[k];
> -			ubuf->offsets[pgbuf] = pgoff << PAGE_SHIFT;
> -
> -			if (j == 0 || ubuf->folios[pgbuf-1] != folios[k]) {
> -				ret = add_to_unpin_list(&ubuf->unpin_list,
> -							folios[k]);
> -				if (ret < 0) {
> -					kfree(folios);
> -					goto err;
> -				}
I can see that having a loop that iterates from 0 to nr_folios is more intuitive
compared to the previous version which was a legacy carryover.

> +		for (j = 0, cnt = 0; j < nr_folios; ++j, pgoff = 0) {
Can all the code in this outer loop be moved into a separate function? This
would reduce the length of udmabuf_create().

> +			u32 k;
> +			long fsize = folio_size(folios[j]);
> +
> +			ret = add_to_unpin_list(&ubuf->unpin_list, folios[j]);
> +			if (ret < 0) {
> +				kfree(folios);
> +				goto err;
>  			}
> 
> -			pgbuf++;
> -			if (++pgoff == folio_nr_pages(folios[k])) {
> -				pgoff = 0;
> -				if (++k == nr_folios)
> -					break;
> +			for (k = pgoff; k < fsize; k += PAGE_SIZE) {
I think renaming k to something like subpgoff or tailpgoff would make this
more easier to understand.

> +				ubuf->folios[pgbuf] = folios[j];
> +				ubuf->offsets[pgbuf] = k;
> +				++pgbuf;
> +
> +				if (++cnt >= pgcnt)
> +					goto end;
>  			}
>  		}
> -
> +end:
> +		// update the number of pages that have already been set
> up.
> +		ubuf->pagecount += pgcnt;
Since pgbuf also reflects the total number of pages (or folios) processed,
I think you can use that instead of having to mess with pagecount.

Thanks,
Vivek

>  		kvfree(folios);
>  		fput(memfd);
>  		memfd = NULL;
> --
> 2.45.2
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v3 5/5] udmabuf: remove udmabuf_folio
  2024-08-13  9:05 ` [PATCH v3 5/5] udmabuf: remove udmabuf_folio Huan Yang
  2024-08-16 12:54   ` kernel test robot
@ 2024-08-17  1:05   ` Kasireddy, Vivek
  2024-08-20  1:41     ` Huan Yang
  1 sibling, 1 reply; 15+ messages in thread
From: Kasireddy, Vivek @ 2024-08-17  1:05 UTC (permalink / raw)
  To: Huan Yang, Gerd Hoffmann, Sumit Semwal, Christian König,
	dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
	linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org
  Cc: opensource.kernel@vivo.com

Hi Huan,

> 
> Currently, udmabuf handles folio by creating an unpin list to record
> each folio obtained from the list and unpinning them when released. To
> maintain this approach, many data structures have been established.
> 
> However, maintaining this type of data structure requires a significant
> amount of memory and traversing the list is a substantial overhead,
> which is not friendly to the CPU cache.
> 
> Considering that during creation, we arranged the folio array in the
> order of pin and set the offset according to pgcnt.
> 
> We actually don't need to use unpin_list to unpin during release.
> Instead, we can iterate through the folios array during release and
> unpin any folio that is different from the ones previously accessed.
No, that won't work because iterating the folios array doesn't tell you
anything about how many times a folio was pinned (via memfd_pin_folios()),
as a folio could be part of multiple ranges.

For example, if userspace provides ranges 64..128 and 256..512 (assuming
these are 4k sized subpage offsets and we have a 2MB hugetlb folio), then
the same folio would cover both ranges and there would be 2 entries for
this folio in unpin_list. But, with your logic, we'd be erroneously unpinning
it only once.

Not sure if there are any great solutions available to address this situation,
but one option I can think of is to convert unpin_list to unpin array (dynamically
resized with krealloc?) and track its length separately. Or, as suggested earlier,
another way is to not use unpin_list for memfds backed by shmem, but I suspect
this may not work if THP is enabled.

Thanks,
Vivek

> 
> By this, not only saves the overhead of the udmabuf_folio data structure
> but also makes array access more cache-friendly.
> 
> Signed-off-by: Huan Yang <link@vivo.com>
> ---
>  drivers/dma-buf/udmabuf.c | 68 +++++++++++++++++----------------------
>  1 file changed, 30 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 8f9cb0e2e71a..1e7f46c33d1a 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -26,16 +26,19 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a
> dmabuf, in megabytes. Default is
> 
>  struct udmabuf {
>  	pgoff_t pagecount;
> -	struct folio **folios;
>  	struct sg_table *sg;
>  	struct miscdevice *device;
> +	struct folio **folios;
> +	/**
> +	 * offset in folios array's folio, byte unit.
> +	 * udmabuf can use either shmem or hugetlb pages, an array based
> on
> +	 * pages may not be suitable.
> +	 * Especially when HVO is enabled, the tail page will be released,
> +	 * so our reference to the page will no longer be correct.
> +	 * Hence, it's necessary to record the offset in order to reference
> +	 * the correct PFN within the folio.
> +	 */
>  	pgoff_t *offsets;
> -	struct list_head unpin_list;
> -};
> -
> -struct udmabuf_folio {
> -	struct folio *folio;
> -	struct list_head list;
>  };
> 
>  static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct
> *vma)
> @@ -160,32 +163,28 @@ static void unmap_udmabuf(struct
> dma_buf_attachment *at,
>  	return put_sg_table(at->dev, sg, direction);
>  }
> 
> -static void unpin_all_folios(struct list_head *unpin_list)
> +/**
> + * unpin_all_folios:		unpin each folio we pinned in create
> + * The udmabuf set all folio in folios and pinned it, but for large folio,
> + * We may have only used a small portion of the physical in the folio.
> + * we will repeatedly, sequentially set the folio into the array to ensure
> + * that the offset can index the correct folio at the corresponding index.
> + * Hence, we only need to unpin the first iterred folio.
> + */
> +static void unpin_all_folios(struct udmabuf *ubuf)
>  {
> -	struct udmabuf_folio *ubuf_folio;
> -
> -	while (!list_empty(unpin_list)) {
> -		ubuf_folio = list_first_entry(unpin_list,
> -					      struct udmabuf_folio, list);
> -		unpin_folio(ubuf_folio->folio);
> -
> -		list_del(&ubuf_folio->list);
> -		kfree(ubuf_folio);
> -	}
> -}
> +	pgoff_t pg;
> +	struct folio *last = NULL;
> 
> -static int add_to_unpin_list(struct list_head *unpin_list,
> -			     struct folio *folio)
> -{
> -	struct udmabuf_folio *ubuf_folio;
> +	for (pg = 0; pg < ubuf->pagecount; pg++) {
> +		struct folio *tmp = ubuf->folios[pg];
> 
> -	ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL);
> -	if (!ubuf_folio)
> -		return -ENOMEM;
> +		if (tmp == last)
> +			continue;
> 
> -	ubuf_folio->folio = folio;
> -	list_add_tail(&ubuf_folio->list, unpin_list);
> -	return 0;
> +		last = tmp;
> +		unpin_folio(tmp);
> +	}
>  }
> 
>  static void release_udmabuf(struct dma_buf *buf)
> @@ -196,7 +195,7 @@ static void release_udmabuf(struct dma_buf *buf)
>  	if (ubuf->sg)
>  		put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
> 
> -	unpin_all_folios(&ubuf->unpin_list);
> +	unpin_all_folios(ubuf);
>  	kvfree(ubuf->offsets);
>  	kvfree(ubuf->folios);
>  	kfree(ubuf);
> @@ -308,7 +307,6 @@ static long udmabuf_create(struct miscdevice
> *device,
>  	if (!ubuf)
>  		return -ENOMEM;
> 
> -	INIT_LIST_HEAD(&ubuf->unpin_list);
>  	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
>  	for (i = 0; i < head->count; i++) {
>  		if (!IS_ALIGNED(list[i].offset, PAGE_SIZE))
> @@ -366,12 +364,6 @@ static long udmabuf_create(struct miscdevice
> *device,
>  			u32 k;
>  			long fsize = folio_size(folios[j]);
> 
> -			ret = add_to_unpin_list(&ubuf->unpin_list, folios[j]);
> -			if (ret < 0) {
> -				kfree(folios);
> -				goto err;
> -			}
> -
>  			for (k = pgoff; k < fsize; k += PAGE_SIZE) {
>  				ubuf->folios[pgbuf] = folios[j];
>  				ubuf->offsets[pgbuf] = k;
> @@ -399,7 +391,7 @@ static long udmabuf_create(struct miscdevice
> *device,
>  err:
>  	if (memfd)
>  		fput(memfd);
> -	unpin_all_folios(&ubuf->unpin_list);
> +	unpin_all_folios(ubuf);
>  	kvfree(ubuf->offsets);
>  	kvfree(ubuf->folios);
>  	kfree(ubuf);
> --
> 2.45.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 1/5] udmabuf: cancel mmap page fault, direct map it
  2024-08-17  0:53   ` Kasireddy, Vivek
@ 2024-08-20  1:30     ` Huan Yang
  0 siblings, 0 replies; 15+ messages in thread
From: Huan Yang @ 2024-08-20  1:30 UTC (permalink / raw)
  To: Kasireddy, Vivek, Gerd Hoffmann, Sumit Semwal,
	Christian König, dri-devel@lists.freedesktop.org,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
	linux-kernel@vger.kernel.org
  Cc: opensource.kernel@vivo.com


在 2024/8/17 8:53, Kasireddy, Vivek 写道:
> Hi Huan,
>
>> The current udmabuf mmap uses a page fault to populate the vma.
>>
>> However, the current udmabuf has already obtained and pinned the folio
>> upon completion of the creation.This means that the physical memory has
>> already been acquired, rather than being accessed dynamically. The
>> current page fault method only saves some page table memory.
>>
>> As a result, the page fault has lost its purpose as a demanding
>> page. Due to the fact that page fault requires trapping into kernel mode
>> and filling in when accessing the corresponding virtual address in mmap,
>> when creating a large size udmabuf, this represents a considerable
>> overhead.
>>
>> The current patch removes the page fault method of mmap and
>> instead fills pfn directly when mmap is triggered.
>>
>> Signed-off-by: Huan Yang <link@vivo.com>
>> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>> ---
>>   drivers/dma-buf/udmabuf.c | 37 +++++++++++++++----------------------
>>   1 file changed, 15 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
>> index 047c3cd2ceff..d39f9e1cd532 100644
>> --- a/drivers/dma-buf/udmabuf.c
>> +++ b/drivers/dma-buf/udmabuf.c
>> @@ -38,36 +38,29 @@ struct udmabuf_folio {
>>   	struct list_head list;
>>   };
>>
>> -static vm_fault_t udmabuf_vm_fault(struct vm_fault *vmf)
>> -{
>> -	struct vm_area_struct *vma = vmf->vma;
>> -	struct udmabuf *ubuf = vma->vm_private_data;
>> -	pgoff_t pgoff = vmf->pgoff;
>> -	unsigned long pfn;
>> -
>> -	if (pgoff >= ubuf->pagecount)
>> -		return VM_FAULT_SIGBUS;
>> -
>> -	pfn = folio_pfn(ubuf->folios[pgoff]);
>> -	pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
>> -
>> -	return vmf_insert_pfn(vma, vmf->address, pfn);
>> -}
>> -
>> -static const struct vm_operations_struct udmabuf_vm_ops = {
>> -	.fault = udmabuf_vm_fault,
>> -};
> So, what I was suggesting earlier is that it would be OK to populate the whole
> vma after first fault because userspace can simply call mmap() but choose not
> to use the returned pointer for various reasons. This is what Qemu's virtio-gpu
> module does and in this case we'd be unnecessarily populating the vma.

I may get your point. Fill pgtable when access is better than fill when 
invoke mmap?

This is reasonable. And I'll try to test it too.

IMO, there won't be much of a difference in performance.

>
> Therefore, my request to you is to try to benchmark your userspace to see if
> there is a significant difference in performance when you populate the vma
> during mmap() vs doing it after first fault (which means moving the for loop
> to udmabuf_vm_fault()).
>
>> -
>>   static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct
>> *vma)
>>   {
>>   	struct udmabuf *ubuf = buf->priv;
>> +	unsigned long addr;
>> +	unsigned long end;
>> +	unsigned long pgoff;
>> +	int ret;
> Looks like ret's type needs to be vm_fault_t.
>
>>   	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
>>   		return -EINVAL;
>>
>> -	vma->vm_ops = &udmabuf_vm_ops;
>> -	vma->vm_private_data = ubuf;
>>   	vm_flags_set(vma, VM_PFNMAP | VM_DONTEXPAND |
>> VM_DONTDUMP);
>> +
>> +	for (pgoff = vma->vm_pgoff, end = vma->vm_end, addr = vma-
> I think initializing these variables above at the declaration time looks better
> than initializing them in the for loop, IMO.

Yes, even though initializing in the loop declaration can better 
indicate which variables the loop needs,

here it makes the loop declaration too long.

I'll change it in next version.

Thanks.

>
> Thanks,
> Vivek
>
>>> vm_start;
>> +	     addr < end; pgoff++, addr += PAGE_SIZE) {
>> +		unsigned long pfn = folio_pfn(ubuf->folios[pgoff]);
>> +
>> +		pfn += ubuf->offsets[pgoff] >> PAGE_SHIFT;
>> +		ret = vmf_insert_pfn(vma, addr, pfn);
>> +		if (ret & VM_FAULT_ERROR)
>> +			return vm_fault_to_errno(ret, 0);
>> +	}
>> +
>>   	return 0;
>>   }
>>
>> --
>> 2.45.2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 3/5] fix vmap_udmabuf error page set
  2024-08-17  0:54   ` Kasireddy, Vivek
@ 2024-08-20  1:33     ` Huan Yang
  0 siblings, 0 replies; 15+ messages in thread
From: Huan Yang @ 2024-08-20  1:33 UTC (permalink / raw)
  To: Kasireddy, Vivek, Gerd Hoffmann, Sumit Semwal,
	Christian König, dri-devel@lists.freedesktop.org,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
	linux-kernel@vger.kernel.org
  Cc: opensource.kernel@vivo.com


在 2024/8/17 8:54, Kasireddy, Vivek 写道:
> Hi Huan,
>
>> Subject: [PATCH v3 3/5] fix vmap_udmabuf error page set
> Please prepend a "udmabuf:" to the subject line and improve the wording.
OK, sorry for this mistake.
>
>> Currently vmap_udmabuf set page's array by each folio.
>> But, ubuf->folios is only contain's the folio's head page.
>>
>> That mean we repeatedly mapped the folio head page to the vmalloc area.
>>
>> Due to udmabuf can use hugetlb, if HVO enabled, tail page may not exist,
>> so, we can't use page array to map, instead, use pfn array.
>>
>> Signed-off-by: Huan Yang <link@vivo.com>
>> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>> ---
>>   drivers/dma-buf/udmabuf.c | 22 +++++++++++++++-------
>>   1 file changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
>> index 3ec72d47bb1c..4ec54c60d76c 100644
>> --- a/drivers/dma-buf/udmabuf.c
>> +++ b/drivers/dma-buf/udmabuf.c
>> @@ -67,21 +67,29 @@ static int mmap_udmabuf(struct dma_buf *buf,
>> struct vm_area_struct *vma)
>>   static int vmap_udmabuf(struct dma_buf *buf, struct iosys_map *map)
>>   {
>>   	struct udmabuf *ubuf = buf->priv;
>> -	struct page **pages;
>> +	unsigned long *pfns;
> I wish there was a way to easily test the vmap scenario but its great that

It need a driver set in kernel. maybe write a debug code in udmabuf?(not 
sure)

> you are able to eliminate the usage of "struct page" completely from
> udmabuf driver, with this patch.
>
>>   	void *vaddr;
>>   	pgoff_t pg;
>>
>>   	dma_resv_assert_held(buf->resv);
>>
>> -	pages = kvmalloc_array(ubuf->pagecount, sizeof(*pages),
>> GFP_KERNEL);
>> -	if (!pages)
>> +	/**
>> +	 * HVO may free tail pages, so just use pfn to map each folio
>> +	 * into vmalloc area.
>> +	 */
>> +	pfns = kvmalloc_array(ubuf->pagecount, sizeof(*pfns), GFP_KERNEL);
>> +	if (!pfns)
>>   		return -ENOMEM;
>>
>> -	for (pg = 0; pg < ubuf->pagecount; pg++)
>> -		pages[pg] = &ubuf->folios[pg]->page;
>> +	for (pg = 0; pg < ubuf->pagecount; pg++) {
>> +		unsigned long pfn = folio_pfn(ubuf->folios[pg]);
>>
>> -	vaddr = vm_map_ram(pages, ubuf->pagecount, -1);
>> -	kvfree(pages);
>> +		pfn += ubuf->offsets[pg] >> PAGE_SHIFT;
>> +		pfns[pg] = pfn;
>> +	}
>> +
>> +	vaddr = vmap_pfn(pfns, ubuf->pagecount, PAGE_KERNEL);
> Looks like this would require a "select VMAP_PFN" in Kconfig.

Yes, I miss this.

Thank you too much.

>
> Thanks,
> Vivek
>
>> +	kvfree(pfns);
>>   	if (!vaddr)
>>   		return -EINVAL;
>>
>> --
>> 2.45.2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 4/5] udmabuf: codestyle cleanup
  2024-08-17  0:58   ` Kasireddy, Vivek
@ 2024-08-20  1:37     ` Huan Yang
  0 siblings, 0 replies; 15+ messages in thread
From: Huan Yang @ 2024-08-20  1:37 UTC (permalink / raw)
  To: Kasireddy, Vivek, Gerd Hoffmann, Sumit Semwal,
	Christian König, dri-devel@lists.freedesktop.org,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
	linux-kernel@vger.kernel.org
  Cc: opensource.kernel@vivo.com


在 2024/8/17 8:58, Kasireddy, Vivek 写道:
> Hi Huan,
>
>> There are some variables in udmabuf_create that are only used inside the
>> loop. Therefore, there is no need to declare them outside the scope.
>> This patch moved it into loop.
>>
>> It is difficult to understand the loop condition of the code that adds
>> folio to the unpin_list.
>>
>> The outer loop of this patch iterates through folios, while the inner
>> loop correctly sets the folio and corresponding offset into the udmabuf
>> starting from the offset. if reach to pgcnt or nr_folios, end of loop.
>>
>> By this, more readable.
>>
>> Signed-off-by: Huan Yang <link@vivo.com>
>> ---
>>   drivers/dma-buf/udmabuf.c | 65 ++++++++++++++++++++-------------------
>>   1 file changed, 33 insertions(+), 32 deletions(-)
>>
>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
>> index 4ec54c60d76c..8f9cb0e2e71a 100644
>> --- a/drivers/dma-buf/udmabuf.c
>> +++ b/drivers/dma-buf/udmabuf.c
>> @@ -296,12 +296,12 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   			   struct udmabuf_create_list *head,
>>   			   struct udmabuf_create_item *list)
>>   {
>> -	pgoff_t pgoff, pgcnt, pglimit, pgbuf = 0;
>> -	long nr_folios, ret = -EINVAL;
>> +	pgoff_t upgcnt = 0, pglimit, pgbuf = 0;
>> +	long ret = -EINVAL;
>>   	struct file *memfd = NULL;
>>   	struct folio **folios;
>>   	struct udmabuf *ubuf;
>> -	u32 i, j, k, flags;
>> +	u32 i, flags;
>>   	loff_t end;
>>
>>   	ubuf = kzalloc(sizeof(*ubuf), GFP_KERNEL);
>> @@ -315,22 +315,21 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   			goto err;
>>   		if (!IS_ALIGNED(list[i].size, PAGE_SIZE))
>>   			goto err;
>> -		ubuf->pagecount += list[i].size >> PAGE_SHIFT;
>> +		upgcnt += list[i].size >> PAGE_SHIFT;
>>   		if (ubuf->pagecount > pglimit)
>>   			goto err;
>>   	}
>>
>> -	if (!ubuf->pagecount)
>> +	if (!upgcnt)
>>   		goto err;
>>
>> -	ubuf->folios = kvmalloc_array(ubuf->pagecount, sizeof(*ubuf-
>>> folios),
>> +	ubuf->folios = kvmalloc_array(upgcnt, sizeof(*ubuf->folios),
>>   				      GFP_KERNEL);
>>   	if (!ubuf->folios) {
>>   		ret = -ENOMEM;
>>   		goto err;
>>   	}
>> -	ubuf->offsets = kvcalloc(ubuf->pagecount, sizeof(*ubuf->offsets),
>> -				 GFP_KERNEL);
>> +	ubuf->offsets = kvcalloc(upgcnt, sizeof(*ubuf->offsets),
>> GFP_KERNEL);
>>   	if (!ubuf->offsets) {
>>   		ret = -ENOMEM;
>>   		goto err;
>> @@ -338,6 +337,10 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>
>>   	pgbuf = 0;
>>   	for (i = 0; i < head->count; i++) {
>> +		pgoff_t pgoff, pgcnt;
>> +		u32 j, cnt;
>> +		long nr_folios;
>> +
>>   		memfd = fget(list[i].memfd);
>>   		ret = check_memfd_seals(memfd);
>>   		if (ret < 0)
>> @@ -351,38 +354,36 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   		}
>>
>>   		end = list[i].offset + (pgcnt << PAGE_SHIFT) - 1;
>> -		ret = memfd_pin_folios(memfd, list[i].offset, end,
>> -				       folios, pgcnt, &pgoff);
>> -		if (ret <= 0) {
>> +		nr_folios = memfd_pin_folios(memfd, list[i].offset, end,
>> folios,
>> +					     pgcnt, &pgoff);
>> +		if (nr_folios <= 0) {
>>   			kvfree(folios);
>> -			if (!ret)
>> -				ret = -EINVAL;
>> +			ret = nr_folios ? nr_folios : -EINVAL;
>>   			goto err;
>>   		}
>>
>> -		nr_folios = ret;
>> -		pgoff >>= PAGE_SHIFT;
>> -		for (j = 0, k = 0; j < pgcnt; j++) {
>> -			ubuf->folios[pgbuf] = folios[k];
>> -			ubuf->offsets[pgbuf] = pgoff << PAGE_SHIFT;
>> -
>> -			if (j == 0 || ubuf->folios[pgbuf-1] != folios[k]) {
>> -				ret = add_to_unpin_list(&ubuf->unpin_list,
>> -							folios[k]);
>> -				if (ret < 0) {
>> -					kfree(folios);
>> -					goto err;
>> -				}
> I can see that having a loop that iterates from 0 to nr_folios is more intuitive
> compared to the previous version which was a legacy carryover.
>
>> +		for (j = 0, cnt = 0; j < nr_folios; ++j, pgoff = 0) {
> Can all the code in this outer loop be moved into a separate function? This
> would reduce the length of udmabuf_create().
I'll try this in next version
>
>> +			u32 k;
>> +			long fsize = folio_size(folios[j]);
>> +
>> +			ret = add_to_unpin_list(&ubuf->unpin_list, folios[j]);
>> +			if (ret < 0) {
>> +				kfree(folios);
>> +				goto err;
>>   			}
>>
>> -			pgbuf++;
>> -			if (++pgoff == folio_nr_pages(folios[k])) {
>> -				pgoff = 0;
>> -				if (++k == nr_folios)
>> -					break;
>> +			for (k = pgoff; k < fsize; k += PAGE_SIZE) {
> I think renaming k to something like subpgoff or tailpgoff would make this
> more easier to understand.
Yes.
>
>> +				ubuf->folios[pgbuf] = folios[j];
>> +				ubuf->offsets[pgbuf] = k;
>> +				++pgbuf;
>> +
>> +				if (++cnt >= pgcnt)
>> +					goto end;
>>   			}
>>   		}
>> -
>> +end:
>> +		// update the number of pages that have already been set
>> up.
>> +		ubuf->pagecount += pgcnt;
> Since pgbuf also reflects the total number of pages (or folios) processed,
> I think you can use that instead of having to mess with pagecount.
Yes, but need take care of when into error, ubuf->pagecount also need 
update or else, deinit in error path will also "errorr"
>
> Thanks,
> Vivek
>
>>   		kvfree(folios);
>>   		fput(memfd);
>>   		memfd = NULL;
>> --
>> 2.45.2
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3 5/5] udmabuf: remove udmabuf_folio
  2024-08-17  1:05   ` Kasireddy, Vivek
@ 2024-08-20  1:41     ` Huan Yang
  0 siblings, 0 replies; 15+ messages in thread
From: Huan Yang @ 2024-08-20  1:41 UTC (permalink / raw)
  To: Kasireddy, Vivek, Gerd Hoffmann, Sumit Semwal,
	Christian König, dri-devel@lists.freedesktop.org,
	linux-media@vger.kernel.org, linaro-mm-sig@lists.linaro.org,
	linux-kernel@vger.kernel.org
  Cc: opensource.kernel@vivo.com


在 2024/8/17 9:05, Kasireddy, Vivek 写道:
> Hi Huan,
>
>> Currently, udmabuf handles folio by creating an unpin list to record
>> each folio obtained from the list and unpinning them when released. To
>> maintain this approach, many data structures have been established.
>>
>> However, maintaining this type of data structure requires a significant
>> amount of memory and traversing the list is a substantial overhead,
>> which is not friendly to the CPU cache.
>>
>> Considering that during creation, we arranged the folio array in the
>> order of pin and set the offset according to pgcnt.
>>
>> We actually don't need to use unpin_list to unpin during release.
>> Instead, we can iterate through the folios array during release and
>> unpin any folio that is different from the ones previously accessed.
> No, that won't work because iterating the folios array doesn't tell you
> anything about how many times a folio was pinned (via memfd_pin_folios()),
> as a folio could be part of multiple ranges.
>
> For example, if userspace provides ranges 64..128 and 256..512 (assuming
> these are 4k sized subpage offsets and we have a 2MB hugetlb folio), then
> the same folio would cover both ranges and there would be 2 entries for
> this folio in unpin_list. But, with your logic, we'd be erroneously unpinning
> it only once.
:(, too complex. I got a misunderstand, thank you.
>
> Not sure if there are any great solutions available to address this situation,
> but one option I can think of is to convert unpin_list to unpin array (dynamically
> resized with krealloc?) and track its length separately. Or, as suggested earlier,

Maybe just a folio array(size pagecount) set each folio like unpin list?

even if waste some memory, but access will fast than list. (and low than 
unpin_list)

> another way is to not use unpin_list for memfds backed by shmem, but I suspect
> this may not work if THP is enabled.
>
> Thanks,
> Vivek
>
>> By this, not only saves the overhead of the udmabuf_folio data structure
>> but also makes array access more cache-friendly.
>>
>> Signed-off-by: Huan Yang <link@vivo.com>
>> ---
>>   drivers/dma-buf/udmabuf.c | 68 +++++++++++++++++----------------------
>>   1 file changed, 30 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
>> index 8f9cb0e2e71a..1e7f46c33d1a 100644
>> --- a/drivers/dma-buf/udmabuf.c
>> +++ b/drivers/dma-buf/udmabuf.c
>> @@ -26,16 +26,19 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a
>> dmabuf, in megabytes. Default is
>>
>>   struct udmabuf {
>>   	pgoff_t pagecount;
>> -	struct folio **folios;
>>   	struct sg_table *sg;
>>   	struct miscdevice *device;
>> +	struct folio **folios;
>> +	/**
>> +	 * offset in folios array's folio, byte unit.
>> +	 * udmabuf can use either shmem or hugetlb pages, an array based
>> on
>> +	 * pages may not be suitable.
>> +	 * Especially when HVO is enabled, the tail page will be released,
>> +	 * so our reference to the page will no longer be correct.
>> +	 * Hence, it's necessary to record the offset in order to reference
>> +	 * the correct PFN within the folio.
>> +	 */
>>   	pgoff_t *offsets;
>> -	struct list_head unpin_list;
>> -};
>> -
>> -struct udmabuf_folio {
>> -	struct folio *folio;
>> -	struct list_head list;
>>   };
>>
>>   static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct
>> *vma)
>> @@ -160,32 +163,28 @@ static void unmap_udmabuf(struct
>> dma_buf_attachment *at,
>>   	return put_sg_table(at->dev, sg, direction);
>>   }
>>
>> -static void unpin_all_folios(struct list_head *unpin_list)
>> +/**
>> + * unpin_all_folios:		unpin each folio we pinned in create
>> + * The udmabuf set all folio in folios and pinned it, but for large folio,
>> + * We may have only used a small portion of the physical in the folio.
>> + * we will repeatedly, sequentially set the folio into the array to ensure
>> + * that the offset can index the correct folio at the corresponding index.
>> + * Hence, we only need to unpin the first iterred folio.
>> + */
>> +static void unpin_all_folios(struct udmabuf *ubuf)
>>   {
>> -	struct udmabuf_folio *ubuf_folio;
>> -
>> -	while (!list_empty(unpin_list)) {
>> -		ubuf_folio = list_first_entry(unpin_list,
>> -					      struct udmabuf_folio, list);
>> -		unpin_folio(ubuf_folio->folio);
>> -
>> -		list_del(&ubuf_folio->list);
>> -		kfree(ubuf_folio);
>> -	}
>> -}
>> +	pgoff_t pg;
>> +	struct folio *last = NULL;
>>
>> -static int add_to_unpin_list(struct list_head *unpin_list,
>> -			     struct folio *folio)
>> -{
>> -	struct udmabuf_folio *ubuf_folio;
>> +	for (pg = 0; pg < ubuf->pagecount; pg++) {
>> +		struct folio *tmp = ubuf->folios[pg];
>>
>> -	ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL);
>> -	if (!ubuf_folio)
>> -		return -ENOMEM;
>> +		if (tmp == last)
>> +			continue;
>>
>> -	ubuf_folio->folio = folio;
>> -	list_add_tail(&ubuf_folio->list, unpin_list);
>> -	return 0;
>> +		last = tmp;
>> +		unpin_folio(tmp);
>> +	}
>>   }
>>
>>   static void release_udmabuf(struct dma_buf *buf)
>> @@ -196,7 +195,7 @@ static void release_udmabuf(struct dma_buf *buf)
>>   	if (ubuf->sg)
>>   		put_sg_table(dev, ubuf->sg, DMA_BIDIRECTIONAL);
>>
>> -	unpin_all_folios(&ubuf->unpin_list);
>> +	unpin_all_folios(ubuf);
>>   	kvfree(ubuf->offsets);
>>   	kvfree(ubuf->folios);
>>   	kfree(ubuf);
>> @@ -308,7 +307,6 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   	if (!ubuf)
>>   		return -ENOMEM;
>>
>> -	INIT_LIST_HEAD(&ubuf->unpin_list);
>>   	pglimit = (size_limit_mb * 1024 * 1024) >> PAGE_SHIFT;
>>   	for (i = 0; i < head->count; i++) {
>>   		if (!IS_ALIGNED(list[i].offset, PAGE_SIZE))
>> @@ -366,12 +364,6 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   			u32 k;
>>   			long fsize = folio_size(folios[j]);
>>
>> -			ret = add_to_unpin_list(&ubuf->unpin_list, folios[j]);
>> -			if (ret < 0) {
>> -				kfree(folios);
>> -				goto err;
>> -			}
>> -
>>   			for (k = pgoff; k < fsize; k += PAGE_SIZE) {
>>   				ubuf->folios[pgbuf] = folios[j];
>>   				ubuf->offsets[pgbuf] = k;
>> @@ -399,7 +391,7 @@ static long udmabuf_create(struct miscdevice
>> *device,
>>   err:
>>   	if (memfd)
>>   		fput(memfd);
>> -	unpin_all_folios(&ubuf->unpin_list);
>> +	unpin_all_folios(ubuf);
>>   	kvfree(ubuf->offsets);
>>   	kvfree(ubuf->folios);
>>   	kfree(ubuf);
>> --
>> 2.45.2

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-08-20  1:42 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-13  9:05 [PATCH v3 0/5] udmbuf bug fix and some improvements Huan Yang
2024-08-13  9:05 ` [PATCH v3 1/5] udmabuf: cancel mmap page fault, direct map it Huan Yang
2024-08-17  0:53   ` Kasireddy, Vivek
2024-08-20  1:30     ` Huan Yang
2024-08-13  9:05 ` [PATCH v3 2/5] udmabuf: change folios array from kmalloc to kvmalloc Huan Yang
2024-08-13  9:05 ` [PATCH v3 3/5] fix vmap_udmabuf error page set Huan Yang
2024-08-17  0:54   ` Kasireddy, Vivek
2024-08-20  1:33     ` Huan Yang
2024-08-13  9:05 ` [PATCH v3 4/5] udmabuf: codestyle cleanup Huan Yang
2024-08-17  0:58   ` Kasireddy, Vivek
2024-08-20  1:37     ` Huan Yang
2024-08-13  9:05 ` [PATCH v3 5/5] udmabuf: remove udmabuf_folio Huan Yang
2024-08-16 12:54   ` kernel test robot
2024-08-17  1:05   ` Kasireddy, Vivek
2024-08-20  1:41     ` Huan Yang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.