* [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
@ 2025-06-30 7:25 lizhe.67
2025-06-30 7:25 ` [PATCH 1/4] vfio/type1: optimize vfio_pin_pages_remote() for large folios lizhe.67
` (4 more replies)
0 siblings, 5 replies; 24+ messages in thread
From: lizhe.67 @ 2025-06-30 7:25 UTC (permalink / raw)
To: alex.williamson, jgg, david, peterx; +Cc: kvm, linux-kernel, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
This patchset is a consolidation of the two previous patchsets[1][2].
When vfio_pin_pages_remote() is called with a range of addresses that
includes large folios, the function currently performs individual
statistics counting operations for each page. This can lead to significant
performance overheads, especially when dealing with large ranges of pages.
The function vfio_unpin_pages_remote() has a similar issue, where executing
put_pfn() for each pfn incurs considerable overhead.
This patchset optimizes the performance of the relevant functions by
batching the less efficient operations mentioned before.
The first patch optimizes the performance of the function
vfio_pin_pages_remote(), while the remaining patches optimize the
performance of the function vfio_unpin_pages_remote().
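At its core, the per-page accounting in the inner loop of
vfio_pin_pages_remote() becomes a per-batch operation, roughly
(simplified from patch 1):

	nr_pages = contig_pages(dma, batch, iova);
	if (!rsvd)
		acct_pages = nr_pages - vpfn_pages(dma, iova, nr_pages);
	...
	lock_acct += acct_pages;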
The performance test results, based on v6.16-rc4, for completing the 16G
VFIO MAP/UNMAP DMA, obtained through unit test[3] with slight
modifications[4], are as follows.
Base(6.16-rc4):
./vfio-pci-mem-dma-map 0000:03:00.0 16
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.047 s (340.2 GB/s)
VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.280 s (57.2 GB/s)
VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.052 s (310.5 GB/s)
VFIO UNMAP DMA in 0.136 s (117.3 GB/s)
With this patchset:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.027 s (596.4 GB/s)
VFIO UNMAP DMA in 0.045 s (357.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.288 s (55.5 GB/s)
VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.031 s (508.3 GB/s)
VFIO UNMAP DMA in 0.045 s (352.9 GB/s)
For large folios, we achieve an over 40% performance improvement for VFIO
MAP DMA and an over 66% improvement for VFIO UNMAP DMA. For small
folios, the results show little difference from the performance before
optimization.
[1]: https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@bytedance.com/
[2]: https://lore.kernel.org/all/20250620032344.13382-1-lizhe.67@bytedance.com/
[3]: https://github.com/awilliam/tests/blob/vfio-pci-mem-dma-map/vfio-pci-mem-dma-map.c
[4]: https://lore.kernel.org/all/20250610031013.98556-1-lizhe.67@bytedance.com/
Li Zhe (4):
vfio/type1: optimize vfio_pin_pages_remote() for large folios
vfio/type1: batch vfio_find_vpfn() in function
vfio_unpin_pages_remote()
vfio/type1: introduce a new member has_rsvd for struct vfio_dma
vfio/type1: optimize vfio_unpin_pages_remote() for large folio
drivers/vfio/vfio_iommu_type1.c | 121 ++++++++++++++++++++++++++------
1 file changed, 100 insertions(+), 21 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 1/4] vfio/type1: optimize vfio_pin_pages_remote() for large folios
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
@ 2025-06-30 7:25 ` lizhe.67
2025-06-30 7:25 ` [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
` (3 subsequent siblings)
4 siblings, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-06-30 7:25 UTC (permalink / raw)
To: alex.williamson, jgg, david, peterx; +Cc: kvm, linux-kernel, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
When vfio_pin_pages_remote() is called with a range of addresses that
includes large folios, the function currently performs individual
statistics counting operations for each page. This can lead to significant
performance overheads, especially when dealing with large ranges of pages.
This patch optimizes this process by batching the statistics counting
operations.
The performance test results for completing the 16G VFIO IOMMU DMA mapping
are as follows.
Base(v6.16-rc4):
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.047 s (340.2 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.280 s (57.2 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.052 s (310.5 GB/s)
With this patch:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.027 s (596.5 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.290 s (55.2 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.031 s (511.1 GB/s)
For large folios, we achieve an over 40% performance improvement.
For small folios, the performance test results indicate only a
minor performance drop.
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
drivers/vfio/vfio_iommu_type1.c | 93 ++++++++++++++++++++++++++++-----
1 file changed, 81 insertions(+), 12 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1136d7ac6b59..a2d7abd4f2c2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -318,7 +318,13 @@ static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
/*
* Helper Functions for host iova-pfn list
*/
-static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
+
+/*
+ * Find the highest vfio_pfn that overlaps the range
+ * [iova_start, iova_end) in the rb tree.
+ */
+static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma,
+ dma_addr_t iova_start, dma_addr_t iova_end)
{
struct vfio_pfn *vpfn;
struct rb_node *node = dma->pfn_list.rb_node;
@@ -326,9 +332,9 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
while (node) {
vpfn = rb_entry(node, struct vfio_pfn, node);
- if (iova < vpfn->iova)
+ if (iova_end <= vpfn->iova)
node = node->rb_left;
- else if (iova > vpfn->iova)
+ else if (iova_start > vpfn->iova)
node = node->rb_right;
else
return vpfn;
@@ -336,6 +342,11 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
return NULL;
}
+static inline struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
+{
+ return vfio_find_vpfn_range(dma, iova, iova + PAGE_SIZE);
+}
+
static void vfio_link_pfn(struct vfio_dma *dma,
struct vfio_pfn *new)
{
@@ -614,6 +625,56 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
return ret;
}
+static long contig_pages(struct vfio_dma *dma,
+ struct vfio_batch *batch, dma_addr_t iova)
+{
+ struct page *page = batch->pages[batch->offset];
+ struct folio *folio = page_folio(page);
+ long idx = folio_page_idx(folio, page);
+ long max = min_t(long, batch->size, folio_nr_pages(folio) - idx);
+ long nr_pages;
+
+ for (nr_pages = 1; nr_pages < max; nr_pages++) {
+ if (batch->pages[batch->offset + nr_pages] !=
+ folio_page(folio, idx + nr_pages))
+ break;
+ }
+
+ return nr_pages;
+}
+
+static long vpfn_pages(struct vfio_dma *dma,
+ dma_addr_t iova_start, long nr_pages)
+{
+ dma_addr_t iova_end = iova_start + (nr_pages << PAGE_SHIFT);
+ struct vfio_pfn *top = vfio_find_vpfn_range(dma, iova_start, iova_end);
+ long ret = 1;
+ struct vfio_pfn *vpfn;
+ struct rb_node *prev;
+ struct rb_node *next;
+
+ if (likely(!top))
+ return 0;
+
+ prev = next = &top->node;
+
+ while ((prev = rb_prev(prev))) {
+ vpfn = rb_entry(prev, struct vfio_pfn, node);
+ if (vpfn->iova < iova_start)
+ break;
+ ret++;
+ }
+
+ while ((next = rb_next(next))) {
+ vpfn = rb_entry(next, struct vfio_pfn, node);
+ if (vpfn->iova >= iova_end)
+ break;
+ ret++;
+ }
+
+ return ret;
+}
+
/*
* Attempt to pin pages. We really don't want to track all the pfns and
* the iommu can only map chunks of consecutive pfns anyway, so get the
@@ -680,32 +741,40 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
* and rsvd here, and therefore continues to use the batch.
*/
while (true) {
+ long nr_pages, acct_pages = 0;
+
if (pfn != *pfn_base + pinned ||
rsvd != is_invalid_reserved_pfn(pfn))
goto out;
+ nr_pages = contig_pages(dma, batch, iova);
+ if (!rsvd) {
+ acct_pages = nr_pages;
+ acct_pages -= vpfn_pages(dma, iova, nr_pages);
+ }
+
/*
* Reserved pages aren't counted against the user,
* externally pinned pages are already counted against
* the user.
*/
- if (!rsvd && !vfio_find_vpfn(dma, iova)) {
+ if (acct_pages) {
if (!dma->lock_cap &&
- mm->locked_vm + lock_acct + 1 > limit) {
+ mm->locked_vm + lock_acct + acct_pages > limit) {
pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
__func__, limit << PAGE_SHIFT);
ret = -ENOMEM;
goto unpin_out;
}
- lock_acct++;
+ lock_acct += acct_pages;
}
- pinned++;
- npage--;
- vaddr += PAGE_SIZE;
- iova += PAGE_SIZE;
- batch->offset++;
- batch->size--;
+ pinned += nr_pages;
+ npage -= nr_pages;
+ vaddr += PAGE_SIZE * nr_pages;
+ iova += PAGE_SIZE * nr_pages;
+ batch->offset += nr_pages;
+ batch->size -= nr_pages;
if (!batch->size)
break;
--
2.20.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-30 7:25 ` [PATCH 1/4] vfio/type1: optimize vfio_pin_pages_remote() for large folios lizhe.67
@ 2025-06-30 7:25 ` lizhe.67
2025-07-02 18:27 ` Jason Gunthorpe
2025-06-30 7:25 ` [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma lizhe.67
` (2 subsequent siblings)
4 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-06-30 7:25 UTC (permalink / raw)
To: alex.williamson, jgg, david, peterx; +Cc: kvm, linux-kernel, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
The function vpfn_pages() can help us determine the number of vpfn
nodes on the vpfn rb tree within a specified range. This allows us
to avoid searching for each vpfn individually in the function
vfio_unpin_pages_remote(). This patch batches the vfio_find_vpfn()
calls in function vfio_unpin_pages_remote().
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
drivers/vfio/vfio_iommu_type1.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a2d7abd4f2c2..330fff4fe96d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -804,16 +804,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
unsigned long pfn, unsigned long npage,
bool do_accounting)
{
- long unlocked = 0, locked = 0;
+ long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
long i;
- for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
- if (put_pfn(pfn++, dma->prot)) {
+ for (i = 0; i < npage; i++)
+ if (put_pfn(pfn++, dma->prot))
unlocked++;
- if (vfio_find_vpfn(dma, iova))
- locked++;
- }
- }
if (do_accounting)
vfio_lock_acct(dma, locked - unlocked, true);
--
2.20.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-30 7:25 ` [PATCH 1/4] vfio/type1: optimize vfio_pin_pages_remote() for large folios lizhe.67
2025-06-30 7:25 ` [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
@ 2025-06-30 7:25 ` lizhe.67
2025-07-01 15:13 ` Dan Carpenter
2025-06-30 7:25 ` [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
2025-07-02 8:15 ` [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and " David Hildenbrand
4 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-06-30 7:25 UTC (permalink / raw)
To: alex.williamson, jgg, david, peterx; +Cc: kvm, linux-kernel, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
Introduce a new member has_rsvd for struct vfio_dma. This member
indicates whether the region represented by this vfio_dma contains
any reserved or invalid pfns: if true, at least one pfn in this
region is either reserved or invalid.
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
drivers/vfio/vfio_iommu_type1.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 330fff4fe96d..a02bc340c112 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -92,6 +92,7 @@ struct vfio_dma {
bool iommu_mapped;
bool lock_cap; /* capable(CAP_IPC_LOCK) */
bool vaddr_invalid;
+ bool has_rsvd; /* has 1 or more rsvd pfns */
struct task_struct *task;
struct rb_root pfn_list; /* Ex-user pinned pfn list */
unsigned long *bitmap;
@@ -784,6 +785,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
}
out:
+ dma->has_rsvd |= rsvd;
ret = vfio_lock_acct(dma, lock_acct, false);
unpin_out:
--
2.20.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
` (2 preceding siblings ...)
2025-06-30 7:25 ` [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma lizhe.67
@ 2025-06-30 7:25 ` lizhe.67
2025-07-02 18:28 ` Jason Gunthorpe
2025-07-02 8:15 ` [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and " David Hildenbrand
4 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-06-30 7:25 UTC (permalink / raw)
To: alex.williamson, jgg, david, peterx; +Cc: kvm, linux-kernel, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
When vfio_unpin_pages_remote() is called with a range of addresses that
includes large folios, the function currently performs individual
put_pfn() operations for each page. This can lead to significant
performance overheads, especially when dealing with large ranges of pages.
It would be very rare for reserved PFNs and non-reserved PFNs to be
mixed within the same range, so this patch utilizes the has_rsvd variable
introduced in the previous patch to determine whether batch put_pfn()
operations can be performed. Moreover, compared to put_pfn(),
unpin_user_page_range_dirty_lock() is capable of handling large folio
scenarios more efficiently.
The performance test results for completing the 16G VFIO IOMMU DMA
unmapping are as follows.
Base(v6.16-rc4):
./vfio-pci-mem-dma-map 0000:03:00.0 16
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO UNMAP DMA in 0.136 s (117.3 GB/s)
With this patchset:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO UNMAP DMA in 0.045 s (357.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO UNMAP DMA in 0.045 s (352.9 GB/s)
For large folios, we achieve an over 66% performance improvement for
VFIO UNMAP DMA. For small folios, the performance test results show
no significant change.
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
drivers/vfio/vfio_iommu_type1.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a02bc340c112..7cacfb2cefe3 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -802,17 +802,29 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
return pinned;
}
+static inline void put_valid_unreserved_pfns(unsigned long start_pfn,
+ unsigned long npage, int prot)
+{
+ unpin_user_page_range_dirty_lock(pfn_to_page(start_pfn), npage,
+ prot & IOMMU_WRITE);
+}
+
static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
unsigned long pfn, unsigned long npage,
bool do_accounting)
{
long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
- long i;
- for (i = 0; i < npage; i++)
- if (put_pfn(pfn++, dma->prot))
- unlocked++;
+ if (dma->has_rsvd) {
+ long i;
+ for (i = 0; i < npage; i++)
+ if (put_pfn(pfn++, dma->prot))
+ unlocked++;
+ } else {
+ put_valid_unreserved_pfns(pfn, npage, dma->prot);
+ unlocked = npage;
+ }
if (do_accounting)
vfio_lock_acct(dma, locked - unlocked, true);
--
2.20.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma
2025-06-30 7:25 ` [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma lizhe.67
@ 2025-07-01 15:13 ` Dan Carpenter
2025-07-02 3:47 ` lizhe.67
0 siblings, 1 reply; 24+ messages in thread
From: Dan Carpenter @ 2025-07-01 15:13 UTC (permalink / raw)
To: oe-kbuild, lizhe.67, alex.williamson, jgg, david, peterx
Cc: lkp, oe-kbuild-all, kvm, linux-kernel, lizhe.67
Hi,
kernel test robot noticed the following build warnings:
url: https://github.com/intel-lab-lkp/linux/commits/lizhe-67-bytedance-com/vfio-type1-optimize-vfio_pin_pages_remote-for-large-folios/20250630-152849
base: https://github.com/awilliam/linux-vfio.git next
patch link: https://lore.kernel.org/r/20250630072518.31846-4-lizhe.67%40bytedance.com
patch subject: [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma
config: x86_64-randconfig-161-20250701 (https://download.01.org/0day-ci/archive/20250701/202507012121.wkDLcDXn-lkp@intel.com/config)
compiler: clang version 20.1.7 (https://github.com/llvm/llvm-project 6146a88f60492b520a36f8f8f3231e15f3cc6082)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202507012121.wkDLcDXn-lkp@intel.com/
New smatch warnings:
drivers/vfio/vfio_iommu_type1.c:788 vfio_pin_pages_remote() error: uninitialized symbol 'rsvd'.
Old smatch warnings:
drivers/vfio/vfio_iommu_type1.c:2376 vfio_iommu_type1_attach_group() warn: '&group->next' not removed from list
vim +/rsvd +788 drivers/vfio/vfio_iommu_type1.c
8f0d5bb95f763c Kirti Wankhede 2016-11-17 684 static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
0635559233434a Alex Williamson 2025-02-18 685 unsigned long npage, unsigned long *pfn_base,
4b6c33b3229678 Daniel Jordan 2021-02-19 686 unsigned long limit, struct vfio_batch *batch)
73fa0d10d077d9 Alex Williamson 2012-07-31 687 {
4d83de6da265cd Daniel Jordan 2021-02-19 688 unsigned long pfn;
4d83de6da265cd Daniel Jordan 2021-02-19 689 struct mm_struct *mm = current->mm;
6c38c055cc4c0a Alex Williamson 2016-12-30 690 long ret, pinned = 0, lock_acct = 0;
89c29def6b0101 Alex Williamson 2018-06-02 691 bool rsvd;
a54eb55045ae9b Kirti Wankhede 2016-11-17 692 dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
166fd7d94afdac Alex Williamson 2013-06-21 693
6c38c055cc4c0a Alex Williamson 2016-12-30 694 /* This code path is only user initiated */
4d83de6da265cd Daniel Jordan 2021-02-19 695 if (!mm)
166fd7d94afdac Alex Williamson 2013-06-21 696 return -ENODEV;
73fa0d10d077d9 Alex Williamson 2012-07-31 697
4d83de6da265cd Daniel Jordan 2021-02-19 698 if (batch->size) {
4d83de6da265cd Daniel Jordan 2021-02-19 699 /* Leftover pages in batch from an earlier call. */
4d83de6da265cd Daniel Jordan 2021-02-19 700 *pfn_base = page_to_pfn(batch->pages[batch->offset]);
4d83de6da265cd Daniel Jordan 2021-02-19 701 pfn = *pfn_base;
89c29def6b0101 Alex Williamson 2018-06-02 702 rsvd = is_invalid_reserved_pfn(*pfn_base);
4d83de6da265cd Daniel Jordan 2021-02-19 703 } else {
4d83de6da265cd Daniel Jordan 2021-02-19 704 *pfn_base = 0;
5c6c2b21ecc9ad Alex Williamson 2013-06-21 705 }
5c6c2b21ecc9ad Alex Williamson 2013-06-21 706
eb996eec783c1e Alex Williamson 2025-02-18 707 if (unlikely(disable_hugepages))
eb996eec783c1e Alex Williamson 2025-02-18 708 npage = 1;
eb996eec783c1e Alex Williamson 2025-02-18 709
4d83de6da265cd Daniel Jordan 2021-02-19 710 while (npage) {
4d83de6da265cd Daniel Jordan 2021-02-19 711 if (!batch->size) {
4d83de6da265cd Daniel Jordan 2021-02-19 712 /* Empty batch, so refill it. */
eb996eec783c1e Alex Williamson 2025-02-18 713 ret = vaddr_get_pfns(mm, vaddr, npage, dma->prot,
eb996eec783c1e Alex Williamson 2025-02-18 714 &pfn, batch);
be16c1fd99f41a Daniel Jordan 2021-02-19 715 if (ret < 0)
4d83de6da265cd Daniel Jordan 2021-02-19 716 goto unpin_out;
166fd7d94afdac Alex Williamson 2013-06-21 717
4d83de6da265cd Daniel Jordan 2021-02-19 718 if (!*pfn_base) {
4d83de6da265cd Daniel Jordan 2021-02-19 719 *pfn_base = pfn;
4d83de6da265cd Daniel Jordan 2021-02-19 720 rsvd = is_invalid_reserved_pfn(*pfn_base);
4d83de6da265cd Daniel Jordan 2021-02-19 721 }
If "*pfn_base" is true then "rsvd" is uninitialized.
eb996eec783c1e Alex Williamson 2025-02-18 722
eb996eec783c1e Alex Williamson 2025-02-18 723 /* Handle pfnmap */
eb996eec783c1e Alex Williamson 2025-02-18 724 if (!batch->size) {
eb996eec783c1e Alex Williamson 2025-02-18 725 if (pfn != *pfn_base + pinned || !rsvd)
eb996eec783c1e Alex Williamson 2025-02-18 726 goto out;
eb996eec783c1e Alex Williamson 2025-02-18 727
eb996eec783c1e Alex Williamson 2025-02-18 728 pinned += ret;
eb996eec783c1e Alex Williamson 2025-02-18 729 npage -= ret;
eb996eec783c1e Alex Williamson 2025-02-18 730 vaddr += (PAGE_SIZE * ret);
eb996eec783c1e Alex Williamson 2025-02-18 731 iova += (PAGE_SIZE * ret);
eb996eec783c1e Alex Williamson 2025-02-18 732 continue;
eb996eec783c1e Alex Williamson 2025-02-18 733 }
166fd7d94afdac Alex Williamson 2013-06-21 734 }
166fd7d94afdac Alex Williamson 2013-06-21 735
4d83de6da265cd Daniel Jordan 2021-02-19 736 /*
eb996eec783c1e Alex Williamson 2025-02-18 737 * pfn is preset for the first iteration of this inner loop
eb996eec783c1e Alex Williamson 2025-02-18 738 * due to the fact that vaddr_get_pfns() needs to provide the
eb996eec783c1e Alex Williamson 2025-02-18 739 * initial pfn for pfnmaps. Therefore to reduce redundancy,
eb996eec783c1e Alex Williamson 2025-02-18 740 * the next pfn is fetched at the end of the loop.
eb996eec783c1e Alex Williamson 2025-02-18 741 * A PageReserved() page could still qualify as page backed
eb996eec783c1e Alex Williamson 2025-02-18 742 * and rsvd here, and therefore continues to use the batch.
4d83de6da265cd Daniel Jordan 2021-02-19 743 */
4d83de6da265cd Daniel Jordan 2021-02-19 744 while (true) {
6a2d9b72168041 Li Zhe 2025-06-30 745 long nr_pages, acct_pages = 0;
6a2d9b72168041 Li Zhe 2025-06-30 746
4d83de6da265cd Daniel Jordan 2021-02-19 747 if (pfn != *pfn_base + pinned ||
4d83de6da265cd Daniel Jordan 2021-02-19 748 rsvd != is_invalid_reserved_pfn(pfn))
4d83de6da265cd Daniel Jordan 2021-02-19 749 goto out;
4d83de6da265cd Daniel Jordan 2021-02-19 750
6a2d9b72168041 Li Zhe 2025-06-30 751 nr_pages = contig_pages(dma, batch, iova);
6a2d9b72168041 Li Zhe 2025-06-30 752 if (!rsvd) {
6a2d9b72168041 Li Zhe 2025-06-30 753 acct_pages = nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 754 acct_pages -= vpfn_pages(dma, iova, nr_pages);
6a2d9b72168041 Li Zhe 2025-06-30 755 }
6a2d9b72168041 Li Zhe 2025-06-30 756
4d83de6da265cd Daniel Jordan 2021-02-19 757 /*
4d83de6da265cd Daniel Jordan 2021-02-19 758 * Reserved pages aren't counted against the user,
4d83de6da265cd Daniel Jordan 2021-02-19 759 * externally pinned pages are already counted against
4d83de6da265cd Daniel Jordan 2021-02-19 760 * the user.
4d83de6da265cd Daniel Jordan 2021-02-19 761 */
6a2d9b72168041 Li Zhe 2025-06-30 762 if (acct_pages) {
48d8476b41eed6 Alex Williamson 2018-05-11 763 if (!dma->lock_cap &&
6a2d9b72168041 Li Zhe 2025-06-30 764 mm->locked_vm + lock_acct + acct_pages > limit) {
6c38c055cc4c0a Alex Williamson 2016-12-30 765 pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
6c38c055cc4c0a Alex Williamson 2016-12-30 766 __func__, limit << PAGE_SHIFT);
0cfef2b7410b64 Alex Williamson 2017-04-13 767 ret = -ENOMEM;
0cfef2b7410b64 Alex Williamson 2017-04-13 768 goto unpin_out;
166fd7d94afdac Alex Williamson 2013-06-21 769 }
6a2d9b72168041 Li Zhe 2025-06-30 770 lock_acct += acct_pages;
a54eb55045ae9b Kirti Wankhede 2016-11-17 771 }
4d83de6da265cd Daniel Jordan 2021-02-19 772
6a2d9b72168041 Li Zhe 2025-06-30 773 pinned += nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 774 npage -= nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 775 vaddr += PAGE_SIZE * nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 776 iova += PAGE_SIZE * nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 777 batch->offset += nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 778 batch->size -= nr_pages;
4d83de6da265cd Daniel Jordan 2021-02-19 779
4d83de6da265cd Daniel Jordan 2021-02-19 780 if (!batch->size)
4d83de6da265cd Daniel Jordan 2021-02-19 781 break;
4d83de6da265cd Daniel Jordan 2021-02-19 782
4d83de6da265cd Daniel Jordan 2021-02-19 783 pfn = page_to_pfn(batch->pages[batch->offset]);
4d83de6da265cd Daniel Jordan 2021-02-19 784 }
a54eb55045ae9b Kirti Wankhede 2016-11-17 785 }
166fd7d94afdac Alex Williamson 2013-06-21 786
6c38c055cc4c0a Alex Williamson 2016-12-30 787 out:
20448310d6b71d Li Zhe 2025-06-30 @788 dma->has_rsvd |= rsvd;
^^^^
48d8476b41eed6 Alex Williamson 2018-05-11 789 ret = vfio_lock_acct(dma, lock_acct, false);
0cfef2b7410b64 Alex Williamson 2017-04-13 790
0cfef2b7410b64 Alex Williamson 2017-04-13 791 unpin_out:
be16c1fd99f41a Daniel Jordan 2021-02-19 792 if (ret < 0) {
4d83de6da265cd Daniel Jordan 2021-02-19 793 if (pinned && !rsvd) {
0cfef2b7410b64 Alex Williamson 2017-04-13 794 for (pfn = *pfn_base ; pinned ; pfn++, pinned--)
0cfef2b7410b64 Alex Williamson 2017-04-13 795 put_pfn(pfn, dma->prot);
89c29def6b0101 Alex Williamson 2018-06-02 796 }
4d83de6da265cd Daniel Jordan 2021-02-19 797 vfio_batch_unpin(batch, dma);
0cfef2b7410b64 Alex Williamson 2017-04-13 798
0cfef2b7410b64 Alex Williamson 2017-04-13 799 return ret;
0cfef2b7410b64 Alex Williamson 2017-04-13 800 }
166fd7d94afdac Alex Williamson 2013-06-21 801
6c38c055cc4c0a Alex Williamson 2016-12-30 802 return pinned;
73fa0d10d077d9 Alex Williamson 2012-07-31 803 }
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma
2025-07-01 15:13 ` Dan Carpenter
@ 2025-07-02 3:47 ` lizhe.67
2025-07-02 16:11 ` Dan Carpenter
0 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-07-02 3:47 UTC (permalink / raw)
To: dan.carpenter
Cc: alex.williamson, david, jgg, kvm, linux-kernel, lizhe.67, lkp,
oe-kbuild-all, oe-kbuild, peterx
On Tue, 1 Jul 2025 18:13:48 +0300, dan.carpenter@linaro.org wrote:
> New smatch warnings:
> drivers/vfio/vfio_iommu_type1.c:788 vfio_pin_pages_remote() error: uninitialized symbol 'rsvd'.
>
> Old smatch warnings:
> drivers/vfio/vfio_iommu_type1.c:2376 vfio_iommu_type1_attach_group() warn: '&group->next' not removed from list
>
> vim +/rsvd +788 drivers/vfio/vfio_iommu_type1.c
>
> 8f0d5bb95f763c Kirti Wankhede 2016-11-17 684 static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> 0635559233434a Alex Williamson 2025-02-18 685 unsigned long npage, unsigned long *pfn_base,
> 4b6c33b3229678 Daniel Jordan 2021-02-19 686 unsigned long limit, struct vfio_batch *batch)
> 73fa0d10d077d9 Alex Williamson 2012-07-31 687 {
> 4d83de6da265cd Daniel Jordan 2021-02-19 688 unsigned long pfn;
> 4d83de6da265cd Daniel Jordan 2021-02-19 689 struct mm_struct *mm = current->mm;
> 6c38c055cc4c0a Alex Williamson 2016-12-30 690 long ret, pinned = 0, lock_acct = 0;
> 89c29def6b0101 Alex Williamson 2018-06-02 691 bool rsvd;
> a54eb55045ae9b Kirti Wankhede 2016-11-17 692 dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
> 166fd7d94afdac Alex Williamson 2013-06-21 693
> 6c38c055cc4c0a Alex Williamson 2016-12-30 694 /* This code path is only user initiated */
> 4d83de6da265cd Daniel Jordan 2021-02-19 695 if (!mm)
> 166fd7d94afdac Alex Williamson 2013-06-21 696 return -ENODEV;
> 73fa0d10d077d9 Alex Williamson 2012-07-31 697
> 4d83de6da265cd Daniel Jordan 2021-02-19 698 if (batch->size) {
> 4d83de6da265cd Daniel Jordan 2021-02-19 699 /* Leftover pages in batch from an earlier call. */
> 4d83de6da265cd Daniel Jordan 2021-02-19 700 *pfn_base = page_to_pfn(batch->pages[batch->offset]);
> 4d83de6da265cd Daniel Jordan 2021-02-19 701 pfn = *pfn_base;
> 89c29def6b0101 Alex Williamson 2018-06-02 702 rsvd = is_invalid_reserved_pfn(*pfn_base);
When batch->size is not zero, we initialize rsvd here.
> 4d83de6da265cd Daniel Jordan 2021-02-19 703 } else {
> 4d83de6da265cd Daniel Jordan 2021-02-19 704 *pfn_base = 0;
When the value of batch->size is zero, we set the value of *pfn_base
to zero and do not initialize rsvd for the time being.
> 5c6c2b21ecc9ad Alex Williamson 2013-06-21 705 }
> 5c6c2b21ecc9ad Alex Williamson 2013-06-21 706
> eb996eec783c1e Alex Williamson 2025-02-18 707 if (unlikely(disable_hugepages))
> eb996eec783c1e Alex Williamson 2025-02-18 708 npage = 1;
> eb996eec783c1e Alex Williamson 2025-02-18 709
> 4d83de6da265cd Daniel Jordan 2021-02-19 710 while (npage) {
> 4d83de6da265cd Daniel Jordan 2021-02-19 711 if (!batch->size) {
> 4d83de6da265cd Daniel Jordan 2021-02-19 712 /* Empty batch, so refill it. */
> eb996eec783c1e Alex Williamson 2025-02-18 713 ret = vaddr_get_pfns(mm, vaddr, npage, dma->prot,
> eb996eec783c1e Alex Williamson 2025-02-18 714 &pfn, batch);
> be16c1fd99f41a Daniel Jordan 2021-02-19 715 if (ret < 0)
> 4d83de6da265cd Daniel Jordan 2021-02-19 716 goto unpin_out;
> 166fd7d94afdac Alex Williamson 2013-06-21 717
> 4d83de6da265cd Daniel Jordan 2021-02-19 718 if (!*pfn_base) {
> 4d83de6da265cd Daniel Jordan 2021-02-19 719 *pfn_base = pfn;
> 4d83de6da265cd Daniel Jordan 2021-02-19 720 rsvd = is_invalid_reserved_pfn(*pfn_base);
Therefore, for the first loop, when batch->size is zero, *pfn_base must
be zero, which will then lead to the initialization of rsvd.
> 4d83de6da265cd Daniel Jordan 2021-02-19 721 }
>
> If "*pfn_base" is true then "rsvd" is uninitialized.
>
> eb996eec783c1e Alex Williamson 2025-02-18 722
> eb996eec783c1e Alex Williamson 2025-02-18 723 /* Handle pfnmap */
> eb996eec783c1e Alex Williamson 2025-02-18 724 if (!batch->size) {
> eb996eec783c1e Alex Williamson 2025-02-18 725 if (pfn != *pfn_base + pinned || !rsvd)
> eb996eec783c1e Alex Williamson 2025-02-18 726 goto out;
>
> eb996eec783c1e Alex Williamson 2025-02-18 727
> eb996eec783c1e Alex Williamson 2025-02-18 728 pinned += ret;
> eb996eec783c1e Alex Williamson 2025-02-18 729 npage -= ret;
> eb996eec783c1e Alex Williamson 2025-02-18 730 vaddr += (PAGE_SIZE * ret);
> eb996eec783c1e Alex Williamson 2025-02-18 731 iova += (PAGE_SIZE * ret);
> eb996eec783c1e Alex Williamson 2025-02-18 732 continue;
> eb996eec783c1e Alex Williamson 2025-02-18 733 }
> 166fd7d94afdac Alex Williamson 2013-06-21 734 }
> 166fd7d94afdac Alex Williamson 2013-06-21 735
> 4d83de6da265cd Daniel Jordan 2021-02-19 736 /*
> eb996eec783c1e Alex Williamson 2025-02-18 737 * pfn is preset for the first iteration of this inner loop
> eb996eec783c1e Alex Williamson 2025-02-18 738 * due to the fact that vaddr_get_pfns() needs to provide the
> eb996eec783c1e Alex Williamson 2025-02-18 739 * initial pfn for pfnmaps. Therefore to reduce redundancy,
> eb996eec783c1e Alex Williamson 2025-02-18 740 * the next pfn is fetched at the end of the loop.
> eb996eec783c1e Alex Williamson 2025-02-18 741 * A PageReserved() page could still qualify as page backed
> eb996eec783c1e Alex Williamson 2025-02-18 742 * and rsvd here, and therefore continues to use the batch.
> 4d83de6da265cd Daniel Jordan 2021-02-19 743 */
> 4d83de6da265cd Daniel Jordan 2021-02-19 744 while (true) {
> 6a2d9b72168041 Li Zhe 2025-06-30 745 long nr_pages, acct_pages = 0;
> 6a2d9b72168041 Li Zhe 2025-06-30 746
> 4d83de6da265cd Daniel Jordan 2021-02-19 747 if (pfn != *pfn_base + pinned ||
> 4d83de6da265cd Daniel Jordan 2021-02-19 748 rsvd != is_invalid_reserved_pfn(pfn))
> 4d83de6da265cd Daniel Jordan 2021-02-19 749 goto out;
> 4d83de6da265cd Daniel Jordan 2021-02-19 750
> 6a2d9b72168041 Li Zhe 2025-06-30 751 nr_pages = contig_pages(dma, batch, iova);
> 6a2d9b72168041 Li Zhe 2025-06-30 752 if (!rsvd) {
> 6a2d9b72168041 Li Zhe 2025-06-30 753 acct_pages = nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 754 acct_pages -= vpfn_pages(dma, iova, nr_pages);
> 6a2d9b72168041 Li Zhe 2025-06-30 755 }
> 6a2d9b72168041 Li Zhe 2025-06-30 756
> 4d83de6da265cd Daniel Jordan 2021-02-19 757 /*
> 4d83de6da265cd Daniel Jordan 2021-02-19 758 * Reserved pages aren't counted against the user,
> 4d83de6da265cd Daniel Jordan 2021-02-19 759 * externally pinned pages are already counted against
> 4d83de6da265cd Daniel Jordan 2021-02-19 760 * the user.
> 4d83de6da265cd Daniel Jordan 2021-02-19 761 */
> 6a2d9b72168041 Li Zhe 2025-06-30 762 if (acct_pages) {
> 48d8476b41eed6 Alex Williamson 2018-05-11 763 if (!dma->lock_cap &&
> 6a2d9b72168041 Li Zhe 2025-06-30 764 mm->locked_vm + lock_acct + acct_pages > limit) {
> 6c38c055cc4c0a Alex Williamson 2016-12-30 765 pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
> 6c38c055cc4c0a Alex Williamson 2016-12-30 766 __func__, limit << PAGE_SHIFT);
> 0cfef2b7410b64 Alex Williamson 2017-04-13 767 ret = -ENOMEM;
> 0cfef2b7410b64 Alex Williamson 2017-04-13 768 goto unpin_out;
> 166fd7d94afdac Alex Williamson 2013-06-21 769 }
> 6a2d9b72168041 Li Zhe 2025-06-30 770 lock_acct += acct_pages;
> a54eb55045ae9b Kirti Wankhede 2016-11-17 771 }
> 4d83de6da265cd Daniel Jordan 2021-02-19 772
> 6a2d9b72168041 Li Zhe 2025-06-30 773 pinned += nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 774 npage -= nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 775 vaddr += PAGE_SIZE * nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 776 iova += PAGE_SIZE * nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 777 batch->offset += nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 778 batch->size -= nr_pages;
> 4d83de6da265cd Daniel Jordan 2021-02-19 779
> 4d83de6da265cd Daniel Jordan 2021-02-19 780 if (!batch->size)
> 4d83de6da265cd Daniel Jordan 2021-02-19 781 break;
> 4d83de6da265cd Daniel Jordan 2021-02-19 782
> 4d83de6da265cd Daniel Jordan 2021-02-19 783 pfn = page_to_pfn(batch->pages[batch->offset]);
> 4d83de6da265cd Daniel Jordan 2021-02-19 784 }
> a54eb55045ae9b Kirti Wankhede 2016-11-17 785 }
> 166fd7d94afdac Alex Williamson 2013-06-21 786
> 6c38c055cc4c0a Alex Williamson 2016-12-30 787 out:
> 20448310d6b71d Li Zhe 2025-06-30 @788 dma->has_rsvd |= rsvd;
> ^^^^
In summary, it is likely to be a false alarm.
Please correct me if I am wrong.
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
` (3 preceding siblings ...)
2025-06-30 7:25 ` [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
@ 2025-07-02 8:15 ` David Hildenbrand
2025-07-02 9:38 ` lizhe.67
4 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2025-07-02 8:15 UTC (permalink / raw)
To: lizhe.67, alex.williamson, jgg, peterx; +Cc: kvm, linux-kernel, Jason Gunthorpe
On 30.06.25 09:25, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
>
> This patchset is a consolidation of the two previous patchsets[1][2].
>
> When vfio_pin_pages_remote() is called with a range of addresses that
> includes large folios, the function currently performs individual
> statistics counting operations for each page. This can lead to significant
> performance overheads, especially when dealing with large ranges of pages.
>
> The function vfio_unpin_pages_remote() has a similar issue, where executing
> put_pfn() for each pfn incurs considerable overhead.
>
> This patchset optimizes the performance of the relevant functions by
> batching the less efficient operations mentioned before.
>
> The first patch optimizes the performance of the function
> vfio_pin_pages_remote(), while the remaining patches optimize the
> performance of the function vfio_unpin_pages_remote().
>
> The performance test results, based on v6.16-rc4, for completing the 16G
> VFIO MAP/UNMAP DMA, obtained through unit test[3] with slight
> modifications[4], are as follows.
>
> Base(6.16-rc4):
> ./vfio-pci-mem-dma-map 0000:03:00.0 16
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO MAP DMA in 0.047 s (340.2 GB/s)
> VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO MAP DMA in 0.280 s (57.2 GB/s)
> VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO MAP DMA in 0.052 s (310.5 GB/s)
> VFIO UNMAP DMA in 0.136 s (117.3 GB/s)
>
> With this patchset:
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO MAP DMA in 0.027 s (596.4 GB/s)
> VFIO UNMAP DMA in 0.045 s (357.6 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO MAP DMA in 0.288 s (55.5 GB/s)
> VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO MAP DMA in 0.031 s (508.3 GB/s)
> VFIO UNMAP DMA in 0.045 s (352.9 GB/s)
>
> For large folios, we achieve an over 40% performance improvement for VFIO
> MAP DMA and an over 66% improvement for VFIO UNMAP DMA. For small
> folios, the results show little difference from the performance before
> optimization.
Jason mentioned in reply to the other series that, ideally, vfio
shouldn't be messing with folios at all.
While we now do that on the unpin side, we still do it at the pin side.
Which makes me wonder if we can avoid folios in patch #1 in
contig_pages(), and simply collect pages that correspond to consecutive
PFNs.
What was the reason again, that contig_pages() would not exceed a single
folio?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-02 8:15 ` [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and " David Hildenbrand
@ 2025-07-02 9:38 ` lizhe.67
2025-07-02 9:57 ` David Hildenbrand
0 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-07-02 9:38 UTC (permalink / raw)
To: david; +Cc: alex.williamson, jgg, jgg, kvm, linux-kernel, lizhe.67, peterx
On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
> Jason mentioned in reply to the other series that, ideally, vfio
> shouldn't be messing with folios at all.
>
> While we now do that on the unpin side, we still do it at the pin side.
Yes.
> Which makes me wonder if we can avoid folios in patch #1 in
> contig_pages(), and simply collect pages that correspond to consecutive
> PFNs.
In my opinion, comparing whether the pfns of two pages are contiguous
is relatively inefficient. Using folios might be a more efficient
solution.
Given that 'page' is already in use within vfio, it seems that adopting
'folio' wouldn't be particularly troublesome? If you have any better
suggestions, I sincerely hope you would share them with me.
> What was the reason again, that contig_pages() would not exceed a single
> folio?
Regarding this issue, I think Alex and I are on the same page[1]. For a
folio, all of its pages have the same invalid or reserved state. In
the function vfio_pin_pages_remote(), we need to ensure that the state
is the same as the previous pfn (through variable 'rsvd' and function
is_invalid_reserved_pfn()). Therefore, we do not want the return value
of contig_pages() to exceed a single folio.
Thanks,
Zhe
[1]: https://lore.kernel.org/all/20250613081613.0bef3d39.alex.williamson@redhat.com/
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-02 9:38 ` lizhe.67
@ 2025-07-02 9:57 ` David Hildenbrand
2025-07-02 12:47 ` Jason Gunthorpe
2025-07-03 3:54 ` lizhe.67
0 siblings, 2 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-07-02 9:57 UTC (permalink / raw)
To: lizhe.67; +Cc: alex.williamson, jgg, jgg, kvm, linux-kernel, peterx
On 02.07.25 11:38, lizhe.67@bytedance.com wrote:
> On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
>
>> Jason mentioned in reply to the other series that, ideally, vfio
>> shouldn't be messing with folios at all.
>>
>> While we now do that on the unpin side, we still do it at the pin side.
>
> Yes.
>
>> Which makes me wonder if we can avoid folios in patch #1 in
>> contig_pages(), and simply collect pages that correspond to consecutive
>> PFNs.
>
> In my opinion, comparing whether the pfns of two pages are contiguous
> is relatively inefficient. Using folios might be a more efficient
> solution.
buffer[i + 1] == nth_page(buffer[i], 1)
Is extremely efficient, except on
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
Because it's essentially
buffer[i + 1] == buffer[i] + 1
But with that config it's less efficient
buffer[i + 1] == pfn_to_page(page_to_pfn(buffer[i]) + 1)
That could be optimized (if we care about the config), assuming that we don't cross
memory sections (e.g., 128 MiB on x86).
See page_ext_iter_next_fast_possible(), that optimized for something similar.
So based on the first page, one could easily determine how far to batch
using the simple
buffer[i + 1] == buffer[i] + 1
comparison.
That would mean that one could exceed a folio, in theory.
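Just to illustrate, a minimal sketch of such a batching helper (the name
is a placeholder; it assumes the cheap comparison above, i.e. struct
pages of consecutive PFNs are themselves consecutive):

static unsigned long num_contig_pages(struct page **pages,
				      unsigned long npages)
{
	unsigned long i;

	/* Count the leading entries whose PFNs are consecutive. */
	for (i = 1; i < npages; i++)
		if (pages[i] != pages[i - 1] + 1)
			break;

	return i;
}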
>
> Given that 'page' is already in use within vfio, it seems that adopting
> 'folio' wouldn't be particularly troublesome? If you have any better
> suggestions, I sincerely hope you would share them with me.
One challenge in the future will likely be that not all pages that we can
GUP will belong to folios. We would possibly be able to handle that by
checking if the page actually belongs to a folio.
Not dealing with folios where avoidable would be easier.
>
>> What was the reason again, that contig_pages() would not exceed a single
>> folio?
>
> Regarding this issue, I think Alex and I are on the same page[1]. For a
> folio, all of its pages have the same invalid or reserved state. In
> the function vfio_pin_pages_remote(), we need to ensure that the state
> is the same as the previous pfn (through variable 'rsvd' and function
> is_invalid_reserved_pfn()). Therefore, we do not want the return value
> of contig_pages() to exceed a single folio.
If we obtained a page from GUP, is_invalid_reserved_pfn() would only trigger
for the shared zeropage. But that one can no longer be returned from FOLL_LONGTERM.
So if you know the pages came from GUP, I would assume they are never invalid_reserved?
Again, just a thought on how to apply something similar as done for the unpin case, avoiding
messing with folios.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-02 9:57 ` David Hildenbrand
@ 2025-07-02 12:47 ` Jason Gunthorpe
2025-07-03 4:04 ` lizhe.67
2025-07-03 3:54 ` lizhe.67
1 sibling, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-07-02 12:47 UTC (permalink / raw)
To: David Hildenbrand; +Cc: lizhe.67, alex.williamson, kvm, linux-kernel, peterx
On Wed, Jul 02, 2025 at 11:57:08AM +0200, David Hildenbrand wrote:
> On 02.07.25 11:38, lizhe.67@bytedance.com wrote:
> > On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
> >
> > > Jason mentioned in reply to the other series that, ideally, vfio
> > > shouldn't be messing with folios at all.
> > >
> > > While we now do that on the unpin side, we still do it at the pin side.
> >
> > Yes.
> >
> > > Which makes me wonder if we can avoid folios in patch #1 in
> > > contig_pages(), and simply collect pages that correspond to consecutive
> > > PFNs.
> >
> > In my opinion, comparing whether the pfns of two pages are contiguous
> > is relatively inefficient. Using folios might be a more efficient
> > solution.
>
> buffer[i + 1] == nth_page(buffer[i], 1)
>
> Is extremely efficient, except on
sg_alloc_append_table_from_pages() is using the
next_pfn = (sg_phys(sgt_append->prv) + prv_len) / PAGE_SIZE;
last_pg = pfn_to_page(next_pfn - 1);
approach to evaluate contiguity.
iommufd is also using very similar in batch_from_pages():
if (!batch_add_pfn(batch, page_to_pfn(*pages)))
So we should not be trying to optimize this only in VFIO, I would drop
that from this series.
If it can be optimized we should try to have some kind of generic
helper for building a physical contiguous range from a struct page
list.
> If we obtained a page from GUP, is_invalid_reserved_pfn() would only trigger
> for the shared zeropage. but that one can no longer be returned from FOLL_LONGTERM.
AFAIK the use of "reserved" here means it is non-gupable memory that
was acquired through follow_pfn. When it is pulled back out of the
iommu_domain as a phys_addr_t the is_invalid_reserved_pfn() is used to
tell if the address came from GUP or if it came from follow_pfn.
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma
2025-07-02 3:47 ` lizhe.67
@ 2025-07-02 16:11 ` Dan Carpenter
0 siblings, 0 replies; 24+ messages in thread
From: Dan Carpenter @ 2025-07-02 16:11 UTC (permalink / raw)
To: lizhe.67
Cc: alex.williamson, david, jgg, kvm, linux-kernel, lkp,
oe-kbuild-all, oe-kbuild, peterx
On Wed, Jul 02, 2025 at 11:47:20AM +0800, lizhe.67@bytedance.com wrote:
> On Tue, 1 Jul 2025 18:13:48 +0300, dan.carpenter@linaro.org wrote:
>
> > New smatch warnings:
> > drivers/vfio/vfio_iommu_type1.c:788 vfio_pin_pages_remote() error: uninitialized symbol 'rsvd'.
> >
> > Old smatch warnings:
> > drivers/vfio/vfio_iommu_type1.c:2376 vfio_iommu_type1_attach_group() warn: '&group->next' not removed from list
> >
> > vim +/rsvd +788 drivers/vfio/vfio_iommu_type1.c
> >
> > 8f0d5bb95f763c Kirti Wankhede 2016-11-17 684 static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> > 0635559233434a Alex Williamson 2025-02-18 685 unsigned long npage, unsigned long *pfn_base,
> > 4b6c33b3229678 Daniel Jordan 2021-02-19 686 unsigned long limit, struct vfio_batch *batch)
> > 73fa0d10d077d9 Alex Williamson 2012-07-31 687 {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 688 unsigned long pfn;
> > 4d83de6da265cd Daniel Jordan 2021-02-19 689 struct mm_struct *mm = current->mm;
> > 6c38c055cc4c0a Alex Williamson 2016-12-30 690 long ret, pinned = 0, lock_acct = 0;
> > 89c29def6b0101 Alex Williamson 2018-06-02 691 bool rsvd;
> > a54eb55045ae9b Kirti Wankhede 2016-11-17 692 dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
> > 166fd7d94afdac Alex Williamson 2013-06-21 693
> > 6c38c055cc4c0a Alex Williamson 2016-12-30 694 /* This code path is only user initiated */
> > 4d83de6da265cd Daniel Jordan 2021-02-19 695 if (!mm)
> > 166fd7d94afdac Alex Williamson 2013-06-21 696 return -ENODEV;
> > 73fa0d10d077d9 Alex Williamson 2012-07-31 697
> > 4d83de6da265cd Daniel Jordan 2021-02-19 698 if (batch->size) {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 699 /* Leftover pages in batch from an earlier call. */
> > 4d83de6da265cd Daniel Jordan 2021-02-19 700 *pfn_base = page_to_pfn(batch->pages[batch->offset]);
> > 4d83de6da265cd Daniel Jordan 2021-02-19 701 pfn = *pfn_base;
> > 89c29def6b0101 Alex Williamson 2018-06-02 702 rsvd = is_invalid_reserved_pfn(*pfn_base);
>
> When batch->size is not zero, we initialize rsvd here.
>
> > 4d83de6da265cd Daniel Jordan 2021-02-19 703 } else {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 704 *pfn_base = 0;
>
> When the value of batch->size is zero, we set the value of *pfn_base
> to zero and do not initialize rsvd for the time being.
>
> > 5c6c2b21ecc9ad Alex Williamson 2013-06-21 705 }
> > 5c6c2b21ecc9ad Alex Williamson 2013-06-21 706
> > eb996eec783c1e Alex Williamson 2025-02-18 707 if (unlikely(disable_hugepages))
> > eb996eec783c1e Alex Williamson 2025-02-18 708 npage = 1;
> > eb996eec783c1e Alex Williamson 2025-02-18 709
> > 4d83de6da265cd Daniel Jordan 2021-02-19 710 while (npage) {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 711 if (!batch->size) {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 712 /* Empty batch, so refill it. */
> > eb996eec783c1e Alex Williamson 2025-02-18 713 ret = vaddr_get_pfns(mm, vaddr, npage, dma->prot,
> > eb996eec783c1e Alex Williamson 2025-02-18 714 &pfn, batch);
> > be16c1fd99f41a Daniel Jordan 2021-02-19 715 if (ret < 0)
> > 4d83de6da265cd Daniel Jordan 2021-02-19 716 goto unpin_out;
> > 166fd7d94afdac Alex Williamson 2013-06-21 717
> > 4d83de6da265cd Daniel Jordan 2021-02-19 718 if (!*pfn_base) {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 719 *pfn_base = pfn;
> > 4d83de6da265cd Daniel Jordan 2021-02-19 720 rsvd = is_invalid_reserved_pfn(*pfn_base);
>
> Therefore, for the first loop, when batch->size is zero, *pfn_base must
> be zero, which will then lead to the initialization of rsvd.
>
Yeah. :/
I don't know why this warning was printed honestly. Smatch is supposed
to figure that kind of thing out correctly. It isn't printed on my
system. I've tried deleting the cross function DB (which shouldn't
matter) and I'm using the published version of Smatch but I can't get it
to print. Ah well. My bad. Thanks for taking a look.
regards,
dan carpenter
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
2025-06-30 7:25 ` [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
@ 2025-07-02 18:27 ` Jason Gunthorpe
2025-07-03 4:18 ` lizhe.67
0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-07-02 18:27 UTC (permalink / raw)
To: lizhe.67; +Cc: alex.williamson, david, peterx, kvm, linux-kernel
On Mon, Jun 30, 2025 at 03:25:16PM +0800, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
>
> The function vpfn_pages() can help us determine the number of vpfn
> nodes on the vpfn rb tree within a specified range. This allows us
> to avoid searching for each vpfn individually in the function
> vfio_unpin_pages_remote(). This patch batches the vfio_find_vpfn()
> calls in function vfio_unpin_pages_remote().
>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
> drivers/vfio/vfio_iommu_type1.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index a2d7abd4f2c2..330fff4fe96d 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -804,16 +804,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> unsigned long pfn, unsigned long npage,
> bool do_accounting)
> {
> - long unlocked = 0, locked = 0;
> + long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
> long i;
The logic in vpfn_pages() doesn't seem quite right? Don't we want to
count the number of pages within the range that fall within the rb
tree?
vpfn_pages() looks like it is only counting the number of RB tree
nodes within the range?
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio
2025-06-30 7:25 ` [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
@ 2025-07-02 18:28 ` Jason Gunthorpe
2025-07-03 6:12 ` lizhe.67
0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-07-02 18:28 UTC (permalink / raw)
To: lizhe.67; +Cc: alex.williamson, david, peterx, kvm, linux-kernel
On Mon, Jun 30, 2025 at 03:25:18PM +0800, lizhe.67@bytedance.com wrote:
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index a02bc340c112..7cacfb2cefe3 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -802,17 +802,29 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> return pinned;
> }
>
> +static inline void put_valid_unreserved_pfns(unsigned long start_pfn,
> + unsigned long npage, int prot)
> +{
> + unpin_user_page_range_dirty_lock(pfn_to_page(start_pfn), npage,
> + prot & IOMMU_WRITE);
> +}
I don't think you need this wrapper.
This patch and the prior look OK
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-02 9:57 ` David Hildenbrand
2025-07-02 12:47 ` Jason Gunthorpe
@ 2025-07-03 3:54 ` lizhe.67
2025-07-03 11:06 ` David Hildenbrand
1 sibling, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 3:54 UTC (permalink / raw)
To: david; +Cc: alex.williamson, jgg, jgg, kvm, linux-kernel, lizhe.67, peterx
On Wed, 2 Jul 2025 11:57:08 +0200, david@redhat.com wrote:
> On 02.07.25 11:38, lizhe.67@bytedance.com wrote:
> > On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
> >
> >> Jason mentioned in reply to the other series that, ideally, vfio
> >> shouldn't be messing with folios at all.
> >>
> >> While we now do that on the unpin side, we still do it at the pin side.
> >
> > Yes.
> >
> >> Which makes me wonder if we can avoid folios in patch #1 in
> >> contig_pages(), and simply collect pages that correspond to consecutive
> >> PFNs.
> >
> > In my opinion, comparing whether the pfns of two pages are contiguous
> > is relatively inefficient. Using folios might be a more efficient
> > solution.
>
> buffer[i + 1] == nth_page(buffer[i], 1)
>
> Is extremely efficient, except on
>
> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>
> Because it's essentially
>
> buffer[i + 1] == buffer[i] + 1
>
> But with that config it's less efficient
>
> buffer[i + 1] == pfn_to_page(page_to_pfn(buffer[i]) + 1)
>
> That could be optimized (if we care about the config), assuming that we don't cross
> memory sections (e.g., 128 MiB on x86).
>
> See page_ext_iter_next_fast_possible(), that optimized for something similar.
>
> So based on the first page, one could easily determine how far to batch
> using the simple
>
> buffer[i + 1] == buffer[i] + 1
>
> comparison.
>
> That would mean that one could exceed a folio, in theory.
Thank you very much for your suggestion. I think we can focus on
optimizing the case where
!(defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP))
I believe that in most scenarios where vfio is used,
CONFIG_SPARSEMEM_VMEMMAP is enabled. Handling too many CONFIG
combinations may make the patch appear overly complicated.
> > Given that 'page' is already in use within vfio, it seems that adopting
> > 'folio' wouldn't be particularly troublesome? If you have any better
> > suggestions, I sincerely hope you would share them with me.
>
> One challenge in the future will likely be that not all pages that we can
> GUP will belong to folios. We would possibly be able to handle that by
> checking if the page actually belongs to a folio.
>
> Not dealing with folios where avoidable would be easier.
>
> >
> >> What was the reason again, that contig_pages() would not exceed a single
> >> folio?
> >
> > Regarding this issue, I think Alex and I are on the same page[1]. For a
> > folio, all of its pages have the same invalid or reserved state. In
> > the function vfio_pin_pages_remote(), we need to ensure that the state
> > is the same as the previous pfn (through variable 'rsvd' and function
> > is_invalid_reserved_pfn()). Therefore, we do not want the return value
> > of contig_pages() to exceed a single folio.
>
> If we obtained a page from GUP, is_invalid_reserved_pfn() would only trigger
> for the shared zeropage. but that one can no longer be returned from FOLL_LONGTERM.
>
> So if you know the pages came from GUP, I would assume they are never invalid_reserved?
Yes, we use function vaddr_get_pfns(), which ultimately invokes GUP
with the FOLL_LONGTERM flag.
> Again, just a thought on how to apply something similar as done for the unpin case, avoiding
> messing with folios.
Taking into account the previous discussion, it seems that we might
simply replace the contig_pages() in patch #1 with the following one.
Also, contig_pages() could be extracted into mm.h as a helper
function; it seems that Jason would like to utilize it in other
contexts. Moreover, the subject of this patchset should be changed to
"Optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote()". Do
you think this would work?
+static inline unsigned long contig_pages(struct page **pages,
+ unsigned long size)
+{
+ struct page *first_page = pages[0];
+ unsigned long i;
+
+ for (i = 1; i < size; i++)
+ if (pages[i] != nth_page(first_page, i))
+ break;
+ return i;
+}
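For illustration, a caller could batch over the GUP result roughly
like this (a sketch; nr_pinned and pages are placeholder names, not
the actual patch):

	unsigned long i = 0, step;

	while (i < nr_pinned) {
		step = contig_pages(&pages[i], nr_pinned - i);
		/* pages[i .. i + step - 1] form one physically contiguous run */
		i += step;
	}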
I have conducted a preliminary performance test, and the results are
similar to those obtained previously.
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-02 12:47 ` Jason Gunthorpe
@ 2025-07-03 4:04 ` lizhe.67
0 siblings, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 4:04 UTC (permalink / raw)
To: jgg; +Cc: alex.williamson, david, kvm, linux-kernel, lizhe.67, peterx
> On Wed, 2 Jul 2025 09:47:56 -0300, jgg@nvidia.com wrote:
>
> On Wed, Jul 02, 2025 at 11:57:08AM +0200, David Hildenbrand wrote:
> > On 02.07.25 11:38, lizhe.67@bytedance.com wrote:
> > > On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
> > >
> > > > Jason mentioned in reply to the other series that, ideally, vfio
> > > > shouldn't be messing with folios at all.
> > > >
> > > > While we now do that on the unpin side, we still do it at the pin side.
> > >
> > > Yes.
> > >
> > > > Which makes me wonder if we can avoid folios in patch #1 in
> > > > contig_pages(), and simply collect pages that correspond to consecutive
> > > > PFNs.
> > >
> > > In my opinion, comparing whether the pfns of two pages are contiguous
> > > is relatively inefficient. Using folios might be a more efficient
> > > solution.
> >
> > buffer[i + 1] == nth_page(buffer[i], 1)
> >
> > Is extremely efficient, except on
>
> sg_alloc_append_table_from_pages() is using the
>
> next_pfn = (sg_phys(sgt_append->prv) + prv_len) / PAGE_SIZE;
> last_pg = pfn_to_page(next_pfn - 1);
>
> Approach to evaluate contiguity.
>
> iommufd is also using very similar in batch_from_pages():
>
> if (!batch_add_pfn(batch, page_to_pfn(*pages)))
I'm not particularly familiar with those parts of the code, so I
can't say for certain. For the two places mentioned above, if the
contiguity of physical memory can be determined from an array of
page pointers, then they could adopt the approach David suggested.
I've posted a preliminary implementation here[1]. Does that helper
function look okay to you?
> So we should not be trying to optimize this only in VFIO; I would
> drop that from this series.
>
> If it can be optimized, we should try to have some kind of generic
> helper for building a physically contiguous range from a struct
> page list.
Thanks,
Zhe
[1]: https://lore.kernel.org/all/20250703035425.36124-1-lizhe.67@bytedance.com/
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
2025-07-02 18:27 ` Jason Gunthorpe
@ 2025-07-03 4:18 ` lizhe.67
2025-07-03 12:27 ` Jason Gunthorpe
0 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 4:18 UTC (permalink / raw)
To: jgg; +Cc: alex.williamson, david, kvm, linux-kernel, lizhe.67, peterx, jgg
On Wed, 2 Jul 2025 15:27:59 -0300, jgg@ziepe.ca wrote:
> On Mon, Jun 30, 2025 at 03:25:16PM +0800, lizhe.67@bytedance.com wrote:
> > From: Li Zhe <lizhe.67@bytedance.com>
> >
> > The function vpfn_pages() can help us determine the number of vpfn
> > nodes on the vpfn rb tree within a specified range. This allows us
> > to avoid searching for each vpfn individually in the function
> > vfio_unpin_pages_remote(). This patch batches the vfio_find_vpfn()
> > calls in function vfio_unpin_pages_remote().
> >
> > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> > ---
> > drivers/vfio/vfio_iommu_type1.c | 10 +++-------
> > 1 file changed, 3 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index a2d7abd4f2c2..330fff4fe96d 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -804,16 +804,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> > unsigned long pfn, unsigned long npage,
> > bool do_accounting)
> > {
> > - long unlocked = 0, locked = 0;
> > + long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
> > long i;
>
> The logic in vpfn_pages() doesn't seem quite right? Don't we want to
> count the number of pages within the range that fall within the rb
> tree?
>
> vpfn_pages() looks like it is only counting the number of RB tree
> nodes within the range?
As I understand it, a vfio_pfn corresponds to a single page, am I right?
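For reference, the batched count is conceptually a single ordered
walk of that rb tree, roughly as sketched below (a rough sketch; the
actual patch may differ, e.g. by starting the walk at the first
in-range node rather than rb_first()):

static long vpfn_pages(struct vfio_dma *dma, dma_addr_t iova, long npage)
{
	dma_addr_t end = iova + npage * PAGE_SIZE;
	struct rb_node *n;
	long count = 0;

	/* nodes are sorted by iova, so one walk replaces npage lookups */
	for (n = rb_first(&dma->pfn_list); n; n = rb_next(n)) {
		struct vfio_pfn *vpfn = rb_entry(n, struct vfio_pfn, node);

		if (vpfn->iova >= end)
			break;
		if (vpfn->iova >= iova)
			count++;
	}
	return count;
}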
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio
2025-07-02 18:28 ` Jason Gunthorpe
@ 2025-07-03 6:12 ` lizhe.67
0 siblings, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 6:12 UTC (permalink / raw)
To: jgg; +Cc: alex.williamson, david, kvm, linux-kernel, lizhe.67, peterx, jgg
On Wed, 2 Jul 2025 15:28:44 -0300, jgg@ziepe.ca wrote:
> On Mon, Jun 30, 2025 at 03:25:18PM +0800, lizhe.67@bytedance.com wrote:
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index a02bc340c112..7cacfb2cefe3 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -802,17 +802,29 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> > return pinned;
> > }
> >
> > +static inline void put_valid_unreserved_pfns(unsigned long start_pfn,
> > + unsigned long npage, int prot)
> > +{
> > + unpin_user_page_range_dirty_lock(pfn_to_page(start_pfn), npage,
> > + prot & IOMMU_WRITE);
> > +}
>
> I don't think you need this wrapper.
>
> This patch and the prior look OK
Thank you very much for your review. The primary purpose of the
wrapper is to make the code easier to follow. Would it be better to
keep it? It might also spare us a few explanatory comments.
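For example (sketch only, with a made-up count variable), a call site
then reads:

	/* release a contiguous run of pinned, non-reserved pfns */
	put_valid_unreserved_pfns(pfn, nr_contig, dma->prot);

rather than repeating the pfn_to_page() and IOMMU_WRITE details at
every caller.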
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-03 3:54 ` lizhe.67
@ 2025-07-03 11:06 ` David Hildenbrand
2025-07-03 11:12 ` Jason Gunthorpe
2025-07-03 11:34 ` lizhe.67
0 siblings, 2 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-07-03 11:06 UTC (permalink / raw)
To: lizhe.67; +Cc: alex.williamson, jgg, jgg, kvm, linux-kernel, peterx
On 03.07.25 05:54, lizhe.67@bytedance.com wrote:
> On Wed, 2 Jul 2025 11:57:08 +0200, david@redhat.com wrote:
>
>> On 02.07.25 11:38, lizhe.67@bytedance.com wrote:
>>> On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
>>>
>>>> Jason mentioned in reply to the other series that, ideally, vfio
>>>> shouldn't be messing with folios at all.
>>>>
>>>> While we now do that on the unpin side, we still do it at the pin side.
>>>
>>> Yes.
>>>
>>>> Which makes me wonder if we can avoid folios in patch #1 in
>>>> contig_pages(), and simply collect pages that correspond to consecutive
>>>> PFNs.
>>>
>>> In my opinion, comparing whether the pfns of two pages are contiguous
>>> is relatively inefficient. Using folios might be a more efficient
>>> solution.
>>
>> buffer[i + 1] == nth_page(buffer[i], 1)
>>
>> Is extremely efficient, except on
>>
>> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>>
>> Because it's essentially
>>
>> buffer[i + 1] == buffer[i] + 1
>>
>> But with that config it's less efficient
>>
>> buffer[i + 1] == pfn_to_page(page_to_pfn(buffer[i]) + 1)
>>
>> That could be optimized (if we care about the config), assuming that we don't cross
>> memory sections (e.g., 128 MiB on x86).
>>
>> See page_ext_iter_next_fast_possible(), which optimizes for something similar.
>>
>> So based on the first page, one could easily determine how far to batch
>> using the simple
>>
>> buffer[i + 1] == buffer[i] + 1
>>
>> comparison.
>>
>> That would mean that one could exceed a folio, in theory.
>
> Thank you very much for your suggestion. I think we can focus on
> optimizing the case where
>
> !(defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP))
>
> I believe CONFIG_SPARSEMEM_VMEMMAP is enabled in most scenarios
> where vfio is used, and handling too many config combinations would
> make the patch overly complicated.
>
>>> Given that 'page' is already in use within vfio, it seems that adopting
>>> 'folio' wouldn't be particularly troublesome? If you have any better
>>> suggestions, I sincerely hope you would share them with me.
>>
>> One challenge in the future will likely be that not all pages that we can
>> GUP will belong to folios. We would possibly be able to handle that by
>> checking if the page actually belongs to a folio.
>>
>> Not dealing with folios where avoidable would be easier.
>>
>>>
>>>> What was the reason again, that contig_pages() would not exceed a single
>>>> folio?
>>>
>>> Regarding this issue, I think Alex and I are on the same page[1]. For a
>>> folio, all of its pages have the same invalid or reserved state. In
>>> the function vfio_pin_pages_remote(), we need to ensure that the state
>>> is the same as the previous pfn (through variable 'rsvd' and function
>>> is_invalid_reserved_pfn()). Therefore, we do not want the return value
>>> of contig_pages() to exceed a single folio.
>>
>> If we obtained a page from GUP, is_invalid_reserved_pfn() would only trigger
>> for the shared zeropage, but that one can no longer be returned from FOLL_LONGTERM.
>>
>> So if you know the pages came from GUP, I would assume they are never invalid_reserved?
>
> Yes, we use function vaddr_get_pfns(), which ultimately invokes GUP
> with the FOLL_LONGTERM flag.
>
>> Again, just a thought on how to apply something similar to what
>> was done for the unpin case, avoiding messing with folios.
>
> Taking the previous discussion into account, it seems we could simply
> replace contig_pages() in patch #1 with the version below. Function
> contig_pages() could also be extracted into mm.h as a helper, since
> Jason would like to use it in other contexts. Moreover, the subject
> of this patchset should be changed to
> "Optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote()". Do
> you think this would work?
>
> +static inline unsigned long contig_pages(struct page **pages,
> + unsigned long size)
size -> nr_pages
> +{
> + struct page *first_page = pages[0];
> + unsigned long i;
> +
> + for (i = 1; i < size; i++)
> + if (pages[i] != nth_page(first_page, i))
> + break;
> + return i;
> +}
LGTM.
I wonder if we can find a better function name, especially when moving
this to some header where it can be reused.
Something that expresses that we will return the next batch that starts
at the first page.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-03 11:06 ` David Hildenbrand
@ 2025-07-03 11:12 ` Jason Gunthorpe
2025-07-03 11:35 ` lizhe.67
2025-07-03 11:34 ` lizhe.67
1 sibling, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-07-03 11:12 UTC (permalink / raw)
To: David Hildenbrand; +Cc: lizhe.67, alex.williamson, kvm, linux-kernel, peterx
On Thu, Jul 03, 2025 at 01:06:26PM +0200, David Hildenbrand wrote:
> > +{
> > + struct page *first_page = pages[0];
> > + unsigned long i;
> > +
> > + for (i = 1; i < size; i++)
> > + if (pages[i] != nth_page(first_page, i))
> > + break;
> > + return i;
> > +}
>
> LGTM.
>
> I wonder if we can find a better function name, especially when moving this
> to some header where it can be reused.
It should be a common function:
unsigned long num_pages_contiguous(struct page **list, size_t nelms);
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-03 11:06 ` David Hildenbrand
2025-07-03 11:12 ` Jason Gunthorpe
@ 2025-07-03 11:34 ` lizhe.67
1 sibling, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 11:34 UTC (permalink / raw)
To: david; +Cc: alex.williamson, jgg, jgg, kvm, linux-kernel, lizhe.67, peterx
On Thu, 3 Jul 2025 13:06:26 +0200, david@redhat.com wrote:
> > +static inline unsigned long contig_pages(struct page **pages,
> > + unsigned long size)
>
> size -> nr_pages
>
> > +{
> > + struct page *first_page = pages[0];
> > + unsigned long i;
> > +
> > + for (i = 1; i < size; i++)
> > + if (pages[i] != nth_page(first_page, i))
> > + break;
> > + return i;
> > +}
>
> LGTM.
>
> I wonder if we can find a better function name, especially when moving
> this to some header where it can be reused.
>
> Something that expresses that we will return the next batch that starts
> at the first page.
Thank you. Given that this function may have more users in the future,
I will place it in include/linux/mm.h instead of the vfio file. Once
I've addressed the comments on the other patches with Jason, I will
send out a new patchset.
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-03 11:12 ` Jason Gunthorpe
@ 2025-07-03 11:35 ` lizhe.67
0 siblings, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 11:35 UTC (permalink / raw)
To: jgg; +Cc: alex.williamson, david, kvm, linux-kernel, lizhe.67, peterx
On Thu, 3 Jul 2025 08:12:16 -0300, jgg@ziepe.ca wrote:
> On Thu, Jul 03, 2025 at 01:06:26PM +0200, David Hildenbrand wrote:
> > > +{
> > > + struct page *first_page = pages[0];
> > > + unsigned long i;
> > > +
> > > + for (i = 1; i < size; i++)
> > > + if (pages[i] != nth_page(first_page, i))
> > > + break;
> > > + return i;
> > > +}
> >
> > LGTM.
> >
> > I wonder if we can find a better function name, especially when moving this
> > to some header where it can be reused.
>
> It should be a common function:
>
> unsigned long num_pages_contiguous(struct page **list, size_t nelms);
I fully agree with you.
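Combining that name with David's size -> nr_pages rename, the mm.h
helper would presumably end up roughly as follows (a sketch, pending
review):

static inline unsigned long num_pages_contiguous(struct page **pages,
						 size_t nr_pages)
{
	struct page *first_page = pages[0];
	unsigned long i;

	/* count how many entries from pages[0] are physically contiguous */
	for (i = 1; i < nr_pages; i++)
		if (pages[i] != nth_page(first_page, i))
			break;
	return i;
}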
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
2025-07-03 4:18 ` lizhe.67
@ 2025-07-03 12:27 ` Jason Gunthorpe
2025-07-04 2:20 ` lizhe.67
0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-07-03 12:27 UTC (permalink / raw)
To: lizhe.67; +Cc: alex.williamson, david, kvm, linux-kernel, peterx
On Thu, Jul 03, 2025 at 12:18:22PM +0800, lizhe.67@bytedance.com wrote:
> On Wed, 2 Jul 2025 15:27:59 -0300, jgg@ziepe.ca wrote:
>
> > On Mon, Jun 30, 2025 at 03:25:16PM +0800, lizhe.67@bytedance.com wrote:
> > > From: Li Zhe <lizhe.67@bytedance.com>
> > >
> > > The function vpfn_pages() can help us determine the number of vpfn
> > > nodes on the vpfn rb tree within a specified range. This allows us
> > > to avoid searching for each vpfn individually in the function
> > > vfio_unpin_pages_remote(). This patch batches the vfio_find_vpfn()
> > > calls in function vfio_unpin_pages_remote().
> > >
> > > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> > > ---
> > > drivers/vfio/vfio_iommu_type1.c | 10 +++-------
> > > 1 file changed, 3 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > > index a2d7abd4f2c2..330fff4fe96d 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -804,16 +804,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> > > unsigned long pfn, unsigned long npage,
> > > bool do_accounting)
> > > {
> > > - long unlocked = 0, locked = 0;
> > > + long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
> > > long i;
> >
> > The logic in vpfn_pages() doesn't seem quite right? Don't we want to
> > count the number of pages within the range that fall within the rb
> > tree?
> >
> > vpfn_pages() looks like it is only counting the number of RB tree
> > nodes within the range?
>
> As I understand it, a vfio_pfn corresponds to a single page, am I right?
It does look that way; it is not what I was expecting, since iommufd
holds ranges for this job.
So this is OK then
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
2025-07-03 12:27 ` Jason Gunthorpe
@ 2025-07-04 2:20 ` lizhe.67
0 siblings, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-07-04 2:20 UTC (permalink / raw)
To: jgg; +Cc: alex.williamson, david, kvm, linux-kernel, lizhe.67, peterx
On Thu, 3 Jul 2025 09:27:56 -0300, jgg@nvidia.com wrote:
> On Thu, Jul 03, 2025 at 12:18:22PM +0800, lizhe.67@bytedance.com wrote:
> > On Wed, 2 Jul 2025 15:27:59 -0300, jgg@ziepe.ca wrote:
> >
> > > On Mon, Jun 30, 2025 at 03:25:16PM +0800, lizhe.67@bytedance.com wrote:
> > > > From: Li Zhe <lizhe.67@bytedance.com>
> > > >
> > > > The function vpfn_pages() can help us determine the number of vpfn
> > > > nodes on the vpfn rb tree within a specified range. This allows us
> > > > to avoid searching for each vpfn individually in the function
> > > > vfio_unpin_pages_remote(). This patch batches the vfio_find_vpfn()
> > > > calls in function vfio_unpin_pages_remote().
> > > >
> > > > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> > > > ---
> > > > drivers/vfio/vfio_iommu_type1.c | 10 +++-------
> > > > 1 file changed, 3 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > > > index a2d7abd4f2c2..330fff4fe96d 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -804,16 +804,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> > > > unsigned long pfn, unsigned long npage,
> > > > bool do_accounting)
> > > > {
> > > > - long unlocked = 0, locked = 0;
> > > > + long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
> > > > long i;
> > >
> > > The logic in vpfn_pages() doesn't seem quite right? Don't we want to
> > > count the number of pages within the range that fall within the rb
> > > tree?
> > >
> > > vpfn_pages() looks like it is only counting the number of RB tree
> > > nodes within the range?
> >
> > As I understand it, a vfio_pfn corresponds to a single page, am I right?
>
> It does look that way; it is not what I was expecting, since
> iommufd holds ranges for this job.
>
> So this is OK then
Thank you. It seems that we have reached a consensus on all the comments.
I will send out a v2 patchset soon.
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2025-07-04 2:20 UTC | newest]
Thread overview: 24+ messages
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-30 7:25 ` [PATCH 1/4] vfio/type1: optimize vfio_pin_pages_remote() for large folios lizhe.67
2025-06-30 7:25 ` [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
2025-07-02 18:27 ` Jason Gunthorpe
2025-07-03 4:18 ` lizhe.67
2025-07-03 12:27 ` Jason Gunthorpe
2025-07-04 2:20 ` lizhe.67
2025-06-30 7:25 ` [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma lizhe.67
2025-07-01 15:13 ` Dan Carpenter
2025-07-02 3:47 ` lizhe.67
2025-07-02 16:11 ` Dan Carpenter
2025-06-30 7:25 ` [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
2025-07-02 18:28 ` Jason Gunthorpe
2025-07-03 6:12 ` lizhe.67
2025-07-02 8:15 ` [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and " David Hildenbrand
2025-07-02 9:38 ` lizhe.67
2025-07-02 9:57 ` David Hildenbrand
2025-07-02 12:47 ` Jason Gunthorpe
2025-07-03 4:04 ` lizhe.67
2025-07-03 3:54 ` lizhe.67
2025-07-03 11:06 ` David Hildenbrand
2025-07-03 11:12 ` Jason Gunthorpe
2025-07-03 11:35 ` lizhe.67
2025-07-03 11:34 ` lizhe.67