* [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
@ 2025-06-30 7:25 lizhe.67
2025-06-30 7:25 ` [PATCH 1/4] vfio/type1: optimize vfio_pin_pages_remote() for large folios lizhe.67
` (4 more replies)
0 siblings, 5 replies; 24+ messages in thread
From: lizhe.67 @ 2025-06-30 7:25 UTC (permalink / raw)
To: alex.williamson, jgg, david, peterx; +Cc: kvm, linux-kernel, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
This patchset is a consolidation of the two previous patchsets[1][2].
When vfio_pin_pages_remote() is called with a range of addresses that
includes large folios, the function currently performs individual
statistics counting operations for each page. This can lead to significant
performance overheads, especially when dealing with large ranges of pages.
The function vfio_unpin_pages_remote() has a similar issue, where executing
put_pfn() for each pfn incurs considerable overhead.
This patchset optimizes the performance of the relevant functions by
batching the less efficient operations mentioned before.
The first patch optimizes the performance of the function
vfio_pin_pages_remote(), while the remaining patches optimize the
performance of the function vfio_unpin_pages_remote().
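At its core, the per-page accounting in the inner loop of
vfio_pin_pages_remote() becomes a per-batch operation, roughly
(simplified from patch 1):

	nr_pages = contig_pages(dma, batch, iova);
	if (!rsvd)
		acct_pages = nr_pages - vpfn_pages(dma, iova, nr_pages);
	...
	lock_acct += acct_pages;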
The performance test results, based on v6.16-rc4, for completing the 16G
VFIO MAP/UNMAP DMA, obtained through unit test[3] with slight
modifications[4], are as follows.
Base(6.16-rc4):
./vfio-pci-mem-dma-map 0000:03:00.0 16
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.047 s (340.2 GB/s)
VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.280 s (57.2 GB/s)
VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.052 s (310.5 GB/s)
VFIO UNMAP DMA in 0.136 s (117.3 GB/s)
With this patchset:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.027 s (596.4 GB/s)
VFIO UNMAP DMA in 0.045 s (357.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.288 s (55.5 GB/s)
VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.031 s (508.3 GB/s)
VFIO UNMAP DMA in 0.045 s (352.9 GB/s)
For large folios, we achieve an over 40% performance improvement for VFIO
MAP DMA and an over 66% improvement for VFIO UNMAP DMA. For small
folios, the results show little difference from the performance before
optimization.
[1]: https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@bytedance.com/
[2]: https://lore.kernel.org/all/20250620032344.13382-1-lizhe.67@bytedance.com/
[3]: https://github.com/awilliam/tests/blob/vfio-pci-mem-dma-map/vfio-pci-mem-dma-map.c
[4]: https://lore.kernel.org/all/20250610031013.98556-1-lizhe.67@bytedance.com/
Li Zhe (4):
vfio/type1: optimize vfio_pin_pages_remote() for large folios
vfio/type1: batch vfio_find_vpfn() in function
vfio_unpin_pages_remote()
vfio/type1: introduce a new member has_rsvd for struct vfio_dma
vfio/type1: optimize vfio_unpin_pages_remote() for large folio
drivers/vfio/vfio_iommu_type1.c | 121 ++++++++++++++++++++++++++------
1 file changed, 100 insertions(+), 21 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 1/4] vfio/type1: optimize vfio_pin_pages_remote() for large folios
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
@ 2025-06-30 7:25 ` lizhe.67
2025-06-30 7:25 ` [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
` (3 subsequent siblings)
4 siblings, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-06-30 7:25 UTC (permalink / raw)
To: alex.williamson, jgg, david, peterx; +Cc: kvm, linux-kernel, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
When vfio_pin_pages_remote() is called with a range of addresses that
includes large folios, the function currently performs individual
statistics counting operations for each page. This can lead to significant
performance overheads, especially when dealing with large ranges of pages.
This patch optimizes this process by batching the statistics counting
operations.
The performance test results for completing the 16G VFIO IOMMU DMA mapping
are as follows.
Base(v6.16-rc4):
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.047 s (340.2 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.280 s (57.2 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.052 s (310.5 GB/s)
With this patch:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.027 s (596.5 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.290 s (55.2 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.031 s (511.1 GB/s)
For large folios, we achieve an over 40% performance improvement.
For small folios, the performance test results indicate only a
minor performance drop.
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
drivers/vfio/vfio_iommu_type1.c | 93 ++++++++++++++++++++++++++++-----
1 file changed, 81 insertions(+), 12 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1136d7ac6b59..a2d7abd4f2c2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -318,7 +318,13 @@ static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
/*
* Helper Functions for host iova-pfn list
*/
-static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
+
+/*
+ * Find the highest vfio_pfn that overlaps the range
+ * [iova_start, iova_end) in the rb tree.
+ */
+static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma,
+ dma_addr_t iova_start, dma_addr_t iova_end)
{
struct vfio_pfn *vpfn;
struct rb_node *node = dma->pfn_list.rb_node;
@@ -326,9 +332,9 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
while (node) {
vpfn = rb_entry(node, struct vfio_pfn, node);
- if (iova < vpfn->iova)
+ if (iova_end <= vpfn->iova)
node = node->rb_left;
- else if (iova > vpfn->iova)
+ else if (iova_start > vpfn->iova)
node = node->rb_right;
else
return vpfn;
@@ -336,6 +342,11 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
return NULL;
}
+static inline struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
+{
+ return vfio_find_vpfn_range(dma, iova, iova + PAGE_SIZE);
+}
+
static void vfio_link_pfn(struct vfio_dma *dma,
struct vfio_pfn *new)
{
@@ -614,6 +625,56 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
return ret;
}
+static long contig_pages(struct vfio_dma *dma,
+ struct vfio_batch *batch, dma_addr_t iova)
+{
+ struct page *page = batch->pages[batch->offset];
+ struct folio *folio = page_folio(page);
+ long idx = folio_page_idx(folio, page);
+ long max = min_t(long, batch->size, folio_nr_pages(folio) - idx);
+ long nr_pages;
+
+ for (nr_pages = 1; nr_pages < max; nr_pages++) {
+ if (batch->pages[batch->offset + nr_pages] !=
+ folio_page(folio, idx + nr_pages))
+ break;
+ }
+
+ return nr_pages;
+}
+
+static long vpfn_pages(struct vfio_dma *dma,
+ dma_addr_t iova_start, long nr_pages)
+{
+ dma_addr_t iova_end = iova_start + (nr_pages << PAGE_SHIFT);
+ struct vfio_pfn *top = vfio_find_vpfn_range(dma, iova_start, iova_end);
+ long ret = 1;
+ struct vfio_pfn *vpfn;
+ struct rb_node *prev;
+ struct rb_node *next;
+
+ if (likely(!top))
+ return 0;
+
+ prev = next = &top->node;
+
+ while ((prev = rb_prev(prev))) {
+ vpfn = rb_entry(prev, struct vfio_pfn, node);
+ if (vpfn->iova < iova_start)
+ break;
+ ret++;
+ }
+
+ while ((next = rb_next(next))) {
+ vpfn = rb_entry(next, struct vfio_pfn, node);
+ if (vpfn->iova >= iova_end)
+ break;
+ ret++;
+ }
+
+ return ret;
+}
+
/*
* Attempt to pin pages. We really don't want to track all the pfns and
* the iommu can only map chunks of consecutive pfns anyway, so get the
@@ -680,32 +741,40 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
* and rsvd here, and therefore continues to use the batch.
*/
while (true) {
+ long nr_pages, acct_pages = 0;
+
if (pfn != *pfn_base + pinned ||
rsvd != is_invalid_reserved_pfn(pfn))
goto out;
+ nr_pages = contig_pages(dma, batch, iova);
+ if (!rsvd) {
+ acct_pages = nr_pages;
+ acct_pages -= vpfn_pages(dma, iova, nr_pages);
+ }
+
/*
* Reserved pages aren't counted against the user,
* externally pinned pages are already counted against
* the user.
*/
- if (!rsvd && !vfio_find_vpfn(dma, iova)) {
+ if (acct_pages) {
if (!dma->lock_cap &&
- mm->locked_vm + lock_acct + 1 > limit) {
+ mm->locked_vm + lock_acct + acct_pages > limit) {
pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
__func__, limit << PAGE_SHIFT);
ret = -ENOMEM;
goto unpin_out;
}
- lock_acct++;
+ lock_acct += acct_pages;
}
- pinned++;
- npage--;
- vaddr += PAGE_SIZE;
- iova += PAGE_SIZE;
- batch->offset++;
- batch->size--;
+ pinned += nr_pages;
+ npage -= nr_pages;
+ vaddr += PAGE_SIZE * nr_pages;
+ iova += PAGE_SIZE * nr_pages;
+ batch->offset += nr_pages;
+ batch->size -= nr_pages;
if (!batch->size)
break;
--
2.20.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-30 7:25 ` [PATCH 1/4] vfio/type1: optimize vfio_pin_pages_remote() for large folios lizhe.67
@ 2025-06-30 7:25 ` lizhe.67
2025-07-02 18:27 ` Jason Gunthorpe
2025-06-30 7:25 ` [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma lizhe.67
` (2 subsequent siblings)
4 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-06-30 7:25 UTC (permalink / raw)
To: alex.williamson, jgg, david, peterx; +Cc: kvm, linux-kernel, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
The function vpfn_pages() can help us determine the number of vpfn
nodes on the vpfn rb tree within a specified range. This allows us
to avoid searching for each vpfn individually in the function
vfio_unpin_pages_remote(). This patch batches the vfio_find_vpfn()
calls in function vfio_unpin_pages_remote().
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
drivers/vfio/vfio_iommu_type1.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a2d7abd4f2c2..330fff4fe96d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -804,16 +804,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
unsigned long pfn, unsigned long npage,
bool do_accounting)
{
- long unlocked = 0, locked = 0;
+ long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
long i;
- for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
- if (put_pfn(pfn++, dma->prot)) {
+ for (i = 0; i < npage; i++)
+ if (put_pfn(pfn++, dma->prot))
unlocked++;
- if (vfio_find_vpfn(dma, iova))
- locked++;
- }
- }
if (do_accounting)
vfio_lock_acct(dma, locked - unlocked, true);
--
2.20.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-30 7:25 ` [PATCH 1/4] vfio/type1: optimize vfio_pin_pages_remote() for large folios lizhe.67
2025-06-30 7:25 ` [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
@ 2025-06-30 7:25 ` lizhe.67
2025-07-01 15:13 ` Dan Carpenter
2025-06-30 7:25 ` [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
2025-07-02 8:15 ` [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and " David Hildenbrand
4 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-06-30 7:25 UTC (permalink / raw)
To: alex.williamson, jgg, david, peterx; +Cc: kvm, linux-kernel, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
Introduce a new member has_rsvd for struct vfio_dma. This member
indicates whether the region represented by this vfio_dma contains
any reserved or invalid pfns: if true, at least one pfn in this
region is either reserved or invalid.
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
drivers/vfio/vfio_iommu_type1.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 330fff4fe96d..a02bc340c112 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -92,6 +92,7 @@ struct vfio_dma {
bool iommu_mapped;
bool lock_cap; /* capable(CAP_IPC_LOCK) */
bool vaddr_invalid;
+ bool has_rsvd; /* has 1 or more rsvd pfns */
struct task_struct *task;
struct rb_root pfn_list; /* Ex-user pinned pfn list */
unsigned long *bitmap;
@@ -784,6 +785,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
}
out:
+ dma->has_rsvd |= rsvd;
ret = vfio_lock_acct(dma, lock_acct, false);
unpin_out:
--
2.20.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
` (2 preceding siblings ...)
2025-06-30 7:25 ` [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma lizhe.67
@ 2025-06-30 7:25 ` lizhe.67
2025-07-02 18:28 ` Jason Gunthorpe
2025-07-02 8:15 ` [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and " David Hildenbrand
4 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-06-30 7:25 UTC (permalink / raw)
To: alex.williamson, jgg, david, peterx; +Cc: kvm, linux-kernel, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
When vfio_unpin_pages_remote() is called with a range of addresses that
includes large folios, the function currently performs individual
put_pfn() operations for each page. This can lead to significant
performance overheads, especially when dealing with large ranges of pages.
It would be very rare for reserved PFNs and non-reserved PFNs to be
mixed within the same range, so this patch utilizes the has_rsvd variable
introduced in the previous patch to determine whether batch put_pfn()
operations can be performed. Moreover, compared to put_pfn(),
unpin_user_page_range_dirty_lock() is capable of handling large folio
scenarios more efficiently.
The performance test results for completing the 16G VFIO IOMMU DMA
unmapping are as follows.
Base(v6.16-rc4):
./vfio-pci-mem-dma-map 0000:03:00.0 16
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO UNMAP DMA in 0.136 s (117.3 GB/s)
With this patchset:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO UNMAP DMA in 0.045 s (357.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO UNMAP DMA in 0.045 s (352.9 GB/s)
For large folios, we achieve an over 66% performance improvement for
VFIO UNMAP DMA. For small folios, the performance test results show
no significant change.
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
drivers/vfio/vfio_iommu_type1.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index a02bc340c112..7cacfb2cefe3 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -802,17 +802,29 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
return pinned;
}
+static inline void put_valid_unreserved_pfns(unsigned long start_pfn,
+ unsigned long npage, int prot)
+{
+ unpin_user_page_range_dirty_lock(pfn_to_page(start_pfn), npage,
+ prot & IOMMU_WRITE);
+}
+
static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
unsigned long pfn, unsigned long npage,
bool do_accounting)
{
long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
- long i;
- for (i = 0; i < npage; i++)
- if (put_pfn(pfn++, dma->prot))
- unlocked++;
+ if (dma->has_rsvd) {
+ long i;
+ for (i = 0; i < npage; i++)
+ if (put_pfn(pfn++, dma->prot))
+ unlocked++;
+ } else {
+ put_valid_unreserved_pfns(pfn, npage, dma->prot);
+ unlocked = npage;
+ }
if (do_accounting)
vfio_lock_acct(dma, locked - unlocked, true);
--
2.20.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma
2025-06-30 7:25 ` [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma lizhe.67
@ 2025-07-01 15:13 ` Dan Carpenter
2025-07-02 3:47 ` lizhe.67
0 siblings, 1 reply; 24+ messages in thread
From: Dan Carpenter @ 2025-07-01 15:13 UTC (permalink / raw)
To: oe-kbuild, lizhe.67, alex.williamson, jgg, david, peterx
Cc: lkp, oe-kbuild-all, kvm, linux-kernel, lizhe.67
Hi,
kernel test robot noticed the following build warnings:
url: https://github.com/intel-lab-lkp/linux/commits/lizhe-67-bytedance-com/vfio-type1-optimize-vfio_pin_pages_remote-for-large-folios/20250630-152849
base: https://github.com/awilliam/linux-vfio.git next
patch link: https://lore.kernel.org/r/20250630072518.31846-4-lizhe.67%40bytedance.com
patch subject: [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma
config: x86_64-randconfig-161-20250701 (https://download.01.org/0day-ci/archive/20250701/202507012121.wkDLcDXn-lkp@intel.com/config)
compiler: clang version 20.1.7 (https://github.com/llvm/llvm-project 6146a88f60492b520a36f8f8f3231e15f3cc6082)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202507012121.wkDLcDXn-lkp@intel.com/
New smatch warnings:
drivers/vfio/vfio_iommu_type1.c:788 vfio_pin_pages_remote() error: uninitialized symbol 'rsvd'.
Old smatch warnings:
drivers/vfio/vfio_iommu_type1.c:2376 vfio_iommu_type1_attach_group() warn: '&group->next' not removed from list
vim +/rsvd +788 drivers/vfio/vfio_iommu_type1.c
8f0d5bb95f763c Kirti Wankhede 2016-11-17 684 static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
0635559233434a Alex Williamson 2025-02-18 685 unsigned long npage, unsigned long *pfn_base,
4b6c33b3229678 Daniel Jordan 2021-02-19 686 unsigned long limit, struct vfio_batch *batch)
73fa0d10d077d9 Alex Williamson 2012-07-31 687 {
4d83de6da265cd Daniel Jordan 2021-02-19 688 unsigned long pfn;
4d83de6da265cd Daniel Jordan 2021-02-19 689 struct mm_struct *mm = current->mm;
6c38c055cc4c0a Alex Williamson 2016-12-30 690 long ret, pinned = 0, lock_acct = 0;
89c29def6b0101 Alex Williamson 2018-06-02 691 bool rsvd;
a54eb55045ae9b Kirti Wankhede 2016-11-17 692 dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
166fd7d94afdac Alex Williamson 2013-06-21 693
6c38c055cc4c0a Alex Williamson 2016-12-30 694 /* This code path is only user initiated */
4d83de6da265cd Daniel Jordan 2021-02-19 695 if (!mm)
166fd7d94afdac Alex Williamson 2013-06-21 696 return -ENODEV;
73fa0d10d077d9 Alex Williamson 2012-07-31 697
4d83de6da265cd Daniel Jordan 2021-02-19 698 if (batch->size) {
4d83de6da265cd Daniel Jordan 2021-02-19 699 /* Leftover pages in batch from an earlier call. */
4d83de6da265cd Daniel Jordan 2021-02-19 700 *pfn_base = page_to_pfn(batch->pages[batch->offset]);
4d83de6da265cd Daniel Jordan 2021-02-19 701 pfn = *pfn_base;
89c29def6b0101 Alex Williamson 2018-06-02 702 rsvd = is_invalid_reserved_pfn(*pfn_base);
4d83de6da265cd Daniel Jordan 2021-02-19 703 } else {
4d83de6da265cd Daniel Jordan 2021-02-19 704 *pfn_base = 0;
5c6c2b21ecc9ad Alex Williamson 2013-06-21 705 }
5c6c2b21ecc9ad Alex Williamson 2013-06-21 706
eb996eec783c1e Alex Williamson 2025-02-18 707 if (unlikely(disable_hugepages))
eb996eec783c1e Alex Williamson 2025-02-18 708 npage = 1;
eb996eec783c1e Alex Williamson 2025-02-18 709
4d83de6da265cd Daniel Jordan 2021-02-19 710 while (npage) {
4d83de6da265cd Daniel Jordan 2021-02-19 711 if (!batch->size) {
4d83de6da265cd Daniel Jordan 2021-02-19 712 /* Empty batch, so refill it. */
eb996eec783c1e Alex Williamson 2025-02-18 713 ret = vaddr_get_pfns(mm, vaddr, npage, dma->prot,
eb996eec783c1e Alex Williamson 2025-02-18 714 &pfn, batch);
be16c1fd99f41a Daniel Jordan 2021-02-19 715 if (ret < 0)
4d83de6da265cd Daniel Jordan 2021-02-19 716 goto unpin_out;
166fd7d94afdac Alex Williamson 2013-06-21 717
4d83de6da265cd Daniel Jordan 2021-02-19 718 if (!*pfn_base) {
4d83de6da265cd Daniel Jordan 2021-02-19 719 *pfn_base = pfn;
4d83de6da265cd Daniel Jordan 2021-02-19 720 rsvd = is_invalid_reserved_pfn(*pfn_base);
4d83de6da265cd Daniel Jordan 2021-02-19 721 }
If "*pfn_base" is true then "rsvd" is uninitialized.
eb996eec783c1e Alex Williamson 2025-02-18 722
eb996eec783c1e Alex Williamson 2025-02-18 723 /* Handle pfnmap */
eb996eec783c1e Alex Williamson 2025-02-18 724 if (!batch->size) {
eb996eec783c1e Alex Williamson 2025-02-18 725 if (pfn != *pfn_base + pinned || !rsvd)
eb996eec783c1e Alex Williamson 2025-02-18 726 goto out;
eb996eec783c1e Alex Williamson 2025-02-18 727
eb996eec783c1e Alex Williamson 2025-02-18 728 pinned += ret;
eb996eec783c1e Alex Williamson 2025-02-18 729 npage -= ret;
eb996eec783c1e Alex Williamson 2025-02-18 730 vaddr += (PAGE_SIZE * ret);
eb996eec783c1e Alex Williamson 2025-02-18 731 iova += (PAGE_SIZE * ret);
eb996eec783c1e Alex Williamson 2025-02-18 732 continue;
eb996eec783c1e Alex Williamson 2025-02-18 733 }
166fd7d94afdac Alex Williamson 2013-06-21 734 }
166fd7d94afdac Alex Williamson 2013-06-21 735
4d83de6da265cd Daniel Jordan 2021-02-19 736 /*
eb996eec783c1e Alex Williamson 2025-02-18 737 * pfn is preset for the first iteration of this inner loop
eb996eec783c1e Alex Williamson 2025-02-18 738 * due to the fact that vaddr_get_pfns() needs to provide the
eb996eec783c1e Alex Williamson 2025-02-18 739 * initial pfn for pfnmaps. Therefore to reduce redundancy,
eb996eec783c1e Alex Williamson 2025-02-18 740 * the next pfn is fetched at the end of the loop.
eb996eec783c1e Alex Williamson 2025-02-18 741 * A PageReserved() page could still qualify as page backed
eb996eec783c1e Alex Williamson 2025-02-18 742 * and rsvd here, and therefore continues to use the batch.
4d83de6da265cd Daniel Jordan 2021-02-19 743 */
4d83de6da265cd Daniel Jordan 2021-02-19 744 while (true) {
6a2d9b72168041 Li Zhe 2025-06-30 745 long nr_pages, acct_pages = 0;
6a2d9b72168041 Li Zhe 2025-06-30 746
4d83de6da265cd Daniel Jordan 2021-02-19 747 if (pfn != *pfn_base + pinned ||
4d83de6da265cd Daniel Jordan 2021-02-19 748 rsvd != is_invalid_reserved_pfn(pfn))
4d83de6da265cd Daniel Jordan 2021-02-19 749 goto out;
4d83de6da265cd Daniel Jordan 2021-02-19 750
6a2d9b72168041 Li Zhe 2025-06-30 751 nr_pages = contig_pages(dma, batch, iova);
6a2d9b72168041 Li Zhe 2025-06-30 752 if (!rsvd) {
6a2d9b72168041 Li Zhe 2025-06-30 753 acct_pages = nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 754 acct_pages -= vpfn_pages(dma, iova, nr_pages);
6a2d9b72168041 Li Zhe 2025-06-30 755 }
6a2d9b72168041 Li Zhe 2025-06-30 756
4d83de6da265cd Daniel Jordan 2021-02-19 757 /*
4d83de6da265cd Daniel Jordan 2021-02-19 758 * Reserved pages aren't counted against the user,
4d83de6da265cd Daniel Jordan 2021-02-19 759 * externally pinned pages are already counted against
4d83de6da265cd Daniel Jordan 2021-02-19 760 * the user.
4d83de6da265cd Daniel Jordan 2021-02-19 761 */
6a2d9b72168041 Li Zhe 2025-06-30 762 if (acct_pages) {
48d8476b41eed6 Alex Williamson 2018-05-11 763 if (!dma->lock_cap &&
6a2d9b72168041 Li Zhe 2025-06-30 764 mm->locked_vm + lock_acct + acct_pages > limit) {
6c38c055cc4c0a Alex Williamson 2016-12-30 765 pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
6c38c055cc4c0a Alex Williamson 2016-12-30 766 __func__, limit << PAGE_SHIFT);
0cfef2b7410b64 Alex Williamson 2017-04-13 767 ret = -ENOMEM;
0cfef2b7410b64 Alex Williamson 2017-04-13 768 goto unpin_out;
166fd7d94afdac Alex Williamson 2013-06-21 769 }
6a2d9b72168041 Li Zhe 2025-06-30 770 lock_acct += acct_pages;
a54eb55045ae9b Kirti Wankhede 2016-11-17 771 }
4d83de6da265cd Daniel Jordan 2021-02-19 772
6a2d9b72168041 Li Zhe 2025-06-30 773 pinned += nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 774 npage -= nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 775 vaddr += PAGE_SIZE * nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 776 iova += PAGE_SIZE * nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 777 batch->offset += nr_pages;
6a2d9b72168041 Li Zhe 2025-06-30 778 batch->size -= nr_pages;
4d83de6da265cd Daniel Jordan 2021-02-19 779
4d83de6da265cd Daniel Jordan 2021-02-19 780 if (!batch->size)
4d83de6da265cd Daniel Jordan 2021-02-19 781 break;
4d83de6da265cd Daniel Jordan 2021-02-19 782
4d83de6da265cd Daniel Jordan 2021-02-19 783 pfn = page_to_pfn(batch->pages[batch->offset]);
4d83de6da265cd Daniel Jordan 2021-02-19 784 }
a54eb55045ae9b Kirti Wankhede 2016-11-17 785 }
166fd7d94afdac Alex Williamson 2013-06-21 786
6c38c055cc4c0a Alex Williamson 2016-12-30 787 out:
20448310d6b71d Li Zhe 2025-06-30 @788 dma->has_rsvd |= rsvd;
^^^^
48d8476b41eed6 Alex Williamson 2018-05-11 789 ret = vfio_lock_acct(dma, lock_acct, false);
0cfef2b7410b64 Alex Williamson 2017-04-13 790
0cfef2b7410b64 Alex Williamson 2017-04-13 791 unpin_out:
be16c1fd99f41a Daniel Jordan 2021-02-19 792 if (ret < 0) {
4d83de6da265cd Daniel Jordan 2021-02-19 793 if (pinned && !rsvd) {
0cfef2b7410b64 Alex Williamson 2017-04-13 794 for (pfn = *pfn_base ; pinned ; pfn++, pinned--)
0cfef2b7410b64 Alex Williamson 2017-04-13 795 put_pfn(pfn, dma->prot);
89c29def6b0101 Alex Williamson 2018-06-02 796 }
4d83de6da265cd Daniel Jordan 2021-02-19 797 vfio_batch_unpin(batch, dma);
0cfef2b7410b64 Alex Williamson 2017-04-13 798
0cfef2b7410b64 Alex Williamson 2017-04-13 799 return ret;
0cfef2b7410b64 Alex Williamson 2017-04-13 800 }
166fd7d94afdac Alex Williamson 2013-06-21 801
6c38c055cc4c0a Alex Williamson 2016-12-30 802 return pinned;
73fa0d10d077d9 Alex Williamson 2012-07-31 803 }
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma
2025-07-01 15:13 ` Dan Carpenter
@ 2025-07-02 3:47 ` lizhe.67
2025-07-02 16:11 ` Dan Carpenter
0 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-07-02 3:47 UTC (permalink / raw)
To: dan.carpenter
Cc: alex.williamson, david, jgg, kvm, linux-kernel, lizhe.67, lkp,
oe-kbuild-all, oe-kbuild, peterx
On Tue, 1 Jul 2025 18:13:48 +0300, dan.carpenter@linaro.org wrote:
> New smatch warnings:
> drivers/vfio/vfio_iommu_type1.c:788 vfio_pin_pages_remote() error: uninitialized symbol 'rsvd'.
>
> Old smatch warnings:
> drivers/vfio/vfio_iommu_type1.c:2376 vfio_iommu_type1_attach_group() warn: '&group->next' not removed from list
>
> vim +/rsvd +788 drivers/vfio/vfio_iommu_type1.c
>
> 8f0d5bb95f763c Kirti Wankhede 2016-11-17 684 static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> 0635559233434a Alex Williamson 2025-02-18 685 unsigned long npage, unsigned long *pfn_base,
> 4b6c33b3229678 Daniel Jordan 2021-02-19 686 unsigned long limit, struct vfio_batch *batch)
> 73fa0d10d077d9 Alex Williamson 2012-07-31 687 {
> 4d83de6da265cd Daniel Jordan 2021-02-19 688 unsigned long pfn;
> 4d83de6da265cd Daniel Jordan 2021-02-19 689 struct mm_struct *mm = current->mm;
> 6c38c055cc4c0a Alex Williamson 2016-12-30 690 long ret, pinned = 0, lock_acct = 0;
> 89c29def6b0101 Alex Williamson 2018-06-02 691 bool rsvd;
> a54eb55045ae9b Kirti Wankhede 2016-11-17 692 dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
> 166fd7d94afdac Alex Williamson 2013-06-21 693
> 6c38c055cc4c0a Alex Williamson 2016-12-30 694 /* This code path is only user initiated */
> 4d83de6da265cd Daniel Jordan 2021-02-19 695 if (!mm)
> 166fd7d94afdac Alex Williamson 2013-06-21 696 return -ENODEV;
> 73fa0d10d077d9 Alex Williamson 2012-07-31 697
> 4d83de6da265cd Daniel Jordan 2021-02-19 698 if (batch->size) {
> 4d83de6da265cd Daniel Jordan 2021-02-19 699 /* Leftover pages in batch from an earlier call. */
> 4d83de6da265cd Daniel Jordan 2021-02-19 700 *pfn_base = page_to_pfn(batch->pages[batch->offset]);
> 4d83de6da265cd Daniel Jordan 2021-02-19 701 pfn = *pfn_base;
> 89c29def6b0101 Alex Williamson 2018-06-02 702 rsvd = is_invalid_reserved_pfn(*pfn_base);
When batch->size is not zero, we initialize rsvd here.
> 4d83de6da265cd Daniel Jordan 2021-02-19 703 } else {
> 4d83de6da265cd Daniel Jordan 2021-02-19 704 *pfn_base = 0;
When the value of batch->size is zero, we set the value of *pfn_base
to zero and do not initialize rsvd for the time being.
> 5c6c2b21ecc9ad Alex Williamson 2013-06-21 705 }
> 5c6c2b21ecc9ad Alex Williamson 2013-06-21 706
> eb996eec783c1e Alex Williamson 2025-02-18 707 if (unlikely(disable_hugepages))
> eb996eec783c1e Alex Williamson 2025-02-18 708 npage = 1;
> eb996eec783c1e Alex Williamson 2025-02-18 709
> 4d83de6da265cd Daniel Jordan 2021-02-19 710 while (npage) {
> 4d83de6da265cd Daniel Jordan 2021-02-19 711 if (!batch->size) {
> 4d83de6da265cd Daniel Jordan 2021-02-19 712 /* Empty batch, so refill it. */
> eb996eec783c1e Alex Williamson 2025-02-18 713 ret = vaddr_get_pfns(mm, vaddr, npage, dma->prot,
> eb996eec783c1e Alex Williamson 2025-02-18 714 &pfn, batch);
> be16c1fd99f41a Daniel Jordan 2021-02-19 715 if (ret < 0)
> 4d83de6da265cd Daniel Jordan 2021-02-19 716 goto unpin_out;
> 166fd7d94afdac Alex Williamson 2013-06-21 717
> 4d83de6da265cd Daniel Jordan 2021-02-19 718 if (!*pfn_base) {
> 4d83de6da265cd Daniel Jordan 2021-02-19 719 *pfn_base = pfn;
> 4d83de6da265cd Daniel Jordan 2021-02-19 720 rsvd = is_invalid_reserved_pfn(*pfn_base);
Therefore, for the first loop, when batch->size is zero, *pfn_base must
be zero, which will then lead to the initialization of rsvd.
> 4d83de6da265cd Daniel Jordan 2021-02-19 721 }
>
> If "*pfn_base" is true then "rsvd" is uninitialized.
>
> eb996eec783c1e Alex Williamson 2025-02-18 722
> eb996eec783c1e Alex Williamson 2025-02-18 723 /* Handle pfnmap */
> eb996eec783c1e Alex Williamson 2025-02-18 724 if (!batch->size) {
> eb996eec783c1e Alex Williamson 2025-02-18 725 if (pfn != *pfn_base + pinned || !rsvd)
> eb996eec783c1e Alex Williamson 2025-02-18 726 goto out;
>
> eb996eec783c1e Alex Williamson 2025-02-18 727
> eb996eec783c1e Alex Williamson 2025-02-18 728 pinned += ret;
> eb996eec783c1e Alex Williamson 2025-02-18 729 npage -= ret;
> eb996eec783c1e Alex Williamson 2025-02-18 730 vaddr += (PAGE_SIZE * ret);
> eb996eec783c1e Alex Williamson 2025-02-18 731 iova += (PAGE_SIZE * ret);
> eb996eec783c1e Alex Williamson 2025-02-18 732 continue;
> eb996eec783c1e Alex Williamson 2025-02-18 733 }
> 166fd7d94afdac Alex Williamson 2013-06-21 734 }
> 166fd7d94afdac Alex Williamson 2013-06-21 735
> 4d83de6da265cd Daniel Jordan 2021-02-19 736 /*
> eb996eec783c1e Alex Williamson 2025-02-18 737 * pfn is preset for the first iteration of this inner loop
> eb996eec783c1e Alex Williamson 2025-02-18 738 * due to the fact that vaddr_get_pfns() needs to provide the
> eb996eec783c1e Alex Williamson 2025-02-18 739 * initial pfn for pfnmaps. Therefore to reduce redundancy,
> eb996eec783c1e Alex Williamson 2025-02-18 740 * the next pfn is fetched at the end of the loop.
> eb996eec783c1e Alex Williamson 2025-02-18 741 * A PageReserved() page could still qualify as page backed
> eb996eec783c1e Alex Williamson 2025-02-18 742 * and rsvd here, and therefore continues to use the batch.
> 4d83de6da265cd Daniel Jordan 2021-02-19 743 */
> 4d83de6da265cd Daniel Jordan 2021-02-19 744 while (true) {
> 6a2d9b72168041 Li Zhe 2025-06-30 745 long nr_pages, acct_pages = 0;
> 6a2d9b72168041 Li Zhe 2025-06-30 746
> 4d83de6da265cd Daniel Jordan 2021-02-19 747 if (pfn != *pfn_base + pinned ||
> 4d83de6da265cd Daniel Jordan 2021-02-19 748 rsvd != is_invalid_reserved_pfn(pfn))
> 4d83de6da265cd Daniel Jordan 2021-02-19 749 goto out;
> 4d83de6da265cd Daniel Jordan 2021-02-19 750
> 6a2d9b72168041 Li Zhe 2025-06-30 751 nr_pages = contig_pages(dma, batch, iova);
> 6a2d9b72168041 Li Zhe 2025-06-30 752 if (!rsvd) {
> 6a2d9b72168041 Li Zhe 2025-06-30 753 acct_pages = nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 754 acct_pages -= vpfn_pages(dma, iova, nr_pages);
> 6a2d9b72168041 Li Zhe 2025-06-30 755 }
> 6a2d9b72168041 Li Zhe 2025-06-30 756
> 4d83de6da265cd Daniel Jordan 2021-02-19 757 /*
> 4d83de6da265cd Daniel Jordan 2021-02-19 758 * Reserved pages aren't counted against the user,
> 4d83de6da265cd Daniel Jordan 2021-02-19 759 * externally pinned pages are already counted against
> 4d83de6da265cd Daniel Jordan 2021-02-19 760 * the user.
> 4d83de6da265cd Daniel Jordan 2021-02-19 761 */
> 6a2d9b72168041 Li Zhe 2025-06-30 762 if (acct_pages) {
> 48d8476b41eed6 Alex Williamson 2018-05-11 763 if (!dma->lock_cap &&
> 6a2d9b72168041 Li Zhe 2025-06-30 764 mm->locked_vm + lock_acct + acct_pages > limit) {
> 6c38c055cc4c0a Alex Williamson 2016-12-30 765 pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
> 6c38c055cc4c0a Alex Williamson 2016-12-30 766 __func__, limit << PAGE_SHIFT);
> 0cfef2b7410b64 Alex Williamson 2017-04-13 767 ret = -ENOMEM;
> 0cfef2b7410b64 Alex Williamson 2017-04-13 768 goto unpin_out;
> 166fd7d94afdac Alex Williamson 2013-06-21 769 }
> 6a2d9b72168041 Li Zhe 2025-06-30 770 lock_acct += acct_pages;
> a54eb55045ae9b Kirti Wankhede 2016-11-17 771 }
> 4d83de6da265cd Daniel Jordan 2021-02-19 772
> 6a2d9b72168041 Li Zhe 2025-06-30 773 pinned += nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 774 npage -= nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 775 vaddr += PAGE_SIZE * nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 776 iova += PAGE_SIZE * nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 777 batch->offset += nr_pages;
> 6a2d9b72168041 Li Zhe 2025-06-30 778 batch->size -= nr_pages;
> 4d83de6da265cd Daniel Jordan 2021-02-19 779
> 4d83de6da265cd Daniel Jordan 2021-02-19 780 if (!batch->size)
> 4d83de6da265cd Daniel Jordan 2021-02-19 781 break;
> 4d83de6da265cd Daniel Jordan 2021-02-19 782
> 4d83de6da265cd Daniel Jordan 2021-02-19 783 pfn = page_to_pfn(batch->pages[batch->offset]);
> 4d83de6da265cd Daniel Jordan 2021-02-19 784 }
> a54eb55045ae9b Kirti Wankhede 2016-11-17 785 }
> 166fd7d94afdac Alex Williamson 2013-06-21 786
> 6c38c055cc4c0a Alex Williamson 2016-12-30 787 out:
> 20448310d6b71d Li Zhe 2025-06-30 @788 dma->has_rsvd |= rsvd;
> ^^^^
In summary, it is likely to be a false alarm.
Please correct me if I am wrong.
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
` (3 preceding siblings ...)
2025-06-30 7:25 ` [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
@ 2025-07-02 8:15 ` David Hildenbrand
2025-07-02 9:38 ` lizhe.67
4 siblings, 1 reply; 24+ messages in thread
From: David Hildenbrand @ 2025-07-02 8:15 UTC (permalink / raw)
To: lizhe.67, alex.williamson, jgg, peterx; +Cc: kvm, linux-kernel, Jason Gunthorpe
On 30.06.25 09:25, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
>
> This patchset is a consolidation of the two previous patchsets[1][2].
>
> When vfio_pin_pages_remote() is called with a range of addresses that
> includes large folios, the function currently performs individual
> statistics counting operations for each page. This can lead to significant
> performance overheads, especially when dealing with large ranges of pages.
>
> The function vfio_unpin_pages_remote() has a similar issue, where executing
> put_pfn() for each pfn incurs considerable overhead.
>
> This patchset optimizes the performance of the relevant functions by
> batching the less efficient operations mentioned before.
>
> The first patch optimizes the performance of the function
> vfio_pin_pages_remote(), while the remaining patches optimize the
> performance of the function vfio_unpin_pages_remote().
>
> The performance test results, based on v6.16-rc4, for completing the 16G
> VFIO MAP/UNMAP DMA, obtained through unit test[3] with slight
> modifications[4], are as follows.
>
> Base(6.16-rc4):
> ./vfio-pci-mem-dma-map 0000:03:00.0 16
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO MAP DMA in 0.047 s (340.2 GB/s)
> VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO MAP DMA in 0.280 s (57.2 GB/s)
> VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO MAP DMA in 0.052 s (310.5 GB/s)
> VFIO UNMAP DMA in 0.136 s (117.3 GB/s)
>
> With this patchset:
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO MAP DMA in 0.027 s (596.4 GB/s)
> VFIO UNMAP DMA in 0.045 s (357.6 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO MAP DMA in 0.288 s (55.5 GB/s)
> VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO MAP DMA in 0.031 s (508.3 GB/s)
> VFIO UNMAP DMA in 0.045 s (352.9 GB/s)
>
> For large folios, we achieve an over 40% performance improvement for VFIO
> MAP DMA and an over 66% improvement for VFIO UNMAP DMA. For small
> folios, the results show little difference from the performance before
> optimization.
Jason mentioned in reply to the other series that, ideally, vfio
shouldn't be messing with folios at all.
While we now do that on the unpin side, we still do it at the pin side.
Which makes me wonder if we can avoid folios in patch #1 in
contig_pages(), and simply collect pages that correspond to consecutive
PFNs.
What was the reason again, that contig_pages() would not exceed a single
folio?
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-02 8:15 ` [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and " David Hildenbrand
@ 2025-07-02 9:38 ` lizhe.67
2025-07-02 9:57 ` David Hildenbrand
0 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-07-02 9:38 UTC (permalink / raw)
To: david; +Cc: alex.williamson, jgg, jgg, kvm, linux-kernel, lizhe.67, peterx
On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
> Jason mentioned in reply to the other series that, ideally, vfio
> shouldn't be messing with folios at all.
>
> While we now do that on the unpin side, we still do it at the pin side.
Yes.
> Which makes me wonder if we can avoid folios in patch #1 in
> contig_pages(), and simply collect pages that correspond to consecutive
> PFNs.
In my opinion, comparing whether the pfns of two pages are contiguous
is relatively inefficient. Using folios might be a more efficient
solution.
Given that 'page' is already in use within vfio, it seems that adopting
'folio' wouldn't be particularly troublesome? If you have any better
suggestions, I sincerely hope you would share them with me.
> What was the reason again, that contig_pages() would not exceed a single
> folio?
Regarding this issue, I think Alex and I are on the same page[1]. For a
folio, all of its pages have the same invalid or reserved state. In
the function vfio_pin_pages_remote(), we need to ensure that the state
is the same as the previous pfn (through variable 'rsvd' and function
is_invalid_reserved_pfn()). Therefore, we do not want the return value
of contig_pages() to exceed a single folio.
Thanks,
Zhe
[1]: https://lore.kernel.org/all/20250613081613.0bef3d39.alex.williamson@redhat.com/
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-02 9:38 ` lizhe.67
@ 2025-07-02 9:57 ` David Hildenbrand
2025-07-02 12:47 ` Jason Gunthorpe
2025-07-03 3:54 ` lizhe.67
0 siblings, 2 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-07-02 9:57 UTC (permalink / raw)
To: lizhe.67; +Cc: alex.williamson, jgg, jgg, kvm, linux-kernel, peterx
On 02.07.25 11:38, lizhe.67@bytedance.com wrote:
> On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
>
>> Jason mentioned in reply to the other series that, ideally, vfio
>> shouldn't be messing with folios at all.
>>
>> While we now do that on the unpin side, we still do it at the pin side.
>
> Yes.
>
>> Which makes me wonder if we can avoid folios in patch #1 in
>> contig_pages(), and simply collect pages that correspond to consecutive
>> PFNs.
>
> In my opinion, comparing whether the pfns of two pages are contiguous
> is relatively inefficient. Using folios might be a more efficient
> solution.
buffer[i + 1] == nth_page(buffer[i], 1)
Is extremely efficient, except on
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
Because it's essentially
buffer[i + 1] == buffer[i] + 1
But with that config it's less efficient
buffer[i + 1] == pfn_to_page(page_to_pfn(buffer[i]) + 1)
That could be optimized (if we care about the config), assuming that we don't cross
memory sections (e.g., 128 MiB on x86).
See page_ext_iter_next_fast_possible(), that optimized for something similar.
So based on the first page, one could easily determine how far to batch
using the simple
buffer[i + 1] == buffer[i] + 1
comparison.
That would mean that one could exceed a folio, in theory.
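Just to illustrate, a minimal sketch of such a batching helper (the name
is a placeholder; it assumes the cheap comparison above, i.e. struct
pages of consecutive PFNs are themselves consecutive):

static unsigned long num_contig_pages(struct page **pages,
				      unsigned long npages)
{
	unsigned long i;

	/* Count the leading entries whose PFNs are consecutive. */
	for (i = 1; i < npages; i++)
		if (pages[i] != pages[i - 1] + 1)
			break;

	return i;
}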
>
> Given that 'page' is already in use within vfio, it seems that adopting
> 'folio' wouldn't be particularly troublesome? If you have any better
> suggestions, I sincerely hope you would share them with me.
One challenge in the future will likely be that not all pages that we can
GUP will belong to folios. We would possibly be able to handle that by
checking if the page actually belongs to a folio.
Not dealing with folios where avoidable would be easier.
>
>> What was the reason again, that contig_pages() would not exceed a single
>> folio?
>
> Regarding this issue, I think Alex and I are on the same page[1]. For a
> folio, all of its pages have the same invalid or reserved state. In
> the function vfio_pin_pages_remote(), we need to ensure that the state
> is the same as the previous pfn (through variable 'rsvd' and function
> is_invalid_reserved_pfn()). Therefore, we do not want the return value
> of contig_pages() to exceed a single folio.
If we obtained a page from GUP, is_invalid_reserved_pfn() would only trigger
for the shared zeropage. But that one can no longer be returned from FOLL_LONGTERM.
So if you know the pages came from GUP, I would assume they are never invalid_reserved?
Again, just a thought on how to apply something similar as done for the unpin case, avoiding
messing with folios.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-02 9:57 ` David Hildenbrand
@ 2025-07-02 12:47 ` Jason Gunthorpe
2025-07-03 4:04 ` lizhe.67
2025-07-03 3:54 ` lizhe.67
1 sibling, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-07-02 12:47 UTC (permalink / raw)
To: David Hildenbrand; +Cc: lizhe.67, alex.williamson, kvm, linux-kernel, peterx
On Wed, Jul 02, 2025 at 11:57:08AM +0200, David Hildenbrand wrote:
> On 02.07.25 11:38, lizhe.67@bytedance.com wrote:
> > On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
> >
> > > Jason mentioned in reply to the other series that, ideally, vfio
> > > shouldn't be messing with folios at all.
> > >
> > > While we now do that on the unpin side, we still do it at the pin side.
> >
> > Yes.
> >
> > > Which makes me wonder if we can avoid folios in patch #1 in
> > > contig_pages(), and simply collect pages that correspond to consecutive
> > > PFNs.
> >
> > In my opinion, comparing whether the pfns of two pages are contiguous
> > is relatively inefficient. Using folios might be a more efficient
> > solution.
>
> buffer[i + 1] == nth_page(buffer[i], 1)
>
> Is extremely efficient, except on
sg_alloc_append_table_from_pages() is using the
next_pfn = (sg_phys(sgt_append->prv) + prv_len) / PAGE_SIZE;
last_pg = pfn_to_page(next_pfn - 1);
approach to evaluate contiguity.
iommufd is also using very similar in batch_from_pages():
if (!batch_add_pfn(batch, page_to_pfn(*pages)))
So we should not be trying to optimize this only in VFIO, I would drop
that from this series.
If it can be optimized we should try to have some kind of generic
helper for building a physical contiguous range from a struct page
list.
> If we obtained a page from GUP, is_invalid_reserved_pfn() would only trigger
> for the shared zeropage. but that one can no longer be returned from FOLL_LONGTERM.
AFAIK the use of "reserved" here means it is non-gupable memory that
was acquired through follow_pfn. When it is pulled back out of the
iommu_domain as a phys_addr_t the is_invalid_reserved_pfn() is used to
tell if the address came from GUP or if it came from follow_pfn.
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma
2025-07-02 3:47 ` lizhe.67
@ 2025-07-02 16:11 ` Dan Carpenter
0 siblings, 0 replies; 24+ messages in thread
From: Dan Carpenter @ 2025-07-02 16:11 UTC (permalink / raw)
To: lizhe.67
Cc: alex.williamson, david, jgg, kvm, linux-kernel, lkp,
oe-kbuild-all, oe-kbuild, peterx
On Wed, Jul 02, 2025 at 11:47:20AM +0800, lizhe.67@bytedance.com wrote:
> On Tue, 1 Jul 2025 18:13:48 +0300, dan.carpenter@linaro.org wrote:
>
> > New smatch warnings:
> > drivers/vfio/vfio_iommu_type1.c:788 vfio_pin_pages_remote() error: uninitialized symbol 'rsvd'.
> >
> > Old smatch warnings:
> > drivers/vfio/vfio_iommu_type1.c:2376 vfio_iommu_type1_attach_group() warn: '&group->next' not removed from list
> >
> > vim +/rsvd +788 drivers/vfio/vfio_iommu_type1.c
> >
> > 8f0d5bb95f763c Kirti Wankhede 2016-11-17 684 static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> > 0635559233434a Alex Williamson 2025-02-18 685 unsigned long npage, unsigned long *pfn_base,
> > 4b6c33b3229678 Daniel Jordan 2021-02-19 686 unsigned long limit, struct vfio_batch *batch)
> > 73fa0d10d077d9 Alex Williamson 2012-07-31 687 {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 688 unsigned long pfn;
> > 4d83de6da265cd Daniel Jordan 2021-02-19 689 struct mm_struct *mm = current->mm;
> > 6c38c055cc4c0a Alex Williamson 2016-12-30 690 long ret, pinned = 0, lock_acct = 0;
> > 89c29def6b0101 Alex Williamson 2018-06-02 691 bool rsvd;
> > a54eb55045ae9b Kirti Wankhede 2016-11-17 692 dma_addr_t iova = vaddr - dma->vaddr + dma->iova;
> > 166fd7d94afdac Alex Williamson 2013-06-21 693
> > 6c38c055cc4c0a Alex Williamson 2016-12-30 694 /* This code path is only user initiated */
> > 4d83de6da265cd Daniel Jordan 2021-02-19 695 if (!mm)
> > 166fd7d94afdac Alex Williamson 2013-06-21 696 return -ENODEV;
> > 73fa0d10d077d9 Alex Williamson 2012-07-31 697
> > 4d83de6da265cd Daniel Jordan 2021-02-19 698 if (batch->size) {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 699 /* Leftover pages in batch from an earlier call. */
> > 4d83de6da265cd Daniel Jordan 2021-02-19 700 *pfn_base = page_to_pfn(batch->pages[batch->offset]);
> > 4d83de6da265cd Daniel Jordan 2021-02-19 701 pfn = *pfn_base;
> > 89c29def6b0101 Alex Williamson 2018-06-02 702 rsvd = is_invalid_reserved_pfn(*pfn_base);
>
> When batch->size is not zero, we initialize rsvd here.
>
> > 4d83de6da265cd Daniel Jordan 2021-02-19 703 } else {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 704 *pfn_base = 0;
>
> When the value of batch->size is zero, we set the value of *pfn_base
> to zero and do not initialize rsvd for the time being.
>
> > 5c6c2b21ecc9ad Alex Williamson 2013-06-21 705 }
> > 5c6c2b21ecc9ad Alex Williamson 2013-06-21 706
> > eb996eec783c1e Alex Williamson 2025-02-18 707 if (unlikely(disable_hugepages))
> > eb996eec783c1e Alex Williamson 2025-02-18 708 npage = 1;
> > eb996eec783c1e Alex Williamson 2025-02-18 709
> > 4d83de6da265cd Daniel Jordan 2021-02-19 710 while (npage) {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 711 if (!batch->size) {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 712 /* Empty batch, so refill it. */
> > eb996eec783c1e Alex Williamson 2025-02-18 713 ret = vaddr_get_pfns(mm, vaddr, npage, dma->prot,
> > eb996eec783c1e Alex Williamson 2025-02-18 714 &pfn, batch);
> > be16c1fd99f41a Daniel Jordan 2021-02-19 715 if (ret < 0)
> > 4d83de6da265cd Daniel Jordan 2021-02-19 716 goto unpin_out;
> > 166fd7d94afdac Alex Williamson 2013-06-21 717
> > 4d83de6da265cd Daniel Jordan 2021-02-19 718 if (!*pfn_base) {
> > 4d83de6da265cd Daniel Jordan 2021-02-19 719 *pfn_base = pfn;
> > 4d83de6da265cd Daniel Jordan 2021-02-19 720 rsvd = is_invalid_reserved_pfn(*pfn_base);
>
> Therefore, for the first loop, when batch->size is zero, *pfn_base must
> be zero, which will then lead to the initialization of rsvd.
>
Yeah. :/
I don't know why this warning was printed honestly. Smatch is supposed
to figure that kind of thing out correctly. It isn't printed on my
system. I've tried deleting the cross function DB (which shouldn't
matter) and I'm using the published version of Smatch but I can't get it
to print. Ah well. My bad. Thanks for taking a look.
regards,
dan carpenter
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
2025-06-30 7:25 ` [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
@ 2025-07-02 18:27 ` Jason Gunthorpe
2025-07-03 4:18 ` lizhe.67
0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-07-02 18:27 UTC (permalink / raw)
To: lizhe.67; +Cc: alex.williamson, david, peterx, kvm, linux-kernel
On Mon, Jun 30, 2025 at 03:25:16PM +0800, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
>
> The function vpfn_pages() can help us determine the number of vpfn
> nodes on the vpfn rb tree within a specified range. This allows us
> to avoid searching for each vpfn individually in the function
> vfio_unpin_pages_remote(). This patch batches the vfio_find_vpfn()
> calls in function vfio_unpin_pages_remote().
>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
> drivers/vfio/vfio_iommu_type1.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index a2d7abd4f2c2..330fff4fe96d 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -804,16 +804,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> unsigned long pfn, unsigned long npage,
> bool do_accounting)
> {
> - long unlocked = 0, locked = 0;
> + long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
> long i;
The logic in vpfn_pages() doesn't seem quite right? Don't we want to
count the number of pages within the range that fall within the rb
tree?
vpfn_pages() looks like it is only counting the number of RB tree
nodes within the range?
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio
2025-06-30 7:25 ` [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
@ 2025-07-02 18:28 ` Jason Gunthorpe
2025-07-03 6:12 ` lizhe.67
0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-07-02 18:28 UTC (permalink / raw)
To: lizhe.67; +Cc: alex.williamson, david, peterx, kvm, linux-kernel
On Mon, Jun 30, 2025 at 03:25:18PM +0800, lizhe.67@bytedance.com wrote:
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index a02bc340c112..7cacfb2cefe3 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -802,17 +802,29 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> return pinned;
> }
>
> +static inline void put_valid_unreserved_pfns(unsigned long start_pfn,
> + unsigned long npage, int prot)
> +{
> + unpin_user_page_range_dirty_lock(pfn_to_page(start_pfn), npage,
> + prot & IOMMU_WRITE);
> +}
I don't think you need this wrapper.
This patch and the prior look OK
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-02 9:57 ` David Hildenbrand
2025-07-02 12:47 ` Jason Gunthorpe
@ 2025-07-03 3:54 ` lizhe.67
2025-07-03 11:06 ` David Hildenbrand
1 sibling, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 3:54 UTC (permalink / raw)
To: david; +Cc: alex.williamson, jgg, jgg, kvm, linux-kernel, lizhe.67, peterx
On Wed, 2 Jul 2025 11:57:08 +0200, david@redhat.com wrote:
> On 02.07.25 11:38, lizhe.67@bytedance.com wrote:
> > On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
> >
> >> Jason mentioned in reply to the other series that, ideally, vfio
> >> shouldn't be messing with folios at all.
> >>
> >> While we now do that on the unpin side, we still do it at the pin side.
> >
> > Yes.
> >
> >> Which makes me wonder if we can avoid folios in patch #1 in
> >> contig_pages(), and simply collect pages that correspond to consecutive
> >> PFNs.
> >
> > In my opinion, comparing whether the pfns of two pages are contiguous
> > is relatively inefficient. Using folios might be a more efficient
> > solution.
>
> buffer[i + 1] == nth_page(buffer[i], 1)
>
> Is extremely efficient, except on
>
> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>
> Because it's essentially
>
> buffer[i + 1] == buffer[i] + 1
>
> But with that config it's less efficient
>
> buffer[i + 1] == pfn_to_page(page_to_pfn(buffer[i]) + 1)
>
> That could be optimized (if we care about the config), assuming that we don't cross
> memory sections (e.g., 128 MiB on x86).
>
> See page_ext_iter_next_fast_possible(), that optimized for something similar.
>
> So based on the first page, one could easily determine how far to batch
> using the simple
>
> buffer[i + 1] == buffer[i] + 1
>
> comparison.
>
> That would mean that one could exceed a folio, in theory.
Thank you very much for your suggestion. I think we can focus on
optimizing the case where
!(defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP))
I believe that in most scenarios where vfio is used,
CONFIG_SPARSEMEM_VMEMMAP is enabled. Handling too many CONFIG
combinations may make the patch appear overly complicated.
> > Given that 'page' is already in use within vfio, it seems that adopting
> > 'folio' wouldn't be particularly troublesome? If you have any better
> > suggestions, I sincerely hope you would share them with me.
>
> One challenge in the future will likely be that not all pages that we can
> GUP will belong to folios. We would possibly be able to handle that by
> checking if the page actually belongs to a folio.
>
> Not dealing with folios where avoidable would be easier.
>
> >
> >> What was the reason again, that contig_pages() would not exceed a single
> >> folio?
> >
> > Regarding this issue, I think Alex and I are on the same page[1]. For a
> > folio, all of its pages have the same invalid or reserved state. In
> > the function vfio_pin_pages_remote(), we need to ensure that the state
> > is the same as the previous pfn (through variable 'rsvd' and function
> > is_invalid_reserved_pfn()). Therefore, we do not want the return value
> > of contig_pages() to exceed a single folio.
>
> If we obtained a page from GUP, is_invalid_reserved_pfn() would only trigger
> for the shared zeropage. but that one can no longer be returned from FOLL_LONGTERM.
>
> So if you know the pages came from GUP, I would assume they are never invalid_reserved?
Yes, we use function vaddr_get_pfns(), which ultimately invokes GUP
with the FOLL_LONGTERM flag.
> Again, just a thought on how to apply something similar as done for the unpin case, avoiding
> messing with folios.
Taking into account the previous discussion, it seems that we might
simply replace the contig_pages() in patch #1 with the following one.
Also, contig_pages() could be extracted into mm.h as a helper
function; it seems that Jason would like to utilize it in other
contexts. Moreover, the subject of this patchset should be changed to
"Optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote()". Do
you think this would work?
+static inline unsigned long contig_pages(struct page **pages,
+ unsigned long size)
+{
+ struct page *first_page = pages[0];
+ unsigned long i;
+
+ for (i = 1; i < size; i++)
+ if (pages[i] != nth_page(first_page, i))
+ break;
+ return i;
+}
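For illustration, a caller could batch over the GUP result roughly
like this (a sketch; nr_pinned and pages are placeholder names, not
the actual patch):

	unsigned long i = 0, step;

	while (i < nr_pinned) {
		step = contig_pages(&pages[i], nr_pinned - i);
		/* pages[i .. i + step - 1] form one physically contiguous run */
		i += step;
	}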
I have conducted a preliminary performance test, and the results are
similar to those obtained previously.
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-02 12:47 ` Jason Gunthorpe
@ 2025-07-03 4:04 ` lizhe.67
0 siblings, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 4:04 UTC (permalink / raw)
To: jgg; +Cc: alex.williamson, david, kvm, linux-kernel, lizhe.67, peterx
> On Wed, 2 Jul 2025 09:47:56 -0300, jgg@nvidia.com wrote:
>
> On Wed, Jul 02, 2025 at 11:57:08AM +0200, David Hildenbrand wrote:
> > On 02.07.25 11:38, lizhe.67@bytedance.com wrote:
> > > On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
> > >
> > > > Jason mentioned in reply to the other series that, ideally, vfio
> > > > shouldn't be messing with folios at all.
> > > >
> > > > While we now do that on the unpin side, we still do it at the pin side.
> > >
> > > Yes.
> > >
> > > > Which makes me wonder if we can avoid folios in patch #1 in
> > > > contig_pages(), and simply collect pages that correspond to consecutive
> > > > PFNs.
> > >
> > > In my opinion, comparing whether the pfns of two pages are contiguous
> > > is relatively inefficient. Using folios might be a more efficient
> > > solution.
> >
> > buffer[i + 1] == nth_page(buffer[i], 1)
> >
> > Is extremely efficient, except on
>
> sg_alloc_append_table_from_pages() is using the
>
> next_pfn = (sg_phys(sgt_append->prv) + prv_len) / PAGE_SIZE;
> last_pg = pfn_to_page(next_pfn - 1);
>
> Approach to evaluate contiguity.
>
> iommufd is also using very similar in batch_from_pages():
>
> if (!batch_add_pfn(batch, page_to_pfn(*pages)))
I'm not particularly familiar with those parts of the code, so I
can't say for certain. For the two places mentioned above, if the
contiguity of physical memory can be determined from an array of
page pointers, then they could adopt the approach David suggested.
I've posted a preliminary implementation here[1]. Does that helper
function look okay to you?
> So we should not be trying to optimize this only in VFIO; I would
> drop that from this series.
>
> If it can be optimized, we should try to have some kind of generic
> helper for building a physically contiguous range from a struct
> page list.
Thanks,
Zhe
[1]: https://lore.kernel.org/all/20250703035425.36124-1-lizhe.67@bytedance.com/
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
2025-07-02 18:27 ` Jason Gunthorpe
@ 2025-07-03 4:18 ` lizhe.67
2025-07-03 12:27 ` Jason Gunthorpe
0 siblings, 1 reply; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 4:18 UTC (permalink / raw)
To: jgg; +Cc: alex.williamson, david, kvm, linux-kernel, lizhe.67, peterx, jgg
On Wed, 2 Jul 2025 15:27:59 -0300, jgg@ziepe.ca wrote:
> On Mon, Jun 30, 2025 at 03:25:16PM +0800, lizhe.67@bytedance.com wrote:
> > From: Li Zhe <lizhe.67@bytedance.com>
> >
> > The function vpfn_pages() can help us determine the number of vpfn
> > nodes on the vpfn rb tree within a specified range. This allows us
> > to avoid searching for each vpfn individually in the function
> > vfio_unpin_pages_remote(). This patch batches the vfio_find_vpfn()
> > calls in function vfio_unpin_pages_remote().
> >
> > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> > ---
> > drivers/vfio/vfio_iommu_type1.c | 10 +++-------
> > 1 file changed, 3 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index a2d7abd4f2c2..330fff4fe96d 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -804,16 +804,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> > unsigned long pfn, unsigned long npage,
> > bool do_accounting)
> > {
> > - long unlocked = 0, locked = 0;
> > + long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
> > long i;
>
> The logic in vpfn_pages() doesn't seem quite right? Don't we want to
> count the number of pages within the range that fall within the rb
> tree?
>
> vpfn_pages() looks like it is only counting the number of RB tree
> nodes within the range?
As I understand it, a vfio_pfn corresponds to a single page, am I right?
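For reference, the batched count is conceptually a single ordered
walk of that rb tree, roughly as sketched below (a rough sketch; the
actual patch may differ, e.g. by starting the walk at the first
in-range node rather than rb_first()):

static long vpfn_pages(struct vfio_dma *dma, dma_addr_t iova, long npage)
{
	dma_addr_t end = iova + npage * PAGE_SIZE;
	struct rb_node *n;
	long count = 0;

	/* nodes are sorted by iova, so one walk replaces npage lookups */
	for (n = rb_first(&dma->pfn_list); n; n = rb_next(n)) {
		struct vfio_pfn *vpfn = rb_entry(n, struct vfio_pfn, node);

		if (vpfn->iova >= end)
			break;
		if (vpfn->iova >= iova)
			count++;
	}
	return count;
}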
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio
2025-07-02 18:28 ` Jason Gunthorpe
@ 2025-07-03 6:12 ` lizhe.67
0 siblings, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 6:12 UTC (permalink / raw)
To: jgg; +Cc: alex.williamson, david, kvm, linux-kernel, lizhe.67, peterx, jgg
On Wed, 2 Jul 2025 15:28:44 -0300, jgg@ziepe.ca wrote:
> On Mon, Jun 30, 2025 at 03:25:18PM +0800, lizhe.67@bytedance.com wrote:
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index a02bc340c112..7cacfb2cefe3 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -802,17 +802,29 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> > return pinned;
> > }
> >
> > +static inline void put_valid_unreserved_pfns(unsigned long start_pfn,
> > + unsigned long npage, int prot)
> > +{
> > + unpin_user_page_range_dirty_lock(pfn_to_page(start_pfn), npage,
> > + prot & IOMMU_WRITE);
> > +}
>
> I don't think you need this wrapper.
>
> This patch and the prior look OK
Thank you very much for your review. The primary purpose of the
wrapper is to make the code easier to follow. Would it be better to
keep it? It might also spare us a few explanatory comments.
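For example (sketch only, with a made-up count variable), a call site
then reads:

	/* release a contiguous run of pinned, non-reserved pfns */
	put_valid_unreserved_pfns(pfn, nr_contig, dma->prot);

rather than repeating the pfn_to_page() and IOMMU_WRITE details at
every caller.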
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-03 3:54 ` lizhe.67
@ 2025-07-03 11:06 ` David Hildenbrand
2025-07-03 11:12 ` Jason Gunthorpe
2025-07-03 11:34 ` lizhe.67
0 siblings, 2 replies; 24+ messages in thread
From: David Hildenbrand @ 2025-07-03 11:06 UTC (permalink / raw)
To: lizhe.67; +Cc: alex.williamson, jgg, jgg, kvm, linux-kernel, peterx
On 03.07.25 05:54, lizhe.67@bytedance.com wrote:
> On Wed, 2 Jul 2025 11:57:08 +0200, david@redhat.com wrote:
>
>> On 02.07.25 11:38, lizhe.67@bytedance.com wrote:
>>> On Wed, 2 Jul 2025 10:15:29 +0200, david@redhat.com wrote:
>>>
>>>> Jason mentioned in reply to the other series that, ideally, vfio
>>>> shouldn't be messing with folios at all.
>>>>
>>>> While we now do that on the unpin side, we still do it at the pin side.
>>>
>>> Yes.
>>>
>>>> Which makes me wonder if we can avoid folios in patch #1 in
>>>> contig_pages(), and simply collect pages that correspond to consecutive
>>>> PFNs.
>>>
>>> In my opinion, comparing whether the pfns of two pages are contiguous
>>> is relatively inefficient. Using folios might be a more efficient
>>> solution.
>>
>> buffer[i + 1] == nth_page(buffer[i], 1)
>>
>> Is extremely efficient, except on
>>
>> #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>>
>> Because it's essentially
>>
>> buffer[i + 1] == buffer[i] + 1
>>
>> But with that config it's less efficient
>>
>> buffer[i + 1] == pfn_to_page(page_to_pfn(buffer[i]) + 1)
>>
>> That could be optimized (if we care about the config), assuming that we don't cross
>> memory sections (e.g., 128 MiB on x86).
>>
>> See page_ext_iter_next_fast_possible(), which optimizes for something similar.
>>
>> So based on the first page, one could easily determine how far to batch
>> using the simple
>>
>> buffer[i + 1] == buffer[i] + 1
>>
>> comparison.
>>
>> That would mean that one could exceed a folio, in theory.
>
> Thank you very much for your suggestion. I think we can focus on
> optimizing the case where
>
> !(defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP))
>
> I believe CONFIG_SPARSEMEM_VMEMMAP is enabled in most scenarios
> where vfio is used, and handling too many config combinations would
> make the patch overly complicated.
>
>>> Given that 'page' is already in use within vfio, it seems that adopting
>>> 'folio' wouldn't be particularly troublesome? If you have any better
>>> suggestions, I sincerely hope you would share them with me.
>>
>> One challenge in the future will likely be that not all pages that we can
>> GUP will belong to folios. We would possibly be able to handle that by
>> checking if the page actually belongs to a folio.
>>
>> Not dealing with folios where avoidable would be easier.
>>
>>>
>>>> What was the reason again, that contig_pages() would not exceed a single
>>>> folio?
>>>
>>> Regarding this issue, I think Alex and I are on the same page[1]. For a
>>> folio, all of its pages have the same invalid or reserved state. In
>>> the function vfio_pin_pages_remote(), we need to ensure that the state
>>> is the same as the previous pfn (through variable 'rsvd' and function
>>> is_invalid_reserved_pfn()). Therefore, we do not want the return value
>>> of contig_pages() to exceed a single folio.
>>
>> If we obtained a page from GUP, is_invalid_reserved_pfn() would only trigger
>> for the shared zeropage, but that one can no longer be returned from FOLL_LONGTERM.
>>
>> So if you know the pages came from GUP, I would assume they are never invalid_reserved?
>
> Yes, we use function vaddr_get_pfns(), which ultimately invokes GUP
> with the FOLL_LONGTERM flag.
>
>> Again, just a thought on how to apply something similar to what
>> was done for the unpin case, avoiding messing with folios.
>
> Taking the previous discussion into account, it seems we could simply
> replace contig_pages() in patch #1 with the version below. Function
> contig_pages() could also be extracted into mm.h as a helper, since
> Jason would like to use it in other contexts. Moreover, the subject
> of this patchset should be changed to
> "Optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote()". Do
> you think this would work?
>
> +static inline unsigned long contig_pages(struct page **pages,
> + unsigned long size)
size -> nr_pages
> +{
> + struct page *first_page = pages[0];
> + unsigned long i;
> +
> + for (i = 1; i < size; i++)
> + if (pages[i] != nth_page(first_page, i))
> + break;
> + return i;
> +}
LGTM.
I wonder if we can find a better function name, especially when moving
this to some header where it can be reused.
Something that expresses that we will return the next batch that starts
at the first page.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-03 11:06 ` David Hildenbrand
@ 2025-07-03 11:12 ` Jason Gunthorpe
2025-07-03 11:35 ` lizhe.67
2025-07-03 11:34 ` lizhe.67
1 sibling, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-07-03 11:12 UTC (permalink / raw)
To: David Hildenbrand; +Cc: lizhe.67, alex.williamson, kvm, linux-kernel, peterx
On Thu, Jul 03, 2025 at 01:06:26PM +0200, David Hildenbrand wrote:
> > +{
> > + struct page *first_page = pages[0];
> > + unsigned long i;
> > +
> > + for (i = 1; i < size; i++)
> > + if (pages[i] != nth_page(first_page, i))
> > + break;
> > + return i;
> > +}
>
> LGTM.
>
> I wonder if we can find a better function name, especially when moving this
> to some header where it can be reused.
It should be a common function:
unsigned long num_pages_contiguous(struct page **list, size_t nelms);
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-03 11:06 ` David Hildenbrand
2025-07-03 11:12 ` Jason Gunthorpe
@ 2025-07-03 11:34 ` lizhe.67
1 sibling, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 11:34 UTC (permalink / raw)
To: david; +Cc: alex.williamson, jgg, jgg, kvm, linux-kernel, lizhe.67, peterx
On Thu, 3 Jul 2025 13:06:26 +0200, david@redhat.com wrote:
> > +static inline unsigned long contig_pages(struct page **pages,
> > + unsigned long size)
>
> size -> nr_pages
>
> > +{
> > + struct page *first_page = pages[0];
> > + unsigned long i;
> > +
> > + for (i = 1; i < size; i++)
> > + if (pages[i] != nth_page(first_page, i))
> > + break;
> > + return i;
> > +}
>
> LGTM.
>
> I wonder if we can find a better function name, especially when moving
> this to some header where it can be reused.
>
> Something that expresses that we will return the next batch that starts
> at the first page.
Thank you. Given that this function may have more users in the future,
I will place it in include/linux/mm.h instead of the vfio file. Once
I've addressed the comments on the other patches with Jason, I will
send out a new patchset.
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio
2025-07-03 11:12 ` Jason Gunthorpe
@ 2025-07-03 11:35 ` lizhe.67
0 siblings, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-07-03 11:35 UTC (permalink / raw)
To: jgg; +Cc: alex.williamson, david, kvm, linux-kernel, lizhe.67, peterx
On Thu, 3 Jul 2025 08:12:16 -0300, jgg@ziepe.ca wrote:
> On Thu, Jul 03, 2025 at 01:06:26PM +0200, David Hildenbrand wrote:
> > > +{
> > > + struct page *first_page = pages[0];
> > > + unsigned long i;
> > > +
> > > + for (i = 1; i < size; i++)
> > > + if (pages[i] != nth_page(first_page, i))
> > > + break;
> > > + return i;
> > > +}
> >
> > LGTM.
> >
> > I wonder if we can find a better function name, especially when moving this
> > to some header where it can be reused.
>
> It should be a common function:
>
> unsigned long num_pages_contiguous(struct page **list, size_t nelms);
I fully agree with you.
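Combining that name with David's size -> nr_pages rename, the mm.h
helper would presumably end up roughly as follows (a sketch, pending
review):

static inline unsigned long num_pages_contiguous(struct page **pages,
						 size_t nr_pages)
{
	struct page *first_page = pages[0];
	unsigned long i;

	/* count how many entries from pages[0] are physically contiguous */
	for (i = 1; i < nr_pages; i++)
		if (pages[i] != nth_page(first_page, i))
			break;
	return i;
}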
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
2025-07-03 4:18 ` lizhe.67
@ 2025-07-03 12:27 ` Jason Gunthorpe
2025-07-04 2:20 ` lizhe.67
0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2025-07-03 12:27 UTC (permalink / raw)
To: lizhe.67; +Cc: alex.williamson, david, kvm, linux-kernel, peterx
On Thu, Jul 03, 2025 at 12:18:22PM +0800, lizhe.67@bytedance.com wrote:
> On Wed, 2 Jul 2025 15:27:59 -0300, jgg@ziepe.ca wrote:
>
> > On Mon, Jun 30, 2025 at 03:25:16PM +0800, lizhe.67@bytedance.com wrote:
> > > From: Li Zhe <lizhe.67@bytedance.com>
> > >
> > > The function vpfn_pages() can help us determine the number of vpfn
> > > nodes on the vpfn rb tree within a specified range. This allows us
> > > to avoid searching for each vpfn individually in the function
> > > vfio_unpin_pages_remote(). This patch batches the vfio_find_vpfn()
> > > calls in function vfio_unpin_pages_remote().
> > >
> > > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> > > ---
> > > drivers/vfio/vfio_iommu_type1.c | 10 +++-------
> > > 1 file changed, 3 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > > index a2d7abd4f2c2..330fff4fe96d 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -804,16 +804,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> > > unsigned long pfn, unsigned long npage,
> > > bool do_accounting)
> > > {
> > > - long unlocked = 0, locked = 0;
> > > + long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
> > > long i;
> >
> > The logic in vpfn_pages() doesn't seem quite right? Don't we want to
> > count the number of pages within the range that fall within the rb
> > tree?
> >
> > vpfn_pages() looks like it is only counting the number of RB tree
> > nodes within the range?
>
> As I understand it, a vfio_pfn corresponds to a single page, am I right?
It does look that way; it is not what I was expecting, since iommufd
holds ranges for this job.
So this is OK then
Jason
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
2025-07-03 12:27 ` Jason Gunthorpe
@ 2025-07-04 2:20 ` lizhe.67
0 siblings, 0 replies; 24+ messages in thread
From: lizhe.67 @ 2025-07-04 2:20 UTC (permalink / raw)
To: jgg; +Cc: alex.williamson, david, kvm, linux-kernel, lizhe.67, peterx
On Thu, 3 Jul 2025 09:27:56 -0300, jgg@nvidia.com wrote:
> On Thu, Jul 03, 2025 at 12:18:22PM +0800, lizhe.67@bytedance.com wrote:
> > On Wed, 2 Jul 2025 15:27:59 -0300, jgg@ziepe.ca wrote:
> >
> > > On Mon, Jun 30, 2025 at 03:25:16PM +0800, lizhe.67@bytedance.com wrote:
> > > > From: Li Zhe <lizhe.67@bytedance.com>
> > > >
> > > > The function vpfn_pages() can help us determine the number of vpfn
> > > > nodes on the vpfn rb tree within a specified range. This allows us
> > > > to avoid searching for each vpfn individually in the function
> > > > vfio_unpin_pages_remote(). This patch batches the vfio_find_vpfn()
> > > > calls in function vfio_unpin_pages_remote().
> > > >
> > > > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> > > > ---
> > > > drivers/vfio/vfio_iommu_type1.c | 10 +++-------
> > > > 1 file changed, 3 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > > > index a2d7abd4f2c2..330fff4fe96d 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -804,16 +804,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> > > > unsigned long pfn, unsigned long npage,
> > > > bool do_accounting)
> > > > {
> > > > - long unlocked = 0, locked = 0;
> > > > + long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
> > > > long i;
> > >
> > > The logic in vpfn_pages() doesn't seem quite right? Don't we want to
> > > count the number of pages within the range that fall within the rb
> > > tree?
> > >
> > > vpfn_pages() looks like it is only counting the number of RB tree
> > > nodes within the range?
> >
> > As I understand it, a vfio_pfn corresponds to a single page, am I right?
>
> It does look that way; it is not what I was expecting, since
> iommufd holds ranges for this job.
>
> So this is OK then
Thank you. It seems that we have reached a consensus on all the comments.
I will send out a v2 patchset soon.
Thanks,
Zhe
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2025-07-04 2:20 UTC | newest]
Thread overview: 24+ messages
2025-06-30 7:25 [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() for large folio lizhe.67
2025-06-30 7:25 ` [PATCH 1/4] vfio/type1: optimize vfio_pin_pages_remote() for large folios lizhe.67
2025-06-30 7:25 ` [PATCH 2/4] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
2025-07-02 18:27 ` Jason Gunthorpe
2025-07-03 4:18 ` lizhe.67
2025-07-03 12:27 ` Jason Gunthorpe
2025-07-04 2:20 ` lizhe.67
2025-06-30 7:25 ` [PATCH 3/4] vfio/type1: introduce a new member has_rsvd for struct vfio_dma lizhe.67
2025-07-01 15:13 ` Dan Carpenter
2025-07-02 3:47 ` lizhe.67
2025-07-02 16:11 ` Dan Carpenter
2025-06-30 7:25 ` [PATCH 4/4] vfio/type1: optimize vfio_unpin_pages_remote() for large folio lizhe.67
2025-07-02 18:28 ` Jason Gunthorpe
2025-07-03 6:12 ` lizhe.67
2025-07-02 8:15 ` [PATCH 0/4] vfio/type1: optimize vfio_pin_pages_remote() and " David Hildenbrand
2025-07-02 9:38 ` lizhe.67
2025-07-02 9:57 ` David Hildenbrand
2025-07-02 12:47 ` Jason Gunthorpe
2025-07-03 4:04 ` lizhe.67
2025-07-03 3:54 ` lizhe.67
2025-07-03 11:06 ` David Hildenbrand
2025-07-03 11:12 ` Jason Gunthorpe
2025-07-03 11:35 ` lizhe.67
2025-07-03 11:34 ` lizhe.67