* [PATCH v3 0/5] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote()
@ 2025-07-07 6:49 lizhe.67
2025-07-07 6:49 ` [PATCH v3 1/5] mm: introduce num_pages_contiguous() lizhe.67
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: lizhe.67 @ 2025-07-07 6:49 UTC (permalink / raw)
To: alex.williamson, akpm, david, jgg, peterx
Cc: kvm, linux-kernel, linux-mm, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
This patchset is an integration of the two previous patchsets[1][2].
When vfio_pin_pages_remote() is called with a range of addresses that
includes large folios, the function currently performs individual
statistics counting operations for each page. This can lead to significant
performance overheads, especially when dealing with large ranges of pages.
The function vfio_unpin_pages_remote() has a similar issue, where
executing put_pfn() for each pfn incurs considerable overhead.
This patchset primarily optimizes the performance of the relevant functions
by batching the less efficient operations mentioned above.
The first two patches optimize the performance of the function
vfio_pin_pages_remote(), while the remaining patches optimize the
performance of the function vfio_unpin_pages_remote().
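As a rough, illustrative calculation (not a measurement), assume the
16G range used below is backed by 2M large folios and each folio comes
back from GUP as one contiguous run of pages:

	16G / 4K = 4,194,304 per-page accounting operations without batching
	16G / 2M =     8,192 per-run operations with batching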
The performance test results for completing the 16G VFIO MAP/UNMAP DMA,
based on v6.16-rc4 and obtained through the unit test[3] with slight
modifications[4], are as follows.
Base(6.16-rc4):
./vfio-pci-mem-dma-map 0000:03:00.0 16
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.047 s (340.2 GB/s)
VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.280 s (57.2 GB/s)
VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.052 s (310.5 GB/s)
VFIO UNMAP DMA in 0.136 s (117.3 GB/s)
With this patchset:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.027 s (600.7 GB/s)
VFIO UNMAP DMA in 0.045 s (357.0 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.261 s (61.4 GB/s)
VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.031 s (516.4 GB/s)
VFIO UNMAP DMA in 0.045 s (353.9 GB/s)
For large folios, we achieve an over 40% performance improvement for
VFIO MAP DMA and an over 66% performance improvement for VFIO UNMAP
DMA. For small folios, the performance test results show a slight
improvement over the performance before optimization.
[1]: https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@bytedance.com/
[2]: https://lore.kernel.org/all/20250620032344.13382-1-lizhe.67@bytedance.com/#t
[3]: https://github.com/awilliam/tests/blob/vfio-pci-mem-dma-map/vfio-pci-mem-dma-map.c
[4]: https://lore.kernel.org/all/20250610031013.98556-1-lizhe.67@bytedance.com/
Li Zhe (5):
mm: introduce num_pages_contiguous()
vfio/type1: optimize vfio_pin_pages_remote()
vfio/type1: batch vfio_find_vpfn() in function
vfio_unpin_pages_remote()
vfio/type1: introduce a new member has_rsvd for struct vfio_dma
vfio/type1: optimize vfio_unpin_pages_remote()
drivers/vfio/vfio_iommu_type1.c | 111 ++++++++++++++++++++++++++------
include/linux/mm.h | 23 +++++++
2 files changed, 113 insertions(+), 21 deletions(-)
---
Changelogs:
v2->v3:
- Add a "Suggested-by" and a "Reviewed-by" tag.
- Address the compilation errors introduced by patch #1.
- Resolve several variable type issues.
- Add clarification for function num_pages_contiguous().
v1->v2:
- Update the performance test results.
- Extract the function num_pages_contiguous() into a separate commit.
- Remove the phrase 'for large folio' from the patchset title.
v2: https://lore.kernel.org/all/20250704062602.33500-1-lizhe.67@bytedance.com/
v1: https://lore.kernel.org/all/20250630072518.31846-1-lizhe.67@bytedance.com/
--
2.20.1
* [PATCH v3 1/5] mm: introduce num_pages_contiguous()
2025-07-07 6:49 [PATCH v3 0/5] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() lizhe.67
@ 2025-07-07 6:49 ` lizhe.67
2025-07-07 7:27 ` David Hildenbrand
2025-07-07 6:49 ` [PATCH v3 2/5] vfio/type1: optimize vfio_pin_pages_remote() lizhe.67
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: lizhe.67 @ 2025-07-07 6:49 UTC (permalink / raw)
To: alex.williamson, akpm, david, jgg, peterx
Cc: kvm, linux-kernel, linux-mm, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
Function num_pages_contiguous() determines the number of contiguous
pages starting from the first page in the given array of page pointers.
VFIO will utilize this interface to accelerate the VFIO DMA map process.
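As a minimal usage sketch (hypothetical caller, not part of this
patch), a GUP-filled page array can be consumed in contiguous-PFN
runs instead of page by page:

	/* Hypothetical example: walk a pinned page array in runs. */
	static void process_in_runs(struct page **pages, size_t nr_pages)
	{
		size_t done = 0;

		while (done < nr_pages) {
			/* length of the contiguous run starting at pages[done] */
			size_t run = num_pages_contiguous(&pages[done],
							  nr_pages - done);

			/* handle [pages[done], pages[done] + run) as one unit */
			done += run;
		}
	}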
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
include/linux/mm.h | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0ef2ba0c667a..fae82df6d7d7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1761,6 +1761,29 @@ static inline unsigned long page_to_section(const struct page *page)
}
#endif
+/*
+ * num_pages_contiguous() - determine the number of contiguous pages
+ * starting from the first page.
+ *
+ * Pages are contiguous if they represent contiguous PFNs. Depending on
+ * the memory model, this can mean that the addresses of the "struct page"s
+ * are not contiguous.
+ *
+ * @pages: an array of page pointers
+ * @nr_pages: length of the array
+ */
+static inline unsigned long num_pages_contiguous(struct page **pages,
+ size_t nr_pages)
+{
+ size_t i;
+
+ for (i = 1; i < nr_pages; i++)
+ if (pages[i] != nth_page(pages[0], i))
+ break;
+
+ return i;
+}
+
/**
* folio_pfn - Return the Page Frame Number of a folio.
* @folio: The folio.
--
2.20.1
* [PATCH v3 2/5] vfio/type1: optimize vfio_pin_pages_remote()
2025-07-07 6:49 [PATCH v3 0/5] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() lizhe.67
2025-07-07 6:49 ` [PATCH v3 1/5] mm: introduce num_pages_contiguous() lizhe.67
@ 2025-07-07 6:49 ` lizhe.67
2025-07-07 7:29 ` David Hildenbrand
2025-07-07 6:49 ` [PATCH v3 3/5] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: lizhe.67 @ 2025-07-07 6:49 UTC (permalink / raw)
To: alex.williamson, akpm, david, jgg, peterx
Cc: kvm, linux-kernel, linux-mm, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
When vfio_pin_pages_remote() is called with a range of addresses that
includes large folios, the function currently performs individual
statistics counting operations for each page. This can lead to significant
performance overheads, especially when dealing with large ranges of pages.
Batch processing of statistical counting operations can effectively enhance
performance.
In addition, the pages obtained through longterm GUP are neither invalid
nor reserved. Therefore, we can reduce the overhead associated with some
calls to function is_invalid_reserved_pfn().
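Schematically, the change replaces the per-page pattern with a per-run
pattern (simplified sketch; see the diff below for the exact code):

	/* Before: one rb-tree lookup and one accounting step per page. */
	for (i = 0; i < npage; i++, iova += PAGE_SIZE)
		if (!rsvd && !vfio_find_vpfn(dma, iova))
			lock_acct++;

	/* After: one contiguity scan plus one range query per run. */
	nr_pages = num_pages_contiguous(&batch->pages[batch->offset],
					batch->size);
	if (!rsvd)
		lock_acct += nr_pages - vpfn_pages(dma, iova, nr_pages);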
The performance test results for completing the 16G VFIO IOMMU DMA mapping
are as follows.
Base(v6.16-rc4):
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.047 s (340.2 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.280 s (57.2 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.052 s (310.5 GB/s)
With this patch:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO MAP DMA in 0.027 s (602.1 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO MAP DMA in 0.257 s (62.4 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO MAP DMA in 0.031 s (517.4 GB/s)
For large folios, we achieve an over 40% performance improvement.
For small folios, the performance test results indicate a
slight improvement.
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
drivers/vfio/vfio_iommu_type1.c | 83 ++++++++++++++++++++++++++++-----
1 file changed, 71 insertions(+), 12 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1136d7ac6b59..03fce54e1372 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -318,7 +318,13 @@ static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
/*
* Helper Functions for host iova-pfn list
*/
-static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
+
+/*
+ * Find the highest vfio_pfn that overlaps the range
+ * [iova_start, iova_end) in the rb tree.
+ */
+static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma,
+ dma_addr_t iova_start, dma_addr_t iova_end)
{
struct vfio_pfn *vpfn;
struct rb_node *node = dma->pfn_list.rb_node;
@@ -326,9 +332,9 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
while (node) {
vpfn = rb_entry(node, struct vfio_pfn, node);
- if (iova < vpfn->iova)
+ if (iova_end <= vpfn->iova)
node = node->rb_left;
- else if (iova > vpfn->iova)
+ else if (iova_start > vpfn->iova)
node = node->rb_right;
else
return vpfn;
@@ -336,6 +342,11 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
return NULL;
}
+static inline struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
+{
+ return vfio_find_vpfn_range(dma, iova, iova + PAGE_SIZE);
+}
+
static void vfio_link_pfn(struct vfio_dma *dma,
struct vfio_pfn *new)
{
@@ -614,6 +625,39 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
return ret;
}
+
+static long vpfn_pages(struct vfio_dma *dma,
+ dma_addr_t iova_start, long nr_pages)
+{
+ dma_addr_t iova_end = iova_start + (nr_pages << PAGE_SHIFT);
+ struct vfio_pfn *top = vfio_find_vpfn_range(dma, iova_start, iova_end);
+ long ret = 1;
+ struct vfio_pfn *vpfn;
+ struct rb_node *prev;
+ struct rb_node *next;
+
+ if (likely(!top))
+ return 0;
+
+ prev = next = &top->node;
+
+ while ((prev = rb_prev(prev))) {
+ vpfn = rb_entry(prev, struct vfio_pfn, node);
+ if (vpfn->iova < iova_start)
+ break;
+ ret++;
+ }
+
+ while ((next = rb_next(next))) {
+ vpfn = rb_entry(next, struct vfio_pfn, node);
+ if (vpfn->iova >= iova_end)
+ break;
+ ret++;
+ }
+
+ return ret;
+}
+
/*
* Attempt to pin pages. We really don't want to track all the pfns and
* the iommu can only map chunks of consecutive pfns anyway, so get the
@@ -680,32 +724,47 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
* and rsvd here, and therefore continues to use the batch.
*/
while (true) {
+ long nr_pages, acct_pages = 0;
+
if (pfn != *pfn_base + pinned ||
rsvd != is_invalid_reserved_pfn(pfn))
goto out;
+ /*
+ * Using GUP with the FOLL_LONGTERM in
+ * vaddr_get_pfns() will not return invalid
+ * or reserved pages.
+ */
+ nr_pages = num_pages_contiguous(
+ &batch->pages[batch->offset],
+ batch->size);
+ if (!rsvd) {
+ acct_pages = nr_pages;
+ acct_pages -= vpfn_pages(dma, iova, nr_pages);
+ }
+
/*
* Reserved pages aren't counted against the user,
* externally pinned pages are already counted against
* the user.
*/
- if (!rsvd && !vfio_find_vpfn(dma, iova)) {
+ if (acct_pages) {
if (!dma->lock_cap &&
- mm->locked_vm + lock_acct + 1 > limit) {
+ mm->locked_vm + lock_acct + acct_pages > limit) {
pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
__func__, limit << PAGE_SHIFT);
ret = -ENOMEM;
goto unpin_out;
}
- lock_acct++;
+ lock_acct += acct_pages;
}
- pinned++;
- npage--;
- vaddr += PAGE_SIZE;
- iova += PAGE_SIZE;
- batch->offset++;
- batch->size--;
+ pinned += nr_pages;
+ npage -= nr_pages;
+ vaddr += PAGE_SIZE * nr_pages;
+ iova += PAGE_SIZE * nr_pages;
+ batch->offset += nr_pages;
+ batch->size -= nr_pages;
if (!batch->size)
break;
--
2.20.1
* [PATCH v3 3/5] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote()
2025-07-07 6:49 [PATCH v3 0/5] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() lizhe.67
2025-07-07 6:49 ` [PATCH v3 1/5] mm: introduce num_pages_contiguous() lizhe.67
2025-07-07 6:49 ` [PATCH v3 2/5] vfio/type1: optimize vfio_pin_pages_remote() lizhe.67
@ 2025-07-07 6:49 ` lizhe.67
2025-07-07 6:49 ` [PATCH v3 4/5] vfio/type1: introduce a new member has_rsvd for struct vfio_dma lizhe.67
2025-07-07 6:49 ` [PATCH v3 5/5] vfio/type1: optimize vfio_unpin_pages_remote() lizhe.67
4 siblings, 0 replies; 10+ messages in thread
From: lizhe.67 @ 2025-07-07 6:49 UTC (permalink / raw)
To: alex.williamson, akpm, david, jgg, peterx
Cc: kvm, linux-kernel, linux-mm, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
The function vpfn_pages() determines the number of vpfn nodes in the
vpfn rb tree within a specified range. This allows us to avoid
searching for each vpfn individually in vfio_unpin_pages_remote().
This patch batches the vfio_find_vpfn() calls in
vfio_unpin_pages_remote().
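For example (illustrative values, assuming 4K pages): if dma->pfn_list
holds vpfns at iovas 0x1000, 0x3000 and 0x9000, then

	vpfn_pages(dma, 0x0, 8);	/* returns 2: the vpfns at 0x1000 and
					 * 0x3000 fall inside [0x0, 0x8000) */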
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
drivers/vfio/vfio_iommu_type1.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 03fce54e1372..b770eb1c4e07 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -794,16 +794,12 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
unsigned long pfn, unsigned long npage,
bool do_accounting)
{
- long unlocked = 0, locked = 0;
+ long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
long i;
- for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
- if (put_pfn(pfn++, dma->prot)) {
+ for (i = 0; i < npage; i++)
+ if (put_pfn(pfn++, dma->prot))
unlocked++;
- if (vfio_find_vpfn(dma, iova))
- locked++;
- }
- }
if (do_accounting)
vfio_lock_acct(dma, locked - unlocked, true);
--
2.20.1
* [PATCH v3 4/5] vfio/type1: introduce a new member has_rsvd for struct vfio_dma
2025-07-07 6:49 [PATCH v3 0/5] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() lizhe.67
` (2 preceding siblings ...)
2025-07-07 6:49 ` [PATCH v3 3/5] vfio/type1: batch vfio_find_vpfn() in function vfio_unpin_pages_remote() lizhe.67
@ 2025-07-07 6:49 ` lizhe.67
2025-07-07 6:49 ` [PATCH v3 5/5] vfio/type1: optimize vfio_unpin_pages_remote() lizhe.67
4 siblings, 0 replies; 10+ messages in thread
From: lizhe.67 @ 2025-07-07 6:49 UTC (permalink / raw)
To: alex.williamson, akpm, david, jgg, peterx
Cc: kvm, linux-kernel, linux-mm, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
Introduce a new member, has_rsvd, for struct vfio_dma. It indicates
whether the region represented by this vfio_dma contains any reserved
or invalid pfns: if true, at least one pfn in the region is either
reserved or invalid.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
drivers/vfio/vfio_iommu_type1.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b770eb1c4e07..13c5667d431c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -92,6 +92,7 @@ struct vfio_dma {
bool iommu_mapped;
bool lock_cap; /* capable(CAP_IPC_LOCK) */
bool vaddr_invalid;
+ bool has_rsvd; /* has 1 or more rsvd pfns */
struct task_struct *task;
struct rb_root pfn_list; /* Ex-user pinned pfn list */
unsigned long *bitmap;
@@ -774,6 +775,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
}
out:
+ dma->has_rsvd |= rsvd;
ret = vfio_lock_acct(dma, lock_acct, false);
unpin_out:
--
2.20.1
* [PATCH v3 5/5] vfio/type1: optimize vfio_unpin_pages_remote()
2025-07-07 6:49 [PATCH v3 0/5] vfio/type1: optimize vfio_pin_pages_remote() and vfio_unpin_pages_remote() lizhe.67
` (3 preceding siblings ...)
2025-07-07 6:49 ` [PATCH v3 4/5] vfio/type1: introduce a new member has_rsvd for struct vfio_dma lizhe.67
@ 2025-07-07 6:49 ` lizhe.67
2025-07-07 7:30 ` David Hildenbrand
4 siblings, 1 reply; 10+ messages in thread
From: lizhe.67 @ 2025-07-07 6:49 UTC (permalink / raw)
To: alex.williamson, akpm, david, jgg, peterx
Cc: kvm, linux-kernel, linux-mm, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
When vfio_unpin_pages_remote() is called with a range of addresses that
includes large folios, the function currently performs individual
put_pfn() operations for each page. This can lead to significant
performance overheads, especially when dealing with large ranges of pages.
It would be very rare for reserved and non-reserved PFNs to be mixed
within the same range, so this patch uses the has_rsvd member
introduced in the previous patch to determine whether batched put_pfn()
operations can be performed. Moreover, compared to put_pfn(),
unpin_user_page_range_dirty_lock() is capable of handling large folio
scenarios more efficiently.
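Schematically, for a vfio_dma known to contain no reserved or invalid
pfns (simplified sketch; see the diff below for the exact code):

	/* Per-pfn path: npage individual refcount operations. */
	for (i = 0; i < npage; i++)
		if (put_pfn(pfn + i, dma->prot))
			unlocked++;

	/* Batched path: unpin (and optionally dirty) folio by folio. */
	unpin_user_page_range_dirty_lock(pfn_to_page(pfn), npage,
					 dma->prot & IOMMU_WRITE);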
The performance test results for completing the 16G VFIO IOMMU DMA
unmapping are as follows.
Base(v6.16-rc4):
./vfio-pci-mem-dma-map 0000:03:00.0 16
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO UNMAP DMA in 0.136 s (117.3 GB/s)
With this patchset:
------- AVERAGE (MADV_HUGEPAGE) --------
VFIO UNMAP DMA in 0.045 s (357.0 GB/s)
------- AVERAGE (MAP_POPULATE) --------
VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
------- AVERAGE (HUGETLBFS) --------
VFIO UNMAP DMA in 0.045 s (353.9 GB/s)
For large folios, we achieve an over 66% performance improvement
for VFIO UNMAP DMA. For small folios, the performance test results
show a slight improvement.
Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
drivers/vfio/vfio_iommu_type1.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 13c5667d431c..208576bd5ac3 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -792,17 +792,29 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
return pinned;
}
+static inline void put_valid_unreserved_pfns(unsigned long start_pfn,
+ unsigned long npage, int prot)
+{
+ unpin_user_page_range_dirty_lock(pfn_to_page(start_pfn), npage,
+ prot & IOMMU_WRITE);
+}
+
static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
unsigned long pfn, unsigned long npage,
bool do_accounting)
{
long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
- long i;
- for (i = 0; i < npage; i++)
- if (put_pfn(pfn++, dma->prot))
- unlocked++;
+ if (dma->has_rsvd) {
+ unsigned long i;
+ for (i = 0; i < npage; i++)
+ if (put_pfn(pfn++, dma->prot))
+ unlocked++;
+ } else {
+ put_valid_unreserved_pfns(pfn, npage, dma->prot);
+ unlocked = npage;
+ }
if (do_accounting)
vfio_lock_acct(dma, locked - unlocked, true);
--
2.20.1
* Re: [PATCH v3 1/5] mm: introduce num_pages_contiguous()
2025-07-07 6:49 ` [PATCH v3 1/5] mm: introduce num_pages_contiguous() lizhe.67
@ 2025-07-07 7:27 ` David Hildenbrand
0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-07-07 7:27 UTC (permalink / raw)
To: lizhe.67, alex.williamson, akpm, jgg, peterx; +Cc: kvm, linux-kernel, linux-mm
On 07.07.25 08:49, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
>
> Function num_pages_contiguous() determines the number of contiguous
> pages starting from the first page in the given array of page pointers.
> VFIO will utilize this interface to accelerate the VFIO DMA map process.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v3 2/5] vfio/type1: optimize vfio_pin_pages_remote()
2025-07-07 6:49 ` [PATCH v3 2/5] vfio/type1: optimize vfio_pin_pages_remote() lizhe.67
@ 2025-07-07 7:29 ` David Hildenbrand
2025-07-08 3:00 ` lizhe.67
0 siblings, 1 reply; 10+ messages in thread
From: David Hildenbrand @ 2025-07-07 7:29 UTC (permalink / raw)
To: lizhe.67, alex.williamson, akpm, jgg, peterx; +Cc: kvm, linux-kernel, linux-mm
On 07.07.25 08:49, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
>
> When vfio_pin_pages_remote() is called with a range of addresses that
> includes large folios, the function currently performs individual
> statistics counting operations for each page. This can lead to significant
> performance overheads, especially when dealing with large ranges of pages.
> Batch processing of statistical counting operations can effectively enhance
> performance.
>
> In addition, the pages obtained through longterm GUP are neither invalid
> nor reserved. Therefore, we can reduce the overhead associated with some
> calls to function is_invalid_reserved_pfn().
>
> The performance test results for completing the 16G VFIO IOMMU DMA mapping
> are as follows.
>
> Base(v6.16-rc4):
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO MAP DMA in 0.047 s (340.2 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO MAP DMA in 0.280 s (57.2 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO MAP DMA in 0.052 s (310.5 GB/s)
>
> With this patch:
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO MAP DMA in 0.027 s (602.1 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO MAP DMA in 0.257 s (62.4 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO MAP DMA in 0.031 s (517.4 GB/s)
>
> For large folios, we achieve an over 40% performance improvement.
> For small folios, the performance test results indicate a
> slight improvement.
>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> Co-developed-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> drivers/vfio/vfio_iommu_type1.c | 83 ++++++++++++++++++++++++++++-----
> 1 file changed, 71 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 1136d7ac6b59..03fce54e1372 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -318,7 +318,13 @@ static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
> /*
> * Helper Functions for host iova-pfn list
> */
> -static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> +
> +/*
> + * Find the highest vfio_pfn that overlaps the range
> + * [iova_start, iova_end) in the rb tree.
> + */
> +static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma,
> + dma_addr_t iova_start, dma_addr_t iova_end)
> {
> struct vfio_pfn *vpfn;
> struct rb_node *node = dma->pfn_list.rb_node;
> @@ -326,9 +332,9 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> while (node) {
> vpfn = rb_entry(node, struct vfio_pfn, node);
>
> - if (iova < vpfn->iova)
> + if (iova_end <= vpfn->iova)
> node = node->rb_left;
> - else if (iova > vpfn->iova)
> + else if (iova_start > vpfn->iova)
> node = node->rb_right;
> else
> return vpfn;
> @@ -336,6 +342,11 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> return NULL;
> }
>
> +static inline struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> +{
> + return vfio_find_vpfn_range(dma, iova, iova + PAGE_SIZE);
> +}
> +
> static void vfio_link_pfn(struct vfio_dma *dma,
> struct vfio_pfn *new)
> {
> @@ -614,6 +625,39 @@ static long vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
> return ret;
> }
>
> +
> +static long vpfn_pages(struct vfio_dma *dma,
> + dma_addr_t iova_start, long nr_pages)
> +{
> + dma_addr_t iova_end = iova_start + (nr_pages << PAGE_SHIFT);
> + struct vfio_pfn *top = vfio_find_vpfn_range(dma, iova_start, iova_end);
> + long ret = 1;
> + struct vfio_pfn *vpfn;
> + struct rb_node *prev;
> + struct rb_node *next;
> +
> + if (likely(!top))
> + return 0;
> +
> + prev = next = &top->node;
> +
> + while ((prev = rb_prev(prev))) {
> + vpfn = rb_entry(prev, struct vfio_pfn, node);
> + if (vpfn->iova < iova_start)
> + break;
> + ret++;
> + }
> +
> + while ((next = rb_next(next))) {
> + vpfn = rb_entry(next, struct vfio_pfn, node);
> + if (vpfn->iova >= iova_end)
> + break;
> + ret++;
> + }
> +
> + return ret;
> +}
> +
> /*
> * Attempt to pin pages. We really don't want to track all the pfns and
> * the iommu can only map chunks of consecutive pfns anyway, so get the
> @@ -680,32 +724,47 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> * and rsvd here, and therefore continues to use the batch.
> */
> while (true) {
> + long nr_pages, acct_pages = 0;
> +
> if (pfn != *pfn_base + pinned ||
> rsvd != is_invalid_reserved_pfn(pfn))
> goto out;
>
> + /*
> + * Using GUP with the FOLL_LONGTERM in
> + * vaddr_get_pfns() will not return invalid
> + * or reserved pages.
> + */
> + nr_pages = num_pages_contiguous(
> + &batch->pages[batch->offset],
> + batch->size);
> + if (!rsvd) {
> + acct_pages = nr_pages;
> + acct_pages -= vpfn_pages(dma, iova, nr_pages);
> + }
> +
> /*
> * Reserved pages aren't counted against the user,
> * externally pinned pages are already counted against
> * the user.
> */
> - if (!rsvd && !vfio_find_vpfn(dma, iova)) {
> + if (acct_pages) {
> if (!dma->lock_cap &&
> - mm->locked_vm + lock_acct + 1 > limit) {
> + mm->locked_vm + lock_acct + acct_pages > limit) {
Weird indentation change.
It should be
if (!dma->lock_cap &&
    mm->locked_vm + lock_acct + acct_pages > limit) {
    ^ aligned here
Please don't drop acks/rbs already given in previous submissions.
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v3 5/5] vfio/type1: optimize vfio_unpin_pages_remote()
2025-07-07 6:49 ` [PATCH v3 5/5] vfio/type1: optimize vfio_unpin_pages_remote() lizhe.67
@ 2025-07-07 7:30 ` David Hildenbrand
0 siblings, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2025-07-07 7:30 UTC (permalink / raw)
To: lizhe.67, alex.williamson, akpm, jgg, peterx; +Cc: kvm, linux-kernel, linux-mm
On 07.07.25 08:49, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
>
> When vfio_unpin_pages_remote() is called with a range of addresses that
> includes large folios, the function currently performs individual
> put_pfn() operations for each page. This can lead to significant
> performance overheads, especially when dealing with large ranges of pages.
>
> It would be very rare for reserved and non-reserved PFNs to be mixed
> within the same range, so this patch uses the has_rsvd member
> introduced in the previous patch to determine whether batched put_pfn()
> operations can be performed. Moreover, compared to put_pfn(),
> unpin_user_page_range_dirty_lock() is capable of handling large folio
> scenarios more efficiently.
>
> The performance test results for completing the 16G VFIO IOMMU DMA
> unmapping are as follows.
>
> Base(v6.16-rc4):
> ./vfio-pci-mem-dma-map 0000:03:00.0 16
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO UNMAP DMA in 0.136 s (117.3 GB/s)
>
> With this patchset:
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO UNMAP DMA in 0.045 s (357.0 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO UNMAP DMA in 0.045 s (353.9 GB/s)
>
> For large folios, we achieve an over 66% performance improvement
> for VFIO UNMAP DMA. For small folios, the performance test results
> show a slight improvement.
>
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
> drivers/vfio/vfio_iommu_type1.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 13c5667d431c..208576bd5ac3 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -792,17 +792,29 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> return pinned;
> }
>
> +static inline void put_valid_unreserved_pfns(unsigned long start_pfn,
> + unsigned long npage, int prot)
> +{
> + unpin_user_page_range_dirty_lock(pfn_to_page(start_pfn), npage,
> + prot & IOMMU_WRITE);
> +}
> +
> static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> unsigned long pfn, unsigned long npage,
> bool do_accounting)
> {
> long unlocked = 0, locked = vpfn_pages(dma, iova, npage);
> - long i;
>
> - for (i = 0; i < npage; i++)
> - if (put_pfn(pfn++, dma->prot))
> - unlocked++;
> + if (dma->has_rsvd) {
> + unsigned long i;
>
> + for (i = 0; i < npage; i++)
> + if (put_pfn(pfn++, dma->prot))
> + unlocked++;
> + } else {
> + put_valid_unreserved_pfns(pfn, npage, dma->prot);
> + unlocked = npage;
> + }
> if (do_accounting)
> vfio_lock_acct(dma, locked - unlocked, true);
>
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v3 2/5] vfio/type1: optimize vfio_pin_pages_remote()
2025-07-07 7:29 ` David Hildenbrand
@ 2025-07-08 3:00 ` lizhe.67
0 siblings, 0 replies; 10+ messages in thread
From: lizhe.67 @ 2025-07-08 3:00 UTC (permalink / raw)
To: david
Cc: akpm, alex.williamson, jgg, kvm, linux-kernel, linux-mm, lizhe.67,
peterx
On Mon, 7 Jul 2025 09:29:30 +0200, david@redhat.com wrote:
> > @@ -680,32 +724,47 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> > * and rsvd here, and therefore continues to use the batch.
> > */
> > while (true) {
> > + long nr_pages, acct_pages = 0;
> > +
> > if (pfn != *pfn_base + pinned ||
> > rsvd != is_invalid_reserved_pfn(pfn))
> > goto out;
> >
> > + /*
> > + * Using GUP with the FOLL_LONGTERM in
> > + * vaddr_get_pfns() will not return invalid
> > + * or reserved pages.
> > + */
> > + nr_pages = num_pages_contiguous(
> > + &batch->pages[batch->offset],
> > + batch->size);
> > + if (!rsvd) {
> > + acct_pages = nr_pages;
> > + acct_pages -= vpfn_pages(dma, iova, nr_pages);
> > + }
> > +
> > /*
> > * Reserved pages aren't counted against the user,
> > * externally pinned pages are already counted against
> > * the user.
> > */
> > - if (!rsvd && !vfio_find_vpfn(dma, iova)) {
> > + if (acct_pages) {
> > if (!dma->lock_cap &&
> > - mm->locked_vm + lock_acct + 1 > limit) {
> > + mm->locked_vm + lock_acct + acct_pages > limit) {
>
> Weird indentation change.
>
> It should be
>
> if (!dma->lock_cap &&
> mm->locked_vm + lock_acct + acct_pages > limit) {
>
> ^ aligned here
Thank you! We need the following fixup code.
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 03fce54e1372..6909275e46c2 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -750,7 +750,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 			 */
 			if (acct_pages) {
 				if (!dma->lock_cap &&
-					mm->locked_vm + lock_acct + acct_pages > limit) {
+				    mm->locked_vm + lock_acct + acct_pages > limit) {
 					pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
 						__func__, limit << PAGE_SHIFT);
 					ret = -ENOMEM;
> Please don't drop acks/rbs already given in previous submissions.
Sorry for any inconvenience this may have caused.
> Acked-by: David Hildenbrand <david@redhat.com>
Thank you once again for your review.
Thanks,
Zhe