* [PATCH] vfio/pci: Insert full vma on mmap'd MMIO fault
@ 2024-06-07 3:52 Alex Williamson
2024-06-11 15:23 ` Alex Williamson
0 siblings, 1 reply; 4+ messages in thread
From: Alex Williamson @ 2024-06-07 3:52 UTC (permalink / raw)
To: kvm; +Cc: Alex Williamson, ajones, yan.y.zhao, kevin.tian, jgg, peterx
To improve performance we can try to insert the entire vma on fault.
This accelerates common cases, such as when
the MMIO region is DMA mapped by QEMU. The vfio_iommu_type1 driver will
fault in the entire DMA mapped range through fixup_user_fault().
In synthetic testing, this improves the time required to walk a PCI BAR
mapping from userspace by roughly 1/3rd.
This is likely an interim solution until vmf_insert_pfn_{pmd,pud}() gain
support for pfnmaps.
Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/all/Zl6XdUkt%2FzMMGOLF@yzhao56-desk.sh.intel.com/
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
I'm sending this as a follow-on patch to the v2 series[1] because this
is largely a performance optimization, and one that we may want to
revert when we can introduce huge_fault support. In the meantime, I
can't argue with the 1/3rd performance improvement this provides to
reduce the overall impact of the series below. Without objection I'd
therefore target this for v6.10 as well. Thanks,
Alex
[1]https://lore.kernel.org/all/20240530045236.1005864-1-alex.williamson@redhat.com/
drivers/vfio/pci/vfio_pci_core.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index db31c27bf78b..987c7921affa 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1662,6 +1662,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
struct vm_area_struct *vma = vmf->vma;
struct vfio_pci_core_device *vdev = vma->vm_private_data;
unsigned long pfn, pgoff = vmf->pgoff - vma->vm_pgoff;
+ unsigned long addr = vma->vm_start;
vm_fault_t ret = VM_FAULT_SIGBUS;
pfn = vma_to_pfn(vma);
@@ -1669,11 +1670,25 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
down_read(&vdev->memory_lock);
if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev))
- goto out_disabled;
+ goto out_unlock;
ret = vmf_insert_pfn(vma, vmf->address, pfn + pgoff);
+ if (ret & VM_FAULT_ERROR)
+ goto out_unlock;
-out_disabled:
+ /*
+ * Pre-fault the remainder of the vma, abort further insertions and
+ * suppress error if fault is encountered during pre-fault.
+ */
+ for (; addr < vma->vm_end; addr += PAGE_SIZE, pfn++) {
+ if (addr == vmf->address)
+ continue;
+
+ if (vmf_insert_pfn(vma, addr, pfn) & VM_FAULT_ERROR)
+ break;
+ }
+
+out_unlock:
up_read(&vdev->memory_lock);
return ret;
--
2.45.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] vfio/pci: Insert full vma on mmap'd MMIO fault
2024-06-07 3:52 [PATCH] vfio/pci: Insert full vma on mmap'd MMIO fault Alex Williamson
@ 2024-06-11 15:23 ` Alex Williamson
2024-06-12 10:08 ` Yan Zhao
2024-06-12 12:17 ` Jason Gunthorpe
0 siblings, 2 replies; 4+ messages in thread
From: Alex Williamson @ 2024-06-11 15:23 UTC (permalink / raw)
To: kvm; +Cc: ajones, yan.y.zhao, kevin.tian, jgg, peterx
Any support for this or should we just go with the v2 series[1] by
itself for v6.10? Thanks,
Alex
[1]https://lore.kernel.org/all/20240530045236.1005864-1-alex.williamson@redhat.com/
On Thu, 6 Jun 2024 21:52:07 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:
> To improve performance we can try to insert the entire vma on fault.
> This accelerates common cases, such as when
> the MMIO region is DMA mapped by QEMU. The vfio_iommu_type1 driver will
> fault in the entire DMA mapped range through fixup_user_fault().
>
> In synthetic testing, this improves the time required to walk a PCI BAR
> mapping from userspace by roughly 1/3rd.
>
> This is likely an interim solution until vmf_insert_pfn_{pmd,pud}() gain
> support for pfnmaps.
>
> Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
> Link: https://lore.kernel.org/all/Zl6XdUkt%2FzMMGOLF@yzhao56-desk.sh.intel.com/
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>
> I'm sending this as a follow-on patch to the v2 series[1] because this
> is largely a performance optimization, and one that we may want to
> revert when we can introduce huge_fault support. In the meantime, I
> can't argue with the 1/3rd performance improvement this provides to
> reduce the overall impact of the series below. Without objection I'd
> therefore target this for v6.10 as well. Thanks,
>
> Alex
>
> [1]https://lore.kernel.org/all/20240530045236.1005864-1-alex.williamson@redhat.com/
>
> drivers/vfio/pci/vfio_pci_core.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index db31c27bf78b..987c7921affa 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1662,6 +1662,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> struct vm_area_struct *vma = vmf->vma;
> struct vfio_pci_core_device *vdev = vma->vm_private_data;
> unsigned long pfn, pgoff = vmf->pgoff - vma->vm_pgoff;
> + unsigned long addr = vma->vm_start;
> vm_fault_t ret = VM_FAULT_SIGBUS;
>
> pfn = vma_to_pfn(vma);
> @@ -1669,11 +1670,25 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> down_read(&vdev->memory_lock);
>
> if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev))
> - goto out_disabled;
> + goto out_unlock;
>
> ret = vmf_insert_pfn(vma, vmf->address, pfn + pgoff);
> + if (ret & VM_FAULT_ERROR)
> + goto out_unlock;
>
> -out_disabled:
> + /*
> + * Pre-fault the remainder of the vma, abort further insertions and
> + * suppress error if fault is encountered during pre-fault.
> + */
> + for (; addr < vma->vm_end; addr += PAGE_SIZE, pfn++) {
> + if (addr == vmf->address)
> + continue;
> +
> + if (vmf_insert_pfn(vma, addr, pfn) & VM_FAULT_ERROR)
> + break;
> + }
> +
> +out_unlock:
> up_read(&vdev->memory_lock);
>
> return ret;
* Re: [PATCH] vfio/pci: Insert full vma on mmap'd MMIO fault
2024-06-11 15:23 ` Alex Williamson
@ 2024-06-12 10:08 ` Yan Zhao
2024-06-12 12:17 ` Jason Gunthorpe
1 sibling, 0 replies; 4+ messages in thread
From: Yan Zhao @ 2024-06-12 10:08 UTC (permalink / raw)
To: Alex Williamson; +Cc: kvm, ajones, kevin.tian, jgg, peterx
On Tue, Jun 11, 2024 at 09:23:33AM -0600, Alex Williamson wrote:
>
> Any support for this or should we just go with the v2 series[1] by
> itself for v6.10? Thanks,
Tested with GPU passthrough using a 1G MMIO BAR.
The call count of vfio_pci_mmap_fault() is reduced from 2M to 18,
and cycles spent in vfio_pci_mmap_fault() are reduced from 3400M to 2700M.
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
>
> [1]https://lore.kernel.org/all/20240530045236.1005864-1-alex.williamson@redhat.com/
>
* Re: [PATCH] vfio/pci: Insert full vma on mmap'd MMIO fault
2024-06-11 15:23 ` Alex Williamson
2024-06-12 10:08 ` Yan Zhao
@ 2024-06-12 12:17 ` Jason Gunthorpe
1 sibling, 0 replies; 4+ messages in thread
From: Jason Gunthorpe @ 2024-06-12 12:17 UTC (permalink / raw)
To: Alex Williamson; +Cc: kvm, ajones, yan.y.zhao, kevin.tian, peterx
On Tue, Jun 11, 2024 at 09:23:33AM -0600, Alex Williamson wrote:
>
> Any support for this or should we just go with the v2 series[1] by
> itself for v6.10? Thanks,
I didn't think of a reason not to do this, but I don't know the fault
path especially well.
It sure would be nice to fault all the memory in one shot.
Jason