From: Yan Zhao <yan.y.zhao@intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, intel-gfx@lists.freedesktop.org,
linux-kernel@vger.kernel.org, Ben Gardon <bgardon@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
intel-gvt-dev@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry
Date: Fri, 6 Jan 2023 13:56:47 +0800 [thread overview]
Message-ID: <Y7e3fT8/V2NoXAUP@yzhao56-desk.sh.intel.com> (raw)
In-Reply-To: <Y7cLkLUMCy+XLRwm@google.com>
On Thu, Jan 05, 2023 at 05:40:32PM +0000, Sean Christopherson wrote:
> On Thu, Jan 05, 2023, Yan Zhao wrote:
> > On Tue, Jan 03, 2023 at 09:13:54PM +0000, Sean Christopherson wrote:
> > > On Wed, Dec 28, 2022, Yan Zhao wrote:
> > > > On Fri, Dec 23, 2022 at 12:57:15AM +0000, Sean Christopherson wrote:
> > > > > Honor KVM's max allowed page size when determining whether or not a 2MiB
> > > > > GTT shadow page can be created for the guest. Querying KVM's max allowed
> > > > > size is somewhat odd as there's no strict requirement that KVM's memslots
> > > > > and VFIO's mappings are configured with the same gfn=>hva mapping, but
> > > > Without vIOMMU, VFIO's mapping is configured with the same as KVM's
> > > > memslots, i.e. with the same gfn==>HVA mapping
> > >
> > > But that's controlled by userspace, correct?
> >
> > Yes, controlled by QEMU.
>
> ...
>
> > > Strictly speaking, no. E.g. if a 2MiB region is covered with multiple memslots
> > > and the memslots have different properties.
> > >
> > > > If for some reason, KVM maps a 2MiB range in 4K sizes, KVMGT can still map
> > > > it in IOMMU size in 2MiB size as long as the PFNs are continous and the
> > > > whole range is all exposed to guest.
> > >
> > > I agree that practically speaking this will hold true, but if KVMGT wants to honor
> > > KVM's memslots then checking that KVM allows a hugepage is correct. Hrm, but on
> > > the flip side, KVMGT ignores read-only memslot flags, so KVMGT is already ignoring
> > > pieces of KVM's memslots.
> > KVMGT calls dma_map_page() with DMA_BIDIRECTIONAL after checking gvt_pin_guest_page().
> > Though for a read-only memslot, DMA_TO_DEVICE should be used instead
> > (see dma_info_to_prot()),
> > as gvt_pin_guest_page() checks (IOMMU_READ | IOMMU_WRITE) permission for each page,
> > it actually ensures that the pinned GFN is not in a read-only memslot.
> > So, it should be fine.
> >
> > >
> > > I have no objection to KVMGT defining its ABI such that KVMGT is allowed to create
> > > 2MiB so long as (a) the GFN is contiguous according to VFIO, and (b) that the entire
> > > 2MiB range is exposed to the guest.
> > >
> > sorry. I may not put it clearly enough.
> > for a normal device pass-through via VFIO-PCI, VFIO maps IOMMU mappings in this way:
> >
> > (a) fault in PFNs in a GFN range within the same memslot (VFIO saves dma_list, which is
> > the same as memslot list when vIOMMU is not on or not in shadow mode).
> > (b) map continuous PFNs into iommu driver (honour ro attribute and can > 2MiB as long as
> > PFNs are continuous).
> > (c) IOMMU driver decides to map in 2MiB or in 4KiB according to its setting.
> >
> > For KVMGT, gvt_dma_map_page() first calls gvt_pin_guest_page() which
> > (a) calls vfio_pin_pages() to check each GFN is within allowed dma_list with
> > (IOMMU_READ | IOMMU_WRITE) permission and fault-in page.
> > (b) checks PFNs are continuous in 2MiB,
> >
> > Though checking kvm_page_track_max_mapping_level() is also fine, it makes DMA
> > mapping size unnecessarily smaller.
>
> Yeah, I got all that. What I'm trying to say, and why I asked about whether or
> not userspace controls the mappings, is that AFAIK there is nothing in the kernel
> that coordinates mappings between VFIO and KVM. So, very technically, userspace
> could map a 2MiB range contiguous in VFIO but not in KVM, or RW in VFIO but RO in KVM.
>
> I can't imagine there's a real use case for doing so, and arguably there's no
> requirement that KVMGT honor KVM's memslot. But because KVMGT taps into KVM's
> page-tracking, KVMGT _does_ honor KVM's memslots to some extent because KVMGT
> needs to know whether or not a given GFN can be write-protected.
>
> I'm totally fine if KVMGT's ABI is that VFIO is the source of truth for mappings
> and permissions, and that the only requirement for KVM memslots is that GTT page
> tables need to be visible in KVM's memslots. But if that's the ABI, then
> intel_gvt_is_valid_gfn() should be probing VFIO, not KVM (commit cc753fbe1ac4
> ("drm/i915/gvt: validate gfn before set shadow page entry").
>
> In other words, pick either VFIO or KVM. Checking that X is valid according to
> KVM and then mapping X through VFIO is confusing and makes assumptions about how
> userspace configures KVM and VFIO. It works because QEMU always configures KVM
> and VFIO as expected, but IMO it's unnecessarily fragile and again confusing for
> unaware readers because the code is technically flawed.
>
Agreed.
Then after some further thought, I think maybe we can just remove
intel_gvt_is_valid_gfn() in KVMGT, because
(1) both intel_gvt_is_valid_gfn() in emulate_ggtt_mmio_write() and
ppgtt_populate_spt() are not for page track purpose, but to validate bogus
GFN.
(2) gvt_pin_guest_page() with gfn and size can do the validity checking,
which is called in intel_gvt_dma_map_guest_page(). So, we can move the
mapping of scratch page to the error path after intel_gvt_dma_map_guest_page().
As below,
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 54b32ab843eb..5a85936df6d4 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -49,15 +49,6 @@
static bool enable_out_of_sync = false;
static int preallocated_oos_pages = 8192;
-static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn)
-{
- if (!vgpu->attached)
- return false;
-
- /* FIXME: This doesn't properly handle guest entries larger than 4K. */
- return kvm_page_track_is_valid_gfn(vgpu->vfio_device.kvm, gfn);
-}
-
/*
* validate a gm address and related range size,
* translate it to host gm address
@@ -1340,16 +1331,12 @@ static int ppgtt_populate_spt(struct intel_vgpu_ppgtt_spt *spt)
ppgtt_generate_shadow_entry(&se, s, &ge);
ppgtt_set_shadow_entry(spt, &se, i);
} else {
- gfn = ops->get_pfn(&ge);
- if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
+ ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
+ if (ret) {
ops->set_pfn(&se, gvt->gtt.scratch_mfn);
ppgtt_set_shadow_entry(spt, &se, i);
- continue;
- }
-
- ret = ppgtt_populate_shadow_entry(vgpu, spt, i, &ge);
- if (ret)
goto fail;
+ }
}
}
return 0;
@@ -2336,14 +2325,6 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
m.val64 = e.val64;
m.type = e.type;
- /* one PTE update may be issued in multiple writes and the
- * first write may not construct a valid gfn
- */
- if (!intel_gvt_is_valid_gfn(vgpu, gfn)) {
- ops->set_pfn(&m, gvt->gtt.scratch_mfn);
- goto out;
- }
-
ret = intel_gvt_dma_map_guest_page(vgpu, gfn, PAGE_SIZE,
&dma_addr);
if (ret) {
> On a related topic, ppgtt_populate_shadow_entry() should check the validity of the
> gfn. If I'm reading the code correctly, checking only in ppgtt_populate_spt() fails
> to handle the case where the guest creates a bogus mapping when writing an existing
> GTT PT.
Don't get it here. Could you elaborate more?
>
> Combing all my trains of thought, what about this as an end state for this series?
> (completely untested at this point). Get rid of the KVM mapping size checks,
> verify the validity of the entire range being mapped, and add a FIXME to complain
> about using KVM instead of VFIO to determine the validity of ranges.
>
> static bool intel_gvt_is_valid_gfn(struct intel_vgpu *vgpu, unsigned long gfn,
> enum intel_gvt_gtt_type type)
> {
> unsigned long nr_pages;
>
> if (!vgpu->attached)
> return false;
>
> if (type == GTT_TYPE_PPGTT_PTE_64K_ENTRY)
> nr_pages = I915_GTT_PAGE_SIZE_64K >> PAGE_SHIFT;
> else if (type == GTT_TYPE_PPGTT_PTE_2M_ENTRY)
> nr_pages = I915_GTT_PAGE_SIZE_2M >> PAGE_SHIFT;
> else
> nr_pages = 1;
>
> /*
> * FIXME: Probe VFIO, not KVM. VFIO is the source of truth for KVMGT
> * mappings and permissions, KVM's involvement is purely to handle
> * write-tracking of GTT page tables.
> */
> return kvm_page_track_is_contiguous_gfn_range(vgpu->vfio_device.kvm,
> gfn, nr_pages);
> }
>
> static int try_map_2MB_gtt_entry(struct intel_vgpu *vgpu, unsigned long gfn,
> dma_addr_t *dma_addr)
> {
> if (!HAS_PAGE_SIZES(vgpu->gvt->gt->i915, I915_GTT_PAGE_SIZE_2M))
> return 0;
>
> return intel_gvt_dma_map_guest_page(vgpu, gfn,
> I915_GTT_PAGE_SIZE_2M, dma_addr);
> }
>
> static int ppgtt_populate_shadow_entry(struct intel_vgpu *vgpu,
> struct intel_vgpu_ppgtt_spt *spt, unsigned long index,
> struct intel_gvt_gtt_entry *ge)
> {
> const struct intel_gvt_gtt_pte_ops *pte_ops = vgpu->gvt->gtt.pte_ops;
> dma_addr_t dma_addr = vgpu->gvt->gtt.scratch_mfn << PAGE_SHIFT;
> struct intel_gvt_gtt_entry se = *ge;
> unsigned long gfn;
> int ret;
>
> if (!pte_ops->test_present(ge))
> goto set_shadow_entry;
>
> gfn = pte_ops->get_pfn(ge);
> if (!intel_gvt_is_valid_gfn(vgpu, gfn, ge->type))
> goto set_shadow_entry;
As KVMGT only tracks PPGTT page table pages, this check here is not for page
track purpose, but to check bogus GFN.
So, Just leave the bogus GFN check to intel_gvt_dma_map_guest_page() through
VFIO is all right.
On the other hand, for the GFN validity for page track purpose, we can
leave it to kvm_write_track_add_gfn().
Do you think it's ok?
> ...
>
>
> set_shadow_entry:
> pte_ops->set_pfn(&se, dma_addr >> PAGE_SHIFT);
> ppgtt_set_shadow_entry(spt, &se, index);
> return 0;
> }
next prev parent reply other threads:[~2023-01-06 6:20 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-23 0:57 [Intel-gfx] [PATCH 00/27] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 01/27] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page" Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 02/27] KVM: x86/mmu: Factor out helper to get max mapping size of a memslot Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry Sean Christopherson
2022-12-28 5:42 ` Yan Zhao
2023-01-03 21:13 ` Sean Christopherson
2023-01-05 3:07 ` Yan Zhao
2023-01-05 17:40 ` Sean Christopherson
2023-01-06 5:56 ` Yan Zhao [this message]
2023-01-06 23:01 ` Sean Christopherson
2023-01-09 9:58 ` Yan Zhao
2023-01-11 17:55 ` Sean Christopherson
2023-01-19 2:58 ` Zhenyu Wang
2023-01-19 5:26 ` Yan Zhao
2023-02-23 20:41 ` Sean Christopherson
2023-02-24 5:09 ` Yan Zhao
2023-01-12 8:31 ` Yan Zhao
2022-12-23 0:57 ` [Intel-gfx] [PATCH 04/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 05/27] drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn() Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 06/27] drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M GTT Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 07/27] drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 08/27] drm/i915/gvt: Hoist acquisition of vgpu_lock out to kvmgt_page_track_write() Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 09/27] drm/i915/gvt: Protect gfn hash table with dedicated mutex Sean Christopherson
2022-12-28 5:03 ` Yan Zhao
2023-01-03 20:43 ` Sean Christopherson
2023-01-05 0:51 ` Yan Zhao
2022-12-23 0:57 ` [Intel-gfx] [PATCH 10/27] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 11/27] KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 12/27] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 13/27] KVM: x86: Reject memslot MOVE operations if KVMGT is attached Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 14/27] drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 15/27] KVM: x86: Add a new page-track hook to handle memslot deletion Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 16/27] drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region() Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 17/27] KVM: x86: Remove the unused page-track hook track_flush_slot() Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 18/27] KVM: x86/mmu: Move KVM-only page-track declarations to internal header Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 19/27] KVM: x86/mmu: Use page-track notifiers iff there are external users Sean Christopherson
2022-12-28 6:56 ` Yan Zhao
2023-01-04 0:50 ` Sean Christopherson
2023-08-07 12:01 ` Like Xu
2023-08-07 17:19 ` Sean Christopherson
2023-08-09 1:02 ` Yan Zhao
2023-08-09 14:33 ` Sean Christopherson
2023-08-09 23:21 ` Yan Zhao
2023-08-10 3:02 ` Yan Zhao
2023-08-10 15:41 ` Sean Christopherson
2023-08-11 5:57 ` Yan Zhao
2022-12-23 0:57 ` [Intel-gfx] [PATCH 20/27] KVM: x86/mmu: Drop infrastructure for multiple page-track modes Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 21/27] KVM: x86/mmu: Rename page-track APIs to reflect the new reality Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 22/27] KVM: x86/mmu: Assert that correct locks are held for page write-tracking Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 23/27] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 24/27] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 25/27] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 26/27] KVM: x86/mmu: Add page-track API to query if a gfn is valid Sean Christopherson
2022-12-28 7:57 ` Yan Zhao
2023-01-03 21:19 ` Sean Christopherson
2023-01-05 3:12 ` Yan Zhao
2023-01-05 17:53 ` Sean Christopherson
2022-12-23 0:57 ` [Intel-gfx] [PATCH 27/27] drm/i915/gvt: Drop final dependencies on KVM internal details Sean Christopherson
2022-12-23 1:28 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups Patchwork
2022-12-23 9:05 ` [Intel-gfx] [PATCH 00/27] " Yan Zhao
2023-01-04 1:01 ` Sean Christopherson
2023-01-05 3:13 ` Yan Zhao
2022-12-28 5:28 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups (rev2) Patchwork
2023-01-06 6:25 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups (rev3) Patchwork
2023-01-19 9:01 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups (rev4) Patchwork
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y7e3fT8/V2NoXAUP@yzhao56-desk.sh.intel.com \
--to=yan.y.zhao@intel.com \
--cc=bgardon@google.com \
--cc=intel-gfx@lists.freedesktop.org \
--cc=intel-gvt-dev@lists.freedesktop.org \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=pbonzini@redhat.com \
--cc=seanjc@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox