From: Sean Christopherson <seanjc@google.com>
To: Yosry Ahmed <yosry@kernel.org>
Cc: Peter Fang <peter.fang@intel.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Nicholas Piggin <npiggin@gmail.com>,
Ritesh Harjani <ritesh.list@gmail.com>,
Michael Ellerman <mpe@ellerman.id.au>,
"Christophe Leroy (CS GROUP)" <chleroy@kernel.org>,
Thomas Gleixner <tglx@kernel.org>, Ingo Molnar <mingo@redhat.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
kvm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 3/3] KVM: Take gpa_t in kvm_vcpu_map[_readonly]()
Date: Wed, 22 Apr 2026 00:27:08 +0000
Message-ID: <aegV3ELp20DlC5Ge@google.com>
In-Reply-To: <CAO9r8zOsHdsMuyjWr3XafjNnMKDci1OOdw8eJHrH-O+jrDH10Q@mail.gmail.com>
On Tue, Apr 21, 2026, Yosry Ahmed wrote:
> On Tue, Apr 21, 2026 at 4:29 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Tue, Apr 21, 2026, Sean Christopherson wrote:
> > > On Tue, Apr 21, 2026, Yosry Ahmed wrote:
> > > > On Tue, Apr 07, 2026 at 05:11:30PM -0700, Peter Fang wrote:
> > > > > Move the conversion from a gpa_t to a gfn_t into kvm_vcpu_map() and
> > > > > kvm_vcpu_map_readonly() so that they take a gpa_t directly, reducing
> > > > > boilerplate at call sites.
> > > > >
> > > > > __kvm_vcpu_map() still takes a gfn_t because guest page mapping is
> > > > > fundamentally GFN-based.
> > > > >
> > > > > No functional change intended.
> > > > >
> > > > > Compile-tested on x86 and ppc, which are the current users of these
> > > > > interfaces.
> > > > >
> > > > > Suggested-by: Yosry Ahmed <yosry@kernel.org>
> > > > > Signed-off-by: Peter Fang <peter.fang@intel.com>
> > > > > ---
> > > >
> > > > I was going to suggest a WARN in kvm_vcpu_map() and
> > > > kvm_vcpu_map_readonly() if the passed GPA is not page-aligned, but Sean
> > > > usually hates my paranoid WARN suggestions.
> > >
> > > Heh, for good reason. Adding such a WARN would be triggered by this code:
> > >
> > > if (!kvm_vcpu_map(vcpu, vmcs12->posted_intr_desc_addr, map)) {
> > > vmx->nested.pi_desc =
> > > (struct pi_desc *)(((void *)map->hva) +
> > > offset_in_page(vmcs12->posted_intr_desc_addr));
> > >
> > > The PI descriptor only needs to be 64-bit aligned, not page-aligned.
To answer your other question: yes, 64-byte, not 64-bit.
> > To elaborate a bit, I'm all for adding WARNs in flows where something bad is all
> > but guaranteed to happen if an assumption is violated, or in APIs where there's
> > a history of goofs and/or subtlety in how the API behaves.
> >
> > What I'm against is adding WARNs because someone could write bad code in the
> > future, or because KVM doesn't do XYZ at this time. Such WARNs usually just add
> > noise, and can even be actively harmful. E.g. in this case, ignoring the PID
> > usage, a reader might look at the WARN and think it's _wrong_ to map a page in
> > order to access a subset of the page, which is just not true.
>
> Yeah, I agree with most/all of your objections to my suggestions; it's
> usually that I don't have enough context to understand how the WARN
> could be harmful (like here), or am just being too paranoid or
> defending against bad code as you mentioned. I was mentioning your
> objections semi-sarcastically and intentionally bringing up the WARN
> in case it's actually useful.
>
> Taking a step back, what I really want to clarify and/or detect misuse
> of is that kvm_vcpu_map() will map exactly one page, the one that the
> GPA lies in. For example, there's nothing protecting against the PID
> address being the last byte of the page,
Well, technically there is:
CC(!kvm_vcpu_is_legal_aligned_gpa(vcpu, vmcs12->posted_intr_desc_addr, 64))))
return -EINVAL;
But I'm pretty sure what you're saying is that "nothing in the common helper code
protects against a stupid caller".
> in which case accessing all of it would be wrong as it spans the mapped page
> boundary. This is difficult to hit if you are passing in a GFN, as it's more
> obvious that KVM is mapping one physical page.
Eh, that just leads to a different class of bugs (and possibly even *worse* bugs).
E.g. caller passes in a GFN, accesses the kernel mapping beyond the page, and
reads/writes arbitrary kernel memory (pfn_valid() case using the direct map), or
hits a !PRESENT #PF (remap case).
> Perhaps we just need to rename the functions (e.g.
> kvm_vcpu_map_page()), or more intrusively pass in a size and do bounds
> checking.
Definitely the latter. Or both I guess, but probably just the latter.
Commit 025dde582bbf ("KVM: Harden guest memory APIs against out-of-bounds accesses")
added that type of hardening for the "slow" APIs, exactly because of the type of
OOB bug you're describing: commit f559b2e9c5c5 ("KVM: nSVM: Ignore nCR3[4:0] when
loading PDPTEs from memory").
Actually, that's useful feedback for patch 3. __kvm_vcpu_map() should
do the GPA=>GFN conversion, not its caller.
Anyways, back to the hardening. We can do it with minimal additional churn. After
patch 3 (passing a @gpa to __kvm_vcpu_map(), not a @gfn), do the below over a few
patches (completely untested). This way the common case of mapping and accessing
an entire page Just Works, and flows like the PI descriptor handling don't have to
manually provide the length (which can also be error prone).
---
arch/x86/kvm/vmx/nested.c | 12 ++++--------
include/linux/kvm_host.h | 22 ++++++++++++++++------
virt/kvm/kvm_main.c | 28 +++++++++++++++++++---------
3 files changed, 39 insertions(+), 23 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ee3ff76a8678..eb75d97c7453 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3453,7 +3453,7 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
map = &vmx->nested.virtual_apic_map;
if (!kvm_vcpu_map(vcpu, vmcs12->virtual_apic_page_addr, map)) {
- vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, pfn_to_hpa(map->pfn));
+ vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, map->hpa);
} else if (nested_cpu_has(vmcs12, CPU_BASED_CR8_LOAD_EXITING) &&
nested_cpu_has(vmcs12, CPU_BASED_CR8_STORE_EXITING) &&
!nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
@@ -3478,12 +3478,9 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
if (nested_cpu_has_posted_intr(vmcs12)) {
map = &vmx->nested.pi_desc_map;
- if (!kvm_vcpu_map(vcpu, vmcs12->posted_intr_desc_addr, map)) {
- vmx->nested.pi_desc =
- (struct pi_desc *)(((void *)map->hva) +
- offset_in_page(vmcs12->posted_intr_desc_addr));
- vmcs_write64(POSTED_INTR_DESC_ADDR,
- pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr));
+ if (!kvm_vcpu_map_ptr(vcpu, vmcs12->posted_intr_desc_addr,
+ vmx->nested.pi_desc, map)) {
+ vmcs_write64(POSTED_INTR_DESC_ADDR, map->hpa);
} else {
/*
* Defer the KVM_INTERNAL_EXIT until KVM tries to
@@ -3491,7 +3488,6 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
* descriptor. (Note that KVM may do this when it
* should not, per the architectural specification.)
*/
- vmx->nested.pi_desc = NULL;
pin_controls_clearbit(vmx, PIN_BASED_POSTED_INTR);
}
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 893a8c76a665..da6f08aa0ac4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -291,10 +291,11 @@ struct kvm_host_map {
*/
struct page *pinned_page;
struct page *page;
- void *hva;
- kvm_pfn_t pfn;
kvm_pfn_t gfn;
bool writable;
+
+ hpa_t hpa;
+ void *hva;
};
/*
@@ -1893,22 +1894,31 @@ static inline hpa_t pfn_to_hpa(kvm_pfn_t pfn)
return (hpa_t)pfn << PAGE_SHIFT;
}
-int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
- bool writable);
+int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, gpa_t len,
+ struct kvm_host_map *map, bool writable);
void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map);
static inline int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa,
struct kvm_host_map *map)
{
- return __kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), map, true);
+ return __kvm_vcpu_map(vcpu, gpa, PAGE_SIZE, map, true);
}
static inline int kvm_vcpu_map_readonly(struct kvm_vcpu *vcpu, gpa_t gpa,
struct kvm_host_map *map)
{
- return __kvm_vcpu_map(vcpu, gpa_to_gfn(gpa), map, false);
+ return __kvm_vcpu_map(vcpu, gpa, PAGE_SIZE, map, false);
}
+#define kvm_vcpu_map_ptr(__vcpu, __gpa, __ptr, __map) \
+({ \
+ int r; \
+ \
+ r = __kvm_vcpu_map(__vcpu, __gpa, sizeof(*(__ptr)), __map, true); \
+ __ptr = !r ? (__map)->hva : NULL; \
+ r; \
+})
+
static inline void kvm_vcpu_map_mark_dirty(struct kvm_vcpu *vcpu,
struct kvm_host_map *map)
{
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9093251beb39..e8d2e98b0068 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3114,9 +3114,10 @@ struct page *__gfn_to_page(struct kvm *kvm, gfn_t gfn, bool write)
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(__gfn_to_page);
-int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
- bool writable)
+int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, gpa_t len,
+ struct kvm_host_map *map, bool writable)
{
+ gfn_t gfn = gpa_to_gfn(gpa);
struct kvm_follow_pfn kfp = {
.slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn),
.gfn = gfn,
@@ -3124,6 +3125,10 @@ int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
.refcounted_page = &map->pinned_page,
.pin = true,
};
+ kvm_pfn_t pfn;
+
+ if (WARN_ON_ONCE(offset_in_page(gpa) + len > PAGE_SIZE))
+ return -EINVAL;
map->pinned_page = NULL;
map->page = NULL;
@@ -3131,20 +3136,25 @@ int __kvm_vcpu_map(struct kvm_vcpu *vcpu, gfn_t gfn, struct kvm_host_map *map,
map->gfn = gfn;
map->writable = writable;
- map->pfn = kvm_follow_pfn(&kfp);
- if (is_error_noslot_pfn(map->pfn))
+ pfn = kvm_follow_pfn(&kfp);
+ if (is_error_noslot_pfn(pfn))
return -EINVAL;
- if (pfn_valid(map->pfn)) {
- map->page = pfn_to_page(map->pfn);
+ map->hpa = pfn_to_hpa(pfn);
+ if (pfn_valid(pfn)) {
+ map->page = pfn_to_page(pfn);
map->hva = kmap(map->page);
#ifdef CONFIG_HAS_IOMEM
} else {
- map->hva = memremap(pfn_to_hpa(map->pfn), PAGE_SIZE, MEMREMAP_WB);
+ map->hva = memremap(map->hpa, PAGE_SIZE, MEMREMAP_WB);
+ if (!map->hva)
+ return -EFAULT;
#endif
}
- return map->hva ? 0 : -EFAULT;
+ map->hpa += offset_in_page(gpa);
+ map->hva += offset_in_page(gpa);
+ return 0;
}
EXPORT_SYMBOL_FOR_KVM_INTERNAL(__kvm_vcpu_map);
@@ -3157,7 +3167,7 @@ void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map)
kunmap(map->page);
#ifdef CONFIG_HAS_IOMEM
else
- memunmap(map->hva);
+ memunmap(PTR_ALIGN_DOWN(map->hva, PAGE_SIZE));
#endif
if (map->writable)
base-commit: d9d61b2f6793deb6b72aa792ae70c09fa26fda37
--
Thread overview: 12+ messages
2026-04-08 0:11 [PATCH v2 0/3] KVM: Fix and clean up kvm_vcpu_map[_readonly]() usages Peter Fang
2026-04-08 0:11 ` [PATCH v2 1/3] KVM: Fix kvm_vcpu_map[_readonly]() function prototypes Peter Fang
2026-04-21 23:05 ` Yosry Ahmed
2026-04-08 0:11 ` [PATCH v2 2/3] KVM: Move page mapping/unmapping APIs in kvm_host.h Peter Fang
2026-04-21 23:06 ` Yosry Ahmed
2026-04-08 0:11 ` [PATCH v2 3/3] KVM: Take gpa_t in kvm_vcpu_map[_readonly]() Peter Fang
2026-04-21 23:08 ` Yosry Ahmed
2026-04-21 23:19 ` Sean Christopherson
2026-04-21 23:25 ` Yosry Ahmed
2026-04-21 23:29 ` Sean Christopherson
2026-04-21 23:41 ` Yosry Ahmed
2026-04-22 0:27 ` Sean Christopherson [this message]