From: Sean Christopherson <seanjc@google.com>
To: ankita@nvidia.com
Cc: jgg@nvidia.com, maz@kernel.org, oliver.upton@linux.dev,
joey.gouly@arm.com, suzuki.poulose@arm.com,
yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org,
ryan.roberts@arm.com, shahuang@redhat.com,
lpieralisi@kernel.org, david@redhat.com, aniketa@nvidia.com,
cjia@nvidia.com, kwankhede@nvidia.com, kjaju@nvidia.com,
targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
zhiw@nvidia.com, mochs@nvidia.com, udhoke@nvidia.com,
dnigam@nvidia.com, alex.williamson@redhat.com,
sebastianene@google.com, coltonlewis@google.com,
kevin.tian@intel.com, yi.l.liu@intel.com, ardb@kernel.org,
akpm@linux-foundation.org, gshan@redhat.com, linux-mm@kvack.org,
ddutile@redhat.com, tabba@google.com, qperret@google.com,
kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, maobibo@loongson.cn
Subject: Re: [PATCH v6 1/5] KVM: arm64: Block cacheable PFNMAP mapping
Date: Fri, 6 Jun 2025 11:11:56 -0700 [thread overview]
Message-ID: <aEMvbIu530nCqwhG@google.com> (raw)
In-Reply-To: <20250524013943.2832-2-ankita@nvidia.com>
On Sat, May 24, 2025, ankita@nvidia.com wrote:
> From: Ankit Agrawal <ankita@nvidia.com>
>
> Fixes a security bug due to mismatched attributes between S1 and
> S2 mapping.
>
> Currently, it is possible for a region to be cacheable in S1, but mapped
> non cached in S2. This creates a potential issue where the VMM may
> sanitize cacheable memory across VMs using cacheable stores, ensuring
> it is zeroed. However, if KVM subsequently assigns this memory to a VM
> as uncached, the VM could end up accessing stale, non-zeroed data from
> a previous VM, leading to unintended data exposure. This is a security
> risk.
>
> Block such mismatch attributes case by returning EINVAL when userspace
> try to map PFNMAP cacheable. Only allow NORMAL_NC and DEVICE_*.
>
> CC: Oliver Upton <oliver.upton@linux.dev>
> CC: Sean Christopherson <seanjc@google.com>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> ---
> arch/arm64/kvm/mmu.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 2feb6c6b63af..305a0e054f81 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1466,6 +1466,18 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
> return vma->vm_flags & VM_MTE_ALLOWED;
> }
>
> +/*
> + * Determine the memory region cacheability from VMA's pgprot. This
> + * is used to set the stage 2 PTEs.
> + */
> +static unsigned long mapping_type_noncacheable(pgprot_t page_prot)
Return a bool. And given that all the usage queries cachaeable, maybe invert
this predicate?
> +{
> + unsigned long mt = FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(page_prot));
> +
> + return (mt == MT_NORMAL_NC || mt == MT_DEVICE_nGnRnE ||
> + mt == MT_DEVICE_nGnRE);
> +}
No need for the parantheses. And since the values are clumped together, maybe
use a switch statement to let the compiler optimize the checks (though I'm
guessing modern compilers will optimize either way).
E.g.
static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
{
switch (FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(vma->vm_page_prot))) {
case MT_NORMAL_NC:
case MT_DEVICE_nGnRnE:
case MT_DEVICE_nGnRE:
return false;
default:
return true;
}
}
> static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> struct kvm_s2_trans *nested,
> struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -1612,6 +1624,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>
> vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
>
> + if ((vma->vm_flags & VM_PFNMAP) &&
> + !mapping_type_noncacheable(vma->vm_page_prot))
I don't think this is correct, and there's a very real chance this will break
existing setups. PFNMAP memory isn't strictly device memory, and IIUC, KVM
force DEVICE/NORMAL_NC based on kvm_is_device_pfn(), not based on VM_PFNMAP.
if (kvm_is_device_pfn(pfn)) {
/*
* If the page was identified as device early by looking at
* the VMA flags, vma_pagesize is already representing the
* largest quantity we can map. If instead it was mapped
* via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
* and must not be upgraded.
*
* In both cases, we don't let transparent_hugepage_adjust()
* change things at the last minute.
*/
device = true;
}
if (device) {
if (vfio_allow_any_uc)
prot |= KVM_PGTABLE_PROT_NORMAL_NC;
else
prot |= KVM_PGTABLE_PROT_DEVICE;
} else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
(!nested || kvm_s2_trans_executable(nested))) {
prot |= KVM_PGTABLE_PROT_X;
}
which gets morphed into the hardware memtype attributes as:
switch (prot & (KVM_PGTABLE_PROT_DEVICE |
KVM_PGTABLE_PROT_NORMAL_NC)) {
case KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_NORMAL_NC:
return -EINVAL;
case KVM_PGTABLE_PROT_DEVICE:
if (prot & KVM_PGTABLE_PROT_X)
return -EINVAL;
attr = KVM_S2_MEMATTR(pgt, DEVICE_nGnRE);
break;
case KVM_PGTABLE_PROT_NORMAL_NC:
if (prot & KVM_PGTABLE_PROT_X)
return -EINVAL;
attr = KVM_S2_MEMATTR(pgt, NORMAL_NC);
break;
default:
attr = KVM_S2_MEMATTR(pgt, NORMAL);
}
E.g. if the admin hides RAM from the kernel and manages it in userspace via
/dev/mem, this will break (I think).
So I believe what you want is something like this:
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index eeda92330ade..4129ab5ac871 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1466,6 +1466,18 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
return vma->vm_flags & VM_MTE_ALLOWED;
}
+static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
+{
+ switch (FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(vma->vm_page_prot))) {
+ case MT_NORMAL_NC:
+ case MT_DEVICE_nGnRnE:
+ case MT_DEVICE_nGnRE:
+ return false;
+ default:
+ return true;
+ }
+}
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_s2_trans *nested,
struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1473,7 +1485,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
{
int ret = 0;
bool write_fault, writable, force_pte = false;
- bool exec_fault, mte_allowed;
+ bool exec_fault, mte_allowed, is_vma_cacheable;
bool device = false, vfio_allow_any_uc = false;
unsigned long mmu_seq;
phys_addr_t ipa = fault_ipa;
@@ -1615,6 +1627,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
+ is_vma_cacheable = kvm_vma_is_cacheable(vma);
+
/* Don't use the VMA after the unlock -- it may have vanished */
vma = NULL;
@@ -1639,6 +1653,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
if (kvm_is_device_pfn(pfn)) {
+ if (is_vma_cacheable)
+ return -EINVAL;
+
/*
* If the page was identified as device early by looking at
* the VMA flags, vma_pagesize is already representing the
@@ -1722,6 +1739,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
prot |= KVM_PGTABLE_PROT_X;
if (device) {
+ if (is_vma_cacheable) {
+ ret = -EINVAL;
+ goto out;
+ }
+
if (vfio_allow_any_uc)
prot |= KVM_PGTABLE_PROT_NORMAL_NC;
else
next prev parent reply other threads:[~2025-06-06 18:12 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-24 1:39 [PATCH v6 0/5] KVM: arm64: Map GPU device memory as cacheable ankita
2025-05-24 1:39 ` [PATCH v6 1/5] KVM: arm64: Block cacheable PFNMAP mapping ankita
2025-05-26 15:25 ` Jason Gunthorpe
2025-05-27 4:04 ` Ankit Agrawal
2025-06-06 18:11 ` Sean Christopherson [this message]
2025-06-09 12:24 ` Jason Gunthorpe
2025-06-09 14:21 ` Sean Christopherson
2025-05-24 1:39 ` [PATCH v6 2/5] KVM: arm64: New function to determine hardware cache management support ankita
2025-05-27 0:25 ` Jason Gunthorpe
2025-05-24 1:39 ` [PATCH v6 3/5] kvm: arm64: New memslot flag to indicate cacheable mapping ankita
2025-05-27 0:26 ` Jason Gunthorpe
2025-05-27 4:33 ` Ankit Agrawal
2025-06-02 4:42 ` Ankit Agrawal
2025-06-06 17:57 ` Sean Christopherson
2025-06-13 19:38 ` Oliver Upton
2025-06-16 11:37 ` Ankit Agrawal
2025-05-24 1:39 ` [PATCH v6 4/5] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags ankita
2025-06-06 18:14 ` Sean Christopherson
2025-05-24 1:39 ` [PATCH v6 5/5] KVM: arm64: Expose new KVM cap for cacheable PFNMAP ankita
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aEMvbIu530nCqwhG@google.com \
--to=seanjc@google.com \
--cc=acurrid@nvidia.com \
--cc=akpm@linux-foundation.org \
--cc=alex.williamson@redhat.com \
--cc=aniketa@nvidia.com \
--cc=ankita@nvidia.com \
--cc=apopple@nvidia.com \
--cc=ardb@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=cjia@nvidia.com \
--cc=coltonlewis@google.com \
--cc=danw@nvidia.com \
--cc=david@redhat.com \
--cc=ddutile@redhat.com \
--cc=dnigam@nvidia.com \
--cc=gshan@redhat.com \
--cc=jgg@nvidia.com \
--cc=jhubbard@nvidia.com \
--cc=joey.gouly@arm.com \
--cc=kevin.tian@intel.com \
--cc=kjaju@nvidia.com \
--cc=kvmarm@lists.linux.dev \
--cc=kwankhede@nvidia.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lpieralisi@kernel.org \
--cc=maobibo@loongson.cn \
--cc=maz@kernel.org \
--cc=mochs@nvidia.com \
--cc=oliver.upton@linux.dev \
--cc=qperret@google.com \
--cc=ryan.roberts@arm.com \
--cc=sebastianene@google.com \
--cc=shahuang@redhat.com \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=targupta@nvidia.com \
--cc=udhoke@nvidia.com \
--cc=vsethi@nvidia.com \
--cc=will@kernel.org \
--cc=yi.l.liu@intel.com \
--cc=yuzenghui@huawei.com \
--cc=zhiw@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).