From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 6 Jun 2025 11:11:56 -0700
From: Sean Christopherson
To: ankita@nvidia.com
Cc: jgg@nvidia.com, maz@kernel.org, oliver.upton@linux.dev, joey.gouly@arm.com,
	suzuki.poulose@arm.com, yuzenghui@huawei.com, catalin.marinas@arm.com,
	will@kernel.org, ryan.roberts@arm.com, shahuang@redhat.com,
	lpieralisi@kernel.org, david@redhat.com, aniketa@nvidia.com,
	cjia@nvidia.com, kwankhede@nvidia.com, kjaju@nvidia.com,
	targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
	apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com, zhiw@nvidia.com,
	mochs@nvidia.com, udhoke@nvidia.com, dnigam@nvidia.com,
	alex.williamson@redhat.com, sebastianene@google.com,
	coltonlewis@google.com, kevin.tian@intel.com, yi.l.liu@intel.com,
	ardb@kernel.org, akpm@linux-foundation.org, gshan@redhat.com,
	linux-mm@kvack.org, ddutile@redhat.com, tabba@google.com,
	qperret@google.com, kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, maobibo@loongson.cn
Subject: Re: [PATCH v6 1/5] KVM: arm64: Block cacheable PFNMAP mapping
In-Reply-To: <20250524013943.2832-2-ankita@nvidia.com>
References: <20250524013943.2832-1-ankita@nvidia.com>
	<20250524013943.2832-2-ankita@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

On Sat, May 24, 2025, ankita@nvidia.com wrote:
> From: Ankit Agrawal
> 
> Fixes a security bug due to mismatched attributes between S1 and
> S2 mapping.
> 
> Currently, it is possible for a region to be cacheable in S1, but mapped
> non cached in S2. This creates a potential issue where the VMM may
> sanitize cacheable memory across VMs using cacheable stores, ensuring
> it is zeroed. However, if KVM subsequently assigns this memory to a VM
> as uncached, the VM could end up accessing stale, non-zeroed data from
> a previous VM, leading to unintended data exposure. This is a security
> risk.
> 
> Block such mismatch attributes case by returning EINVAL when userspace
> try to map PFNMAP cacheable. Only allow NORMAL_NC and DEVICE_*.
> 
> CC: Oliver Upton
> CC: Sean Christopherson
> CC: Catalin Marinas
> Suggested-by: Jason Gunthorpe
> Signed-off-by: Ankit Agrawal
> ---
>  arch/arm64/kvm/mmu.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 2feb6c6b63af..305a0e054f81 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1466,6 +1466,18 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
>  	return vma->vm_flags & VM_MTE_ALLOWED;
>  }
>  
> +/*
> + * Determine the memory region cacheability from VMA's pgprot. This
> + * is used to set the stage 2 PTEs.
> + */
> +static unsigned long mapping_type_noncacheable(pgprot_t page_prot)

Return a bool.  And given that all the usage queries cacheable, maybe invert
this predicate?

> +{
> +	unsigned long mt = FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(page_prot));
> +
> +	return (mt == MT_NORMAL_NC || mt == MT_DEVICE_nGnRnE ||
> +		mt == MT_DEVICE_nGnRE);
> +}

No need for the parentheses.  And since the values are clumped together, maybe
use a switch statement to let the compiler optimize the checks (though I'm
guessing modern compilers will optimize either way).  E.g.

	static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
	{
		switch (FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(vma->vm_page_prot))) {
		case MT_NORMAL_NC:
		case MT_DEVICE_nGnRnE:
		case MT_DEVICE_nGnRE:
			return false;
		default:
			return true;
		}
	}

>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			   struct kvm_s2_trans *nested,
>  			   struct kvm_memory_slot *memslot, unsigned long hva,
> @@ -1612,6 +1624,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
>  
> +	if ((vma->vm_flags & VM_PFNMAP) &&
> +	    !mapping_type_noncacheable(vma->vm_page_prot))

I don't think this is correct, and there's a very real chance this will break
existing setups.
PFNMAP memory isn't strictly device memory, and IIUC, KVM forces DEVICE/NORMAL_NC
based on kvm_is_device_pfn(), not based on VM_PFNMAP.

	if (kvm_is_device_pfn(pfn)) {
		/*
		 * If the page was identified as device early by looking at
		 * the VMA flags, vma_pagesize is already representing the
		 * largest quantity we can map.  If instead it was mapped
		 * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
		 * and must not be upgraded.
		 *
		 * In both cases, we don't let transparent_hugepage_adjust()
		 * change things at the last minute.
		 */
		device = true;
	}

	if (device) {
		if (vfio_allow_any_uc)
			prot |= KVM_PGTABLE_PROT_NORMAL_NC;
		else
			prot |= KVM_PGTABLE_PROT_DEVICE;
	} else if (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
		   (!nested || kvm_s2_trans_executable(nested))) {
		prot |= KVM_PGTABLE_PROT_X;
	}

which gets morphed into the hardware memtype attributes as:

	switch (prot & (KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_NORMAL_NC)) {
	case KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_NORMAL_NC:
		return -EINVAL;
	case KVM_PGTABLE_PROT_DEVICE:
		if (prot & KVM_PGTABLE_PROT_X)
			return -EINVAL;
		attr = KVM_S2_MEMATTR(pgt, DEVICE_nGnRE);
		break;
	case KVM_PGTABLE_PROT_NORMAL_NC:
		if (prot & KVM_PGTABLE_PROT_X)
			return -EINVAL;
		attr = KVM_S2_MEMATTR(pgt, NORMAL_NC);
		break;
	default:
		attr = KVM_S2_MEMATTR(pgt, NORMAL);
	}

E.g. if the admin hides RAM from the kernel and manages it in userspace via
/dev/mem, this will break (I think).
So I believe what you want is something like this:

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index eeda92330ade..4129ab5ac871 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1466,6 +1466,18 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
 	return vma->vm_flags & VM_MTE_ALLOWED;
 }
 
+static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
+{
+	switch (FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(vma->vm_page_prot))) {
+	case MT_NORMAL_NC:
+	case MT_DEVICE_nGnRnE:
+	case MT_DEVICE_nGnRE:
+		return false;
+	default:
+		return true;
+	}
+}
+
 static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			   struct kvm_s2_trans *nested,
 			   struct kvm_memory_slot *memslot, unsigned long hva,
@@ -1473,7 +1485,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 {
 	int ret = 0;
 	bool write_fault, writable, force_pte = false;
-	bool exec_fault, mte_allowed;
+	bool exec_fault, mte_allowed, is_vma_cacheable;
 	bool device = false, vfio_allow_any_uc = false;
 	unsigned long mmu_seq;
 	phys_addr_t ipa = fault_ipa;
@@ -1615,6 +1627,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	vfio_allow_any_uc = vma->vm_flags & VM_ALLOW_ANY_UNCACHED;
 
+	is_vma_cacheable = kvm_vma_is_cacheable(vma);
+
 	/* Don't use the VMA after the unlock -- it may have vanished */
 	vma = NULL;
 
@@ -1639,6 +1653,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 
 	if (kvm_is_device_pfn(pfn)) {
+		if (is_vma_cacheable)
+			return -EINVAL;
+
 		/*
 		 * If the page was identified as device early by looking at
 		 * the VMA flags, vma_pagesize is already representing the
@@ -1722,6 +1739,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		prot |= KVM_PGTABLE_PROT_X;
 
 	if (device) {
+		if (is_vma_cacheable) {
+			ret = -EINVAL;
+			goto out;
+		}
+
 		if (vfio_allow_any_uc)
 			prot |= KVM_PGTABLE_PROT_NORMAL_NC;
 		else