From: Paolo Bonzini
To: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou, Sean Christopherson, Alexander Viro, Christian Brauner, "Matthew Wilcox (Oracle)", Andrew Morton
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Xiaoyao Li, Xu Yilun, Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, David Matlack, Yu Zhang, Isaku Yamahata, Mickaël Salaün, Vlastimil Babka, Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata, "Kirill A. Shutemov"
Subject: [PATCH 18/34] KVM: x86/mmu: Handle page fault for private memory
Date: Sun, 5 Nov 2023 17:30:21 +0100
Message-ID: <20231105163040.14904-19-pbonzini@redhat.com>
In-Reply-To: <20231105163040.14904-1-pbonzini@redhat.com>
References: <20231105163040.14904-1-pbonzini@redhat.com>

From: Chao Peng

Add support for resolving page faults on guest private memory for VMs
that differentiate between "shared" and "private" memory. For such VMs,
KVM_MEM_PRIVATE memslots can include both fd-based private memory and
hva-based shared memory, and KVM needs to map in the "correct" variant,
i.e. KVM needs to map the gfn shared/private as appropriate based on the
current state of the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE flag.

For AMD's SEV-SNP and Intel's TDX, the guest effectively gets to request
shared vs. private via a bit in the guest page tables, i.e. what the guest
wants may conflict with the current memory attributes. To support such
"implicit" conversion requests, exit with KVM_EXIT_MEMORY_FAULT to forward
the request to userspace. Add a new flag for memory faults,
KVM_MEMORY_EXIT_FLAG_PRIVATE, to communicate whether the guest wants to
map memory as shared vs. private.

Like KVM_MEMORY_ATTRIBUTE_PRIVATE, use bit 3 for flagging private memory
so that KVM can use bits 0-2 for capturing RWX behavior if/when userspace
needs such information, e.g. a likely use of KVM_EXIT_MEMORY_FAULT is to
exit on missing mappings when handling guest page fault VM-Exits. In that
case, userspace will want to know RWX information in order to
correctly/precisely resolve the fault.

Note, private memory *must* be backed by guest_memfd, i.e. shared mappings
always come from the host userspace page tables, and private mappings
always come from a guest_memfd instance.
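The shared/private routing and the implicit-conversion exit described above can be modeled in a few lines of plain C. This is a minimal sketch under stated assumptions, not KVM code: `route_fault()`, `requested_attributes()`, `enum fault_outcome` and `struct memory_fault_exit` are hypothetical names; the two bit values are the ones used in this patch (bit 3 for both flags); and the other paths that also produce KVM_EXIT_MEMORY_FAULT (a slot that cannot be private, a failed guest_memfd lookup) are deliberately left out. The userspace half assumes a VMM answers the exit by asking KVM to apply the matching attribute to the reported range, presumably via the KVM_SET_MEMORY_ATTRIBUTES ioctl from elsewhere in this series, which this patch does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Both flags use bit 3, as in this series. */
#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)

/* Hypothetical result of resolving one guest fault. */
enum fault_outcome {
	MAP_SHARED_FROM_HVA,   /* host userspace page tables */
	MAP_PRIVATE_FROM_GMEM, /* guest_memfd instance */
	EXIT_MEMORY_FAULT,     /* KVM_EXIT_MEMORY_FAULT to userspace */
};

/* Hypothetical mirror of the kvm_run.memory_fault fields. */
struct memory_fault_exit {
	uint64_t flags;
	uint64_t gpa;
	uint64_t size;
};

/*
 * Model of the check added to __kvm_faultin_pfn(): the access type the
 * guest asked for (via its page tables) must match the gfn's current
 * attribute, otherwise the fault is forwarded to userspace.
 */
static enum fault_outcome route_fault(bool guest_wants_private,
				      uint64_t gfn_attributes, uint64_t gpa,
				      struct memory_fault_exit *out)
{
	bool attr_private = (gfn_attributes & KVM_MEMORY_ATTRIBUTE_PRIVATE) != 0;

	assert(out != NULL); /* sketch-only sanity check */

	if (guest_wants_private != attr_private) {
		out->gpa = gpa;
		out->size = 4096; /* PAGE_SIZE on x86 */
		out->flags = guest_wants_private ? KVM_MEMORY_EXIT_FLAG_PRIVATE : 0;
		return EXIT_MEMORY_FAULT;
	}
	return guest_wants_private ? MAP_PRIVATE_FROM_GMEM : MAP_SHARED_FROM_HVA;
}

/*
 * Userspace side: the attribute value a VMM would ask KVM to apply to
 * [gpa, gpa + size) in order to honor the implicit conversion.
 */
static uint64_t requested_attributes(const struct memory_fault_exit *out)
{
	return (out->flags & KVM_MEMORY_EXIT_FLAG_PRIVATE) ?
	       KVM_MEMORY_ATTRIBUTE_PRIVATE : 0;
}
```

With the gfn still shared and the guest asking for a private access, `route_fault(true, 0, gpa, &out)` returns EXIT_MEMORY_FAULT with KVM_MEMORY_EXIT_FLAG_PRIVATE set in `out.flags`, and `requested_attributes(&out)` then yields KVM_MEMORY_ATTRIBUTE_PRIVATE. Keeping the exit flag on the same bit as the attribute means the translation is a plain mask test.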

Co-developed-by: Yu Zhang
Signed-off-by: Yu Zhang
Signed-off-by: Chao Peng
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Reviewed-by: Fuad Tabba
Tested-by: Fuad Tabba
Message-Id: <20231027182217.3615211-21-seanjc@google.com>
Signed-off-by: Paolo Bonzini
---
 Documentation/virt/kvm/api.rst  |   8 ++-
 arch/x86/kvm/mmu/mmu.c          | 101 ++++++++++++++++++++++++++++++--
 arch/x86/kvm/mmu/mmu_internal.h |   1 +
 include/linux/kvm_host.h        |   8 ++-
 include/uapi/linux/kvm.h        |   1 +
 5 files changed, 110 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 6d681f45969e..4a9a291380ad 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6953,6 +6953,7 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
 
 		/* KVM_EXIT_MEMORY_FAULT */
 		struct {
+  #define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
 			__u64 flags;
 			__u64 gpa;
 			__u64 size;
@@ -6961,8 +6962,11 @@ spec refer, https://github.com/riscv/riscv-sbi-doc.
 KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
 could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
 guest physical address range [gpa, gpa + size) of the fault. The 'flags' field
-describes properties of the faulting access that are likely pertinent.
-Currently, no flags are defined.
+describes properties of the faulting access that are likely pertinent:
+
+ - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
+   on a private memory access. When clear, indicates the fault occurred on a
+   shared access.
 
 Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it
 accompanies a return code of '-1', not '0'! errno will always be set to EFAULT
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f5c6b0643645..754a5aaebee5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3147,9 +3147,9 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
 	return level;
 }
 
-int kvm_mmu_max_mapping_level(struct kvm *kvm,
-			      const struct kvm_memory_slot *slot, gfn_t gfn,
-			      int max_level)
+static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
+				       const struct kvm_memory_slot *slot,
+				       gfn_t gfn, int max_level, bool is_private)
 {
 	struct kvm_lpage_info *linfo;
 	int host_level;
@@ -3161,6 +3161,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			break;
 	}
 
+	if (is_private)
+		return max_level;
+
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
@@ -3168,6 +3171,16 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 	return min(host_level, max_level);
 }
 
+int kvm_mmu_max_mapping_level(struct kvm *kvm,
+			      const struct kvm_memory_slot *slot, gfn_t gfn,
+			      int max_level)
+{
+	bool is_private = kvm_slot_can_be_private(slot) &&
+			  kvm_mem_is_private(kvm, gfn);
+
+	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, max_level, is_private);
+}
+
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -3188,8 +3201,9 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * Enforce the iTLB multihit workaround after capturing the requested
 	 * level, which will be used to do precise, accurate accounting.
 	 */
-	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-						     fault->gfn, fault->max_level);
+	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
+						       fault->gfn, fault->max_level,
+						       fault->is_private);
 	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
 		return;
 
@@ -4269,6 +4283,55 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true, NULL);
 }
 
+static inline u8 kvm_max_level_for_order(int order)
+{
+	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+		return PG_LEVEL_1G;
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
+					      struct kvm_page_fault *fault)
+{
+	kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
+				      PAGE_SIZE, fault->write, fault->exec,
+				      fault->is_private);
+}
+
+static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
+				   struct kvm_page_fault *fault)
+{
+	int max_order, r;
+
+	if (!kvm_slot_can_be_private(fault->slot)) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return -EFAULT;
+	}
+
+	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
+			     &max_order);
+	if (r) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return r;
+	}
+
+	fault->max_level = min(kvm_max_level_for_order(max_order),
+			       fault->max_level);
+	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
+
+	return RET_PF_CONTINUE;
+}
+
 static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -4301,6 +4364,14 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return RET_PF_EMULATE;
 	}
 
+	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
+		return -EFAULT;
+	}
+
+	if (fault->is_private)
+		return kvm_faultin_pfn_private(vcpu, fault);
+
 	async = false;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, false, &async,
 					  fault->write, &fault->map_writable,
@@ -7188,6 +7259,26 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
 }
 
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
+bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
+					struct kvm_gfn_range *range)
+{
+	/*
+	 * Zap SPTEs even if the slot can't be mapped PRIVATE. KVM x86 only
+	 * supports KVM_MEMORY_ATTRIBUTE_PRIVATE, and so it *seems* like KVM
+	 * can simply ignore such slots. But if userspace is making memory
+	 * PRIVATE, then KVM must prevent the guest from accessing the memory
+	 * as shared. And if userspace is making memory SHARED and this point
+	 * is reached, then at least one page within the range was previously
+	 * PRIVATE, i.e. the slot's possible hugepage ranges are changing.
+	 * Zapping SPTEs in this case ensures KVM will reassess whether or not
+	 * a hugepage can be used for affected ranges.
+	 */
+	if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
+		return false;
+
+	return kvm_unmap_gfn_range(kvm, range);
+}
+
 static bool hugepage_test_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
 				int level)
 {
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index decc1f153669..86c7cb692786 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -201,6 +201,7 @@ struct kvm_page_fault {
 
 	/* Derived from mmu and global state. */
 	const bool is_tdp;
+	const bool is_private;
 	const bool nx_huge_page_workaround_enabled;
 
 	/*
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a6de526c0426..67dfd4d79529 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2357,14 +2357,18 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
 #define KVM_DIRTY_RING_MAX_ENTRIES 65536
 
 static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
-						 gpa_t gpa, gpa_t size)
+						 gpa_t gpa, gpa_t size,
+						 bool is_write, bool is_exec,
+						 bool is_private)
 {
 	vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
 	vcpu->run->memory_fault.gpa = gpa;
 	vcpu->run->memory_fault.size = size;
 
-	/* Flags are not (yet) defined or communicated to userspace. */
+	/* RWX flags are not (yet) defined or communicated to userspace. */
 	vcpu->run->memory_fault.flags = 0;
+	if (is_private)
+		vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
 }
 
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2802d10aa88c..8eb10f560c69 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -535,6 +535,7 @@ struct kvm_run {
 		} notify;
 		/* KVM_EXIT_MEMORY_FAULT */
 		struct {
+#define KVM_MEMORY_EXIT_FLAG_PRIVATE	(1ULL << 3)
 			__u64 flags;
 			__u64 gpa;
 			__u64 size;
-- 
2.39.1
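kvm_faultin_pfn_private() in the patch above clamps fault->max_level by the order that kvm_gmem_get_pfn() reports for the backing page. The standalone sketch below reproduces that order-to-level mapping outside the kernel, assuming the x86 definitions KVM_HPAGE_GFN_SHIFT(level) == (level - 1) * 9 and PG_LEVEL_4K/2M/1G == 1/2/3. `max_level_for_order()`, `clamp_fault_level()` and `HPAGE_GFN_SHIFT()` are local stand-ins, not kernel symbols, the PG_LEVEL_* values are local copies, and the kernel's KVM_MMU_WARN_ON is only approximated with a message on stderr.

```c
#include <assert.h>
#include <stdio.h>

/* Local copies of the x86 page-table levels (enum pg_level). */
enum { PG_LEVEL_4K = 1, PG_LEVEL_2M = 2, PG_LEVEL_1G = 3 };

/* x86: each level adds 9 gfn bits, so 4K -> 0, 2M -> 9, 1G -> 18. */
#define HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)

/* Largest mapping level that a backing page of 2^order pages can support. */
static int max_level_for_order(int order)
{
	assert(order >= 0); /* sketch-only precondition */

	/* The kernel only warns (KVM_MMU_WARN_ON) on unexpected orders. */
	if (order != HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
	    order != HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
	    order != HPAGE_GFN_SHIFT(PG_LEVEL_4K))
		fprintf(stderr, "unexpected order %d\n", order);

	if (order >= HPAGE_GFN_SHIFT(PG_LEVEL_1G))
		return PG_LEVEL_1G;
	if (order >= HPAGE_GFN_SHIFT(PG_LEVEL_2M))
		return PG_LEVEL_2M;
	return PG_LEVEL_4K;
}

/* Mirror of: fault->max_level = min(level_for_order, fault->max_level). */
static int clamp_fault_level(int fault_max_level, int max_order)
{
	int order_level = max_level_for_order(max_order);

	return order_level < fault_max_level ? order_level : fault_max_level;
}
```

The clamp only ever lowers the level: a 2M backing page (order 9) cannot be mapped as 1G even if the fault allowed it, and a 1G backing page cannot raise a fault already capped at 2M, so `clamp_fault_level(PG_LEVEL_1G, 9)` and `clamp_fault_level(PG_LEVEL_2M, 18)` both return PG_LEVEL_2M.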