From: Xiaoyao Li
Date: Fri, 22 Sep 2023 14:03:23 +0800
Subject: Re: [RFC PATCH v12 07/33] KVM: Add KVM_EXIT_MEMORY_FAULT exit to report faults to userspace
To: Sean Christopherson, Paolo Bonzini, Marc Zyngier, Oliver Upton,
 Huacai Chen, Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt,
 Albert Ou, "Matthew Wilcox (Oracle)", Andrew Morton, Paul Moore,
 James Morris, "Serge E. Hallyn"
Cc: kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, linux-mips@vger.kernel.org,
 linuxppc-dev@lists.ozlabs.org, kvm-riscv@lists.infradead.org,
 linux-riscv@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-security-module@vger.kernel.org,
 linux-kernel@vger.kernel.org, Chao Peng, Fuad Tabba, Jarkko Sakkinen,
 Anish Moorthy, Yu Zhang, Isaku Yamahata, Xu Yilun, Vlastimil Babka,
 Vishal Annapurve, Ackerley Tng, Maciej Szmigiero, David Hildenbrand,
 Quentin Perret, Michael Roth, Wang, Liam Merwick, Isaku Yamahata,
 "Kirill A . Shutemov"
Message-ID: <117db856-9aec-e91c-b1d4-db2b90ae563d@intel.com>
In-Reply-To: <20230914015531.1419405-8-seanjc@google.com>
References: <20230914015531.1419405-1-seanjc@google.com> <20230914015531.1419405-8-seanjc@google.com>

On
9/14/2023 9:55 AM, Sean Christopherson wrote:
> From: Chao Peng
>
> Add a new KVM exit type to allow userspace to handle memory faults that
> KVM cannot resolve, but that userspace *may* be able to handle (without
> terminating the guest).
>
> KVM will initially use KVM_EXIT_MEMORY_FAULT to report implicit
> conversions between private and shared memory. With guest private memory,
> there will be two kind of memory conversions:
>
>   - explicit conversion: happens when the guest explicitly calls into KVM
>     to map a range (as private or shared)
>
>   - implicit conversion: happens when the guest attempts to access a gfn
>     that is configured in the "wrong" state (private vs. shared)
>
> On x86 (first architecture to support guest private memory), explicit
> conversions will be reported via KVM_EXIT_HYPERCALL+KVM_HC_MAP_GPA_RANGE,

Side topic: do we expect to integrate TDVMCALL(MAPGPA) of TDX into
KVM_HC_MAP_GPA_RANGE?

> but reporting KVM_EXIT_HYPERCALL for implicit conversions is undesriable
> as there is (obviously) no hypercall, and there is no guarantee that the
> guest actually intends to convert between private and shared, i.e. what
> KVM thinks is an implicit conversion "request" could actually be the
> result of a guest code bug.
>
> KVM_EXIT_MEMORY_FAULT will be used to report memory faults that appear to
> be implicit conversions.
>
> Place "struct memory_fault" in a second anonymous union so that filling
> memory_fault doesn't clobber state from other yet-to-be-fulfilled exits,
> and to provide additional information if KVM does NOT ultimately exit to
> userspace with KVM_EXIT_MEMORY_FAULT, e.g. if KVM suppresses (or worse,
> loses) the exit, as KVM often suppresses exits for memory failures that
> occur when accessing paravirt data structures. The initial usage for
> private memory will be all-or-nothing, but other features such as the
> proposed "userfault on missing mappings" support will use
> KVM_EXIT_MEMORY_FAULT for potentially _all_ guest memory accesses, i.e.
> will run afoul of KVM's various quirks.

So when the exit reason is KVM_EXIT_MEMORY_FAULT, how can we tell which
field in the first union is valid?

And when the exit reason is not KVM_EXIT_MEMORY_FAULT, how can we know
whether the info in the second union (run.memory_fault) is valid without
something like a run.memory_fault.valid field?

> Use bit 3 for flagging private memory so that KVM can use bits 0-2 for
> capturing RWX behavior if/when userspace needs such information.
>
> Note! To allow for future possibilities where KVM reports
> KVM_EXIT_MEMORY_FAULT and fills run->memory_fault on _any_ unresolved
> fault, KVM returns "-EFAULT" (-1 with errno == EFAULT from userspace's
> perspective), not '0'! Due to historical baggage within KVM, exiting to
> userspace with '0' from deep callstacks, e.g. in emulation paths, is
> infeasible as doing so would require a near-complete overhaul of KVM,
> whereas KVM already propagates -errno return codes to userspace even when
> the -errno originated in a low level helper.
>
> Link: https://lore.kernel.org/all/20230908222905.1321305-5-amoorthy@google.com
> Cc: Anish Moorthy
> Suggested-by: Sean Christopherson
> Co-developed-by: Yu Zhang
> Signed-off-by: Yu Zhang
> Signed-off-by: Chao Peng
> Co-developed-by: Sean Christopherson
> Signed-off-by: Sean Christopherson
> ---
>  Documentation/virt/kvm/api.rst | 24 ++++++++++++++++++++++++
>  include/linux/kvm_host.h       | 15 +++++++++++++++
>  include/uapi/linux/kvm.h       | 24 ++++++++++++++++++++++++
>  3 files changed, 63 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 21a7578142a1..e28a13439a95 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6702,6 +6702,30 @@ array field represents return values. The userspace should update the return
>  values of SBI call before resuming the VCPU. For more details on RISC-V SBI
>  spec refer, https://github.com/riscv/riscv-sbi-doc.
>
> +::
> +
> +                /* KVM_EXIT_MEMORY_FAULT */
> +                struct {
> +  #define KVM_MEMORY_EXIT_FLAG_PRIVATE  (1ULL << 3)
> +                        __u64 flags;
> +                        __u64 gpa;
> +                        __u64 size;
> +                } memory;
> +
> +KVM_EXIT_MEMORY_FAULT indicates the vCPU has encountered a memory fault that
> +could not be resolved by KVM. The 'gpa' and 'size' (in bytes) describe the
> +guest physical address range [gpa, gpa + size) of the fault. The 'flags' field
> +describes properties of the faulting access that are likely pertinent:
> +
> + - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
> +   on a private memory access. When clear, indicates the fault occurred on a
> +   shared access.
> +
> +Note! KVM_EXIT_MEMORY_FAULT is unique among all KVM exit reasons in that it
> +accompanies a return code of '-1', not '0'! errno will always be set to EFAULT
> +or EHWPOISON when KVM exits with KVM_EXIT_MEMORY_FAULT, userspace should assume
> +kvm_run.exit_reason is stale/undefined for all other error numbers.
> +

Initially, this section was a copy of struct kvm_run, with comments for
each field accordingly. Unfortunately, that consistency has not been well
maintained as new fields were added. Do we expect to fix it?
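
For illustration only (not part of the patch): a minimal sketch of how
userspace might apply the return-code rule documented in the hunk above.
The constants are copied from the quoted diff; struct memory_fault_info,
enum fault_action and classify_memory_fault() are hypothetical names made
up for this sketch, and a real VMM would read the values from struct
kvm_run after ioctl(KVM_RUN) instead. Whether exit_reason can be fully
trusted when errno is EFAULT is exactly the open question raised earlier
in this reply, so the sketch only encodes what the documentation text
states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the quoted patch; a real VMM gets them from <linux/kvm.h>. */
#define KVM_EXIT_MEMORY_FAULT        38
#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)

#ifndef EHWPOISON
#define EHWPOISON 133 /* asm-generic value; fallback for non-Linux builds only */
#endif

/* Hypothetical stand-in for the memory_fault member of struct kvm_run. */
struct memory_fault_info {
        uint64_t flags;
        uint64_t gpa;
        uint64_t size;
};

/* Hypothetical result codes for this sketch. */
enum fault_action {
        FAULT_NOT_REPORTED, /* treat exit_reason and memory_fault as stale */
        FAULT_ON_SHARED,    /* fault on a shared access */
        FAULT_ON_PRIVATE,   /* fault on a private access */
};

/*
 * ret and err are the return value and errno observed after KVM_RUN.
 * Per the quoted documentation, KVM_EXIT_MEMORY_FAULT comes with -1 and
 * errno EFAULT or EHWPOISON; for any other errno exit_reason is stale.
 */
static enum fault_action classify_memory_fault(int ret, int err,
                                               uint32_t exit_reason,
                                               const struct memory_fault_info *mf)
{
        assert(mf != NULL);

        if (ret != -1 || (err != EFAULT && err != EHWPOISON))
                return FAULT_NOT_REPORTED;
        if (exit_reason != KVM_EXIT_MEMORY_FAULT)
                return FAULT_NOT_REPORTED;

        return (mf->flags & KVM_MEMORY_EXIT_FLAG_PRIVATE) ?
                FAULT_ON_PRIVATE : FAULT_ON_SHARED;
}
```

A caller would pass the ioctl() return value and a copy of errno taken
immediately after the call, before any other libc function can overwrite
it.
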
>  ::
>
>                  /* KVM_EXIT_NOTIFY */
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 4e741ff27af3..d8c6ce6c8211 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2327,4 +2327,19 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
>  /* Max number of entries allowed for each kvm dirty ring */
>  #define KVM_DIRTY_RING_MAX_ENTRIES  65536
>
> +static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> +                                                 gpa_t gpa, gpa_t size,
> +                                                 bool is_write, bool is_exec,
> +                                                 bool is_private)
> +{
> +        vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
> +        vcpu->run->memory_fault.gpa = gpa;
> +        vcpu->run->memory_fault.size = size;
> +
> +        /* RWX flags are not (yet) defined or communicated to userspace. */
> +        vcpu->run->memory_fault.flags = 0;
> +        if (is_private)
> +                vcpu->run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
> +}
> +
>  #endif
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index bd1abe067f28..d2d913acf0df 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -274,6 +274,7 @@ struct kvm_xen_exit {
>  #define KVM_EXIT_RISCV_SBI        35
>  #define KVM_EXIT_RISCV_CSR        36
>  #define KVM_EXIT_NOTIFY           37
> +#define KVM_EXIT_MEMORY_FAULT     38
>
>  /* For KVM_EXIT_INTERNAL_ERROR */
>  /* Emulate instruction failed. */
> @@ -541,6 +542,29 @@ struct kvm_run {
>                  struct kvm_sync_regs regs;
>                  char padding[SYNC_REGS_SIZE_BYTES];
>          } s;
> +
> +        /*
> +         * This second exit union holds structs for exit types which may be
> +         * triggered after KVM has already initiated a different exit, or which
> +         * may be ultimately dropped by KVM.
> +         *
> +         * For example, because of limitations in KVM's uAPI, KVM x86 can
> +         * generate a memory fault exit an MMIO exit is initiated (exit_reason
> +         * and kvm_run.mmio are filled). And conversely, KVM often disables
> +         * paravirt features if a memory fault occurs when accessing paravirt
> +         * data instead of reporting the error to userspace.
> +         */
> +        union {
> +                /* KVM_EXIT_MEMORY_FAULT */
> +                struct {
> +#define KVM_MEMORY_EXIT_FLAG_PRIVATE    (1ULL << 3)
> +                        __u64 flags;
> +                        __u64 gpa;
> +                        __u64 size;
> +                } memory_fault;
> +                /* Fix the size of the union. */
> +                char padding2[256];
> +        };
>  };
>
>  /* for KVM_REGISTER_COALESCED_MMIO / KVM_UNREGISTER_COALESCED_MMIO */
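
For illustration only (not part of the patch): a minimal mock of the
two-union layout, to make the question about the first union concrete.
struct mock_run and mock_prepare_memory_fault_exit() are hypothetical,
heavily simplified stand-ins for struct kvm_run and the quoted
kvm_prepare_memory_fault_exit(); the mmio member is abbreviated and the
256-byte first-union padding is an assumption of this sketch. It shows
that filling memory_fault leaves data in the first union intact, while
exit_reason itself is still overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Exit reason values as in <linux/kvm.h>; 38 is taken from the quoted patch. */
#define KVM_EXIT_MMIO                6
#define KVM_EXIT_MEMORY_FAULT        38
#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1ULL << 3)

/* Hypothetical, simplified mock of struct kvm_run with both exit unions. */
struct mock_run {
        uint32_t exit_reason;
        union { /* first exit union */
                struct {
                        uint64_t phys_addr;
                        uint8_t  data[8];
                        uint32_t len;
                        uint8_t  is_write;
                } mmio;
                char padding[256];
        };
        union { /* second exit union added by the patch */
                struct {
                        uint64_t flags;
                        uint64_t gpa;
                        uint64_t size;
                } memory_fault;
                char padding2[256];
        };
};

/* The second union starts only after the whole first union ends. */
_Static_assert(offsetof(struct mock_run, memory_fault) >=
               offsetof(struct mock_run, mmio) + 256,
               "memory_fault must not overlap the first exit union");

/* Mirrors the field assignments of the quoted helper (RWX flags omitted). */
static void mock_prepare_memory_fault_exit(struct mock_run *run, uint64_t gpa,
                                           uint64_t size, int is_private)
{
        run->exit_reason = KVM_EXIT_MEMORY_FAULT;
        run->memory_fault.gpa = gpa;
        run->memory_fault.size = size;
        run->memory_fault.flags = 0;
        if (is_private)
                run->memory_fault.flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
}
```

In this mock, a pending MMIO exit keeps its payload after the helper
runs, but the overwritten exit_reason no longer says that the mmio member
is the valid one.
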