From mboxrd@z Thu Jan 1 00:00:00 1970
From: Binbin Wu
Date: Thu, 21 Sep 2023 13:51:34 +0800
Subject: [RFC PATCH v12 18/33] KVM: x86/mmu: Handle page fault for private memory
In-Reply-To:
References: <20230914015531.1419405-1-seanjc@google.com> <20230914015531.1419405-19-seanjc@google.com>
Message-ID:
List-Id:
To: kvm-riscv@lists.infradead.org
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On 9/15/2023 10:26 PM, Sean Christopherson wrote:
> On Fri, Sep 15, 2023, Yan Zhao wrote:
>> On Wed, Sep 13, 2023 at 06:55:16PM -0700, Sean Christopherson wrote:
>> ....
>>> +static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
>>> +					      struct kvm_page_fault *fault)
>>> +{
>>> +	kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
>>> +				      PAGE_SIZE, fault->write, fault->exec,
>>> +				      fault->is_private);
>>> +}
>>> +
>>> +static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
>>> +				   struct kvm_page_fault *fault)
>>> +{
>>> +	int max_order, r;
>>> +
>>> +	if (!kvm_slot_can_be_private(fault->slot)) {
>>> +		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>>> +		return -EFAULT;
>>> +	}
>>> +
>>> +	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
>>> +			     &max_order);
>>> +	if (r) {
>>> +		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
>>> +		return r;
>>> +	}
>>> +
>>> +	fault->max_level = min(kvm_max_level_for_order(max_order),
>>> +			       fault->max_level);
>>> +	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
>>> +
>>> +	return RET_PF_CONTINUE;
>>> +}
>>> +
>>>  static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>>>  {
>>>  	struct kvm_memory_slot *slot = fault->slot;
>>> @@ -4293,6 +4356,14 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>>>  		return RET_PF_EMULATE;
>>>  	}
>>>
>>> +	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
>> In patch 21,
>> fault->is_private is set as:
>> ".is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT)",
>> then, the inequality here means memory attribute has been updated after
>> last check.
>> So, why an exit to user space for converting is required instead of a mere retry?
>>
>> Or, is it because how .is_private is assigned in patch 21 is subject to change
>> in future?
> This.  Retrying on SNP or TDX would hang the guest.  I suppose we could special
> case VMs where .is_private is derived from the memory attributes, but the
> SW_PROTECTED_VM type is primarily a development vehicle at this point.  I'd like to
> have it mimic SNP/TDX as much as possible; performance is a secondary concern.

So when .is_private is derived from the memory attributes, and if I
didn't miss anything, no explicit conversion mechanism has been
introduced so far. Does that mean that for a pure sw-protected VM
(without SNP/TDX), the page fault will be handled according to the
memory attributes set up by the host/user VMM, and no implicit
conversion will be triggered?

>
> E.g. userspace needs to be prepared for "spurious" exits due to races on SNP and
> TDX, which this can theoretically exercise.  Though the window is quite small so
> I doubt that'll actually happen in practice; which of course also makes it less
> important to retry instead of exiting.
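
To make the two cases in the question concrete, here is a minimal
sketch of the mismatch check as a standalone model. It is not kernel
code: every model_* name is hypothetical, the attribute table stands in
for what kvm_mem_is_private() would report, and the userspace policy of
converting the page to match the reported access type is an assumption
made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_GFNS 8

/* Hypothetical outcomes of the fault path in this model. */
enum model_result { MODEL_CONTINUE, MODEL_EXIT_TO_USERSPACE };

/* Hypothetical stand-in for the fields of the fault used by the check. */
struct model_fault {
	unsigned long gfn;
	bool is_private;	/* how the access was classified */
};

/* Per-gfn "private" attribute, as userspace last set it. */
bool model_gfn_private[MODEL_NR_GFNS];

/* Mirrors the quoted check: a mismatch is reported to userspace. */
enum model_result model_faultin(const struct model_fault *fault)
{
	assert(fault->gfn < MODEL_NR_GFNS);
	if (fault->is_private != model_gfn_private[fault->gfn])
		return MODEL_EXIT_TO_USERSPACE;
	return MODEL_CONTINUE;
}

/* Assumed userspace policy: convert the page to match the access. */
void model_userspace_convert(const struct model_fault *fault)
{
	assert(fault->gfn < MODEL_NR_GFNS);
	model_gfn_private[fault->gfn] = fault->is_private;
}

/* Re-run the fault until it can continue; return the number of exits. */
int model_resolve(const struct model_fault *fault)
{
	int exits = 0;

	while (model_faultin(fault) == MODEL_EXIT_TO_USERSPACE) {
		model_userspace_convert(fault);
		exits++;
	}
	return exits;
}

/* The case asked about: .is_private derived from the attributes. */
struct model_fault model_fault_from_attrs(unsigned long gfn)
{
	assert(gfn < MODEL_NR_GFNS);
	struct model_fault fault = {
		.gfn = gfn,
		.is_private = model_gfn_private[gfn],
	};
	return fault;
}
```

In this model, a fault built with model_fault_from_attrs() never
mismatches unless the table changes between the two reads, so it
continues without an exit. A fault whose is_private is fixed
independently of the table (as I understand the SNP/TDX case to be)
keeps mismatching until userspace changes the attribute, which is why a
bare retry would not make progress there.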