Date: Fri, 15 Sep 2023 07:26:16 -0700
References: <20230914015531.1419405-1-seanjc@google.com>
 <20230914015531.1419405-19-seanjc@google.com>
Subject: Re: [RFC PATCH v12 18/33] KVM: x86/mmu: Handle page fault for private memory
From: Sean Christopherson
To: Yan Zhao
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen,
 Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
 "Matthew Wilcox (Oracle)", Andrew Morton, Paul Moore, James Morris,
 "Serge E. Hallyn", kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org,
 Chao Peng, Fuad Tabba, Jarkko Sakkinen, Anish Moorthy, Yu Zhang,
 Isaku Yamahata, Xu Yilun, Vlastimil Babka, Vishal Annapurve,
 Ackerley Tng, Maciej Szmigiero, David Hildenbrand, Quentin Perret,
 Michael Roth, Wang, Liam Merwick, Isaku Yamahata,
 "Kirill A . Shutemov"

On Fri, Sep 15, 2023, Yan Zhao wrote:
> On Wed, Sep 13, 2023 at 06:55:16PM -0700, Sean Christopherson wrote:
> ....
> > +static void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
> > +					      struct kvm_page_fault *fault)
> > +{
> > +	kvm_prepare_memory_fault_exit(vcpu, fault->gfn << PAGE_SHIFT,
> > +				      PAGE_SIZE, fault->write, fault->exec,
> > +				      fault->is_private);
> > +}
> > +
> > +static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > +				   struct kvm_page_fault *fault)
> > +{
> > +	int max_order, r;
> > +
> > +	if (!kvm_slot_can_be_private(fault->slot)) {
> > +		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> > +		return -EFAULT;
> > +	}
> > +
> > +	r = kvm_gmem_get_pfn(vcpu->kvm, fault->slot, fault->gfn, &fault->pfn,
> > +			     &max_order);
> > +	if (r) {
> > +		kvm_mmu_prepare_memory_fault_exit(vcpu, fault);
> > +		return r;
> > +	}
> > +
> > +	fault->max_level = min(kvm_max_level_for_order(max_order),
> > +			       fault->max_level);
> > +	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> > +
> > +	return RET_PF_CONTINUE;
> > +}
> > +
> >  static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> >  {
> >  	struct kvm_memory_slot *slot = fault->slot;
> > @@ -4293,6 +4356,14 @@ static int __kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
> >  		return RET_PF_EMULATE;
> >  	}
> >
> > +	if (fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
> In patch 21,
> fault->is_private is set as:
> ".is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT)",
> then, the inequality here means memory attribute has been updated after
> last check.
> So, why an exit to user space for converting is required instead of a mere retry?
>
> Or, is it because how .is_private is assigned in patch 21 is subjected to change
> in future?

This.  Retrying on SNP or TDX would hang the guest.
I suppose we could special case VMs where .is_private is derived from the
memory attributes, but the SW_PROTECTED_VM type is primarily a development
vehicle at this point.  I'd like to have it mimic SNP/TDX as much as possible;
performance is a secondary concern.

E.g. userspace needs to be prepared for "spurious" exits due to races on SNP
and TDX, which this can theoretically exercise.  Though the window is quite
small so I doubt that'll actually happen in practice, which of course also
makes it less important to retry instead of exiting.
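
To illustrate what "prepared for spurious exits" could look like on the userspace side, here is a rough, self-contained sketch. Everything in it is hypothetical and for illustration only: struct mem_fault_exit, page_is_private[] and handle_mem_fault() are stand-ins for a VMM's own bookkeeping, not the KVM UAPI and not code from this series. The idea is simply that if the reported access type already matches the VMM's view of the page (e.g. another thread converted it in the meantime), the exit is treated as spurious and the vCPU is resumed without converting anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory-fault exit payload; NOT the real UAPI. */
struct mem_fault_exit {
	uint64_t gpa;		/* guest physical address of the fault */
	uint64_t size;		/* size of the faulting range, in bytes */
	bool is_private;	/* true if the guest wanted a private mapping */
};

enum action {
	ACTION_RESUME,			/* spurious exit, just re-enter the guest */
	ACTION_CONVERT_THEN_RESUME,	/* convert the range, then re-enter */
};

/* Hypothetical VMM bookkeeping: one private/shared flag per 4KiB page. */
#define NR_PAGES	16
#define PAGE_SHIFT_4K	12
static bool page_is_private[NR_PAGES];

static enum action handle_mem_fault(const struct mem_fault_exit *e)
{
	uint64_t first, last, i;
	bool mismatch = false;

	assert(e->size > 0);

	first = e->gpa >> PAGE_SHIFT_4K;
	last = (e->gpa + e->size - 1) >> PAGE_SHIFT_4K;

	/* Does any page in the range disagree with what the guest wanted? */
	for (i = first; i <= last && i < NR_PAGES; i++) {
		if (page_is_private[i] != e->is_private)
			mismatch = true;
	}

	/*
	 * The VMM's view already matches, e.g. a conversion raced with the
	 * fault.  Treat the exit as spurious and resume the vCPU.
	 */
	if (!mismatch)
		return ACTION_RESUME;

	/* A real VMM would also ask the kernel to convert the range here. */
	for (i = first; i <= last && i < NR_PAGES; i++)
		page_is_private[i] = e->is_private;

	return ACTION_CONVERT_THEN_RESUME;
}
```

With that shape, a second exit for the same range and access type after a completed conversion takes the ACTION_RESUME path, i.e. the spurious case costs one extra exit and nothing more.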