From: Mingwei Zhang
Date: Fri, 18 Aug 2023 19:08:30 -0700
Subject: Re: [PATCH 4/8] KVM: gmem: protect kvm_mmu_invalidate_end()
To: Sean Christopherson, Jacky Li
Cc: Ashish Kalra, isaku.yamahata@intel.com, kvm@vger.kernel.org,
 linux-kernel@vger.kernel.org, isaku.yamahata@gmail.com, Michael Roth,
 Paolo Bonzini, erdemaktas@google.com, Sagi Shahar, David Matlack,
 Kai Huang, Zhi Wang, chen.bo@intel.com, linux-coco@lists.linux.dev,
 Chao Peng, Ackerley Tng, Vishal Annapurve, Yuan Yao, Jarkko Sakkinen,
 Xu Yilun, Quentin Perret, wei.w.wang@intel.com, Fuad Tabba

+Jacky Li

On Fri, Aug 18, 2023 at 3:45 PM Sean Christopherson wrote:
>
> +Mingwei to correct me if I'm wrong
>
> On Fri, Aug 18, 2023, Ashish Kalra wrote:
> >
> > On 8/18/2023 12:55 PM, Sean Christopherson wrote:
> > > On Tue, Aug 15, 2023, isaku.yamahata@intel.com wrote:
> > > > From: Isaku Yamahata
> > > >
> > > > kvm_mmu_invalidate_end() updates struct kvm::mmu_invalidate_in_progress
> > > > and it's protected by kvm::mmu_lock.
> > > > Call kvm_mmu_invalidate_end() before
> > > > unlocking it, not after the unlock.
> > > >
> > > > Fixes: 8e9009ca6d14 ("KVM: Introduce per-page memory attributes")
> > >
> > > This Fixes tag is wrong.  It won't matter in the long run, but it
> > > makes my life that much harder.
> > >
> > > > Signed-off-by: Isaku Yamahata
> > > > ---
> > > >  virt/kvm/kvm_main.c | 15 ++++++++++++++-
> > > >  1 file changed, 14 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > > index 8bfeb615fc4d..49380cd62367 100644
> > > > --- a/virt/kvm/kvm_main.c
> > > > +++ b/virt/kvm/kvm_main.c
> > > > @@ -535,6 +535,7 @@ struct kvm_mmu_notifier_range {
> > > >  	} arg;
> > > >  	gfn_handler_t handler;
> > > >  	on_lock_fn_t on_lock;
> > > > +	on_unlock_fn_t before_unlock;
> > > >  	on_unlock_fn_t on_unlock;
> > >
> > > Ugh, shame on my past me.  Having on_lock and on_unlock be
> > > asymmetrical with respect to the lock is nasty.
> > >
> > > I would much rather we either (a) be explicit, e.g. before_(un)lock
> > > and after_(un)lock, or (b) have just on_(un)lock, make them
> > > symmetrical, and handle the SEV mess a different way.
> > >
> > > The SEV hook doesn't actually care about running immediately after
> > > unlock, it just wants to know if there was an overlapping memslot.
> > > It can run after SRCU is dropped, because even if we make the
> > > behavior more precise (right now it blasts WBINVD), just having a
> > > reference to memslots isn't sufficient, the code needs to guarantee
> > > memslots are *stable*.  And that is already guaranteed by the
> > > notifier code, i.e. the SEV code could just reacquire SRCU.
> >
> > On a separate note here, the SEV hook blasting WBINVD is still
> > causing serious performance degradation issues with SNP, triggered
> > via AutoNUMA/numad/KSM, etc. With reference to previous discussions
> > related to it, we have plans to replace WBINVD with CLFLUSHOPT.
>
> Isn't the flush unnecessary when freeing shared memory?  My
> recollection is that the problematic scenario is when encrypted memory
> is freed back to the host, because KVM already flushes when mapping
> potentially encrypted memory into the guest.
>
> With SNP+guest_memfd, private/encrypted memory should be unreachable
> via the hva-based mmu_notifiers.  gmem should have full control of the
> page lifecycles, i.e. it can get the kernel virtual address as
> appropriate, and so SNP shouldn't need the nuclear option.
>
> E.g. something like this?
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 07756b7348ae..1c6828ae391d 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -2328,7 +2328,7 @@ static void sev_flush_encrypted_page(struct kvm_vcpu *vcpu, void *va)
>
>  void sev_guest_memory_reclaimed(struct kvm *kvm)
>  {
> -       if (!sev_guest(kvm))
> +       if (!sev_guest(kvm) || sev_snp_guest(kvm))
>                 return;
>
>         wbinvd_on_all_cpus();

I hope this is the final solution :)

So, the short answer is: no. SNP+guest_memfd prevents untrusted host
user space from directly modifying the data, which is good enough for
CVE-2022-0171, but there is no guarantee that the host kernel will
never, in some scenario, access the data and generate dirty cache
lines. In fact, AFAICT, an SNP VM does not track whether each page was
previously shared. If a page was previously shared and was written by
the host kernel or a device before it was converted to private, no one
tracks that, and the dirty cache lines are still there!
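To make that gap concrete, here is a minimal sketch of the kind of
per-gfn bookkeeping that would be needed. Nothing like this exists in
KVM today; the structure and both hooks below are hypothetical, while
clflush_cache_range() is the existing x86 helper that boils down to
CLFLUSHOPT:

#include <linux/kvm_host.h>
#include <linux/xarray.h>
#include <asm/cacheflush.h>	/* clflush_cache_range() */

/*
 * Hypothetical bookkeeping, not existing KVM code: remember which gfns
 * were host-writable while shared, so that reclaim can flush just those
 * pages instead of blasting WBINVD on every CPU.
 */
struct snp_dirty_tracker {
	struct xarray possibly_dirty;	/* gfn -> tagged marker */
};

/* Hypothetical hook: a shared gfn becomes host-writable. */
static void snp_mark_possibly_dirty(struct snp_dirty_tracker *t, gfn_t gfn)
{
	xa_store(&t->possibly_dirty, gfn, xa_mk_value(1), GFP_KERNEL);
}

/*
 * Hypothetical hook: the gfn's backing memory leaves the NPT (freed,
 * migrated, or converted to private).  Flush by cache line only if the
 * page might hold dirty lines from its time as a shared page.
 */
static void snp_flush_on_reclaim(struct snp_dirty_tracker *t, gfn_t gfn,
				 void *kvaddr)
{
	if (xa_erase(&t->possibly_dirty, gfn))
		clflush_cache_range(kvaddr, PAGE_SIZE);
}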
So, to avoid corner cases like the above, it seems we currently have to
retain the property of flushing the cache whenever a guest memory
mapping leaves the KVM NPT. Of course, this is fundamentally because
SME_COHERENT only applies to CPU cores, not to DMA; if SME_COHERENT
were complete, the flush would no longer be needed.

Alternatively, we need extra bookkeeping for KVM to know whether each
page has dirty cache lines, along the lines of the sketch above.
Another alternative is to filter on the mmu_notifier reason, which is
the route I am planning to take; a rough sketch of that idea is below
my sig.

Thoughts?

Thanks.
-Mingwei
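P.S. Here is what I mean by filtering on the mmu_notifier reason. This
is a hedged sketch, not a working patch: it assumes the notifier's
enum mmu_notifier_event can be plumbed down to the SEV reclaim path,
which today's code does not do. The idea is to pay for a cache flush
only when the invalidation can actually free or relocate guest pages;
pure protection changes (e.g. AutoNUMA making PTEs PROT_NONE) leave the
page, and therefore its dirty cache lines, in place.

#include <linux/mmu_notifier.h>

/*
 * Sketch only: decide whether a notifier-driven zap needs a cache
 * flush.  Assumes the event is passed down from the invalidation path,
 * which requires (hypothetical) extra plumbing in kvm_main.c.
 */
static bool sev_reclaim_needs_flush(enum mmu_notifier_event event)
{
	switch (event) {
	case MMU_NOTIFY_PROTECTION_VMA:		/* e.g. NUMA balancing */
	case MMU_NOTIFY_PROTECTION_PAGE:	/* write-protection only */
	case MMU_NOTIFY_SOFT_DIRTY:		/* soft-dirty write-protect */
		return false;
	default:
		/* Unmap, migrate, release, etc.: stay conservative. */
		return true;
	}
}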