From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C5B4B1CD2C for ; Tue, 16 Jul 2024 19:31:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.202 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721158316; cv=none; b=r1hzu3BvxZCRL66Ka1BQcg/IWWYXaxsbhSUtE2gaJm15/GL15sHu642IiMUxL2XrY2AfiD7pVKbpDIjjb0zbmjJrbD4FIbM1If5In8NlKOgSKTO+W0PV4vuTAcV2KSX1jocA0QZO0gcz54hrwSifel0kmesPj416E5D0YcFegiA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1721158316; c=relaxed/simple; bh=c8UzbSu94SHZlCBc7Dko7y4EM7V9wstxHkNN0vs0sHk=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ZsWBXN2XWRVlNyI3REsjGVoinjddk+JMpWdjTuqScjK5h7yJADj+ml+e6m7r+Y8s1Y0vvhel8stRAu0N9WfYygVgie9w4aD3rP6abGsZcrDP4NrOVsg2Pc+Ic7oZ2vaCVe4Vn9wSiQCBkvG3fHUXy7n6eRJFj3dVNUAvAQwa0Qw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=fVNq/fUb; arc=none smtp.client-ip=209.85.210.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="fVNq/fUb" Received: by mail-pf1-f202.google.com with SMTP id d2e1a72fcca58-70afd833ba9so3359762b3a.1 for ; Tue, 16 Jul 2024 12:31:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1721158314; x=1721763114; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=/WJoNCYn7js3hz23PZCgEw2FtBVSDvq0l3C/wykvag0=; b=fVNq/fUbCqfARQX5RRKMg5ujkZJujD6Y49ym4NFhPydh+SR0RUTG5GV3XNJTyklLnN EVC7LsNYPU2tOFFgKhE6bSBfqvRrADQv/4NWz9ymAJsP2If2oHmUx2fpUwGdj7zIzu1B AYqc6tY3ELhn3Vdd4FCEVflbbepiBIZyvbMT2Qa5RjQ2DPYnkQLnkZTAPAR6iyJ046RM a9ZbJYyI82qbrdRqXS2kL8SO9BtLepgry5835MuprMy6PC76nI2DEvOav+fN6JQfMveK mLfF8l4oNvLirZqq6xcwDFRqwqRG8BRzRcZ6C6ZtkLnRrdkHwi8kmjk9vD46JFUy7ajl R8Jw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721158314; x=1721763114; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=/WJoNCYn7js3hz23PZCgEw2FtBVSDvq0l3C/wykvag0=; b=I63CkgumUGLMbAku4pIiVrENB7JdZW3udHNU3GZrCysY65UnkF/XO90IKbo34kSQPx Hh98LY5fM5wPKcTm39LDUQ4vQ9GGbvBNyTHatDuAy6KQ4r8rS3yauBY5U/Hvohzzvq5G LZbxGpxkRtYi1oIj0fAZ9bz5Eh7P9TplRJVJhQS75nGWseNKajuh+taZMpGo4b2VXczA AXOUCJ0zHDfa/ICmSosYmcECN9oIy5dcOSrJm/5rJOqf4I7qU6nH7dyFTNMtR+nzO9nf B3YYImJUKp0lK5XnXZboDY3nRk/bWoStr5+bf+jpuPy9LCv0UTeF/ixYMjuvzlHUm8SE 6WGA== X-Forwarded-Encrypted: i=1; AJvYcCW4R9bEGObN+bDmXXYhks1++q+q8dY2dTzJQr/V8PQkUuxnvfE3J6RlEPZVbTYZALygDRlWxMre6/1a72T4bNenyUgSuOu9Kb/GGCc8 X-Gm-Message-State: AOJu0Ywol91XtFe6h/WzSS2P67iLXKLpWWVwq6EP1cqiemSrp+8Hz7Rd 0NqNMhoKJ9ce3k6ca5sK7gwo+udfKVYBqtGlllHt33mhtYsTLd8CaEuVpuna5lk4Ty6AwCr1KLR VhA== X-Google-Smtp-Source: AGHT+IHzKKMLIUP4x9S22RErcOWeGFNe3jUdiw80u+WvUdaciKtVUPOhD43xVzlWxPTnAmN35iApP23KZVY= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a05:6a00:1891:b0:70b:1086:cf8c with SMTP id d2e1a72fcca58-70c2ea732dcmr237619b3a.6.1721158313928; Tue, 16 Jul 2024 12:31:53 -0700 (PDT) Date: Tue, 16 Jul 2024 12:31:52 -0700 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: Message-ID: Subject: Re: [PATCH] KVM: x86/mmu: Allow per VM kvm_mmu_max_gfn() From: Sean Christopherson To: isaku.yamahata@intel.com Cc: kvm@vger.kernel.org, isaku.yamahata@gmail.com, Paolo Bonzini , rick.p.edgecombe@intel.com, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="us-ascii" On Mon, Jul 15, 2024, isaku.yamahata@intel.com wrote: > From: Isaku Yamahata > > Prepare for TDX support by making kvm_mmu_max_gfn() configurable. Have > this preparation also be useful by non-TDX changes to improve correctness > associated with the combination of 4-level EPT and MAXPA > 48. The issue > is analyzed at [1]. > > Like other confidential computing technologies, TDX has the concept of > private and shared memory. For TDX, the private and shared mappings of the > same GFN are mapped at separate GPAs, with a configurable GPA bit selecting > between private and shared aliases of the same GFN. This means that > operations working with the concept of a maximum GFN should only go up to > this configurable GPA bit instead of the existing behavior based on > shadow_phys_bits. Other TDX changes will handle applying the operation > across both GFN aliases. This is going to be confusing. A TDX will be all but guaranteed to generate an EPT violation on a gfn > kvm->mmu_max_gfn. > Using the existing kvm_mmu_max_gfn() based on shadow_phys_bits would cause > functional problems for TDX. Specifically, because the function is used to > restrict the range where memslots can be created. For TDX, if a memslot is > created at a GFN that includes the bit that selects between private/shared, > it would confuse the logic that zaps both aliases. It would end up zapping > only the higher alias and missing the lower one. In this case, freed pages > could remain mapped in the TDX guest. So why don't we simply disallow creating aliased memslots? > Since this GPA bit is configurable per-VM, make kvm_mmu_max_gfn() per-VM by > having it take a struct kvm, and keep the max GFN as a member on that > struct. Future TDX changes will set this member based on the configurable > position of the private/shared bit. > > Besides functional issues, it is generally more correct and easier to No, it most definitely is not more correct. There is absolutely no issue zapping SPTEs that should never exist. In fact, restricting the zapping path is far more likely to *cause* correctness issues, e.g. see 524a1e4e381f ("KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs") 86931ff7207b ("KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR") Creating SPTEs is a different matter, but unless I'm missing something, the only path that _needs_ to be updated is kvm_arch_prepare_memory_region(), to disallow aliased memslots. I assume TDX will strip the shared bit from fault->gfn, and shove it back in when creating MMIO SPTEs in the shared EPT page tables. Why can't we simply do: diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 842a3a4cdfe9..5ea428dde891 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -4395,6 +4395,9 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, struct kvm_memory_slot *slot = fault->slot; int ret; + if (WARN_ON_ONCE(vcpu->kvm, kvm_is_gfn_alias(fault->gfn))) + return -EFAULT; + /* * Note that the mmu_invalidate_seq also serves to detect a concurrent * change in attributes. is_page_fault_stale() will detect an diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 994743266480..091da7607025 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -12979,6 +12979,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, if ((new->base_gfn + new->npages - 1) > kvm_mmu_max_gfn()) return -EINVAL; + if (kvm_is_gfn_alias(kvm, new->base_gfn + new->npages - 1)) + return -EINVAL; + return kvm_alloc_memslot_metadata(kvm, new); }