public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Isaku Yamahata <isaku.yamahata@intel.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"seanjc@google.com" <seanjc@google.com>,
	"sagis@google.com" <sagis@google.com>,
	"dmatlack@google.com" <dmatlack@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"isaku.yamahata@gmail.com" <isaku.yamahata@gmail.com>,
	"Zhao, Yan Y" <yan.y.zhao@intel.com>,
	"Aktas, Erdem" <erdemaktas@google.com>
Subject: Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU
Date: Tue, 28 May 2024 18:57:49 -0700	[thread overview]
Message-ID: <20240529015749.GF386318@ls.amr.corp.intel.com> (raw)
In-Reply-To: <a5ad658b1eec38f621315431a24e028187b5ca2d.camel@intel.com>

On Tue, May 28, 2024 at 11:06:45PM +0000,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com> wrote:

> On Tue, 2024-05-14 at 17:59 -0700, Rick Edgecombe wrote:
> >  static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> > -                               u64 old_spte, u64 new_spte, int level,
> > -                               bool shared)
> > +                               u64 old_spte, u64 new_spte,
> > +                               union kvm_mmu_page_role role, bool shared)
> >  {
> > +       bool is_private = kvm_mmu_page_role_is_private(role);
> > +       int level = role.level;
> >         bool was_present = is_shadow_present_pte(old_spte);
> >         bool is_present = is_shadow_present_pte(new_spte);
> >         bool was_leaf = was_present && is_last_spte(old_spte, level);
> >         bool is_leaf = is_present && is_last_spte(new_spte, level);
> > -       bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
> > +       kvm_pfn_t old_pfn = spte_to_pfn(old_spte);
> > +       kvm_pfn_t new_pfn = spte_to_pfn(new_spte);
> > +       bool pfn_changed = old_pfn != new_pfn;
> >  
> >         WARN_ON_ONCE(level > PT64_ROOT_MAX_LEVEL);
> >         WARN_ON_ONCE(level < PG_LEVEL_4K);
> > @@ -513,7 +636,7 @@ static void handle_changed_spte(struct kvm *kvm, int
> > as_id, gfn_t gfn,
> >  
> >         if (was_leaf && is_dirty_spte(old_spte) &&
> >             (!is_present || !is_dirty_spte(new_spte) || pfn_changed))
> > -               kvm_set_pfn_dirty(spte_to_pfn(old_spte));
> > +               kvm_set_pfn_dirty(old_pfn);
> >  
> >         /*
> >          * Recursively handle child PTs if the change removed a subtree from
> > @@ -522,15 +645,21 @@ static void handle_changed_spte(struct kvm *kvm, int
> > as_id, gfn_t gfn,
> >          * pages are kernel allocations and should never be migrated.
> >          */
> >         if (was_present && !was_leaf &&
> > -           (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
> > +           (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) {
> > +               KVM_BUG_ON(is_private !=
> > is_private_sptep(spte_to_child_pt(old_spte, level)),
> > +                          kvm);
> >                 handle_removed_pt(kvm, spte_to_child_pt(old_spte, level),
> > shared);
> > +       }
> > +
> > +       if (is_private && !is_present)
> > +               handle_removed_private_spte(kvm, gfn, old_spte, new_spte,
> > role.level);
> 
> I'm a little bothered by the asymmetry of where the mirrored hooks get called
> between setting and zapping PTEs. Tracing through the code, the relevent
> operations that are needed for TDX are:
> 1. tdp_mmu_iter_set_spte() from tdp_mmu_zap_leafs() and __tdp_mmu_zap_root()
> 2. tdp_mmu_set_spte_atomic() is used for mapping, linking
> 
> (1) is a simple case because the mmu_lock is held for writes. It updates the
> mirror root like normal, then has extra logic to call out to update the S-EPT.
> 
> (2) on the other hand just has the read lock, so it has to do the whole
> operation in a special way. First set REMOVED_SPTE, then update the private
> copy, then write to the mirror page tables. It can't get stuffed into
> handle_changed_spte() because it has to write REMOVED_SPTE first.
> 
> In some ways it makes sense to update the S-EPT. Despite claiming
> "handle_changed_spte() only updates stats.", it does some updating of other PTEs
> based on the current PTE change. Which is pretty similar to what the mirrored
> PTEs are doing. But we can't really do the setting of present PTEs because of
> the REMOVED_SPTE stuff.
> 
> So we could only make it more symmetrical by moving the S-EPT ops out of
> handle_changed_spte() and manually call it in the two places relevant for TDX,
> like the below.
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index e966986bb9f2..c9ddb1c2a550 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -438,6 +438,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t
> pt, bool shared)
>                          */
>                         old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
>                                                           REMOVED_SPTE, level);
> +
> +                       if (is_mirror_sp(sp))
> +                               reflect_removed_spte(kvm, gfn, old_spte,
> REMOVED_SPTE, level);

The callback before handling lower level will result in error.


>                 }
>                 handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
>                                     old_spte, REMOVED_SPTE, sp->role, shared);


We should call it here after processing lower level.



> @@ -667,9 +670,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id,
> gfn_t gfn,
>                 handle_removed_pt(kvm, spte_to_child_pt(old_spte, level),
> shared);
>         }
>  
> -       if (is_mirror && !is_present)
> -               reflect_removed_spte(kvm, gfn, old_spte, new_spte, role.level);
> -
>         if (was_leaf && is_accessed_spte(old_spte) &&
>             (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
>                 kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> @@ -839,6 +839,9 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id,
> tdp_ptep_t sptep,
>                                                       new_spte, level), kvm);
>         }
>  
> +       if (is_mirror_sptep(sptep))
> +               reflect_removed_spte(kvm, gfn, old_spte, REMOVED_SPTE, level);
> +

Ditto.


>         role = sptep_to_sp(sptep)->role;
>         role.level = level;
>         handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, role, false);

The callback should be here.  It should be after handling the lower level.



> Otherwise, we could move the "set present" mirroring operations into
> handle_changed_spte(), and have some earlier conditional logic do the
> REMOVED_SPTE parts. It starts to become more scattered.
> Anyway, it's just a code clarity thing arising from having hard time explaining
> the design in the log. Any opinions?

Originally I tried to consolidate the callbacks by following TDP MMU using
handle_changed_spte().  Anyway we can pick from two outcomes based on which is
easy to understand/maintain.


> A separate but related comment is below.
> 
> >  
> >         if (was_leaf && is_accessed_spte(old_spte) &&
> >             (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
> >                 kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> >  }
> >  
> > @@ -648,6 +807,8 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
> >  static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
> >                             u64 old_spte, u64 new_spte, gfn_t gfn, int level)
> >  {
> > +       union kvm_mmu_page_role role;
> > +
> >         lockdep_assert_held_write(&kvm->mmu_lock);
> >  
> >         /*
> > @@ -660,8 +821,16 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id,
> > tdp_ptep_t sptep,
> >         WARN_ON_ONCE(is_removed_spte(old_spte) || is_removed_spte(new_spte));
> >  
> >         old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
> > +       if (is_private_sptep(sptep) && !is_removed_spte(new_spte) &&
> > +           is_shadow_present_pte(new_spte)) {
> > +               /* Because write spin lock is held, no race.  It should
> > success. */
> > +               KVM_BUG_ON(__set_private_spte_present(kvm, sptep, gfn,
> > old_spte,
> > +                                                     new_spte, level), kvm);
> > +       }
> 
> Based on the above enumeration, I don't see how this hunk gets used.

I should've removed it.  This is leftover from the old patches.
-- 
Isaku Yamahata <isaku.yamahata@intel.com>

  reply	other threads:[~2024-05-29  1:57 UTC|newest]

Thread overview: 152+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-15  0:59 [PATCH 00/16] TDX MMU prep series part 1 Rick Edgecombe
2024-05-15  0:59 ` [PATCH 01/16] KVM: x86: Add a VM type define for TDX Rick Edgecombe
2024-05-15  0:59 ` [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only slot leafs on slot deletion Rick Edgecombe
2024-05-15 13:24   ` Huang, Kai
2024-05-15 19:09     ` Sean Christopherson
2024-05-15 19:23       ` Edgecombe, Rick P
2024-05-15 20:05         ` Sean Christopherson
2024-05-15 20:53           ` Edgecombe, Rick P
2024-05-15 22:47             ` Sean Christopherson
2024-05-15 23:06               ` Huang, Kai
2024-05-15 23:20                 ` Sean Christopherson
2024-05-15 23:36                   ` Huang, Kai
2024-05-16  1:12                   ` Xiaoyao Li
2024-05-17 15:30                   ` Paolo Bonzini
2024-05-22  1:29                     ` Yan Zhao
2024-05-22  2:31                       ` Sean Christopherson
2024-05-22  6:48                         ` Yan Zhao
2024-05-22 15:45                           ` Paolo Bonzini
2024-05-24  1:50                             ` Yan Zhao
2024-05-15 23:56               ` Edgecombe, Rick P
2024-05-16  2:21                 ` Edgecombe, Rick P
2024-05-16  3:56                 ` Yan Zhao
2024-05-17 15:27           ` Paolo Bonzini
2024-05-17 15:25       ` Paolo Bonzini
2024-05-15 18:03   ` Isaku Yamahata
2024-05-15  0:59 ` [PATCH 03/16] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Rick Edgecombe
2024-05-17  7:44   ` Chao Gao
2024-05-17  9:08     ` Isaku Yamahata
2024-05-15  0:59 ` [PATCH 04/16] KVM: x86/mmu: Add address conversion functions for TDX shared bit of GPA Rick Edgecombe
2024-05-15 22:34   ` Huang, Kai
2024-05-15 23:21     ` Edgecombe, Rick P
2024-05-15 23:31       ` Huang, Kai
2024-05-15 23:38         ` Edgecombe, Rick P
2024-05-15 23:44           ` Huang, Kai
2024-05-15 23:59             ` Edgecombe, Rick P
2024-05-16  0:12               ` Huang, Kai
2024-05-16  0:19                 ` Edgecombe, Rick P
2024-05-16  0:25                   ` Huang, Kai
2024-05-16  0:35                     ` Edgecombe, Rick P
2024-05-16  1:04                       ` Huang, Kai
2024-05-16  1:20                         ` Edgecombe, Rick P
2024-05-16  1:40                           ` Huang, Kai
2024-05-16  5:52                             ` Yan Zhao
2024-05-18  0:25                               ` Edgecombe, Rick P
2024-05-16 23:08                           ` Edgecombe, Rick P
2024-05-17  0:37                             ` Huang, Kai
2024-05-17  1:51                               ` Edgecombe, Rick P
2024-05-17  4:26                                 ` Huang, Kai
2024-05-17 21:12                                   ` Edgecombe, Rick P
2024-05-15  0:59 ` [PATCH 05/16] KVM: Add member to struct kvm_gfn_range for target alias Rick Edgecombe
2024-05-17 20:58   ` Edgecombe, Rick P
2024-05-15  0:59 ` [PATCH 06/16] KVM: x86/mmu: Add a new is_private member for union kvm_mmu_page_role Rick Edgecombe
2024-05-15  0:59 ` [PATCH 07/16] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page Rick Edgecombe
2024-05-15  0:59 ` [PATCH 08/16] KVM: x86/mmu: Bug the VM if kvm_zap_gfn_range() is called for TDX Rick Edgecombe
2024-05-15 13:27   ` Huang, Kai
2024-05-15 15:22     ` Edgecombe, Rick P
2024-05-15 23:14       ` Huang, Kai
2024-05-15 15:34   ` Sean Christopherson
2024-05-15 15:49     ` Edgecombe, Rick P
2024-05-15 15:56       ` Edgecombe, Rick P
2024-05-15 16:02         ` Sean Christopherson
2024-05-15 16:12           ` Edgecombe, Rick P
2024-05-15 18:09             ` Sean Christopherson
2024-05-15 18:22               ` Edgecombe, Rick P
2024-05-15 19:48                 ` Sean Christopherson
2024-05-15 20:32                   ` Edgecombe, Rick P
2024-05-15 23:26                     ` Sean Christopherson
2024-05-15 16:22     ` Isaku Yamahata
2024-05-15 22:17       ` Huang, Kai
2024-05-15 23:14         ` Edgecombe, Rick P
2024-05-15 23:38           ` Huang, Kai
2024-05-16  0:13             ` Edgecombe, Rick P
2024-05-16  0:27               ` Isaku Yamahata
2024-05-16  1:11               ` Huang, Kai
2024-05-16  0:15         ` Isaku Yamahata
2024-05-16  0:52           ` Edgecombe, Rick P
2024-05-16  1:21           ` Huang, Kai
2024-05-16 17:27             ` Isaku Yamahata
2024-05-16 21:46   ` Edgecombe, Rick P
2024-05-16 22:23     ` Huang, Kai
2024-05-16 22:38       ` Edgecombe, Rick P
2024-05-16 23:16         ` Huang, Kai
2024-05-15  0:59 ` [PATCH 09/16] KVM: x86/mmu: Make kvm_tdp_mmu_alloc_root() return void Rick Edgecombe
2024-05-15  0:59 ` [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU Rick Edgecombe
2024-05-15 17:35   ` Isaku Yamahata
2024-05-15 18:00     ` Edgecombe, Rick P
2024-05-16  0:52   ` Huang, Kai
2024-05-16  1:27     ` Edgecombe, Rick P
2024-05-16  2:07       ` Huang, Kai
2024-05-16  2:57         ` Edgecombe, Rick P
2024-05-16 13:04           ` Huang, Kai
2024-05-16 16:36             ` Edgecombe, Rick P
2024-05-16 19:42               ` Isaku Yamahata
2024-05-17  2:35                 ` Edgecombe, Rick P
2024-05-17  9:03                   ` Isaku Yamahata
2024-05-17 18:16                     ` Edgecombe, Rick P
2024-05-17 19:16                       ` Isaku Yamahata
2024-05-20 23:32                         ` Isaku Yamahata
2024-05-21 15:07                           ` Edgecombe, Rick P
2024-05-21 16:15                             ` Isaku Yamahata
2024-05-22 22:34                               ` Isaku Yamahata
2024-05-22 23:09                                 ` Edgecombe, Rick P
2024-05-22 23:47                                   ` Isaku Yamahata
2024-05-22 23:50                                     ` Edgecombe, Rick P
2024-05-23  0:01                                       ` Isaku Yamahata
2024-05-23 18:27                                         ` Edgecombe, Rick P
2024-05-24  7:55                                           ` Isaku Yamahata
2024-05-28 16:27                                             ` Edgecombe, Rick P
2024-05-28 17:47                                               ` Paolo Bonzini
2024-05-29  2:13                                                 ` Edgecombe, Rick P
2024-05-29  7:25                                                   ` Paolo Bonzini
2024-05-31 14:11                                                     ` Isaku Yamahata
2024-05-28 17:43                           ` Paolo Bonzini
2024-05-28 17:16                         ` Paolo Bonzini
2024-05-28 18:29                           ` Edgecombe, Rick P
2024-05-29  1:06                             ` Isaku Yamahata
2024-05-29  1:51                               ` Edgecombe, Rick P
2024-05-17  2:36                 ` Huang, Kai
2024-05-17  8:14                   ` Isaku Yamahata
2024-05-18  5:42                     ` Huang, Kai
2024-05-18 15:41                       ` Edgecombe, Rick P
2024-05-20 10:38                         ` Huang, Kai
2024-05-20 18:58                           ` Isaku Yamahata
2024-05-20 19:02                             ` Edgecombe, Rick P
2024-05-20 23:39                               ` Edgecombe, Rick P
2024-05-21  2:25                                 ` Isaku Yamahata
2024-05-21  2:57                                   ` Edgecombe, Rick P
2024-05-20 22:34                             ` Huang, Kai
2024-05-16  1:48     ` Isaku Yamahata
2024-05-16  2:00       ` Edgecombe, Rick P
2024-05-16  2:10         ` Huang, Kai
2024-05-28 16:59           ` Paolo Bonzini
2024-05-16 17:10         ` Isaku Yamahata
2024-05-23 23:14   ` Edgecombe, Rick P
2024-05-24  8:20     ` Isaku Yamahata
2024-05-28 21:48       ` Edgecombe, Rick P
2024-05-29  1:16         ` Isaku Yamahata
2024-05-29  1:50           ` Edgecombe, Rick P
2024-05-29  2:20             ` Isaku Yamahata
2024-05-29  2:29               ` Edgecombe, Rick P
2024-05-28 20:54   ` Edgecombe, Rick P
2024-05-29  1:24     ` Isaku Yamahata
2024-05-28 23:06   ` Edgecombe, Rick P
2024-05-29  1:57     ` Isaku Yamahata [this message]
2024-05-29  2:13       ` Edgecombe, Rick P
2024-05-29 16:55         ` Isaku Yamahata
2024-05-15  0:59 ` [PATCH 11/16] KVM: x86/tdp_mmu: Extract root invalid check from tdx_mmu_next_root() Rick Edgecombe
2024-05-15  0:59 ` [PATCH 12/16] KVM: x86/tdp_mmu: Introduce KVM MMU root types to specify page table type Rick Edgecombe
2024-05-15  0:59 ` [PATCH 13/16] KVM: x86/tdp_mmu: Introduce shared, private KVM MMU root types Rick Edgecombe
2024-05-15  0:59 ` [PATCH 14/16] KVM: x86/tdp_mmu: Take root types for kvm_tdp_mmu_invalidate_all_roots() Rick Edgecombe
2024-05-15  0:59 ` [PATCH 15/16] KVM: x86/tdp_mmu: Make mmu notifier callbacks to check kvm_process Rick Edgecombe
2024-05-15  0:59 ` [PATCH 16/16] KVM: x86/tdp_mmu: Invalidate correct roots Rick Edgecombe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240529015749.GF386318@ls.amr.corp.intel.com \
    --to=isaku.yamahata@intel.com \
    --cc=dmatlack@google.com \
    --cc=erdemaktas@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=sagis@google.com \
    --cc=seanjc@google.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox