Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: "Huang, Kai" <kai.huang@intel.com>
To: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"seanjc@google.com" <seanjc@google.com>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "sagis@google.com" <sagis@google.com>,
	"Aktas, Erdem" <erdemaktas@google.com>,
	"dmatlack@google.com" <dmatlack@google.com>,
	"Zhao, Yan Y" <yan.y.zhao@intel.com>,
	"isaku.yamahata@gmail.com" <isaku.yamahata@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU
Date: Thu, 16 May 2024 13:04:26 +0000	[thread overview]
Message-ID: <021e8ee11c87bfac90e886e78795d825ddab32ee.camel@intel.com> (raw)
In-Reply-To: <588d801796415df61136ce457156d9ff3f2a2661.camel@intel.com>

On Thu, 2024-05-16 at 02:57 +0000, Edgecombe, Rick P wrote:
> On Thu, 2024-05-16 at 14:07 +1200, Huang, Kai wrote:
> > 
> > I meant it seems we should just strip shared bit away from the GPA in 
> > handle_ept_violation() and pass it as 'cr2_or_gpa' here, so fault->addr 
> > won't have the shared bit.
> > 
> > Do you see any problem of doing so?
> 
> We would need to add it back in "raw_gfn" in kvm_tdp_mmu_map().

I don't see any big difference?

Now in this patch the raw_gfn is directly from fault->addr:

	raw_gfn = gpa_to_gfn(fault->addr);

	tdp_mmu_for_each_pte(iter, mmu, is_private, raw_gfn, raw_gfn+1) {
		...
  	}

But there's nothing wrong to get the raw_gfn from the fault->gfn.  In
fact, the zapping code just does this:

        /*
         * start and end doesn't have GFN shared bit.  This function zaps
         * a region including alias.  Adjust shared bit of [start, end) if
         * the root is shared.
         */
        start = kvm_gfn_for_root(kvm, root, start);
        end = kvm_gfn_for_root(kvm, root, end);

So there's nothing wrong to just do the same thing in both functions.

The point is fault->gfn has shared bit stripped away at the beginning, and
AFAICT there's no useful reason to keep shared bit in fault->addr.  The
entire @fault is a temporary structure on the stack during fault handling
anyway.

> 
> In the past I did something like the private/shared split, but for execute-only
> aliases and a few other wacky things.
> 
> It also had a synthetic error code. For awhile I had it so GPA had alias bits
> (i.e. shared bit) not stripped, like TDX has today, but there was always some
> code that got surprised by the extra bits in the GPA. I want to say it was the
> emulation of PAE or something like that (execute-only had to support all the
> normal VM stuff).
> 
> So in the later revisions I actually had a helper to take a GFN and PF error
> code and put the alias bits back in. Then alias bits got stripped immediately
> and at the same time the synthetic error code was set. Something similar could
> probably work to recreate "raw_gfn" from a fault.
> 
> IIRC (and I could easily be wrong), when I discussed this with Sean he said TDX
> didn't need to support whatever issue I was working around, and the original
> solution was slightly better for TDX.
> 
> In any case, I doubt Sean is wedded to a remark he may or may not have made long
> ago. But looking at the TDX code today, it doesn't feel that confusing to me.

[...]

> 
> So I'm not against adding the shared bits back in later, but it doesn't seem
> that big of a gain to me. It also has kind of been tried before a long time ago.

As mentioned above, we are already doing that anyway in the zapping code
path.

> 
> > 
> > > 
> > > > 
> > > > >           }
> > > > >     
> > > > > diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> > > > > index fae559559a80..8a64bcef9deb 100644
> > > > > --- a/arch/x86/kvm/mmu/tdp_iter.h
> > > > > +++ b/arch/x86/kvm/mmu/tdp_iter.h
> > > > > @@ -91,7 +91,7 @@ struct tdp_iter {
> > > > >           tdp_ptep_t pt_path[PT64_ROOT_MAX_LEVEL];
> > > > >           /* A pointer to the current SPTE */
> > > > >           tdp_ptep_t sptep;
> > > > > -       /* The lowest GFN mapped by the current SPTE */
> > > > > +       /* The lowest GFN (shared bits included) mapped by the current
> > > > > SPTE
> > > > > */
> > > > >           gfn_t gfn;
> > > > 
> > > > IMHO we need more clarification of this design.
> > > 

Btw, another thing after second thought:

So regardless of how to implement in KVM, IIUC TDX hardware requires below
two operations to have the shared bit set in the GPA for shared mapping:

  1) Setup/teardown shared page table mapping
  2) GPA range in TLB flush for shared mapping

(I kinda forgot the TLB flush part so better double check, but I guess I
am >90% sure about it.)

So in the fault handler path, we actually need to be careful of the GFN
passed to relevant functions, because for other operations like finding
memslot based on GFN, we must pass the GFN w/o shared bit.

Now the tricky thing is due to 1) the 'tdp_iter->gfn' is set to the
"raw_gfn" with shared bit in order to find the correct SPTE in the fault
handler path.  And as a result, the current implementation sets the sp-
>gfn to the "raw_gfn" too.

	sp = tdp_mmu_alloc_sp(vcpu);
	...
        tdp_mmu_init_child_sp(sp, &iter);

The problem is in current KVM implementation, iter->gfn and sp->gfn are
used in both cases: 1) page table walk and TLB flush; 2) others like
memslot lookup.

So the result is we need to be very careful whether we should strip the
shared bit away when using them.

E.g., Looking at the current dev branch, if I am reading code correctly,
it seems we have bug around here:

static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
                                          struct kvm_page_fault *fault,
                                          struct tdp_iter *iter)
{                   
	...

        if (unlikely(!fault->slot))
                new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
        else
                wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, 
				iter->gfn, fault->pfn, iter->old_spte, 
				fault->prefetch, true, 
				fault->map_writable, &new_spte);
	...
}

See @iter->gfn (which is "raw_gfn" AFAICT) is passed to both
make_mmio_spte() and make_spte().  But AFAICT both the two functions treat
GFN as the actual GFN.  E.g., 

bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
               const struct kvm_memory_slot *slot,
               unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
               u64 old_spte, bool prefetch, bool can_unsync,
               bool host_writable, u64 *new_spte)
{
	...

	if (shadow_memtype_mask)
                spte |= static_call(kvm_x86_get_mt_mask)(vcpu, gfn,
				kvm_is_mmio_pfn(pfn));
	...

	if ((spte & PT_WRITABLE_MASK) &&
			kvm_slot_dirty_track_enabled(slot)) {
                /* Enforced by kvm_mmu_hugepage_adjust. */
                WARN_ON_ONCE(level > PG_LEVEL_4K);
                mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
        }
	...
}

AFAICT both @gfn in kvm_x86_get_mt_mask() and mark_page_dirty_in_slot()
needs the actual GFN.  They may not be a concern for TDX now, but I think
it's logically wrong to use the raw GFN.

This kinda issue is hard to find in code writing and review.  I am
thinking whether we should have a more clear way to avoid such issues.

The idea is to add a new 'raw_gfn' to @tdp_iter and 'kvm_mmu_page'.  When
we walk the GFN range using iter, we always use the "actual GFN" w/o
shared bit.  Like:

	tdp_mmu_for_each_pte(kvm, iter, mmu, is_private, gfn, gfn + 1) {
		...
	}

But in the tdp_iter_*() functions, we internally calculate the "raw_gfn"
using the "actual GFN" + the 'kvm', and we use the "raw_gfn" to walk the
page table to find the correct SPTE.

So the end code will be: 1) explicitly use iter->raw_gfn for page table
walk and do TLB flush; 2) For all others like memslot lookup, use iter-
>gfn.

(sp->gfn and sp->raw_gfn can be used similarly, e.g., sp->raw_gfn is used
for TLB flush, and for others like memslot lookup we use sp->gfn.)

I think in this way the code will be more clear?

next prev parent reply	other threads:[~2024-05-16 13:04 UTC|newest]

Thread overview: 152+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-15  0:59 [PATCH 00/16] TDX MMU prep series part 1 Rick Edgecombe
2024-05-15  0:59 ` [PATCH 01/16] KVM: x86: Add a VM type define for TDX Rick Edgecombe
2024-05-15  0:59 ` [PATCH 02/16] KVM: x86/mmu: Introduce a slot flag to zap only slot leafs on slot deletion Rick Edgecombe
2024-05-15 13:24   ` Huang, Kai
2024-05-15 19:09     ` Sean Christopherson
2024-05-15 19:23       ` Edgecombe, Rick P
2024-05-15 20:05         ` Sean Christopherson
2024-05-15 20:53           ` Edgecombe, Rick P
2024-05-15 22:47             ` Sean Christopherson
2024-05-15 23:06               ` Huang, Kai
2024-05-15 23:20                 ` Sean Christopherson
2024-05-15 23:36                   ` Huang, Kai
2024-05-16  1:12                   ` Xiaoyao Li
2024-05-17 15:30                   ` Paolo Bonzini
2024-05-22  1:29                     ` Yan Zhao
2024-05-22  2:31                       ` Sean Christopherson
2024-05-22  6:48                         ` Yan Zhao
2024-05-22 15:45                           ` Paolo Bonzini
2024-05-24  1:50                             ` Yan Zhao
2024-05-15 23:56               ` Edgecombe, Rick P
2024-05-16  2:21                 ` Edgecombe, Rick P
2024-05-16  3:56                 ` Yan Zhao
2024-05-17 15:27           ` Paolo Bonzini
2024-05-17 15:25       ` Paolo Bonzini
2024-05-15 18:03   ` Isaku Yamahata
2024-05-15  0:59 ` [PATCH 03/16] KVM: x86/tdp_mmu: Add a helper function to walk down the TDP MMU Rick Edgecombe
2024-05-17  7:44   ` Chao Gao
2024-05-17  9:08     ` Isaku Yamahata
2024-05-15  0:59 ` [PATCH 04/16] KVM: x86/mmu: Add address conversion functions for TDX shared bit of GPA Rick Edgecombe
2024-05-15 22:34   ` Huang, Kai
2024-05-15 23:21     ` Edgecombe, Rick P
2024-05-15 23:31       ` Huang, Kai
2024-05-15 23:38         ` Edgecombe, Rick P
2024-05-15 23:44           ` Huang, Kai
2024-05-15 23:59             ` Edgecombe, Rick P
2024-05-16  0:12               ` Huang, Kai
2024-05-16  0:19                 ` Edgecombe, Rick P
2024-05-16  0:25                   ` Huang, Kai
2024-05-16  0:35                     ` Edgecombe, Rick P
2024-05-16  1:04                       ` Huang, Kai
2024-05-16  1:20                         ` Edgecombe, Rick P
2024-05-16  1:40                           ` Huang, Kai
2024-05-16  5:52                             ` Yan Zhao
2024-05-18  0:25                               ` Edgecombe, Rick P
2024-05-16 23:08                           ` Edgecombe, Rick P
2024-05-17  0:37                             ` Huang, Kai
2024-05-17  1:51                               ` Edgecombe, Rick P
2024-05-17  4:26                                 ` Huang, Kai
2024-05-17 21:12                                   ` Edgecombe, Rick P
2024-05-15  0:59 ` [PATCH 05/16] KVM: Add member to struct kvm_gfn_range for target alias Rick Edgecombe
2024-05-17 20:58   ` Edgecombe, Rick P
2024-05-15  0:59 ` [PATCH 06/16] KVM: x86/mmu: Add a new is_private member for union kvm_mmu_page_role Rick Edgecombe
2024-05-15  0:59 ` [PATCH 07/16] KVM: x86/mmu: Add a private pointer to struct kvm_mmu_page Rick Edgecombe
2024-05-15  0:59 ` [PATCH 08/16] KVM: x86/mmu: Bug the VM if kvm_zap_gfn_range() is called for TDX Rick Edgecombe
2024-05-15 13:27   ` Huang, Kai
2024-05-15 15:22     ` Edgecombe, Rick P
2024-05-15 23:14       ` Huang, Kai
2024-05-15 15:34   ` Sean Christopherson
2024-05-15 15:49     ` Edgecombe, Rick P
2024-05-15 15:56       ` Edgecombe, Rick P
2024-05-15 16:02         ` Sean Christopherson
2024-05-15 16:12           ` Edgecombe, Rick P
2024-05-15 18:09             ` Sean Christopherson
2024-05-15 18:22               ` Edgecombe, Rick P
2024-05-15 19:48                 ` Sean Christopherson
2024-05-15 20:32                   ` Edgecombe, Rick P
2024-05-15 23:26                     ` Sean Christopherson
2024-05-15 16:22     ` Isaku Yamahata
2024-05-15 22:17       ` Huang, Kai
2024-05-15 23:14         ` Edgecombe, Rick P
2024-05-15 23:38           ` Huang, Kai
2024-05-16  0:13             ` Edgecombe, Rick P
2024-05-16  0:27               ` Isaku Yamahata
2024-05-16  1:11               ` Huang, Kai
2024-05-16  0:15         ` Isaku Yamahata
2024-05-16  0:52           ` Edgecombe, Rick P
2024-05-16  1:21           ` Huang, Kai
2024-05-16 17:27             ` Isaku Yamahata
2024-05-16 21:46   ` Edgecombe, Rick P
2024-05-16 22:23     ` Huang, Kai
2024-05-16 22:38       ` Edgecombe, Rick P
2024-05-16 23:16         ` Huang, Kai
2024-05-15  0:59 ` [PATCH 09/16] KVM: x86/mmu: Make kvm_tdp_mmu_alloc_root() return void Rick Edgecombe
2024-05-15  0:59 ` [PATCH 10/16] KVM: x86/tdp_mmu: Support TDX private mapping for TDP MMU Rick Edgecombe
2024-05-15 17:35   ` Isaku Yamahata
2024-05-15 18:00     ` Edgecombe, Rick P
2024-05-16  0:52   ` Huang, Kai
2024-05-16  1:27     ` Edgecombe, Rick P
2024-05-16  2:07       ` Huang, Kai
2024-05-16  2:57         ` Edgecombe, Rick P
2024-05-16 13:04           ` Huang, Kai [this message]
2024-05-16 16:36             ` Edgecombe, Rick P
2024-05-16 19:42               ` Isaku Yamahata
2024-05-17  2:35                 ` Edgecombe, Rick P
2024-05-17  9:03                   ` Isaku Yamahata
2024-05-17 18:16                     ` Edgecombe, Rick P
2024-05-17 19:16                       ` Isaku Yamahata
2024-05-20 23:32                         ` Isaku Yamahata
2024-05-21 15:07                           ` Edgecombe, Rick P
2024-05-21 16:15                             ` Isaku Yamahata
2024-05-22 22:34                               ` Isaku Yamahata
2024-05-22 23:09                                 ` Edgecombe, Rick P
2024-05-22 23:47                                   ` Isaku Yamahata
2024-05-22 23:50                                     ` Edgecombe, Rick P
2024-05-23  0:01                                       ` Isaku Yamahata
2024-05-23 18:27                                         ` Edgecombe, Rick P
2024-05-24  7:55                                           ` Isaku Yamahata
2024-05-28 16:27                                             ` Edgecombe, Rick P
2024-05-28 17:47                                               ` Paolo Bonzini
2024-05-29  2:13                                                 ` Edgecombe, Rick P
2024-05-29  7:25                                                   ` Paolo Bonzini
2024-05-31 14:11                                                     ` Isaku Yamahata
2024-05-28 17:43                           ` Paolo Bonzini
2024-05-28 17:16                         ` Paolo Bonzini
2024-05-28 18:29                           ` Edgecombe, Rick P
2024-05-29  1:06                             ` Isaku Yamahata
2024-05-29  1:51                               ` Edgecombe, Rick P
2024-05-17  2:36                 ` Huang, Kai
2024-05-17  8:14                   ` Isaku Yamahata
2024-05-18  5:42                     ` Huang, Kai
2024-05-18 15:41                       ` Edgecombe, Rick P
2024-05-20 10:38                         ` Huang, Kai
2024-05-20 18:58                           ` Isaku Yamahata
2024-05-20 19:02                             ` Edgecombe, Rick P
2024-05-20 23:39                               ` Edgecombe, Rick P
2024-05-21  2:25                                 ` Isaku Yamahata
2024-05-21  2:57                                   ` Edgecombe, Rick P
2024-05-20 22:34                             ` Huang, Kai
2024-05-16  1:48     ` Isaku Yamahata
2024-05-16  2:00       ` Edgecombe, Rick P
2024-05-16  2:10         ` Huang, Kai
2024-05-28 16:59           ` Paolo Bonzini
2024-05-16 17:10         ` Isaku Yamahata
2024-05-23 23:14   ` Edgecombe, Rick P
2024-05-24  8:20     ` Isaku Yamahata
2024-05-28 21:48       ` Edgecombe, Rick P
2024-05-29  1:16         ` Isaku Yamahata
2024-05-29  1:50           ` Edgecombe, Rick P
2024-05-29  2:20             ` Isaku Yamahata
2024-05-29  2:29               ` Edgecombe, Rick P
2024-05-28 20:54   ` Edgecombe, Rick P
2024-05-29  1:24     ` Isaku Yamahata
2024-05-28 23:06   ` Edgecombe, Rick P
2024-05-29  1:57     ` Isaku Yamahata
2024-05-29  2:13       ` Edgecombe, Rick P
2024-05-29 16:55         ` Isaku Yamahata
2024-05-15  0:59 ` [PATCH 11/16] KVM: x86/tdp_mmu: Extract root invalid check from tdx_mmu_next_root() Rick Edgecombe
2024-05-15  0:59 ` [PATCH 12/16] KVM: x86/tdp_mmu: Introduce KVM MMU root types to specify page table type Rick Edgecombe
2024-05-15  0:59 ` [PATCH 13/16] KVM: x86/tdp_mmu: Introduce shared, private KVM MMU root types Rick Edgecombe
2024-05-15  0:59 ` [PATCH 14/16] KVM: x86/tdp_mmu: Take root types for kvm_tdp_mmu_invalidate_all_roots() Rick Edgecombe
2024-05-15  0:59 ` [PATCH 15/16] KVM: x86/tdp_mmu: Make mmu notifier callbacks to check kvm_process Rick Edgecombe
2024-05-15  0:59 ` [PATCH 16/16] KVM: x86/tdp_mmu: Invalidate correct roots Rick Edgecombe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=021e8ee11c87bfac90e886e78795d825ddab32ee.camel@intel.com \
    --to=kai.huang@intel.com \
    --cc=dmatlack@google.com \
    --cc=erdemaktas@google.com \
    --cc=isaku.yamahata@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rick.p.edgecombe@intel.com \
    --cc=sagis@google.com \
    --cc=seanjc@google.com \
    --cc=yan.y.zhao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox