All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wei Huang <wei.huang2@amd.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	pbonzini@redhat.com, vkuznets@redhat.com, wanpengli@tencent.com,
	jmattson@google.com, joro@8bytes.org, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com
Subject: Re: [PATCH v1 2/3] KVM: x86: Handle the case of 5-level shadow page table
Date: Sun, 8 Aug 2021 12:49:27 -0500	[thread overview]
Message-ID: <60fe6735-9e0b-cd35-0660-a2bcafef2191@amd.com> (raw)
In-Reply-To: <YQ14RmuYxlAydmOu@google.com>



On 8/6/21 12:58 PM, Sean Christopherson wrote:
> On Thu, Aug 05, 2021, Wei Huang wrote:
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 44e4561e41f5..b162c3e530aa 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -3428,7 +3428,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>>   	 * the shadow page table may be a PAE or a long mode page table.
>>   	 */
>>   	pm_mask = PT_PRESENT_MASK | shadow_me_mask;
>> -	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL) {
>> +	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
>>   		pm_mask |= PT_ACCESSED_MASK | PT_WRITABLE_MASK | PT_USER_MASK;
>>   
>>   		if (WARN_ON_ONCE(!mmu->pml4_root)) {
>> @@ -3454,11 +3454,17 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>>   				      PT32_ROOT_LEVEL, false);
>>   		mmu->pae_root[i] = root | pm_mask;
>>   	}
>> +	mmu->root_hpa = __pa(mmu->pae_root);
>>   
>> -	if (mmu->shadow_root_level == PT64_ROOT_4LEVEL)
>> +	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
>> +		mmu->pml4_root[0] = mmu->root_hpa | pm_mask;
>>   		mmu->root_hpa = __pa(mmu->pml4_root);
>> -	else
>> -		mmu->root_hpa = __pa(mmu->pae_root);
>> +	}
>> +
>> +	if (mmu->shadow_root_level == PT64_ROOT_5LEVEL) {
>> +		mmu->pml5_root[0] = mmu->root_hpa | pm_mask;
>> +		mmu->root_hpa = __pa(mmu->pml5_root);
>> +	}
> 
> Ouch, the root_hpa chaining is subtle.  That's my fault :-)  I think it would be
> better to explicitly chain pae->pml4->pml5?  E.g.
> 
> 	if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
> 		mmu->pml4_root[0] = __pa(mmu->pae_root) | pm_mask;
> 
> 		if (mmu->shadow_root_level == PT64_ROOT_5LEVEL) {
> 			mmu->pml5_root[0] = __pa(mmu->pml4_root) | pm_mask;
> 			mmu->root_hpa = __pa(mmu->pml5_root);
> 		} else {
> 			mmu->root_hpa = __pa(mmu->pml4_root);
> 		}
> 	} else {
> 		mmu->root_hpa = __pa(mmu->pae_root);
> 	}
> 
> It'd require more churn if we get to 6-level paging, but that's a risk I'm willing
> to take ;-)
> 

Thanks for the review. This part of code is indeed subtle. The chaining 
trick will be easier to understand with a proper explanation. My 
proposal is to keep the original approach, but add more comments to this 
group of code.

       /* 
 

        * Depending on the shadow_root_level, build the root_hpa table 
by 

        * chaining either pml5->pml4->pae or pml4->pae. 
 

        */
       mmu->root_hpa = __pa(mmu->pae_root);
       if (mmu->shadow_root_level >= PT64_ROOT_4LEVEL) {
               mmu->pml4_root[0] = mmu->root_hpa | pm_mask;
               mmu->root_hpa = __pa(mmu->pml4_root);
       }
       if (mmu->shadow_root_level == PT64_ROOT_5LEVEL) {
               mmu->pml5_root[0] = mmu->root_hpa | pm_mask;
               mmu->root_hpa = __pa(mmu->pml5_root);
       }

This code will be easy to extend for 6-level page table (if needed) in 
the future.

>>   
>>   set_root_pgd:
>>   	mmu->root_pgd = root_pgd;
>> @@ -3471,7 +3477,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
>>   static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
>>   {
>>   	struct kvm_mmu *mmu = vcpu->arch.mmu;
>> -	u64 *pml4_root, *pae_root;
>> +	u64 *pml5_root, *pml4_root, *pae_root;
>>   
>>   	/*
>>   	 * When shadowing 32-bit or PAE NPT with 64-bit NPT, the PML4 and PDP
>> @@ -3487,17 +3493,18 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
>>   	 * This mess only works with 4-level paging and needs to be updated to
>>   	 * work with 5-level paging.
>>   	 */
>> -	if (WARN_ON_ONCE(mmu->shadow_root_level != PT64_ROOT_4LEVEL))
>> +	if (WARN_ON_ONCE(mmu->shadow_root_level < PT64_ROOT_4LEVEL)) {
> 
> This is amusingly wrong.  The check above this is:
> 
> 	if (mmu->direct_map || mmu->root_level >= PT64_ROOT_4LEVEL ||
> 	    mmu->shadow_root_level < PT64_ROOT_4LEVEL)  <--------
> 		return 0;
> 
> meaning this is dead code.  It should simply deleted.  If we reaaaaaly wanted to
> future proof the code, we could do:
> 
> 	if (WARN_ON_ONCE(mmu->shadow_root_level > PT64_ROOT_5LEVEL)
> 		return -EIO;
> 
> but at that point we're looking at a completely different architecture, so I don't
> think we need to be that paranoid :-)

You are right that this can be removed.

> 
>>   		return -EIO;
>> +	}
>>   
>> -	if (mmu->pae_root && mmu->pml4_root)
>> +	if (mmu->pae_root && mmu->pml4_root && mmu->pml5_root)
>>   		return 0;
>>   
>>   	/*
>>   	 * The special roots should always be allocated in concert.  Yell and
>>   	 * bail if KVM ends up in a state where only one of the roots is valid.
>>   	 */
>> -	if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->pml4_root))
>> +	if (WARN_ON_ONCE(!tdp_enabled || mmu->pae_root || mmu->pml4_root || mmu->pml5_root))
>>   		return -EIO;
>>   
>>   	/*
>> @@ -3506,18 +3513,30 @@ static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu)
>>   	 */
>>   	pae_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
>>   	if (!pae_root)
>> -		return -ENOMEM;
>> +		goto err_out;
> 
> Branching to the error handling here is silly, it's the first allocation.
> 
>>   
>>   	pml4_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
>> -	if (!pml4_root) {
>> -		free_page((unsigned long)pae_root);
>> -		return -ENOMEM;
>> -	}
>> +	if (!pml4_root)
>> +		goto err_out;
>> +
>> +	pml5_root = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
> 
> This should be guarded by "mmu->shadow_root_level > PT64_ROOT_4LEVEL", there's no
> need to waste a page on PML5 if it can't exist.

Will do

> 
>> +	if (!pml5_root)
>> +		goto err_out;
>>   
>>   	mmu->pae_root = pae_root;
>>   	mmu->pml4_root = pml4_root;
>> +	mmu->pml5_root = pml5_root;
>>   
>>   	return 0;
>> +err_out:
>> +	if (pae_root)
>> +		free_page((unsigned long)pae_root);
>> +	if (pml4_root)
>> +		free_page((unsigned long)pml4_root);
>> +	if (pml5_root)
>> +		free_page((unsigned long)pml5_root);
> 
> This is flawed as failure to allocate pml4_root will consume an uninitialized
> pml5_root.  There's also no need to check for non-NULL values as free_page plays
> nice with NULL pointers.
> 
> If you drop the unnecessary goto for pae_root allocation failure, than this can
> become:
> 
> err_out:
> 	free_page((unsigned long)pml4_root);
> 	free_page((unsigned long)pae_root);
> 
> since pml4_root will be NULL if pml4_root allocation failures.  IMO that's
> unnecessarily clever though, and a more standard:
> 
> err_pml5:
> 	free_page((unsigned long)pml4_root);
> err_pml4:
> 	free_page((unsigned long)pae_root);
> 	return -ENOMEM;
> 
> would be far easier to read/maintain.
> 

I will take the advice for this part of code.

>> +
>> +	return -ENOMEM;
>>   }
>>   
>>   void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
>> @@ -5320,6 +5339,7 @@ static void free_mmu_pages(struct kvm_mmu *mmu)
>>   		set_memory_encrypted((unsigned long)mmu->pae_root, 1);
>>   	free_page((unsigned long)mmu->pae_root);
>>   	free_page((unsigned long)mmu->pml4_root);
>> +	free_page((unsigned long)mmu->pml5_root);
>>   }
>>   
>>   static int __kvm_mmu_create(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu)
>> -- 
>> 2.31.1
>>

  reply	other threads:[~2021-08-08 17:49 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-05 20:55 [PATCH v1 0/3] SVM 5-level page table support Wei Huang
2021-08-05 20:55 ` [PATCH v1 1/3] KVM: x86: Convert TDP level calculation to vendor's specific code Wei Huang
2021-08-05 21:51   ` Sean Christopherson
2021-08-05 22:26     ` Wei Huang
2021-08-08 19:30     ` Wei Huang
2021-08-05 22:35   ` Jim Mattson
2021-08-05 22:44     ` Sean Christopherson
2021-08-05 20:55 ` [PATCH v1 2/3] KVM: x86: Handle the case of 5-level shadow page table Wei Huang
2021-08-06 17:58   ` Sean Christopherson
2021-08-08 17:49     ` Wei Huang [this message]
2021-08-05 20:55 ` [PATCH v1 3/3] KVM: SVM: Add 5-level page table support for SVM Wei Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=60fe6735-9e0b-cd35-0660-a2bcafef2191@amd.com \
    --to=wei.huang2@amd.com \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.