linuxppc-dev.lists.ozlabs.org archive mirror
From: David Gibson <david@gibson.dropbear.id.au>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: kvm-ppc@vger.kernel.org, Paul Mackerras <paulus@ozlabs.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size
Date: Wed, 5 Sep 2018 13:59:52 +1000
Message-ID: <20180905035952.GJ2679@umbus.fritz.box>
In-Reply-To: <20180904081601.32703-1-npiggin@gmail.com>


On Tue, Sep 04, 2018 at 06:16:01PM +1000, Nicholas Piggin wrote:
> THP paths can defer splitting compound pages until after the actual
> remap and TLB flushes that split a huge PMD/PUD. This causes radix
> partition scope page table mappings to get out of sync with the host
> qemu page table mappings.
> 
> This results in random memory corruption in the guest when running
> with THP. The easiest way to reproduce is to use the KVM balloon to
> free up a lot of memory in the guest and then shrink the balloon to
> give the memory back, while some work is being done in the guest.
> 
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Seems to fix the problem on my test case.

Tested-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  arch/powerpc/kvm/book3s_64_mmu_radix.c | 88 ++++++++++----------------
>  1 file changed, 34 insertions(+), 54 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index 0af1c0aea1fe..d8792445d95a 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -525,8 +525,8 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  				   unsigned long ea, unsigned long dsisr)
>  {
>  	struct kvm *kvm = vcpu->kvm;
> -	unsigned long mmu_seq, pte_size;
> -	unsigned long gpa, gfn, hva, pfn;
> +	unsigned long mmu_seq;
> +	unsigned long gpa, gfn, hva;
>  	struct kvm_memory_slot *memslot;
>  	struct page *page = NULL;
>  	long ret;
> @@ -623,9 +623,10 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  	 */
>  	hva = gfn_to_hva_memslot(memslot, gfn);
>  	if (upgrade_p && __get_user_pages_fast(hva, 1, 1, &page) == 1) {
> -		pfn = page_to_pfn(page);
>  		upgrade_write = true;
>  	} else {
> +		unsigned long pfn;
> +
>  		/* Call KVM generic code to do the slow-path check */
>  		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
>  					   writing, upgrade_p);
> @@ -639,63 +640,42 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
>  		}
>  	}
>  
> -	/* See if we can insert a 1GB or 2MB large PTE here */
> -	level = 0;
> -	if (page && PageCompound(page)) {
> -		pte_size = PAGE_SIZE << compound_order(compound_head(page));
> -		if (pte_size >= PUD_SIZE &&
> -		    (gpa & (PUD_SIZE - PAGE_SIZE)) ==
> -		    (hva & (PUD_SIZE - PAGE_SIZE))) {
> -			level = 2;
> -			pfn &= ~((PUD_SIZE >> PAGE_SHIFT) - 1);
> -		} else if (pte_size >= PMD_SIZE &&
> -			   (gpa & (PMD_SIZE - PAGE_SIZE)) ==
> -			   (hva & (PMD_SIZE - PAGE_SIZE))) {
> -			level = 1;
> -			pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1);
> -		}
> -	}
> -
>  	/*
> -	 * Compute the PTE value that we need to insert.
> +	 * Read the PTE from the process' radix tree and use that
> +	 * so we get the shift and attribute bits.
>  	 */
> -	if (page) {
> -		pgflags = _PAGE_READ | _PAGE_EXEC | _PAGE_PRESENT | _PAGE_PTE |
> -			_PAGE_ACCESSED;
> -		if (writing || upgrade_write)
> -			pgflags |= _PAGE_WRITE | _PAGE_DIRTY;
> -		pte = pfn_pte(pfn, __pgprot(pgflags));
> +	local_irq_disable();
> +	ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
> +	pte = *ptep;
> +	local_irq_enable();
> +
> +	/* Get pte level from shift/size */
> +	if (shift == PUD_SHIFT &&
> +	    (gpa & (PUD_SIZE - PAGE_SIZE)) ==
> +	    (hva & (PUD_SIZE - PAGE_SIZE))) {
> +		level = 2;
> +	} else if (shift == PMD_SHIFT &&
> +		   (gpa & (PMD_SIZE - PAGE_SIZE)) ==
> +		   (hva & (PMD_SIZE - PAGE_SIZE))) {
> +		level = 1;
>  	} else {
> -		/*
> -		 * Read the PTE from the process' radix tree and use that
> -		 * so we get the attribute bits.
> -		 */
> -		local_irq_disable();
> -		ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
> -		pte = *ptep;
> -		local_irq_enable();
> -		if (shift == PUD_SHIFT &&
> -		    (gpa & (PUD_SIZE - PAGE_SIZE)) ==
> -		    (hva & (PUD_SIZE - PAGE_SIZE))) {
> -			level = 2;
> -		} else if (shift == PMD_SHIFT &&
> -			   (gpa & (PMD_SIZE - PAGE_SIZE)) ==
> -			   (hva & (PMD_SIZE - PAGE_SIZE))) {
> -			level = 1;
> -		} else if (shift && shift != PAGE_SHIFT) {
> -			/* Adjust PFN */
> -			unsigned long mask = (1ul << shift) - PAGE_SIZE;
> -			pte = __pte(pte_val(pte) | (hva & mask));
> -		}
> -		pte = __pte(pte_val(pte) | _PAGE_EXEC | _PAGE_ACCESSED);
> -		if (writing || upgrade_write) {
> -			if (pte_val(pte) & _PAGE_WRITE)
> -				pte = __pte(pte_val(pte) | _PAGE_DIRTY);
> -		} else {
> -			pte = __pte(pte_val(pte) & ~(_PAGE_WRITE | _PAGE_DIRTY));
> +		level = 0;
> +
> +		/* Can not cope with unknown page shift */
> +		if (shift && shift != PAGE_SHIFT) {
> +			WARN_ON_ONCE(1);
> +			return -EFAULT;
>  		}
>  	}
>  
> +	pte = __pte(pte_val(pte) | _PAGE_EXEC | _PAGE_ACCESSED);
> +	if (writing || upgrade_write) {
> +		if (pte_val(pte) & _PAGE_WRITE)
> +			pte = __pte(pte_val(pte) | _PAGE_DIRTY);
> +	} else {
> +		pte = __pte(pte_val(pte) & ~(_PAGE_WRITE | _PAGE_DIRTY));
> +	}
> +
>  	/* Allocate space in the tree and write the PTE */
>  	ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq);
>  
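
For anyone following the thread, the level selection in the quoted hunk can be sketched in user-space C. This is only a minimal sketch under stated assumptions, not the kernel code: `pick_level` is a hypothetical helper, and the shift values assume a radix host with 64K base pages (2MB PMD, 1GB PUD). It mirrors the patch as posted, so a large mapping whose guest and host offsets do not line up is rejected rather than mapped as a base page.

```c
#include <stdint.h>

/* Assumed values for a radix host with 64K base pages (not from the patch). */
#define PAGE_SHIFT 16
#define PMD_SHIFT  21
#define PUD_SHIFT  30
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)
#define PMD_SIZE   ((uint64_t)1 << PMD_SHIFT)
#define PUD_SIZE   ((uint64_t)1 << PUD_SHIFT)

/*
 * Hypothetical helper modelling the "Get pte level from shift/size" step.
 * 'shift' is the mapping size reported by the host page table walk
 * (0 means a normal base page).
 *
 * Returns 2 for a 1GB mapping, 1 for a 2MB mapping, 0 for a base page,
 * and -1 where the posted patch would WARN and return -EFAULT.
 */
static int pick_level(unsigned int shift, uint64_t gpa, uint64_t hva)
{
	/* A large PTE is usable only if gpa and hva share the same offset
	 * within the large page, compared at base-page granularity. */
	if (shift == PUD_SHIFT &&
	    (gpa & (PUD_SIZE - PAGE_SIZE)) == (hva & (PUD_SIZE - PAGE_SIZE)))
		return 2;

	if (shift == PMD_SHIFT &&
	    (gpa & (PMD_SIZE - PAGE_SIZE)) == (hva & (PMD_SIZE - PAGE_SIZE)))
		return 1;

	/* Any other non-base shift is treated as unknown. */
	if (shift && shift != PAGE_SHIFT)
		return -1;

	return 0;
}
```

The point of the change is that `shift` comes from the host page table itself rather than from `compound_order()` on the struct page, so the level reflects what is actually mapped at the time of the walk.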

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
