[PATCH] arm64: KVM: fix 2-level page tables unmapping

All of lore.kernel.org
 help / color / mirror / Atom feed

From: christoffer.dall@linaro.org (Christoffer Dall)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH] arm64: KVM: fix 2-level page tables unmapping
Date: Wed, 7 Aug 2013 10:32:31 -0700	[thread overview]
Message-ID: <20130807173231.GS16694@cbox> (raw)
In-Reply-To: <5202229C.5040007@arm.com>

On Wed, Aug 07, 2013 at 11:34:04AM +0100, Marc Zyngier wrote:
> On 06/08/13 21:49, Christoffer Dall wrote:
> > On Tue, Aug 06, 2013 at 01:05:48PM +0100, Marc Zyngier wrote:
> >> When using 64kB pages, we only have two levels of page tables,
> >> meaning that PGD, PUD and PMD are fused. In this case, trying
> >> to refcount PUDs and PMDs independantly is a a complete disaster,
> > 
> > independently
> > 
> >> as they are the same.
> >>
> >> We manage to get it right for the allocation (stage2_set_pte uses
> >> {pmd,pud}_none), but the unmapping path clears both pud and pmd
> >> refcounts, which fails spectacularly with 2-level page tables.
> >>
> >> The fix is to avoid calling clear_pud_entry when both the pmd and
> >> pud pages are empty. For this, and instead of introducing another
> >> pud_empty function, consolidate both pte_empty and pmd_empty into
> >> page_empty (the code is actually identical) and use that to also
> >> test the validity of the pud.
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm/kvm/mmu.c | 22 ++++++++--------------
> >>  1 file changed, 8 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index ca6bea4..7e1d899 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -85,6 +85,12 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
> >>  	return p;
> >>  }
> >>  
> >> +static bool page_empty(void *ptr)
> >> +{
> >> +	struct page *ptr_page = virt_to_page(ptr);
> >> +	return page_count(ptr_page) == 1;
> >> +}
> >> +
> >>  static void clear_pud_entry(struct kvm *kvm, pud_t *pud, phys_addr_t addr)
> >>  {
> >>  	pmd_t *pmd_table = pmd_offset(pud, 0);
> >> @@ -103,12 +109,6 @@ static void clear_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
> >>  	put_page(virt_to_page(pmd));
> >>  }
> >>  
> >> -static bool pmd_empty(pmd_t *pmd)
> >> -{
> >> -	struct page *pmd_page = virt_to_page(pmd);
> >> -	return page_count(pmd_page) == 1;
> >> -}
> >> -
> >>  static void clear_pte_entry(struct kvm *kvm, pte_t *pte, phys_addr_t addr)
> >>  {
> >>  	if (pte_present(*pte)) {
> >> @@ -118,12 +118,6 @@ static void clear_pte_entry(struct kvm *kvm, pte_t *pte, phys_addr_t addr)
> >>  	}
> >>  }
> >>  
> >> -static bool pte_empty(pte_t *pte)
> >> -{
> >> -	struct page *pte_page = virt_to_page(pte);
> >> -	return page_count(pte_page) == 1;
> >> -}
> >> -
> >>  static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
> >>  			unsigned long long start, u64 size)
> >>  {
> >> @@ -153,10 +147,10 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
> >>  		range = PAGE_SIZE;
> >>  
> >>  		/* If we emptied the pte, walk back up the ladder */
> >> -		if (pte_empty(pte)) {
> >> +		if (page_empty(pte)) {
> >>  			clear_pmd_entry(kvm, pmd, addr);
> >>  			range = PMD_SIZE;
> >> -			if (pmd_empty(pmd)) {
> >> +			if (page_empty(pmd) && !page_empty(pud)) {
> >>  				clear_pud_entry(kvm, pud, addr);
> >>  				range = PUD_SIZE;
> >>  			}
> > 
> > looks right, an alternative would be to check in clear_pud_entry if the
> > entry actually had a value, but I don't think it's really clearer.
> > 
> > However, this got me thinking a bit.  What happens if we pass a non-pmd
> > aligned address to unmap_range, and let's assume the size of the range
> > is more than 2MB, won't we be leaking memory by incrementing with
> > PMD_SIZE?  (same argument goes for PUD_SIZE).  See the patch below:
> > 
> > 
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index ca6bea4..80a83ec 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -132,37 +132,37 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
> >  	pmd_t *pmd;
> >  	pte_t *pte;
> >  	unsigned long long addr = start, end = start + size;
> > -	u64 range;
> > +	u64 next;
> >  
> >  	while (addr < end) {
> >  		pgd = pgdp + pgd_index(addr);
> >  		pud = pud_offset(pgd, addr);
> >  		if (pud_none(*pud)) {
> > -			addr += PUD_SIZE;
> > +			addr = pud_addr_end(addr, end);
> >  			continue;
> >  		}
> >  
> >  		pmd = pmd_offset(pud, addr);
> >  		if (pmd_none(*pmd)) {
> > -			addr += PMD_SIZE;
> > +			addr = pmd_addr_end(addr, end);
> >  			continue;
> >  		}
> >  
> >  		pte = pte_offset_kernel(pmd, addr);
> >  		clear_pte_entry(kvm, pte, addr);
> > -		range = PAGE_SIZE;
> > +		next = addr + PAGE_SIZE;
> >  
> >  		/* If we emptied the pte, walk back up the ladder */
> >  		if (pte_empty(pte)) {
> >  			clear_pmd_entry(kvm, pmd, addr);
> > -			range = PMD_SIZE;
> > +			next = pmd_addr_end(addr, end);
> >  			if (pmd_empty(pmd)) {
> >  				clear_pud_entry(kvm, pud, addr);
> > -				range = PUD_SIZE;
> > +				next = pud_addr_end(addr, end);
> >  			}
> >  		}
> >  
> > -		addr += range;
> > +		addr = next;
> >  	}
> >  }
> 
> That looks sensible. Would you prepare a patch on which I could rebase
> the above?
> 
I can just apply both patches as they are will the spell fix to
kvm-arm-fixes.

-Christoffer

WARNING: multiple messages have this Message-ID (diff)

From: Christoffer Dall <christoffer.dall@linaro.org>
To: Marc Zyngier <marc.zyngier@arm.com>
Cc: "linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH] arm64: KVM: fix 2-level page tables unmapping
Date: Wed, 7 Aug 2013 10:32:31 -0700	[thread overview]
Message-ID: <20130807173231.GS16694@cbox> (raw)
In-Reply-To: <5202229C.5040007@arm.com>

On Wed, Aug 07, 2013 at 11:34:04AM +0100, Marc Zyngier wrote:
> On 06/08/13 21:49, Christoffer Dall wrote:
> > On Tue, Aug 06, 2013 at 01:05:48PM +0100, Marc Zyngier wrote:
> >> When using 64kB pages, we only have two levels of page tables,
> >> meaning that PGD, PUD and PMD are fused. In this case, trying
> >> to refcount PUDs and PMDs independantly is a a complete disaster,
> > 
> > independently
> > 
> >> as they are the same.
> >>
> >> We manage to get it right for the allocation (stage2_set_pte uses
> >> {pmd,pud}_none), but the unmapping path clears both pud and pmd
> >> refcounts, which fails spectacularly with 2-level page tables.
> >>
> >> The fix is to avoid calling clear_pud_entry when both the pmd and
> >> pud pages are empty. For this, and instead of introducing another
> >> pud_empty function, consolidate both pte_empty and pmd_empty into
> >> page_empty (the code is actually identical) and use that to also
> >> test the validity of the pud.
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >>  arch/arm/kvm/mmu.c | 22 ++++++++--------------
> >>  1 file changed, 8 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index ca6bea4..7e1d899 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -85,6 +85,12 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
> >>  	return p;
> >>  }
> >>  
> >> +static bool page_empty(void *ptr)
> >> +{
> >> +	struct page *ptr_page = virt_to_page(ptr);
> >> +	return page_count(ptr_page) == 1;
> >> +}
> >> +
> >>  static void clear_pud_entry(struct kvm *kvm, pud_t *pud, phys_addr_t addr)
> >>  {
> >>  	pmd_t *pmd_table = pmd_offset(pud, 0);
> >> @@ -103,12 +109,6 @@ static void clear_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr)
> >>  	put_page(virt_to_page(pmd));
> >>  }
> >>  
> >> -static bool pmd_empty(pmd_t *pmd)
> >> -{
> >> -	struct page *pmd_page = virt_to_page(pmd);
> >> -	return page_count(pmd_page) == 1;
> >> -}
> >> -
> >>  static void clear_pte_entry(struct kvm *kvm, pte_t *pte, phys_addr_t addr)
> >>  {
> >>  	if (pte_present(*pte)) {
> >> @@ -118,12 +118,6 @@ static void clear_pte_entry(struct kvm *kvm, pte_t *pte, phys_addr_t addr)
> >>  	}
> >>  }
> >>  
> >> -static bool pte_empty(pte_t *pte)
> >> -{
> >> -	struct page *pte_page = virt_to_page(pte);
> >> -	return page_count(pte_page) == 1;
> >> -}
> >> -
> >>  static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
> >>  			unsigned long long start, u64 size)
> >>  {
> >> @@ -153,10 +147,10 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
> >>  		range = PAGE_SIZE;
> >>  
> >>  		/* If we emptied the pte, walk back up the ladder */
> >> -		if (pte_empty(pte)) {
> >> +		if (page_empty(pte)) {
> >>  			clear_pmd_entry(kvm, pmd, addr);
> >>  			range = PMD_SIZE;
> >> -			if (pmd_empty(pmd)) {
> >> +			if (page_empty(pmd) && !page_empty(pud)) {
> >>  				clear_pud_entry(kvm, pud, addr);
> >>  				range = PUD_SIZE;
> >>  			}
> > 
> > looks right, an alternative would be to check in clear_pud_entry if the
> > entry actually had a value, but I don't think it's really clearer.
> > 
> > However, this got me thinking a bit.  What happens if we pass a non-pmd
> > aligned address to unmap_range, and let's assume the size of the range
> > is more than 2MB, won't we be leaking memory by incrementing with
> > PMD_SIZE?  (same argument goes for PUD_SIZE).  See the patch below:
> > 
> > 
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index ca6bea4..80a83ec 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -132,37 +132,37 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
> >  	pmd_t *pmd;
> >  	pte_t *pte;
> >  	unsigned long long addr = start, end = start + size;
> > -	u64 range;
> > +	u64 next;
> >  
> >  	while (addr < end) {
> >  		pgd = pgdp + pgd_index(addr);
> >  		pud = pud_offset(pgd, addr);
> >  		if (pud_none(*pud)) {
> > -			addr += PUD_SIZE;
> > +			addr = pud_addr_end(addr, end);
> >  			continue;
> >  		}
> >  
> >  		pmd = pmd_offset(pud, addr);
> >  		if (pmd_none(*pmd)) {
> > -			addr += PMD_SIZE;
> > +			addr = pmd_addr_end(addr, end);
> >  			continue;
> >  		}
> >  
> >  		pte = pte_offset_kernel(pmd, addr);
> >  		clear_pte_entry(kvm, pte, addr);
> > -		range = PAGE_SIZE;
> > +		next = addr + PAGE_SIZE;
> >  
> >  		/* If we emptied the pte, walk back up the ladder */
> >  		if (pte_empty(pte)) {
> >  			clear_pmd_entry(kvm, pmd, addr);
> > -			range = PMD_SIZE;
> > +			next = pmd_addr_end(addr, end);
> >  			if (pmd_empty(pmd)) {
> >  				clear_pud_entry(kvm, pud, addr);
> > -				range = PUD_SIZE;
> > +				next = pud_addr_end(addr, end);
> >  			}
> >  		}
> >  
> > -		addr += range;
> > +		addr = next;
> >  	}
> >  }
> 
> That looks sensible. Would you prepare a patch on which I could rebase
> the above?
> 
I can just apply both patches as they are will the spell fix to
kvm-arm-fixes.

-Christoffer

next prev parent reply	other threads:[~2013-08-07 17:32 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-06 12:05 [PATCH] arm64: KVM: fix 2-level page tables unmapping Marc Zyngier
2013-08-06 12:05 ` Marc Zyngier
2013-08-06 20:49 ` Christoffer Dall
2013-08-06 20:49   ` Christoffer Dall
2013-08-07 10:34   ` Marc Zyngier
2013-08-07 10:34     ` Marc Zyngier
2013-08-07 17:32     ` Christoffer Dall [this message]
2013-08-07 17:32       ` Christoffer Dall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130807173231.GS16694@cbox \
    --to=christoffer.dall@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.