[PATCH 3/3] KVM: ARM: Transparent huge pages and hugetlbfs support

From: steve.capper@linaro.org (Steve Capper)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 3/3] KVM: ARM: Transparent huge pages and hugetlbfs support
Date: Tue, 24 Sep 2013 16:41:19 +0100
Message-ID: <20130924154118.GA18779@linaro.org>
In-Reply-To: <524013BB.4090303@arm.com>

On Mon, Sep 23, 2013 at 11:11:07AM +0100, Marc Zyngier wrote:
> Hi Christoffer,
> 
> Finally taking some time to review this patch. Sorry for the delay...
> 
> On 09/08/13 05:07, Christoffer Dall wrote:
> > From: Christoffer Dall <cdall@cs.columbia.edu>
> > 

[ snip ]

> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > -                         gfn_t gfn, struct kvm_memory_slot *memslot,
> > +                         struct kvm_memory_slot *memslot,
> >                           unsigned long fault_status)
> >  {
> > -       pte_t new_pte;
> > -       pfn_t pfn;
> >         int ret;
> > -       bool write_fault, writable;
> > +       bool write_fault, writable, hugetlb = false, force_pte = false;
> >         unsigned long mmu_seq;
> > +       gfn_t gfn = fault_ipa >> PAGE_SHIFT;
> > +       unsigned long hva = gfn_to_hva(vcpu->kvm, gfn);
> > +       struct kvm *kvm = vcpu->kvm;
> >         struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> > +       struct vm_area_struct *vma;
> > +       pfn_t pfn;
> > +       unsigned long psize;
> > 
> >         write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
> >         if (fault_status == FSC_PERM && !write_fault) {
> > @@ -525,6 +638,27 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >                 return -EFAULT;
> >         }
> > 
> > +       /* Let's check if we will get back a huge page */
> > +       down_read(&current->mm->mmap_sem);
> > +       vma = find_vma_intersection(current->mm, hva, hva + 1);
> > +       if (is_vm_hugetlb_page(vma)) {
> > +               hugetlb = true;
> > +               hva &= PMD_MASK;
> > +               gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
> > +               psize = PMD_SIZE;
> > +       } else {
> > +               psize = PAGE_SIZE;
> > +               if (vma->vm_start & ~PMD_MASK)
> > +                       force_pte = true;
> > +       }
> > +       up_read(&current->mm->mmap_sem);
> > +
> > +       pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
> > +       if (is_error_pfn(pfn))
> > +               return -EFAULT;
> 
> How does this work with respect to the comment that talks about reading
> mmu_seq before calling gfn_to_pfn_prot? Either the comment is wrong, or
> we read this too early.
> 
> > +       coherent_icache_guest_page(kvm, hva, psize);
> > +
> >         /* We need minimum second+third level pages */
> >         ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
> >         if (ret)
> > @@ -542,26 +676,34 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >          */
> >         smp_rmb();
> > 
> > -       pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write_fault, &writable);
> > -       if (is_error_pfn(pfn))
> > -               return -EFAULT;
> > -
> > -       new_pte = pfn_pte(pfn, PAGE_S2);
> > -       coherent_icache_guest_page(vcpu->kvm, gfn);
> > -
> > -       spin_lock(&vcpu->kvm->mmu_lock);
> > -       if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
> > +       spin_lock(&kvm->mmu_lock);
> > +       if (mmu_notifier_retry(kvm, mmu_seq))
> >                 goto out_unlock;
> > -       if (writable) {
> > -               kvm_set_s2pte_writable(&new_pte);
> > -               kvm_set_pfn_dirty(pfn);
> > +       if (!hugetlb && !force_pte)
> > +               hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> 
> How do we ensure that there is no race between this pte being promoted
> to a pmd and another page allocation in the same pmd somewhere else in
> the system? We only hold the kvm lock here, so there must be some extra
> implicit guarantee somewhere...
> 

This isn't a promotion to a huge page, it is a mechanism to ensure that
pfn corresponds with the head page of a THP as that's where refcount
information is stored. I think this is safe.
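
To illustrate the adjustment described above, here is a minimal userspace
sketch of the arithmetic only. It is not the patch's implementation: the
function name thp_adjust_model, the is_thp flag (standing in for the kernel's
check that the page belongs to a THP) and the 4 KiB page / 2 MiB block
geometry are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 4 KiB pages, 2 MiB PMD-level blocks (512 pages each). */
#define PAGE_SHIFT   12
#define PMD_SHIFT    21
#define PTRS_PER_PMD (1ULL << (PMD_SHIFT - PAGE_SHIFT))
#define PMD_MASK     (~((1ULL << PMD_SHIFT) - 1))

/*
 * Hypothetical model of the adjustment. `is_thp` replaces the kernel's test
 * on the backing page. Returns true when the caller should map a whole
 * PMD-sized block, after rounding both the host pfn and the guest IPA down
 * to the start of that block (the head page of the THP).
 * Reference counting on the head page is deliberately not modelled here.
 */
static bool thp_adjust_model(bool is_thp, uint64_t *pfnp, uint64_t *ipap)
{
	uint64_t mask = PTRS_PER_PMD - 1;
	uint64_t gfn = *ipap >> PAGE_SHIFT;

	if (!is_thp)
		return false;

	/* Guest and host frames must sit at the same offset within the block. */
	assert((gfn & mask) == (*pfnp & mask));

	*ipap &= PMD_MASK;   /* IPA of the start of the 2 MiB block */
	*pfnp &= ~mask;      /* pfn of the head page */
	return true;
}
```

Nothing is promoted in this model: the backing page is already part of a huge
page, and the code only moves the pfn and IPA to the head of it.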

I'm still getting my brain working with kvm, so sorry, I don't have any
other feedback yet :-).

Cheers,
-- 
Steve
