Re: [RFC PATCH 5/6] ARM: mm: Transparent huge page support for LPAE systems.

From: Steve Capper <steve.capper@arm.com>
To: Christoffer Dall <c.dall@virtualopensystems.com>
Cc: Steve Capper <Steve.Capper@arm.com>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"aarcange@redhat.com" <aarcange@redhat.com>,
	"bill4carson@gmail.com" <bill4carson@gmail.com>,
	"tawfik@marvell.com" <tawfik@marvell.com>,
	Catalin Marinas <Catalin.Marinas@arm.com>,
	Will Deacon <Will.Deacon@arm.com>,
	"cmetcalf@tilera.com" <cmetcalf@tilera.com>,
	"mhocko@suse.cz" <mhocko@suse.cz>,
	"maen@marvell.com" <maen@marvell.com>,
	"hoffman@marvell.com" <hoffman@marvell.com>,
	"notasas@gmail.com" <notasas@gmail.com>,
	"kirill@shutemov.name" <kirill@shutemov.name>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"shadi@marvell.com" <shadi@marvell.com>
Subject: Re: [RFC PATCH 5/6] ARM: mm: Transparent huge page support for LPAE systems.
Date: Tue, 8 Jan 2013 17:59:08 +0000
Message-ID: <20130108175908.GE18347@e103986-lin>
In-Reply-To: <CANM98qJmN0ZGK48pKVSMQ5D+d0hrydqr0CHQEH5vk=x=NZ=8Mw@mail.gmail.com>

On Fri, Jan 04, 2013 at 05:04:50AM +0000, Christoffer Dall wrote:
> On Thu, Oct 18, 2012 at 12:15 PM, Steve Capper <steve.capper@arm.com> wrote:

> > diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
> > index d086f61..31c071f 100644
> > --- a/arch/arm/include/asm/pgtable-3level.h
> > +++ b/arch/arm/include/asm/pgtable-3level.h
> > @@ -85,6 +85,9 @@
> >  #define L_PTE_DIRTY            (_AT(pteval_t, 1) << 55)        /* unused */
> >  #define L_PTE_SPECIAL          (_AT(pteval_t, 1) << 56)        /* unused */
> >
> > +#define PMD_SECT_DIRTY         (_AT(pmdval_t, 1) << 55)
> > +#define PMD_SECT_SPLITTING     (_AT(pmdval_t, 1) << 57)
> > +
> >  /*
> >   * To be used in assembly code with the upper page attributes.
> >   */
> > @@ -166,6 +169,60 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
> >  #define pte_mkhuge(pte)                (__pte((pte_val(pte) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
> >
> >
> > +#define pmd_present(pmd)       ((pmd_val(pmd) & PMD_TYPE_MASK) != PMD_TYPE_FAULT)
> > +#define pmd_young(pmd)         (pmd_val(pmd) & PMD_SECT_AF)
> > +
> > +#define __HAVE_ARCH_PMD_WRITE
> > +#define pmd_write(pmd)         (!(pmd_val(pmd) & PMD_SECT_RDONLY))
> > +
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +#define pmd_trans_huge(pmd)    ((pmd_val(pmd) & PMD_TYPE_MASK) == PMD_TYPE_SECT)
> > +#define pmd_trans_splitting(pmd) (pmd_val(pmd) & PMD_SECT_SPLITTING)
> > +#endif
> > +
> > +#define PMD_BIT_FUNC(fn,op) \
> > +static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }
> > +
> > +PMD_BIT_FUNC(wrprotect,        |= PMD_SECT_RDONLY);
> > +PMD_BIT_FUNC(mkold,    &= ~PMD_SECT_AF);
> > +PMD_BIT_FUNC(mksplitting, |= PMD_SECT_SPLITTING);
> > +PMD_BIT_FUNC(mkwrite,   &= ~PMD_SECT_RDONLY);
> > +PMD_BIT_FUNC(mkdirty,   |= PMD_SECT_DIRTY);
> > +PMD_BIT_FUNC(mkyoung,   |= PMD_SECT_AF);
> > +PMD_BIT_FUNC(mknotpresent, &= ~PMD_TYPE_MASK);
> 
> personally I would prefer not to automate the prefixing of pmd_: it
> doesn't really save a lot of characters, it doesn't improve
> readability and it breaks grep/cscope.
> 

This follows the pte bit functions to a degree.
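
To make the trade-off concrete, here is a small user-space sketch of what the
macro generates. The pmd_t wrapper and the bit positions are stand-ins assumed
for illustration, not the kernel definitions.

```c
#include <stdint.h>

/* Illustrative user-space stand-ins; the kernel's own types differ. */
typedef struct { uint64_t pmd; } pmd_t;
#define pmd_val(x)          ((x).pmd)

/* Bit positions assumed for illustration only. */
#define PMD_SECT_RDONLY     (UINT64_C(1) << 7)
#define PMD_SECT_AF         (UINT64_C(1) << 10)

/* Same shape as the macro in the patch: pastes "pmd_" onto the name. */
#define PMD_BIT_FUNC(fn, op) \
static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }

/* These two definitions exist only after preprocessing. */
PMD_BIT_FUNC(wrprotect, |= PMD_SECT_RDONLY)
PMD_BIT_FUNC(mkold,     &= ~PMD_SECT_AF)

/*
 * What PMD_BIT_FUNC(mkyoung, |= PMD_SECT_AF) would expand to, written out
 * by hand. This form can be found by searching for "pmd_mkyoung".
 */
static inline pmd_t pmd_mkyoung(pmd_t pmd)
{
	pmd_val(pmd) |= PMD_SECT_AF;
	return pmd;
}
```

Each helper takes a pmd_t by value and returns the modified copy, so the
generated and hand-written forms behave the same; the difference is only in
whether the name appears literally in the source.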

> > +
> > +#define pmd_mkhuge(pmd)                (__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
> > +
> > +#define pmd_pfn(pmd)           ((pmd_val(pmd) & PHYS_MASK) >> PAGE_SHIFT)
> 
> the ARM ARM says UNK/SBZP, so we should be fine here right? (no one is
> crazy enough to try and squeeze some extra information in the extra
> bits here or something like that). For clarity, one could consider:
> 
> (((pmd_val(pmd) & PMD_MASK) & PHYS_MASK) >> PAGE_SHIFT)
> 

Thanks, yes, it's better to PMD_MASK the value too.
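
A minimal user-space sketch of the difference, assuming 4K pages, 2MB sections
and a 40-bit physical address space. The constants and the helper names
pmd_pfn_orig/pmd_pfn_masked are hypothetical and exist only for this example.

```c
#include <stdint.h>

typedef uint64_t pmdval_t;

/* Values assumed for illustration (4K pages, 2MB sections, 40-bit PA). */
#define PAGE_SHIFT  12
#define PMD_SHIFT   21
#define PMD_MASK    (~((UINT64_C(1) << PMD_SHIFT) - 1))
#define PHYS_MASK   ((UINT64_C(1) << 40) - 1)

/* As posted: only the attribute bits above PHYS_MASK are cleared. */
static inline uint64_t pmd_pfn_orig(pmdval_t val)
{
	return (val & PHYS_MASK) >> PAGE_SHIFT;
}

/* As suggested: the bits below PMD_SHIFT are cleared as well. */
static inline uint64_t pmd_pfn_masked(pmdval_t val)
{
	return ((val & PMD_MASK) & PHYS_MASK) >> PAGE_SHIFT;
}
```

The two agree as long as the bits between PAGE_SHIFT and PMD_SHIFT are zero.
If one of those bits were ever set, pmd_pfn_orig() would fold it into the pfn
while pmd_pfn_masked() would still return the pfn of the section base.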

> > +#define pfn_pmd(pfn,prot)      (__pmd(((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)))
> > +#define mk_pmd(page,prot)      pfn_pmd(page_to_pfn(page),prot)
> > +
> > +static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
> > +{
> > +       const pmdval_t mask = PMD_SECT_USER | PMD_SECT_XN | PMD_SECT_RDONLY;
> > +       pmd_val(pmd) = (pmd_val(pmd) & ~mask) | (pgprot_val(newprot) & mask);
> > +       return pmd;
> > +}
> > +
> > +static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
> > +{
> > +       *pmdp = pmd;
> > +}
> 
> why this level of indirection?
> 

Over-manipulation in git :-). This can be scrubbed.

> > +
> > +static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
> > +                             pmd_t *pmdp, pmd_t pmd)
> > +{
> > +       BUG_ON(addr >= TASK_SIZE);
> > +       pmd = __pmd(pmd_val(pmd) | PMD_SECT_nG);
> 
> why this side effect?
> 

This replicates the side effect found when placing ptes into page tables. We
need the nG (not-global) bit set for user pages.
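
As a rough user-space sketch of that side effect: the value stored for a user
mapping always has nG set, so its TLB entries are tagged with the ASID rather
than being global. The bit position, the TASK_SIZE value and the helper name
pmd_val_for_user are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pmdval_t;

#define PMD_SECT_nG  (UINT64_C(1) << 11)   /* assumed bit position */
#define TASK_SIZE    UINT32_C(0xbf000000)  /* assumed user/kernel split */

/* Hypothetical helper: the value set_pmd_at() would store for a user address. */
static inline pmdval_t pmd_val_for_user(uint32_t addr, pmdval_t val)
{
	/* Stand-in for BUG_ON(addr >= TASK_SIZE): user addresses only. */
	assert(addr < TASK_SIZE);

	/* Force not-global, whatever the caller passed in. */
	return val | PMD_SECT_nG;
}
```

Setting the bit at this point means callers do not have to remember to include
it in the protection value they pass down.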

> > +       set_pmd(pmdp, pmd);
> > +       flush_pmd_entry(pmdp);
> > +}
> > +
> > +static inline int has_transparent_hugepage(void)
> > +{
> > +       return 1;
> > +}
> > +
> >  #endif /* __ASSEMBLY__ */
> >
> >  #endif /* _ASM_PGTABLE_3LEVEL_H */
> > diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
> > index c35bf46..767aa7c 100644
> > --- a/arch/arm/include/asm/pgtable.h
> > +++ b/arch/arm/include/asm/pgtable.h
> > @@ -24,6 +24,9 @@
> >  #include <asm/memory.h>
> >  #include <asm/pgtable-hwdef.h>
> >
> > +
> > +#include <asm/tlbflush.h>
> > +
> >  #ifdef CONFIG_ARM_LPAE
> >  #include <asm/pgtable-3level.h>
> >  #else
> > @@ -163,7 +166,6 @@ extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
> >  #define pgd_offset_k(addr)     pgd_offset(&init_mm, addr)
> >
> >  #define pmd_none(pmd)          (!pmd_val(pmd))
> > -#define pmd_present(pmd)       (pmd_val(pmd))
> >
> >  static inline pte_t *pmd_page_vaddr(pmd_t pmd)
> >  {
> > diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
> > index 685e9e87..0fc2d9d 100644
> > --- a/arch/arm/include/asm/tlb.h
> > +++ b/arch/arm/include/asm/tlb.h
> > @@ -229,6 +229,12 @@ static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
> >  #endif
> >  }
> >
> > +static inline void
> > +tlb_remove_pmd_tlb_entry(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
> > +{
> > +       tlb_add_flush(tlb, addr);
> > +}
> > +
> >  #define pte_free_tlb(tlb, ptep, addr)  __pte_free_tlb(tlb, ptep, addr)
> >  #define pmd_free_tlb(tlb, pmdp, addr)  __pmd_free_tlb(tlb, pmdp, addr)
> >  #define pud_free_tlb(tlb, pudp, addr)  pud_free((tlb)->mm, pudp)
> > diff --git a/arch/arm/include/asm/tlbflush.h b/arch/arm/include/asm/tlbflush.h
> > index 6e924d3..907cede 100644
> > --- a/arch/arm/include/asm/tlbflush.h
> > +++ b/arch/arm/include/asm/tlbflush.h
> > @@ -505,6 +505,8 @@ static inline void update_mmu_cache(struct vm_area_struct *vma,
> >  }
> >  #endif
> >
> > +#define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
> > +
> >  #endif
> >
> >  #endif /* CONFIG_MMU */
> > diff --git a/arch/arm/mm/fsr-3level.c b/arch/arm/mm/fsr-3level.c
> > index 05a4e94..47f4c6f 100644
> > --- a/arch/arm/mm/fsr-3level.c
> > +++ b/arch/arm/mm/fsr-3level.c
> > @@ -9,7 +9,7 @@ static struct fsr_info fsr_info[] = {
> >         { do_page_fault,        SIGSEGV, SEGV_MAPERR,   "level 3 translation fault"     },
> >         { do_bad,               SIGBUS,  0,             "reserved access flag fault"    },
> >         { do_bad,               SIGSEGV, SEGV_ACCERR,   "level 1 access flag fault"     },
> > -       { do_bad,               SIGSEGV, SEGV_ACCERR,   "level 2 access flag fault"     },
> > +       { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 2 access flag fault"     },
> >         { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 3 access flag fault"     },
> >         { do_bad,               SIGBUS,  0,             "reserved permission fault"     },
> >         { do_bad,               SIGSEGV, SEGV_ACCERR,   "level 1 permission fault"      },
> > --
> > 1.7.9.5
> >
> 
> Besides the nits it looks fine to me. I've done quite extensive
> testing with varied workloads on this code over the last couple of
> months on the vexpress TC2 and on the ARNDALE board using KVM/ARM with
> huge pages, and it gives a nice ~15% performance increase on average
> and is completely stable.

That's great to hear \o/.
I've also seen a decent perf boost when running tools like xz backed by huge
pages. (One can use the LD_PRELOAD mechanism in libhugetlbfs to make malloc
allocate from huge pages.)

> 
> -Christoffer
>
