From: Nicholas Piggin <npiggin@gmail.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org, Anton Blanchard <anton@samba.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: Re: [PATCH] powerpc/64: implement a slice mask cache
Date: Mon, 25 Jul 2016 14:35:34 +1000	[thread overview]
Message-ID: <20160725143534.78bbf834@roar.ozlabs.ibm.com> (raw)
In-Reply-To: <20160723103642.GB13796@350D>

On Sat, 23 Jul 2016 20:36:42 +1000
Balbir Singh <bsingharora@gmail.com> wrote:

> On Sat, Jul 23, 2016 at 05:10:36PM +1000, Nicholas Piggin wrote:
> > On Sat, 23 Jul 2016 12:19:37 +1000
> > Balbir Singh <bsingharora@gmail.com> wrote:
> >   
> > > On Fri, Jul 22, 2016 at 10:57:28PM +1000, Nicholas Piggin wrote:  
> > > > Calculating the slice mask can become a significant overhead for
> > > > get_unmapped_area. The mask is relatively small and does not
> > > > change frequently, so we can cache it in the mm context.
> > > > 
> > > > This saves about 30% kernel time on a 4K user address allocation
> > > > in a microbenchmark.
> > > > 
> > > > Comments on the approach taken? I think there is the option for
> > > > fixed allocations to avoid some of the slice calculation
> > > > entirely, but first I think it will be good to have a general
> > > > speedup that covers all mmaps.
> > > > 
> > > > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > > > Cc: Anton Blanchard <anton@samba.org>
> > > > ---
> > > >  arch/powerpc/include/asm/book3s/64/mmu.h |  8 +++++++
> > > >  arch/powerpc/mm/slice.c                  | 39 ++++++++++++++++++++++++++++++--
> > > >  2 files changed, 45 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> > > > index 5854263..0d15af4 100644
> > > > --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> > > > +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> > > > @@ -71,6 +71,14 @@ typedef struct {
> > > >  #ifdef CONFIG_PPC_MM_SLICES
> > > >  	u64 low_slices_psize;	/* SLB page size encodings */
> > > >  	unsigned char high_slices_psize[SLICE_ARRAY_SIZE];
> > > > +	struct slice_mask mask_4k;
> > > > +# ifdef CONFIG_PPC_64K_PAGES
> > > > +	struct slice_mask mask_64k;
> > > > +# endif
> > > > +# ifdef CONFIG_HUGETLB_PAGE
> > > > +	struct slice_mask mask_16m;
> > > > +	struct slice_mask mask_16g;
> > > > +# endif    
> > > 
> > > Should we cache these in mmu_psize_defs? I am not 100% sure
> > > if we want to overload that structure, but it provides a convenient
> > > way of saying mmu_psize_defs[psize].mask instead of all
> > > the if checks
> > 
> > I'm not sure if we can, can we? mmu_psize_defs is global
> > whereas we need a per-process structure.
> >  
> 
> Oh! sorry, I meant a structure like mmu_psize_defs.

In that case, sure. Avoiding the branches might be worthwhile.
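
Rough sketch of that idea (the array field and helper name here are
invented for illustration, not from the patch):

	/* hypothetical per-mm analogue of mmu_psize_defs, in mm_context_t */
	struct slice_mask mask[MMU_PAGE_COUNT];

	static inline struct slice_mask *mm_slice_mask(mm_context_t *ctx,
						       unsigned int psize)
	{
		/* direct index instead of an if/else chain over psizes */
		return &ctx->mask[psize];
	}

Though as quoted below, a full MMU_PAGE_COUNT array is too big, which
is where the MMU_POSSIBLE_PAGE_COUNT idea would come in.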


> > The branches are a bit annoying, but we can't directly use an array
> > because it's too big. But see the comment at MMU_PAGE_* defines.
> > Perhaps we could change this structure to be sized at compile time
> > to only include possible page sizes, which would enable building a
> > structure like the above with simply
> > 
> > struct type blah[MMU_POSSIBLE_PAGE_COUNT];
> > 
> > Perhaps we can consider that as a follow-on patch? It's probably a
> > bit more work to implement.
> >   
> 
> 
> Yeah, good idea.
> MMU_PAGE_COUNT is 15, so the size is going to be 15*8 bytes?

Unfortunately, slice_mask is 16 bytes. Only 10 are used, but it
seemed too ugly to try squashing things together.
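
For reference, my reading of the current struct in slice.c (worth
double-checking):

	struct slice_mask {
		u16 low_slices;		/* one bit per 256MB slice below 4GB */
		u64 high_slices;	/* one bit per 1TB slice */
	};

That's 2 + 8 = 10 bytes of payload, but the u64 forces 8-byte
alignment, so sizeof(struct slice_mask) rounds up to 16.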


> > Good question. The slice_convert_lock is... interesting. It only
> > protects the update-side of the slice page size arrays. I thought
> > this was okay last time I looked, but now you make me think again
> > maybe it is not. I need to check again what's providing exclusion
> > on the read side too.
> > 
> > I wanted to avoid doing more work under slice_convert_lock, but
> > we should just make that a per-mm lock anyway shouldn't we?
> >  
> 
> Yeah, and Ben's comment in his reply suggests we already hold a
> per-mm lock on the read side.
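
If we did make it per-mm, the minimal sketch would be something like
(field name made up here):

	/* in mm_context_t, alongside low_slices_psize etc. */
	spinlock_t slice_lock;	/* would replace the global slice_convert_lock */

with slice_convert() taking mm->context.slice_lock instead of the
global lock.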

Let's discuss this further in my reply to Ben.

Thanks,
Nick

