linux-mm.kvack.org archive mirror
From: Andrea Arcangeli <aarcange@redhat.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Hillf Danton <dhillf@gmail.com>, Dan Smith <danms@us.ibm.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	Paul Turner <pjt@google.com>,
	Suresh Siddha <suresh.b.siddha@intel.com>,
	Mike Galbraith <efault@gmx.de>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Bharata B Rao <bharata.rao@gmail.com>,
	Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
	Christoph Lameter <cl@linux.com>, Alex Shi <alex.shi@intel.com>,
	Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Don Morris <don.morris@hp.com>
Subject: Re: [PATCH 33/36] autonuma: powerpc port
Date: Thu, 23 Aug 2012 00:35:42 +0200
Message-ID: <20120822223542.GG8107@redhat.com>
In-Reply-To: <1345672907.2617.44.camel@pasglop>

On Thu, Aug 23, 2012 at 08:01:47AM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-08-22 at 16:59 +0200, Andrea Arcangeli wrote:
> > diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> > index 2e0e411..5f03079 100644
> > --- a/arch/powerpc/include/asm/pgtable.h
> > +++ b/arch/powerpc/include/asm/pgtable.h
> > @@ -33,10 +33,56 @@ static inline int pte_dirty(pte_t pte)		{ return pte_val(pte) & _PAGE_DIRTY; }
> >  static inline int pte_young(pte_t pte)		{ return pte_val(pte) & _PAGE_ACCESSED; }
> >  static inline int pte_file(pte_t pte)		{ return pte_val(pte) & _PAGE_FILE; }
> >  static inline int pte_special(pte_t pte)	{ return pte_val(pte) & _PAGE_SPECIAL; }
> > -static inline int pte_present(pte_t pte)	{ return pte_val(pte) & _PAGE_PRESENT; }
> > +static inline int pte_present(pte_t pte)	{ return pte_val(pte) &
> > +							(_PAGE_PRESENT|_PAGE_NUMA_PTE); }
> 
> Is this absolutely necessary ? (testing two bits). It somewhat changes
> the semantics of "pte_present" which I don't really like.

I'm actually surprised you don't already check for PROTNONE there.
Anyway, yes, this is necessary: the whole concept of NUMA hinting page
faults is to make the pte not present and to set another bit in its
place (whether that is a reserved bit or PROTNONE makes no difference
in this respect). A bit standing in for _PAGE_PRESENT must exist, and
pte_present() has to test it too.

This change is zero cost at runtime; testing against 0x1 or 0x3 won't
change a thing for the CPU.
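
To make the semantics concrete, here is a minimal userspace sketch of
the two predicates and of the state transition a NUMA hinting fault
relies on. Everything prefixed sk_/SK_ is a stand-in invented for this
illustration: SK_PAGE_NUMA_PTE reuses the value from the quoted patch,
SK_PAGE_PRESENT is assumed to be the low bit, and the mknuma/mknonnuma
helpers are hypothetical, not the ones from the series.

```c
#include <stdint.h>

/* Illustrative bit values only (assumptions, see the text above). */
#define SK_PAGE_PRESENT   UINT64_C(0x00000001)
#define SK_PAGE_NUMA_PTE  UINT64_C(0x40000000)

/* Stand-in for the kernel's pte_t. */
typedef struct { uint64_t pte; } sk_pte_t;

static inline uint64_t sk_pte_val(sk_pte_t pte) { return pte.pte; }

/* Present from the VM's point of view: either bit is set. */
static inline int sk_pte_present(sk_pte_t pte)
{
	return (sk_pte_val(pte) & (SK_PAGE_PRESENT | SK_PAGE_NUMA_PTE)) != 0;
}

/* NUMA hinting pte: the NUMA bit is set and the present bit is clear. */
static inline int sk_pte_numa(sk_pte_t pte)
{
	return (sk_pte_val(pte) & (SK_PAGE_NUMA_PTE | SK_PAGE_PRESENT))
		== SK_PAGE_NUMA_PTE;
}

/* Hypothetical helper: arm a hinting fault by swapping the two bits. */
static inline sk_pte_t sk_pte_mknuma(sk_pte_t pte)
{
	pte.pte = (pte.pte & ~SK_PAGE_PRESENT) | SK_PAGE_NUMA_PTE;
	return pte;
}

/* Hypothetical helper: undo it once the hinting fault has been handled. */
static inline sk_pte_t sk_pte_mknonnuma(sk_pte_t pte)
{
	pte.pte = (pte.pte & ~SK_PAGE_NUMA_PTE) | SK_PAGE_PRESENT;
	return pte;
}
```

With these definitions a pte run through sk_pte_mknuma() no longer has
the present bit set, yet sk_pte_present() still returns true for it,
so the rest of the VM keeps treating the mapping as populated.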

> >  static inline int pte_none(pte_t pte)		{ return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; }
> >  static inline pgprot_t pte_pgprot(pte_t pte)	{ return __pgprot(pte_val(pte) & PAGE_PROT_BITS); }
> >  
> > +#ifdef CONFIG_AUTONUMA
> > +static inline int pte_numa(pte_t pte)
> > +{
> > +       return (pte_val(pte) &
> > +               (_PAGE_NUMA_PTE|_PAGE_PRESENT)) == _PAGE_NUMA_PTE;
> > +}
> > +
> > +#endif
> 
> Why the ifdef and not anywhere else ?

The generic version is implemented in asm-generic/pgtable.h to avoid
duplicating it in every arch.

> > diff --git a/arch/powerpc/include/asm/pte-hash64-64k.h b/arch/powerpc/include/asm/pte-hash64-64k.h
> > index 59247e8..f7e1468 100644
> > --- a/arch/powerpc/include/asm/pte-hash64-64k.h
> > +++ b/arch/powerpc/include/asm/pte-hash64-64k.h
> > @@ -7,6 +7,8 @@
> >  #define _PAGE_COMBO	0x10000000 /* this is a combo 4k page */
> >  #define _PAGE_4K_PFN	0x20000000 /* PFN is for a single 4k page */
> >  
> > +#define _PAGE_NUMA_PTE 0x40000000 /* Adjust PTE_RPN_SHIFT below */
> > +
> >  /* For 64K page, we don't have a separate _PAGE_HASHPTE bit. Instead,
> >   * we set that to be the whole sub-bits mask. The C code will only
> >   * test this, so a multi-bit mask will work. For combo pages, this
> > @@ -36,7 +38,7 @@
> >   * That gives us a max RPN of 34 bits, which means a max of 50 bits
> >   * of addressable physical space, or 46 bits for the special 4k PFNs.
> >   */
> > -#define PTE_RPN_SHIFT	(30)
> > +#define PTE_RPN_SHIFT	(31)
> 
> I'm concerned. We are already running short on RPN bits. We can't spare
> more. If you absolutely need a PTE bit, we'll need to explore ways to
> free some, but just reducing the RPN isn't an option.

There is no way to do it without a spare bit.

Note that this is now true for the sched-numa rewrite as well, because
it also introduced the NUMA hinting page faults of AutoNUMA (what it
does during the fault is different there, but the mechanism for firing
them and the need for a spare pte bit are identical).

But you must have a bit for PROTNONE, don't you? You can implement it
with PROTNONE; I can add the vma as a parameter to some function to
make that possible if you need it. It may be a good idea to do that
anyway, even if there's no need for it on x86 at this point.
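
For reference, the cost of the PTE_RPN_SHIFT bump can be worked out
from the header comment quoted above (34 RPN bits, 50 bits of physical
space, 46 bits for 4k PFNs). The sketch below assumes a 64-bit pte, a
16-bit page offset for 64k pages and a 12-bit one for 4k PFNs; the
helper names are made up for the illustration.

```c
#include <assert.h>

#define SK_PTE_BITS 64	/* assumed pte width */

/* RPN bits left above the shift in the pte. */
static inline unsigned int sk_rpn_bits(unsigned int pte_rpn_shift)
{
	assert(pte_rpn_shift < SK_PTE_BITS);
	return SK_PTE_BITS - pte_rpn_shift;
}

/* Addressable physical bits: RPN bits plus the page offset bits. */
static inline unsigned int sk_phys_bits(unsigned int pte_rpn_shift,
					unsigned int page_shift)
{
	return sk_rpn_bits(pte_rpn_shift) + page_shift;
}
```

sk_phys_bits(30, 16) and sk_phys_bits(30, 12) give the 50 and 46 bits
from the comment; with the shift at 31 the same calls give 49 and 45,
so each pte bit taken from the RPN halves the addressable range.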

> Think of what happens if PTE_4K_PFN is set...

It may very well be broken when PTE_4K_PFN is set; I'm not familiar
with that. If that's the case we'll just add an option to prevent
AUTONUMA=y from being set when PTE_4K_PFN is set. Thanks for the info.

> Also you conveniently avoided all the other pte-*.h variants meaning you
> broke the build for everything except ppc64 with 64k pages.

This can only be enabled on PPC64 in Kconfig, so ppc32 is not a
problem.

> > diff --git a/mm/autonuma.c b/mm/autonuma.c
> > index ada6c57..a4da3f3 100644
> > --- a/mm/autonuma.c
> > +++ b/mm/autonuma.c
> > @@ -25,7 +25,7 @@ unsigned long autonuma_flags __read_mostly =
> >  #ifdef CONFIG_AUTONUMA_DEFAULT_ENABLED
> >  	|(1<<AUTONUMA_ENABLED_FLAG)
> >  #endif
> > -	|(1<<AUTONUMA_SCAN_PMD_FLAG);
> > +	|(0<<AUTONUMA_SCAN_PMD_FLAG);
> 
> That changes the default across all architectures, is that ok vs.
> Andrea ?

:) Indeed! But the next patch (34) undoes this hack. I just merged the
patch with "git am" and then, in an incremental patch, introduced a
proper way for the arch to specify whether the PMD scan is supported.
Adding ppc64 support and making the PMD scan mode arch conditional are
two separate things, so I thought it was cleaner to keep them in two
separate patches, but I can fold them if you prefer.
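
Roughly, the arch conditional default could look like the sketch
below. This is only a guess at the shape, based on the config symbol
named in the title of patch 34; the enum values are assumptions picked
for the example and the helper is hypothetical.

```c
/* Bit positions are assumptions; the real ones come from the flags patch. */
enum sk_autonuma_flag {
	AUTONUMA_ENABLED_FLAG,
	AUTONUMA_SCAN_PMD_FLAG,
};

/* Each default bit is only set when its config symbol is defined. */
static const unsigned long autonuma_flags = 0
#ifdef CONFIG_AUTONUMA_DEFAULT_ENABLED
	| (1UL << AUTONUMA_ENABLED_FLAG)
#endif
#ifdef CONFIG_HAVE_ARCH_AUTONUMA_SCAN_PMD
	| (1UL << AUTONUMA_SCAN_PMD_FLAG)
#endif
	;

/* Hypothetical helper: is the PMD scan mode on in this flag word? */
static inline int sk_autonuma_scan_pmd(unsigned long flags)
{
	return (flags >> AUTONUMA_SCAN_PMD_FLAG) & 1;
}
```

An arch that cannot support the PMD scan simply leaves the symbol
undefined and gets the flag cleared by default, without a hack like
the (0<<AUTONUMA_SCAN_PMD_FLAG) in the generic code.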

