linux-kernel.vger.kernel.org archive mirror
From: Alex Thorlton <athorlton@sgi.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Dave Hansen <dave.hansen@intel.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Eric W . Biederman" <ebiederm@xmission.com>,
	Sedat Dilek <sedat.dilek@gmail.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Dave Jones <davej@redhat.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	"Paul E . McKenney" <paulmck@linux.vnet.ibm.com>,
	David Howells <dhowells@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Oleg Nesterov <oleg@redhat.com>,
	Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Kees Cook <keescook@chromium.org>
Subject: Re: [PATCH 1/8] THP: Use real address for NUMA policy
Date: Mon, 9 Sep 2013 11:48:23 -0500	[thread overview]
Message-ID: <20130909164823.GD12435@sgi.com> (raw)
In-Reply-To: <20130905111510.GC23362@gmail.com>

On Thu, Sep 05, 2013 at 01:15:10PM +0200, Ingo Molnar wrote:
> 
> * Alex Thorlton <athorlton@sgi.com> wrote:
> 
> > > Robin,
> > > 
> > > I tweaked one of our other tests to behave pretty much exactly as I
> > > described:
> > > - malloc a large array
> > > - Spawn a specified number of threads
> > > - Have each thread touch small, evenly spaced chunks of the array (e.g.
> > >   for 128 threads, the array is divided into 128 chunks, and each thread
> > >   touches 1/128th of each chunk, dividing the array into 16,384 pieces)
> > 
> > Forgot to mention that the threads don't touch their chunks of memory
> > concurrently, i.e. thread 2 has to wait for thread 1 to finish first.
> > This is important to note, since the pages won't all get stuck on the
> > first node without this behavior.
> 
> Could you post the testcase please?
> 
> Thanks,
> 
> 	Ingo

Sorry for the delay here; I had to make sure that everything in my tests
was okay to push out to the public.  Here's a pointer to the test I
wrote:

ftp://shell.sgi.com/collect/appsx_test/pthread_test.tar.gz

Everything needed to compile the test should be there (just run make in
the thp_pthread directory).  To run the test, use something like:

time ./thp_pthread -C 0 -m 0 -c <max_cores> -b <memory>

I ran:

time ./thp_pthread -C 0 -m 0 -c 128 -b 128g

on a 256-core machine with ~500GB of memory, and got these results:

THP off:

real	0m57.797s
user	46m22.156s
sys	6m14.220s

THP on:

real	1m36.906s
user	0m2.612s
sys	143m13.764s

I snagged some code from another test we use, so I can't vouch for the
usefulness/accuracy of all the output (actually, I know some of it is
wrong).  I've mainly been looking at the total run time.

I don't want to bloat this e-mail with too many test results, but I
found this one really interesting: same machine, using all the cores,
with the same amount of memory.  This means that each cpu is actually
doing *less* work, since the chunk we reserve gets divided up evenly
amongst the cpus:

time ./thp_pthread -C 0 -m 0 -c 256 -b 128g

THP off:

real	1m1.028s
user	104m58.448s
sys	8m52.908s

THP on:

real	2m26.072s
user	60m39.404s
sys	337m10.072s

Seems that the test scales really well in the THP off case, but, once
again, with THP on, we really see the performance start to degrade.

I'm planning to start investigating possible ways to split up THPs, if
we detect that the majority of the references to a THP are off-node.
I've heard some horror stories about migrating pages in this situation
(i.e., process switches cpu and then all the pages follow it), but I
think we might be able to get some better results if we can cleverly
determine an appropriate time to split up pages.  I've heard a bit of
talk about doing something similar to this from a few people, but
haven't seen any code/test results.  If anybody has any input on that
topic, it would be greatly appreciated.

- Alex

