From: Alex Thorlton <athorlton@sgi.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dave Hansen <dave.hansen@intel.com>,
linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mel Gorman <mgorman@suse.de>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
"Eric W . Biederman" <ebiederm@xmission.com>,
Sedat Dilek <sedat.dilek@gmail.com>,
Frederic Weisbecker <fweisbec@gmail.com>,
Dave Jones <davej@redhat.com>,
Michael Kerrisk <mtk.manpages@gmail.com>,
"Paul E . McKenney" <paulmck@linux.vnet.ibm.com>,
David Howells <dhowells@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Al Viro <viro@zeniv.linux.org.uk>,
Oleg Nesterov <oleg@redhat.com>,
Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
Kees Cook <keescook@chromium.org>,
Robin Holt <robinmholt@gmail.com>
Subject: Re: [PATCH 1/8] THP: Use real address for NUMA policy
Date: Tue, 27 Aug 2013 11:50:39 -0500
Message-ID: <20130827165039.GC2886@sgi.com>
In-Reply-To: <20130816185212.GA3568@shutemov.name>
> Here's more up-to-date version: https://lkml.org/lkml/2012/8/20/337
These don't seem to give us a noticeable performance change either:
With THP:
real 22m34.279s
user 10797m35.984s
sys 39m18.188s
Without THP:
real 4m48.957s
user 2118m23.208s
sys 113m12.740s
Looks like the with-THP case got a few minutes faster, but it's still
significantly slower, and that could just be a fluke result; we're
still sitting at roughly a 5x performance degradation.
I talked with one of our performance/benchmarking experts last week and
he's done a bit more research into the actual problem here, so I've got
a bit more information:
Based on our testing, the real performance hit seems to come from the
increased remote-access latency on large NUMA systems, incurred whenever
a process has to go off-node to read from or write to memory.
To give an extreme example, say we have a 16 node system with 8 cores
per node. If we have a job that shares a 2MB data structure between 128
threads, with THP on, the first thread to touch the structure will
allocate all 2MB of space for that structure in a 2MB page, local to its
socket. This means that all the memory accesses for the other 120
threads will be remote accesses. With THP off, each thread could locally
allocate a number of 4K pages sufficient to hold the chunk of the
structure on which it needs to work, significantly reducing the number
of remote accesses that each thread will need to perform.
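To make the access pattern concrete, here's a rough userspace sketch of
the scenario above (purely illustrative, not code from the actual job,
and it ignores where the scheduler really places the threads): 128
threads each touch a 16KB slice of a shared 2MB buffer, i.e. four 4K
pages per thread. With THP, the first toucher's fault can instantiate
the whole 2MB page on its node; opting the region out with
madvise(MADV_NOHUGEPAGE) would let each thread's first touch fault in
local 4K pages instead.

/* Illustrative only: 128 threads sharing one 2MB structure.  Whether
 * that structure ends up as a single 2MB page on the first toucher's
 * node, or as per-thread 4K pages, is what makes the difference. */
#include <pthread.h>
#include <string.h>
#include <sys/mman.h>

#define NTHREADS        128
#define BUFSZ           (2UL << 20)             /* 2MB shared structure */
#define SLICE           (BUFSZ / NTHREADS)      /* 16KB: four 4K pages per thread */

static char *buf;

static void *worker(void *arg)
{
        long id = (long)arg;

        /* First touch: with THP, thread 0's fault can back the whole 2MB
         * on its node, so the other threads' slices end up remote.  With
         * 4K pages, each thread's touch faults in pages on its own node. */
        memset(buf + id * SLICE, (int)id, SLICE);
        return NULL;
}

int main(void)
{
        pthread_t tids[NTHREADS];
        long i;

        buf = mmap(NULL, BUFSZ, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
                return 1;

        /* One of the existing per-VMA knobs: opt this region out of THP
         * to force the per-thread 4K behaviour described above. */
        /* madvise(buf, BUFSZ, MADV_NOHUGEPAGE); */

        for (i = 0; i < NTHREADS; i++)
                pthread_create(&tids[i], NULL, worker, (void *)i);
        for (i = 0; i < NTHREADS; i++)
                pthread_join(tids[i], NULL);
        return 0;
}

The toy program itself isn't the point; the point is that the
granularity of the first-touch allocation decides whether the other 120
threads end up doing remote accesses.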
So, with that in mind, do we agree that a per-process tunable (or
something similar) to control THP seems like a reasonable method to
handle this issue?
Just want to confirm that everyone likes this approach before moving
forward with another revision of the patch. I'm currently in favor of
moving this to a per-mm tunable, since that seems to make more sense
when it comes to threaded jobs. Also, a decent chunk of the code I've
already written can be reused with this approach, and prctl will still
be an appropriate place from which to control the behavior. Andrew
Morton suggested possibly controlling this through the ELF header, but
I'm going to lean towards the per-mm route unless anyone has a major
objection to it.
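For reference, the userspace side of a per-mm prctl toggle would be
about as small as the sketch below; the PR_SET_THP_DISABLE name and
value are only placeholders for whatever interface we end up settling
on, not a claim about the final patch.

/* Hypothetical usage of a per-mm THP toggle via prctl().  The flag
 * name and value below are placeholders, not the final interface. */
#include <sys/prctl.h>

#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE      41      /* placeholder value */
#endif

int main(void)
{
        /* The whole job shares one mm, so a single call made before the
         * worker threads are spawned covers every thread. */
        if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0))
                return 1;

        /* ... allocate shared structures, spawn worker threads ... */
        return 0;
}

The attraction of the per-mm route is exactly that: one call covers
every thread sharing the mm, which is the threaded-job case described
above.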
- Alex