Re: [rfc] balance-on-fork NUMA placement


From: Martin Bligh <mbligh@mbligh.org>
To: Nick Piggin <npiggin@suse.de>
Cc: Andi Kleen <ak@suse.de>, Ingo Molnar <mingo@elte.hu>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: [rfc] balance-on-fork NUMA placement
Date: Wed, 01 Aug 2007 10:53:39 -0700
Message-ID: <46B0C8A3.8090506@mbligh.org>
In-Reply-To: <20070801002313.GC31006@wotan.suse.de>

Nick Piggin wrote:
> On Tue, Jul 31, 2007 at 11:14:08AM +0200, Andi Kleen wrote:
>> On Tuesday 31 July 2007 07:41, Nick Piggin wrote:
>>
>>> I haven't given this idea testing yet, but I just wanted to get some
>>> opinions on it first. NUMA placement still isn't ideal (eg. tasks with
>>> a memory policy will not do any placement, and process migrations of
>>> course will leave the memory behind...), but it does give a bit more
>>> chance for the memory controllers and interconnects to get evenly
>>> loaded.
>> I didn't think slab honored mempolicies by default? 
>> At least you seem to need to set special process flags.
>>
>>> NUMA balance-on-fork code is in a good position to allocate all of a new
>>> process's memory on a chosen node. However, it really only starts
>>> allocating on the correct node after the process starts running.
>>>
>>> task and thread structures, stack, mm_struct, vmas, page tables etc. are
>>> all allocated on the parent's node.
>> The page tables should be only allocated when the process runs; except
>> for the PGD.
> 
> We certainly used to copy all page tables on fork. Not any more, but we
> must still copy anonymous page tables.

This topic seems to come up periodically ever since we first introduced
the NUMA scheduler, and every time we decide it's a bad idea. What's
changed? What workloads does this improve (aside from some artificial
benchmark like stream)?

To repeat the conclusions of last time: the primary problem is that
99% of the time we exec after we fork, and balancing on fork makes that
fork/exec cycle slower, not faster, so exec is generally a much better
time to do this. There's no good predictor of whether we'll exec after
fork, unless one has magically appeared since late 2.5.x?

M.
