From: Peter Zijlstra <peterz@infradead.org>
To: Mel Gorman <mgorman@suse.de>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 7/8] sched: Split accounting of NUMA hinting faults that pass two-stage filter
Date: Fri, 28 Jun 2013 17:12:56 +0200
Message-ID: <20130628151256.GC6626@twins.programming.kicks-ass.net>
In-Reply-To: <20130628142925.GB1875@suse.de>

On Fri, Jun 28, 2013 at 03:29:25PM +0100, Mel Gorman wrote:
> > Oh duh indeed. I totally missed it did that. Changelog also isn't giving
> > rationale for this. Mel?
> > 
> 
> There were a few reasons
> 
> First, if there are many tasks sharing the page then they'll all move towards
> the same node. The node will become compute overloaded and the tasks will be
> scheduled away later, only to bounce back again. Alternatively the tasks
> sharing the page would just bounce around nodes because the fault
> information is effectively noise. Either way I felt that accounting for
> shared faults together with private faults would be slower overall.
> 
> The second reason was based on a hypothetical workload that had a small
> number of very important, heavily accessed private pages but a large shared
> array. The shared array would dominate the number of faults and its node
> would be selected as the preferred node even though that's the wrong
> decision.
> 
> The third reason was that multiple threads in a process will race each
> other to fault the shared page, making the information unreliable.
> 
> It is important that *something* be done with shared faults but I haven't
> thought of what exactly yet. One possibility would be to give them a
> different weight, maybe based on the number of active NUMA nodes, but I have
> not tested anything yet. Peter suggested privately that if shared faults
> dominate the workload, the shared pages would be migrated based on an
> interleave policy, which has some potential.
> 

It would be good to put something like this in the Changelog, or even as
a comment near how we select the preferred node.
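
To illustrate the second reason quoted above, here is a minimal sketch in C
of what such a comment might sit next to. It is not the code from the patch:
every name in it (task_numa_stats, record_fault, pick_preferred_node,
MAX_NODES) is hypothetical, and it only assumes that faults are counted per
node and split into shared and private.

```c
#include <assert.h>

/* All names below are hypothetical and for illustration only. */
#define MAX_NODES 8

enum fault_kind { FAULT_SHARED, FAULT_PRIVATE, NR_FAULT_KINDS };

struct task_numa_stats {
	/* Hinting faults seen by one task, per node and per kind. */
	unsigned long faults[MAX_NODES][NR_FAULT_KINDS];
};

static void record_fault(struct task_numa_stats *s, int node,
			 enum fault_kind kind, unsigned long pages)
{
	assert(node >= 0 && node < MAX_NODES);
	s->faults[node][kind] += pages;
}

/*
 * Return the node with the most faults, or -1 if none were recorded.
 *
 * When private_only is set, shared faults are ignored: many tasks
 * faulting on the same shared pages would all be pulled to one node,
 * and a large shared array could outvote a few hot private pages.
 * Ties go to the lowest-numbered node.
 */
static int pick_preferred_node(const struct task_numa_stats *s,
			       int nr_nodes, int private_only)
{
	unsigned long max_faults = 0;
	int node, best = -1;

	assert(nr_nodes > 0 && nr_nodes <= MAX_NODES);
	for (node = 0; node < nr_nodes; node++) {
		unsigned long count = s->faults[node][FAULT_PRIVATE];

		if (!private_only)
			count += s->faults[node][FAULT_SHARED];
		if (count > max_faults) {
			max_faults = count;
			best = node;
		}
	}
	return best;
}
```

With 50 private faults recorded on node 0 and 1000 shared faults on node 1,
the private-only selection returns node 0, while counting both kinds
together returns node 1.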
