public inbox for linux-kernel@vger.kernel.org
From: tip-bot for Rik van Riel <tipbot@zytor.com>
To: linux-tip-commits@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hpa@zytor.com, mingo@kernel.org,
	torvalds@linux-foundation.org, peterz@infradead.org,
	riel@redhat.com, chegu_vinod@hp.com, mgorman@suse.de,
	tglx@linutronix.de
Subject: [tip:sched/core] sched/numa: Count pages on active node as local
Date: Thu, 8 May 2014 03:42:39 -0700	[thread overview]
Message-ID: <tip-792568ec6a31ca560ca4d528782cbc6cd2cea8b0@git.kernel.org> (raw)
In-Reply-To: <1397235629-16328-2-git-send-email-riel@redhat.com>

Commit-ID:  792568ec6a31ca560ca4d528782cbc6cd2cea8b0
Gitweb:     http://git.kernel.org/tip/792568ec6a31ca560ca4d528782cbc6cd2cea8b0
Author:     Rik van Riel <riel@redhat.com>
AuthorDate: Fri, 11 Apr 2014 13:00:27 -0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 7 May 2014 13:33:45 +0200

sched/numa: Count pages on active node as local

The NUMA code is smart enough to distribute the memory of workloads
that span multiple NUMA nodes across those NUMA nodes.

However, it still has a pretty high scan rate for such workloads,
because any memory residing on a node other than the node of the
CPU that faulted on it is counted as non-local, which causes the
scan rate to go up.

Counting the memory on any node where the task's numa group is
actively running as local allows the scan rate to slow down
once the application has settled in.

This should reduce the overhead of the automatic NUMA placement
code when a workload spans multiple NUMA nodes.

Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: Vinod Chegu <chegu_vinod@hp.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1397235629-16328-2-git-send-email-riel@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5d859ec..f6457b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1738,6 +1738,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
 	struct task_struct *p = current;
 	bool migrated = flags & TNF_MIGRATED;
 	int cpu_node = task_node(current);
+	int local = !!(flags & TNF_FAULT_LOCAL);
 	int priv;
 
 	if (!numabalancing_enabled)
@@ -1786,6 +1787,17 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
 			task_numa_group(p, last_cpupid, flags, &priv);
 	}
 
+	/*
+	 * If a workload spans multiple NUMA nodes, a shared fault that
+	 * occurs wholly within the set of nodes that the workload is
+	 * actively using should be counted as local. This allows the
+	 * scan rate to slow down when a workload has settled down.
+	 */
+	if (!priv && !local && p->numa_group &&
+			node_isset(cpu_node, p->numa_group->active_nodes) &&
+			node_isset(mem_node, p->numa_group->active_nodes))
+		local = 1;
+
 	task_numa_placement(p);
 
 	/*
@@ -1800,7 +1812,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
 
 	p->numa_faults_buffer_memory[task_faults_idx(mem_node, priv)] += pages;
 	p->numa_faults_buffer_cpu[task_faults_idx(cpu_node, priv)] += pages;
-	p->numa_faults_locality[!!(flags & TNF_FAULT_LOCAL)] += pages;
+	p->numa_faults_locality[local] += pages;
 }
 
 static void reset_ptenuma_scan(struct task_struct *p)


Thread overview: 19+ messages
2014-04-11 17:00 [PATCH 0/3] sched,numa: reduce page migrations with pseudo-interleaving riel
2014-04-11 17:00 ` [PATCH 1/3] sched,numa: count pages on active node as local riel
2014-04-11 17:34   ` Joe Perches
2014-04-11 17:41     ` Rik van Riel
2014-04-11 18:01       ` Joe Perches
2014-04-25  9:04   ` Mel Gorman
2014-05-08 10:42   ` tip-bot for Rik van Riel [this message]
2014-04-11 17:00 ` [PATCH 2/3] sched,numa: retry placement more frequently when misplaced riel
2014-04-11 17:46   ` Joe Perches
2014-04-11 18:03     ` Rik van Riel
2014-04-14  8:19       ` Ingo Molnar
2014-04-25  9:05   ` Mel Gorman
2014-05-08 10:42   ` [tip:sched/core] sched/numa: Retry " tip-bot for Rik van Riel
2014-04-11 17:00 ` [PATCH 3/3] sched,numa: do not set preferred_node on migration to a second choice node riel
2014-04-14 12:56   ` Peter Zijlstra
2014-04-15 14:35     ` Rik van Riel
2014-04-15 16:51       ` Peter Zijlstra
2014-04-25  9:09   ` Mel Gorman
2014-05-08 10:43   ` [tip:sched/core] sched/numa: Do " tip-bot for Rik van Riel
