linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Ingo Molnar <mingo@kernel.org>
Cc: Rik van Riel <riel@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Hugh Dickins <hughd@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>, Mel Gorman <mgorman@suse.de>
Subject: [PATCH 36/43] sched: numa: Introduce tsk_home_node()
Date: Fri, 16 Nov 2012 11:22:46 +0000	[thread overview]
Message-ID: <1353064973-26082-37-git-send-email-mgorman@suse.de> (raw)
In-Reply-To: <1353064973-26082-1-git-send-email-mgorman@suse.de>

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Introduce the home-node concept for tasks. In order to keep memory
locality we need to have a something to stay local to, we define the
home-node of a task as the node we prefer to allocate memory from and
prefer to execute on.

These are no hard guarantees, merely soft preferences. This allows for
optimal resource usage, we can run a task away from the home-node, the
remote memory hit -- while expensive -- is less expensive than not
running at all, or very little, due to severe cpu overload.

Similarly, we can allocate memory from another node if our home-node
is depleted, again, some memory is better than no memory.

This patch merely introduces the basic infrastructure, all policy
comes later.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/linux/init_task.h |    8 ++++++++
 include/linux/sched.h     |   10 ++++++++++
 kernel/sched/core.c       |   36 ++++++++++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 6d087c5..fdf0692 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -143,6 +143,13 @@ extern struct task_group root_task_group;
 
 #define INIT_TASK_COMM "swapper"
 
+#ifdef CONFIG_BALANCE_NUMA
+# define INIT_TASK_NUMA(tsk)						\
+	.home_node = -1,
+#else
+# define INIT_TASK_NUMA(tsk)
+#endif
+
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1fffff (=2MB)
@@ -210,6 +217,7 @@ extern struct task_group root_task_group;
 	INIT_TRACE_RECURSION						\
 	INIT_TASK_RCU_PREEMPT(tsk)					\
 	INIT_CPUSET_SEQ							\
+	INIT_TASK_NUMA(tsk)						\
 }
 
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a2b06ea..b8580f5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1480,6 +1480,7 @@ struct task_struct {
 	short pref_node_fork;
 #endif
 #ifdef CONFIG_BALANCE_NUMA
+	int home_node;
 	int numa_scan_seq;
 	int numa_migrate_seq;
 	unsigned int numa_scan_period;
@@ -1569,6 +1570,15 @@ static inline void task_numa_fault(int node, int pages)
 }
 #endif
 
+static inline int tsk_home_node(struct task_struct *p)
+{
+#ifdef CONFIG_BALANCE_NUMA
+	return p->home_node;
+#else
+	return -1;
+#endif
+}
+
 /*
  * Priority of a process goes from 0..MAX_PRIO-1, valid RT
  * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 047e3c7..55dcf53 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5972,6 +5972,42 @@ static struct sched_domain_topology_level default_topology[] = {
 
 static struct sched_domain_topology_level *sched_domain_topology = default_topology;
 
+#ifdef CONFIG_BALANCE_NUMA
+
+/*
+ * Requeues a task ensuring its on the right load-balance list so
+ * that it might get migrated to its new home.
+ *
+ * Note that we cannot actively migrate ourselves since our callers
+ * can be from atomic context. We rely on the regular load-balance
+ * mechanisms to move us around -- its all preference anyway.
+ */
+void sched_setnode(struct task_struct *p, int node)
+{
+	unsigned long flags;
+	int on_rq, running;
+	struct rq *rq;
+
+	rq = task_rq_lock(p, &flags);
+	on_rq = p->on_rq;
+	running = task_current(rq, p);
+
+	if (on_rq)
+		dequeue_task(rq, p, 0);
+	if (running)
+		p->sched_class->put_prev_task(rq, p);
+
+	p->home_node = node;
+
+	if (running)
+		p->sched_class->set_curr_task(rq);
+	if (on_rq)
+		enqueue_task(rq, p, 0);
+	task_rq_unlock(rq, p, &flags);
+}
+
+#endif /* CONFIG_BALANCE_NUMA */
+
 #ifdef CONFIG_NUMA
 
 static int sched_domains_numa_levels;
-- 
1.7.9.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2012-11-16 11:23 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-16 11:22 [RFC PATCH 00/43] Automatic NUMA Balancing V3 Mel Gorman
2012-11-16 11:22 ` [PATCH 01/43] mm: compaction: Move migration fail/success stats to migrate.c Mel Gorman
2012-11-16 11:22 ` [PATCH 02/43] mm: migrate: Add a tracepoint for migrate_pages Mel Gorman
2012-11-16 11:22 ` [PATCH 03/43] mm: compaction: Add scanned and isolated counters for compaction Mel Gorman
2012-11-16 11:22 ` [PATCH 04/43] mm: numa: define _PAGE_NUMA Mel Gorman
2012-11-16 11:22 ` [PATCH 05/43] mm: numa: pte_numa() and pmd_numa() Mel Gorman
2012-11-16 11:22 ` [PATCH 06/43] mm: numa: Make pte_numa() and pmd_numa() a generic implementation Mel Gorman
2012-11-16 14:09   ` Rik van Riel
2012-11-16 14:41     ` Mel Gorman
2012-11-16 15:32       ` Linus Torvalds
2012-11-16 16:08         ` Ingo Molnar
2012-11-16 16:56           ` Mel Gorman
2012-11-16 17:12             ` Ingo Molnar
2012-11-16 17:48               ` Mel Gorman
2012-11-16 18:04                 ` Ingo Molnar
2012-11-16 18:55                   ` Mel Gorman
2012-11-16 17:26             ` Rik van Riel
2012-11-16 17:37             ` Ingo Molnar
2012-11-16 18:44               ` Mel Gorman
2012-11-16 16:19         ` Mel Gorman
2012-11-16 11:22 ` [PATCH 07/43] mm: numa: Support NUMA hinting page faults from gup/gup_fast Mel Gorman
2012-11-16 14:09   ` Rik van Riel
2012-11-16 11:22 ` [PATCH 08/43] mm: numa: split_huge_page: transfer the NUMA type from the pmd to the pte Mel Gorman
2012-11-16 11:22 ` [PATCH 09/43] mm: numa: Create basic numa page hinting infrastructure Mel Gorman
2012-11-16 11:22 ` [PATCH 10/43] mm: mempolicy: Make MPOL_LOCAL a real policy Mel Gorman
2012-11-16 11:22 ` [PATCH 11/43] mm: mempolicy: Add MPOL_MF_NOOP Mel Gorman
2012-11-16 11:22 ` [PATCH 12/43] mm: mempolicy: Check for misplaced page Mel Gorman
2012-11-16 11:22 ` [PATCH 13/43] mm: migrate: Introduce migrate_misplaced_page() Mel Gorman
2012-11-19 19:44   ` [tip:numa/core] mm/migration: Improve migrate_misplaced_page() tip-bot for Mel Gorman
2012-11-16 11:22 ` [PATCH 14/43] mm: mempolicy: Use _PAGE_NUMA to migrate pages Mel Gorman
2012-11-16 16:08   ` Rik van Riel
2012-11-16 11:22 ` [PATCH 15/43] mm: mempolicy: Add MPOL_MF_LAZY Mel Gorman
2012-11-16 11:22 ` [PATCH 16/43] mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now Mel Gorman
2012-11-16 16:22   ` Rik van Riel
2012-11-16 11:22 ` [PATCH 17/43] sched, mm, x86: Add the ARCH_SUPPORTS_NUMA_BALANCING flag Mel Gorman
2012-11-16 11:22 ` [PATCH 18/43] mm: numa: Add fault driven placement and migration Mel Gorman
2012-11-16 11:22 ` [PATCH 19/43] mm: numa: Avoid double faulting after migrating misplaced page Mel Gorman
2012-11-16 11:22 ` [PATCH 20/43] mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate Mel Gorman
2012-11-16 11:22 ` [PATCH 21/43] sched, numa, mm: Count WS scanning against present PTEs, not virtual memory ranges Mel Gorman
2012-11-16 11:22 ` [PATCH 22/43] mm: sched: numa: Implement slow start for working set sampling Mel Gorman
2012-11-16 11:22 ` [PATCH 23/43] mm: numa: Add pte updates, hinting and migration stats Mel Gorman
2012-11-16 11:22 ` [PATCH 24/43] mm: numa: Migrate on reference policy Mel Gorman
2012-11-16 11:22 ` [PATCH 25/43] mm: numa: Migrate pages handled during a pmd_numa hinting fault Mel Gorman
2012-11-16 11:22 ` [PATCH 26/43] mm: numa: Only mark a PMD pmd_numa if the pages are all on the same node Mel Gorman
2012-11-16 11:22 ` [PATCH 27/43] mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting Mel Gorman
2012-11-16 11:22 ` [PATCH 28/43] mm: numa: Rate limit the amount of memory that is migrated between nodes Mel Gorman
2012-11-16 11:22 ` [PATCH 29/43] mm: numa: Rate limit setting of pte_numa if node is saturated Mel Gorman
2012-11-16 11:22 ` [PATCH 30/43] sched: numa: Slowly increase the scanning period as NUMA faults are handled Mel Gorman
2012-11-16 11:22 ` [PATCH 31/43] mm: numa: Introduce last_nid to the page frame Mel Gorman
2012-11-16 11:22 ` [PATCH 32/43] mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships Mel Gorman
2012-11-16 11:22 ` [PATCH 33/43] x86: mm: only do a local tlb flush in ptep_set_access_flags() Mel Gorman
2012-11-16 11:22 ` [PATCH 34/43] x86: mm: drop TLB flush from ptep_set_access_flags Mel Gorman
2012-11-16 11:22 ` [PATCH 35/43] mm,generic: only flush the local TLB in ptep_set_access_flags Mel Gorman
2012-11-16 11:22 ` Mel Gorman [this message]
2012-11-16 11:22 ` [PATCH 37/43] sched: numa: Make find_busiest_queue() a method Mel Gorman
2012-11-16 11:22 ` [PATCH 38/43] sched: numa: Implement home-node awareness Mel Gorman
2012-11-16 11:22 ` [PATCH 39/43] sched: numa: Introduce per-mm and per-task structures Mel Gorman
2012-11-16 11:22 ` [PATCH 40/43] sched: numa: CPU follows memory Mel Gorman
2012-11-16 11:22 ` [PATCH 41/43] sched: numa: Rename mempolicy to HOME Mel Gorman
2012-11-16 11:22 ` [PATCH 42/43] sched: numa: Consider only one CPU per node for CPU-follows-memory Mel Gorman
2012-11-16 11:22 ` [PATCH 43/43] sched: numa: Increase and decrease a tasks scanning period based on task fault statistics Mel Gorman
2012-11-16 14:56 ` [RFC PATCH 00/43] Automatic NUMA Balancing V3 Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1353064973-26082-37-git-send-email-mgorman@suse.de \
    --to=mgorman@suse.de \
    --cc=a.p.zijlstra@chello.nl \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@kernel.org \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).