public inbox for linux-kernel@vger.kernel.org
From: Jiri Olsa <jolsa@kernel.org>
To: Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: lkml <linux-kernel@vger.kernel.org>,
	James Hartsock <hartsjc@redhat.com>,
	Rik van Riel <riel@redhat.com>,
	Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>,
	Kirill Tkhai <ktkhai@parallels.com>
Subject: [PATCH 3/4] sched/fair: Add REBALANCE_AFFINITY rebalancing code
Date: Mon, 20 Jun 2016 14:15:13 +0200
Message-ID: <1466424914-8981-4-git-send-email-jolsa@kernel.org>
In-Reply-To: <1466424914-8981-1-git-send-email-jolsa@kernel.org>

Add the rebalance_affinity function, which places tasks
based on their cpus_allowed mask using the following logic.

The current load balancer places tasks on runqueues based
on their weight to achieve balance within sched domains.

Sched domains are defined at startup and can't be changed
at runtime. If the user sets workload affinity unevenly
with respect to the sched domains, the tasks in that
affinity group can end up unbalanced, for example:

Say we have the following sched domains:
  domain 0: (pairs)
  domain 1: 0-5,12-17 (group1)  6-11,18-23 (group2)
  domain 2: 0-23 level NUMA

The user runs a workload with an affinity mask that takes
one CPU from group1 (CPU 0) and the rest from group2:
    0,6,7,8,9,10,11,18,19,20,21,22

The user will see idle CPUs within the affinity group,
because the load balancer balances load between group1
and group2, putting as much load on CPU 0 alone as on
all the remaining CPUs combined.
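
To make the imbalance concrete, here is a simplified C model of
the example above. It is a simplification, not the kernel's actual
algorithm. It assumes the balancer splits the tasks evenly between
group1 and group2, and that each group spreads its share over the
CPUs the affinity mask allows there. CPU masks are plain 64-bit
integers. cpumask64, cpu_range(), mask_weight() and
idle_allowed_cpus() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: one bit per CPU, enough for the 24-CPU example. */
typedef uint64_t cpumask64;

#define CPU(n) ((cpumask64)1 << (n))

/* Build a mask covering the inclusive CPU range [lo, hi]. */
static cpumask64 cpu_range(int lo, int hi)
{
	cpumask64 m = 0;

	assert(0 <= lo && lo <= hi && hi < 64);
	for (int c = lo; c <= hi; c++)
		m |= CPU(c);
	return m;
}

/* Number of CPUs set in the mask (clears the lowest bit each step). */
static int mask_weight(cpumask64 m)
{
	int w = 0;

	for (; m; m &= m - 1)
		w++;
	return w;
}

/*
 * Simplified model (an assumption): @tasks is the share of tasks the
 * balancer gives to @group, and they are spread over the CPUs that
 * @affinity allows inside @group. Returns how many of those allowed
 * CPUs are left idle.
 */
static int idle_allowed_cpus(cpumask64 affinity, cpumask64 group, int tasks)
{
	int allowed = mask_weight(affinity & group);

	return tasks < allowed ? allowed - tasks : 0;
}
```

With group1 = cpu_range(0, 5) | cpu_range(12, 17), group2 =
cpu_range(6, 11) | cpu_range(18, 23) and the affinity mask
CPU(0) | cpu_range(6, 11) | cpu_range(18, 22), 12 tasks split six
per group. Six tasks end up stacked on CPU 0.
idle_allowed_cpus(aff, group1, 6) returns 0 and
idle_allowed_cpus(aff, group2, 6) returns 5, so 5 of the 11
allowed CPUs in group2 sit idle.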

The rebalance_affinity function detects the above setup
and tries to move tasks with a restricted cpus_allowed
mask onto idle CPUs within that mask, if there are any.
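
The selection step can be modelled in user space with 64-bit
masks. The sketch below mirrors has_affinity_set() and the
cpumask_any_but() call from the patch. pick_affinity_target() is a
hypothetical helper. cpumask_any_but() may return any matching CPU,
but any_but() here always returns the lowest one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors has_affinity_set() from the patch: true when the task may
 * run on at least one active CPU, but not on all active CPUs.
 */
static bool has_affinity_set(uint64_t allowed, uint64_t active)
{
	uint64_t m = allowed & active;

	if (!m)
		return false;
	return (m ^ active) != 0;
}

/*
 * Models cpumask_any_but(): a CPU in @mask other than @but, or -1 if
 * there is none (the kernel returns a value >= nr_cpu_ids instead).
 * This model deterministically picks the lowest such CPU.
 */
static int any_but(uint64_t mask, int but)
{
	assert(but >= 0 && but < 64);
	mask &= ~((uint64_t)1 << but);
	for (int c = 0; c < 64; c++)
		if (mask & ((uint64_t)1 << c))
			return c;
	return -1;
}

/*
 * Hypothetical helper: the idle CPU the patch's logic would move the
 * task to, or -1 if the task is left where it is.
 */
static int pick_affinity_target(uint64_t allowed, uint64_t active,
				uint64_t idle, int cur_cpu)
{
	if (!has_affinity_set(allowed, active))
		return -1;
	return any_but(allowed & idle, cur_cpu);
}
```

Take the example mask (CPU 0, 6-11, 18-22) with CPUs 7 and 20 idle
and the task on CPU 0. pick_affinity_target() returns 7. A task
allowed on every active CPU is never picked, because
has_affinity_set() returns false for it.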

Once such a task has been rebalanced, the load balancer
is not allowed to touch (balance) it until it is
reattached to a runqueue.
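
The lifecycle of the dont_balance flag (introduced in patch 1/4)
can be sketched as follows. struct task_model and the three
functions are hypothetical. Patch 1/4 defines exactly where the
kernel clears the flag. This sketch only assumes it happens when
the task is reattached to a runqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task and its se.dont_balance field. */
struct task_model {
	int cpu;
	bool dont_balance;
};

/* Affinity rebalance: move the task, then shield it from the load balancer. */
static void affinity_move(struct task_model *p, int dst_cpu)
{
	assert(dst_cpu >= 0);
	p->cpu = dst_cpu;
	p->dont_balance = true;
}

/* Regular load balancing must skip tasks placed by affinity rebalance. */
static bool lb_can_migrate(const struct task_model *p)
{
	return !p->dont_balance;
}

/* Assumed: reattaching the task to a runqueue clears the flag. */
static void reattach(struct task_model *p)
{
	p->dont_balance = false;
}
```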

This functionality is active only if the REBALANCE_AFFINITY
scheduler feature is enabled.
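
Scheduler features are normally toggled at runtime through debugfs.
On kernels of this era built with CONFIG_SCHED_DEBUG, writing a
feature name to /sys/kernel/debug/sched_features enables it, and
writing the name with a NO_ prefix disables it (root required).
Below is a minimal helper. It assumes that interface and assumes
this series registers REBALANCE_AFFINITY as a feature.
set_sched_feat() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "NAME" (enable) or "NO_NAME" (disable)
 * to a sched_features file such as /sys/kernel/debug/sched_features.
 * Returns 0 on success, -1 on error.
 */
static int set_sched_feat(const char *path, const char *name, bool enable)
{
	assert(path != NULL && name != NULL);
	if (strlen(name) == 0)
		return -1;

	FILE *f = fopen(path, "w");
	if (!f)
		return -1;

	int n = fprintf(f, "%s%s", enable ? "" : "NO_", name);

	/* fclose() flushes the write; check both results. */
	if (fclose(f) != 0 || n < 0)
		return -1;
	return 0;
}
```

For example, set_sched_feat("/sys/kernel/debug/sched_features",
"REBALANCE_AFFINITY", true) would turn the feature on.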

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---
 kernel/sched/fair.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 99 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 78c4127f2f3a..736e525e189c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6100,16 +6100,22 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	return 0;
 }
 
+static void __detach_task(struct task_struct *p,
+			  struct rq *src_rq, int dst_cpu)
+{
+	lockdep_assert_held(&src_rq->lock);
+
+	p->on_rq = TASK_ON_RQ_MIGRATING;
+	deactivate_task(src_rq, p, 0);
+	set_task_cpu(p, dst_cpu);
+}
+
 /*
  * detach_task() -- detach the task for the migration specified in env
  */
 static void detach_task(struct task_struct *p, struct lb_env *env)
 {
-	lockdep_assert_held(&env->src_rq->lock);
-
-	p->on_rq = TASK_ON_RQ_MIGRATING;
-	deactivate_task(env->src_rq, p, 0);
-	set_task_cpu(p, env->dst_cpu);
+	__detach_task(p, env->src_rq, env->dst_cpu);
 }
 
 /*
@@ -7833,6 +7839,91 @@ void sched_idle_exit(int cpu)
 	}
 }
 
+static bool has_affinity_set(struct task_struct *p, cpumask_var_t mask)
+{
+	if (!cpumask_and(mask, tsk_cpus_allowed(p), cpu_active_mask))
+		return false;
+
+	cpumask_xor(mask, mask, cpu_active_mask);
+	return !cpumask_empty(mask);
+}
+
+static void rebalance_affinity(struct rq *rq)
+{
+	struct task_struct *p;
+	unsigned long flags;
+	cpumask_var_t mask;
+	bool mask_alloc = false;
+
+	/*
+	 * No need to bother if:
+	 * - there's only 1 task on the queue
+	 * - there's no idle cpu at the moment.
+	 */
+	if (rq->nr_running <= 1)
+		return;
+
+	if (!atomic_read(&balance.nr_cpus))
+		return;
+
+	raw_spin_lock_irqsave(&rq->lock, flags);
+
+	list_for_each_entry(p, &rq->cfs_tasks, se.group_node) {
+		struct rq *dst_rq;
+		int cpu;
+
+		/*
+		 * Force affinity balance only if:
+		 * - task is not current one
+		 * - task is not already balanced (p->se.dont_balance is not set)
+		 * - task has cpus_allowed set
+		 * - we have idle cpu ready within task's cpus_allowed
+		 */
+		if (task_running(rq, p))
+			continue;
+
+		if (p->se.dont_balance)
+			continue;
+
+		if (!mask_alloc) {
+			int ret = zalloc_cpumask_var(&mask, GFP_ATOMIC);
+
+			if (WARN_ON_ONCE(!ret))
+				break;
+			mask_alloc = true;
+		}
+
+		if (!has_affinity_set(p, mask))
+			continue;
+
+		if (!cpumask_and(mask, tsk_cpus_allowed(p), balance.idle_cpus_mask))
+			continue;
+
+		cpu = cpumask_any_but(mask, task_cpu(p));
+		if (cpu >= nr_cpu_ids)
+			continue;
+
+		__detach_task(p, rq, cpu);
+		raw_spin_unlock(&rq->lock);
+
+		dst_rq = cpu_rq(cpu);
+
+		raw_spin_lock(&dst_rq->lock);
+		attach_task(dst_rq, p);
+		p->se.dont_balance = true;
+		raw_spin_unlock(&dst_rq->lock);
+
+		local_irq_restore(flags);
+		free_cpumask_var(mask);
+		return;
+	}
+
+	raw_spin_unlock_irqrestore(&rq->lock, flags);
+
+	if (mask_alloc)
+		free_cpumask_var(mask);
+}
+
 #ifdef CONFIG_NO_HZ_COMMON
 /*
  * idle load balancing details
@@ -8077,6 +8168,9 @@ out:
 			nohz.next_balance = rq->next_balance;
 #endif
 	}
+
+	if (sched_feat(REBALANCE_AFFINITY))
+		rebalance_affinity(rq);
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
-- 
2.4.11
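
rebalance_affinity() moves the task in two phases and never holds
both runqueue locks at once. First, __detach_task() runs under the
source lock and leaves the task in TASK_ON_RQ_MIGRATING. Then
attach_task() runs under the destination lock. The single-threaded
model below illustrates only that ordering. rq_model, task_model
and the helpers are hypothetical. The locked flags stand in for
rq->lock, and the asserts play the role of lockdep_assert_held()
and the sanity check in attach_task(). The model does not cover
interrupt disabling (raw_spin_lock_irqsave() and local_irq_restore()
in the patch).

```c
#include <assert.h>
#include <stdbool.h>

enum on_rq_state { TASK_QUEUED, TASK_MIGRATING };

/* Hypothetical runqueue: 'locked' stands in for rq->lock. */
struct rq_model {
	int cpu;
	bool locked;
	int nr_running;
};

struct task_model {
	int cpu;
	enum on_rq_state on_rq;
};

static void rq_lock(struct rq_model *rq)   { assert(!rq->locked); rq->locked = true; }
static void rq_unlock(struct rq_model *rq) { assert(rq->locked); rq->locked = false; }

/* Like __detach_task(): caller must hold the source runqueue lock. */
static void detach(struct task_model *p, struct rq_model *src, int dst_cpu)
{
	assert(src->locked);		/* lockdep_assert_held() analogue */
	p->on_rq = TASK_MIGRATING;
	src->nr_running--;
	p->cpu = dst_cpu;
}

/* Like attach_task(): caller must hold the destination runqueue lock. */
static void attach(struct rq_model *dst, struct task_model *p)
{
	assert(dst->locked);
	assert(p->cpu == dst->cpu);	/* task must already point at dst */
	dst->nr_running++;
	p->on_rq = TASK_QUEUED;
}

/* Two-phase move: only one runqueue lock is held at any time. */
static void move_task(struct task_model *p, struct rq_model *src,
		      struct rq_model *dst)
{
	rq_lock(src);
	detach(p, src, dst->cpu);
	rq_unlock(src);

	/* In flight: on neither runqueue, marked as migrating. */
	assert(p->on_rq == TASK_MIGRATING);

	rq_lock(dst);
	attach(dst, p);
	rq_unlock(dst);
}
```

Because the locks are taken one after the other rather than nested,
the path needs no double_rq_lock()-style lock ordering. The
migrating state marks the window in which the task is on neither
runqueue.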
