[RFCv3 6/8] sched/fair: Tune task wake-up logic to pack jitter tasks

linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Parth Shah <parth@linux.ibm.com>
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: mingo@redhat.com, peterz@infradead.org, dietmar.eggemann@arm.com,
	patrick.bellasi@arm.com
Subject: [RFCv3 6/8] sched/fair: Tune task wake-up logic to pack jitter tasks
Date: Tue, 25 Jun 2019 10:07:24 +0530	[thread overview]
Message-ID: <20190625043726.21490-7-parth@linux.ibm.com> (raw)
In-Reply-To: <20190625043726.21490-1-parth@linux.ibm.com>

The algorithm finds the first non idle core in the system and tries to
place a task in the least utilized CPU in the chosen core. To maintain
cache hotness, work of finding non idle core starts from the prev_cpu,
which also reduces task ping-pong behaviour inside of the core.

This patch defines a new method named core_underutilized() which will
determine if the core utilization is less than 12.5% of its capacity.
Since core with low utilization should not be selected for packing, the
margin of under-utilization is kept at 12.5% of core capacity.

12.5% is an experimental number which identifies whether the core is
considered to be idle or not.  For task packing, the algorithm should
select the best core where the task can be accommodated such that it does
not wake up an idle core. But the jitter tasks should not be placed on the
core which is about to go idle. If the core has aggregated utilization of
<12.5%, it may go idle soon and hence packing on such core should be
ignored. The experiment showed that keeping this threshold to 12.5% gives
better decision capability on not selecting the core which will idle out
soon.

Signed-off-by: Parth Shah <parth@linux.ibm.com>
---
 kernel/sched/fair.c | 116 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 114 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ff3f88d788d8..9d11631ce18c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5318,6 +5318,8 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 /* Working cpumask for: load_balance, load_balance_newidle. */
 DEFINE_PER_CPU(cpumask_var_t, load_balance_mask);
 DEFINE_PER_CPU(cpumask_var_t, select_idle_mask);
+/* A cpumask to find active cores in the system. */
+DEFINE_PER_CPU(cpumask_var_t, turbo_sched_mask);
 
 #ifdef CONFIG_NO_HZ_COMMON
 
@@ -5929,8 +5931,22 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	return cpu;
 }
 
-#ifdef CONFIG_SCHED_SMT
+#ifdef CONFIG_UCLAMP_TASK
+static inline bool is_task_jitter(struct task_struct *p)
+{
+	if (p->is_jitter == 1)
+		return true;
 
+	return false;
+}
+#else
+static inline bool is_task_jitter(struct task_struct *p)
+{
+	return false;
+}
+#endif
+
+#ifdef CONFIG_SCHED_SMT
 #ifndef arch_scale_core_capacity
 static inline unsigned long arch_scale_core_capacity(int first_thread,
 						     unsigned long smt_cap)
@@ -5946,6 +5962,81 @@ static inline unsigned long arch_scale_core_capacity(int first_thread,
 }
 #endif
 
+/*
+ * Core is defined as under-utilized in case if the aggregated utilization of a
+ * all the CPUs in a core is less than 12.5%
+ */
+#define UNDERUTILIZED_THRESHOLD 3
+static inline bool core_underutilized(unsigned long core_util,
+				      unsigned long core_capacity)
+{
+	return core_util < (core_capacity >> UNDERUTILIZED_THRESHOLD);
+}
+
+/*
+ * Try to find a non idle core in the system  with spare capacity
+ * available for task packing, thereby keeping minimal cores active.
+ * Uses first fit algorithm to pack low util jitter tasks on active cores.
+ */
+static int select_non_idle_core(struct task_struct *p, int prev_cpu, int target)
+{
+	struct cpumask *cpus = this_cpu_cpumask_var_ptr(turbo_sched_mask);
+	int iter_cpu, sibling;
+
+	cpumask_and(cpus, cpu_online_mask, p->cpus_ptr);
+
+	for_each_cpu_wrap(iter_cpu, cpus, prev_cpu) {
+		unsigned long core_util = 0;
+		unsigned long core_cap = arch_scale_core_capacity(iter_cpu,
+				capacity_of(iter_cpu));
+		unsigned long est_util = 0, est_util_enqueued = 0;
+		unsigned long util_best_cpu = ULONG_MAX;
+		int best_cpu = iter_cpu;
+		struct cfs_rq *cfs_rq;
+
+		for_each_cpu(sibling, cpu_smt_mask(iter_cpu)) {
+			__cpumask_clear_cpu(sibling, cpus);
+			core_util += cpu_util(sibling);
+
+			/*
+			 * Keep track of least utilized CPU in the core
+			 */
+			if (cpu_util(sibling) < util_best_cpu) {
+				util_best_cpu = cpu_util(sibling);
+				best_cpu = sibling;
+			}
+		}
+
+		/*
+		 * Find if the selected task will fit into this core or not by
+		 * estimating the utilization of the core.
+		 */
+		if (!core_underutilized(core_util, core_cap)) {
+			cfs_rq = &cpu_rq(best_cpu)->cfs;
+			est_util =
+				READ_ONCE(cfs_rq->avg.util_avg) + task_util(p);
+			est_util_enqueued =
+				READ_ONCE(cfs_rq->avg.util_est.enqueued);
+			est_util_enqueued += _task_util_est(p);
+			est_util = max(est_util, est_util_enqueued);
+			est_util = core_util - util_best_cpu + est_util;
+
+			if (est_util < core_cap) {
+				/*
+				 * Try to bias towards prev_cpu to avoid task
+				 * ping-pong behaviour inside the core.
+				 */
+				if (cpumask_test_cpu(prev_cpu,
+						     cpu_smt_mask(iter_cpu)))
+					return prev_cpu;
+
+				return best_cpu;
+			}
+		}
+	}
+
+	return select_idle_sibling(p, prev_cpu, target);
+}
 #endif
 
 /*
@@ -6402,6 +6493,23 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	return -1;
 }
 
+#ifdef CONFIG_SCHED_SMT
+/*
+ * Select all tasks of type 1(jitter) for task packing
+ */
+static inline int turbosched_select_non_idle_core(struct task_struct *p,
+						  int prev_cpu, int target)
+{
+	return select_non_idle_core(p, prev_cpu, target);
+}
+#else
+static inline int turbosched_select_non_idle_core(struct task_struct *p,
+						  int prev_cpu, int target)
+{
+	return select_idle_sibling(p, prev_cpu, target);
+}
+#endif
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
@@ -6467,7 +6575,11 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 	} else if (sd_flag & SD_BALANCE_WAKE) { /* XXX always ? */
 		/* Fast path */
 
-		new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
+		if (is_turbosched_enabled() && unlikely(is_task_jitter(p)))
+			new_cpu = turbosched_select_non_idle_core(p, prev_cpu,
+								  new_cpu);
+		else
+			new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
 
 		if (want_affine)
 			current->recent_used_cpu = cpu;
-- 
2.17.1

next prev parent reply	other threads:[~2019-06-25  4:38 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-25  4:37 [RFCv3 0/8] TurboSched: A scheduler for sustaining Turbo Frequencies for longer durations Parth Shah
2019-06-25  4:37 ` [RFCv3 1/8] sched/core: Add manual jitter classification using sched_setattr syscall Parth Shah
2019-06-25  4:37 ` [RFCv3 2/8] sched: Introduce switch to enable TurboSched mode Parth Shah
2019-06-25  4:37 ` [RFCv3 3/8] sched/core: Update turbo_sched count only when required Parth Shah
2019-06-25  4:37 ` [RFCv3 4/8] sched/fair: Define core capacity to limit task packing Parth Shah
2019-06-25  4:37 ` [RFCv3 5/8] powerpc: Define Core Capacity for POWER systems Parth Shah
2019-06-25  4:37 ` Parth Shah [this message]
2019-06-25  4:37 ` [RFCv3 7/8] sched/fair: Bound non idle core search within LLC domain Parth Shah
2019-06-25  4:37 ` [RFCv3 8/8] powerpc: Set turbo domain to NUMA node for task packing Parth Shah
2019-06-28 13:14 ` [RFCv3 0/8] TurboSched: A scheduler for sustaining Turbo Frequencies for longer durations Patrick Bellasi
2019-06-28 16:42   ` Parth Shah

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:ff3f88d788d dfblob:9d11631ce18 )
 OR (
bs:"[RFCv3 6/8] sched/fair: Tune task wake-up logic to pack jitter tasks" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190625043726.21490-7-parth@linux.ibm.com \
    --to=parth@linux.ibm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=patrick.bellasi@arm.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).