public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: David Vernet <void@manifault.com>
To: linux-kernel@vger.kernel.org
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, vschneid@redhat.com, gautham.shenoy@amd.com,
	kprateek.nayak@amd.com, aaron.lu@intel.com, clm@meta.com,
	tj@kernel.org, roman.gushchin@linux.dev, kernel-team@meta.com
Subject: [PATCH v2 4/7] sched/fair: Add SHARED_RUNQ sched feature and skeleton calls
Date: Mon, 10 Jul 2023 15:03:39 -0500	[thread overview]
Message-ID: <20230710200342.358255-5-void@manifault.com> (raw)
In-Reply-To: <20230710200342.358255-1-void@manifault.com>

For certain workloads in CFS, CPU utilization is of the upmost
importance. For example, at Meta, our main web workload benefits from a
1 - 1.5% improvement in RPS, and a 1 - 2% improvement in p99 latency,
when CPU utilization is pushed as high as possible.

This is likely something that would be useful for any workload with long
slices, or for which avoiding migration is unlikely to result in
improved cache locality.

We will soon be enabling more aggressive load balancing via a new
feature called shared_runq, which places tasks into a FIFO queue on the
wakeup path, and then dequeues them in newidle_balance(). We don't want
to enable the feature by default, so this patch defines and declares a
new scheduler feature called SHARED_RUNQ which is disabled by default.
In addition, we add some calls to empty / skeleton functions in the
relevant fair codepaths where shared_runq will be utilized.

A set of future patches will implement these functions, and enable
shared_runq for both single and multi socket / CCX architectures.

Note as well that in future patches, the location of these calls may
change. For example, if we end up enqueueing tasks in a shared runqueue
any time they become runnable, we'd move the calls from
enqueue_task_fair() and pick_next_task_fair() to __enqueue_entity() and
__dequeue_entity() respectively.

Originally-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: David Vernet <void@manifault.com>
---
 kernel/sched/fair.c     | 32 ++++++++++++++++++++++++++++++++
 kernel/sched/features.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e882b7bf5b4..f7967be7646c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -140,6 +140,18 @@ static int __init setup_sched_thermal_decay_shift(char *str)
 __setup("sched_thermal_decay_shift=", setup_sched_thermal_decay_shift);
 
 #ifdef CONFIG_SMP
+static void shared_runq_enqueue_task(struct rq *rq, struct task_struct *p,
+				     int enq_flags)
+{}
+
+static int shared_runq_pick_next_task(struct rq *rq, struct rq_flags *rf)
+{
+	return 0;
+}
+
+static void shared_runq_dequeue_task(struct task_struct *p)
+{}
+
 /*
  * For asym packing, by default the lower numbered CPU has higher priority.
  */
@@ -162,6 +174,13 @@ int __weak arch_asym_cpu_priority(int cpu)
  * (default: ~5%)
  */
 #define capacity_greater(cap1, cap2) ((cap1) * 1024 > (cap2) * 1078)
+#else
+static void shared_runq_enqueue_task(struct rq *rq, struct task_struct *p,
+				     int enq_flags)
+{}
+
+static void shared_runq_dequeue_task(struct task_struct *p)
+{}
 #endif
 
 #ifdef CONFIG_CFS_BANDWIDTH
@@ -6386,6 +6405,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	if (!task_new)
 		update_overutilized_status(rq);
 
+	if (sched_feat(SHARED_RUNQ))
+		shared_runq_enqueue_task(rq, p, flags);
+
 enqueue_throttle:
 	assert_list_leaf_cfs_rq(rq);
 
@@ -6467,6 +6489,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 dequeue_throttle:
 	util_est_update(&rq->cfs, p, task_sleep);
 	hrtick_update(rq);
+
+	if (sched_feat(SHARED_RUNQ))
+		shared_runq_dequeue_task(p);
 }
 
 #ifdef CONFIG_SMP
@@ -8173,6 +8198,9 @@ done: __maybe_unused;
 
 	update_misfit_status(p, rq);
 
+	if (sched_feat(SHARED_RUNQ))
+		shared_runq_dequeue_task(p);
+
 	return p;
 
 idle:
@@ -11843,6 +11871,9 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	if (!cpu_active(this_cpu))
 		return 0;
 
+	if (sched_feat(SHARED_RUNQ) && shared_runq_pick_next_task(this_rq, rf))
+		return -1;
+
 	/*
 	 * We must set idle_stamp _before_ calling idle_balance(), such that we
 	 * measure the duration of idle_balance() as idle time.
@@ -12343,6 +12374,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
 
 static void switched_from_fair(struct rq *rq, struct task_struct *p)
 {
+	shared_runq_dequeue_task(p);
 	detach_task_cfs_rq(p);
 }
 
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index ee7f23c76bd3..cd5db1a24181 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -101,3 +101,4 @@ SCHED_FEAT(LATENCY_WARN, false)
 
 SCHED_FEAT(ALT_PERIOD, true)
 SCHED_FEAT(BASE_SLICE, true)
+SCHED_FEAT(SHARED_RUNQ, false)
-- 
2.40.1


  parent reply	other threads:[~2023-07-10 20:04 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-10 20:03 [PATCH v2 0/7] sched: Implement shared runqueue in CFS David Vernet
2023-07-10 20:03 ` [PATCH v2 1/7] sched: Expose move_queued_task() from core.c David Vernet
2023-07-10 20:03 ` [PATCH v2 2/7] sched: Move is_cpu_allowed() into sched.h David Vernet
2023-07-10 20:03 ` [PATCH v2 3/7] sched: Check cpu_active() earlier in newidle_balance() David Vernet
2023-07-10 20:03 ` David Vernet [this message]
2023-07-11  9:45   ` [PATCH v2 4/7] sched/fair: Add SHARED_RUNQ sched feature and skeleton calls Peter Zijlstra
2023-07-11 16:19     ` David Vernet
2023-07-12  8:39   ` Abel Wu
2023-07-12 21:34     ` David Vernet
2023-07-10 20:03 ` [PATCH v2 5/7] sched: Implement shared runqueue in CFS David Vernet
2023-07-11 10:18   ` Peter Zijlstra
2023-07-11 16:26     ` David Vernet
2023-07-12  6:00   ` Gautham R. Shenoy
2023-07-12 19:13     ` David Vernet
2023-07-12 10:47   ` Abel Wu
2023-07-12 22:16     ` David Vernet
2023-07-13  3:43       ` Abel Wu
2023-07-13  4:05         ` David Vernet
2023-07-13  7:58   ` Aaron Lu
2023-07-13  8:29   ` Peter Zijlstra
2023-07-10 20:03 ` [PATCH v2 6/7] sched: Shard per-LLC shared runqueues David Vernet
2023-07-11 10:49   ` Peter Zijlstra
2023-07-11 19:57     ` David Vernet
2023-07-12 10:06       ` Gautham R. Shenoy
2023-07-12 12:22         ` Peter Zijlstra
2023-07-10 20:03 ` [PATCH v2 7/7] sched: Move shared_runq to __{enqueue,dequeue}_entity() David Vernet
2023-07-11 10:51   ` Peter Zijlstra
2023-07-11 16:30     ` David Vernet
2023-07-11 11:42 ` [PATCH v2 0/7] sched: Implement shared runqueue in CFS Peter Zijlstra
2023-07-11 21:33   ` David Vernet
2023-07-21  9:12 ` Gautham R. Shenoy
2023-07-25 20:22   ` David Vernet
2023-08-02  6:32     ` Gautham R. Shenoy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230710200342.358255-5-void@manifault.com \
    --to=void@manifault.com \
    --cc=aaron.lu@intel.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=clm@meta.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=gautham.shenoy@amd.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@meta.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=roman.gushchin@linux.dev \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox