From: Joel Fernandes <joel@joelfernandes.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Julien Desfossez <jdesfossez@digitalocean.com>,
	Vineeth Pillai <viremana@linux.microsoft.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	Dhaval Giani <dhaval.giani@oracle.com>,
	Chris Hyser <chris.hyser@oracle.com>,
	Nishanth Aravamudan <naravamudan@digitalocean.com>,
	mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
	linux-kernel@vger.kernel.org, fweisbec@gmail.com,
	keescook@chromium.org, kerrnel@google.com,
	Phil Auld <pauld@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	vineeth@bitbyteword.org, Chen Yu <yu.c.chen@intel.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Agata Gruza <agata.gruza@intel.com>,
	Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>,
	graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com,
	rostedt@goodmis.org, derkling@google.com, benbjiang@tencent.com,
	Aaron Lu <ziqian.lzq@antfin.com>
Subject: Re: [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison
Date: Fri, 25 Sep 2020 11:02:09 -0400
Message-ID: <20200925150209.GA3567448@google.com>
In-Reply-To: <20200923015243.GA1739137@google.com>

On Tue, Sep 22, 2020 at 09:52:43PM -0400, Joel Fernandes wrote:
> On Tue, Sep 22, 2020 at 09:46:22PM -0400, Joel Fernandes wrote:
> > On Fri, Aug 28, 2020 at 11:29:27PM +0200, Peter Zijlstra wrote:
> > > 
> > > 
> > > This is still a horrible patch..
> > 
> > Hi Peter,
> > I wrote a new patch similar to this one and it fares much better in my tests.
> > It is based on Aaron's idea, but I do the sync only during force-idle and not
> > during enqueue. I also yanked the whole 'core wide min_vruntime' crap. There
> > is a regressing test which improves quite a bit with my patch (results below):
> > 
> > Aaron, Vineeth, Chris, any other thoughts? This patch is based on Google's
> > 4.19 device kernel, so it will require some massaging to apply to the
> > mainline/v7 series. I will provide an updated patch based on v7 later.
> > 
> > (This works only for SMT2; maybe we can generalize it further.)
> > --------8<-----------
> > 
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > Subject: [PATCH] sched: Sync the min_vruntime of cores when the system enters
> >  force-idle
> > 
> > This patch provides a vruntime-based way to compare the priorities of two CFS
> > tasks, whether they are on the same CPU or on different threads of the same core.
> > 
> > It is based on Aaron Lu's patch with some important differences. Namely,
> > the vruntime is synced only when the CPU goes into force-idle. I also removed
> > the notion of a core-wide min_vruntime.
> > 
> > Also, I don't care how long a CPU in a core is force-idled. I do the sync
> > whenever force-idle starts, essentially bringing both SMT siblings to a common
> > time base. After that point, selection can happen as usual.
> > 
> > When running an Android audio test with the patch, the perf sched latency output is:
> > 
> > -----------------------------------------------------------------------------------------------------------------
> > Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
> > -----------------------------------------------------------------------------------------------------------------
> > FinalizerDaemon:(2)   |     23.969 ms |      969 | avg:    0.504 ms | max:  162.020 ms | max at:   1294.327339 s
> > HeapTaskDaemon:(3)    |   2421.287 ms |     4733 | avg:    0.131 ms | max:   96.229 ms | max at:   1302.343366 s
> > adbd:(3)              |      6.101 ms |       79 | avg:    1.105 ms | max:   84.923 ms | max at:   1294.431284 s
> > 
> > Without this patch and with Aubrey's initial patch (in the v5 series), the max delay looks much better:
> > 
> > -----------------------------------------------------------------------------------------------------------------
> > Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |
> > -----------------------------------------------------------------------------------------------------------------
> > HeapTaskDaemon:(2)    |   2602.109 ms |     4025 | avg:    0.231 ms | max:   19.152 ms | max at:    522.903934 s
> > surfaceflinger:7478   |     18.994 ms |     1206 | avg:    0.189 ms | max:   17.375 ms | max at:    520.523061 s
> > ksoftirqd/3:30        |      0.093 ms |        5 | avg:    3.328 ms | max:   16.567 ms | max at:    522.903871 s
> 
> I messed up the changelog. To clarify: the first result is without the
> patch (bad) and the second result is with the patch (good).

Here's another approach worth considering, which I was discussing with
Vineeth: freeze the min_vruntime of the CPUs when the core enters
force-idle. I think this is similar to Peter's suggestion.

This approach is doing quite well in my real-world audio tests. The patch
applies on top of our ChromeOS 4.19 kernel tree [1] (which has the v5 series).

Any thoughts or reviews are most welcome, especially from Peter :)

[1] https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-4.19

---8<-----------------------

From: Joel Fernandes <joelaf@google.com>
Subject: [PATCH] Sync the min_vruntime of cores when the system enters force-idle

---
 kernel/sched/core.c  | 24 +++++++++++++++++-
 kernel/sched/fair.c  | 59 +++++++-------------------------------------
 kernel/sched/sched.h |  1 +
 3 files changed, 33 insertions(+), 51 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 715391c418d8..4ab680319a6b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4073,6 +4073,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	const struct cpumask *smt_mask;
 	int i, j, cpu, occ = 0;
 	bool need_sync = false;
+	bool fi_before = false;
 
 	cpu = cpu_of(rq);
 	if (cpu_is_offline(cpu))
@@ -4138,6 +4139,16 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			update_rq_clock(rq_i);
 	}
 
+	fi_before = need_sync;
+	if (!need_sync) {
+		for_each_cpu(i, smt_mask) {
+			struct rq *rq_i = cpu_rq(i);
+
+			/* Reset the snapshot if core is no longer in force-idle. */
+			rq_i->cfs.min_vruntime_fi = rq_i->cfs.min_vruntime;
+		}
+	}
+
 	/*
 	 * Try and select tasks for each sibling in decending sched_class
 	 * order.
@@ -4247,6 +4258,7 @@ next_class:;
 	 * their task. This ensures there is no inter-sibling overlap between
 	 * non-matching user state.
 	 */
+	need_sync = false;
 	for_each_cpu(i, smt_mask) {
 		struct rq *rq_i = cpu_rq(i);
 
@@ -4255,8 +4267,10 @@ next_class:;
 
 		WARN_ON_ONCE(!rq_i->core_pick);
 
-		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
+		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running) {
 			rq_i->core_forceidle = true;
+			need_sync = true;
+		}
 
 		rq_i->core_pick->core_occupation = occ;
 
@@ -4270,6 +4284,14 @@ next_class:;
 		WARN_ON_ONCE(!cookie_match(next, rq_i->core_pick));
 	}
 
+	if (!fi_before && need_sync) {
+		for_each_cpu(i, smt_mask) {
+			struct rq *rq_i = cpu_rq(i);
+
+			/* Snapshot if core is in force-idle. */
+			rq_i->cfs.min_vruntime_fi = rq_i->cfs.min_vruntime;
+		}
+	}
 done:
 	set_next_task(rq, next);
 	return next;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 23d032ab62d8..3d7c822bb5fb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -479,59 +479,17 @@ static inline struct cfs_rq *core_cfs_rq(struct cfs_rq *cfs_rq)
 
 static inline u64 cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
 {
-	if (!sched_core_enabled(rq_of(cfs_rq)))
-		return cfs_rq->min_vruntime;
-
-#ifdef CONFIG_SCHED_CORE
-	if (is_root_cfs_rq(cfs_rq))
-		return core_cfs_rq(cfs_rq)->min_vruntime;
-#endif
 	return cfs_rq->min_vruntime;
 }
 
-#ifdef CONFIG_SCHED_CORE
-static void coresched_adjust_vruntime(struct cfs_rq *cfs_rq, u64 delta)
-{
-	struct sched_entity *se, *next;
-
-	if (!cfs_rq)
-		return;
-
-	cfs_rq->min_vruntime -= delta;
-	rbtree_postorder_for_each_entry_safe(se, next,
-			&cfs_rq->tasks_timeline.rb_root, run_node) {
-		if (se->vruntime > delta)
-			se->vruntime -= delta;
-		if (se->my_q)
-			coresched_adjust_vruntime(se->my_q, delta);
-	}
-}
-
-static void update_core_cfs_rq_min_vruntime(struct cfs_rq *cfs_rq)
-{
-	struct cfs_rq *cfs_rq_core;
-
-	if (!sched_core_enabled(rq_of(cfs_rq)))
-		return;
-
-	if (!is_root_cfs_rq(cfs_rq))
-		return;
-
-	cfs_rq_core = core_cfs_rq(cfs_rq);
-	if (cfs_rq_core != cfs_rq &&
-	    cfs_rq->min_vruntime < cfs_rq_core->min_vruntime) {
-		u64 delta = cfs_rq_core->min_vruntime - cfs_rq->min_vruntime;
-		coresched_adjust_vruntime(cfs_rq_core, delta);
-	}
-}
-#endif
-
 #ifdef CONFIG_FAIR_GROUP_SCHED
 bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
 {
+	bool samecpu = task_cpu(a) == task_cpu(b);
 	struct sched_entity *sea = &a->se;
 	struct sched_entity *seb = &b->se;
-	bool samecpu = task_cpu(a) == task_cpu(b);
+	struct cfs_rq *cfs_rqa;
+	struct cfs_rq *cfs_rqb;
 	s64 delta;
 
 	if (samecpu) {
@@ -555,8 +513,13 @@ bool cfs_prio_less(struct task_struct *a, struct task_struct *b)
 		sea = sea->parent;
 	while (seb->parent)
 		seb = seb->parent;
-	delta = (s64)(sea->vruntime - seb->vruntime);
 
+	cfs_rqa = sea->cfs_rq;
+	cfs_rqb = seb->cfs_rq;
+
+	/* normalize vruntime WRT their rq's base */
+	delta = (s64)(sea->vruntime - seb->vruntime) +
+		(s64)(cfs_rqb->min_vruntime_fi - cfs_rqa->min_vruntime_fi);
 out:
 	return delta > 0;
 }
@@ -620,10 +583,6 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 	/* ensure we never gain time by being placed backwards. */
 	cfs_rq->min_vruntime = max_vruntime(cfs_rq_min_vruntime(cfs_rq), vruntime);
 
-#ifdef CONFIG_SCHED_CORE
-	update_core_cfs_rq_min_vruntime(cfs_rq);
-#endif
-
 #ifndef CONFIG_64BIT
 	smp_wmb();
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d09cfbd746e5..45c8ce5c2333 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -499,6 +499,7 @@ struct cfs_rq {
 
 	u64			exec_clock;
 	u64			min_vruntime;
+	u64			min_vruntime_fi;
 #ifndef CONFIG_64BIT
 	u64			min_vruntime_copy;
 #endif
-- 
2.28.0.709.gb0816b6eb0-goog
