From: Aaron Lu <aaron.lwe@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
"Nishanth Aravamudan" <naravamudan@digitalocean.com>,
"Julien Desfossez" <jdesfossez@digitalocean.com>,
"Tim Chen" <tim.c.chen@linux.intel.com>,
"Ingo Molnar" <mingo@kernel.org>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Paul Turner" <pjt@google.com>,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"Aaron Lu" <aaron.lu@linux.alibaba.com>,
"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
"Frédéric Weisbecker" <fweisbec@gmail.com>,
"Kees Cook" <keescook@chromium.org>,
"Greg Kerr" <kerrnel@google.com>, "Phil Auld" <pauld@redhat.com>,
"Aubrey Li" <aubrey.intel@gmail.com>,
"Li, Aubrey" <aubrey.li@linux.intel.com>,
"Valentin Schneider" <valentin.schneider@arm.com>,
"Mel Gorman" <mgorman@techsingularity.net>,
"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Joel Fernandes" <joelaf@google.com>,
"Joel Fernandes" <joel@joelfernandes.org>
Subject: Re: [PATCH updated v2] sched/fair: core wide cfs task priority comparison
Date: Fri, 8 May 2020 16:44:19 +0800
Message-ID: <20200508084419.GA120223@aaronlu-desktop>
In-Reply-To: <20200506143506.GH5298@hirez.programming.kicks-ass.net>

On Wed, May 06, 2020 at 04:35:06PM +0200, Peter Zijlstra wrote:
>
> Sorry for being verbose; I've been procrastinating replying, and in
> doing so the things I wanted to say kept growing.
>
> On Fri, Apr 24, 2020 at 10:24:43PM +0800, Aaron Lu wrote:
>
> > To make this work, the root level sched entities' vruntime of the two
> > threads must be directly comparable. So one of the hyperthread's root
> > cfs_rq's min_vruntime is chosen as the core wide one and all root level
> > sched entities' vruntime is normalized against it.
>
> > +/*
> > + * This is called in stop machine context so no need to take the rq lock.
> > + *
> > + * Core scheduling is going to be enabled and the root level sched entities
> > + * of both siblings will use cfs_rq->min_vruntime as the common cfs_rq
> > + * min_vruntime, so it's necessary to normalize vruntime of existing root
> > + * level sched entities in sibling_cfs_rq.
> > + *
> > + * Update of sibling_cfs_rq's min_vruntime isn't necessary as we will be
> > + * only using cfs_rq->min_vruntime during the entire run of core scheduling.
> > + */
> > +void sched_core_normalize_se_vruntime(int cpu)
> > +{
> > + struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
> > + int i;
> > +
> > + for_each_cpu(i, cpu_smt_mask(cpu)) {
> > + struct sched_entity *se, *next;
> > + struct cfs_rq *sibling_cfs_rq;
> > + s64 delta;
> > +
> > + if (i == cpu)
> > + continue;
> > +
> > + sibling_cfs_rq = &cpu_rq(i)->cfs;
> > + if (!sibling_cfs_rq->nr_running)
> > + continue;
> > +
> > + delta = cfs_rq->min_vruntime - sibling_cfs_rq->min_vruntime;
> > + rbtree_postorder_for_each_entry_safe(se, next,
> > + &sibling_cfs_rq->tasks_timeline.rb_root,
> > + run_node) {
> > + se->vruntime += delta;
> > + }
> > + }
> > +}
>
> Aside from this being way too complicated for what it does -- you
> could've saved the min_vruntime for each rq and compared them with
> subtraction -- it is also terminally broken afaict.
>
> Consider any infeasible weight scenario. Take for instance two tasks,
> each bound to their respective sibling, one with weight 1 and one with
> weight 2. Then the lower weight task will run ahead of the higher weight
> task without bound.

I don't follow how this could happen. Even if the lower weight task runs
first, after some time the higher weight task will get its turn, and
from then on the higher weight task will get more chance to run (due to
its higher weight and thus slower accumulation of vruntime).
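
The rate argument above can be sketched numerically. This is only a
simplified model of how CFS scales vruntime by load weight, not the
kernel's actual fixed-point calc_delta_fair() implementation; NICE_0_LOAD
below is a stand-in for the kernel constant:

```c
#include <assert.h>

/* Simplified stand-in for the kernel's NICE_0_LOAD weight constant. */
#define NICE_0_LOAD 1024LL

/*
 * vruntime advance for delta_exec ns of runtime at a given load weight.
 * Lighter tasks age their vruntime faster; heavier tasks age slower,
 * so the heavier task keeps getting picked once both are queued.
 */
static long long vruntime_delta(long long delta_exec, long long weight)
{
	return delta_exec * NICE_0_LOAD / weight;
}
```

With delta_exec held fixed, a weight-2048 task advances its vruntime half
as fast as a weight-1024 task, which is why the higher weight task should
not fall behind without bound once it gets on the queue.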
We used to have the following patch as a standalone one in v4:

sched/fair: Wake up forced idle siblings if needed
https://lore.kernel.org/lkml/cover.1572437285.git.vpillai@digitalocean.com/T/#md22d25d0e2932d059013e9b56600d8a847b02a13

which originates from:
https://lore.kernel.org/lkml/20190725143344.GD992@aaronlu/

In this series, it seems to be merged into:

[RFC PATCH 07/13] sched: Add core wide task selection and scheduling
https://lore.kernel.org/lkml/e942da7fd881977923463f19648085c1bfaa37f8.1583332765.git.vpillai@digitalocean.com/

My local test shows that when two cgroups' shares are both set to 1024
and each is bound to one sibling of a core, and a cpu intensive task is
started in each cgroup, each task consumes 50% cpu. When one cgroup's
share is set to 512, it consumes about 33% while the other consumes
67%, as expected.
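
The expected split is plain proportional-share arithmetic: each cgroup
should get shares_i / sum(shares) of the core. A tiny illustrative helper
(my naming, not a kernel API), truncating to whole percent:

```c
#include <assert.h>

/*
 * Expected CPU percentage for one of two competing cgroups, given the
 * cpu.shares of both: my_shares / (my_shares + other_shares), as percent.
 */
static int expected_share_pct(long my_shares, long other_shares)
{
	return (int)(my_shares * 100 / (my_shares + other_shares));
}
```

For 1024 vs 1024 this gives 50/50; for 512 vs 1024 it gives roughly
33/66, matching the observed ~33%/67% split above.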

I think the current patch works fine when 2 differently tagged tasks
are competing for CPU, but when there are 3 or more tasks, things can
get less fair.