linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Balbir Singh <bsingharora@gmail.com>
Cc: Nishanth Aravamudan <naravamudan@digitalocean.com>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Vineeth Pillai <viremana@linux.microsoft.com>,
	Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	tglx@linutronix.de, linux-kernel@vger.kernel.org,
	mingo@kernel.org, torvalds@linux-foundation.org,
	fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
	Phil Auld <pauld@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	vineeth@bitbyteword.org, Chen Yu <yu.c.chen@intel.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Agata Gruza <agata.gruza@intel.com>,
	Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>,
	graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com,
	pjt@google.com, rostedt@goodmis.org, derkling@google.com,
	benbjiang@tencent.com,
	Alexandre Chartre <alexandre.chartre@oracle.com>,
	James.Bottomley@hansenpartnership.com, OWeisse@umich.edu,
	Dhaval Giani <dhaval.giani@oracle.com>,
	Junaid Shahid <junaids@google.com>,
	jsbarnes@google.com, chris.hyser@oracle.com,
	Ben Segall <bsegall@google.com>, Josh Don <joshdon@google.com>,
	Hao Luo <haoluo@google.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Aubrey Li <aubrey.li@linux.intel.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Tim Chen <tim.c.chen@intel.com>
Subject: Re: [PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling
Date: Tue, 1 Dec 2020 12:49:32 -0500	[thread overview]
Message-ID: <20201201174932.GA201514@google.com> (raw)
In-Reply-To: <20201125230519.GC163610@balbir-desktop>

On Thu, Nov 26, 2020 at 10:05:19AM +1100, Balbir Singh wrote:
> On Tue, Nov 24, 2020 at 01:30:38PM -0500, Joel Fernandes wrote:
> > On Mon, Nov 23, 2020 at 09:41:23AM +1100, Balbir Singh wrote:
> > > On Tue, Nov 17, 2020 at 06:19:40PM -0500, Joel Fernandes (Google) wrote:
> > > > From: Peter Zijlstra <peterz@infradead.org>
> > > > 
> > > > The rationale is as follows. In the core-wide pick logic, even if
> > > > need_sync == false, we need to go look at other CPUs (non-local CPUs) to
> > > > see if they could be running RT.
> > > > 
> > > > Say the RQs in a particular core look like this:
> > > > Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.
> > > > 
> > > > rq0            rq1
> > > > CFS1 (tagged)  RT1 (not tag)
> > > > CFS2 (tagged)
> > > > 
> > > > Say schedule() runs on rq0. Now, it will enter the above loop and
> > > > pick_task(RT) will return NULL for 'p'. It will enter the above if() block
> > > > and see that need_sync == false and will skip RT entirely.
> > > > 
> > > > The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
> > > > rq0             rq1
> > > > CFS1            IDLE
> > > > 
> > > > When it should have selected:
> > > > rq0             r1
> > > > IDLE            RT
> > > > 
> > > > Joel saw this issue on real-world usecases in ChromeOS where an RT task
> > > > gets constantly force-idled and breaks RT. Lets cure it.
> > > > 
> > > > NOTE: This problem will be fixed differently in a later patch. It just
> > > >       kept here for reference purposes about this issue, and to make
> > > >       applying later patches easier.
> > > >
> > > 
> > > The changelog is hard to read, it refers to above if(), whereas there
> > > is no code snippet in the changelog.
> > 
> > Yeah sorry, it comes from this email where I described the issue:
> > http://lore.kernel.org/r/20201023175724.GA3563800@google.com
> > 
> > I corrected the changelog and appended the patch below. Also pushed it to:
> > https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched
> > 
> > > Also, from what I can see following
> > > the series, p->core_cookie is not yet set anywhere (unless I missed it),
> > > so fixing it in here did not make sense just reading the series.
> > 
> > The interface patches for core_cookie are added later, that's how it is. The
> > infrastructure comes first here. It would also not make sense to add
> > interface first as well so I think the current ordering is fine.
> >
> 
> Some comments below to help make the code easier to understand
> 
> > ---8<-----------------------
> > 
> > From: Peter Zijlstra <peterz@infradead.org>
> > Subject: [PATCH] sched: Fix priority inversion of cookied task with sibling
> > 
> > The rationale is as follows. In the core-wide pick logic, even if
> > need_sync == false, we need to go look at other CPUs (non-local CPUs) to
> > see if they could be running RT.
> > 
> > Say the RQs in a particular core look like this:
> > Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task.
> > 
> > rq0            rq1
> > CFS1 (tagged)  RT1 (not tag)
> > CFS2 (tagged)
> > 
> > The end result of the selection will be (say prio(CFS1) > prio(CFS2)):
> > rq0             rq1
> > CFS1            IDLE
> > 
> > When it should have selected:
> > rq0             r1
> > IDLE            RT
> > 
> > Fix this issue by forcing need_sync and restarting the search if a
> > cookied task was discovered. This will avoid this optimization from
> > making incorrect picks.
> > 
> > Joel saw this issue on real-world usecases in ChromeOS where an RT task
> > gets constantly force-idled and breaks RT. Lets cure it.
> > 
> > NOTE: This problem will be fixed differently in a later patch. It just
> >       kept here for reference purposes about this issue, and to make
> >       applying later patches easier.
> > 
> > Reported-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > ---
> >  kernel/sched/core.c | 25 ++++++++++++++++---------
> >  1 file changed, 16 insertions(+), 9 deletions(-)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 4ee4902c2cf5..53af817740c0 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5195,6 +5195,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> >  	need_sync = !!rq->core->core_cookie;
> >  
> >  	/* reset state */
> > +reset:
> >  	rq->core->core_cookie = 0UL;
> >  	if (rq->core->core_forceidle) {
> >  		need_sync = true;
> > @@ -5242,14 +5243,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> >  				/*
> >  				 * If there weren't no cookies; we don't need to
> >  				 * bother with the other siblings.
> > -				 * If the rest of the core is not running a tagged
> > -				 * task, i.e.  need_sync == 0, and the current CPU
> > -				 * which called into the schedule() loop does not
> > -				 * have any tasks for this class, skip selecting for
> > -				 * other siblings since there's no point. We don't skip
> > -				 * for RT/DL because that could make CFS force-idle RT.
> >  				 */
> > -				if (i == cpu && !need_sync && class == &fair_sched_class)
> > +				if (i == cpu && !need_sync)
> >  					goto next_class;
> >  
> >  				continue;
> > @@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> >  			 * Optimize the 'normal' case where there aren't any
> >  			 * cookies and we don't need to sync up.
> >  			 */
> > -			if (i == cpu && !need_sync && !p->core_cookie) {
> > +			if (i == cpu && !need_sync) {
> > +				if (p->core_cookie) {
> > +					/*
> > +					 * This optimization is only valid as
> > +					 * long as there are no cookies
> 
> This is not entirely true, need_sync is a function of core cookies, so I
> think this needs more clarification, it sounds like we enter this when
> the core has no cookies, but the task has a core_cookie? The term cookie
> is quite overloaded when used in the context of core vs task.
> 
> Effectively from what I understand this means that p wants to be
> coscheduled, but the core itself is not coscheduling anything at the
> moment, so we need to see if we should do a sync and that sync might
> cause p to get kicked out and a higher priority class to come in?

Yeah so about need_sync, it is basically a flag that says if the HT running
the schedule() loop needs to bother with siblings.

need_sync is true only in following conditions:
- A cookied task is running on any HT on the core.
- Any HT in the core is force idled.

The above code comment you referred to is now reworked. That was for the case
where we discovered during local selection that we found a task with a
cookie so now we have to do a core-wide scan (need_sync = false before but
now it becomes true and we start over). This optimization is done sligtly
differently now, we run ->pick_task() on every class of the local CPU until
we find something, if we find something with a cookie then we do core-wide
selection.

The latest version of this code is now in Peter's branch:
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/tree/kernel/sched/core.c?id=6288c0a49631ce6b53eeab7021a43e49c4c4d436

- Joel


  parent reply	other threads:[~2020-12-01 17:50 UTC|newest]

Thread overview: 150+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-17 23:19 [PATCH -tip 00/32] Core scheduling (v9) Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 01/32] sched: Wrap rq::lock access Joel Fernandes (Google)
2020-11-19 23:31   ` Singh, Balbir
2020-11-20 16:55     ` Joel Fernandes
2020-11-22  8:52       ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 02/32] sched: Introduce sched_class::pick_task() Joel Fernandes (Google)
2020-11-19 23:56   ` Singh, Balbir
2020-11-20 16:58     ` Joel Fernandes
2020-11-25 23:19       ` Balbir Singh
2020-11-25 16:28   ` Vincent Guittot
2020-11-26  9:07     ` Peter Zijlstra
2020-11-26 10:17       ` Vincent Guittot
2020-11-26 12:40         ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 03/32] sched/fair: Fix pick_task_fair crashes due to empty rbtree Joel Fernandes (Google)
2020-11-20 10:15   ` Singh, Balbir
2020-11-20 18:11     ` Vineeth Pillai
2020-11-23 22:31       ` Balbir Singh
2020-11-24  8:31     ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 04/32] sched: Core-wide rq->lock Joel Fernandes (Google)
2020-11-22  9:11   ` Balbir Singh
2020-11-24  8:16     ` Peter Zijlstra
2020-11-26  0:35       ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 05/32] sched/fair: Add a few assertions Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 06/32] sched: Basic tracking of matching tasks Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 07/32] sched: Add core wide task selection and scheduling Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 08/32] sched/fair: Fix forced idle sibling starvation corner case Joel Fernandes (Google)
2020-11-22 10:35   ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 09/32] sched/fair: Snapshot the min_vruntime of CPUs on force idle Joel Fernandes (Google)
2020-11-22 11:44   ` Balbir Singh
2020-11-23 12:31     ` Vineeth Pillai
2020-11-23 23:31       ` Balbir Singh
2020-11-24  9:09         ` Peter Zijlstra
2020-11-25 23:17           ` Balbir Singh
2020-11-26  8:23             ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 10/32] sched: Fix priority inversion of cookied task with sibling Joel Fernandes (Google)
2020-11-22 22:41   ` Balbir Singh
2020-11-24 18:30     ` Joel Fernandes
2020-11-25 23:05       ` Balbir Singh
2020-11-26  8:29         ` Peter Zijlstra
2020-11-26 22:27           ` Balbir Singh
2020-12-01 17:49         ` Joel Fernandes [this message]
2020-11-17 23:19 ` [PATCH -tip 11/32] sched: Enqueue task into core queue only after vruntime is updated Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 12/32] sched: Simplify the core pick loop for optimized case Joel Fernandes (Google)
2020-11-24 12:04   ` Peter Zijlstra
2020-11-24 17:04     ` Joel Fernandes
2020-11-25  8:37       ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 13/32] sched: Trivial forced-newidle balancer Joel Fernandes (Google)
2020-11-23  4:38   ` Balbir Singh
2020-11-23 15:07     ` Li, Aubrey
2020-11-23 23:35       ` Balbir Singh
2020-11-24  0:32         ` Li, Aubrey
2020-11-25 21:28           ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 14/32] sched: migration changes for core scheduling Joel Fernandes (Google)
2020-11-22 23:54   ` Balbir Singh
2020-11-23  4:36     ` Li, Aubrey
2020-11-24 15:42       ` Peter Zijlstra
2020-11-25  3:12         ` Li, Aubrey
2020-11-25 22:57           ` Balbir Singh
2020-11-26  3:20             ` Li, Aubrey
2020-11-26  8:32               ` Balbir Singh
2020-11-26  9:26                 ` Li, Aubrey
2020-11-30  9:33                   ` Balbir Singh
2020-11-30 12:29                     ` Li, Aubrey
2020-12-02 14:09                       ` Li, Aubrey
2020-12-03  1:06                         ` Li, Aubrey
2020-11-30 10:35   ` Vincent Guittot
2020-11-30 12:32     ` Li, Aubrey
2020-11-17 23:19 ` [PATCH -tip 15/32] sched: Improve snapshotting of min_vruntime for CGroups Joel Fernandes (Google)
2020-11-24 10:27   ` Peter Zijlstra
2020-11-24 17:07     ` Joel Fernandes
2020-11-25  8:41       ` Peter Zijlstra
2020-11-24 10:41   ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 16/32] irq_work: Cleanup Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 17/32] arch/x86: Add a new TIF flag for untrusted tasks Joel Fernandes (Google)
2020-11-23  5:18   ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 18/32] kernel/entry: Add support for core-wide protection of kernel-mode Joel Fernandes (Google)
2020-11-24 16:09   ` Peter Zijlstra
2020-11-24 17:52     ` Joel Fernandes
2020-11-25  9:37   ` Peter Zijlstra
2020-12-01 17:55     ` Joel Fernandes
2020-11-26  5:37   ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 19/32] entry/idle: Enter and exit kernel protection during idle entry and exit Joel Fernandes (Google)
2020-11-24 16:13   ` Peter Zijlstra
2020-11-24 18:03     ` Joel Fernandes
2020-11-25  8:49       ` Peter Zijlstra
2020-12-01 18:24         ` Joel Fernandes
2020-11-17 23:19 ` [PATCH -tip 20/32] entry/kvm: Protect the kernel when entering from guest Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 21/32] sched: CGroup tagging interface for core scheduling Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 22/32] sched: Split the cookie and setup per-task cookie on fork Joel Fernandes (Google)
2020-11-25 11:07   ` Peter Zijlstra
2020-12-01 18:56     ` Joel Fernandes
2020-11-25 11:10   ` Peter Zijlstra
2020-12-01 19:20     ` Joel Fernandes
2020-12-01 19:34       ` Peter Zijlstra
2020-12-02  6:36         ` Josh Don
2020-12-02  7:54           ` Peter Zijlstra
2020-12-04  0:20             ` Josh Don
2020-12-06 17:49         ` Joel Fernandes
2020-11-25 11:11   ` Peter Zijlstra
2020-12-01 19:16     ` Joel Fernandes
2020-11-25 11:15   ` Peter Zijlstra
2020-12-01 19:11     ` Joel Fernandes
2020-12-01 19:20       ` Peter Zijlstra
2020-12-06 18:15         ` Joel Fernandes
2020-11-25 12:54   ` Peter Zijlstra
2020-12-01 18:38     ` Joel Fernandes
2020-11-25 13:03   ` Peter Zijlstra
2020-12-01 18:52     ` Joel Fernandes
2020-11-30 23:05   ` Balbir Singh
2020-11-17 23:19 ` [PATCH -tip 23/32] sched: Add a per-thread core scheduling interface Joel Fernandes (Google)
2020-11-25 13:08   ` Peter Zijlstra
2020-12-01 19:36     ` Joel Fernandes
2020-12-02 21:47   ` Chris Hyser
2020-12-02 23:13     ` chris hyser
2020-12-06 17:34     ` Joel Fernandes
2020-12-07 21:48       ` chris hyser
2020-12-09 18:52       ` Chris Hyser
2020-12-14 19:31         ` Joel Fernandes
2020-12-14 19:44           ` chris hyser
2020-12-14 23:25             ` Joel Fernandes
2020-12-15 14:56               ` chris hyser
2020-12-15 16:23               ` chris hyser
2020-12-15 18:13               ` Dhaval Giani
2020-12-16  0:35                 ` Joel Fernandes
2020-11-17 23:19 ` [PATCH -tip 24/32] sched: Release references to the per-task cookie on exit Joel Fernandes (Google)
2020-11-25 13:03   ` Peter Zijlstra
2020-11-17 23:19 ` [PATCH -tip 25/32] sched: Refactor core cookie into struct Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 26/32] sched: Add a second-level tag for nested CGroup usecase Joel Fernandes (Google)
2020-11-25 13:42   ` Peter Zijlstra
2020-11-30 23:10     ` Balbir Singh
2020-12-01 20:08     ` Joel Fernandes
2020-12-02  6:18     ` Josh Don
2020-12-02  8:02       ` Peter Zijlstra
2020-12-02 18:53         ` Tejun Heo
2020-12-04  0:51         ` Josh Don
2020-12-04 15:45           ` Tejun Heo
2020-11-17 23:19 ` [PATCH -tip 27/32] sched/debug: Add CGroup node for printing group cookie if SCHED_DEBUG Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 28/32] kselftest: Add tests for core-sched interface Joel Fernandes (Google)
2020-11-17 23:19 ` [PATCH -tip 29/32] sched: Move core-scheduler interfacing code to a new file Joel Fernandes (Google)
2020-11-17 23:20 ` [PATCH -tip 30/32] Documentation: Add core scheduling documentation Joel Fernandes (Google)
2020-11-17 23:20 ` [PATCH -tip 31/32] sched: Add a coresched command line option Joel Fernandes (Google)
2020-11-19 23:39   ` Randy Dunlap
2020-11-25 13:45   ` Peter Zijlstra
2020-11-26  0:11     ` Balbir Singh
2020-11-17 23:20 ` [PATCH -tip 32/32] sched: Debug bits Joel Fernandes (Google)
2020-12-01  0:21   ` Balbir Singh
2021-01-15 15:10     ` Joel Fernandes
2020-11-24 11:48 ` [PATCH -tip 00/32] Core scheduling (v9) Vincent Guittot
2020-11-24 15:08   ` Joel Fernandes
2020-12-03  6:16     ` Ning, Hongyu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201201174932.GA201514@google.com \
    --to=joel@joelfernandes.org \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=OWeisse@umich.edu \
    --cc=aaron.lwe@gmail.com \
    --cc=agata.gruza@intel.com \
    --cc=alexandre.chartre@oracle.com \
    --cc=antonio.gomez.iglesias@intel.com \
    --cc=aubrey.intel@gmail.com \
    --cc=aubrey.li@linux.intel.com \
    --cc=benbjiang@tencent.com \
    --cc=bsegall@google.com \
    --cc=bsingharora@gmail.com \
    --cc=chris.hyser@oracle.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=derkling@google.com \
    --cc=dfaggioli@suse.com \
    --cc=dhaval.giani@oracle.com \
    --cc=fweisbec@gmail.com \
    --cc=graf@amazon.com \
    --cc=haoluo@google.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=joshdon@google.com \
    --cc=jsbarnes@google.com \
    --cc=junaids@google.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=tim.c.chen@intel.com \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vineeth@bitbyteword.org \
    --cc=viremana@linux.microsoft.com \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).