From: Joel Fernandes <joel@joelfernandes.org>
To: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Vineeth Remanan Pillai <vpillai@digitalocean.com>,
Peter Zijlstra <peterz@infradead.org>,
Nishanth Aravamudan <naravamudan@digitalocean.com>,
Julien Desfossez <jdesfossez@digitalocean.com>,
tglx@linutronix.de, pjt@google.com,
torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
subhra.mazumdar@oracle.com, mingo@kernel.org, fweisbec@gmail.com,
keescook@chromium.org, kerrnel@google.com,
Phil Auld <pauld@redhat.com>, Aaron Lu <aaron.lwe@gmail.com>,
Aubrey Li <aubrey.intel@gmail.com>,
Valentin Schneider <valentin.schneider@arm.com>,
Mel Gorman <mgorman@techsingularity.net>,
Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
Paolo Bonzini <pbonzini@redhat.com>,
vineethrp@gmail.com, Chen Yu <yu.c.chen@intel.com>,
Christian Brauner <christian.brauner@ubuntu.com>,
Aaron Lu <aaron.lu@linux.alibaba.com>,
paulmck@kernel.org
Subject: Re: [RFC PATCH 06/16] sched: Add core wide task selection and scheduling.
Date: Thu, 2 Jul 2020 08:57:57 -0400 [thread overview]
Message-ID: <20200702125757.GB439212@google.com> (raw)
In-Reply-To: <200c81ef-c961-dcd5-1221-84897c459b05@linux.intel.com>
On Wed, Jul 01, 2020 at 05:54:11PM -0700, Tim Chen wrote:
>
>
> On 7/1/20 4:28 PM, Joel Fernandes wrote:
> > On Tue, Jun 30, 2020 at 09:32:27PM +0000, Vineeth Remanan Pillai wrote:
> >> From: Peter Zijlstra <peterz@infradead.org>
> >>
> >> Instead of only selecting a local task, select a task for all SMT
> >> siblings for every reschedule on the core (irrespective which logical
> >> CPU does the reschedule).
> >>
> >> There could be races in core scheduler where a CPU is trying to pick
> >> a task for its sibling in core scheduler, when that CPU has just been
> >> offlined. We should not schedule any tasks on the CPU in this case.
> >> Return an idle task in pick_next_task for this situation.
> >>
> >> NOTE: there is still potential for siblings rivalry.
> >> NOTE: this is far too complicated; but thus far I've failed to
> >> simplify it further.
> >>
> >> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> >> Signed-off-by: Julien Desfossez <jdesfossez@digitalocean.com>
> >> Signed-off-by: Vineeth Remanan Pillai <vpillai@digitalocean.com>
> >> Signed-off-by: Aaron Lu <aaron.lu@linux.alibaba.com>
> >> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> >
> > Hi Peter, Tim, all, the below patch fixes the hotplug issue described in the
> > below patch's Link tag. Patch description below describes the issues fixed
> > and it applies on top of this patch.
> >
> > ------8<----------
> >
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > Subject: [PATCH] sched: Fix CPU hotplug causing crashes in task selection logic
> >
> > The selection logic does not run correctly if the current CPU is not in the
> > cpu_smt_mask (which it is not because the CPU is offlined when the stopper
> > finishes running and needs to switch to idle). There are also other issues
> > fixed by the patch I think such as: if some other sibling set core_pick to
> > something, however the selection logic on current cpu resets it before
> > selecting. In this case, we need to run the task selection logic again to
> > make sure it picks something if there is something to run. It might end up
> > picking the wrong task. Yet another issue was, if the stopper thread is an
"It might end up picking the wrong task" needs to be: "We might end up
picking a different task but that's Ok".
> > unconstrained pick, then rq->core_pick is set. The next time task selection
> > logic runs when stopper needs to switch to idle, the current CPU is not in
> > the smt_mask. This causes the previous ->core_pick to be picked again which
> > happens to be the unconstrained task! so the stopper keeps getting selected
> > forever.
> >
> > That and there are a few more safe guards and checks around checking/setting
> > rq->core_pick. To test it, I ran rcutorture and made it tag all torture
> > threads. Then ran it in hotplug mode (hotplugging every 200ms) and it hit the
> > issue. Now it runs for an hour or so without issue. (Torture testing debug
> > changes: https://bit.ly/38htfqK ).
> >
> > Various fixes were tried causing varying degrees of crashes. Finally I found
> > that it is easiest to just add current CPU to the smt_mask's copy always.
> > This is so that task selection logic always runs on the current CPU which
> > called schedule().
>
>
> It looks good to me.
Thank you for your review! Could I add your Reviewed-by tag to the patch?
- Joel
> Thanks.
>
> Tim
next prev parent reply other threads:[~2020-07-02 12:58 UTC|newest]
Thread overview: 81+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-30 21:32 [RFC PATCH 00/16] Core scheduling v6 Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 01/16] sched: Wrap rq::lock access Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 02/16] sched: Introduce sched_class::pick_task() Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 03/16] sched: Core-wide rq->lock Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 04/16] sched/fair: Add a few assertions Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 05/16] sched: Basic tracking of matching tasks Vineeth Remanan Pillai
2020-07-21 14:02 ` [RFC PATCH 05/16] sched: Basic tracking of matching tasks(Internet mail) benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 06/16] sched: Add core wide task selection and scheduling Vineeth Remanan Pillai
2020-07-01 23:28 ` Joel Fernandes
2020-07-02 0:54 ` Tim Chen
2020-07-02 12:57 ` Joel Fernandes [this message]
2020-07-02 13:23 ` Joel Fernandes
2020-07-05 23:44 ` Tim Chen
2020-07-03 20:21 ` Vineeth Remanan Pillai
2020-07-06 14:09 ` Joel Fernandes
2020-07-06 14:38 ` Vineeth Remanan Pillai
2020-07-06 17:37 ` Joel Fernandes
2020-06-30 21:32 ` [RFC PATCH 07/16] sched/fair: Fix forced idle sibling starvation corner case Vineeth Remanan Pillai
2020-07-21 7:35 ` [RFC PATCH 07/16] sched/fair: Fix forced idle sibling starvation corner case(Internet mail) benbjiang(蒋彪)
2020-07-22 7:20 ` benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 08/16] sched/fair: wrapper for cfs_rq->min_vruntime Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 09/16] sched/fair: core wide cfs task priority comparison Vineeth Remanan Pillai
2020-07-22 0:23 ` [RFC PATCH 09/16] sched/fair: core wide cfs task priority comparison(Internet mail) benbjiang(蒋彪)
2020-07-24 7:14 ` Aaron Lu
2020-07-24 12:08 ` Jiang Biao
2020-06-30 21:32 ` [RFC PATCH 10/16] sched: Trivial forced-newidle balancer Vineeth Remanan Pillai
2020-07-20 4:06 ` [RFC PATCH 10/16] sched: Trivial forced-newidle balancer(Internet mail) benbjiang(蒋彪)
2020-07-20 6:06 ` Li, Aubrey
[not found] ` <8082F052-2F52-42D3-B396-18A35A94F26F@tencent.com>
2020-07-20 8:03 ` Li, Aubrey
2020-07-20 8:22 ` benbjiang(蒋彪)
2020-07-20 14:34 ` benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 11/16] sched: migration changes for core scheduling Vineeth Remanan Pillai
2020-07-22 8:54 ` [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail) benbjiang(蒋彪)
2020-07-22 12:13 ` Li, Aubrey
2020-07-22 14:32 ` benbjiang(蒋彪)
2020-07-23 1:57 ` Li, Aubrey
2020-07-23 2:42 ` benbjiang(蒋彪)
2020-07-23 3:35 ` Li, Aubrey
2020-07-23 4:23 ` benbjiang(蒋彪)
2020-07-23 5:39 ` Li, Aubrey
2020-07-23 7:47 ` benbjiang(蒋彪)
2020-07-23 8:06 ` Li, Aubrey
2020-07-23 8:28 ` benbjiang(蒋彪)
2020-07-23 23:43 ` Aubrey Li
2020-07-24 1:26 ` benbjiang(蒋彪)
2020-07-24 2:05 ` Li, Aubrey
2020-07-24 2:29 ` benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 12/16] sched: cgroup tagging interface for core scheduling Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 13/16] sched: Fix pick_next_task() race condition in " Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 14/16] irq: Add support for core-wide protection of IRQ and softirq Vineeth Remanan Pillai
2020-07-10 12:19 ` Li, Aubrey
2020-07-10 13:21 ` Joel Fernandes
2020-07-13 2:23 ` Li, Aubrey
2020-07-13 15:58 ` Joel Fernandes
2020-07-10 13:36 ` Vineeth Remanan Pillai
2020-07-11 1:33 ` Aubrey Li
2020-07-17 23:37 ` Thomas Gleixner
2020-07-18 17:05 ` Joel Fernandes
2020-07-17 23:36 ` Thomas Gleixner
2020-07-20 3:53 ` Joel Fernandes
2020-07-20 8:20 ` Thomas Gleixner
2020-07-20 11:09 ` Vineeth Pillai
2020-06-30 21:32 ` [RFC PATCH 15/16] Documentation: Add documentation on core scheduling Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 16/16] sched: Debug bits Vineeth Remanan Pillai
2020-07-31 16:41 ` [RFC PATCH 00/16] Core scheduling v6 Vineeth Pillai
2020-08-03 8:23 ` Li, Aubrey
2020-08-03 16:53 ` Joel Fernandes
2020-08-05 3:57 ` Li, Aubrey
2020-08-05 6:16 ` [RFC PATCH 00/16] Core scheduling v6(Internet mail) benbjiang(蒋彪)
2020-08-09 16:44 ` [RFC PATCH 00/16] Core scheduling v6 Joel Fernandes
2020-08-12 2:01 ` Li, Aubrey
2020-08-12 23:08 ` Joel Fernandes
2020-08-13 4:28 ` Li, Aubrey
2020-08-14 0:26 ` [RFC PATCH 00/16] Core scheduling v6(Internet mail) benbjiang(蒋彪)
2020-08-14 1:36 ` Li, Aubrey
2020-08-14 4:04 ` benbjiang(蒋彪)
2020-08-14 5:18 ` Li, Aubrey
2020-08-14 7:54 ` benbjiang(蒋彪)
2020-08-20 22:37 ` [RFC PATCH 00/16] Core scheduling v6 Joel Fernandes
2020-08-27 0:30 ` Alexander Graf
2020-08-27 1:20 ` Vineeth Pillai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200702125757.GB439212@google.com \
--to=joel@joelfernandes.org \
--cc=aaron.lu@linux.alibaba.com \
--cc=aaron.lwe@gmail.com \
--cc=aubrey.intel@gmail.com \
--cc=christian.brauner@ubuntu.com \
--cc=fweisbec@gmail.com \
--cc=jdesfossez@digitalocean.com \
--cc=keescook@chromium.org \
--cc=kerrnel@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@techsingularity.net \
--cc=mingo@kernel.org \
--cc=naravamudan@digitalocean.com \
--cc=pauld@redhat.com \
--cc=paulmck@kernel.org \
--cc=pawan.kumar.gupta@linux.intel.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=subhra.mazumdar@oracle.com \
--cc=tglx@linutronix.de \
--cc=tim.c.chen@linux.intel.com \
--cc=torvalds@linux-foundation.org \
--cc=valentin.schneider@arm.com \
--cc=vineethrp@gmail.com \
--cc=vpillai@digitalocean.com \
--cc=yu.c.chen@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.