All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: peterz@infradead.org
Cc: Vineeth Pillai <viremana@linux.microsoft.com>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	Dhaval Giani <dhaval.giani@oracle.com>,
	Chris Hyser <chris.hyser@oracle.com>,
	Nishanth Aravamudan <naravamudan@digitalocean.com>,
	mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
	Phil Auld <pauld@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	vineeth@bitbyteword.org, Chen Yu <yu.c.chen@intel.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Agata Gruza <agata.gruza@intel.com>,
	Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>,
	graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com,
	rostedt@goodmis.org, derkling@google.com, benbjiang@tencent.com,
	Vineeth Remanan Pillai <vpillai@digitalocean.com>,
	Aaron Lu <aaron.lu@linux.alibaba.com>
Subject: Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
Date: Tue, 1 Sep 2020 01:10:14 -0400	[thread overview]
Message-ID: <20200901051014.GA3993517@google.com> (raw)
In-Reply-To: <20200829074719.GJ1362448@hirez.programming.kicks-ass.net>

On Sat, Aug 29, 2020 at 09:47:19AM +0200, peterz@infradead.org wrote:
> On Fri, Aug 28, 2020 at 06:02:25PM -0400, Vineeth Pillai wrote:
> > On 8/28/20 4:51 PM, Peter Zijlstra wrote:
> 
> > > So where do things go side-ways?
> 
> > During hotplug stress test, we have noticed that while a sibling is in
> > pick_next_task, another sibling can go offline or come online. What
> > we have observed is smt_mask get updated underneath us even if
> > we hold the lock. From reading the code, looks like we don't hold the
> > rq lock when the mask is updated. This extra logic was to take care of that.
> 
> Sure, the mask is updated async, but _where_ is the actual problem with
> that?

Hi Peter,

I tried again and came up with the simple patch below which handles all
issues and does not cause any more crashes. I added elaborate commit messages
and code comments enlisting all the issues. Hope it makes sense now. IMHO any
other solutions seems unclear or overhead. The simple solution below Just
Works (Tm) and does not add overhead.

Let me know what you think, thanks.

---8<-----------------------

From 546c5b48f372111589117f51fd79ac1e9493c7e7 Mon Sep 17 00:00:00 2001
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Date: Tue, 1 Sep 2020 00:56:36 -0400
Subject: [PATCH] sched/core: Hotplug fixes to pick_next_task()

The follow 3 cases need to be handled to avoid crashes in pick_next_task() when
CPUs in a core are going offline or coming online.

1. The stopper task is switching into idle when it is brought down by CPU
hotplug. It is not in the cpu_smt_mask so nothing need be selected for it.
Further, the current code ends up not selecting anything for it, not even idle.
This ends up causing crashes in set_next_task(). Just do the __pick_next_task()
selection which will select the idle task. No need to do core-wide selection as
other siblings will handle it for themselves when they call schedule.

2. The rq->core_pick for a sibling in a core can be NULL if no selection was
made for it because it was either offline or went offline during a sibling's
core-wide selection. In this case, do a core-wide selection. In this case, we
have to completely ignore the checks:
        if (rq->core->core_pick_seq == rq->core->core_task_seq &&
	    rq->core->core_pick_seq != rq->core_sched_seq)

Otherwise, it would again end up crashing like #1.

3. The 'Rescheduling siblings' loop of pick_next_task() is quite fragile. It
calls various functions on rq->core_pick which could very well be NULL because:
An online sibling might have gone offline before a task could be picked for it,
or it might be offline but later happen to come online, but its too late and
nothing was picked for it. Just ignore the siblings for which nothing could be
picked. This avoids any crashes that may occur in this loop that assume
rq->core_pick is not NULL.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
 kernel/sched/core.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 717122a3dca1..4966e9f14f39 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4610,13 +4610,24 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 	if (!sched_core_enabled(rq))
 		return __pick_next_task(rq, prev, rf);
 
+	cpu = cpu_of(rq);
+
+	/* Stopper task is switching into idle, no need core-wide selection. */
+	if (cpu_is_offline(cpu))
+		return __pick_next_task(rq, prev, rf);
+
 	/*
 	 * If there were no {en,de}queues since we picked (IOW, the task
 	 * pointers are all still valid), and we haven't scheduled the last
 	 * pick yet, do so now.
+	 *
+	 * rq->core_pick can be NULL if no selection was made for a CPU because
+	 * it was either offline or went offline during a sibling's core-wide
+	 * selection. In this case, do a core-wide selection.
 	 */
 	if (rq->core->core_pick_seq == rq->core->core_task_seq &&
-	    rq->core->core_pick_seq != rq->core_sched_seq) {
+	    rq->core->core_pick_seq != rq->core_sched_seq &&
+	    !rq->core_pick) {
 		WRITE_ONCE(rq->core_sched_seq, rq->core->core_pick_seq);
 
 		next = rq->core_pick;
@@ -4629,7 +4640,6 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 	put_prev_task_balance(rq, prev, rf);
 
-	cpu = cpu_of(rq);
 	smt_mask = cpu_smt_mask(cpu);
 
 	/*
@@ -4761,7 +4771,15 @@ next_class:;
 	for_each_cpu(i, smt_mask) {
 		struct rq *rq_i = cpu_rq(i);
 
-		WARN_ON_ONCE(!rq_i->core_pick);
+		/*
+		 * An online sibling might have gone offline before a task
+		 * could be picked for it, or it might be offline but later
+		 * happen to come online, but its too late and nothing was
+		 * picked for it.  That's Ok - it will pick tasks for itself,
+		 * so ignore it.
+		 */
+		if (!rq_i->core_pick)
+			continue;
 
 		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
 			rq_i->core_forceidle = true;
-- 
2.28.0.402.g5ffc5be6b7-goog


  parent reply	other threads:[~2020-09-01  5:10 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 01/23] sched: Wrap rq::lock access Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 02/23] sched: Introduce sched_class::pick_task() Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 03/23] sched: Core-wide rq->lock Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 04/23] sched/fair: Add a few assertions Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 05/23] sched: Basic tracking of matching tasks Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 06/23] bitops: Introduce find_next_or_bit Julien Desfossez
2020-09-03  5:13   ` Randy Dunlap
2020-08-28 19:51 ` [RFC PATCH v7 07/23] cpumask: Introduce a new iterator for_each_cpu_wrap_or Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling Julien Desfossez
2020-08-28 20:51   ` Peter Zijlstra
2020-08-28 22:02     ` Vineeth Pillai
2020-08-28 22:23       ` Joel Fernandes
2020-08-29  7:47       ` peterz
2020-08-31 13:01         ` Vineeth Pillai
2020-08-31 14:24         ` Joel Fernandes
2020-09-01  3:38         ` Joel Fernandes
2020-09-01  5:10         ` Joel Fernandes [this message]
2020-09-01 12:34           ` Vineeth Pillai
2020-09-01 17:30             ` Joel Fernandes
2020-09-01 21:23               ` Vineeth Pillai
2020-09-02  1:11                 ` Joel Fernandes
2020-08-28 20:55   ` Peter Zijlstra
2020-08-28 22:15     ` Vineeth Pillai
2020-09-15 20:08   ` Joel Fernandes
2020-08-28 19:51 ` [RFC PATCH v7 09/23] sched/fair: Fix forced idle sibling starvation corner case Julien Desfossez
2020-08-28 21:25   ` Peter Zijlstra
2020-08-28 23:24     ` Vineeth Pillai
2020-08-28 19:51 ` [RFC PATCH v7 10/23] sched/fair: wrapper for cfs_rq->min_vruntime Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison Julien Desfossez
2020-08-28 21:29   ` Peter Zijlstra
2020-09-17 14:15     ` Vineeth Pillai
2020-09-17 20:39       ` Vineeth Pillai
2020-09-23  1:46     ` Joel Fernandes
2020-09-23  1:52       ` Joel Fernandes
2020-09-25 15:02         ` Joel Fernandes
2020-09-15 21:49   ` chris hyser
     [not found]     ` <81b208ad-b9e6-bfbf-631e-02e9f75d73a2@linux.intel.com>
2020-09-16 14:24       ` chris hyser
2020-09-16 20:53         ` chris hyser
2020-09-17  1:09           ` Li, Aubrey
2020-08-28 19:51 ` [RFC PATCH v7 12/23] sched: Trivial forced-newidle balancer Julien Desfossez
2020-09-02  7:08   ` Pavan Kondeti
2020-08-28 19:51 ` [RFC PATCH v7 13/23] sched: migration changes for core scheduling Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 14/23] irq_work: Add support to detect if work is pending Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 15/23] entry/idle: Add a common function for activites during idle entry/exit Julien Desfossez
2020-08-30  2:17   ` kernel test robot
2020-08-28 19:51 ` [RFC PATCH v7 16/23] arch/x86: Add a new TIF flag for untrusted tasks Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Julien Desfossez
2020-08-30  6:50   ` [kernel/entry] 872a0a3f0b: will-it-scale.per_thread_ops -18.7% regression kernel test robot
2020-09-01 15:54   ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Thomas Gleixner
2020-09-01 16:50     ` Joel Fernandes
2020-09-01 20:02       ` Thomas Gleixner
2020-09-02  1:29         ` Joel Fernandes
2020-09-02  7:53           ` Thomas Gleixner
2020-09-02 15:12             ` Joel Fernandes
2020-09-02 16:57             ` Dario Faggioli
2020-09-03  4:34               ` Joel Fernandes
2020-09-03 11:05                 ` Vineeth Pillai
2020-09-03 13:20                 ` Thomas Gleixner
2020-09-03 20:30                   ` Joel Fernandes
2020-09-03 13:43                 ` Dario Faggioli
2020-09-03 20:25                   ` Joel Fernandes
2020-08-28 19:51 ` [RFC PATCH v7 18/23] entry/idle: Enter and exit kernel protection during idle entry and exit Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 19/23] entry/kvm: Protect the kernel when entering from guest Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 20/23] sched/coresched: config option for kernel protection Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 21/23] sched: cgroup tagging interface for core scheduling Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 22/23] Documentation: Add documentation on " Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 23/23] sched: Debug bits Julien Desfossez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200901051014.GA3993517@google.com \
    --to=joel@joelfernandes.org \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=aaron.lwe@gmail.com \
    --cc=agata.gruza@intel.com \
    --cc=antonio.gomez.iglesias@intel.com \
    --cc=aubrey.intel@gmail.com \
    --cc=benbjiang@tencent.com \
    --cc=chris.hyser@oracle.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=derkling@google.com \
    --cc=dfaggioli@suse.com \
    --cc=dhaval.giani@oracle.com \
    --cc=fweisbec@gmail.com \
    --cc=graf@amazon.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vineeth@bitbyteword.org \
    --cc=viremana@linux.microsoft.com \
    --cc=vpillai@digitalocean.com \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.