From: Joel Fernandes <joel@joelfernandes.org>
To: peterz@infradead.org
Cc: Vineeth Pillai <viremana@linux.microsoft.com>,
Julien Desfossez <jdesfossez@digitalocean.com>,
Tim Chen <tim.c.chen@linux.intel.com>,
Aaron Lu <aaron.lwe@gmail.com>,
Aubrey Li <aubrey.intel@gmail.com>,
Dhaval Giani <dhaval.giani@oracle.com>,
Chris Hyser <chris.hyser@oracle.com>,
Nishanth Aravamudan <naravamudan@digitalocean.com>,
mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
Phil Auld <pauld@redhat.com>,
Valentin Schneider <valentin.schneider@arm.com>,
Mel Gorman <mgorman@techsingularity.net>,
Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
Paolo Bonzini <pbonzini@redhat.com>,
vineeth@bitbyteword.org, Chen Yu <yu.c.chen@intel.com>,
Christian Brauner <christian.brauner@ubuntu.com>,
Agata Gruza <agata.gruza@intel.com>,
Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>,
graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com,
rostedt@goodmis.org, derkling@google.com, benbjiang@tencent.com,
Vineeth Remanan Pillai <vpillai@digitalocean.com>,
Aaron Lu <aaron.lu@linux.alibaba.com>
Subject: Re: [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling.
Date: Mon, 31 Aug 2020 10:24:27 -0400 [thread overview]
Message-ID: <20200831142427.GA3437943@google.com> (raw)
In-Reply-To: <20200829074719.GJ1362448@hirez.programming.kicks-ass.net>
Hi Peter,
On Sat, Aug 29, 2020 at 09:47:19AM +0200, peterz@infradead.org wrote:
> On Fri, Aug 28, 2020 at 06:02:25PM -0400, Vineeth Pillai wrote:
> > On 8/28/20 4:51 PM, Peter Zijlstra wrote:
>
> > > So where do things go side-ways?
>
> > During hotplug stress test, we have noticed that while a sibling is in
> > pick_next_task, another sibling can go offline or come online. What
> > we have observed is smt_mask get updated underneath us even if
> > we hold the lock. From reading the code, looks like we don't hold the
> > rq lock when the mask is updated. This extra logic was to take care of that.
>
> Sure, the mask is updated async, but _where_ is the actual problem with
> that?
>
> On Fri, Aug 28, 2020 at 06:23:55PM -0400, Joel Fernandes wrote:
> > Thanks Vineeth. Peter, also the "v6+" series (which were some addons on v6)
> > detail the individual hotplug changes squashed into this patch:
> > https://lore.kernel.org/lkml/20200815031908.1015049-9-joel@joelfernandes.org/
> > https://lore.kernel.org/lkml/20200815031908.1015049-11-joel@joelfernandes.org/
>
> That one looks fishy, the pick is core wide, making that pick_seq per rq
> just doesn't make sense.
I think Vineeth was trying to handle the case where rq->core_pick happened to
be NULL for an offline CPU, and then schedule() is called when it came online
but its sched_seq != core-wide pick_seq. The reason for this situation is
because a sibling did selection for the offline CPU and ended up leaving its
rq->core_pick as NULL as the then-offline CPU was missing from the
cpu_smt_mask, but it incremented the core-wide pick_seq anyway.
Due to this, the pick_next_task() can crash after entering this if() block:
+ if (rq->core_pick_seq == rq->core->core_task_seq &&
+ rq->core_pick_seq != rq->core_sched_seq) {
How would you suggest to fix it? Maybe we can just assign rq->core_sched_seq
== rq->core_pick_seq for an offline CPU (or any CPU where rq->core_pick ==
NULL), so it does not end up using rq->core_pick and does a full core-wide
selcetion again when it comes online?
Or easier, check for rq->core_pick == NULL and skip this fast-path if() block
completely.
> > https://lore.kernel.org/lkml/20200815031908.1015049-12-joel@joelfernandes.org/
>
> This one reads like tinkering, there is no description of the actual
> problem just some code that makes a symptom go away.
>
> Sure, on hotplug the smt mask can change, but only for a CPU that isn't
> actually scheduling, so who cares.
>
> /me re-reads the hotplug code...
>
> ..ooOO is the problem that we clear the cpumasks on take_cpu_down()
> instead of play_dead() ?! That should be fixable.
I think Vineeth explained this in his email, there is logic across the loops
in the pick_next_task() that depend on the cpu_smt_mask not change. I am not
sure if play_dead() will fix it, the issue is seen in the code doing the
selection and the cpu_smt_mask changing under it due to possibly other CPUs
going offline.
Example, you have a splat and null pointer dereference possibilities in the
below loop if rq_i ->core_pick == NULL, because a sibling CPU came online but
a task was not selected for it in the for loops prior to this for loop:
/*
* Reschedule siblings
*
* NOTE: L1TF -- at this point we're no longer running the old task and
* sending an IPI (below) ensures the sibling will no longer be running
* their task. This ensures there is no inter-sibling overlap between
* non-matching user state.
*/
for_each_cpu(i, smt_mask) {
struct rq *rq_i = cpu_rq(i);
WARN_ON_ONCE(!rq_i->core_pick);
if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
rq_i->core_forceidle = true;
rq_i->core_pick->core_occupation = occ;
Probably the code can be rearchitected to not depend on cpu_smt_mask
changing. What I did in my old tree is I made a copy of the cpu_smt_mask in
the beginning of this function, and that makes all the problems go away. But
I was afraid of overhead of that copying.
(btw, I would not complain one bit if this function was nuked and rewritten
to be simpler).
> > https://lore.kernel.org/lkml/20200815031908.1015049-13-joel@joelfernandes.org/
>
> This is the only one that makes some sense, it makes rq->core consistent
> over hotplug.
Cool at least we got one thing right ;)
> > Agreed we can split the patches for the next series, however for final
> > upstream merge, I suggest we fix hotplug issues in this patch itself so that
> > we don't break bisectability.
>
> Meh, who sodding cares about hotplug :-). Also you can 'fix' such things
> by making sure you can't actually enable core-sched until after
> everything is in place.
Fair enough :)
thanks,
- Joel
next prev parent reply other threads:[~2020-08-31 14:24 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 01/23] sched: Wrap rq::lock access Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 02/23] sched: Introduce sched_class::pick_task() Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 03/23] sched: Core-wide rq->lock Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 04/23] sched/fair: Add a few assertions Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 05/23] sched: Basic tracking of matching tasks Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 06/23] bitops: Introduce find_next_or_bit Julien Desfossez
2020-09-03 5:13 ` Randy Dunlap
2020-08-28 19:51 ` [RFC PATCH v7 07/23] cpumask: Introduce a new iterator for_each_cpu_wrap_or Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling Julien Desfossez
2020-08-28 20:51 ` Peter Zijlstra
2020-08-28 22:02 ` Vineeth Pillai
2020-08-28 22:23 ` Joel Fernandes
2020-08-29 7:47 ` peterz
2020-08-31 13:01 ` Vineeth Pillai
2020-08-31 14:24 ` Joel Fernandes [this message]
2020-09-01 3:38 ` Joel Fernandes
2020-09-01 5:10 ` Joel Fernandes
2020-09-01 12:34 ` Vineeth Pillai
2020-09-01 17:30 ` Joel Fernandes
2020-09-01 21:23 ` Vineeth Pillai
2020-09-02 1:11 ` Joel Fernandes
2020-08-28 20:55 ` Peter Zijlstra
2020-08-28 22:15 ` Vineeth Pillai
2020-09-15 20:08 ` Joel Fernandes
2020-08-28 19:51 ` [RFC PATCH v7 09/23] sched/fair: Fix forced idle sibling starvation corner case Julien Desfossez
2020-08-28 21:25 ` Peter Zijlstra
2020-08-28 23:24 ` Vineeth Pillai
2020-08-28 19:51 ` [RFC PATCH v7 10/23] sched/fair: wrapper for cfs_rq->min_vruntime Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison Julien Desfossez
2020-08-28 21:29 ` Peter Zijlstra
2020-09-17 14:15 ` Vineeth Pillai
2020-09-17 20:39 ` Vineeth Pillai
2020-09-23 1:46 ` Joel Fernandes
2020-09-23 1:52 ` Joel Fernandes
2020-09-25 15:02 ` Joel Fernandes
2020-09-15 21:49 ` chris hyser
[not found] ` <81b208ad-b9e6-bfbf-631e-02e9f75d73a2@linux.intel.com>
2020-09-16 14:24 ` chris hyser
2020-09-16 20:53 ` chris hyser
2020-09-17 1:09 ` Li, Aubrey
2020-08-28 19:51 ` [RFC PATCH v7 12/23] sched: Trivial forced-newidle balancer Julien Desfossez
2020-09-02 7:08 ` Pavan Kondeti
2020-08-28 19:51 ` [RFC PATCH v7 13/23] sched: migration changes for core scheduling Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 14/23] irq_work: Add support to detect if work is pending Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 15/23] entry/idle: Add a common function for activites during idle entry/exit Julien Desfossez
2020-08-30 2:17 ` kernel test robot
2020-08-28 19:51 ` [RFC PATCH v7 16/23] arch/x86: Add a new TIF flag for untrusted tasks Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Julien Desfossez
2020-08-30 6:50 ` [kernel/entry] 872a0a3f0b: will-it-scale.per_thread_ops -18.7% regression kernel test robot
2020-09-01 15:54 ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Thomas Gleixner
2020-09-01 16:50 ` Joel Fernandes
2020-09-01 20:02 ` Thomas Gleixner
2020-09-02 1:29 ` Joel Fernandes
2020-09-02 7:53 ` Thomas Gleixner
2020-09-02 15:12 ` Joel Fernandes
2020-09-02 16:57 ` Dario Faggioli
2020-09-03 4:34 ` Joel Fernandes
2020-09-03 11:05 ` Vineeth Pillai
2020-09-03 13:20 ` Thomas Gleixner
2020-09-03 20:30 ` Joel Fernandes
2020-09-03 13:43 ` Dario Faggioli
2020-09-03 20:25 ` Joel Fernandes
2020-08-28 19:51 ` [RFC PATCH v7 18/23] entry/idle: Enter and exit kernel protection during idle entry and exit Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 19/23] entry/kvm: Protect the kernel when entering from guest Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 20/23] sched/coresched: config option for kernel protection Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 21/23] sched: cgroup tagging interface for core scheduling Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 22/23] Documentation: Add documentation on " Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 23/23] sched: Debug bits Julien Desfossez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200831142427.GA3437943@google.com \
--to=joel@joelfernandes.org \
--cc=aaron.lu@linux.alibaba.com \
--cc=aaron.lwe@gmail.com \
--cc=agata.gruza@intel.com \
--cc=antonio.gomez.iglesias@intel.com \
--cc=aubrey.intel@gmail.com \
--cc=benbjiang@tencent.com \
--cc=chris.hyser@oracle.com \
--cc=christian.brauner@ubuntu.com \
--cc=derkling@google.com \
--cc=dfaggioli@suse.com \
--cc=dhaval.giani@oracle.com \
--cc=fweisbec@gmail.com \
--cc=graf@amazon.com \
--cc=jdesfossez@digitalocean.com \
--cc=keescook@chromium.org \
--cc=kerrnel@google.com \
--cc=konrad.wilk@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mgorman@techsingularity.net \
--cc=mingo@kernel.org \
--cc=naravamudan@digitalocean.com \
--cc=pauld@redhat.com \
--cc=pawan.kumar.gupta@linux.intel.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=tim.c.chen@linux.intel.com \
--cc=torvalds@linux-foundation.org \
--cc=valentin.schneider@arm.com \
--cc=vineeth@bitbyteword.org \
--cc=viremana@linux.microsoft.com \
--cc=vpillai@digitalocean.com \
--cc=yu.c.chen@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.