All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Vineeth Remanan Pillai <vpillai@digitalocean.com>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Nishanth Aravamudan" <naravamudan@digitalocean.com>,
	"Julien Desfossez" <jdesfossez@digitalocean.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul Turner" <pjt@google.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
	"Subhra Mazumdar" <subhra.mazumdar@oracle.com>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Kerr" <kerrnel@google.com>, "Phil Auld" <pauld@redhat.com>,
	"Aaron Lu" <aaron.lwe@gmail.com>,
	"Aubrey Li" <aubrey.intel@gmail.com>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Vineeth Pillai" <vineethrp@gmail.com>,
	"Chen Yu" <yu.c.chen@intel.com>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	"Aaron Lu" <aaron.lu@linux.alibaba.com>,
	"Paul E . McKenney" <paulmck@kernel.org>
Subject: Re: [RFC PATCH 06/16] sched: Add core wide task selection and scheduling.
Date: Mon, 6 Jul 2020 10:09:28 -0400	[thread overview]
Message-ID: <20200706140928.GA170866@google.com> (raw)
In-Reply-To: <CANaguZDtZrXbjmot2crLM0ComgY=NfqxWYs7GzUEY8aLeaUVrg@mail.gmail.com>

On Fri, Jul 03, 2020 at 04:21:46PM -0400, Vineeth Remanan Pillai wrote:
> On Wed, Jul 1, 2020 at 7:28 PM Joel Fernandes <joel@joelfernandes.org> wrote:
> >
> > From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> > Subject: [PATCH] sched: Fix CPU hotplug causing crashes in task selection logic
> >
> > The selection logic does not run correctly if the current CPU is not in the
> > cpu_smt_mask (which it is not because the CPU is offlined when the stopper
> > finishes running and needs to switch to idle).  There are also other issues
> > fixed by the patch I think such as: if some other sibling set core_pick to
> > something, however the selection logic on current cpu resets it before
> > selecting. In this case, we need to run the task selection logic again to
> > make sure it picks something if there is something to run. It might end up
> > picking the wrong task.
> >
> I am not sure if this can happen. If the other sibling sets core_pick, it
> will be under the core wide lock and it should set the core_sched_seq also
> before releasing the lock. So when this cpu tries, it would see the core_pick
> before resetting it. Is this the same case you were mentioning? Sorry if I
> misunderstood the case you mentioned..

If you have a case where you have 3 siblings all trying to enter the schedule
loop. Call them A, B and C.

A picks something for B in core_pick. Now C comes and resets B's core_pick
before running the mega-loop, hoping to select something for it shortly.
However, C then does an unconstrained pick and forgets to set B's pick to
something.

I don't know if this can really happen - but this is why I added the warning
in the end of the patch. I think we should make the code more robust and
handle these kind of cases.

> > Yet another issue was, if the stopper thread is an
> > unconstrained pick, then rq->core_pick is set. The next time task selection
> > logic runs when stopper needs to switch to idle, the current CPU is not in
> > the smt_mask. This causes the previous ->core_pick to be picked again which
> > happens to be the unconstrained task! so the stopper keeps getting selected
> > forever.
> >
> I did not clearly understand this. During an unconstrained pick, current
> cpu's core_pick is not set and tasks are not picked for siblings as well.
> If it is observed being set in the v6 code, I think it should be a bug.

Again, it is about making the code more robust. Why should not set
rq->core_pick when we pick something? As we discussed in the private
discussion - we should make the code robust and consistent. Correctness is
not enough, the code has to be robust and maintainable.

I think in our private discussion, you agreed with me that there is no harm
in setting core_pick in this case.

> > That and there are a few more safe guards and checks around checking/setting
> > rq->core_pick. To test it, I ran rcutorture and made it tag all torture
> > threads. Then ran it in hotplug mode (hotplugging every 200ms) and it hit the
> > issue. Now it runs for an hour or so without issue. (Torture testing debug
> > changes: https://bit.ly/38htfqK ).
> >
> > Various fixes were tried causing varying degrees of crashes.  Finally I found
> > that it is easiest to just add current CPU to the smt_mask's copy always.
> > This is so that task selection logic always runs on the current CPU which
> > called schedule().
> >
> > [...]
> >         cpu = cpu_of(rq);
> > -       smt_mask = cpu_smt_mask(cpu);
> > +       /* Make a copy of cpu_smt_mask as we should not set that. */
> > +       cpumask_copy(&select_mask, cpu_smt_mask(cpu));
> > +
> > +       /*
> > +        * Always make sure current CPU is added to smt_mask so that below
> > +        * selection logic runs on it.
> > +        */
> > +       cpumask_set_cpu(cpu, &select_mask);
> >
> I like this idea. Probably we can optimize it a bit. We get here with cpu
> not in smt_mask only during an offline and online(including the boot time
> online) phase. So we could probably wrap it in an "if (unlikely())". Also,
> during this time, it would be idle thread or some hotplug online thread that
> would be runnable and no other tasks should be runnable on this cpu. So, I
> think it makes sense to do an unconstrained pick rather than a costly sync
> of all siblings. Probably something like:
> 
> cpumask_copy(&select_mask, cpu_smt_mask(cpu));
> if (unlikely(cpumask_test_cpu(cpu, &select_mask))) {
>     cpumask_set_cpu(cpu, &select_mask);
>     need_sync = false;
> }

Nah, more lines of code for no good no reason, plus another branch right? I'd
like to leave my one liner alone than adding 4 more lines :-)

> By setting need_sync to false, we will do an unconstrained pick and will
> not sync with other siblings. I guess we need to reset need_sync after
> or in the following for_each_cpu loop, because the loop may set it.

I don't know if we want to add more conditions really and make it more
confusing. If anything, I believe we should simplify the existing code more TBH.

> >         /*
> >          * core->core_task_seq, core->core_pick_seq, rq->core_sched_seq
> > @@ -4351,7 +4358,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> 
> >                         if (i == cpu && !need_sync && !p->core_cookie) {
> >                                 next = p;
> > +                               rq_i->core_pick = next;
> > +                               rq_i->core_sched_seq = rq_i->core->core_pick_seq;
> >
> I think we would not need these here. core_pick needs to be set only
> for siblings if we are picking a task for them. For unconstrained pick,
> we pick only for ourselves. Also, core_sched_seq need not be synced here.
> We might already be synced with the existing core->core_pick_seq. Even
> if it is not synced, I don't think it will cause an issue in subsequent
> schedule events.

As discussed both privately and above, there is no harm and it is good to
keep the code consistent. I'd rather have any task picking set core_pick and
core_sched_seq to prevent confusion.

And if anything is resetting an existing ->core_pick of a sibling in the
selection loop, it better set it to something sane.

> >  done:
> > +       /*
> > +        * If we reset a sibling's core_pick, make sure that we picked a task
> > +        * for it, this is because we might have reset it though it was set to
> > +        * something by another selector. In this case we cannot leave it as
> > +        * NULL and should have found something for it.
> > +        */
> > +       for_each_cpu(i, &select_mask) {
> > +               WARN_ON_ONCE(!cpu_rq(i)->core_pick);
> > +       }
> > +
> I think this check will not be true always. For unconstrained pick, we
> do not pick tasks for siblings and hence do not set core_pick for them.
> So this WARN_ON will fire for unconstrained pick. Easily reproducible
> by creating an empty cgroup and tagging it. Then only unconstrained
> picks will happen and this WARN_ON fires. I guess this check after the
> done label does not hold and could be removed.

As discussed above, > 2 SMT case, we don't really know if the warning will
fire or not. I would rather keep the warning just in case for the future.

Thanks!

 - Joel


  reply	other threads:[~2020-07-06 14:09 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-30 21:32 [RFC PATCH 00/16] Core scheduling v6 Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 01/16] sched: Wrap rq::lock access Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 02/16] sched: Introduce sched_class::pick_task() Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 03/16] sched: Core-wide rq->lock Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 04/16] sched/fair: Add a few assertions Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 05/16] sched: Basic tracking of matching tasks Vineeth Remanan Pillai
2020-07-21 14:02   ` [RFC PATCH 05/16] sched: Basic tracking of matching tasks(Internet mail) benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 06/16] sched: Add core wide task selection and scheduling Vineeth Remanan Pillai
2020-07-01 23:28   ` Joel Fernandes
2020-07-02  0:54     ` Tim Chen
2020-07-02 12:57       ` Joel Fernandes
2020-07-02 13:23         ` Joel Fernandes
2020-07-05 23:44         ` Tim Chen
2020-07-03 20:21     ` Vineeth Remanan Pillai
2020-07-06 14:09       ` Joel Fernandes [this message]
2020-07-06 14:38         ` Vineeth Remanan Pillai
2020-07-06 17:37           ` Joel Fernandes
2020-06-30 21:32 ` [RFC PATCH 07/16] sched/fair: Fix forced idle sibling starvation corner case Vineeth Remanan Pillai
2020-07-21  7:35   ` [RFC PATCH 07/16] sched/fair: Fix forced idle sibling starvation corner case(Internet mail) benbjiang(蒋彪)
2020-07-22  7:20   ` benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 08/16] sched/fair: wrapper for cfs_rq->min_vruntime Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 09/16] sched/fair: core wide cfs task priority comparison Vineeth Remanan Pillai
2020-07-22  0:23   ` [RFC PATCH 09/16] sched/fair: core wide cfs task priority comparison(Internet mail) benbjiang(蒋彪)
2020-07-24  7:14     ` Aaron Lu
2020-07-24 12:08       ` Jiang Biao
2020-06-30 21:32 ` [RFC PATCH 10/16] sched: Trivial forced-newidle balancer Vineeth Remanan Pillai
2020-07-20  4:06   ` [RFC PATCH 10/16] sched: Trivial forced-newidle balancer(Internet mail) benbjiang(蒋彪)
2020-07-20  6:06     ` Li, Aubrey
     [not found]       ` <8082F052-2F52-42D3-B396-18A35A94F26F@tencent.com>
2020-07-20  8:03         ` Li, Aubrey
2020-07-20  8:22           ` benbjiang(蒋彪)
2020-07-20 14:34   ` benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 11/16] sched: migration changes for core scheduling Vineeth Remanan Pillai
2020-07-22  8:54   ` [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail) benbjiang(蒋彪)
2020-07-22 12:13     ` Li, Aubrey
2020-07-22 14:32       ` benbjiang(蒋彪)
2020-07-23  1:57         ` Li, Aubrey
2020-07-23  2:42           ` benbjiang(蒋彪)
2020-07-23  3:35             ` Li, Aubrey
2020-07-23  4:23               ` benbjiang(蒋彪)
2020-07-23  5:39                 ` Li, Aubrey
2020-07-23  7:47                   ` benbjiang(蒋彪)
2020-07-23  8:06                     ` Li, Aubrey
2020-07-23  8:28                       ` benbjiang(蒋彪)
2020-07-23 23:43                         ` Aubrey Li
2020-07-24  1:26                           ` benbjiang(蒋彪)
2020-07-24  2:05                             ` Li, Aubrey
2020-07-24  2:29                               ` benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 12/16] sched: cgroup tagging interface for core scheduling Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 13/16] sched: Fix pick_next_task() race condition in " Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 14/16] irq: Add support for core-wide protection of IRQ and softirq Vineeth Remanan Pillai
2020-07-10 12:19   ` Li, Aubrey
2020-07-10 13:21     ` Joel Fernandes
2020-07-13  2:23       ` Li, Aubrey
2020-07-13 15:58         ` Joel Fernandes
2020-07-10 13:36     ` Vineeth Remanan Pillai
2020-07-11  1:33       ` Aubrey Li
2020-07-17 23:37     ` Thomas Gleixner
2020-07-18 17:05       ` Joel Fernandes
2020-07-17 23:36   ` Thomas Gleixner
2020-07-20  3:53     ` Joel Fernandes
2020-07-20  8:20       ` Thomas Gleixner
2020-07-20 11:09       ` Vineeth Pillai
2020-06-30 21:32 ` [RFC PATCH 15/16] Documentation: Add documentation on core scheduling Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 16/16] sched: Debug bits Vineeth Remanan Pillai
2020-07-31 16:41 ` [RFC PATCH 00/16] Core scheduling v6 Vineeth Pillai
2020-08-03  8:23 ` Li, Aubrey
2020-08-03 16:53   ` Joel Fernandes
2020-08-05  3:57     ` Li, Aubrey
2020-08-05  6:16       ` [RFC PATCH 00/16] Core scheduling v6(Internet mail) benbjiang(蒋彪)
2020-08-09 16:44       ` [RFC PATCH 00/16] Core scheduling v6 Joel Fernandes
2020-08-12  2:01         ` Li, Aubrey
2020-08-12 23:08           ` Joel Fernandes
2020-08-13  4:28             ` Li, Aubrey
2020-08-14  0:26               ` [RFC PATCH 00/16] Core scheduling v6(Internet mail) benbjiang(蒋彪)
2020-08-14  1:36                 ` Li, Aubrey
2020-08-14  4:04                   ` benbjiang(蒋彪)
2020-08-14  5:18                     ` Li, Aubrey
2020-08-14  7:54                       ` benbjiang(蒋彪)
2020-08-20 22:37               ` [RFC PATCH 00/16] Core scheduling v6 Joel Fernandes
2020-08-27  0:30 ` Alexander Graf
2020-08-27  1:20   ` Vineeth Pillai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200706140928.GA170866@google.com \
    --to=joel@joelfernandes.org \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=aaron.lwe@gmail.com \
    --cc=aubrey.intel@gmail.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=subhra.mazumdar@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vineethrp@gmail.com \
    --cc=vpillai@digitalocean.com \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.