All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: vpillai <vpillai@digitalocean.com>
Cc: Nishanth Aravamudan <naravamudan@digitalocean.com>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
	Phil Auld <pauld@redhat.com>, Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	aubrey.li@linux.intel.com,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Joel Fernandes <joelaf@google.com>,
	joel@joelfernandes.org, Aaron Lu <aaron.lu@linux.alibaba.com>
Subject: Re: [RFC PATCH 07/13] sched: Add core wide task selection and scheduling.
Date: Tue, 14 Apr 2020 15:35:59 +0200	[thread overview]
Message-ID: <20200414133559.GT20730@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <e942da7fd881977923463f19648085c1bfaa37f8.1583332765.git.vpillai@digitalocean.com>

On Wed, Mar 04, 2020 at 04:59:57PM +0000, vpillai wrote:
> +static struct task_struct *
> +pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> +{
> +	struct task_struct *next, *max = NULL;
> +	const struct sched_class *class;
> +	const struct cpumask *smt_mask;
> +	int i, j, cpu;
> +	bool need_sync = false;

AFAICT that assignment is superfluous. Also, you violated the inverse
x-mas tree.

> +
> +	cpu = cpu_of(rq);
> +	if (cpu_is_offline(cpu))
> +		return idle_sched_class.pick_next_task(rq);

Are we actually hitting this one?

> +	if (!sched_core_enabled(rq))
> +		return __pick_next_task(rq, prev, rf);
> +
> +	/*
> +	 * If there were no {en,de}queues since we picked (IOW, the task
> +	 * pointers are all still valid), and we haven't scheduled the last
> +	 * pick yet, do so now.
> +	 */
> +	if (rq->core->core_pick_seq == rq->core->core_task_seq &&
> +	    rq->core->core_pick_seq != rq->core_sched_seq) {
> +		WRITE_ONCE(rq->core_sched_seq, rq->core->core_pick_seq);
> +
> +		next = rq->core_pick;
> +		if (next != prev) {
> +			put_prev_task(rq, prev);
> +			set_next_task(rq, next);
> +		}
> +		return next;
> +	}
> +
> +	prev->sched_class->put_prev_task(rq, prev);
> +	if (!rq->nr_running)
> +		newidle_balance(rq, rf);

This is wrong per commit:

  6e2df0581f56 ("sched: Fix pick_next_task() vs 'change' pattern race")

> +	smt_mask = cpu_smt_mask(cpu);
> +
> +	/*
> +	 * core->core_task_seq, core->core_pick_seq, rq->core_sched_seq
> +	 *
> +	 * @task_seq guards the task state ({en,de}queues)
> +	 * @pick_seq is the @task_seq we did a selection on
> +	 * @sched_seq is the @pick_seq we scheduled
> +	 *
> +	 * However, preemptions can cause multiple picks on the same task set.
> +	 * 'Fix' this by also increasing @task_seq for every pick.
> +	 */
> +	rq->core->core_task_seq++;
> +	need_sync = !!rq->core->core_cookie;
> +
> +	/* reset state */
> +	rq->core->core_cookie = 0UL;
> +	for_each_cpu(i, smt_mask) {
> +		struct rq *rq_i = cpu_rq(i);
> +
> +		rq_i->core_pick = NULL;
> +
> +		if (rq_i->core_forceidle) {
> +			need_sync = true;
> +			rq_i->core_forceidle = false;
> +		}
> +
> +		if (i != cpu)
> +			update_rq_clock(rq_i);
> +	}
> +
> +	/*
> +	 * Try and select tasks for each sibling in decending sched_class
> +	 * order.
> +	 */
> +	for_each_class(class) {
> +again:
> +		for_each_cpu_wrap(i, smt_mask, cpu) {
> +			struct rq *rq_i = cpu_rq(i);
> +			struct task_struct *p;
> +
> +			if (cpu_is_offline(i)) {
> +				rq_i->core_pick = rq_i->idle;
> +				continue;
> +			}

Why are we polluting the 'fast' path with offline crud? Why isn't this
the natural result of running pick_task() on an empty runqueue?

> +
> +			if (rq_i->core_pick)
> +				continue;
> +
> +			/*
> +			 * If this sibling doesn't yet have a suitable task to
> +			 * run; ask for the most elegible task, given the
> +			 * highest priority task already selected for this
> +			 * core.
> +			 */
> +			p = pick_task(rq_i, class, max);
> +			if (!p) {
> +				/*
> +				 * If there weren't no cookies; we don't need
> +				 * to bother with the other siblings.
> +				 */
> +				if (i == cpu && !need_sync)
> +					goto next_class;
> +
> +				continue;
> +			}
> +
> +			/*
> +			 * Optimize the 'normal' case where there aren't any
> +			 * cookies and we don't need to sync up.
> +			 */
> +			if (i == cpu && !need_sync && !p->core_cookie) {
> +				next = p;
> +				goto done;
> +			}
> +
> +			rq_i->core_pick = p;
> +
> +			/*
> +			 * If this new candidate is of higher priority than the
> +			 * previous; and they're incompatible; we need to wipe
> +			 * the slate and start over. pick_task makes sure that
> +			 * p's priority is more than max if it doesn't match
> +			 * max's cookie.
> +			 *
> +			 * NOTE: this is a linear max-filter and is thus bounded
> +			 * in execution time.
> +			 */
> +			if (!max || !cookie_match(max, p)) {
> +				struct task_struct *old_max = max;
> +
> +				rq->core->core_cookie = p->core_cookie;
> +				max = p;
> +
> +				if (old_max) {
> +					for_each_cpu(j, smt_mask) {
> +						if (j == i)
> +							continue;
> +
> +						cpu_rq(j)->core_pick = NULL;
> +					}
> +					goto again;
> +				} else {
> +					/*
> +					 * Once we select a task for a cpu, we
> +					 * should not be doing an unconstrained
> +					 * pick because it might starve a task
> +					 * on a forced idle cpu.
> +					 */
> +					need_sync = true;
> +				}
> +
> +			}
> +		}
> +next_class:;
> +	}
> +
> +	rq->core->core_pick_seq = rq->core->core_task_seq;
> +	next = rq->core_pick;
> +	rq->core_sched_seq = rq->core->core_pick_seq;
> +
> +	/*
> +	 * Reschedule siblings
> +	 *
> +	 * NOTE: L1TF -- at this point we're no longer running the old task and
> +	 * sending an IPI (below) ensures the sibling will no longer be running
> +	 * their task. This ensures there is no inter-sibling overlap between
> +	 * non-matching user state.
> +	 */
> +	for_each_cpu(i, smt_mask) {
> +		struct rq *rq_i = cpu_rq(i);
> +
> +		if (cpu_is_offline(i))
> +			continue;

Another one; please explain how an offline cpu can be part of the
smt_mask. Last time I checked it got cleared in stop-machine.

> +
> +		WARN_ON_ONCE(!rq_i->core_pick);
> +
> +		if (is_idle_task(rq_i->core_pick) && rq_i->nr_running)
> +			rq_i->core_forceidle = true;
> +
> +		if (i == cpu)
> +			continue;
> +
> +		if (rq_i->curr != rq_i->core_pick)
> +			resched_curr(rq_i);
> +
> +		/* Did we break L1TF mitigation requirements? */
> +		WARN_ON_ONCE(!cookie_match(next, rq_i->core_pick));

That comment is misleading...

> +	}
> +
> +done:
> +	set_next_task(rq, next);
> +	return next;
> +}

----8<----

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a9eeef896c78..8432de767730 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4080,6 +4080,13 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  		update_min_vruntime(cfs_rq);
>  }
>  
> +static inline bool
> +__entity_slice_used(struct sched_entity *se)
> +{
> +	return (se->sum_exec_runtime - se->prev_sum_exec_runtime) >
> +		sched_slice(cfs_rq_of(se), se);
> +}
> +
>  /*
>   * Preempt the current task with a newly woken task if needed:
>   */
> @@ -10285,6 +10292,34 @@ static void core_sched_deactivate_fair(struct rq *rq)
>  #endif
>  #endif /* CONFIG_SMP */
>  
> +#ifdef CONFIG_SCHED_CORE
> +/*
> + * If runqueue has only one task which used up its slice and
> + * if the sibling is forced idle, then trigger schedule
> + * to give forced idle task a chance.
> + */
> +static void resched_forceidle_sibling(struct rq *rq, struct sched_entity *se)
> +{
> +	int cpu = cpu_of(rq), sibling_cpu;
> +	if (rq->cfs.nr_running > 1 || !__entity_slice_used(se))
> +		return;
> +
> +	for_each_cpu(sibling_cpu, cpu_smt_mask(cpu)) {
> +		struct rq *sibling_rq;
> +		if (sibling_cpu == cpu)
> +			continue;
> +		if (cpu_is_offline(sibling_cpu))
> +			continue;
> +
> +		sibling_rq = cpu_rq(sibling_cpu);
> +		if (sibling_rq->core_forceidle) {
> +			resched_curr(sibling_rq);
> +		}
> +	}
> +}
> +#endif
> +
> +
>  /*
>   * scheduler tick hitting a task of our scheduling class.
>   *
> @@ -10308,6 +10343,11 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
>  
>  	update_misfit_status(curr, rq);
>  	update_overutilized_status(task_rq(curr));
> +
> +#ifdef CONFIG_SCHED_CORE
> +	if (sched_core_enabled(rq))
> +		resched_forceidle_sibling(rq, &curr->se);
> +#endif
>  }
>  
>  /*

This ^ seems like it should be in it's own patch.

> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 03d502357599..a829e26fa43a 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1003,11 +1003,16 @@ struct rq {
>  #ifdef CONFIG_SCHED_CORE
>  	/* per rq */
>  	struct rq		*core;
> +	struct task_struct	*core_pick;
>  	unsigned int		core_enabled;
> +	unsigned int		core_sched_seq;
>  	struct rb_root		core_tree;
> +	bool			core_forceidle;

Someone forgot that _Bool shouldn't be part of composite types?

>  	/* shared state */
>  	unsigned int		core_task_seq;
> +	unsigned int		core_pick_seq;
> +	unsigned long		core_cookie;
>  #endif
>  };

  reply	other threads:[~2020-04-14 13:37 UTC|newest]

Thread overview: 115+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-04 16:59 [RFC PATCH 00/13] Core scheduling v5 vpillai
2020-03-04 16:59 ` [RFC PATCH 01/13] sched: Wrap rq::lock access vpillai
2020-03-04 16:59 ` [RFC PATCH 02/13] sched: Introduce sched_class::pick_task() vpillai
2020-03-04 16:59 ` [RFC PATCH 03/13] sched: Core-wide rq->lock vpillai
2020-04-01 11:42   ` [PATCH] sched/arm64: store cpu topology before notify_cpu_starting Cheng Jian
2020-04-01 13:23     ` Valentin Schneider
2020-04-01 13:23       ` Valentin Schneider
2020-04-06  8:00       ` chengjian (D)
2020-04-06  8:00         ` chengjian (D)
2020-04-09  9:59       ` Sudeep Holla
2020-04-09  9:59         ` Sudeep Holla
2020-04-09 10:32         ` Valentin Schneider
2020-04-09 10:32           ` Valentin Schneider
2020-04-09 11:08           ` Sudeep Holla
2020-04-09 11:08             ` Sudeep Holla
2020-04-09 17:54     ` Joel Fernandes
2020-04-10 13:49       ` chengjian (D)
2020-04-14 11:36   ` [RFC PATCH 03/13] sched: Core-wide rq->lock Peter Zijlstra
2020-04-14 21:35     ` Vineeth Remanan Pillai
2020-04-15 10:55       ` Peter Zijlstra
2020-04-14 14:32   ` Peter Zijlstra
2020-03-04 16:59 ` [RFC PATCH 04/13] sched/fair: Add a few assertions vpillai
2020-03-04 16:59 ` [RFC PATCH 05/13] sched: Basic tracking of matching tasks vpillai
2020-03-04 16:59 ` [RFC PATCH 06/13] sched: Update core scheduler queue when taking cpu online/offline vpillai
2020-03-04 16:59 ` [RFC PATCH 07/13] sched: Add core wide task selection and scheduling vpillai
2020-04-14 13:35   ` Peter Zijlstra [this message]
2020-04-16 23:32     ` Tim Chen
2020-04-17 10:57       ` Peter Zijlstra
2020-04-16  3:39   ` Chen Yu
2020-04-16 19:59     ` Vineeth Remanan Pillai
2020-04-17 11:18     ` Peter Zijlstra
2020-04-19 15:31       ` Chen Yu
2020-05-21 23:14   ` Joel Fernandes
2020-05-21 23:16     ` Joel Fernandes
2020-05-22  2:35     ` Joel Fernandes
2020-05-22  3:44       ` Aaron Lu
2020-05-22 20:13         ` Joel Fernandes
2020-03-04 16:59 ` [RFC PATCH 08/13] sched/fair: wrapper for cfs_rq->min_vruntime vpillai
2020-03-04 16:59 ` [RFC PATCH 09/13] sched/fair: core wide vruntime comparison vpillai
2020-04-14 13:56   ` Peter Zijlstra
2020-04-15  3:34     ` Aaron Lu
2020-04-15  4:07       ` Aaron Lu
2020-04-15 21:24         ` Vineeth Remanan Pillai
2020-04-17  9:40           ` Aaron Lu
2020-04-20  8:07             ` [PATCH updated] sched/fair: core wide cfs task priority comparison Aaron Lu
2020-04-20 22:26               ` Vineeth Remanan Pillai
2020-04-21  2:51                 ` Aaron Lu
2020-04-24 14:24                   ` [PATCH updated v2] " Aaron Lu
2020-05-06 14:35                     ` Peter Zijlstra
2020-05-08  8:44                       ` Aaron Lu
2020-05-08  9:09                         ` Peter Zijlstra
2020-05-08 12:34                           ` Aaron Lu
2020-05-14 13:02                             ` Peter Zijlstra
2020-05-14 22:51                               ` Vineeth Remanan Pillai
2020-05-15 10:38                                 ` Peter Zijlstra
2020-05-15 10:43                                   ` Peter Zijlstra
2020-05-15 14:24                                   ` Vineeth Remanan Pillai
2020-05-16  3:42                               ` Aaron Lu
2020-05-22  9:40                                 ` Aaron Lu
2020-06-08  1:41                               ` Ning, Hongyu
2020-03-04 17:00 ` [RFC PATCH 10/13] sched: Trivial forced-newidle balancer vpillai
2020-03-04 17:00 ` [RFC PATCH 11/13] sched: migration changes for core scheduling vpillai
2020-06-12 13:21   ` Joel Fernandes
2020-06-12 21:32     ` Vineeth Remanan Pillai
2020-06-13  2:25       ` Joel Fernandes
2020-06-13 18:59         ` Vineeth Remanan Pillai
2020-06-15  2:05           ` Li, Aubrey
2020-03-04 17:00 ` [RFC PATCH 12/13] sched: cgroup tagging interface " vpillai
2020-06-26 15:06   ` Vineeth Remanan Pillai
2020-03-04 17:00 ` [RFC PATCH 13/13] sched: Debug bits vpillai
2020-03-04 17:36 ` [RFC PATCH 00/13] Core scheduling v5 Tim Chen
2020-03-04 17:42   ` Vineeth Remanan Pillai
2020-04-14 14:21 ` Peter Zijlstra
2020-04-15 16:32   ` Joel Fernandes
2020-04-17 11:12     ` Peter Zijlstra
2020-04-17 12:35       ` Alexander Graf
2020-04-17 13:08         ` Peter Zijlstra
2020-04-18  2:25       ` Joel Fernandes
2020-05-09 14:35   ` Dario Faggioli
     [not found] ` <38805656-2e2f-222a-c083-692f4b113313@linux.intel.com>
2020-05-09  3:39   ` Ning, Hongyu
2020-05-14 20:51     ` FW: " Gruza, Agata
2020-05-10 23:46 ` [PATCH RFC] Add support for core-wide protection of IRQ and softirq Joel Fernandes (Google)
2020-05-11 13:49   ` Peter Zijlstra
2020-05-11 14:54     ` Joel Fernandes
2020-05-20 22:26 ` [PATCH RFC] sched: Add a per-thread core scheduling interface Joel Fernandes (Google)
2020-05-21  4:09   ` [PATCH RFC] sched: Add a per-thread core scheduling interface(Internet mail) benbjiang(蒋彪)
2020-05-21 13:49     ` Joel Fernandes
2020-05-21  8:51   ` [PATCH RFC] sched: Add a per-thread core scheduling interface Peter Zijlstra
2020-05-21 13:47     ` Joel Fernandes
2020-05-21 20:20       ` Vineeth Remanan Pillai
2020-05-22 12:59       ` Peter Zijlstra
2020-05-22 21:35         ` Joel Fernandes
2020-05-24 14:00           ` Phil Auld
2020-05-28 14:51             ` Joel Fernandes
2020-05-28 17:01             ` Peter Zijlstra
2020-05-28 18:17               ` Phil Auld
2020-05-28 18:34                 ` Phil Auld
2020-05-28 18:23               ` Joel Fernandes
2020-05-21 18:31   ` Linus Torvalds
2020-05-21 20:40     ` Joel Fernandes
2020-05-21 21:58       ` Jesse Barnes
2020-05-22 16:33         ` Linus Torvalds
2020-05-20 22:37 ` [PATCH RFC v2] Add support for core-wide protection of IRQ and softirq Joel Fernandes (Google)
2020-05-20 22:48 ` [PATCH RFC] sched: Use sched-RCU in core-scheduling balancing logic Joel Fernandes (Google)
2020-05-21 22:52   ` Paul E. McKenney
2020-05-22  1:26     ` Joel Fernandes
2020-06-25 20:12 ` [RFC PATCH 00/13] Core scheduling v5 Vineeth Remanan Pillai
2020-06-26  1:47   ` Joel Fernandes
2020-06-26 14:36     ` Vineeth Remanan Pillai
2020-06-26 15:10       ` Joel Fernandes
2020-06-26 15:12         ` Joel Fernandes
2020-06-27 16:21         ` Joel Fernandes
2020-06-30 14:11         ` Phil Auld
2020-06-29 12:33   ` Li, Aubrey
2020-06-29 19:41     ` Vineeth Remanan Pillai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200414133559.GT20730@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=aaron.lwe@gmail.com \
    --cc=aubrey.intel@gmail.com \
    --cc=aubrey.li@linux.intel.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=joel@joelfernandes.org \
    --cc=joelaf@google.com \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=pjt@google.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vpillai@digitalocean.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.