* [RFC II] Splitting scheduler into two halves
@ 2014-03-26 18:37 Yuyang du
  2014-03-27  4:57 ` Mike Galbraith

From: Yuyang du
To: peterz, mingo, linux-kernel, linux-pm
Cc: morten.rasmussen, arjan.van.de.ven, len.brown, rafael.j.wysocki,
    alan.cox

Hi all,

This is a continuation of the first RFC about splitting the scheduler.
It is still work in progress, and I am calling for feedback.

The question addressed here is how load balancing should be changed.
I think the question then becomes how to *reuse* common code as much
as possible while still serving various objectives.

So these are the basic semantics needed in the current load balancer:

1. [ At balance point ] on this_cpu push task on that_cpu to
   [ third_cpu ]

   Examples are fork/exec/wakeup. The task is determined by the
   balance point in question, and that_cpu is determined by the task.

2. [ At balance point ] on this_cpu pull [ task/tasks ] on
   [ that_cpu ] to this_cpu

   Examples are the other idle/periodic/nohz balances, and
   active_load_balance in ASYM_PACKING (a pull first and then a push).

3. [ At balance point ] on this_cpu kick [ that_cpu/those_cpus ] to do
   [ what ] balance

   Examples are nohz idle balance and active balance.

To make the above more general, we need to abstract further:

1. [ At balance point ] on this_cpu push task on that_cpu to
   [ third_cpu ] in [ cpu_mask ]

2. [ At balance point ] on this_cpu [ do | skip ] pull [ task/tasks ]
   on [ that_cpu ] in [ cpu_mask ] to this_cpu

3. [ At balance point ] on this_cpu kick [ that_cpu/those_cpus ] in
   [ cpu_mask ] to do nohz idle balance

So essentially, we either give the balance points a choice or restrict
their scope. Then, instead of an all-in-one load_balance class, we
define push and pull classes (member signatures are only sketched):

	struct push_class {
		int (*which_third_cpu)(int this_cpu, struct task_struct *p);
		struct cpumask *(*which_cpu_mask)(struct sched_domain *sd);
	};

	struct pull_class {
		int (*skip)(int this_cpu);
		int (*which_that_cpu)(int this_cpu);
		struct task_struct *(*which_task)(struct rq *src_rq);
		struct cpumask *(*which_cpu_mask)(struct sched_domain *sd);
	};

Last but not least: we currently configure a domain by flags and
parameters; how about attaching push/pull classes to domains directly
as struct members? Each class would then be responsible specifically
for the well-being of the domain it rides on.

Thanks,
Yuyang
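For illustration, here is one way the pull class proposed above might
be driven by a generic per-domain balance pass. Everything in this
sketch is hypothetical: the hook signatures, the sd->pull member, and
the migrate_one_task() helper are invented for the example and are not
existing kernel API.

	/* Hypothetical: a domain carries its own pull policy. */
	struct pull_class {
		int (*skip)(int this_cpu);
		int (*which_that_cpu)(int this_cpu, struct sched_domain *sd);
		struct task_struct *(*which_task)(struct rq *src_rq);
		const struct cpumask *(*which_cpu_mask)(struct sched_domain *sd);
	};

	/* Generic driver: ask the domain's class whether, who, and what. */
	static void domain_pull_balance(struct sched_domain *sd, int this_cpu)
	{
		const struct pull_class *pc = sd->pull;	/* assumed member */
		struct task_struct *p;
		int that_cpu;

		if (pc->skip && pc->skip(this_cpu))
			return;
		if (!cpumask_test_cpu(this_cpu, pc->which_cpu_mask(sd)))
			return;

		that_cpu = pc->which_that_cpu(this_cpu, sd);
		if (that_cpu < 0 || that_cpu == this_cpu)
			return;

		p = pc->which_task(cpu_rq(that_cpu));
		if (p)
			migrate_one_task(p, this_cpu);	/* hypothetical helper */
	}

Under this scheme, a power-oriented domain and a performance-oriented
domain could supply different pull_class instances while sharing the
same driver loop.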
* Re: [RFC II] Splitting scheduler into two halves
  2014-03-27  4:57 ` Mike Galbraith
  2014-03-27  7:25   ` Ingo Molnar

From: Mike Galbraith
To: Yuyang du
Cc: peterz, mingo, linux-kernel, linux-pm, morten.rasmussen,
    arjan.van.de.ven, len.brown, rafael.j.wysocki, alan.cox

On Thu, 2014-03-27 at 02:37 +0800, Yuyang du wrote:
> Hi all,
>
> This is a continuation of the first RFC about splitting the scheduler.
> It is still work in progress, and I am calling for feedback.
>
> The question addressed here is how load balancing should be changed.
> I think the question then becomes how to *reuse* common code as much
> as possible while still serving various objectives.
>
> So these are the basic semantics needed in the current load balancer:

I'll probably regret it, but I'm gonna speak my mind.  I think this
two-halves concept is fundamentally broken.

> 1. [ At balance point ] on this_cpu push task on that_cpu to
>    [ third_cpu ]

Load balancing is a necessary part of the fastpath as well as the slow
path; you can't just define a balance point and have that mean a point
at which we can separate core functionality from peripheral.

For example, the rt class has push/pull at schedule time, and the fair
class has select_idle_sibling() at wakeup, both in the fastpath, to
minimize latency.  It is all load balancing, it is all push/pull; the
fastpath does exactly the same things as the slow path, for the exact
same reason, only the resource investment varies.

I don't think you can separate the scheduler into two halves like
this.  Load balancing is an integral part and fundamental consequence
of being a multi-queue scheduler.  Scheduling and balancing are not
two halves that make a whole, and can thus be separated; they are one.

	-Mike
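For reference, the wakeup fastpath Mike is pointing at looks roughly
like this in kernels of that era. The snippet below is a condensed
paraphrase rather than verbatim source; locking, statistics, and most
of the placement logic are elided.

	/* kernel/sched/core.c (condensed): every wakeup does placement. */
	static int try_to_wake_up(struct task_struct *p, unsigned int state,
				  int wake_flags)
	{
		int cpu;

		/* ... lock p, check its state ... */

		/* Ask the task's class where it should run, on every wakeup. */
		cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);

		/* ... queue p on cpu's runqueue, preempt if needed ... */
		return 1;
	}

For the fair class, the SD_BALANCE_WAKE path ends in
select_idle_sibling(), which hunts for an idle CPU near the waker or
wakee: push/pull in the fastpath, done cheaply for latency.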
* Re: [RFC II] Splitting scheduler into two halves
  2014-03-27  7:25   ` Ingo Molnar
  2014-03-27 22:13     ` Yuyang Du

From: Ingo Molnar
To: Mike Galbraith
Cc: Yuyang du, peterz, mingo, linux-kernel, linux-pm, morten.rasmussen,
    arjan.van.de.ven, len.brown, rafael.j.wysocki, alan.cox

* Mike Galbraith <umgwanakikbuti@gmail.com> wrote:

> On Thu, 2014-03-27 at 02:37 +0800, Yuyang du wrote:
> > Hi all,
> >
> > This is a continuation of the first RFC about splitting the scheduler.
> > It is still work in progress, and I am calling for feedback.
> >
> > The question addressed here is how load balancing should be changed.
> > I think the question then becomes how to *reuse* common code as much
> > as possible while still serving various objectives.
> >
> > So these are the basic semantics needed in the current load balancer:
>
> I'll probably regret it, but I'm gonna speak my mind.  I think this
> two-halves concept is fundamentally broken.

As PeterZ pointed out in the previous discussion, besides being
fundamentally broken, this approach also comes with no valid technical
rationale for the change.

Firstly, I'd like to stress that we are not against abstraction and
interfaces within the scheduler (at all!) - we already have a 'split'
and use interfaces between 'scheduler classes':

struct sched_class {
	const struct sched_class *next;

	void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags);
	void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags);
	void (*yield_task) (struct rq *rq);
	bool (*yield_to_task) (struct rq *rq, struct task_struct *p, bool preempt);

	void (*check_preempt_curr) (struct rq *rq, struct task_struct *p, int flags);

	/*
	 * It is the responsibility of the pick_next_task() method that will
	 * return the next task to call put_prev_task() on the @prev task or
	 * something equivalent.
	 *
	 * May return RETRY_TASK when it finds a higher prio class has runnable
	 * tasks.
	 */
	struct task_struct * (*pick_next_task) (struct rq *rq,
						struct task_struct *prev);
	void (*put_prev_task) (struct rq *rq, struct task_struct *p);

#ifdef CONFIG_SMP
	int  (*select_task_rq)(struct task_struct *p, int task_cpu, int sd_flag, int flags);
	void (*migrate_task_rq)(struct task_struct *p, int next_cpu);

	void (*post_schedule) (struct rq *this_rq);
	void (*task_waking) (struct task_struct *task);
	void (*task_woken) (struct rq *this_rq, struct task_struct *task);

	void (*set_cpus_allowed)(struct task_struct *p,
				 const struct cpumask *newmask);

	void (*rq_online)(struct rq *rq);
	void (*rq_offline)(struct rq *rq);
#endif

	void (*set_curr_task) (struct rq *rq);
	void (*task_tick) (struct rq *rq, struct task_struct *p, int queued);
	void (*task_fork) (struct task_struct *p);
	void (*task_dead) (struct task_struct *p);

	void (*switched_from) (struct rq *this_rq, struct task_struct *task);
	void (*switched_to) (struct rq *this_rq, struct task_struct *task);
	void (*prio_changed) (struct rq *this_rq, struct task_struct *task,
			      int oldprio);

	unsigned int (*get_rr_interval) (struct rq *rq,
					 struct task_struct *task);

#ifdef CONFIG_FAIR_GROUP_SCHED
	void (*task_move_group) (struct task_struct *p, int on_rq);
#endif
};

So where it makes sense we make use of this programming technique, to
the extent it is helpful.

But interfaces and abstraction have a cost, and the justification
given in this submission looks very weak to me.  There's no
justification given in this specific submission; the closest I could
find was in the first submission:

> > With the advent of more cores and heterogeneous architectures, the
> > scheduler is required to be more complex (power efficiency) and
> > diverse (big.little). For the scheduler to address that challenge
> > as a whole, it is costly but not necessary. This proposal argues
> > that the scheduler be split into two parts: top half (task
> > scheduling) and bottom half (load balance). Let the bottom half
> > take charge of the incoming requirements.

That is just way too generic, with no specific technical benefits
listed and no cost/benefit demonstrated.

If there's any advantage to a 'split', then it must be expressible via
one or more of these positive attributes:

  - better numbers (better performance, etc.)
  - reduced code
  - new features

A split alone, without making active and convincing use of it, is
inadequate.

So without a much better rationale, demonstrated via actual, real
working code that not only does the split but also makes real use of
every aspect of the proposed abstraction interfaces, and which
demonstrates that the proposed 'split' is the most sensible way
forward, this specific submission earns a NAK from me.

Thanks,

	Ingo
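As a concrete instance of the existing split Ingo describes, each
scheduling class fills in that table of hooks and the core scheduler
calls through it. The snippet below is abridged from
kernel/sched/fair.c of roughly that era, with several hooks elided;
it is shown for illustration, not as an exact copy:

	/* The fair class publishes its implementations of the hooks. */
	const struct sched_class fair_sched_class = {
		.next			= &idle_sched_class,
		.enqueue_task		= enqueue_task_fair,
		.dequeue_task		= dequeue_task_fair,
		.yield_task		= yield_task_fair,
		.check_preempt_curr	= check_preempt_wakeup,

		.pick_next_task		= pick_next_task_fair,
		.put_prev_task		= put_prev_task_fair,

	#ifdef CONFIG_SMP
		.select_task_rq		= select_task_rq_fair,
		.migrate_task_rq	= migrate_task_rq_fair,
	#endif

		.task_tick		= task_tick_fair,
		.task_fork		= task_fork_fair,
		/* ... remaining hooks elided ... */
	};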
* Re: [RFC II] Splitting scheduler into two halves
  2014-03-27 22:13     ` Yuyang Du
  2014-03-28  6:50       ` Mike Galbraith

From: Yuyang Du
To: Ingo Molnar
Cc: Mike Galbraith, peterz@infradead.org, mingo@redhat.com,
    linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    morten.rasmussen@arm.com, Van De Ven, Arjan, Brown, Len,
    Wysocki, Rafael J, Cox, Alan

Hi,

I should have changed the subject to "Refining the load balancing
interfaces".  Splitting does feel brutal, or too big a jump, for now.
But I doubt that would change your mind anyway.

Overall, I interpret your comment as a call for substantial stuff.
Yay, working on it.

Thanks,
Yuyang

On Thu, Mar 27, 2014 at 03:25:11PM +0800, Ingo Molnar wrote:
>
> * Mike Galbraith <umgwanakikbuti@gmail.com> wrote:
>
> > On Thu, 2014-03-27 at 02:37 +0800, Yuyang du wrote:
> > > Hi all,
> > >
> > > This is a continuation of the first RFC about splitting the scheduler.
> > > It is still work in progress, and I am calling for feedback.
> > >
> > > The question addressed here is how load balancing should be changed.
> > > I think the question then becomes how to *reuse* common code as much
> > > as possible while still serving various objectives.
> > >
> > > So these are the basic semantics needed in the current load balancer:
> >
> > I'll probably regret it, but I'm gonna speak my mind.  I think this
> > two-halves concept is fundamentally broken.
>
> As PeterZ pointed out in the previous discussion, besides being
> fundamentally broken, this approach also comes with no valid technical
> rationale for the change.
>
> Firstly, I'd like to stress that we are not against abstraction and
> interfaces within the scheduler (at all!) - we already have a 'split'
> and use interfaces between 'scheduler classes':
>
> struct sched_class {
> 	const struct sched_class *next;
>
> 	void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags);
> 	void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags);
> 	void (*yield_task) (struct rq *rq);
> 	bool (*yield_to_task) (struct rq *rq, struct task_struct *p, bool preempt);
>
> 	void (*check_preempt_curr) (struct rq *rq, struct task_struct *p, int flags);
>
> 	/*
> 	 * It is the responsibility of the pick_next_task() method that will
> 	 * return the next task to call put_prev_task() on the @prev task or
> 	 * something equivalent.
> 	 *
> 	 * May return RETRY_TASK when it finds a higher prio class has runnable
> 	 * tasks.
> 	 */
> 	struct task_struct * (*pick_next_task) (struct rq *rq,
> 						struct task_struct *prev);
> 	void (*put_prev_task) (struct rq *rq, struct task_struct *p);
>
> #ifdef CONFIG_SMP
> 	int  (*select_task_rq)(struct task_struct *p, int task_cpu, int sd_flag, int flags);
> 	void (*migrate_task_rq)(struct task_struct *p, int next_cpu);
>
> 	void (*post_schedule) (struct rq *this_rq);
> 	void (*task_waking) (struct task_struct *task);
> 	void (*task_woken) (struct rq *this_rq, struct task_struct *task);
>
> 	void (*set_cpus_allowed)(struct task_struct *p,
> 				 const struct cpumask *newmask);
>
> 	void (*rq_online)(struct rq *rq);
> 	void (*rq_offline)(struct rq *rq);
> #endif
>
> 	void (*set_curr_task) (struct rq *rq);
> 	void (*task_tick) (struct rq *rq, struct task_struct *p, int queued);
> 	void (*task_fork) (struct task_struct *p);
> 	void (*task_dead) (struct task_struct *p);
>
> 	void (*switched_from) (struct rq *this_rq, struct task_struct *task);
> 	void (*switched_to) (struct rq *this_rq, struct task_struct *task);
> 	void (*prio_changed) (struct rq *this_rq, struct task_struct *task,
> 			      int oldprio);
>
> 	unsigned int (*get_rr_interval) (struct rq *rq,
> 					 struct task_struct *task);
>
> #ifdef CONFIG_FAIR_GROUP_SCHED
> 	void (*task_move_group) (struct task_struct *p, int on_rq);
> #endif
> };
>
> So where it makes sense we make use of this programming technique, to
> the extent it is helpful.
>
> But interfaces and abstraction have a cost, and the justification
> given in this submission looks very weak to me.  There's no
> justification given in this specific submission; the closest I could
> find was in the first submission:
>
> > > With the advent of more cores and heterogeneous architectures, the
> > > scheduler is required to be more complex (power efficiency) and
> > > diverse (big.little). For the scheduler to address that challenge
> > > as a whole, it is costly but not necessary. This proposal argues
> > > that the scheduler be split into two parts: top half (task
> > > scheduling) and bottom half (load balance). Let the bottom half
> > > take charge of the incoming requirements.
>
> That is just way too generic, with no specific technical benefits
> listed and no cost/benefit demonstrated.
>
> If there's any advantage to a 'split', then it must be expressible via
> one or more of these positive attributes:
>
>   - better numbers (better performance, etc.)
>   - reduced code
>   - new features
>
> A split alone, without making active and convincing use of it, is
> inadequate.
>
> So without a much better rationale, demonstrated via actual, real
> working code that not only does the split but also makes real use of
> every aspect of the proposed abstraction interfaces, and which
> demonstrates that the proposed 'split' is the most sensible way
> forward, this specific submission earns a NAK from me.
>
> Thanks,
>
> 	Ingo
* Re: [RFC II] Splitting scheduler into two halves
  2014-03-28  6:50       ` Mike Galbraith
  2014-03-27 23:00         ` Yuyang Du

From: Mike Galbraith
To: Yuyang Du
Cc: Ingo Molnar, peterz@infradead.org, mingo@redhat.com,
    linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    morten.rasmussen@arm.com, Van De Ven, Arjan, Brown, Len,
    Wysocki, Rafael J, Cox, Alan

On Fri, 2014-03-28 at 06:13 +0800, Yuyang Du wrote:

> Splitting does feel brutal...

FWIW, "split" stuck hard in my gullet because task placement is a core
fastpath mission.  If the fastpath could afford to do task placement
perfectly, and did, task placement EDC (error detection and
correction, if you will) mechanisms would not exist.

	-Mike
* Re: [RFC II] Splitting scheduler into two halves
  2014-03-27 23:00         ` Yuyang Du
  2014-03-28  7:05           ` Mike Galbraith

From: Yuyang Du
To: Mike Galbraith
Cc: Ingo Molnar, peterz@infradead.org, mingo@redhat.com,
    linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    morten.rasmussen@arm.com, Van De Ven, Arjan, Brown, Len,
    Wysocki, Rafael J, Cox, Alan

On Fri, Mar 28, 2014 at 07:50:31AM +0100, Mike Galbraith wrote:
> On Fri, 2014-03-28 at 06:13 +0800, Yuyang Du wrote:
>
> > Splitting does feel brutal...
>
> FWIW, "split" stuck hard in my gullet because task placement is a core

Oh, sorry for that. :)

> fastpath mission.  If the fastpath could afford to do task placement
> perfectly, and did, task placement EDC (error detection and
> correction, if you will) mechanisms would not exist.

Sometimes we have to, even if we don't get to.

Yuyang
* Re: [RFC II] Splitting scheduler into two halves
  2014-03-28  7:05           ` Mike Galbraith

From: Mike Galbraith
To: Yuyang Du
Cc: Ingo Molnar, peterz@infradead.org, mingo@redhat.com,
    linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    morten.rasmussen@arm.com, Van De Ven, Arjan, Brown, Len,
    Wysocki, Rafael J, Cox, Alan

On Fri, 2014-03-28 at 07:00 +0800, Yuyang Du wrote:
> On Fri, Mar 28, 2014 at 07:50:31AM +0100, Mike Galbraith wrote:
> > On Fri, 2014-03-28 at 06:13 +0800, Yuyang Du wrote:
> >
> > > Splitting does feel brutal...
> >
> > FWIW, "split" stuck hard in my gullet because task placement is a core

Oh, sorry for that. :)

It wasn't _painful_ or anything, I just couldn't swallow it ;-)

> > fastpath mission.  If the fastpath could afford to do task placement
> > perfectly, and did, task placement EDC (error detection and
> > correction, if you will) mechanisms would not exist.
>
> Sometimes we have to, even if we don't get to.

Yup... but it's still core mission.
end of thread, other threads:[~2014-03-28  7:05 UTC | newest]

Thread overview: 7+ messages
2014-03-26 18:37 [RFC II] Splitting scheduler into two halves Yuyang du
2014-03-27  4:57 ` Mike Galbraith
2014-03-27  7:25   ` Ingo Molnar
2014-03-27 22:13     ` Yuyang Du
2014-03-28  6:50       ` Mike Galbraith
2014-03-27 23:00         ` Yuyang Du
2014-03-28  7:05           ` Mike Galbraith