* [RFC PATCH] sched: find the latest idle cpu
@ 2014-01-15 4:07 Alex Shi
From: Alex Shi @ 2014-01-15 4:07 UTC
To: mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
morten.rasmussen
Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel, Alex Shi

Currently we just try to find the least-loaded cpu. If some cpus are idle,
we just pick the first cpu in the cpu mask.

In fact we can pick the interrupted idle cpu or the most recently idled cpu,
which may benefit both latency and power.
The selected cpu may not be the best one, since another cpu may be interrupted
while we are selecting. But being pickier than this would cost too much.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
kernel/sched/fair.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..fb52d26 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4167,6 +4167,26 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!min_load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			s64 latest_wake = 0;
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
+				idlest = i;
+		}
+#endif
 	}
 
 	return idlest;
--
1.8.1.2

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Michael wang @ 2014-01-15 4:31 UTC
To: Alex Shi, mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
	morten.rasmussen
Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

Hi, Alex

On 01/15/2014 12:07 PM, Alex Shi wrote:
[snip] }
> +#ifdef CONFIG_NO_HZ_COMMON
> +		/*
> +		 * Coarsely to get the latest idle cpu for shorter latency and
> +		 * possible power benefit.
> +		 */
> +		if (!min_load) {
> +			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
> +
> +			s64 latest_wake = 0;

I guess we missed some code for latest_wake here?

Regards,
Michael Wang

> +			/* idle cpu doing irq */
> +			if (ts->inidle && !ts->idle_active)
> +				idlest = i;
> +			/* the cpu resched */
> +			else if (!ts->inidle)
> +				idlest = i;
> +			/* find latest idle cpu */
> +			else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
> +				idlest = i;
> +		}
> +#endif
>  	}
>
>  	return idlest;
>

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Alex Shi @ 2014-01-15 4:48 UTC
To: Michael wang, mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
	morten.rasmussen
Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 12:31 PM, Michael wang wrote:
> Hi, Alex
>
> On 01/15/2014 12:07 PM, Alex Shi wrote:
> [snip] }
>> +#ifdef CONFIG_NO_HZ_COMMON
>> +		/*
>> +		 * Coarsely to get the latest idle cpu for shorter latency and
>> +		 * possible power benefit.
>> +		 */
>> +		if (!min_load) {

here should be !load.

>> +			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
>> +
>> +			s64 latest_wake = 0;
>
> I guess we missed some code for latest_wake here?

Yes, thanks for reminder!

so updated patch:

====
From c3a88e73fed3da96549b5a922076e996832685f8 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linaro.org>
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we just try to find least load cpu. If some cpus idled,
we just pick the first cpu in cpu mask.

In fact we can get the interrupted idle cpu or the latest idled cpu,
then we may get the benefit from both latency and power.
The selected cpu maybe not the best, since other cpu may be interrupted
during our selecting. But be captious costs too much.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 kernel/sched/fair.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..73a2a07 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4167,6 +4167,31 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			s64 latest_wake = 0;
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else {
+				s64 temp = ktime_to_us(ts->idle_entrytime);
+				if (temp > latest_wake) {
+					latest_wake = temp;
+					idlest = i;
+				}
+			}
+		}
+#endif
 	}
 
 	return idlest;
-- 
1.8.1.2

-- 
Thanks
    Alex

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Alex Shi @ 2014-01-15 4:53 UTC
To: Michael wang, mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
	morten.rasmussen
Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 12:48 PM, Alex Shi wrote:
> On 01/15/2014 12:31 PM, Michael wang wrote:
>> Hi, Alex
>>
>> On 01/15/2014 12:07 PM, Alex Shi wrote:
>> [snip] }
>>> +#ifdef CONFIG_NO_HZ_COMMON
>>> +		/*
>>> +		 * Coarsely to get the latest idle cpu for shorter latency and
>>> +		 * possible power benefit.
>>> +		 */
>>> +		if (!min_load) {
>
> here should be !load.
>>> +			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
>>> +
>>> +			s64 latest_wake = 0;
>>
>> I guess we missed some code for latest_wake here?
>
> Yes, thanks for reminder!
>
> so updated patch:
>

ops, still incorrect. re-updated:

===
From 5d48303b3eb3b5ca7fde54a6dfcab79cff360403 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linaro.org>
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we just try to find least load cpu. If some cpus idled,
we just pick the first cpu in cpu mask.

In fact we can get the interrupted idle cpu or the latest idled cpu,
then we may get the benefit from both latency and power.
The selected cpu maybe not the best, since other cpu may be interrupted
during our selecting. But be captious costs too much.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 kernel/sched/fair.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..e2c4cd9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4161,12 +4161,38 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
+		s64 latest_wake = 0;
+
 		load = weighted_cpuload(i);
 
 		if (load < min_load || (load == min_load && i == this_cpu)) {
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else {
+				s64 temp = ktime_to_us(ts->idle_entrytime);
+				if (temp > latest_wake) {
+					latest_wake = temp;
+					idlest = i;
+				}
+			}
+		}
+#endif
 	}
 
 	return idlest;
-- 
1.8.1.2

-- 
Thanks
    Alex

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Alex Shi @ 2014-01-15 5:06 UTC
To: Michael wang, mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
	morten.rasmussen
Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 12:53 PM, Alex Shi wrote:
>>> >> I guess we missed some code for latest_wake here?
>> >
>> > Yes, thanks for reminder!
>> >
>> > so updated patch:
>> >
> ops, still incorrect. re-updated:

update to wrong file. re-re-update. :(

===
From b75e43bb77df14e2209532c1e5c48e0e03afa414 Mon Sep 17 00:00:00 2001
From: Alex Shi <alex.shi@linaro.org>
Date: Tue, 14 Jan 2014 23:07:42 +0800
Subject: [PATCH] sched: find the latest idle cpu

Currently we just try to find least load cpu. If some cpus idled,
we just pick the first cpu in cpu mask.

In fact we can get the interrupted idle cpu or the latest idled cpu,
then we may get the benefit from both latency and power.
The selected cpu maybe not the best, since other cpu may be interrupted
during our selecting. But be captious costs too much.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
---
 kernel/sched/fair.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..f82ca3d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4159,6 +4159,10 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 	int idlest = -1;
 	int i;
 
+#ifdef CONFIG_NO_HZ_COMMON
+	s64 latest_wake = 0;
+#endif
+
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
 		load = weighted_cpuload(i);
@@ -4167,6 +4171,30 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 			min_load = load;
 			idlest = i;
 		}
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * Coarsely to get the latest idle cpu for shorter latency and
+		 * possible power benefit.
+		 */
+		if (!load) {
+			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
+
+			/* idle cpu doing irq */
+			if (ts->inidle && !ts->idle_active)
+				idlest = i;
+			/* the cpu resched */
+			else if (!ts->inidle)
+				idlest = i;
+			/* find latest idle cpu */
+			else {
+				s64 temp = ktime_to_us(ts->idle_entrytime);
+				if (temp > latest_wake) {
+					latest_wake = temp;
+					idlest = i;
+				}
+			}
+		}
+#endif
 	}
 
 	return idlest;
-- 
1.8.1.2

-- 
Thanks
    Alex
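
To make the selection order in the last version of the patch easier to follow, here is a minimal userspace sketch of the loop. It is only a model: `struct cpu_model` and `model_find_idlest_cpu()` are hypothetical names, the fields merely mirror the `struct tick_sched` members the patch reads, and nothing here touches kernel APIs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the per-cpu data the patch reads. Field names
 * follow struct tick_sched, but this is not the kernel structure.
 */
struct cpu_model {
	unsigned long load;	/* models weighted_cpuload(i) */
	bool inidle;		/* models ts->inidle */
	bool idle_active;	/* models ts->idle_active */
	int64_t idle_entry_us;	/* models ktime_to_us(ts->idle_entrytime) */
};

/* Models the loop of the last patch version; returns -1 if ncpus is 0. */
static int model_find_idlest_cpu(const struct cpu_model *cpus, size_t ncpus,
				 int this_cpu)
{
	unsigned long min_load = ULONG_MAX;
	int64_t latest_wake = 0;	/* function scope, so it survives iterations */
	int idlest = -1;
	size_t i;

	assert(cpus != NULL || ncpus == 0);

	for (i = 0; i < ncpus; i++) {
		unsigned long load = cpus[i].load;

		/* Existing rule: least load wins, ties go to this_cpu. */
		if (load < min_load || (load == min_load && (int)i == this_cpu)) {
			min_load = load;
			idlest = (int)i;
		}

		/* Added rule: refine the choice among cpus with zero load. */
		if (!load) {
			if (cpus[i].inidle && !cpus[i].idle_active) {
				idlest = (int)i;	/* idle cpu servicing an irq */
			} else if (!cpus[i].inidle) {
				idlest = (int)i;	/* cpu has already left idle */
			} else if (cpus[i].idle_entry_us > latest_wake) {
				latest_wake = cpus[i].idle_entry_us;
				idlest = (int)i;	/* most recently idled so far */
			}
		}
	}
	return idlest;
}
```

With three sleeping cpus whose idle entry times are 100, 300 and 200 us, the model returns cpu 1. One property the model makes visible: because the loop keeps going, a cpu that was picked for servicing an interrupt can still be replaced by a later sleeping cpu whose entry time exceeds `latest_wake`.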

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Michael wang @ 2014-01-15 5:33 UTC
To: Alex Shi, mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
	morten.rasmussen
Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 12:07 PM, Alex Shi wrote:
> Currently we just try to find least load cpu. If some cpus idled,
> we just pick the first cpu in cpu mask.
>
> In fact we can get the interrupted idle cpu or the latest idled cpu,
> then we may get the benefit from both latency and power.
> The selected cpu maybe not the best, since other cpu may be interrupted
> during our selecting. But be captious costs too much.

So the idea here is we want to choose the latest idle cpu if we have
multiple idle cpu for choosing, correct?

And I guess that was in order to avoid choosing tickless cpu while there
are un-tickless idle one, is that right?

What confused me is, what about those cpu who just going to recover from
tickless as you mentioned, which means latest idle doesn't mean the best
choice, or even could be the worst (if just two choice, and the longer
tickless one is just going to recover while the latest is going to
tickless).

So what about just check 'ts->tick_stopped' and record one ticking idle
cpu? the cost could be lower than time comparison, we could reduce the
risk may be...(well, not so risky since the logical only works when
system is relaxing with several cpu idle)

Regards,
Michael Wang

>
> Signed-off-by: Alex Shi <alex.shi@linaro.org>
> ---
>  kernel/sched/fair.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c7395d9..fb52d26 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4167,6 +4167,26 @@ find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
>  			min_load = load;
>  			idlest = i;
>  		}
> +#ifdef CONFIG_NO_HZ_COMMON
> +		/*
> +		 * Coarsely to get the latest idle cpu for shorter latency and
> +		 * possible power benefit.
> +		 */
> +		if (!min_load) {
> +			struct tick_sched *ts = &per_cpu(tick_cpu_sched, i);
> +
> +			s64 latest_wake = 0;
> +			/* idle cpu doing irq */
> +			if (ts->inidle && !ts->idle_active)
> +				idlest = i;
> +			/* the cpu resched */
> +			else if (!ts->inidle)
> +				idlest = i;
> +			/* find latest idle cpu */
> +			else if (ktime_to_us(ts->idle_entrytime) > latest_wake)
> +				idlest = i;
> +		}
> +#endif
>  	}
>
>  	return idlest;
>
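
The cheaper alternative suggested in this message, checking only `tick_stopped` and remembering one idle cpu whose tick is still running, might look roughly like the sketch below. It is a userspace model under assumptions: `struct nohz_cpu_model` and `pick_ticking_idle_cpu()` are hypothetical, and falling back to the first idle cpu when every idle cpu has its tick stopped is a choice made here, not something the message specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu view; tick_stopped models ts->tick_stopped. */
struct nohz_cpu_model {
	unsigned long load;	/* models weighted_cpuload(i) */
	bool tick_stopped;	/* true once the periodic tick is off */
};

/*
 * Return the first zero-load cpu whose tick is still running. If every
 * zero-load cpu has its tick stopped, fall back to the first of them
 * (an assumption of this sketch). Return -1 if no cpu has zero load.
 */
static int pick_ticking_idle_cpu(const struct nohz_cpu_model *cpus,
				 size_t ncpus)
{
	int fallback = -1;
	size_t i;

	assert(cpus != NULL || ncpus == 0);

	for (i = 0; i < ncpus; i++) {
		if (cpus[i].load)
			continue;		/* busy cpu, not a candidate */
		if (!cpus[i].tick_stopped)
			return (int)i;		/* idle and still ticking */
		if (fallback < 0)
			fallback = (int)i;	/* remember first tickless idle cpu */
	}
	return fallback;
}
```

The appeal is that it needs one flag test per cpu instead of a timestamp conversion and comparison. The model only looks at zero-load cpus, so it says nothing about a busy cpu whose tick has been stopped.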

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Alex Shi @ 2014-01-15 6:45 UTC
To: Michael wang, mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
	morten.rasmussen
Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 01:33 PM, Michael wang wrote:
> On 01/15/2014 12:07 PM, Alex Shi wrote:
>> > Currently we just try to find least load cpu. If some cpus idled,
>> > we just pick the first cpu in cpu mask.
>> >
>> > In fact we can get the interrupted idle cpu or the latest idled cpu,
>> > then we may get the benefit from both latency and power.
>> > The selected cpu maybe not the best, since other cpu may be interrupted
>> > during our selecting. But be captious costs too much.
> So the idea here is we want to choose the latest idle cpu if we have
> multiple idle cpu for choosing, correct?

yes.

>
> And I guess that was in order to avoid choosing tickless cpu while there
> are un-tickless idle one, is that right?

no, current logical choice least load cpu no matter if it is idle.

>
> What confused me is, what about those cpu who just going to recover from
> tickless as you mentioned, which means latest idle doesn't mean the best
> choice, or even could be the worst (if just two choice, and the longer
> tickless one is just going to recover while the latest is going to
> tickless).

yes, to save your scenario, we need to know the next timer for idle cpu,
but that is not enough, interrupt is totally unpredictable. So, I'd
rather bear the coarse method now.

>
> So what about just check 'ts->tick_stopped' and record one ticking idle
> cpu? the cost could be lower than time comparison, we could reduce the
> risk may be...(well, not so risky since the logical only works when
> system is relaxing with several cpu idle)

first, nohz full also stop tick. second, tick_stopped can not reflect
the interrupt. when the idle cpu was interrupted, it's waken, then be a
good candidate for task running.

-- 
Thanks
    Alex

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Michael wang @ 2014-01-15 8:05 UTC
To: Alex Shi, mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
	morten.rasmussen
Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 02:45 PM, Alex Shi wrote:
[snip]
>
> yes, to save your scenario, we need to know the next timer for idle cpu,
> but that is not enough, interrupt is totally unpredictable. So, I'd
> rather bear the coarse method now.
>>
>> So what about just check 'ts->tick_stopped' and record one ticking idle
>> cpu? the cost could be lower than time comparison, we could reduce the
>> risk may be...(well, not so risky since the logical only works when
>> system is relaxing with several cpu idle)
>
> first, nohz full also stop tick. second, tick_stopped can not reflect
> the interrupt. when the idle cpu was interrupted, it's waken, then be a
> good candidate for task running.

IMHO, if we have to do gamble here, we better choose the cheaper bet,
unless we could prove this 'coarse method' have more higher chance for
BINGO than just check 'tick_stopped'...

BTW, may be the logical should be in the select_idle_sibling()?

Regards,
Michael Wang

>

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Alex Shi @ 2014-01-15 14:28 UTC
To: Michael wang, mingo, peterz, tglx, daniel.lezcano, vincent.guittot,
	morten.rasmussen
Cc: linux-kernel, akpm, fengguang.wu, linaro-kernel

On 01/15/2014 04:05 PM, Michael wang wrote:
> On 01/15/2014 02:45 PM, Alex Shi wrote:
> [snip]
>>
>> yes, to save your scenario, we need to know the next timer for idle cpu,
>> but that is not enough, interrupt is totally unpredictable. So, I'd
>> rather bear the coarse method now.
>>>
>>> So what about just check 'ts->tick_stopped' and record one ticking idle
>>> cpu? the cost could be lower than time comparison, we could reduce the
>>> risk may be...(well, not so risky since the logical only works when
>>> system is relaxing with several cpu idle)
>>
>> first, nohz full also stop tick. second, tick_stopped can not reflect
>> the interrupt. when the idle cpu was interrupted, it's waken, then be a
>> good candidate for task running.
>
> IMHO, if we have to do gamble here, we better choose the cheaper bet,
> unless we could prove this 'coarse method' have more higher chance for
> BINGO than just check 'tick_stopped'...

Tick stopped on a nohz full CPU, but the cpu still had a task running...

>
> BTW, may be the logical should be in the select_idle_sibling()?

both of functions need to be considered.

>
> Regards,
> Michael Wang
>
>>
>

-- 
Thanks
    Alex

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Peter Zijlstra @ 2014-01-15 7:35 UTC
To: Alex Shi
Cc: mingo, tglx, daniel.lezcano, vincent.guittot, morten.rasmussen,
	linux-kernel, akpm, fengguang.wu, linaro-kernel

On Wed, Jan 15, 2014 at 12:07:59PM +0800, Alex Shi wrote:
> Currently we just try to find least load cpu. If some cpus idled,
> we just pick the first cpu in cpu mask.
>
> In fact we can get the interrupted idle cpu or the latest idled cpu,
> then we may get the benefit from both latency and power.
> The selected cpu maybe not the best, since other cpu may be interrupted
> during our selecting. But be captious costs too much.

No, we should not do anything like this without first integrating
cpuidle.

At which point we have a sane view of the idle states and can make a
sane choice between them.

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Alex Shi @ 2014-01-15 14:37 UTC
To: Peter Zijlstra, daniel.lezcano
Cc: mingo, tglx, vincent.guittot, morten.rasmussen, linux-kernel, akpm,
	fengguang.wu, linaro-kernel

On 01/15/2014 03:35 PM, Peter Zijlstra wrote:
> On Wed, Jan 15, 2014 at 12:07:59PM +0800, Alex Shi wrote:
>> Currently we just try to find least load cpu. If some cpus idled,
>> we just pick the first cpu in cpu mask.
>>
>> In fact we can get the interrupted idle cpu or the latest idled cpu,
>> then we may get the benefit from both latency and power.
>> The selected cpu maybe not the best, since other cpu may be interrupted
>> during our selecting. But be captious costs too much.
>
> No, we should not do anything like this without first integrating
> cpuidle.
>
> At which point we have a sane view of the idle states and can make a
> sane choice between them.
>

Daniel,

Any comments to make it better?

-- 
Thanks
    Alex

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Daniel Lezcano @ 2014-01-16 11:03 UTC
To: Alex Shi, Peter Zijlstra
Cc: mingo, tglx, vincent.guittot, morten.rasmussen, linux-kernel, akpm,
	fengguang.wu, linaro-kernel, Michael wang

On 01/15/2014 03:37 PM, Alex Shi wrote:
> On 01/15/2014 03:35 PM, Peter Zijlstra wrote:
>> On Wed, Jan 15, 2014 at 12:07:59PM +0800, Alex Shi wrote:
>>> Currently we just try to find least load cpu. If some cpus idled,
>>> we just pick the first cpu in cpu mask.
>>>
>>> In fact we can get the interrupted idle cpu or the latest idled cpu,
>>> then we may get the benefit from both latency and power.
>>> The selected cpu maybe not the best, since other cpu may be interrupted
>>> during our selecting. But be captious costs too much.
>>
>> No, we should not do anything like this without first integrating
>> cpuidle.
>>
>> At which point we have a sane view of the idle states and can make a
>> sane choice between them.
>>
>
>
> Daniel,
>
> Any comments to make it better?

Hi Alex,

it is a nice optimization attempt but I agree with Peter we should focus on
integrating cpuidle.

The question is "how do we integrate cpuidle ?"

IMHO, the main problem are the governors, especially the menu governor.

The menu governor tries to predict the events per cpu. This approach which
gave us a nice benefit for the power saving may not fit well for the
scheduler.

I think we can classify the events in three categories:

1. fully predictable (timers)
2. partially predictable (eg. MMC, sdd or network)
3. unpredictable (eg. keyboard, network ingress after quiescent period)

The menu governor mix 2 and 3 with statistics and a performance
multiplier to reach shallow states based on heuristic and
experimentation for a specific platform.

I was wondering if we shouldn't create a per task io latency tracking.

Mostly based on io_schedule and io_schedule_timeout, we track the
latency for each task for each device, keeping up to date a rb-tree
where the left-most leaf is the minimum latency for all the tasks
running on a specific cpu. That allows better tracking when moving tasks
across cpus.

With this approach, we have something consistent with the per load task
tracking.

This io latency tracking gives us the next wake up event we can inject
to the cpuidle framework directly. That removes all the code related to
the menu governor statistics based on IO events and simplify a lot the
menu governor code. So we replaced a piece of the cpuidle code by a
scheduler code which I hope could be better for prediction, leading to a
part of integration.

In order to finish integrating the cpuidle framework in the scheduler, there
are pending questions about the impact in the current design.

Peter or Ingo, if you have time, could you have a look at the email I sent
previously [1] ?

Thanks

  -- Daniel

[1] https://lkml.org/lkml/2013/12/17/106

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
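
The per-cpu bookkeeping described in this message, an ordered set of per-task io latencies whose smallest element is the predicted next wakeup, can be modelled in a few lines. The sketch below deliberately swaps the rb-tree for a small sorted array to stay short, and every name in it (`struct cpu_io_latency`, `io_latency_add()`, `io_latency_del()`, `io_latency_min()`, `MAX_TRACKED`) is hypothetical rather than an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_TRACKED 16	/* arbitrary capacity for this sketch */

/* Expected io latencies of the tasks on one cpu, kept in ascending order. */
struct cpu_io_latency {
	uint64_t lat_us[MAX_TRACKED];
	size_t n;
};

/* Insert one task's expected latency; false if the table is full. */
static bool io_latency_add(struct cpu_io_latency *c, uint64_t lat_us)
{
	size_t pos;

	assert(c != NULL);
	if (c->n == MAX_TRACKED)
		return false;

	/* Shift larger entries up by one slot to keep the array sorted. */
	pos = c->n;
	while (pos > 0 && c->lat_us[pos - 1] > lat_us) {
		c->lat_us[pos] = c->lat_us[pos - 1];
		pos--;
	}
	c->lat_us[pos] = lat_us;
	c->n++;
	return true;
}

/* Remove one entry with this latency, e.g. when its task migrates away. */
static bool io_latency_del(struct cpu_io_latency *c, uint64_t lat_us)
{
	size_t i;

	assert(c != NULL);
	for (i = 0; i < c->n; i++) {
		if (c->lat_us[i] == lat_us) {
			memmove(&c->lat_us[i], &c->lat_us[i + 1],
				(c->n - i - 1) * sizeof(c->lat_us[0]));
			c->n--;
			return true;
		}
	}
	return false;
}

/* Smallest tracked latency: the role of the left-most leaf in the rb-tree. */
static bool io_latency_min(const struct cpu_io_latency *c, uint64_t *out)
{
	assert(c != NULL && out != NULL);
	if (c->n == 0)
		return false;
	*out = c->lat_us[0];
	return true;
}
```

Moving a task across cpus then amounts to an `io_latency_del()` on the source cpu followed by an `io_latency_add()` on the destination, after which each cpu's minimum is up to date again. A real implementation would need an ordered structure with cheaper removal, which is presumably why the message proposes an rb-tree.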

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Peter Zijlstra @ 2014-01-16 11:38 UTC
To: Daniel Lezcano
Cc: Alex Shi, mingo, tglx, vincent.guittot, morten.rasmussen, linux-kernel,
	akpm, fengguang.wu, linaro-kernel, Michael wang

On Thu, Jan 16, 2014 at 12:03:13PM +0100, Daniel Lezcano wrote:
> Hi Alex,
>
> it is a nice optimization attempt but I agree with Peter we should focus on
> integrating cpuidle.
>
> The question is "how do we integrate cpuidle ?"
>
> IMHO, the main problem are the governors, especially the menu governor.

Yah.

> The menu governor tries to predict the events per cpu. This approach which
> gave us a nice benefit for the power saving may not fit well for the
> scheduler.

So the way to start all this is I think to gradually share more and
more.

Start by pulling in the actual idle state; such that we can indeed
observe what the relative cost is of waking a cpu (against another), and
maybe even the predicted wakeup time.

Then pull in the various statistics gathering bits -- without improving
them.

Then improve the statistics; try and remove duplicate statistics -- if
there's such things, try and use the extra information the scheduler has
etc..

Then worry about the governors, or what's left of them.

> In order to finish integrating the cpuidle framework in the scheduler, there
> are pending questions about the impact in the current design.
>
> Peter or Ingo, if you have time, could you have a look at the email I sent
> previously [1] ?

I read it once, it didn't make sense at the time, I just read it again,
still doesn't make sense.

We need the idle task, since we need to DO something to go idle, the
scheduler needs to pick a task to go do that something. This is the idle
task.

You cannot get rid of that.

In fact, the 'doing' of that task is running much of the cpuidle code,
so by getting rid of it, there's nobody left to execute that code.

Also, since its already running that cpuidle stuff, integrating it more
closely with the scheduler will not in fact change much, it will still
run it.

Could of course be I'm not reading what you meant to write, if so, do
try again ;-)
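
The first step proposed in this message, letting the scheduler see each cpu's actual idle state so it can compare the cost of waking one cpu against another, could look roughly like the sketch below. It assumes each idle cpu exposes a wakeup cost in microseconds; `struct idle_cpu_view`, `exit_latency_us` and `pick_cheapest_idle_cpu()` are hypothetical names for illustration, not the interface the thread settled on.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scheduler-side view of one cpu's idle state. */
struct idle_cpu_view {
	bool idle;			/* cpu currently runs the idle task */
	unsigned int exit_latency_us;	/* assumed cost of waking it up */
};

/*
 * Among idle cpus, pick the one that is cheapest to wake. Ties go to the
 * lowest cpu index. Returns -1 when no cpu is idle.
 */
static int pick_cheapest_idle_cpu(const struct idle_cpu_view *cpus,
				  size_t ncpus)
{
	unsigned int best = UINT_MAX;
	int cpu = -1;
	size_t i;

	assert(cpus != NULL || ncpus == 0);

	for (i = 0; i < ncpus; i++) {
		if (!cpus[i].idle)
			continue;
		if (cpu < 0 || cpus[i].exit_latency_us < best) {
			best = cpus[i].exit_latency_us;
			cpu = (int)i;
		}
	}
	return cpu;
}
```

Compared with the timestamp heuristic in the patch, this compares a property of the idle state itself. A cpu interrupted out of a deep state would need its view updated before the comparison is meaningful, which the sketch does not model.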

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Daniel Lezcano @ 2014-01-16 12:16 UTC
To: Peter Zijlstra
Cc: Alex Shi, mingo, tglx, vincent.guittot, morten.rasmussen, linux-kernel,
	akpm, fengguang.wu, linaro-kernel, Michael wang

On 01/16/2014 12:38 PM, Peter Zijlstra wrote:
> On Thu, Jan 16, 2014 at 12:03:13PM +0100, Daniel Lezcano wrote:
>> Hi Alex,
>>
>> it is a nice optimization attempt but I agree with Peter we should focus on
>> integrating cpuidle.
>>
>> The question is "how do we integrate cpuidle ?"
>>
>> IMHO, the main problem are the governors, especially the menu governor.
>
> Yah.
>
>> The menu governor tries to predict the events per cpu. This approach which
>> gave us a nice benefit for the power saving may not fit well for the
>> scheduler.
>
> So the way to start all this is I think to gradually share more and
> more.
>
> Start by pulling in the actual idle state; such that we can indeed
> observe what the relative cost is of waking a cpu (against another), and
> maybe even the predicted wakeup time.

Ok, I will send a patch for this.

> Then pull in the various statistics gathering bits -- without improving
> them.
>
> Then improve the statistics; try and remove duplicate statistics -- if
> there's such things, try and use the extra information the scheduler has
> etc..
>
> Then worry about the governors, or what's left of them.
>
>> In order to finish integrating the cpuidle framework in the scheduler, there
>> are pending questions about the impact in the current design.
>>
>> Peter or Ingo, if you have time, could you have a look at the email I sent
>> previously [1] ?
>
> I read it once, it didn't make sense at the time, I just read it again,
> still doesn't make sense.

:)

The question raised when I looked closely how to fully integrate cpuidle with
the scheduler; in particular, the idle time.
The scheduler idle time is not the same than the cpuidle idle time.
A cpu can be idle for the scheduler 1s but it could be interrupted several
times by an interrupt thus the idle time for cpuidle is different. But anyway
...

> We need the idle task, since we need to DO something to go idle, the
> scheduler needs to pick a task to go do that something. This is the idle
> task.
>
> You cannot get rid of that.
>
> In fact, the 'doing' of that task is running much of the cpuidle code,
> so by getting rid of it, there's nobody left to execute that code.
>
> Also, since its already running that cpuidle stuff, integrating it more
> closely with the scheduler will not in fact change much, it will still
> run it.
>
> Could of course be I'm not reading what you meant to write, if so, do
> try again ;-)

Well, I wanted to have a clarification of what was your feeling about how to
integrate cpuidle in the scheduler. If removing the idle task (in the future)
does not make sense for you, I will not insist. Let's see how the code evolves
by integrating cpuidle and we will figure out what will be the impact on the
idle task.

Thanks for your feedbacks

  -- Daniel

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

* Re: [RFC PATCH] sched: find the latest idle cpu
From: Nicolas Pitre @ 2014-01-17 2:40 UTC
To: Daniel Lezcano
Cc: Peter Zijlstra, linaro-kernel, Andrew Morton, linux-kernel, mingo,
	Michael wang, fengguang.wu

On Thu, 16 Jan 2014, Daniel Lezcano wrote:

> The question raised when I looked closely how to fully integrate cpuidle with
> the scheduler; in particular, the idle time.
> The scheduler idle time is not the same than the cpuidle idle time.
> A cpu can be idle for the scheduler 1s but it could be interrupted several
> times by an interrupt thus the idle time for cpuidle is different. But anyway
> ...

The idle task would run each time an interrupt has been serviced, either
to yield to a newly awakened task or to put the CPU back to sleep. In the
latter case the idle task may simply do extra idleness accounting
locally. If the former case happens most of the time then the scheduler
idle time would be most representative already. And if threaded IRQs
are used then the scheduler idle time would be the same as cpuidle's.

> > We need the idle task, since we need to DO something to go idle, the
> > scheduler needs to pick a task to go do that something. This is the idle
> > task.
> >
> > You cannot get rid of that.
> >
> > In fact, the 'doing' of that task is running much of the cpuidle code,
> > so by getting rid of it, there's nobody left to execute that code.
> >
> > Also, since its already running that cpuidle stuff, integrating it more
> > closely with the scheduler will not in fact change much, it will still
> > run it.
> >
> > Could of course be I'm not reading what you meant to write, if so, do
> > try again ;-)
>
> Well, I wanted to have a clarification of what was your feeling about how to
> integrate cpuidle in the scheduler. If removing the idle task (in the future)
> does not make sense for you, I will not insist. Let's see how the code evolves
> by integrating cpuidle and we will figure out what will be the impact on the
> idle task.

I think we should be able to get rid of architecture specific idle
loops. The idle loop could be moved close to the scheduler and
architectures would only need to provide a default CPU halt method for
when there is nothing else registered with the cpuidle subsystem.

Nicolas