From mboxrd@z Thu Jan 1 00:00:00 1970 From: Daniel Lezcano Subject: Re: [PATCH 2/3] sched: idle: Add sched balance option Date: Fri, 25 Apr 2014 14:17:39 +0200 Message-ID: <535A5263.6090104@linaro.org> References: <1398342291-16322-1-git-send-email-daniel.lezcano@linaro.org> <535911DC.9050109@linaro.org> <2713863.BLQTYQm2Oa@vostro.rjw.lan> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: In-Reply-To: <2713863.BLQTYQm2Oa@vostro.rjw.lan> Sender: linux-kernel-owner@vger.kernel.org To: "Rafael J. Wysocki" , Amit Kucheria Cc: Peter Zijlstra , Ingo Molnar , Lists linaro-kernel , Linux PM list , Linux Kernel Mailing List List-Id: linux-pm@vger.kernel.org On 04/25/2014 01:46 PM, Rafael J. Wysocki wrote: > On Friday, April 25, 2014 04:24:49 PM Amit Kucheria wrote: >> On Thu, Apr 24, 2014 at 7:00 PM, Daniel Lezcano >> wrote: >>> >>> On 04/24/2014 03:13 PM, Amit Kucheria wrote: >>>> >>>> On Thu, Apr 24, 2014 at 5:54 PM, Daniel Lezcano >>>> > wro= te: >>>> >>>> This patch adds a sysctl schedule balance option to choose ag= ainst: >>>> >>>> * auto (0) >>>> * performance (1) >>>> * power (2) >>>> >>>> It relies on the recently added notifier to monitor the power= supply >>>> changes. >>>> If the scheduler balance option is set to 'auto', then when t= he >>>> system switches >>>> to battery, the balance option change to 'power' and when it = goes >>>> back to AC, it >>>> switches to 'performance'. >>>> >>>> The default value is 'auto'. >>>> >>>> If the kernel is compiled without the CONFIG_POWER_SUPPLY opt= ion, >>>> then any call >>>> to the 'auto' option will fail and the scheduler will use the >>>> 'performance' >>>> option as default. >>>> >>>> Signed-off-by: Daniel Lezcano >>> > >>>> >>>> --- >>>> include/linux/sched/sysctl.h | 14 +++++++ >>>> kernel/sched/fair.c | 92 >>>> +++++++++++++++++++++++++++++++++++++++++- >>>> kernel/sysctl.c | 11 +++++ >>>> 3 files changed, 115 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/include/linux/sched/sysctl.h b/include/linux/sch= ed/sysctl.h >>>> index 8045a55..f8507bf 100644 >>>> --- a/include/linux/sched/sysctl.h >>>> +++ b/include/linux/sched/sysctl.h >>>> @@ -44,6 +44,20 @@ enum sched_tunable_scaling { >>>> }; >>>> extern enum sched_tunable_scaling sysctl_sched_tunable_scal= ing; >>>> >>>> +#ifdef CONFIG_SMP >>>> +enum sched_balance_option { >>>> >>>> >>>> What do you think of s/option/bias/g ? >>>> >>>> It is essentially biasing the scheduler towards performance or pow= er >>> >>> >>> Yes, could be more adequate. >>> >>> >>>> + SCHED_BALANCE_OPTION_PERFORMANCE, >>>> + SCHED_BALANCE_OPTION_POWER, >>>> + SCHED_BALANCE_OPTION_AUTO, >>>> + SCHED_BALANCE_OPTION_END, >>>> +}; >>>> >>>> >>>> +extern enum sched_balance_option sysctl_sched_balance_option= ; >>>> + >>>> +int sched_proc_balance_option_handler(struct ctl_table *tabl= e, int >>>> write, >>>> + void __user *buffer, size_t *length, >>>> + loff_t *ppos); >>>> +#endif >>>> + >>>> extern unsigned int sysctl_numa_balancing_scan_delay; >>>> extern unsigned int sysctl_numa_balancing_scan_period_min; >>>> extern unsigned int sysctl_numa_balancing_scan_period_max; >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>>> index 7570dd9..7b8e93d 100644 >>>> --- a/kernel/sched/fair.c >>>> +++ b/kernel/sched/fair.c >>>> @@ -29,7 +29,7 @@ >>>> #include >>>> #include >>>> #include >>>> - >>>> +#include >>>> #include >>>> >>>> #include "sched.h" >>>> @@ -61,6 +61,24 @@ unsigned int normalized_sysctl_sched_laten= cy =3D >>>> 6000000ULL; >>>> enum sched_tunable_scaling sysctl_sched_tunable_scaling >>>> =3D SCHED_TUNABLESCALING_LOG; >>>> >>>> +#ifdef CONFIG_SMP >>>> +/* >>>> + * Scheduler balancing policy: >>>> + * >>>> + * Options are: >>>> + * SCHED_BALANCE_OPTION_PERFORMANCE - full performance >>>> + * SCHED_BALANCE_OPTION_POWER - power saving aggressive >>>> + * SCHED_BALANCE_OPTION_AUTO - switches to 'performance' whe= n plugged >>>> + * on or 'power' on battery >>>> + */ >>>> +enum sched_balance_option sysctl_sched_balance_option >>>> + =3D SCHED_BALANCE_OPTION_AUTO; >>>> + >>>> +static int sched_current_balance_option >>>> + =3D SCHED_BALANCE_OPTION_PERFORMANCE; >>>> + >>>> +#endif >>>> + >>>> /* >>>> * Minimal preemption granularity for CPU-bound tasks: >>>> * (default: 0.75 msec * (1 + ilog(ncpus)), units: nanoseco= nds) >>>> @@ -555,6 +573,76 @@ static struct sched_entity >>>> *__pick_next_entity(struct sched_entity *se) >>>> return rb_entry(next, struct sched_entity, run_node)= ; >>>> } >>>> >>>> +#ifdef CONFIG_SMP >>>> +static int sched_balance_option_update(void) >>>> +{ >>>> + int ret; >>>> + >>>> + /* >>>> + * Copy the current balance option >>>> + */ >>>> + if (sysctl_sched_balance_option !=3D SCHED_BALANCE_OP= TION_AUTO) { >>>> + sched_current_balance_option =3D >>>> sysctl_sched_balance_option; >>>> + return 0; >>>> + } >>>> + >>>> + /* >>>> + * This call may fail if the kernel is not compiled w= ith >>>> + * the POWER_SUPPLY option. >>>> + */ >>>> + ret =3D power_supply_is_system_supplied(); >>>> + if (ret < 0) { >>>> + sysctl_sched_balance_option =3D >>>> sched_current_balance_option; >>>> + return ret; >>>> + } >>>> + >>>> + /* >>>> + * When in 'auto' mode, switch to 'performance if the= system >>>> + * is plugged on the wall, to 'power' if we are on ba= ttery >>>> + */ >>>> + sched_current_balance_option =3D ret ? >>>> + SCHED_BALANCE_OPTION_PERFORMANCE : >>>> + SCHED_BALANCE_OPTION_POWER; >>>> + >>>> + return 0; >>>> +} >>>> + >>>> >>>> >>>> I understand that this is only meant to kick off discussions and o= ther >>>> criteria besides power being plugged in to bias the scheduler >>>> performance could be added later. But does it make sense to hardco= de the >>>> power supply assumption into the scheduler? >>>> >>>> Can't we instead make sched_balance_option_update() a function poi= nter >>>> (with a default implementation that you've provided) that provide >>>> platforms the ability to override that with their own implementati= on? >>> >>> >>> I agree if that really hurts, it could be placed somewhere else, fo= r example in a new file: >>> kernel/sched/energy.c >>> >>> But concerning the callback, I don't see the point to create a spec= ific platform ops for that as the current code is generic. Do you have = any use case in mind ? >>> >> >> I had a offline conversation with Daniel about this since there are >> other triggers - thermal constraints and game-like apps being exampl= es >> - that might want to override the system policy. He intended >> "performance" mode to mean the existing code paths and "power" mode = to >> mean *additional* new heuristics for energy-efficiency. The power >> supply assumption is just the first one of those heuristics. > > Well, so now the question is whether or not we relly want to always > go to the "power" (or "energy efficiency" if you will) mode if the sy= stem > is on battery. That arguably may not be a good thing even for energy > efficiency depending on how exactly the modes are defined. > > So in my opinion it's too early to add things like that at this point= =2E Ok, the point to solve is to have the old code path (performance) and=20 the new code path (power) to be selectable from userspace. It is acceptable to keep the power/performance option for now without=20 the power supply thing ? And we address the policy to switch from one=20 mode to another mode with another option later ? >> In that case, "performance" should be the default so we don't change >> existing system behavior. And perhaps we want to keep these energy >> heuristics in a separate file, at least until we've gotten them righ= t. >> >> While we're at it, can we attempt to replace power with energy since >> that is what we're trying to save in most cases? > > That would be a good thing IMO. --=20 Linaro.org =E2=94=82 Open source software fo= r ARM SoCs =46ollow Linaro: Facebook | Twitter | Blog