public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tim Chen <tim.c.chen@linux.intel.com>
To: Pierre Gondois <pierre.gondois@arm.com>,
	Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Cc: dietmar.eggemann@arm.com, vincent.guittot@linaro.org,
	peterz@infradead.org, mingo@redhat.com, vschneid@redhat.com,
	linux-kernel@vger.kernel.org, ionela.voinescu@arm.com,
	quentin.perret@arm.com, srikar@linux.vnet.ibm.com,
	mgorman@techsingularity.net, mingo@kernel.org,
	yu.c.chen@intel.com
Subject: Re: [PATCH v2] sched/topology: remove sysctl_sched_energy_aware depending on the architecture
Date: Tue, 05 Sep 2023 14:53:20 -0700	[thread overview]
Message-ID: <9b23259e62826ee8be14c6fe5dcb4bfad40d4bee.camel@linux.intel.com> (raw)
In-Reply-To: <b81e3d8f-88e3-e7b5-0dbc-78268193db7e@arm.com>

On Tue, 2023-09-05 at 16:03 +0200, Pierre Gondois wrote:
> Hello Shrikanth,
> I tried the patch (on a platform using the cppc_cpufreq driver). The platform
> normally has EAS enabled, but the patch removed the sched_energy_aware sysctl.
> It seemed the following happened (in the below order):
> 
> 1. sched_energy_aware_sysctl_init()
> Doesn't set sysctl_sched_energy_aware as cpufreq_freq_invariance isn't set
> and arch_scale_freq_invariant() returns false
> 
> 2. cpufreq_register_driver()
> Sets cpufreq_freq_invariance during cpufreq initialization sched_energy_set()
> 
> 3. sched_energy_set()
> Is called with has_eas=0 since build_perf_domains() doesn't see the platform
> as EAS compatible. Indeed sysctl_sched_energy_aware=0.
> So with sysctl_sched_energy_aware=0 and has_eas=0, sched_energy_aware sysctl
> is not enabled even though EAS should be possible.
> 
> 
> On 9/1/23 08:52, Shrikanth Hegde wrote:
> > Currently sysctl_sched_energy_aware doesn't alter the said behaviour on
> > some of the architectures. IIUC its meant to either force rebuild the
> > perf domains or cleanup the perf domains by echoing 1 or 0 respectively.
> 
> There is a definition of the sysctl at:
> Documentation/admin-guide/sysctl/kernel.rst::sched_energy_aware
> 
> Also a personal comment about the commit message (FWIW), I think it should
> be a bit more impersonal and factual. The commit message seems to describe
> the code rather than the desired behaviour.

I also wonder if Shrikanth's description of the operations can be simplified.

In my mind, There are 3 variables describing the system:

1. sched_energy_capable : whether system is EAS capable
2. sched_energy_aware   : whether the admin wants to enables EAS
3. sched_energy_status  : sched_energy_capable && sched_energy_aware

Whenever there is a change in sched_energy_status, then we should trigger a rebuild
of the sched domain.  We should expose sched_energy_capable
to user rather than removing sched_energy_aware when sched_energy_capable == 0.

If the user know the value of sched_energy_capable, the user will know
if setting sched_energy_aware will change the system's sched_energy_status.

For system that can never support EAS,
we should simply make sched_energy_aware to be 0 and disallow it from getting written.

On systems that allow sched_energy_capable to be enabled (e.g. by brining smt on/offline),
we should allow setting sched_energy_aware even when sched_energy_capable is 0.
Once sched_energy_capable becomes 1, EAS is enabled.


Tim 
  
> 
> > 
> > perf domains are not built when there is SMT, or when there is no
> > Asymmetric CPU topologies or when there is no frequency invariance.
> > Since such cases EAS is not set and perf domains are not built. By
> > changing the values of sysctl_sched_energy_aware, its not possible to
> > force build the perf domains. Hence remove this sysctl on such platforms
> > that dont support it. Some of the settings can be changed later
> > such as smt_active by offlining the CPU's, In those cases if
> > build_perf_domains returns true, re-enable the sysctl.
> > 
> > Anytime, when sysctl_sched_energy_aware is changed sched_energy_update
> > is set when building the perf domains. Making use of that to find out if
> > the change is happening by sysctl or dynamic system change.
> > 
> > Taking different cases:
> > Case1. system while booting has EAS capability, sysctl will be set 1. Hence
> > perf domains will be built if needed. On changing the sysctl to 0, since
> > sched_energy_update is true, perf domains would be freed and sysctl will
> > not be removed. later sysctl is changed to 1, enabling the perf domains
> > rebuild again. Since sysctl is already there, it will skip register.
> > 
> > Case2. System while booting doesn't have EAS Capability. Later after system
> > change it becomes capable of EAS. sched_energy_update is false. Though
> > sysctl is 0, will go ahead and try to enable eas. This is the current
> > behaviour. if has_eas  is true, then sysctl will be registered. After
> > that any sysctl change is same as Case1.
> > 
> > Case3. System becomes not capable of EAS due to system change. Here since
> > sched_energy_update is false, build_perf_domains return has_eas as false
> > due to one of the checks and Since this is dynamic change remove the sysctl.
> > Any further change which enables EAS is Case2
> > 
> > Note: This hasn't been tested on platform which supports EAS. If the
> > change can be verified on that it would really help. This has been
> > tested on power10 which doesn't support EAS. sysctl_sched_energy_aware
> > is removed with patch.
> > 
> > changes since v1:
> > Chen Yu had pointed out that this will not destroy the perf domains on
> > architectures where EAS is supported by changing the sysctl. This patch
> > addresses that.
> > [v1] Link: https://lore.kernel.org/lkml/20230829065040.920629-1-sshegde@linux.vnet.ibm.com/#t
> > 
> > Signed-off-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
> > ---
> >   kernel/sched/topology.c | 45 +++++++++++++++++++++++++++++++++--------
> >   1 file changed, 37 insertions(+), 8 deletions(-)
> > 
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 05a5bc678c08..4d16269ac21a 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -208,7 +208,8 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
> > 
> >   #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> >   DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> > -static unsigned int sysctl_sched_energy_aware = 1;
> > +static unsigned int sysctl_sched_energy_aware;
> > +static struct ctl_table_header *sysctl_eas_header;
> 
> The variables around the presence/absence of EAS are:
> - sched_energy_present:
> EAS is up and running
> 
> - sysctl_sched_energy_aware:
> The user wants to use EAS (or not). Doesn't mean EAS can run on the
> platform.
> 
> - sched_energy_set/partition_sched_domains_locked's "has_eas":
> Local variable. Represent whether EAS can run on the platform.
> 
> IMO it would be simpler to (un)register sched_energy_aware sysctl
> in partition_sched_domains_locked(), based on the value of "has_eas".
> This would allow to let all the logic as it is right now, inside
> build_perf_domains(), and then advertise sched_energy_aware sysctl
> if EAS can run on the platform.
> sched_energy_aware_sysctl_init() would be deleted then.
> 
> 
> >   static DEFINE_MUTEX(sched_energy_mutex);
> >   static bool sched_energy_update;
> > 
> > @@ -226,6 +227,7 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> >   		void *buffer, size_t *lenp, loff_t *ppos)
> >   {
> >   	int ret, state;
> > +	int prev_val = sysctl_sched_energy_aware;
> > 
> >   	if (write && !capable(CAP_SYS_ADMIN))
> >   		return -EPERM;
> > @@ -233,8 +235,11 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> >   	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> >   	if (!ret && write) {
> >   		state = static_branch_unlikely(&sched_energy_present);
> > -		if (state != sysctl_sched_energy_aware)
> > +		if (state != sysctl_sched_energy_aware && prev_val != sysctl_sched_energy_aware) {
> > +			if (sysctl_sched_energy_aware && !state)
> > +				pr_warn("Attempt to build energy domains when EAS is disabled\n");
> >   			rebuild_sched_domains_energy();
> > +		}
> >   	}
> > 
> >   	return ret;
> > @@ -255,7 +260,14 @@ static struct ctl_table sched_energy_aware_sysctls[] = {
> > 
> >   static int __init sched_energy_aware_sysctl_init(void)
> >   {
> > -	register_sysctl_init("kernel", sched_energy_aware_sysctls);
> > +	int cpu = cpumask_first(cpu_active_mask);
> > +
> > +	if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
> > +	    !arch_scale_freq_invariant())
> > +		return 0;
> > +
> > +	sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> > +	sysctl_sched_energy_aware = 1;
> >   	return 0;
> >   }
> > 
> > @@ -336,10 +348,28 @@ static void sched_energy_set(bool has_eas)
> >   		if (sched_debug())
> >   			pr_info("%s: stopping EAS\n", __func__);
> >   		static_branch_disable_cpuslocked(&sched_energy_present);
> > +#ifdef CONFIG_PROC_SYSCTL
> > +		/*
> > +		 * if the architecture supports EAS and forcefully
> > +		 * perf domains are destroyed, there should be a sysctl
> > +		 * to enable it later. If this was due to dynamic system
> > +		 * change such as SMT<->NON_SMT then remove sysctl.
> > +		 */
> > +		if (sysctl_eas_header && !sched_energy_update) {
> > +			unregister_sysctl_table(sysctl_eas_header);
> > +			sysctl_eas_header = NULL;
> > +		}
> > +#endif
> > +		sysctl_sched_energy_aware = 0;
> >   	} else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
> >   		if (sched_debug())
> >   			pr_info("%s: starting EAS\n", __func__);
> >   		static_branch_enable_cpuslocked(&sched_energy_present);
> > +#ifdef CONFIG_PROC_SYSCTL
> > +		if (!sysctl_eas_header)
> > +			sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> > +#endif
> > +		sysctl_sched_energy_aware = 1;
> >   	}
> >   }
> > 
> > @@ -380,15 +410,14 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
> >   	struct cpufreq_policy *policy;
> >   	struct cpufreq_governor *gov;
> > 
> > -	if (!sysctl_sched_energy_aware)
> > +	if (!sysctl_sched_energy_aware && sched_energy_update)
> >   		goto free;
> > 
> >   	/* EAS is enabled for asymmetric CPU capacity topologies. */
> >   	if (!per_cpu(sd_asym_cpucapacity, cpu)) {
> > -		if (sched_debug()) {
> > -			pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
> > -					cpumask_pr_args(cpu_map));
> > -		}
> > +		if (sched_debug())
> > +			pr_info("rd %*pbl: Disabling EAS,  CPUs do not have asymmetric capacities\n",
> > +				cpumask_pr_args(cpu_map));
> >   		goto free;
> >   	}
> > 
> > --
> > 2.31.1
> > 
> > 
> 
> Regards,
> Pierre


  reply	other threads:[~2023-09-05 21:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-01  6:52 [PATCH v2] sched/topology: remove sysctl_sched_energy_aware depending on the architecture Shrikanth Hegde
2023-09-01  9:49 ` Chen Yu
2023-09-01 13:08   ` Shrikanth Hegde
2023-09-07 10:21   ` Pierre Gondois
2023-09-07 15:04     ` Chen Yu
2023-09-05 14:03 ` Pierre Gondois
2023-09-05 21:53   ` Tim Chen [this message]
2023-09-06  3:21   ` Shrikanth Hegde
2023-09-06 11:35   ` Shrikanth Hegde
2023-09-07 10:34     ` Pierre Gondois
2023-09-13 11:55       ` Shrikanth Hegde
  -- strict thread matches above, loose matches on Subject: below --
2023-09-01  6:51 Shrikanth Hegde
2023-09-01  6:57 ` Shrikanth Hegde

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9b23259e62826ee8be14c6fe5dcb4bfad40d4bee.camel@linux.intel.com \
    --to=tim.c.chen@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=ionela.voinescu@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pierre.gondois@arm.com \
    --cc=quentin.perret@arm.com \
    --cc=srikar@linux.vnet.ibm.com \
    --cc=sshegde@linux.vnet.ibm.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox