Re: [RFC PATCH 3/3] sched/fair: Traverse cpufreq policies to detect capacity inversion

public inbox for linux-pm@vger.kernel.org
 help / color / mirror / Atom feed

From: Lukasz Luba <lukasz.luba@arm.com>
To: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	linux-pm@vger.kernel.org,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel@vger.kernel.org, Wei Wang <wvw@google.com>,
	Xuewen Yan <xuewen.yan94@gmail.com>, Hank <han.lin@mediatek.com>,
	Jonathan JMChen <Jonathan.JMChen@mediatek.com>
Subject: Re: [RFC PATCH 3/3] sched/fair: Traverse cpufreq policies to detect capacity inversion
Date: Tue, 13 Dec 2022 17:42:40 +0000	[thread overview]
Message-ID: <19bd3f60-63ea-4ccc-b5a2-6507276c8f0d@arm.com> (raw)
In-Reply-To: <20221212184317.sntxy3h6k44oz4mo@airbuntu>

Hi Qais,

I thought I could help with this issue.

On 12/12/22 18:43, Qais Yousef wrote:
> On 12/09/22 17:47, Vincent Guittot wrote:
> 
> [...]
> 
>>>>>> This patch loops on all cpufreq policy in sched softirq, how can this
>>>>>> be sane ? and not only in eas mode but also in the default asymmetric
>>>>>
>>>>> Hmm I'm still puzzled. Why it's not sane to do it here but it's okay to do it
>>>>> in the wake up path in feec()?
>>>>
>>>> feec() should be considered as an exception not as the default rule.
>>>> Thing like above which loops for_each on external subsystem should be
>>>> prevented and the fact that feec loops all PDs doesn't means that we
>>>> can put that everywhere else
>>>
>>> Fair enough. But really understanding the root cause behind this limitation
>>> will be very helpful. I don't have the same appreciation of why this is
>>> a problem, and shedding more light will help me to think more about it in the
>>> future.
>>>
>>
>> Take the example of 1k cores with per cpu policy. Do you really think a
>> for_each_cpufreq_policy would be reasonable ?
> 
> Hmm I don't think such an HMP system makes sense to ever exist.
> 
> That system has to be a multi-socket system and I doubt inversion detection is
> something of value.
> 
> Point taken anyway. Let's find another way to do this.
> 

Another way might be to use the 'update' code path, which sets this
information source, for the thermal pressure. That code path isn't as
hot as this in the task scheduler. Furthermore, we would also
have time and handle properly CPU hotplug callbacks there.

So something like this, I have in mind:

------------------------------8<-----------------------------
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index e7d6e6657ffa..7f372a93e21b 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -16,6 +16,7 @@
  #include <linux/sched/topology.h>
  #include <linux/cpuset.h>
  #include <linux/cpumask.h>
+#include <linux/mutex.h>
  #include <linux/init.h>
  #include <linux/rcupdate.h>
  #include <linux/sched.h>
@@ -153,6 +154,33 @@ void topology_set_freq_scale(const struct cpumask 
*cpus, unsigned long cur_freq,
  DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
  DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
  EXPORT_PER_CPU_SYMBOL_GPL(cpu_scale);

+static struct cpumask highest_capacity_mask;

+static struct cpumask highest_capacity_mask;
+static unsigned int max_possible_capacity;
+static DEFINE_MUTEX(max_capacity_lock);
+
+static void max_capacity_update(const struct cpumask *cpus,
+                               unsigned long capacity)
+{
+       mutex_lock(&max_capacity_lock);
+
+       if (max_possible_capacity < capacity) {
+               max_possible_capacity = capacity;
+
+               cpumask_clear(&highest_capacity_mask);
+
+               cpumask_or(&highest_capacity_mask,
+                          &highest_capacity_mask, cpus);
+       }
+
+       mutex_unlock(&max_capacity_lock);
+}
+
+bool topology_test_max_cpu_capacity(unsigned int cpu)
+{
+       return cpumask_test_cpu(cpu, &highest_capacity_mask);
+}
+EXPORT_SYMBOL_GPL(topology_test_max_cpu_capacity);
+
  void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
  {
         per_cpu(cpu_scale, cpu) = capacity;
@@ -203,6 +231,8 @@ void topology_update_thermal_pressure(const struct 
cpumask *cpus,

         for_each_cpu(cpu, cpus)
                 WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
+
+       max_capacity_update(cpus, capacity);
  }
  EXPORT_SYMBOL_GPL(topology_update_thermal_pressure);


--------------------------->8--------------------------------

We could use the RCU if there is a potential to read racy date
while the updater modifies the mask in the meantime. Mutex is to
serialize the thermal writers which might be kicked for two
policies at the same time.

If you like I can develop and test such code in the arch_topology.c

Regards,
Lukasz

next prev parent reply	other threads:[~2022-12-13 17:42 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-27 14:17 [PATCH 0/3] Fixes for uclamp and capacity inversion detection Qais Yousef
2022-11-27 14:17 ` [PATCH 1/3] sched/uclamp: Fix a uninitialized variable warnings Qais Yousef
2022-12-01 22:38   ` Dietmar Eggemann
2022-12-03 14:32     ` Qais Yousef
2022-12-08 14:51     ` [PATCH v2] " Qais Yousef
2022-11-27 14:17 ` [PATCH 2/3] sched/fair: Fixes for capacity inversion detection Qais Yousef
2022-12-01 22:39   ` Dietmar Eggemann
2022-12-03 14:32     ` Qais Yousef
2022-12-08 14:54     ` [PATCH v2] " Qais Yousef
2022-12-08 14:58       ` Qais Yousef
2022-11-27 14:17 ` [RFC PATCH 3/3] sched/fair: Traverse cpufreq policies to detect capacity inversion Qais Yousef
2022-11-30 18:27   ` Rafael J. Wysocki
2022-12-03 14:32     ` Qais Yousef
2022-12-05 12:39       ` Rafael J. Wysocki
2022-12-05 14:09         ` Qais Yousef
2022-12-02 14:57   ` Vincent Guittot
2022-12-03 14:33     ` Qais Yousef
2022-12-04 11:35       ` Vincent Guittot
2022-12-05 11:01         ` Qais Yousef
2022-12-06 18:12           ` Vincent Guittot
2022-12-08 14:05             ` Qais Yousef
2022-12-09 16:47               ` Vincent Guittot
2022-12-12 18:43                 ` Qais Yousef
2022-12-13 17:38                   ` Dietmar Eggemann
2022-12-15 17:46                     ` Vincent Guittot
2022-12-20 11:50                     ` Qais Yousef
2022-12-13 17:42                   ` Lukasz Luba [this message]
2022-12-20 11:51                     ` Qais Yousef
2022-12-20 12:52                       ` Lukasz Luba
2022-12-15 17:39                   ` Vincent Guittot
2022-12-20 12:32                     ` Qais Yousef
2022-12-20 13:50                       ` Vincent Guittot
2022-12-23 11:58                         ` Qais Yousef
2022-12-27 13:33                           ` Vincent Guittot
2022-12-28 17:18                             ` Vincent Guittot
2023-01-09 16:40                             ` Qais Yousef
2023-01-10 16:38                               ` Vincent Guittot
2023-01-10 16:44                                 ` Qais Yousef

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:e7d6e6657ff dfblob:7f372a93e21 )
 OR (
bs:"Re: [RFC PATCH 3/3] sched/fair: Traverse cpufreq policies to detect capacity inversion" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=19bd3f60-63ea-4ccc-b5a2-6507276c8f0d@arm.com \
    --to=lukasz.luba@arm.com \
    --cc=Jonathan.JMChen@mediatek.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=han.lin@mediatek.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qyousef@layalina.io \
    --cc=rafael@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    --cc=wvw@google.com \
    --cc=xuewen.yan94@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox